
Red Teaming Challenges

NIST Red Teaming Exercise

September 9, 2024

The ARIA Program at NIST published a Program Evaluation Design Document that highlights the methodology underlying this pilot exercise. This document provides context on NIST's evaluation-driven research, describes the motivation for the ARIA program, and details ARIA's experimentation environment.


Details and requirements for this red teaming effort, which took place August-October 2024, are below.


Participation requirements for individuals seeking to red team:


Virtual qualifier: To participate, interested red teamers enrolled in the qualifying event, a NIST ARIA (Assessing Risks and Impacts of AI) pilot exercise. In the ARIA pilot, red teaming participants identified as many violative outcomes as possible using predefined test scenarios as part of stress tests of model guardrails and safety mechanisms. The virtual qualifier was open to anyone residing in the US. For more details on ARIA and related scenarios, see here.


Red teaming participants who passed the ARIA pilot qualifying event were able to take part in an in-person red teaming exercise held during CAMLIS (October 24-25).


In-person event: The second phase of the challenge, an in-person exercise, included a hosted red team evaluation of office productivity software that employs generative AI models. The exercise used the “Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile (NIST AI 600-1)” as the operative rubric for violative outcomes and controls.

During testing, red teamers engaged in turn-by-turn adversarial interactions with developer-submitted applications. The challenge was a capture-the-flag (CTF)-style, points-based evaluation, with findings verified by in-person event assessors; the scores were then aggregated for a final report.
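As a rough illustration of how such a points-based tally could be aggregated, the Python sketch below sums assessor-verified findings into per-team scores. The field names and point values are hypothetical, not the actual ARIA scoring schema; the risk labels follow NIST AI 600-1 risk categories.

    from collections import defaultdict

    # Hypothetical findings records; field names and points are illustrative,
    # not the actual ARIA/CAMLIS schema. Risk labels follow NIST AI 600-1.
    findings = [
        {"team": "red-team-07", "risk_category": "Data Privacy",
         "points": 30, "verified": True},
        {"team": "red-team-07", "risk_category": "Information Security",
         "points": 50, "verified": False},
        {"team": "red-team-12", "risk_category": "Harmful Bias and Homogenization",
         "points": 30, "verified": True},
    ]

    def score_teams(findings):
        """Sum CTF-style points per team, counting only assessor-verified flags."""
        totals = defaultdict(int)
        for f in findings:
            if f["verified"]:
                totals[f["team"]] += f["points"]
        return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

    for team, points in score_teams(findings):
        print(f"{team}: {points} points")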

We tested models from these companies:

  • Anote is an end-to-end MLOps platform that enables you to obtain the best large language model for your data. Anote provides an evaluation framework to compare zero-shot LLMs like GPT, Claude, Llama 3 and Mistral with fine-tuned LLMs trained on your domain-specific training data. To enable this, Anote has a data annotation interface to convert raw unstructured data into an LLM-ready format and incorporate subject matter expertise into your training process to improve model accuracy. End users can route the best LLM into their own on-premise, AI-powered private chatbot that can answer questions from documents with more accuracy than a generalized LLM.

    • The data-centric approach allows users to intentionally label some training data to evaluate its impact on model outputs. This can be done either by downloading the labeled data and running it through external LLM providers such as Meta's Llama, or by using the platform's training feature to create different fine-tuned model versions (via supervised, unsupervised and RLHF fine-tuning) with varying numbers of labels (e.g., 50, 100, 150). The platform provides a comprehensive evaluation framework for assessing the effects of training data using fixed benchmark test datasets and a variety of evaluation metrics. (A minimal sketch of this label-budget comparison appears after the company list below.)

  • Meta (formerly the Facebook company) builds technologies that help people connect, find communities and grow businesses.

    • Llama 3.2 90B (without guardrails) is one of the two largest models in the Llama 3.2 collection (alongside 11B), and can support image reasoning use cases such as document-level understanding including charts and graphs, captioning of images, and visual grounding tasks such as directionally pinpointing objects in images based on natural language descriptions. For example, a person could ask which month in the previous year their small business had the best sales, and Llama 3.2 can reason over an available graph and quickly provide the answer. In another example, the model could reason over a map and help answer questions such as when a hike might become steeper or the distance of a particular trail marked on the map. The 11B and 90B models can also bridge the gap between vision and language by extracting details from an image, understanding the scene, and then crafting a sentence or two that could be used as an image caption to help tell the story. (A sketch of an image-reasoning call in this style appears after the company list.)

  • Robust Intelligence is an AI security startup whose end-to-end platform enables enterprise customers like JP Morgan Chase, Expedia, Intuit, IBM and more to deploy machine learning models with confidence. The Robust Intelligence platform combines AI Validation, an automated pen-testing framework for LLMs, with AI Firewall, a real-time, low-latency guardrail that flags unsafe or malicious content such as prompt injections and toxic language in model inputs and outputs. Co-founded by Harvard computer science professor Yaron Singer and his student Kojin Oshiba in 2019 after years of robust machine learning research, Robust Intelligence raised over $60 million from VCs such as Sequoia and Tiger Capital before being acquired by Cisco.

    • Robust Intelligence's model system is powered by a customized open-source pretrained LLM and protected with proprietary AI Firewall technology. The application exposes a chatbot interface, allowing users to submit queries for tasks such as coding assistance, draft editing, and memo generation. The addition of the Firewall ensures protection against PII leakage, prompt injections and misaligned model responses, creating a more trustworthy productivity-assistant experience. (A generic sketch of this input/output screening pattern follows the company list.)

  • Synthesia is the world’s leading enterprise AI video communications platform. Over 1 million users across 55,000 businesses, including more than 60% of the Fortune 100, use it to communicate efficiently and share knowledge at scale using AI avatars. Founded in 2017, Synthesia is headquartered in London and makes video creation, collaboration and sharing easy for everyone.

    • Expressive Avatars are the fourth generation of Synthesia’s AI avatars and are powered by its EXPRESS-1 model for realistic avatar performance. EXPRESS-1 uses large, pre-trained models as a backbone to drive the performance of Expressive Avatars, combined with diffusion to model complex multimodal distributions. EXPRESS-1 predicts every movement and facial expression in real time, aligning seamlessly with the timings, intonations, and emphasis of spoken language. This results in performances that are natural and human-like.
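To make the label-budget comparison described under Anote concrete, here is a minimal Python sketch that fine-tunes one model variant per label count (50, 100, 150) and scores every variant on the same fixed benchmark. The fine_tune and predict helpers are hypothetical stand-ins, not Anote's actual API.

    # Hypothetical stand-ins for Anote-style fine-tuning and inference calls.
    LABEL_BUDGETS = [50, 100, 150]

    def fine_tune(base_model, labeled_examples):
        raise NotImplementedError("stand-in for a real fine-tuning call")

    def predict(model, text):
        raise NotImplementedError("stand-in for a real inference call")

    def accuracy(model, benchmark):
        # benchmark: list of {"text": ..., "label": ...} held fixed across runs
        hits = sum(predict(model, ex["text"]) == ex["label"] for ex in benchmark)
        return hits / len(benchmark)

    def compare_label_budgets(base_model, labeled_pool, benchmark):
        # One fine-tuned variant per label budget, all scored on the same
        # benchmark, so any accuracy change is attributable to the added labels.
        return {
            n: accuracy(fine_tune(base_model, labeled_pool[:n]), benchmark)
            for n in LABEL_BUDGETS
        }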
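For the Llama 3.2 image-reasoning flow described under Meta, the sketch below uses the Hugging Face transformers interface for Llama 3.2 Vision models. The chart file and question are placeholders; the 90B checkpoint is gated and requires substantial multi-GPU hardware, and the same code works with the 11B checkpoint.

    import torch
    from PIL import Image
    from transformers import AutoProcessor, MllamaForConditionalGeneration

    model_id = "meta-llama/Llama-3.2-90B-Vision-Instruct"  # gated; 11B also works
    model = MllamaForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(model_id)

    # Placeholder chart image and question, per the sales example above.
    image = Image.open("monthly_sales_chart.png")
    messages = [{"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Which month last year had the best sales?"},
    ]}]

    prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
    inputs = processor(image, prompt, add_special_tokens=False,
                       return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=64)
    print(processor.decode(output[0], skip_special_tokens=True))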
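The input/output screening pattern described under Robust Intelligence can be sketched generically. The regular expressions below are crude illustrative proxies for prompt-injection and PII signals, not the proprietary AI Firewall logic; guarded_chat accepts any LLM client wrapped as a callable.

    import re

    # Crude illustrative guardrail signals; a real firewall uses far richer models.
    BLOCKED_PATTERNS = [
        re.compile(r"ignore (all )?previous instructions", re.I),  # prompt injection
        re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),                      # US SSN shape (PII)
    ]

    def trips_guardrail(text):
        return any(p.search(text) for p in BLOCKED_PATTERNS)

    def guarded_chat(model_call, user_prompt):
        # Screen the prompt before the model sees it, and the response
        # before the user does, refusing on either side.
        if trips_guardrail(user_prompt):
            return "Request blocked by input guardrail."
        response = model_call(user_prompt)
        if trips_guardrail(response):
            return "Response withheld by output guardrail."
        return response

    # Usage with any LLM client wrapped as a callable:
    # print(guarded_chat(lambda p: "Here is a draft memo...", "Draft a memo"))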


Our participants had an unprecedented opportunity to collaborate with industry and government partners to better understand the potential positive and negative uses of AI models, and to leverage this technology to mitigate negative outcomes.

Who Participated: Applications from individuals with diverse expertise were encouraged, including but not limited to:

  • AI researchers and practitioners

  • Cybersecurity professionals

  • Data scientists

  • Ethicists and legal professionals

  • Software engineers

  • Policymakers and regulators


Participation requirements for companies donating their models: Model owners interested in participating in the in-person red teaming event were required to meet the following criteria:

 

  1. The model or product must utilize Generative AI technology.

  2. The model or product must be designed for workplace productivity. This is broadly defined as: any technology enabling communication, coding, process automation, or any reasonably expected activity in a technology-enabled workplace that utilizes popular inter-office software such as: chat, email, code repositories, and shared drives.

  3. The model or product owner must be willing to have their model tested for both positive and negative impacts related to: vulnerability discovery, including program verification tools, automated code repair tools, fuzzing or other dynamic vulnerability discovery tools, and adversarial machine learning tools or toolkits.

  4. The model or product owner could optionally provide blue team support.

 

Full Event Details:


Dates:

  • August 20, 2024: Application opened.

  • September 9, 2024, 11:59 PM ET: Application for participants closed.

  • September 23, 2024: Pilot launched US-wide.

  • October 4, 2024: Pilot closed.

  • October 11, 2024: Those selected for the in-person event were announced and notified.

  • October 24-25, 2024: CAMLIS in-person event.


Objectives:

This event demonstrated:

  • A test of the potential positive and negative uses of AI models, as well as a method of leveraging positive use cases to mitigate negative ones.

  • Use of NIST AI 600-1 to explore GAI risks and suggested actions as an approach for establishing GAI safety and security controls.

Participation Benefits:

  • Contribute to the advancement of secure and ethical AI.

  • Network with leading experts in AI and cybersecurity, including in U.S. government agencies.

  • Gain insights into cutting-edge AI vulnerabilities and defenses.

  • Participants in the qualifying red teaming event could be invited to participate in CAMLIS, held October 24-25, 2024 in Arlington, VA. All expenses for travel, food and lodging during this time were covered.

