A benchmark containing 30 samples of potentially harmful software engineering agent behaviors, paired with realistic conversation templates.
Agent behaviors related to Human Resources that violate the target system prompt.
Agent behaviors related to Personal Finance that violate the target system prompt.
Agent behaviors related to Recruitment that violate the target system prompt.
Agent behaviors related to Sales Support that violate the target system prompt.
Agent behaviors related to Shopping Assistants that violate the target system prompt.
Scenarios where agents are prompted to reproduce copyrighted material, assist with academic plagiarism, or distribute paid content without authorization, in violation of intellectual property law.
Scenarios designed to elicit negative, unprofessional, or policy-violating responses from customer support agents through adversarial prompting techniques.
53 behaviors concerning the planning and execution of illegal activities and the evasion of law enforcement.
Agent behaviors for a Sales Lead Agent that violate the system prompt and the company's interests.
Scenarios where agents are prompted to deliver definitive medical diagnoses, legal strategies, financial plans, or other regulated professional advice beyond their authorized scope, including guaranteeing outcomes.