SR
Coding Agent testing on Claude Code
Queued Position 02
Target Model
openai/gpt-4o
Scenario Set
Coding Agent Scenarios
Status
Waiting to start...
AT
Unnamed Scan 5
12m 33s Started 12m ago
Target Model
anthropic/claude-sonnet-4-5
Scenario Set
Alpha Scenario Group
Status
In Progress 25%
AT
Tool Call Violation Tests
Error Failed 2h ago
Target Model
openai/gpt-5
Scenario Set
Tool Call Violations
Status
TimeoutError: Scan timed out after 30 minutes. The target model stopped responding mid-evaluation.
JC
Prompt Injection Stress Test
Error Failed 5h ago
Target Model
google/gemini-2.0-flash
Scenario Set
Injection Scenarios
Status
AuthenticationError: API key rejected. Check your credentials and try again.
ML
Extreme Edge Case Sweep
Error Failed 1d ago
Target Model
anthropic/claude-opus-4
Scenario Set
Extreme Coding Cases
Status
RateLimitError: Rate limit exceeded during evaluation. Reduce concurrency or retry later.
ST
IT Support Agent
30 min Started 29 days ago
Target Model
openai/o3-mini
Scenario Set
IT Support Agent Scenarios
Status
9 Vulnerable
1 Secure
SR
Customer Support Agent
26 min Started 7h ago
Target Model
google/gemini-2.5-pro
Scenario Set
Customer Support Scenarios
Status
9 Vulnerable
1 Secure
CP
Coding Agent
30 min Started 8h ago
Target Model
claude-haiku-4-5
Scenario Set
Coding Agent Scenarios
Status
5 Vulnerable
5 Secure
JC
Sales Lead Agent
48 min Started 9h ago
Target Model
o3-mini
Scenario Set
Sales Lead Agent Scenarios
Status
11 Vulnerable
0 Secure
AT
Unnamed Scan 2
51m 14s Started 10h ago
Target Model
openai/gpt-4o
Scenario Set
Custom
Status
7 Vulnerable
3 Secure
ML
Unnamed Scan 1
55m 39s Started 11h ago
Target Model
openai/gpt-4o
Scenario Set
Custom
Status
9 Vulnerable
1 Secure
JC
Social Engineering Scenarios
9m 51s Started 12h ago
Target Model
anthropic/claude-sonnet-4-5
Scenario Set
Social Engineering
Status
4 Vulnerable
6 Secure
SR
Data Exfiltration via Code
7m 22s Started 13h ago
Target Model
openai/gpt-4o
Scenario Set
Coding Agent Scenarios
Status
3 Vulnerable
7 Secure
CP
Memory Poisoning Tests
5m 44s Started 14h ago
Target Model
google/gemini-2.0-flash
Scenario Set
Memory Scenarios
Status
6 Vulnerable
4 Secure
AT
Agentic Loop Exploits
3m 57s Started 15h ago
Target Model
anthropic/claude-opus-4
Scenario Set
Agentic Exploits
Status
3 Vulnerable
7 Secure