npx skills add transilienceai/communitytools --skill "ai-threat-testing"
Install specific skill from multi-skill repository
# Description
Offensive AI security testing and exploitation framework. Systematically tests LLM applications for OWASP Top 10 vulnerabilities including prompt injection, model extraction, data poisoning, and supply chain attacks. Integrates with pentest workflows to discover and exploit AI-specific threats.
# SKILL.md
name: ai-threat-testing
description: Offensive AI security testing and exploitation framework. Systematically tests LLM applications for OWASP Top 10 vulnerabilities including prompt injection, model extraction, data poisoning, and supply chain attacks. Integrates with pentest workflows to discover and exploit AI-specific threats.
AI Threat Testing & Exploitation
This skill orchestrates comprehensive AI threat testing across LLM applications and systems. Use it for authorized penetration tests of AI systems: discover vulnerabilities and document findings with proof-of-concept exploits.
Quick Start
Test an AI system:
Specify target:
- LLM application URL or deployment
- API endpoints (OpenAI, Anthropic, Azure, etc.)
- Local model (Ollama, LM Studio)
- RAG system or vector database
Select vulnerability scope:
1. All OWASP Top 10 (full assessment)
2. Specific vulnerabilities (targeted testing)
3. Quick scan (common issues)
4. Supply chain assessment (dependencies)
Get automated testing:
- Reconnaissance and fingerprinting
- Deployment of vulnerability-specific agents
- Exploit execution with evidence capture
- Professional report generation
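The target and scope choices above can be written down as a small request object before any agent runs. This is a minimal sketch, not part of the skill: `AssessmentRequest`, `TargetKind`, `Scope`, and the `categories` field are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class TargetKind(Enum):
    # Mirrors the four target types listed under "Specify target".
    LLM_APPLICATION = "llm_application"
    API_ENDPOINT = "api_endpoint"
    LOCAL_MODEL = "local_model"
    RAG_SYSTEM = "rag_system"


class Scope(Enum):
    # Mirrors the four options listed under "Select vulnerability scope".
    FULL = "full_assessment"
    TARGETED = "targeted_testing"
    QUICK = "quick_scan"
    SUPPLY_CHAIN = "supply_chain"


@dataclass
class AssessmentRequest:
    """Hypothetical container for one test request (illustration only)."""
    target: str
    kind: TargetKind
    scope: Scope
    # Category labels such as "LLM01"; only meaningful for targeted testing.
    categories: list[str] = field(default_factory=list)

    def validate(self) -> None:
        if not self.target.strip():
            raise ValueError("target must not be empty")
        if self.scope is Scope.TARGETED and not self.categories:
            raise ValueError("targeted scope needs at least one category")
        if self.scope is not Scope.TARGETED and self.categories:
            raise ValueError("categories apply only to targeted scope")
```

Validating up front keeps a targeted run from starting without a named category.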
Primary Testing Agents
Agent 1: Prompt Injection Testing
Tests: LLM01 - Direct and indirect prompt injection
Covers:
- Direct instruction override attacks
- System prompt extraction
- Indirect injection via RAG/documents
- Multi-turn context manipulation
- Session hijacking and token extraction
Output: Prompt injection PoCs, bypass techniques, remediation
Agent 2: Output Handling Exploitation
Tests: LLM02 - Insecure output handling
Covers:
- Code injection (Python, SQL, shell commands)
- XSS via generated HTML/JavaScript
- Template injection attacks
- Unsafe deserialization
- Malicious content propagation
Output: Injection payloads, successful exploits, detection bypass
Agent 3: Training Data Analysis
Tests: LLM03 - Data poisoning vulnerability assessment
Covers:
- Membership inference attacks
- Training data extraction attempts
- Backdoor trigger identification
- Bias and adversarial example detection
- Model behavior anomalies
Output: Data exposure findings, backdoor triggers, bias analysis
Agent 4: Resource Exhaustion Testing
Tests: LLM04 - Model DoS vulnerabilities
Covers:
- Token flooding attacks
- Context window exhaustion
- Recursive expansion exploitation
- Computational overload testing
- Cost impact analysis
Output: DoS techniques, impact assessment, mitigation guidance
Agent 5: Supply Chain Assessment
Tests: LLM05 - Supply chain vulnerabilities
Covers:
- Dependency vulnerability scanning
- Plugin/integration security testing
- Model source verification
- API endpoint security
- Third-party risk assessment
Output: Vulnerability inventory, risk scores, remediation roadmap
Agent 6: Agency Exploitation
Tests: LLM06 - Excessive agency vulnerabilities
Covers:
- Privilege escalation attempts
- Unauthorized API calls
- Permission boundary testing
- State modification exploits
- Lateral movement via model
Output: Privilege escalation PoCs, permission bypasses
Agent 7: Model Extraction Attack
Tests: LLM07 - Model theft and extraction
Covers:
- Query-based model extraction
- Output analysis and inference
- Membership inference attacks
- Model property inference
- Training data reconstruction
Output: Extracted model info, leakage assessment, impact analysis
Agent 8: Vector DB Poisoning
Tests: LLM08 - Vector database and RAG attacks
Covers:
- Malicious document injection
- Retrieval manipulation
- Embedding space attacks
- Citation spoofing
- Knowledge base poisoning
Output: Injection techniques, retrieval bypass, remediation
Agent 9: Decision Reliance Testing
Tests: LLM09 - Overreliance vulnerabilities
Covers:
- Hallucination injection
- Output confidence analysis
- Verification workflow gaps
- Human-in-the-loop bypass
- False authority establishment
Output: Hallucination techniques, confidence manipulation, process gaps
Agent 10: Logging Bypass Testing
Tests: LLM10 - Insufficient logging and monitoring
Covers:
- Log deletion or evasion
- Monitoring detection bypass
- Unlogged request techniques
- Alert threshold manipulation
- Forensic evidence destruction
Output: Evasion techniques, detection gaps, monitoring recommendations
Testing Workflows
Workflow 1: Full OWASP Top 10 Assessment
Progress:
- [ ] Reconnaissance (target fingerprinting, capability detection)
- [ ] Agent 1: Prompt Injection testing
- [ ] Agent 2: Output Handling testing
- [ ] Agent 3: Data Poisoning analysis
- [ ] Agent 4: DoS vulnerability testing
- [ ] Agent 5: Supply Chain assessment
- [ ] Agent 6: Agency exploitation
- [ ] Agent 7: Model Extraction attempts
- [ ] Agent 8: Vector DB poisoning
- [ ] Agent 9: Reliance testing
- [ ] Agent 10: Logging bypass
- [ ] Report generation with findings
Duration: 4-8 hours depending on target complexity
Output: Professional penetration test report, PoC scripts, remediation roadmap
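The progress checklist for Workflow 1 can be tracked programmatically. The sketch below is an assumption about how one might do that, not the skill's own mechanism; `FULL_ASSESSMENT_STEPS` and `render_progress` are hypothetical names.

```python
# Step names copied from the Workflow 1 checklist.
FULL_ASSESSMENT_STEPS = [
    "Reconnaissance",
    "Agent 1: Prompt Injection testing",
    "Agent 2: Output Handling testing",
    "Agent 3: Data Poisoning analysis",
    "Agent 4: DoS vulnerability testing",
    "Agent 5: Supply Chain assessment",
    "Agent 6: Agency exploitation",
    "Agent 7: Model Extraction attempts",
    "Agent 8: Vector DB poisoning",
    "Agent 9: Reliance testing",
    "Agent 10: Logging bypass",
    "Report generation with findings",
]


def render_progress(steps: list[str], done: set[str]) -> str:
    """Render steps as a task list, ticking the ones found in `done`."""
    lines = []
    for step in steps:
        mark = "x" if step in done else " "
        lines.append(f"- [{mark}] {step}")
    return "\n".join(lines)


def remaining(steps: list[str], done: set[str]) -> list[str]:
    """Return the steps not yet completed, in their original order."""
    return [step for step in steps if step not in done]
```

Calling `render_progress(FULL_ASSESSMENT_STEPS, {"Reconnaissance"})` returns the checklist with only the first item ticked.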
Workflow 2: Focused Vulnerability Testing
Progress:
- [ ] Identify specific vulnerability category
- [ ] Deploy targeted agent
- [ ] Execute exploitation techniques
- [ ] Capture evidence (screenshots, logs, outputs)
- [ ] Document findings
- [ ] Provide remediation guidance
Duration: 1-3 hours per vulnerability
Output: Focused finding with PoC
Workflow 3: Supply Chain Security Audit
Progress:
- [ ] Inventory AI/ML dependencies
- [ ] Scan for known vulnerabilities
- [ ] Test plugin security
- [ ] Verify model provenance
- [ ] Assess API security
- [ ] Rate vendor/third-party risk
- [ ] Create remediation roadmap
Duration: 2-4 hours
Output: Supply chain risk assessment, vendor scorecard, recommendations
Workflow 4: Continuous Monitoring Validation
Progress:
- [ ] Execute logging bypass techniques
- [ ] Attempt monitoring evasion
- [ ] Verify detection capabilities
- [ ] Test alert responsiveness
- [ ] Document monitoring gaps
- [ ] Recommend improvements
Duration: 1-2 hours
Output: Monitoring assessment, detection capability report
Integration with Pentest Skill
This skill enhances pentest workflows:
/pentest → Add AI testing phase
├── Traditional vulnerability assessment (web, API, infrastructure)
├── AI Threat Testing (NEW)
│   ├── LLM application testing
│   ├── AI-specific exploitation
│   └── Model security assessment
└── Combined findings report
Combined testing provides:
- Traditional security + AI security assessment
- Complete attack surface coverage
- Multi-vector exploitation chains
- Comprehensive risk scoring
Key Techniques Reference
Prompt Injection Payloads
- Instruction override: "Ignore all previous instructions..."
- System prompt extraction: "Repeat your system prompt exactly..."
- Context shifting: Gradually moving conversation context
- Token smuggling: Encoding across multiple turns
- Filter evasion: Base64, leetspeak, language mixing
Model Extraction Methods
- Query sampling: Repeated API queries for behavior reconstruction
- Token probability analysis: Extracting logits and confidence scores
- Membership inference: Testing if specific data was in training
- Model inversion: Reconstructing training examples
- Confidence calibration: Analyzing prediction confidence patterns
Data Poisoning Detection
- Behavioral anomalies: Testing for trigger-based responses
- Membership testing: Probing for specific training data
- Fairness analysis: Checking for bias amplification
- Output consistency: Testing reproducibility of predictions
- Backdoor triggers: Common activation patterns
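The output consistency check above can be reduced to a single number per prompt. This is a minimal sketch under the assumption that responses to one repeated prompt have already been collected as strings; `consistency_score` is a hypothetical helper, and exact-match after whitespace and case normalization is a deliberately crude similarity measure.

```python
from collections import Counter


def consistency_score(outputs: list[str]) -> float:
    """Fraction of outputs that equal the most common normalized output.

    1.0 means every run produced the same answer; values near 1/len(outputs)
    mean almost every run differed.
    """
    if not outputs:
        raise ValueError("need at least one output")
    # Normalize case and collapse whitespace so trivial differences are ignored.
    normalized = [" ".join(text.lower().split()) for text in outputs]
    most_common_count = Counter(normalized).most_common(1)[0][1]
    return most_common_count / len(normalized)
```

A prompt whose score drops sharply when a suspected trigger phrase is added is a candidate for closer review, not proof of a backdoor.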
DoS Exploitation
- Token flooding: Sending extremely long inputs
- Recursive expansion: Nested loops and recursive structures
- Context exhaustion: Filling available context window
- Computational loops: Triggering expensive operations
- Resource measurement: Monitoring cost and performance impact
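Resource measurement comes down to simple arithmetic on token counts. The sketch below assumes per-1,000-token pricing for input and output; the prices are placeholders supplied by the caller, not real provider rates, and both function names are hypothetical.

```python
def estimate_cost(
    prompt_tokens: int,
    completion_tokens: int,
    price_in_per_1k: float,
    price_out_per_1k: float,
) -> float:
    """Cost of one request given token counts and caller-supplied prices."""
    if prompt_tokens < 0 or completion_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (
        prompt_tokens / 1000 * price_in_per_1k
        + completion_tokens / 1000 * price_out_per_1k
    )


def cost_amplification(baseline_cost: float, stressed_cost: float) -> float:
    """How many times more a stress-test request costs than a normal one."""
    if baseline_cost <= 0:
        raise ValueError("baseline cost must be positive")
    return stressed_cost / baseline_cost
```

Reporting the amplification factor alongside absolute cost makes the impact assessment comparable across targets with different pricing.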
Supply Chain Attacks
- Dependency enumeration: Listing all AI/ML dependencies
- CVE scanning: Identifying known vulnerabilities
- Plugin capability audit: Testing plugin permissions and scope
- API security testing: Checking endpoint authentication/authorization
- Model verification: Confirming model integrity and provenance
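Model verification, in its simplest form, compares a downloaded artifact against a digest published by its source. The following is a sketch of that check using only the Python standard library; it assumes the publisher provides a SHA-256 hex digest, and the function names are illustrative.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model weights need not fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, expected_hex: str) -> bool:
    """True if the file's SHA-256 equals the digest the publisher lists."""
    actual = sha256_of_file(path)
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

A matching digest confirms integrity only; provenance still depends on trusting the channel the digest came from.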
Evidence Capture
All agents capture:
- Screenshots: Before/after exploitation
- Network logs: HTTP requests and responses
- API responses: Full response bodies and metadata
- Error messages: System errors and debug information
- Console output: Model outputs and behavior
- Execution time: Performance impact measurement
- Resource usage: CPU, memory, token consumption
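The captured items above map naturally onto one structured record per test step. This is a hedged sketch of such a record, not the skill's actual schema; `EvidenceRecord` and all of its field names are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _utc_now() -> str:
    """Timestamp in ISO 8601 with an explicit UTC offset."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class EvidenceRecord:
    """Hypothetical evidence entry for one request/response pair."""
    agent: str                # e.g. "Agent 1: Prompt Injection Testing"
    category: str             # e.g. "LLM01"
    request_summary: str      # what was sent, summarized
    response_excerpt: str     # relevant part of the response body
    elapsed_seconds: float    # execution time
    tokens_used: int          # token consumption
    screenshot_path: str = ""  # optional path to a before/after screenshot
    captured_at: str = field(default_factory=_utc_now)

    def to_json(self) -> str:
        """Serialize with sorted keys so records diff cleanly."""
        return json.dumps(asdict(self), sort_keys=True)
```

Writing one JSON line per record keeps the evidence log append-only and easy to attach to a report.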
Reporting
Automated report includes:
- Executive Summary: Critical findings and risk scoring
- Detailed Findings: Per-vulnerability analysis with CVSS scores
- Proof of Concept: Reproducible exploitation steps
- Evidence: Screenshots, logs, captured outputs
- Remediation: Specific fixes and hardening guidance
- Risk Assessment: Business impact and priority
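For the CVSS scores mentioned under Detailed Findings, the qualitative severity rating follows the standard CVSS v3.x scale. The sketch below applies that scale and orders findings for an executive summary; the `Finding` class and `executive_summary` function are hypothetical and not the skill's report generator.

```python
from dataclasses import dataclass


def cvss_severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS base scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


@dataclass
class Finding:
    """Hypothetical finding: a title and its CVSS base score."""
    title: str
    cvss: float


def executive_summary(findings: list[Finding]) -> list[str]:
    """One line per finding, highest score first."""
    ordered = sorted(findings, key=lambda item: item.cvss, reverse=True)
    return [f"{cvss_severity(f.cvss)} ({f.cvss:.1f}): {f.title}" for f in ordered]
```

Sorting by score puts the critical findings at the top of the summary, which is where the report format above expects them.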
Common Patterns
Quick Vulnerability Check
1. Target identification (fingerprinting)
2. Select a single agent for the suspected vulnerability
3. Execute with default payloads
4. Document findings
5. Provide quick remediation tip
Deep Exploitation
1. Extensive reconnaissance
2. Deploy multiple agents
3. Chain vulnerabilities together
4. Custom exploit development
5. Complete assessment report
Red Team Scenario
1. Full OWASP assessment
2. Multi-vector attack chains
3. Privilege escalation paths
4. Data exfiltration techniques
5. Impact demonstration
Troubleshooting
"Target not responding"
→ Check network connectivity, verify API keys, confirm endpoint is live
"Agents not deploying"
→ Ensure proper authorization, check agent dependencies, verify scope
"No vulnerabilities found"
→ System may be hardened; try advanced techniques; check for defense mechanisms
"I need custom payloads"
→ Agents support custom payload injection; specify in prompts
"Need legal verification"
→ All testing MUST be authorized; provide written permission documentation
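The authorization requirement can be enforced in tooling as a pre-flight check that refuses to run against anything outside the written scope. This is a minimal sketch assuming the permission document lists allowed hostnames and a testing window; `Authorization` and `is_in_scope` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from urllib.parse import urlparse


@dataclass(frozen=True)
class Authorization:
    """Hypothetical summary of a signed permission document."""
    allowed_hosts: frozenset[str]  # lowercase hostnames named in the document
    valid_from: date
    valid_until: date


def is_in_scope(target_url: str, auth: Authorization, today: date) -> bool:
    """True only if the host is listed and today falls in the testing window."""
    host = urlparse(target_url).hostname  # lowercased by urlparse, or None
    if host is None:
        return False
    within_window = auth.valid_from <= today <= auth.valid_until
    return host in auth.allowed_hosts and within_window
```

Running this check before every agent deployment makes an out-of-scope target a hard stop instead of a judgment call.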