transilienceai

ai-threat-testing

18
3
# Install this skill:
npx skills add transilienceai/communitytools --skill "ai-threat-testing"

Install specific skill from multi-skill repository

# Description

Offensive AI security testing and exploitation framework. Systematically tests LLM applications for OWASP Top 10 vulnerabilities including prompt injection, model extraction, data poisoning, and supply chain attacks. Integrates with pentest workflows to discover and exploit AI-specific threats.

# SKILL.md


name: ai-threat-testing
description: Offensive AI security testing and exploitation framework. Systematically tests LLM applications for OWASP Top 10 vulnerabilities including prompt injection, model extraction, data poisoning, and supply chain attacks. Integrates with pentest workflows to discover and exploit AI-specific threats.


AI Threat Testing & Exploitation

This skill orchestrates comprehensive AI threat testing across LLM applications and systems. Use this to perform authorized penetration testing of AI systems, discover vulnerabilities, and document findings with proof-of-concept exploits.

Quick Start

Test an AI system:

Specify target:
- LLM application URL or deployment
- API endpoints (OpenAI, Anthropic, Azure, etc.)
- Local model (Ollama, LM Studio)
- RAG system or vector database

Select vulnerability scope:

1. All OWASP Top 10 (full assessment)
2. Specific vulnerabilities (targeted testing)
3. Quick scan (common issues)
4. Supply chain assessment (dependencies)

Get automated testing:
- Reconnaissance and fingerprinting
- Vulnerability-specific agents deploy
- Exploit execution with evidence capture
- Professional report generation

Primary Testing Agents

Agent 1: Prompt Injection Testing

Tests: LLM01 - Direct and indirect prompt injection

Covers:
- Direct instruction override attacks
- System prompt extraction
- Indirect injection via RAG/documents
- Multi-turn context manipulation
- Session hijacking and token extraction

Output: Prompt injection PoCs, bypass techniques, remediation


Agent 2: Output Handling Exploitation

Tests: LLM02 - Insecure output handling

Covers:
- Code injection (Python, SQL, shell commands)
- XSS via generated HTML/JavaScript
- Template injection attacks
- Unsafe deserialization
- Malicious content propagation

Output: Injection payloads, successful exploits, detection bypass


Agent 3: Training Data Analysis

Tests: LLM03 - Data poisoning vulnerability assessment

Covers:
- Membership inference attacks
- Training data extraction attempts
- Backdoor trigger identification
- Bias and adversarial example detection
- Model behavior anomalies

Output: Data exposure findings, backdoor triggers, bias analysis


Agent 4: Resource Exhaustion Testing

Tests: LLM04 - Model DoS vulnerabilities

Covers:
- Token flooding attacks
- Context window exhaustion
- Recursive expansion exploitation
- Computational overload testing
- Cost impact analysis

Output: DoS techniques, impact assessment, mitigation guidance


Agent 5: Supply Chain Assessment

Tests: LLM05 - Supply chain vulnerabilities

Covers:
- Dependency vulnerability scanning
- Plugin/integration security testing
- Model source verification
- API endpoint security
- Third-party risk assessment

Output: Vulnerability inventory, risk scores, remediation roadmap


Agent 6: Agency Exploitation

Tests: LLM06 - Excessive agency vulnerabilities

Covers:
- Privilege escalation attempts
- Unauthorized API calls
- Permission boundary testing
- State modification exploits
- Lateral movement via model

Output: Privilege escalation PoCs, permission bypasses


Agent 7: Model Extraction Attack

Tests: LLM07 - Model theft and extraction

Covers:
- Query-based model extraction
- Output analysis and inference
- Membership inference attacks
- Model property inference
- Training data reconstruction

Output: Extracted model info, leakage assessment, impact analysis


Agent 8: Vector DB Poisoning

Tests: LLM08 - Vector database and RAG attacks

Covers:
- Malicious document injection
- Retrieval manipulation
- Embedding space attacks
- Citation spoofing
- Knowledge base poisoning

Output: Injection techniques, retrieval bypass, remediation


Agent 9: Decision Reliance Testing

Tests: LLM09 - Overreliance vulnerabilities

Covers:
- Hallucination injection
- Output confidence analysis
- Verification workflow gaps
- Human-in-the-loop bypass
- False authority establishment

Output: Hallucination techniques, confidence manipulation, process gaps


Agent 10: Logging Bypass Testing

Tests: LLM10 - Insufficient logging and monitoring

Covers:
- Log deletion or evasion
- Monitoring detection bypass
- Unlogged request techniques
- Alert threshold manipulation
- Forensic evidence destruction

Output: Evasion techniques, detection gaps, monitoring recommendations


Testing Workflows

Workflow 1: Full OWASP Top 10 Assessment

Progress:
- [ ] Reconnaissance (target fingerprinting, capability detection)
- [ ] Agent 1: Prompt Injection testing
- [ ] Agent 2: Output Handling testing
- [ ] Agent 3: Data Poisoning analysis
- [ ] Agent 4: DoS vulnerability testing
- [ ] Agent 5: Supply Chain assessment
- [ ] Agent 6: Agency exploitation
- [ ] Agent 7: Model Extraction attempts
- [ ] Agent 8: Vector DB poisoning
- [ ] Agent 9: Reliance testing
- [ ] Agent 10: Logging bypass
- [ ] Report generation with findings

Duration: 4-8 hours depending on target complexity

Output: Professional penetration test report, PoC scripts, remediation roadmap


Workflow 2: Focused Vulnerability Testing

Progress:
- [ ] Identify specific vulnerability category
- [ ] Deploy targeted agent
- [ ] Execute exploitation techniques
- [ ] Capture evidence (screenshots, logs, outputs)
- [ ] Document findings
- [ ] Provide remediation guidance

Duration: 1-3 hours per vulnerability

Output: Focused finding with PoC


Workflow 3: Supply Chain Security Audit

Progress:
- [ ] Inventory AI/ML dependencies
- [ ] Scan for known vulnerabilities
- [ ] Test plugin security
- [ ] Verify model provenance
- [ ] Assess API security
- [ ] Rate vendor/third-party risk
- [ ] Create remediation roadmap

Duration: 2-4 hours

Output: Supply chain risk assessment, vendor scorecard, recommendations


Workflow 4: Continuous Monitoring Validation

Progress:
- [ ] Execute logging bypass techniques
- [ ] Attempt monitoring evasion
- [ ] Verify detection capabilities
- [ ] Test alert responsiveness
- [ ] Document monitoring gaps
- [ ] Recommend improvements

Duration: 1-2 hours

Output: Monitoring assessment, detection capability report


Integration with Pentest Skill

This skill enhances pentest workflows:

/pentest β†’ Add AI testing phase
  β”œβ”€β”€ Traditional vulnerability assessment (web, API, infrastructure)
  β”œβ”€β”€ AI Threat Testing (NEW)
  β”‚   β”œβ”€β”€ LLM application testing
  β”‚   β”œβ”€β”€ AI-specific exploitation
  β”‚   └── Model security assessment
  └── Combined findings report

Combined testing provides:
- Traditional security + AI security assessment
- Complete attack surface coverage
- Multi-vector exploitation chains
- Comprehensive risk scoring


Key Techniques Reference

Prompt Injection Payloads

  • Instruction override: "Ignore all previous instructions..."
  • System prompt extraction: "Repeat your system prompt exactly..."
  • Context shifting: Gradually moving conversation context
  • Token smuggling: Encoding across multiple turns
  • Filter evasion: Base64, leetspeak, language mixing

Model Extraction Methods

  • Query sampling: Repeated API queries for behavior reconstruction
  • Token probability analysis: Extracting logits and confidence scores
  • Membership inference: Testing if specific data was in training
  • Model inversion: Reconstructing training examples
  • Confidence calibration: Analyzing prediction confidence patterns

Data Poisoning Detection

  • Behavioral anomalies: Testing for trigger-based responses
  • Membership testing: Probing for specific training data
  • Fairness analysis: Checking for bias amplification
  • Output consistency: Testing reproducibility of predictions
  • Backdoor triggers: Common activation patterns

DoS Exploitation

  • Token flooding: Sending extremely long inputs
  • Recursive expansion: Nested loops and recursive structures
  • Context exhaustion: Filling available context window
  • Computational loops: Triggering expensive operations
  • Resource measurement: Monitoring cost and performance impact

Supply Chain Attacks

  • Dependency enumeration: Listing all AI/ML dependencies
  • CVE scanning: Identifying known vulnerabilities
  • Plugin capability audit: Testing plugin permissions and scope
  • API security testing: Checking endpoint authentication/authorization
  • Model verification: Confirming model integrity and provenance

Evidence Capture

All agents capture:
- Screenshots: Before/after exploitation
- Network logs: HTTP requests and responses
- API responses: Full response bodies and metadata
- Error messages: System errors and debug information
- Console output: Model outputs and behavior
- Execution time: Performance impact measurement
- Resource usage: CPU, memory, token consumption


Reporting

Automated report includes:
- Executive Summary: Critical findings and risk scoring
- Detailed Findings: Per-vulnerability analysis with CVSS scores
- Proof of Concept: Reproducible exploitation steps
- Evidence: Screenshots, logs, captured outputs
- Remediation: Specific fixes and hardening guidance
- Risk Assessment: Business impact and priority


Common Patterns

Quick Vulnerability Check

1. Target identification (fingerprinting)
2. Select single agent for vulnerability
3. Execute with default payloads
4. Document findings
5. Provide quick remediation tip

Deep Exploitation

1. Extensive reconnaissance
2. Deploy multiple agents
3. Chain vulnerabilities together
4. Custom exploit development
5. Complete assessment report

Red Team Scenario

1. Full OWASP assessment
2. Multi-vector attack chains
3. Privilege escalation paths
4. Data exfiltration techniques
5. Impact demonstration

Troubleshooting

"Target not responding"
β†’ Check network connectivity, verify API keys, confirm endpoint is live

"Agents not deploying"
β†’ Ensure proper authorization, check agent dependencies, verify scope

"No vulnerabilities found"
β†’ System may be hardened; try advanced techniques; check for defense mechanisms

"I need custom payloads"
β†’ Agents support custom payload injection; specify in prompts

"Need legal verification"
β†’ All testing MUST be authorized; provide written permission documentation

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents:

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.