npx skills add semgrep/skills --skill "semgrep"
Install specific skill from multi-skill repository
# Description
Run Semgrep static analysis scans and create custom detection rules. Use when asked to scan code with Semgrep, find security vulnerabilities, write custom YAML rules, or detect specific bug patterns.
# SKILL.md
name: semgrep
description: Run Semgrep static analysis scans and create custom detection rules. Use when asked to scan code with Semgrep, find security vulnerabilities, write custom YAML rules, or detect specific bug patterns.
Semgrep Static Analysis
Fast, pattern-based static analysis for security scanning and custom rule creation.
When to Use Semgrep
Ideal scenarios:
- Quick security scans (minutes, not hours)
- Pattern-based bug and vulnerability detection
- Enforcing coding standards and best practices
- Finding known vulnerability patterns (OWASP, CWE)
- Creating custom detection rules for your codebase
- Data flow analysis with taint mode
Installation
# pip (recommended)
python3 -m pip install semgrep
# Homebrew
brew install semgrep
# Docker
docker run --rm -v "${PWD}:/src" semgrep/semgrep semgrep --config auto /src
Part 1: Running Scans
Quick Scan
semgrep --config auto . # Auto-detect rules
Using Rulesets
semgrep --config p/<RULESET> . # Single ruleset
semgrep --config p/security-audit --config p/trailofbits . # Multiple
| Ruleset | Description |
|---|---|
| `p/default` | General security and code quality |
| `p/security-audit` | Comprehensive security rules |
| `p/owasp-top-ten` | OWASP Top 10 vulnerabilities |
| `p/cwe-top-25` | CWE Top 25 vulnerabilities |
| `p/trailofbits` | Trail of Bits security rules |
| `p/python` | Python-specific |
| `p/javascript` | JavaScript-specific |
| `p/golang` | Go-specific |
Output Formats
semgrep --config p/security-audit --sarif -o results.sarif . # SARIF
semgrep --config p/security-audit --json -o results.json . # JSON
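The JSON output is convenient for post-processing. The sketch below counts findings per severity. It assumes the report has a top-level `results` list whose entries carry the severity under `extra.severity`; the function name `summarize_findings` and the sample findings are hypothetical, used only for illustration.

```python
import json
from collections import Counter


def summarize_findings(report: dict) -> Counter:
    """Count findings per severity in a parsed Semgrep JSON report."""
    counts = Counter()
    for finding in report.get("results", []):
        # Severity is assumed to live under "extra"; fall back if it is absent.
        severity = finding.get("extra", {}).get("severity", "UNKNOWN")
        counts[severity] += 1
    return counts


# Inline stand-in for the contents of results.json (hypothetical findings).
sample_report = json.loads("""
{
  "results": [
    {"check_id": "rules.command-injection", "path": "app.py",
     "start": {"line": 12}, "extra": {"severity": "ERROR"}},
    {"check_id": "rules.hardcoded-password", "path": "settings.py",
     "start": {"line": 3}, "extra": {"severity": "ERROR"}},
    {"check_id": "rules.weak-hash", "path": "util.py",
     "start": {"line": 40}, "extra": {"severity": "WARNING"}}
  ]
}
""")

print(summarize_findings(sample_report))
```

For a real scan, replace the inline sample with `json.load(open("results.json"))`.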
Scan Specific Paths
semgrep --config p/python app.py # Single file
semgrep --config p/javascript src/ # Directory
semgrep --config auto --include='**/test/**' . # Scan only paths matching the pattern
Configuration
.semgrepignore
tests/fixtures/
**/testdata/
generated/
vendor/
node_modules/
Suppress False Positives
password = get_from_vault() # nosemgrep: hardcoded-password
dangerous_but_safe() # nosemgrep
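Suppressions tend to accumulate, so it can help to list them periodically. The following is a rough sketch that scans source text for `nosemgrep` comments and reports which rule IDs, if any, each one names. It is a plain regex approximation, not Semgrep's own comment parser, and `find_suppressions` is a hypothetical helper name.

```python
import re

# Matches "nosemgrep", optionally followed by ": rule-id[, rule-id ...]".
NOSEMGREP = re.compile(
    r"nosemgrep(?::\s*(?P<ids>[\w.\-]+(?:\s*,\s*[\w.\-]+)*))?"
)


def find_suppressions(source: str) -> list[tuple[int, list[str]]]:
    """Return (line_number, rule_ids) per suppression.

    An empty rule_ids list means a blanket suppression of every rule.
    """
    found = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = NOSEMGREP.search(line)
        if match:
            ids = match.group("ids")
            found.append((lineno, re.split(r"\s*,\s*", ids) if ids else []))
    return found


sample = (
    "password = get_from_vault()  # nosemgrep: hardcoded-password\n"
    "dangerous_but_safe()  # nosemgrep\n"
    "print('ok')\n"
)

for lineno, rule_ids in find_suppressions(sample):
    print(lineno, rule_ids or "ALL RULES")
```

Blanket suppressions (no rule ID) are the ones most worth reviewing, since they hide every rule on that line.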
Part 2: Creating Custom Rules
When to Create Custom Rules
- Detecting project-specific vulnerability patterns
- Enforcing internal coding standards
- Building security checks for custom frameworks
- Creating taint-mode rules for data flow analysis
Approach Selection
| Approach | Use When |
|---|---|
| Taint mode | Data flows from untrusted source to dangerous sink (injection vulnerabilities) |
| Pattern matching | Syntactic patterns without data flow requirements (deprecated APIs, hardcoded values) |
Prioritize taint mode for injection vulnerabilities. A purely syntactic pattern such as eval(...) matches both eval(user_input) (vulnerable) and eval("safe_literal") (safe); taint mode reports only the calls that untrusted data actually reaches.
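To see why syntax alone is not enough, the sketch below uses Python's standard `ast` module (not Semgrep itself) to inspect `eval` call sites. The helper name `eval_argument_kinds` and the three snippets are hypothetical illustrations.

```python
import ast


def eval_argument_kinds(source: str) -> list[str]:
    """Return the AST node type of the first argument of each eval() call."""
    kinds = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "eval"
            and node.args
        ):
            kinds.append(type(node.args[0]).__name__)
    return kinds


tainted = 'x = input()\neval(x)'    # x comes from the user
harmless = 'x = "1 + 1"\neval(x)'   # x is a constant string
literal = 'eval("1 + 1")'           # literal passed directly

print(eval_argument_kinds(tainted))
print(eval_argument_kinds(harmless))
print(eval_argument_kinds(literal))
```

The first two snippets have identical call sites (`eval(x)` with a plain name as argument), even though only one of them is dangerous. Telling them apart requires following where `x` came from, which is what taint mode is for.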
Quick Start: Pattern Matching
rules:
  - id: hardcoded-password
    languages: [python]
    message: "Hardcoded password detected: $PASSWORD"
    severity: ERROR
    pattern: password = "$PASSWORD"
Quick Start: Taint Mode
rules:
  - id: command-injection
    languages: [python]
    message: User input flows to command execution
    severity: ERROR
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
      - pattern: request.form[...]
    pattern-sinks:
      - pattern: os.system(...)
      - pattern: subprocess.call($CMD, shell=True, ...)
    pattern-sanitizers:
      - pattern: shlex.quote(...)
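The rule above lists `shlex.quote(...)` as a sanitizer. As a small illustration of why that is a reasonable choice for shell command strings, the sketch below compares an unsanitized and a sanitized command. `build_cat_command` and the sample input are hypothetical; nothing is executed, the strings are only printed.

```python
import shlex


def build_cat_command(filename: str) -> str:
    """Build a shell command string with the untrusted part quoted."""
    # shlex.quote wraps the value so the shell treats it as one literal word.
    return "cat " + shlex.quote(filename)


untrusted = "notes.txt; rm -rf /"

# Unsanitized: a shell would parse this as two separate commands.
print("cat " + untrusted)

# Sanitized: the whole value becomes a single quoted argument.
print(build_cat_command(untrusted))
```

A value with no shell metacharacters, such as `notes.txt`, passes through `shlex.quote` unchanged.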
Pattern Syntax Quick Reference
| Syntax | Description | Example |
|---|---|---|
| `...` | Match anything | `func(...)` |
| `$VAR` | Capture metavariable | `$FUNC($INPUT)` |
| `<... ...>` | Deep expression match | `<... user_input ...>` |
| Operator | Description |
|---|---|
| `pattern` | Match exact pattern |
| `patterns` | All must match (AND) |
| `pattern-either` | Any matches (OR) |
| `pattern-not` | Exclude matches |
| `pattern-inside` | Match only inside context |
| `pattern-not-inside` | Match only outside context |
| `metavariable-regex` | Regex on captured value |
Testing Rules
Test-first is mandatory. Create test files with annotations:
# test_rule.py
def test_vulnerable():
    user_input = request.args.get("id")
    # ruleid: my-rule-id
    cursor.execute("SELECT * FROM users WHERE id = " + user_input)

def test_safe():
    user_input = request.args.get("id")
    # ok: my-rule-id
    cursor.execute("SELECT * FROM users WHERE id = ?", (user_input,))
Run tests:
semgrep --test --config rule.yaml test-file
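Before running the tests, it can be useful to check which lines the annotations actually point at. The sketch below is a simplified reader for `# ruleid:` and `# ok:` comments, assuming each annotation applies to the line directly after it. `expected_lines` is a hypothetical helper, not part of Semgrep.

```python
import re

ANNOTATION = re.compile(r"#\s*(?P<kind>ruleid|ok):\s*(?P<rule>[\w.\-]+)")


def expected_lines(source: str) -> dict[str, list[int]]:
    """Map each annotation kind to the line numbers it applies to.

    An annotation comment is taken to apply to the line that follows it.
    """
    expected = {"ruleid": [], "ok": []}
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = ANNOTATION.search(line)
        if match:
            expected[match.group("kind")].append(lineno + 1)
    return expected


test_file = '''\
def test_vulnerable():
    user_input = request.args.get("id")
    # ruleid: my-rule-id
    cursor.execute("SELECT * FROM users WHERE id = " + user_input)

def test_safe():
    user_input = request.args.get("id")
    # ok: my-rule-id
    cursor.execute("SELECT * FROM users WHERE id = ?", (user_input,))
'''

print(expected_lines(test_file))
```

Here the rule is expected to fire on line 4 and to stay silent on line 9. An empty `ok` list is a quick sign that safe cases are missing from the test file.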
Command Reference
| Task | Command |
|---|---|
| Run tests | semgrep --test --config rule.yaml test-file |
| Validate YAML | semgrep --validate --config rule.yaml |
| Dump AST | semgrep --dump-ast -l <lang> <file> |
| Debug taint flow | semgrep --dataflow-traces -f rule.yaml file |
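When these commands are driven from a script, building the argument list in one place avoids typos. The following is a minimal sketch that covers three of the tasks in the table; `semgrep_command` and its task names are hypothetical, and the defaults mirror the placeholder file names used above.

```python
def semgrep_command(
    task: str, rule: str = "rule.yaml", target: str = "test-file"
) -> list[str]:
    """Return the argv list for one of the rule-authoring tasks in the table."""
    commands = {
        "test": ["semgrep", "--test", "--config", rule, target],
        "validate": ["semgrep", "--validate", "--config", rule],
        "trace": ["semgrep", "--dataflow-traces", "-f", rule, target],
    }
    if task not in commands:
        raise ValueError(f"unknown task: {task!r}")
    return commands[task]


print(" ".join(semgrep_command("validate")))
```

The returned list can be passed to `subprocess.run(..., check=True)` so that a failing validation or test run stops the script.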
Rule Creation Workflow
- Analyze the problem - Understand the bug pattern, determine taint vs pattern approach
- Create test cases first - Write `ruleid:` and `ok:` annotations before the rule
- Analyze AST - Run `semgrep --dump-ast` to understand code structure
- Write the rule - Start simple, iterate
- Test until 100% pass - No "missed lines" or "incorrect lines"
- Optimize patterns - Remove redundancies only after tests pass
Output structure:
<rule-id>/
├── <rule-id>.yaml # Semgrep rule
└── <rule-id>.<ext> # Test file
Detailed References
Official Semgrep Documentation:
- Rule Syntax - Complete YAML structure, operators, and options
- Rule Schema - Full JSON schema specification
Local References:
- Workflow Guide - Complete step-by-step rule creation process
- Quick Reference - Pattern operators and taint components
Anti-Patterns to Avoid
Too broad:
# BAD: Matches any function call
pattern: $FUNC(...)
# GOOD: Specific dangerous function
pattern: eval(...)
Missing safe cases:
# BAD: Only tests vulnerable case
# ruleid: my-rule
dangerous(user_input)
# GOOD: Include safe cases
# ruleid: my-rule
dangerous(user_input)
# ok: my-rule
dangerous(sanitize(user_input))
Rationalizations to Reject
| Shortcut | Why It's Wrong |
|---|---|
| "Semgrep found nothing, code is clean" | Semgrep is pattern-based; can't track complex cross-function data flow |
| "The pattern looks complete" | Untested rules have hidden false positives/negatives |
| "It matches the vulnerable case" | Matching vulnerabilities is half the job; verify safe cases don't match |
| "Taint mode is overkill" | For injection vulnerabilities, taint mode gives better precision |
| "One test case is enough" | Include edge cases: different coding styles, sanitized inputs, safe alternatives |
CI/CD Integration
GitHub Actions
name: Semgrep
on:
  push:
    branches: [main]
  pull_request:
  schedule:
    - cron: '0 0 1 * *'
jobs:
  semgrep:
    runs-on: ubuntu-latest
    container:
      image: semgrep/semgrep
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Run Semgrep
        run: |
          if [ "${{ github.event_name }}" = "pull_request" ]; then
            semgrep ci --baseline-commit ${{ github.event.pull_request.base.sha }}
          else
            semgrep ci
          fi
        env:
          SEMGREP_RULES: >-
            p/security-audit
            p/owasp-top-ten
            p/trailofbits
Resources
Rule Writing:
- Rule Syntax: https://semgrep.dev/docs/writing-rules/rule-syntax
- Pattern Syntax: https://semgrep.dev/docs/writing-rules/pattern-syntax
- Rule Schema: https://github.com/semgrep/semgrep-interfaces/blob/main/rule_schema_v1.yaml
General:
- Registry: https://semgrep.dev/explore
- Playground: https://semgrep.dev/playground
- Docs: https://semgrep.dev/docs/
- Trail of Bits Rules: https://github.com/trailofbits/semgrep-rules
# Supported AI Coding Agents
This skill is compatible with the SKILL.md standard and works with all major AI coding agents.