oktsec

ai-dev-security-check

# Install this skill:
npx skills add oktsec/ai-security-skills --skill "ai-dev-security-check"

Installs a specific skill from a multi-skill repository.

# Description

Security review for AI-assisted development. Checks for hardcoded secrets, overly permissive CORS/auth, missing input validation, unpinned dependencies, unsafe MCP configs, payment security, data privacy, and third-party integration issues in code generated by LLMs. Hub skill that routes to credential-leak-scanner, mcp-security-audit, or agent-threat-detection for deep dives. Use when user says "review my AI-generated code", "is my setup secure", "security check my project", "audit what my agents built", or "I built this with Claude/Cursor/Copilot, is it safe".

# SKILL.md


---
name: ai-dev-security-check
description: Security review for AI-assisted development. Checks for hardcoded secrets, overly permissive CORS/auth, missing input validation, unpinned dependencies, unsafe MCP configs, payment security, data privacy, and third-party integration issues in code generated by LLMs. Hub skill that routes to credential-leak-scanner, mcp-security-audit, or agent-threat-detection for deep dives. Use when user says "review my AI-generated code", "is my setup secure", "security check my project", "audit what my agents built", or "I built this with Claude/Cursor/Copilot, is it safe".
metadata:
  author: oktsec
  version: 1.0.0
  license: Apache-2.0
---


# AI Dev Security Check

You built it with AI. It works. It looks good. This skill gives you visibility into what's happening underneath.

LLMs generate functional code. But functional is not secure. The patterns below come from scanning real codebases built with AI. They are predictable and fixable.

## Instructions

### Step 1: Understand the project

Read the project structure first. Then ask the user if anything is unclear:
1. What did you build? (web app, API, CLI tool, agent, MCP server, mobile backend)
2. Which AI tools helped? (Claude, Cursor, Copilot, etc.)
3. What's the stack? (language, framework, database, hosting)
4. Is this in production or pre-launch?
5. Does it handle payments, user data, or auth?

### Step 2: Run the security audit

Go through each category. Skip categories that don't apply (e.g., skip Payments if there are no payments). For each finding, explain WHY it matters in plain language, not just what's wrong.

#### A. Secrets and credentials (CRITICAL)

AI often generates example code with placeholder secrets that become real.

What to check:
1. Scan all files for hardcoded API keys, passwords, tokens, connection strings
2. Is .env in .gitignore? If not, secrets are in your git history forever
3. Does .env.example contain real values instead of placeholders?
4. Check docker-compose.yml, CI/CD configs, Dockerfiles for inline secrets
5. Check git history: git log --all -p -S 'password' shows if a secret was ever committed (even if deleted later, it's still in history)
6. Check MCP server configs for plaintext API keys
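
A first pass at item 1 can be scripted with a few regexes. A minimal sketch — the patterns below are illustrative, not exhaustive; a dedicated scanner such as gitleaks or trufflehog covers far more:

```javascript
// Minimal hardcoded-secret scan. Pattern names and thresholds are
// illustrative examples only, not a complete rule set.
const SECRET_PATTERNS = [
  { name: 'generic api key', re: /(api[_-]?key|secret|token)\s*[:=]\s*['"][A-Za-z0-9_\-]{16,}['"]/i },
  { name: 'db url with password', re: /postgres:\/\/\w+:[^@\s]+@/ },
  { name: 'private key block', re: /-----BEGIN (RSA |EC )?PRIVATE KEY-----/ },
];

function findSecrets(text) {
  const findings = [];
  text.split('\n').forEach((line, i) => {
    for (const { name, re } of SECRET_PATTERNS) {
      if (re.test(line)) findings.push({ line: i + 1, rule: name });
    }
  });
  return findings;
}
```

Run something like this over each file's contents; treat any hit as a prompt to move the value into an environment variable.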

Why this matters: A leaked API key can be found by automated scanners within minutes. If it's a payment key, someone can charge your account. If it's a cloud key, someone can spin up resources on your bill or access your data.

Common AI mistakes:
- Generating API_KEY=sk-real-key-here in example code that the user copy-pastes with real values
- Putting database passwords directly in docker-compose.yml
- Hardcoding connection strings like postgres://admin:password123@db:5432/myapp
- Including real tokens in test files that get committed

#### B. Overly permissive configurations (HIGH)

AI defaults to "make it work" which often means "allow everything".

What to check:
1. CORS: Is it Access-Control-Allow-Origin: *? This means any website can make requests to your API. Should be your specific domain only
2. Database: Is the app connecting as root/admin? Create a dedicated user with only the permissions the app needs
3. API endpoints: Are admin routes (like /admin, /api/users, /api/delete) protected by authentication?
4. MCP servers: Do they have access to your entire filesystem or just the project directory?
5. Docker: Is the container running as root? Does it expose ports that should be internal only?
6. Network: Are services bound to 0.0.0.0 (accessible from anywhere) that should be 127.0.0.1 (localhost only)?
7. File permissions: Are config files readable by anyone on the system? (should be owner-only: 600)
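
For the CORS item, the express cors middleware accepts a function as its origin option, which makes an explicit allowlist easy. A sketch — the domains are placeholders for your own:

```javascript
// Explicit CORS allowlist (check B.1). The domains are placeholders.
const ALLOWED_ORIGINS = new Set([
  'https://yourdomain.com',
  'https://app.yourdomain.com',
]);

// Usable directly as the `origin` option of the cors middleware:
//   app.use(cors({ origin: corsOrigin, credentials: true }))
function corsOrigin(origin, callback) {
  // Same-origin and non-browser requests send no Origin header; allow them.
  if (!origin) return callback(null, true);
  if (ALLOWED_ORIGINS.has(origin)) return callback(null, true);
  return callback(new Error('Origin not allowed'), false);
}
```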

Why this matters: Permissive configs work fine until someone discovers them. An open CORS policy lets any malicious website steal your users' data. A root database connection means a SQL injection can drop your entire database, not just read one table.

Common AI mistakes:
- CORS: * because "it works in development" and nobody changes it for production
- MCP filesystem server running without --allowed-dir, giving the AI access to everything
- Docker containers running as root because the Dockerfile didn't include a USER instruction
- All API routes public, with auth planned for "later"

#### C. Missing input validation (HIGH)

AI generates the happy path. What happens when someone sends unexpected input is usually not handled.

What to check:
1. API endpoints: Do they validate what's in the request? (correct types, reasonable lengths, expected values)
2. File uploads: Is there a file size limit? Type checking? Can someone upload a .php file instead of an image?
3. Database queries: Are they using parameterized queries? (If you see string concatenation like "SELECT * FROM users WHERE id = " + userId, that's SQL injection, meaning someone can read or delete your entire database by crafting a special input)
4. User-provided content: Is it sanitized before being shown on the page? (If not, someone can inject JavaScript that runs in other users' browsers, stealing their sessions)
5. Path parameters: Can someone use ../../etc/passwd to read files outside the intended directory?

Why this matters: Without input validation, anyone can send crafted requests to your API. SQL injection remains one of the most common causes of data breaches. It's the difference between "only valid data gets through" and "anyone can run arbitrary database commands".

Common AI mistakes:
- db.query("SELECT * FROM users WHERE id = " + req.params.id) instead of db.query("SELECT * FROM users WHERE id = $1", [req.params.id])
- No input length limits (someone sends a 10MB string to every text field)
- File upload accepts anything, no type or size validation
- Using innerHTML or dangerouslySetInnerHTML with user data

#### D. Authentication and authorization (HIGH)

AI implements auth that looks right on the surface but has gaps.

What to check:
1. Passwords: Are they hashed with bcrypt or argon2? (NOT MD5, NOT SHA256: fast hashes let attackers test billions of guesses per second)
2. JWT tokens: Do they have an expiry time? Is the signature verified on every request? Is the secret key strong?
3. Rate limiting: Can someone try 10,000 passwords per second on your login endpoint?
4. Admin routes: Does checking if (user.role === 'admin') actually happen on every admin route, or just the dashboard?
5. Sessions: Are cookies set with httpOnly (prevents JavaScript access), secure (HTTPS only), and SameSite (prevents cross-site attacks)?
6. Password reset: Can someone reset any user's password by changing an ID in the URL?

Why this matters: Auth is the front door. If it's weak, nothing else matters. A login endpoint without rate limiting can be brute-forced. A JWT without expiry is a permanent access token if stolen.

Common AI mistakes:
- bcrypt in the README but SHA256 in the actual code
- JWT created with expiry but never verified on the server
- Auth middleware defined but not applied to all route groups
- Password reset endpoint that doesn't verify the token belongs to the requesting user

#### E. Dependency security (MEDIUM)

AI pulls in packages without checking them.

What to check:
1. Are dependencies pinned to specific versions? ("express": "4.18.2" not "express": "*")
2. Run the audit command for your ecosystem: npm audit / pip-audit / govulncheck ./...
3. Are there unused packages? Each one is extra code that could have vulnerabilities
4. MCP servers: are npx packages version-pinned? npx @company/[email protected] not npx @company/server
5. Are package names exactly right? (typosquatting: expres instead of express)
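
Item 1 can be checked mechanically by treating anything other than an exact x.y.z version as unpinned. A sketch:

```javascript
// Flag unpinned version ranges in a package.json dependencies object
// (check E.1). Exact versions like "4.18.2" pass; ranges ("^", "~"),
// wildcards ("*"), and tags ("latest") are flagged.
function findUnpinned(deps) {
  const exact = /^\d+\.\d+\.\d+(-[\w.]+)?$/; // e.g. 4.18.2 or 1.0.0-beta.1
  return Object.entries(deps)
    .filter(([, version]) => !exact.test(version))
    .map(([name, version]) => ({ name, version }));
}
```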

Why this matters: Your app is only as secure as its weakest dependency. An unpinned package means you might get a different (possibly compromised) version on every install.

#### F. Data and privacy (MEDIUM)

AI doesn't think about what happens to user data after it works.

What to check:
1. Are logs recording sensitive data? (passwords, tokens, credit card numbers in request logs)
2. Is user data encrypted at rest? (database encryption, file encryption)
3. Are API responses returning more data than the frontend needs? (sending full user objects when only name is needed)
4. Is there a way to delete user data? (GDPR, CCPA compliance)
5. Are analytics or error tracking tools collecting PII? (email addresses in Sentry, full URLs with tokens in analytics)
6. Are backups encrypted?

Why this matters: A data breach doesn't just lose data, it loses trust. Regulations like GDPR can fine up to 4% of global revenue. But more practically, if your logs contain passwords and someone accesses the logs, every user's password is compromised.

Common AI mistakes:
- console.log(req.body) in production, logging every request including passwords
- API endpoint returns SELECT * including password hashes and internal fields
- Error tracking captures the full URL including ?token=abc123 in the query string
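
The first mistake above can be blunted with a redaction pass before anything reaches the logger. A sketch — the key list is illustrative; extend it for your own schema:

```javascript
// Strip common sensitive fields before logging (mistake F.1).
// The field names here are examples, not a complete list.
const SENSITIVE_KEYS = new Set(['password', 'token', 'authorization', 'card_number', 'ssn']);

function redact(obj) {
  if (obj === null || typeof obj !== 'object') return obj;
  if (Array.isArray(obj)) return obj.map(redact);
  return Object.fromEntries(
    Object.entries(obj).map(([k, v]) =>
      SENSITIVE_KEYS.has(k.toLowerCase()) ? [k, '[REDACTED]'] : [k, redact(v)]
    )
  );
}

// Usage: console.log(redact(req.body)) instead of console.log(req.body)
```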

#### G. Payments and financial (HIGH, if applicable)

If your app handles money, AI-generated payment code needs extra scrutiny.

What to check:
1. Webhook verification: Are Stripe/payment webhooks verifying the signature? (Without this, anyone can fake a "payment successful" event)
2. Amount validation: Is the payment amount set server-side? (If the client sends the amount, someone can pay $0.01 for a $100 item)
3. Idempotency: Can a payment be processed twice if the user double-clicks or the network retries?
4. PCI compliance: Are credit card numbers ever touching your server? (They shouldn't. Use Stripe Elements, not raw card inputs)
5. Refund logic: Can a user trigger unlimited refunds?

Why this matters: Payment bugs directly cost money. A missing webhook signature check means anyone who knows your endpoint can grant themselves a paid subscription.

#### H. Third-party integrations (MEDIUM)

AI connects to external services but often skips the security handshake.

What to check:
1. OAuth: Is the state parameter used? (Without it, someone can trick a user into linking the attacker's account)
2. Webhooks: Are incoming webhooks verifying signatures from the provider?
3. API keys: Are they scoped to minimum permissions? (A read-write key when you only need read)
4. Rate limits: Are you handling rate limit responses from APIs? (429 responses)
5. Timeouts: Do external API calls have timeouts? (Without them, a slow API can hang your entire server)

### Step 3: Prioritize and report

Generate a clear report. The user may not know cybersecurity, so explain each finding in terms of impact:

## AI Dev Security Check

**Project:** [name]
**Stack:** [language/framework]
**AI tools used:** [list]
**Date:** [current date]

### Quick wins (fix in 5 minutes)
[Things that take one line to fix: add .env to .gitignore, set CORS origin, add rate limit]

### Critical findings
[Must fix before going live. Explain what could happen if not fixed.]

### High findings
[Fix this week. Explain the risk in plain language.]

### Medium findings
[Plan to address. Lower risk but still important.]

### What's good
[Things the AI got right. Positive reinforcement.]

### Security score: [A-F]
- A: No critical/high, minor medium issues
- B: No critical, few high issues
- C: Has high issues but no immediate data exposure
- D: Has critical issues
- F: Active credential exposure or exploitable vulnerabilities

### Top 3 actions
1. [Most impactful fix with exact code/config change]
2. [Second priority with exact fix]
3. [Third priority with exact fix]

### Learn more
[Link to relevant OWASP page or documentation for each critical/high finding]

### Step 4: Provide exact fixes

For each finding, give the actual fix. Not "fix the CORS config" but the exact code change:

// Before (insecure)
app.use(cors())

// After (secure)
app.use(cors({ origin: 'https://yourdomain.com', credentials: true }))

Non-technical users need copy-pasteable solutions. If the fix requires multiple steps, number them.

### Step 5: Recommend next steps

Based on what you found:
- If MCP configs need deeper review, suggest: "Run /mcp-security-audit for a full audit of all your MCP servers"
- If credentials were found, suggest: "Run /credential-leak-scanner on the full codebase to catch any others"
- If the project handles agent communication, suggest: "Run /agent-threat-detection on sample messages"

## Examples

### Example 1: "I built a web app with Claude"

User: "I built this Next.js app with Claude, can you check if it's secure?"

  1. Read project structure
  2. Check .env is in .gitignore
  3. Scan for hardcoded secrets in all files
  4. Check API routes for auth middleware
  5. Verify CORS config in middleware
  6. Check database queries for parameterized queries
  7. Review any payment integration
  8. Check what's being logged
  9. Generate graded report with exact fixes

### Example 2: "I set up MCP servers with Cursor"

User: "Cursor helped me set up 4 MCP servers, is my config safe?"

  1. Read MCP config files from all client locations
  2. Check version pinning on each server
  3. Check permission scoping
  4. Verify no plaintext secrets in configs
  5. Report findings, suggest /mcp-security-audit for full audit

### Example 3: "I'm about to launch my SaaS"

User: "Launching next week, built most of it with Copilot. Quick security review?"

  1. Prioritize: secrets, auth, payments (the things that cause immediate damage)
  2. Check production config vs development config
  3. Verify HTTPS, CORS, rate limiting
  4. Check that debug mode is off
  5. Verify error pages don't leak stack traces
  6. Report with "quick wins" section first for pre-launch fixes

## Common Issues

"It works fine, why change it?"

Frame findings in terms of impact, not jargon. Not "you have a SQL injection vulnerability" but "someone can use this form to read your entire user database, including passwords, by typing a specific string into the search box."

### Large projects

For big codebases, prioritize: secrets first (biggest immediate risk), then auth, then payments, then input validation, then everything else. Don't try to audit everything at once.

### False sense of security

AI-generated tests often test the happy path. Passing tests does not mean the code is secure. Make this clear in the report.

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents.

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.