npx skills add michaelboeding/skills --skill "image-generation"
Install specific skill from multi-skill repository
# SKILL.md
name: image-generation
description: >
Use this skill for any image-related AI generation or editing task. Triggers include:
GENERATE: "generate image", "create image", "make picture", "draw", "visualize", "image of", "create art", "generate art"
EDIT: "edit image", "modify image", "change image", "update image", "fix image", "enhance image"
ADD/REMOVE: "add to image", "put in image", "remove from image", "delete from image", "add element"
STYLE: "style transfer", "make it look like", "convert style", "apply style", "in the style of"
PRODUCT: "product photo", "product placement", "place product", "mockup", "put product on"
COMPOSITE: "combine images", "merge images", "blend images", "create composite"
Supports text-to-image generation, image editing with references, product placement, style transfer, and multi-image composition using Google Gemini (Nano Banana Pro) or OpenAI GPT Image models.
Image Generation & Editing Skill
Generate and edit images using AI (Google Gemini Nano Banana Pro, OpenAI GPT Image).
Capabilities:
- Generate: Create new images from text descriptions
- Edit: Modify existing images (add/remove elements, change colors)
- Product Placement: Put products into scenes
- Style Transfer: Apply artistic styles to photos
- Composite: Combine multiple images into one
Quick Examples
Users can specify what they want:
| User Says | Mode | What Happens |
|---|---|---|
| "Generate an image of a sunset" | Generate | Text-to-image, no reference needed |
| "Create a logo for my coffee shop" | Generate | Text-to-image with text rendering |
| "Edit this image: add a hat to the cat" | Edit | User provides image, AI modifies it |
| "Remove the background from this photo" | Edit | User provides image, AI edits it |
| "Put this product on a kitchen counter" | Product | User provides product + optional scene |
| "Make this photo look like Van Gogh painted it" | Style | User provides photo, AI applies style |
| "Combine these photos into a group shot" | Composite | User provides multiple images |
Prerequisites
Environment variables must be configured for the APIs to work. At least one API key is required:
- `OPENAI_API_KEY` - For OpenAI GPT Image generation
- `GOOGLE_API_KEY` - For Google Gemini (Nano Banana / Nano Banana Pro)
See the repository README for setup instructions.
Available APIs
OpenAI GPT Image (Recommended for pure generation)
- Models:
  - `gpt-image-1.5` (state of the art, best quality)
  - `gpt-image-1` (great quality, cost-effective)
  - `gpt-image-1-mini` (fastest, most affordable)
- Best for: High-quality generation, transparency, text rendering, image editing
- Sizes: 1024x1024 (square), 1536x1024 (landscape), 1024x1536 (portrait), or `auto`
- Quality: low (fast), medium (balanced), high (best), or `auto`
- Background: transparent, opaque, or `auto`
- Output formats: png (default), jpeg (faster), webp
- Compression: 0-100% (for jpeg/webp)
- Features:
  - Image editing with up to 16 input images
  - Transparent backgrounds
  - Streaming with partial images
  - High input fidelity for preserving faces/logos
  - Inpainting with masks
  - 32,000 character prompts
⚠️ Note: DALL-E 2 and DALL-E 3 are deprecated and will stop being supported on 05/12/2026.
Google Gemini Native Image Generation (Recommended for editing)
- Nano Banana (`gemini-2.5-flash-image`): Fast, efficient, 1K resolution, up to 3 reference images
- Nano Banana Pro (`gemini-3-pro-image-preview`): Professional quality, up to 4K, thinking mode, up to 14 reference images (default)
- Aspect ratios: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
- Resolutions (Pro only): 1K, 2K, 4K
- Features:
  - Image editing (add/remove elements, color changes)
  - Product placement and composition
  - Style transfer
  - Advanced text rendering
  - Google Search grounding (Pro only)
  - Thinking mode for complex prompts (Pro only)
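The Gemini limits listed above can be checked before any API call is made. The following is a minimal sketch, assuming a hypothetical helper named `validate_gemini_options` that is not part of the skill's scripts; the aspect ratios, resolutions, and model IDs are copied from the list above.

```python
GEMINI_ASPECT_RATIOS = {"1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"}
PRO_MODEL = "gemini-3-pro-image-preview"
FLASH_MODEL = "gemini-2.5-flash-image"
PRO_RESOLUTIONS = {"1K", "2K", "4K"}


def validate_gemini_options(model, aspect_ratio, resolution=None):
    """Return a list of problems; an empty list means the options look valid."""
    problems = []
    if model not in (PRO_MODEL, FLASH_MODEL):
        problems.append(f"unknown model: {model}")
    if aspect_ratio not in GEMINI_ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {aspect_ratio}")
    if resolution is not None:
        if resolution not in PRO_RESOLUTIONS:
            problems.append(f"unsupported resolution: {resolution}")
        elif model != PRO_MODEL and resolution != "1K":
            # Nano Banana is 1K only; 2K and 4K are listed as Pro only.
            problems.append(f"{resolution} requires {PRO_MODEL}")
    return problems
```

Returning a list rather than raising lets the caller report every problem to the user in one message.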
Workflow
Step 1: Gather Requirements (REQUIRED)
⚠️ Use interactive questioning: ask ONE question at a time.
Question Flow
⚠️ Use the AskUserQuestion tool for each question below. Do not just print questions in your response; use the tool to create interactive prompts with the options shown.
Q0: Model Selection
"Which image generation model would you like to use?
- Google Gemini (Nano Banana Pro) - Up to 4K, 14 reference images, style transfer, thinking mode (Recommended)
- OpenAI GPT Image 1.5 - State of the art, transparency, streaming, up to 16 input images
- OpenAI GPT Image 1 - Great quality, transparency, image editing
- OpenAI GPT Image 1 Mini - Fastest, most affordable"
Wait for response. If user doesn't have a preference, recommend Gemini for editing/reference tasks or GPT Image 1.5 for pure generation.
Q1: Reference
"I'll generate that image for you! First, do you have any reference images?
- Product photos to include
- Style references
- Images to edit
- No, generate from scratch"
Wait for response.
Q2: Aspect Ratio
"What aspect ratio?
- 1:1 (square)
- 16:9 (landscape/widescreen)
- 9:16 (portrait/vertical)
- 4:3 / 3:4 (classic)
- Other (2:3, 3:2, 4:5, 5:4, 21:9)
- Or specify"
Wait for response.
Q3: Resolution
"What resolution?
- 1K (fast)
- 2K (balanced)
- 4K (highest quality)"
Wait for response.
Q4: Style
"Any style preferences?
- Photorealistic
- Artistic/painterly
- Cartoon/illustration
- 3D render
- Or describe your own"
Wait for response.
Quick Reference
| Question | Determines |
|---|---|
| Reference | Generation vs editing mode |
| Aspect Ratio | Image dimensions |
| Resolution | Quality level |
| Style | Prompt enhancement direction |
Parsing:
- If the user provides reference images → use image editing mode
- If the user doesn't answer all questions → use sensible defaults and note assumptions
- Parse: subject, style, mood, special requirements (colors, text, composition)
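The parsing rules above can be sketched as a small data structure. This is an illustrative sketch only: the `Requirements` class, the `fill_defaults` helper, and the specific default values are assumptions, since the skill says only to "use sensible defaults and note assumptions".

```python
from dataclasses import dataclass, field

# Hypothetical defaults; the skill does not prescribe specific values.
DEFAULTS = {"aspect_ratio": "1:1", "resolution": "2K", "style": "photorealistic"}


@dataclass
class Requirements:
    reference_images: list = field(default_factory=list)
    aspect_ratio: str = ""
    resolution: str = ""
    style: str = ""
    assumptions: list = field(default_factory=list)

    @property
    def mode(self):
        # Any reference image switches the request to editing mode.
        return "edit" if self.reference_images else "generate"


def fill_defaults(req):
    """Fill unanswered questions and record each assumption for the user."""
    for name, value in DEFAULTS.items():
        if not getattr(req, name):
            setattr(req, name, value)
            req.assumptions.append(f"{name} defaulted to {value}")
    return req
```

The recorded `assumptions` list can be shown to the user alongside the result so that defaults are never applied silently.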
Step 2: Craft the Prompt
Transform the user request into an effective image generation prompt:
- Be specific: Add details the user might not have mentioned
- Describe style: "digital art", "oil painting", "photograph", "3D render"
- Include lighting: "soft lighting", "dramatic shadows", "golden hour"
- Specify quality: "highly detailed", "8k", "professional"
Example transformation:
- User: "a cat in space"
- Enhanced: "A majestic orange tabby cat floating in outer space, surrounded by colorful nebulae and distant stars, wearing a small astronaut helmet, digital art style, highly detailed, vibrant colors, cinematic lighting"
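The transformation above amounts to joining the subject with detail, style, lighting, and quality phrases. A minimal sketch, assuming a hypothetical `enhance_prompt` helper; in practice the extra phrases come from the user's answers and the agent's judgment, not from fixed strings.

```python
def enhance_prompt(subject, style=None, lighting=None, quality=None, details=()):
    """Join the subject with optional detail, style, lighting, and quality phrases."""
    parts = [subject.strip()]
    parts.extend(d.strip() for d in details)
    for extra in (style, lighting, quality):
        if extra:
            parts.append(extra.strip())
    # Drop empty fragments so the result never contains ", ," runs.
    return ", ".join(p for p in parts if p)


prompt = enhance_prompt(
    "A majestic orange tabby cat floating in outer space",
    style="digital art style",
    lighting="cinematic lighting",
    quality="highly detailed",
    details=["wearing a small astronaut helmet"],
)
```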
Step 3: Select the API
Use the model selected by the user in Q0:
- Check which API keys are configured in the environment:
  - `OPENAI_API_KEY` → GPT Image models available
  - `GOOGLE_API_KEY` → Gemini (Nano Banana Pro) available
- If the user's selected model isn't available: Inform them and offer alternatives.
- Model mapping from Q0:
  - "Google Gemini (Nano Banana Pro)" → Use `gemini.py` with `gemini-3-pro-image-preview`
  - "OpenAI GPT Image 1.5" → Use `openai_image.py` with `gpt-image-1.5`
  - "OpenAI GPT Image 1" → Use `openai_image.py` with `gpt-image-1`
  - "OpenAI GPT Image 1 Mini" → Use `openai_image.py` with `gpt-image-1-mini`
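The key check and model mapping above can be sketched as a lookup table. The `MODEL_MAP` and `select_api` names are hypothetical; the script names, model IDs, and environment variables are the ones listed in this step.

```python
import os

# Q0 choice -> (script, model ID, required environment variable), as listed above.
MODEL_MAP = {
    "Google Gemini (Nano Banana Pro)": ("gemini.py", "gemini-3-pro-image-preview", "GOOGLE_API_KEY"),
    "OpenAI GPT Image 1.5": ("openai_image.py", "gpt-image-1.5", "OPENAI_API_KEY"),
    "OpenAI GPT Image 1": ("openai_image.py", "gpt-image-1", "OPENAI_API_KEY"),
    "OpenAI GPT Image 1 Mini": ("openai_image.py", "gpt-image-1-mini", "OPENAI_API_KEY"),
}


def select_api(choice, env=None):
    """Return (script, model) for the choice, or (None, alternatives) if its key is missing."""
    env = os.environ if env is None else env
    script, model, key = MODEL_MAP[choice]
    if env.get(key):
        return script, model
    # The selected model is unavailable: list the choices whose key is set.
    alternatives = [name for name, (_, _, k) in MODEL_MAP.items() if env.get(k)]
    return None, alternatives
```

Passing `env` explicitly keeps the function easy to test; by default it reads the real process environment.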
Step 4: Generate the Image
Execute the appropriate script from ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/:
For OpenAI GPT Image - Text to Image:
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/openai_image.py \
--prompt "your enhanced prompt" \
--model "gpt-image-1" \
--size "1024x1024" \
--quality "high" \
--output "/path/to/output.png"
For OpenAI GPT Image - With Transparent Background:
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/openai_image.py \
--prompt "A product icon with no background" \
--model "gpt-image-1" \
--background "transparent" \
--quality "high" \
--output "/path/to/output.png"
For OpenAI GPT Image - Image Editing (with reference images):
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/openai_image.py \
--prompt "Add a wizard hat to this cat" \
--model "gpt-image-1" \
--image "/path/to/cat.jpg" \
--input-fidelity "high" \
--output "/path/to/output.png"
For OpenAI GPT Image - Multiple Reference Images:
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/openai_image.py \
--prompt "Create a gift basket containing these items" \
--model "gpt-image-1" \
--image "/path/to/item1.png" \
--image "/path/to/item2.png" \
--image "/path/to/item3.png" \
--output "/path/to/output.png"
For OpenAI GPT Image - With Mask (Inpainting):
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/openai_image.py \
--prompt "Replace the pool with a garden" \
--model "gpt-image-1" \
--image "/path/to/scene.jpg" \
--mask "/path/to/mask.png" \
--output "/path/to/output.png"
For OpenAI GPT Image - Streaming with Partial Images:
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/openai_image.py \
--prompt "A beautiful sunset over mountains" \
--model "gpt-image-1" \
--stream \
--partial-images 2 \
--output "/path/to/output.png"
For Google Gemini (Nano Banana Pro) - Text to Image:
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/gemini.py \
--prompt "your enhanced prompt" \
--model "gemini-3-pro-image-preview" \
--aspect-ratio "1:1" \
--resolution "2K" \
--output "/path/to/output.png"
For Google Gemini - With Reference Images (editing, product placement, etc.):
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/gemini.py \
--prompt "Add a wizard hat to this cat" \
--image "/path/to/cat.jpg" \
--aspect-ratio "1:1" \
--resolution "2K"
For Google Gemini - Multiple Reference Images (composition, style transfer):
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/gemini.py \
--prompt "Place this product on the kitchen counter in this scene" \
--image "/path/to/product.png" \
--image "/path/to/kitchen.jpg" \
--aspect-ratio "16:9" \
--resolution "2K"
For Google Gemini (Nano Banana - faster, fewer features):
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/gemini.py \
--prompt "your enhanced prompt" \
--model "gemini-2.5-flash-image" \
--aspect-ratio "1:1"
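When the commands above are assembled programmatically, building an argument list avoids shell-quoting problems with prompts that contain quotes or spaces. This is a sketch under assumptions: `build_command` and `run` are hypothetical helpers, and only flags that appear in the commands above are used. Valueless flags such as `--stream` are not covered here.

```python
import os
import subprocess


def build_command(script, prompt, model, output=None, images=(), **options):
    """Build the argv list for one of the skill's scripts without running it."""
    root = os.environ.get("CLAUDE_PLUGIN_ROOT", ".")
    path = os.path.join(root, "skills", "image-generation", "scripts", script)
    cmd = ["python3", path, "--prompt", prompt, "--model", model]
    for image in images:
        cmd += ["--image", image]  # one --image flag per reference image
    for name, value in options.items():
        # aspect_ratio="16:9" becomes: --aspect-ratio 16:9
        cmd += ["--" + name.replace("_", "-"), str(value)]
    if output:
        cmd += ["--output", output]
    return cmd


def run(cmd):
    """Run the command and return its stdout; raises CalledProcessError on failure."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

For example, `build_command("gemini.py", "Add a wizard hat to this cat", "gemini-3-pro-image-preview", images=["/path/to/cat.jpg"], aspect_ratio="1:1", resolution="2K")` yields the same arguments as the Gemini editing command above, with the model given explicitly.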
Step 5: Deliver the Result
- Show the generated image to the user
- Provide the enhanced prompt used (so they can iterate)
- Offer to:
- Generate variations
- Try a different style
- Use a different API/model
- Refine the prompt
Error Handling
Missing API key: Inform the user which key is needed and how to set it up:
- OpenAI: https://platform.openai.com/api-keys
- Google: https://aistudio.google.com/apikey
API rate limit: Suggest waiting or trying the other API.
Content policy violation: Rephrase the prompt to be more appropriate.
Generation failed: Retry with simplified prompt or different API.
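The rate-limit and generation-failure cases above both reduce to "try the other API, then try a simpler prompt". A minimal sketch, assuming hypothetical backend callables that take a prompt and either return an output path or raise; the default `simplify` rule (keep only the text before the first comma) is an illustrative assumption. Content policy violations still need a deliberately rephrased prompt and are not handled here.

```python
def generate_with_fallback(prompt, backends, simplify=lambda p: p.split(",")[0]):
    """Try each backend with the full prompt, then once more with a simplified one.

    `backends` is a list of (name, callable) pairs; each callable takes a prompt
    and returns an output path or raises an exception.
    """
    errors = []
    for attempt in (prompt, simplify(prompt)):
        for name, generate in backends:
            try:
                return name, generate(attempt)
            except Exception as exc:  # a real script would catch narrower errors
                errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))
```

The collected error messages are included in the final exception so the user can see why each backend failed.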
Reference Image Use Cases
Both OpenAI GPT Image and Google Gemini support reference images for advanced editing:
OpenAI GPT Image: Up to 16 input images, with input_fidelity: high for preserving faces/logos
Google Gemini: Nano Banana (up to 3), Nano Banana Pro (up to 14)
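The per-model limits above can be enforced before uploading anything. A short sketch, assuming a hypothetical `check_reference_count` helper; the limits are the ones stated in this section.

```python
# Maximum reference images per model ID, as stated above.
MAX_REFERENCE_IMAGES = {
    "gpt-image-1.5": 16,
    "gpt-image-1": 16,
    "gpt-image-1-mini": 16,
    "gemini-2.5-flash-image": 3,
    "gemini-3-pro-image-preview": 14,
}


def check_reference_count(model, images):
    """Raise ValueError if the model cannot accept this many reference images."""
    limit = MAX_REFERENCE_IMAGES[model]
    if len(images) > limit:
        raise ValueError(f"{model} accepts up to {limit} reference images, got {len(images)}")
    return len(images)
```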
Image Editing
- "Add a santa hat to this person" + person.jpg
- "Remove the background and replace with a beach scene" + product.jpg
- "Change the sofa color to blue" + living_room.jpg
Product Placement
- "Place this product on a marble kitchen counter" + product.png + kitchen.jpg
- "Show this watch on a person's wrist" + watch.png + arm.jpg
Style Transfer
- "Transform this photo into Van Gogh's Starry Night style" + photo.jpg
- "Make this look like a watercolor painting" + landscape.jpg
Multi-Image Composition
- "Create a group photo of these people in an office" + person1.jpg + person2.jpg + person3.jpg
- "Combine these elements into a cohesive scene" + element1.png + element2.png + background.jpg
Character Consistency
- "Show this character from a different angle" + character.jpg
- "Put this person in a superhero costume" + person.jpg
Tip: For best results with reference images, be specific about what you want to preserve vs. change.
Prompt Engineering Tips
For Photorealism
- Include "photograph", "DSLR", "35mm film"
- Specify camera settings: "shallow depth of field", "bokeh"
- Add lighting: "natural light", "studio lighting"
For Artistic Styles
- Reference art movements: "impressionist", "art nouveau", "cyberpunk"
- Name artist styles: "in the style of Studio Ghibli", "Moebius style"
- Specify medium: "watercolor", "oil painting", "pencil sketch"
For Consistency
- Use seed values when available
- Save successful prompts for reference
- Note which API produced best results for similar requests
API Comparison
| Feature | GPT Image 1.5 | GPT Image 1 | GPT Image 1 Mini | Nano Banana | Nano Banana Pro |
|---|---|---|---|---|---|
| Provider | OpenAI | OpenAI | OpenAI | Google | Google |
| Model ID | gpt-image-1.5 | gpt-image-1 | gpt-image-1-mini | gemini-2.5-flash-image | gemini-3-pro-image-preview |
| Best for | State of the art | Quality + value | Speed + cost | Fast generation | Professional assets |
| Sizes | 1024x1024, 1536x1024, 1024x1536, auto | Same | Same | 1K only | Up to 4K |
| Quality options | low, medium, high, auto | Same | Same | N/A | N/A |
| Aspect ratios | 3 + auto | Same | Same | 10 options | 10 options |
| Reference images | Up to 16 | Up to 16 | Up to 16 | Up to 3 | Up to 14 |
| Image editing | Yes | Yes | Yes | Yes | Yes |
| Inpainting (mask) | Yes | Yes | Yes | Yes | Yes |
| Transparent background | Yes | Yes | Yes | No | No |
| Streaming | Yes | Yes | Yes | No | No |
| Input fidelity | high/low | high/low | low only | N/A | N/A |
| Output formats | png, jpeg, webp | Same | Same | png | png |
| Compression | 0-100% | Same | Same | No | No |
| Text rendering | Excellent | Excellent | Good | Good | Excellent |
| Thinking mode | No | No | No | No | Yes |
| Max prompt length | 32,000 chars | 32,000 chars | 32,000 chars | N/A | N/A |
| Speed | ~30-60s | ~20-40s | ~10-20s | ~10-20s | ~30-60s |
⚠️ DALL-E 2 and DALL-E 3 are deprecated and will stop being supported on 05/12/2026. Use GPT Image models instead.
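The comparison table can also be used to narrow the model choice by required feature. The sketch below encodes a subset of the table's rows; the `FEATURES` dictionary, its feature names, and the `models_supporting` helper are hypothetical.

```python
# Subset of the comparison table: model ID -> supported features.
FEATURES = {
    "gpt-image-1.5": {"transparent_background", "streaming", "editing"},
    "gpt-image-1": {"transparent_background", "streaming", "editing"},
    "gpt-image-1-mini": {"transparent_background", "streaming", "editing"},
    "gemini-2.5-flash-image": {"editing"},
    "gemini-3-pro-image-preview": {"editing", "thinking_mode", "4k"},
}


def models_supporting(*required):
    """Return model IDs whose feature set includes every required feature."""
    needed = set(required)
    return [model for model, features in FEATURES.items() if needed <= features]
```

For instance, a request for a transparent PNG rules out both Gemini models, while a 4K request leaves only Nano Banana Pro.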
# Supported AI Coding Agents
This skill is compatible with the SKILL.md standard and works with all major AI coding agents.