npx skills add terry-li-hm/skills --skill "visual-browser"
Install specific skill from multi-skill repository
# Description
Browser automation using Computer Vision (Gemini 3 Flash). Use as fallback when agent-browser fails on icon-only buttons, complex Canvas/JS UIs, or visual audits.
# SKILL.md
name: visual-browser
description: Browser automation using Computer Vision (Gemini 3 Flash). Use as fallback when agent-browser fails on icon-only buttons, complex Canvas/JS UIs, or visual audits.
user_invocable: false
Visual Browser Skill
This skill allows the agent to interact with the web using Computer Vision and spatial reasoning via the browser-use library and Gemini 3 Flash. Unlike text-only scrapers, it can "see" the page, handle complex layouts, and interact with elements based on their visual appearance.
Usage
Invoke this skill when you need to perform a task that requires visual understanding of a website (e.g., clicking complex buttons, navigating non-standard UIs, or summarizing visual content).
uv run /Users/terry/skills/visual-browser/browse.py "Find the latest AI news on TechCrunch and summarize the top 3 stories."
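The same invocation can be driven from Python. This is a minimal sketch, assuming `uv` is on the PATH; the helper names `build_command` and `run_task` are hypothetical and not part of the skill, and the script path is simply the one from the usage line above.

```python
import subprocess

# Path copied from the usage line above; adjust it for your own checkout.
SCRIPT = "/Users/terry/skills/visual-browser/browse.py"


def build_command(task: str, script: str = SCRIPT) -> list[str]:
    """Return the argv list equivalent to: uv run <script> "<task>"."""
    if not task.strip():
        raise ValueError("task must be a non-empty description")
    # A list (rather than a shell string) avoids quoting problems in the task text.
    return ["uv", "run", script, task]


def run_task(task: str, script: str = SCRIPT) -> str:
    """Run the skill and return whatever it writes to stdout (hypothetical helper)."""
    result = subprocess.run(
        build_command(task, script),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script exits non-zero
    )
    return result.stdout
```

Calling `run_task("Find the latest AI news on TechCrunch and summarize the top 3 stories.")` would then launch the script exactly as the command line above does.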
Capabilities
- Visual Navigation: Identifies elements by their visual features, not just HTML selectors.
- Dynamic Interaction: Handles JS-heavy sites, modals, and complex state changes.
- Multimodal Reasoning: Uses Gemini 3 Flash's vision capabilities to plan and execute browser actions.
Configuration
- Requires `GOOGLE_API_KEY` in the environment.
- Uses `playwright` for browser control.
- Automatically installs dependencies via `uv run`.
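A quick preflight check can catch a missing key or a missing `uv` binary before a task is launched. The sketch below is illustrative only: `missing_prerequisites` is a hypothetical helper, not something the skill ships, and it checks only the two requirements that can be verified from the standard library.

```python
import os
import shutil


def missing_prerequisites(env=None, which=shutil.which) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready to run."""
    env = os.environ if env is None else env
    problems = []
    # The skill reads the Gemini credentials from this environment variable.
    if not env.get("GOOGLE_API_KEY"):
        problems.append("GOOGLE_API_KEY is not set")
    # `uv run` resolves and installs the script's dependencies, so uv itself must exist.
    if which("uv") is None:
        problems.append("uv is not on PATH")
    return problems
```

The `env` and `which` parameters exist only so the check can be exercised without touching the real environment; calling `missing_prerequisites()` with no arguments inspects the current process.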
Best Practices
- Use for UI-heavy sites where standard scrapers fail.
- Provide clear, step-by-step or outcome-oriented tasks.
- For simple text retrieval, consider `WebSearch` or `WebFetch` instead to save tokens.
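The routing advice above can be summarized as a small decision function. This is a sketch under stated assumptions: `choose_tool` and its two flags are hypothetical, and the split between `WebFetch` (a URL is already known) and `WebSearch` (it is not) is an assumption made for illustration, not something the skill specifies.

```python
def choose_tool(needs_visual_understanding: bool, has_known_url: bool) -> str:
    """Pick the cheapest tool that can plausibly complete the task."""
    if needs_visual_understanding:
        # Icon-only buttons, Canvas/JS-heavy UIs, and visual audits need the vision model.
        return "visual-browser"
    # Plain text retrieval: avoid the vision model to save tokens.
    # Assumption: fetch when the page is known, search otherwise.
    return "WebFetch" if has_known_url else "WebSearch"
```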
# Supported AI Coding Agents
This skill is compatible with the SKILL.md standard and works with all major AI coding agents.