terry-li-hm

visual-browser

# Install this skill:
npx skills add terry-li-hm/skills --skill "visual-browser"

Install specific skill from multi-skill repository

# Description

Browser automation using Computer Vision (Gemini 3 Flash). Use as fallback when agent-browser fails on icon-only buttons, complex Canvas/JS UIs, or visual audits.

# SKILL.md


name: visual-browser
description: Browser automation using Computer Vision (Gemini 3 Flash). Use as fallback when agent-browser fails on icon-only buttons, complex Canvas/JS UIs, or visual audits.
user_invocable: false


Visual Browser Skill

This skill allows the agent to interact with the web using Computer Vision and spatial reasoning via the browser-use library and Gemini 3 Flash. Unlike text-only scrapers, it can "see" the page, handle complex layouts, and interact with elements based on their visual appearance.

Usage

Invoke this skill when you need to perform a task that requires visual understanding of a website (e.g., clicking complex buttons, navigating non-standard UIs, or summarizing visual content).

uv run /Users/terry/skills/visual-browser/browse.py "Find the latest AI news on TechCrunch and summarize the top 3 stories."
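
When another script needs the result, the same command can be driven from Python. The following is a minimal sketch, assuming browse.py takes the task as its only positional argument (as in the command above) and writes its result to stdout. build_command and run_task are hypothetical helper names and are not part of the skill.

```python
import subprocess

# Path taken from the usage line above; adjust it for your own checkout.
BROWSE_SCRIPT = "/Users/terry/skills/visual-browser/browse.py"


def build_command(task: str, script: str = BROWSE_SCRIPT) -> list[str]:
    """Return the argv list equivalent to `uv run <script> "<task>"`."""
    task = task.strip()
    if not task:
        raise ValueError("task must be a non-empty string")
    # An argv list (rather than a shell string) avoids quoting problems
    # when the task text contains quotes or other shell metacharacters.
    return ["uv", "run", script, task]


def run_task(task: str, timeout_s: float = 600.0) -> str:
    """Run the skill and return its stdout (assumed to carry the result)."""
    completed = subprocess.run(
        build_command(task),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout
```

Calling run_task("Find the latest AI news on TechCrunch and summarize the top 3 stories.") would then start the same browser session as the shell command. The timeout value is an arbitrary illustrative default.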

Capabilities

  • Visual Navigation: Identifies elements by their visual features, not just HTML selectors.
  • Dynamic Interaction: Handles JS-heavy sites, modals, and complex state changes.
  • Multimodal Reasoning: Uses Gemini 3 Flash's vision capabilities to plan and execute browser actions.

Configuration

  • Requires GOOGLE_API_KEY in the environment.
  • Uses playwright for browser control.
  • Automatically installs dependencies via uv run.
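
The sketch below shows one way these requirements could fit together at the top of a script launched with uv run. The comment header is uv's inline script metadata (PEP 723), which lets uv run install the listed packages on first use. Whether browse.py actually declares its dependencies this way, and the exact package list, are assumptions based on the libraries named in this document. check_environment is a hypothetical helper.

```python
# /// script
# dependencies = [
#     "browser-use",
#     "playwright",
# ]
# ///
# The block above is an assumed dependency list, not copied from browse.py.

import os
import shutil
from collections.abc import Mapping


def check_environment(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    # The skill states that GOOGLE_API_KEY must be present in the environment.
    if not env.get("GOOGLE_API_KEY", "").strip():
        problems.append("GOOGLE_API_KEY is not set")
    # uv must be installed for the `uv run` invocation to work at all.
    if shutil.which("uv") is None:
        problems.append("uv is not on PATH")
    return problems
```

Running this check before starting a browser session reports a missing key immediately, instead of failing partway through a task.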

Best Practices

  • Use for UI-heavy sites where standard scrapers fail.
  • Provide clear, step-by-step or outcome-oriented tasks.
  • For simple text retrieval, consider WebSearch or WebFetch instead to save tokens.

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents.