gemini-research-browser-use

by @Grasseed in Web & API

# Install this skill:

npx skills add Grasseed/google-search-browser-use --skill "gemini-research-browser-use"

Install specific skill from multi-skill repository

# Description

Use Chrome DevTools Protocol to allow the AI to "ask Gemini" or "research with Gemini" directly. This uses the user's logged-in Chrome session, bypassing API limits and leveraging the web interface's reasoning capabilities.

# SKILL.md

name: gemini-research-browser-use
description: Use Chrome DevTools Protocol to allow the AI to "ask Gemini" or "research with Gemini" directly. This uses the user's logged-in Chrome session, bypassing API limits and leveraging the web interface's reasoning capabilities.

Gemini Research Browser Use

Overview

Perform research or queries using Google Gemini via Chrome DevTools Protocol (CDP). This method reuses the user's existing Chrome login session to interact with the Gemini web interface (https://gemini.google.com/).
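
As a rough sketch of what travels over the CDP WebSocket: each command is a JSON message with an `id`, a `method`, and `params`, and the scripts in this document only use `Runtime.evaluate`. The helper names `build_evaluate_message` and `extract_value` are hypothetical and not part of the skill.

```python
import json


def build_evaluate_message(msg_id, expression):
    """Serialize a CDP Runtime.evaluate command as a JSON string."""
    return json.dumps({
        "id": msg_id,  # CDP echoes this id back in the matching response
        "method": "Runtime.evaluate",
        "params": {"expression": expression, "returnByValue": True},
    })


def extract_value(response_text, default=None):
    """Return the evaluated value from a Runtime.evaluate response."""
    result = json.loads(response_text).get("result", {})
    if "exceptionDetails" in result:
        # The page script threw instead of returning a value.
        raise RuntimeError(result["exceptionDetails"].get("text", "JavaScript exception"))
    return result.get("result", {}).get("value", default)
```

A response such as `{"id": 1, "result": {"result": {"type": "string", "value": "success"}}}` would yield `"success"`, the same nested lookup the scripts below perform inline.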

Prerequisites

Python + websockets
Verify:
python3 --version
python3 -m pip show websockets
Install if missing:
python3 -m pip install websockets
Google Chrome
Verify:
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --version
CDP Port Availability
Verify Chrome is listening (after launch in Step 2):
curl -s http://localhost:9222/json | python3 -m json.tool
Non-default user data directory (required by Chrome)
Chrome CDP requires a non-default profile path. Use a cloned profile so you keep login state.
rm -rf /tmp/chrome-gemini-profile
rsync -a "$HOME/Library/Application Support/Google/Chrome/" /tmp/chrome-gemini-profile/
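
The same checks can be done from Python before any automation starts. This is a minimal sketch, assuming the macOS Chrome path used throughout this document; `check_prerequisites` is a hypothetical helper name.

```python
import importlib.util
import os
import sys

# Assumption: the macOS install path used elsewhere in this document.
CHROME_BIN = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"


def check_prerequisites(chrome_bin=CHROME_BIN):
    """Report which prerequisites appear to be satisfied on this machine."""
    return {
        "python3": sys.version_info >= (3,),
        # find_spec returns None when the package is not importable
        "websockets": importlib.util.find_spec("websockets") is not None,
        "chrome": os.path.isfile(chrome_bin) and os.access(chrome_bin, os.X_OK),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```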

Method Comparison

Method	Pros	Cons	Recommended
Chrome Remote Debugging (CDP)	Uses existing login, full automation, reliable	Requires Chrome restart with debugging flag	✅ Yes
`browser-use --browser real`	Simple CLI	Opens new session without login	❌ No
`browser_subagent`	Visual feedback	Rate limited, may fail	❌ No

✅ Recommended Method: Chrome Remote Debugging (CDP)

This is the most reliable method that uses your system Chrome with existing Google login.

Prerequisites

Python 3 with websockets library
Google Chrome installed at /Applications/Google Chrome.app/
User logged into Google in Chrome

Step 1: Install websockets (if needed)

pip3 install websockets
# Or in virtual environment:
python3 -m venv .venv && ./.venv/bin/pip install websockets

Step 2: Launch Chrome with Remote Debugging (Non-default profile)

Important: Close any existing Chrome windows first, or use a different debugging port.

"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
  --remote-debugging-port=9222 \
  --user-data-dir="/tmp/chrome-gemini-profile" \
  "https://gemini.google.com/" &

Parameters explained:
- --remote-debugging-port=9222: Enables CDP on port 9222
- --user-data-dir: Points to the cloned Chrome profile (a copy of your existing profile, so the login session carries over)
- The URL opens Gemini directly
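
If the launch needs to happen from a script rather than a shell, the same three parameters can be assembled into an argument list. A minimal sketch, assuming the macOS Chrome path and cloned profile directory from this document; `build_chrome_command` and `launch_chrome` are hypothetical names.

```python
import subprocess

# Assumption: the macOS install path used elsewhere in this document.
CHROME_BIN = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"


def build_chrome_command(port=9222,
                         profile_dir="/tmp/chrome-gemini-profile",
                         url="https://gemini.google.com/"):
    """Build the argument list for launching Chrome with CDP enabled."""
    return [
        CHROME_BIN,
        f"--remote-debugging-port={port}",  # enables CDP on this port
        f"--user-data-dir={profile_dir}",   # the cloned, non-default profile
        url,                                # page to open on startup
    ]


def launch_chrome(**kwargs):
    """Start Chrome in the background and return the process handle."""
    return subprocess.Popen(
        build_chrome_command(**kwargs),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
```

Passing a list to `subprocess.Popen` avoids shell quoting problems with the space in the Chrome path.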

Step 3: Verify Connection (CDP)

curl -s http://localhost:9222/json | python3 -m json.tool

Look for the Gemini page entry:

{
  "title": "Google Gemini",
  "url": "https://gemini.google.com/app",
  "webSocketDebuggerUrl": "ws://localhost:9222/devtools/page/XXXXXXXX"
}

Note: If URL shows /app instead of just /, it means you're logged in.
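
The lookup and the login check described above can be sketched in Python with only the standard library. The function names are hypothetical, and treating a path that starts with `/app` as "logged in" simply follows the note above; it is a heuristic, not a guarantee.

```python
import json
from urllib.parse import urlparse
from urllib.request import urlopen


def fetch_targets(port=9222):
    """Fetch the list of debuggable targets from Chrome's /json endpoint."""
    with urlopen(f"http://localhost:{port}/json", timeout=5) as resp:
        return json.load(resp)


def find_gemini_page(targets):
    """Return the first page target whose URL is on gemini.google.com, or None."""
    for target in targets:
        if target.get("type") == "page" and "gemini.google.com" in target.get("url", ""):
            return target
    return None


def looks_logged_in(target):
    """Heuristic from the note above: a path starting with /app suggests a logged-in session."""
    return urlparse(target.get("url", "")).path.startswith("/app")
```

Typical use would be `page = find_gemini_page(fetch_targets())`, followed by a check of `looks_logged_in(page)` before connecting to `page["webSocketDebuggerUrl"]`.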

Step 4: Send Query to Gemini

Save this as gemini_query.py or run inline:

import asyncio
import websockets
import json
import subprocess
import sys

async def query_gemini(query_text, wait_seconds=30):
    # Get the Gemini page WebSocket URL
    result = subprocess.run(
        ["curl", "-s", "http://localhost:9222/json"],
        capture_output=True, text=True
    )
    pages = json.loads(result.stdout)

    # Find Gemini page
    gemini_page = None
    for page in pages:
        if page.get("type") == "page" and "gemini.google.com" in page.get("url", ""):
            gemini_page = page
            break

    if not gemini_page:
        print("Error: Gemini page not found. Make sure Chrome is open with Gemini.")
        return None

    ws_url = gemini_page["webSocketDebuggerUrl"]
    print(f"Connecting to: {ws_url}")

    async with websockets.connect(ws_url) as ws:
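        # Step 1: Input the query
        # json.dumps renders the query as an escaped JavaScript string literal,
        # so quotes, backticks, and newlines in the query cannot break the script.
        # The function wrapper keeps repeated runs from redeclaring top-level constants.
        input_js = f'''
        (function() {{
            const editor = document.querySelector('div[contenteditable="true"]');
            if (!editor) return 'editor not found';
            editor.focus();
            document.execCommand('insertText', false, {json.dumps(query_text)});
            editor.dispatchEvent(new Event('input', {{bubbles: true}}));
            return 'success';
        }})()
        '''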

        await ws.send(json.dumps({
            "id": 1,
            "method": "Runtime.evaluate",
            "params": {"expression": input_js}
        }))
        response = await ws.recv()
        result = json.loads(response)
        print(f"Input result: {result.get('result', {}).get('result', {}).get('value', 'unknown')}")

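        # Step 2: Click send button
        # The aria-label below matches the Traditional Chinese UI ("傳送訊息" means
        # "send message"); change it if your Gemini interface uses another language.
        await asyncio.sleep(1)
        click_js = '''
        (function() {
            const btn = document.querySelector('button[aria-label="傳送訊息"]');
            if (!btn) return 'button not found';
            btn.click();
            return 'clicked';
        })()
        '''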

        await ws.send(json.dumps({
            "id": 2,
            "method": "Runtime.evaluate",
            "params": {"expression": click_js}
        }))
        response = await ws.recv()
        result = json.loads(response)
        print(f"Click result: {result.get('result', {}).get('result', {}).get('value', 'unknown')}")

        # Step 3: Wait for response
        print(f"Waiting {wait_seconds} seconds for Gemini to respond...")
        await asyncio.sleep(wait_seconds)

        # Step 4: Extract the response (text of the last .markdown element)
        extract_js = '''
        (function() {
            const markdownEls = document.querySelectorAll('.markdown');
            if (markdownEls.length === 0) return 'No response found';
            return markdownEls[markdownEls.length - 1].innerText;
        })()
        '''

        await ws.send(json.dumps({
            "id": 3,
            "method": "Runtime.evaluate",
            "params": {"expression": extract_js}
        }))
        response = await ws.recv()
        result = json.loads(response)
        content = result.get('result', {}).get('result', {}).get('value', 'No content')

        return content

# Main execution
if __name__ == "__main__":
    query = sys.argv[1] if len(sys.argv) > 1 else "Example question: What is a blockchain? Please answer in Traditional Chinese."
    result = asyncio.run(query_gemini(query, wait_seconds=30))
    print("\n" + "="*50)
    print("GEMINI RESPONSE:")
    print("="*50)
    print(result)

Step 5: Run the Query

python3 gemini_query.py "Example question: your query here"

Or inline for simple queries:

python3 << 'EOF'
import asyncio
import websockets
import json

async def send_to_gemini():
    # Get WebSocket URL
    import subprocess
    result = subprocess.run(["curl", "-s", "http://localhost:9222/json"], capture_output=True, text=True)
    pages = json.loads(result.stdout)
    ws_url = next(p["webSocketDebuggerUrl"] for p in pages if "gemini.google.com" in p.get("url", ""))

    async with websockets.connect(ws_url) as ws:
        # Input query
        await ws.send(json.dumps({
            "id": 1,
            "method": "Runtime.evaluate",
            "params": {"expression": '''
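                (function() {
                    // Wrapped in a function so repeated runs do not redeclare `editor`
                    const editor = document.querySelector('div[contenteditable="true"]');
                    editor.focus();
                    document.execCommand('insertText', false, 'Example question: analyze the future price trend of Bitcoin');
                    editor.dispatchEvent(new Event('input', {bubbles: true}));
                })()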
            '''}
        }))
        await ws.recv()

        # Click send (the aria-label matches the Traditional Chinese UI; adjust it for other languages)
        await asyncio.sleep(1)
        await ws.send(json.dumps({
            "id": 2,
            "method": "Runtime.evaluate",
            "params": {"expression": '''document.querySelector('button[aria-label="傳送訊息"]').click()'''}
        }))
        await ws.recv()

        # Wait and extract
        await asyncio.sleep(30)
        await ws.send(json.dumps({
            "id": 3,
            "method": "Runtime.evaluate",
            "params": {"expression": '''
                document.querySelectorAll('.markdown')[document.querySelectorAll('.markdown').length - 1].innerText
            '''}
        }))
        response = await ws.recv()
        print(json.loads(response)['result']['result']['value'])

asyncio.run(send_to_gemini())
EOF

Alternative Method: browser-use CLI

This method is simpler but does not use your existing Chrome login. You'll need to log in manually each time.

Prerequisites

# Create virtual environment
python3 -m venv .venv

# Install browser-use
./.venv/bin/pip install browser-use

Workflow

1) Open Gemini

./.venv/bin/browser-use --browser real open "https://gemini.google.com/"

2) Get Page State

./.venv/bin/browser-use --browser real state

Look for:
- The input textbox: contenteditable=true role=textbox
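- The send button: aria-label=傳送訊息 (the Traditional Chinese UI label, meaning "send message"; the label differs in other interface languages)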

3) Input Text via JavaScript eval

./.venv/bin/browser-use --browser real eval "const editor = document.querySelector('div[contenteditable=\"true\"]'); editor.focus(); document.execCommand('insertText', false, 'YOUR QUERY HERE'); editor.dispatchEvent(new Event('input', {bubbles: true}));"
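
Hand-writing the query inside a quoted JavaScript string breaks as soon as the query contains quotes or newlines. One way around this, sketched below, is to let `json.dumps` produce the string literal. `build_input_js` is a hypothetical helper; the selector and `execCommand` call are the ones used above.

```python
import json


def build_input_js(query_text):
    """Build a one-line JavaScript expression that types query_text into the editor."""
    # json.dumps returns a double-quoted, fully escaped literal that is also
    # valid JavaScript, with non-ASCII characters written as \uXXXX escapes.
    literal = json.dumps(query_text)
    return (
        "(() => {"
        "const editor = document.querySelector('div[contenteditable=\"true\"]');"
        "if (!editor) return 'editor not found';"
        "editor.focus();"
        f"document.execCommand('insertText', false, {literal});"
        "editor.dispatchEvent(new Event('input', {bubbles: true}));"
        "return 'success';"
        "})()"
    )


if __name__ == "__main__":
    print(build_input_js('Compare "proof of work" and `proof of stake`'))
```

The resulting string can be sent as the `expression` of a CDP `Runtime.evaluate` call; if it is passed on a command line instead, the shell's own quoting still has to be handled separately.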

4) Click Send Button

# Get current state to find button index
./.venv/bin/browser-use --browser real state

# Click the send button (replace INDEX with actual number)
./.venv/bin/browser-use --browser real click INDEX

5) Close Session

./.venv/bin/browser-use close

Troubleshooting

Chrome Remote Debugging Issues

Problem	Cause	Solution
`curl: (7) Failed to connect`	Chrome not running with debugging	Restart Chrome with `--remote-debugging-port=9222`
WebSocket connection refused	Page ID changed	Re-fetch `/json` to get new WebSocket URL
"editor not found"	Page not fully loaded	Wait a few seconds before running script
"button not found"	Send button not visible	Check if text was actually input first
Login page instead of app	Cloned profile missing or out of date	Re-clone from `"$HOME/Library/Application Support/Google/Chrome"` into `/tmp/chrome-gemini-profile` and relaunch
`DevTools remote debugging requires a non-default data directory`	Chrome disallows default profile for CDP	Launch with a cloned profile: `/tmp/chrome-gemini-profile`
`curl` shows connection refused even though Chrome is running	CDP not listening due to profile path	Ensure `--user-data-dir` is not default and the port is free
`No Gemini page found via CDP`	Gemini not loaded or not logged in	Open `https://gemini.google.com/` in the launched Chrome and wait for `/app`

browser-use Issues

Problem	Cause	Solution
Not logged in	browser-use creates isolated session	Use Chrome Remote Debugging method instead
`Unknown key: "請"` error	CLI doesn't support Unicode	Use `eval` with JavaScript `execCommand`
Click doesn't work	Element index changed	Re-run `state` before each click

Best Practices

Always use Chrome Remote Debugging for queries requiring authentication
Wait 30+ seconds for complex queries (Gemini's "Deep Think" mode takes longer)
Check for .markdown elements to confirm a response has been rendered before extracting it
Use inline Python for one-off queries; use the full script for automation
Close Chrome debugging session when done to avoid port conflicts
Keep profile cloned in /tmp/chrome-gemini-profile to avoid CDP blocking the default profile
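
A fixed 30 or 60 second sleep either wastes time or cuts off long answers. One alternative, sketched here as an assumption rather than as part of the skill, is to poll the last `.markdown` element and stop once its text has stopped changing. `read_text` is a hypothetical coroutine that would run the extraction JavaScript through `Runtime.evaluate` and return the text.

```python
import asyncio


async def wait_until_stable(read_text, interval=2.0, stable_reads=3, timeout=120.0):
    """Poll read_text() until it returns the same non-empty text stable_reads times in a row.

    Returns the last text seen if the timeout expires first.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    last, streak = None, 0
    while loop.time() < deadline:
        text = await read_text()
        if text and text == last:
            streak += 1
            if streak >= stable_reads:
                return text
        else:
            # Text changed (or is still empty): restart the streak.
            last, streak = text, (1 if text else 0)
        await asyncio.sleep(interval)
    return last
```

Unchanged text is only a proxy for "finished"; a response that pauses mid-stream for longer than `interval * stable_reads` seconds would still be cut short.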

Complete Example: Crypto Price Analysis

Complete workflow

# Step 1: Prepare a copy of the Chrome profile (works around the CDP default-directory restriction)
rm -rf /tmp/chrome-gemini-profile
rsync -a "$HOME/Library/Application Support/Google/Chrome/" /tmp/chrome-gemini-profile/

# Step 2: Launch Chrome in remote debugging mode
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
  --remote-debugging-port=9222 \
  --user-data-dir="/tmp/chrome-gemini-profile" \
  "https://gemini.google.com/" > /dev/null 2>&1 &

# Step 3: Wait for the page to load and verify the connection
sleep 8
curl -s http://localhost:9222/json | python3 -c "import sys, json; pages = json.load(sys.stdin); gemini = [p for p in pages if p.get('type') == 'page' and 'gemini.google.com' in p.get('url', '')]; print(f\"Found Gemini page: {gemini[0]['url'] if gemini else 'not found'}\")"

Method 1: Full query script (query_gemini.py)

Save the following as query_gemini.py:

import asyncio
import websockets
import json
import subprocess
import sys

async def query_gemini(query_text, wait_seconds=60):
    # Get the Gemini page WebSocket URL
    result = subprocess.run(
        ["curl", "-s", "http://localhost:9222/json"],
        capture_output=True, text=True
    )
    pages = json.loads(result.stdout)

    # Find Gemini page
    gemini_page = None
    for page in pages:
        if page.get("type") == "page" and "gemini.google.com" in page.get("url", ""):
            gemini_page = page
            break

    if not gemini_page:
        print("Error: Gemini page not found. Make sure Chrome is open with Gemini.")
        return None

    ws_url = gemini_page["webSocketDebuggerUrl"]
    print(f"Connecting to: {ws_url}")

    async with websockets.connect(ws_url) as ws:
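        # Step 1: Input the query
        # json.dumps renders the query as an escaped JavaScript string literal,
        # so quotes, backticks, and newlines in the query cannot break the script.
        # The function wrapper keeps repeated runs from redeclaring top-level constants.
        input_js = f'''
        (function() {{
            const editor = document.querySelector('div[contenteditable="true"]');
            if (!editor) return 'editor not found';
            editor.focus();
            document.execCommand('insertText', false, {json.dumps(query_text)});
            editor.dispatchEvent(new Event('input', {{bubbles: true}}));
            return 'success';
        }})()
        '''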

        await ws.send(json.dumps({
            "id": 1,
            "method": "Runtime.evaluate",
            "params": {"expression": input_js}
        }))
        response = await ws.recv()
        result = json.loads(response)
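        print(f"Input result: {result.get('result', {}).get('result', {}).get('value', 'unknown')}")

        # Step 2: Click send button
        # The aria-label below matches the Traditional Chinese UI ("傳送訊息" means
        # "send message"); change it if your Gemini interface uses another language.
        await asyncio.sleep(1)
        click_js = '''
        (function() {
            const btn = document.querySelector('button[aria-label="傳送訊息"]');
            if (!btn) return 'button not found';
            btn.click();
            return 'clicked';
        })()
        '''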

        await ws.send(json.dumps({
            "id": 2,
            "method": "Runtime.evaluate",
            "params": {"expression": click_js}
        }))
        response = await ws.recv()
        result = json.loads(response)
        print(f"Click result: {result.get('result', {}).get('result', {}).get('value', 'unknown')}")

        # Step 3: Wait for response
        print(f"Waiting {wait_seconds} seconds for Gemini to respond...")
        await asyncio.sleep(wait_seconds)

        # Step 4: Extract the response - try to get complete content
        extract_js = '''
        (function() {
            const markdownEls = document.querySelectorAll('.markdown');
            if (markdownEls.length === 0) return 'No response found';
            const lastMarkdown = markdownEls[markdownEls.length - 1];
            // Get all text content including nested elements
            return lastMarkdown.innerText || lastMarkdown.textContent || 'Empty response';
        })()
        '''

        await ws.send(json.dumps({
            "id": 3,
            "method": "Runtime.evaluate",
            "params": {"expression": extract_js}
        }))
        response = await ws.recv()
        result = json.loads(response)
        content = result.get('result', {}).get('result', {}).get('value', 'No content')

        return content

# Main execution
if __name__ == "__main__":
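    query = """Example question: Please analyze in detail the predicted price trends of BTC and ETH.
Include relevant technical indicators, and answer in Traditional Chinese."""

    result = asyncio.run(query_gemini(query, wait_seconds=60))
    print("\n" + "="*50)
    print("GEMINI RESPONSE:")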
    print("="*50)
    print(result)

Run it:

python3 query_gemini.py

Method 2: Retrieve an existing response (get_gemini_response.py)

If the Gemini page already contains a response, use this script to extract it directly:

import asyncio
import websockets
import json
import subprocess

async def get_all_gemini_content():
    # Get the Gemini page WebSocket URL
    result = subprocess.run(
        ["curl", "-s", "http://localhost:9222/json"],
        capture_output=True, text=True
    )
    pages = json.loads(result.stdout)

    # Find Gemini page
    gemini_page = None
    for page in pages:
        if page.get("type") == "page" and "gemini.google.com" in page.get("url", ""):
            gemini_page = page
            break

    if not gemini_page:
        print("Error: Gemini page not found.")
        return None

    ws_url = gemini_page["webSocketDebuggerUrl"]
    print(f"Connecting to: {ws_url}\n")

    async with websockets.connect(ws_url) as ws:
        # Extract all markdown content from the page
        extract_js = '''
        (function() {
            const markdownEls = document.querySelectorAll('.markdown');
            console.log('Found markdown elements:', markdownEls.length);

            if(markdownEls.length === 0) {
                return 'No markdown elements found';
            }

            // Get the last two markdown elements (user query and AI response)
            const responses = [];
            const startIdx = Math.max(0, markdownEls.length - 2);

            for(let i = startIdx; i < markdownEls.length; i++) {
                const text = markdownEls[i].innerText || markdownEls[i].textContent || '';
                if(text.trim()) {
                    responses.push(`[Response ${i+1}]:\\n${text}`);
                }
            }

            return responses.join('\\n\\n' + '='.repeat(80) + '\\n\\n');
        })()
        '''

        await ws.send(json.dumps({
            "id": 1,
            "method": "Runtime.evaluate",
            "params": {"expression": extract_js, "returnByValue": True}
        }))
        response = await ws.recv()
        result = json.loads(response)
        content = result.get('result', {}).get('result', {}).get('value', 'No content')

        return content

# Main execution
if __name__ == "__main__":
    result = asyncio.run(get_all_gemini_content())
    print("="*80)
    print("GEMINI CONVERSATION:")
    print("="*80)
    print(result)

Run it:

python3 get_gemini_response.py

Practical usage example

# Full workflow
rm -rf /tmp/chrome-gemini-profile && \
rsync -a "$HOME/Library/Application Support/Google/Chrome/" /tmp/chrome-gemini-profile/ && \
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
  --remote-debugging-port=9222 \
  --user-data-dir="/tmp/chrome-gemini-profile" \
  "https://gemini.google.com/" > /dev/null 2>&1 &

# Wait, then run the query
sleep 8 && python3 query_gemini.py

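Cleanup

After finishing your queries, clean up temporary files and resources:

# 1. Close the Chrome debugging session (note: this kills every running Chrome process)
pkill -9 "Google Chrome"

# 2. Remove the temporary profile (optional, frees disk space)
rm -rf /tmp/chrome-gemini-profile

# 3. Remove temporary scripts and output files created during testing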
rm -f query_gemini.py get_gemini_response.py get_all_gemini_content.py
rm -f gemini_response.txt gemini_full_response.txt

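Best practices:

Close Chrome after each use - this frees port 9222
Clean up the temporary profile regularly - /tmp/chrome-gemini-profile can take up several hundred MB
Keep the working directory tidy - delete test scripts and fold frequently used ones into your project
Use the full script - save query_gemini.py above as a project file instead of recreating it each time

Notes

Wait time - for complex queries (such as in-depth analysis), use wait_seconds=60 or longer
Response truncation - if the response is long, you may need to extract it several times or use the get_gemini_response.py script (Method 2)
Login state - make sure the Chrome profile is signed in to a Google account
Network stability - Gemini needs a stable network connection to respond (the CDP connection itself is local)
Concurrency - avoid running more than one Chrome debugging session on the same port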

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents:

⚡ Amp 🚀 Antigravity 🤖 Claude Code 🦀 Clawdbot 📝 Codex ▶️ Cursor 🤖 Droid 💎 Gemini CLI 🐙 GitHub Copilot 🪿 Goose 📊 Kilo Code 🔧 Kiro CLI 💻 OpenCode 🦘 Roo Code 🌲 Trae 🏄 Windsurf

