```bash
npx skills add kesslerio/phone-agent-moltbot-skill
```

Or install the skill directly from its repository:

```bash
npx add-skill https://github.com/kesslerio/phone-agent-moltbot-skill
```
# Description
Run a real-time AI phone agent using Twilio, Deepgram, and ElevenLabs. Handles incoming calls, transcribes audio, generates responses via LLM, and speaks back via streaming TTS. Use when user wants to: (1) Test voice AI capabilities, (2) Handle phone calls programmatically, (3) Build a conversational voice bot.
# SKILL.md

```yaml
name: phone-agent
description: "Run a real-time AI phone agent using Twilio, Deepgram, and ElevenLabs. Handles incoming calls, transcribes audio, generates responses via LLM, and speaks back via streaming TTS. Use when user wants to: (1) Test voice AI capabilities, (2) Handle phone calls programmatically, (3) Build a conversational voice bot."
```
## Phone Agent Skill

Runs a local FastAPI server that acts as a real-time voice bridge.

### Architecture

```
Twilio (Phone) <--> WebSocket (Audio) <--> [Local Server] <--> Deepgram (STT)
                                                |
                                                +--> OpenAI (LLM)
                                                +--> ElevenLabs (TTS)
```
### Prerequisites

- Twilio Account: Phone number + TwiML App.
- Deepgram API Key: For fast speech-to-text.
- OpenAI API Key: For the conversation logic.
- ElevenLabs API Key: For realistic text-to-speech.
- Ngrok (or similar): To expose your local port 8080 to Twilio.
### Setup

1. Install dependencies:

   ```bash
   pip install -r scripts/requirements.txt
   ```

2. Set environment variables (in `~/.moltbot/.env`, `~/.clawdbot/.env`, or via `export`):

   ```bash
   export DEEPGRAM_API_KEY="your_key"
   export OPENAI_API_KEY="your_key"
   export ELEVENLABS_API_KEY="your_key"
   export TWILIO_ACCOUNT_SID="your_sid"
   export TWILIO_AUTH_TOKEN="your_token"
   export PORT=8080
   ```

   Optional system prompt customization (priority: file > env var > built-in):

   ```bash
   # Option 1: Load from file
   export SYSTEM_PROMPT_FILE="/path/to/custom-prompt.txt"

   # Option 2: Set directly via env var
   export SYSTEM_PROMPT="You are a helpful phone assistant. Be concise and friendly."

   # Option 3: Use built-in defaults with name customization
   export AGENT_NAME="Niemand"
   export OWNER_NAME="Martin's"
   ```

3. Start the server:

   ```bash
   python3 scripts/server.py
   ```

4. Expose it to the internet:

   ```bash
   ngrok http 8080
   ```

5. Configure Twilio:
   - Go to your Phone Number settings.
   - Set "Voice & Fax" -> "A Call Comes In" to Webhook.
   - URL: `https://<your-ngrok-url>.ngrok.io/incoming`
   - Method: `POST`
### Usage

Call your Twilio number. The agent should answer, transcribe your speech, think, and reply in a natural voice.

### Customization

- System Prompt: Configure via `SYSTEM_PROMPT_FILE` (load from file), `SYSTEM_PROMPT` (env var), or modify the built-in defaults with `AGENT_NAME` and `OWNER_NAME`.
- Voice: Change `ELEVENLABS_VOICE_ID` to use different voices.
- Model: Switch `gpt-4o-mini` to `gpt-4` for smarter (but slower) responses.
- Language: Set `AGENT_LANGUAGE` to `en` or `de` for English or German.
# README.md
## Phone Agent Moltbot Skill

A real-time AI voice agent that handles incoming phone calls using Twilio, transcribes speech with Deepgram, generates responses via OpenAI, and speaks back with ElevenLabs text-to-speech.

### Features

- Real-time Voice Processing: Handles incoming Twilio calls with low-latency WebSocket audio
- Automatic Speech Recognition: Deepgram for fast, accurate transcription
- AI-Powered Responses: OpenAI GPT for intelligent conversation
- Natural Speech Output: ElevenLabs for realistic, streaming TTS
- Task-Based Automation: Configurable task definitions for specific agent behaviors
- Recording & Logging: Automatic call recording and conversation logs
### Architecture

```
Incoming Call (Twilio Phone)
        |
        v
Twilio WebSocket (Audio Stream)
        |
        +---> Local FastAPI Server
        |         |
        |         +---> Deepgram (Speech-to-Text)
        |         |
        |         +---> OpenAI (LLM/Intelligence)
        |         |
        |         +---> ElevenLabs (Text-to-Speech)
        |         |
        +---------- (Audio Response)
        |
        v
Phone Speaker Output
```
### Prerequisites

Before you begin, ensure you have:

- Twilio Account
  - Active Twilio account with a phone number
  - TwiML App configured
  - Account SID and Auth Token
- API Keys (free tier available for all)
  - Deepgram API Key (https://console.deepgram.com/)
  - OpenAI API Key (https://platform.openai.com/api-keys)
  - ElevenLabs API Key (https://elevenlabs.io/)
- Local Network Access
  - Ngrok or similar tool to expose localhost to the internet
  - Ability to accept incoming webhooks from Twilio
- Python 3.9+ and pip
### Installation

```bash
# Clone the repository
git clone https://github.com/kesslerio/phone-agent-moltbot-skill.git
cd phone-agent-moltbot-skill

# Install dependencies
pip install -r scripts/requirements.txt
```
### Configuration

#### Set Environment Variables

Create a `.env` file or set environment variables:

```bash
# API Keys (required)
export DEEPGRAM_API_KEY="your-deepgram-key"
export OPENAI_API_KEY="your-openai-key"
export ELEVENLABS_API_KEY="your-elevenlabs-key"

# Twilio (required)
export TWILIO_ACCOUNT_SID="your-account-sid"
export TWILIO_AUTH_TOKEN="your-auth-token"
export TWILIO_PHONE_NUMBER="+18665515246"  # Your Twilio number

# Server (optional)
export PORT=8080
export PUBLIC_URL="https://your-ngrok-url.ngrok.io"  # For webhooks

# Voice Customization (optional)
export ELEVENLABS_VOICE_ID="onwK4e9ZLuTAKqWW03F9"  # Daniel voice

# System Prompt Configuration (optional)
export SYSTEM_PROMPT_FILE="/path/to/custom-prompt.txt"  # Load prompt from file
export SYSTEM_PROMPT_FILE_REQUIRED="true"  # Exit if file missing (default: false)
export SYSTEM_PROMPT="Custom prompt text here"  # Override built-in prompt
```
Template Variables: When using `SYSTEM_PROMPT_FILE`, you can include these placeholders:

- `{agent_name}` - Replaced with `AGENT_NAME` env var (default: "Assistant")
- `{owner_name}` - Replaced with `OWNER_NAME` env var (default: "your")
- `{language}` - Replaced with `AGENT_LANGUAGE` env var (default: "en")

Example custom prompt file:

```
You are {agent_name}, {owner_name} personal assistant.
Speak in {language} with precision and clarity.
```
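The precedence rules above (file > env var > built-in, then placeholder substitution) can be sketched in Python. This illustrates the documented behavior only; `resolve_system_prompt` is a hypothetical helper name, not the actual `server.py` implementation:

```python
import os

# Built-in fallback used when neither SYSTEM_PROMPT_FILE nor SYSTEM_PROMPT is set.
DEFAULT_PROMPT = "You are {agent_name}, {owner_name} personal assistant. Speak in {language}."

def resolve_system_prompt(env=os.environ) -> str:
    """Apply the documented priority: file > env var > built-in default,
    then fill the {agent_name}/{owner_name}/{language} placeholders."""
    template = env.get("SYSTEM_PROMPT") or DEFAULT_PROMPT
    path = env.get("SYSTEM_PROMPT_FILE")
    if path:
        try:
            with open(path) as f:
                template = f.read()
        except OSError:
            # Documented: only fatal when SYSTEM_PROMPT_FILE_REQUIRED=true
            if env.get("SYSTEM_PROMPT_FILE_REQUIRED", "false").lower() == "true":
                raise
    return template.format(
        agent_name=env.get("AGENT_NAME", "Assistant"),
        owner_name=env.get("OWNER_NAME", "your"),
        language=env.get("AGENT_LANGUAGE", "en"),
    )

print(resolve_system_prompt({"SYSTEM_PROMPT": "Hi, I am {agent_name}."}))  # Hi, I am Assistant.
```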
Or add to `~/.moltbot/.env` or `~/.clawdbot/.env`:

```
DEEPGRAM_API_KEY=your-key
OPENAI_API_KEY=your-key
ELEVENLABS_API_KEY=your-key
TWILIO_ACCOUNT_SID=your-sid
TWILIO_AUTH_TOKEN=your-token
TWILIO_PHONE_NUMBER=+1...
```
### Startup & Configuration

#### 1. Start the Local Server

```bash
python3 scripts/server.py
```

The server will start on http://localhost:8080 by default.

#### 2. Expose to the Internet with Ngrok

In another terminal:

```bash
ngrok http 8080
```

Note the HTTPS URL (e.g., https://abc123.ngrok.io).
#### 3. Configure the Twilio Webhook

In the Twilio Console:

- Go to Phone Numbers → Your number
- Under Voice & Fax:
  - Set "A Call Comes In" to Webhook
  - URL: `https://<your-ngrok-url>.ngrok.io/incoming`
  - Method: `POST`
- Save
#### 4. Test Incoming Calls

Call your Twilio number. The agent will:

1. Answer and greet you
2. Listen to your speech
3. Transcribe your words
4. Generate a response via OpenAI
5. Speak the response back to you
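Steps 2–5 form one conversational turn. A testable sketch of that loop, with the OpenAI and ElevenLabs calls injected as plain callables (the names here are illustrative, not the actual `server.py` API):

```python
from typing import Callable

def run_turn(
    transcript: str,
    history: list[dict],
    llm: Callable[[list[dict]], str],
    tts: Callable[[str], bytes],
) -> bytes:
    """One conversational turn: append the caller's transcript,
    ask the LLM for a reply, synthesize audio, and keep the
    chat history in sync. `llm` and `tts` are injected so the
    flow can be exercised offline."""
    history.append({"role": "user", "content": transcript})
    reply = llm(history)
    history.append({"role": "assistant", "content": reply})
    return tts(reply)

# Offline demo with stand-ins for OpenAI and ElevenLabs:
history = [{"role": "system", "content": "You are a phone assistant."}]
audio = run_turn(
    "What are your hours?",
    history,
    llm=lambda msgs: "We are open 9 to 5.",
    tts=lambda text: text.encode(),  # real code would stream ElevenLabs audio
)
print(audio)  # b'We are open 9 to 5.'
```

In the real server, `llm` would wrap `client.chat.completions.create(...)` and `tts` a streaming ElevenLabs request, but the turn-keeping logic stays the same.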
### Customization

#### Change Agent Persona

Edit `SYSTEM_PROMPT` in `scripts/server.py`:

```python
SYSTEM_PROMPT = """You are a helpful customer service agent. Be friendly, concise, and professional."""
```

#### Change Voice

Set a different ElevenLabs voice ID:

```bash
export ELEVENLABS_VOICE_ID="g1r0eKKcGkk7Ep0RVcVn"  # Callum voice
```

Available ElevenLabs voices: https://elevenlabs.io/docs/getting-started/voices
#### Use a Different Model

Edit `scripts/server.py` and change the OpenAI model:

```python
response = await client.chat.completions.create(
    model="gpt-4",  # or "gpt-4-turbo" for faster responses
    messages=messages,
)
```
#### Task-Based Behaviors

Create YAML task definitions in the `tasks/` directory:

```yaml
name: book_restaurant
description: "Help the user book a restaurant reservation"
system_prompt: "You are a friendly restaurant reservation assistant..."
actions:
  - confirm_date
  - confirm_time
  - confirm_party_size
  - book_reservation
```
#### Integration with Moltbot

Add this skill to your Moltbot configuration:

```json
{
  "skills": [
    {
      "name": "phone-agent",
      "path": "/path/to/phone-agent-moltbot-skill",
      "enabled": true
    }
  ]
}
```

Then reference it in workflows:

- "Set up an incoming voice agent"
- "Configure a customer service chatbot"
- "Test voice AI capabilities"
### Project Structure

```
phone-agent-moltbot-skill/
├── scripts/
│   ├── server.py             # Main FastAPI server
│   ├── server_realtime.py    # Realtime processing variant
│   ├── requirements.txt      # Python dependencies
│   └── typing_sound.raw      # Typing sound effect
├── tasks/
│   ├── book_restaurant.yaml  # Example task definition
│   └── get_quote.yaml        # Example task definition
├── calls/                    # Recording storage directory
├── references/               # Supporting documentation
├── SKILL.md                  # Moltbot skill manifest
├── README.md                 # This file
└── LICENSE                   # MIT License
```
### Troubleshooting

#### Server Won't Start

- Check the Python version: `python3 --version` (requires 3.9+)
- Install dependencies: `pip install -r scripts/requirements.txt`
- Check the PORT variable: `echo $PORT` (should be 8080 or your set value)

#### Twilio Webhook Not Connecting

- Verify ngrok is running and the URL matches your Twilio webhook
- Check the server logs: `python3 scripts/server.py` (should show incoming requests)
- Test the ngrok tunnel: `curl https://<your-ngrok-url>.ngrok.io/health`
#### Poor Transcription Quality

- Ensure `DEEPGRAM_API_KEY` is valid
- Check microphone/audio quality on the calling phone
- Deepgram is generally very accurate; persistently poor results usually indicate an audio issue
#### Slow Responses

- OpenAI API latency varies; `gpt-4o-mini` is fast and cheap
- Switch to `gpt-3.5-turbo` for faster responses (less capable)
- Increase the timeout in the WebSocket settings if needed

#### Voice Not Speaking

- Verify `ELEVENLABS_API_KEY` is valid
- Check that the voice ID is correct: https://elevenlabs.io/docs/api-reference/voices
- Confirm audio is not muted on the receiving phone
### API Reference

#### Incoming Call Webhook

```
POST /incoming
```

Twilio sends call information to this endpoint. The server responds with TwiML to establish the WebSocket connection.
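The TwiML itself isn't reproduced above. A minimal response that points Twilio's Media Stream at the documented `/ws` route might look like the following; this is a sketch using Twilio's `<Connect><Stream>` verb, not code from `server.py`:

```python
from xml.sax.saxutils import quoteattr

def incoming_call_twiml(ws_url: str) -> str:
    """Build the TwiML reply for POST /incoming: tells Twilio to open
    a bidirectional Media Stream to our WebSocket endpoint."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<Response><Connect>"
        "<Stream url=" + quoteattr(ws_url) + " />"
        "</Connect></Response>"
    )

print(incoming_call_twiml("wss://abc123.ngrok.io/ws"))
```

In practice the `wss://` host would come from `PUBLIC_URL` so the TwiML always points at the current ngrok tunnel.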
#### WebSocket Audio Stream

```
WS /ws
```

Bidirectional audio stream for incoming call processing.

#### Health Check

```
GET /health
```

Returns `{"status": "ok"}` if the server is running.
### Performance & Scaling

The current implementation handles:

- A single concurrent call per server instance
- ~100ms RTT for transcription + LLM + TTS
- Suitable for demos/testing, hobby projects, and low-volume use

For production:

- Run multiple server instances behind a load balancer
- Use Twilio's call queuing
- Implement connection pooling for API clients
- Consider dedicated hardware for Deepgram/ElevenLabs processing
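The single-concurrent-call limit mentioned above can be enforced with a small guard. A sketch (illustrative, not the shipped implementation) that rejects a second caller while one call is live:

```python
class SingleCallGuard:
    """Tracks whether a call is in progress so a second concurrent
    caller can be rejected (e.g. with a busy TwiML response)."""

    def __init__(self) -> None:
        self._busy = False

    def try_acquire(self) -> bool:
        """Claim the line; returns False if a call is already active."""
        if self._busy:
            return False
        self._busy = True
        return True

    def release(self) -> None:
        """Free the line when the call ends (hangup or error)."""
        self._busy = False

guard = SingleCallGuard()
print(guard.try_acquire())  # True  -- first caller gets the line
print(guard.try_acquire())  # False -- second caller is rejected
guard.release()
print(guard.try_acquire())  # True  -- the line is free again
```

A load-balanced deployment would replace this per-process flag with shared state (or Twilio's own call queuing) so capacity scales with the number of instances.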
### Deployment Options

#### Local Development

```bash
python3 scripts/server.py
ngrok http 8080
```

#### Docker

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY scripts/requirements.txt .
RUN pip install -r requirements.txt
COPY scripts/ .
CMD ["python3", "server.py"]
```

Build and run:

```bash
docker build -t phone-agent .
docker run -p 8080:8080 \
  -e DEEPGRAM_API_KEY="..." \
  -e OPENAI_API_KEY="..." \
  -e ELEVENLABS_API_KEY="..." \
  -e TWILIO_ACCOUNT_SID="..." \
  -e TWILIO_AUTH_TOKEN="..." \
  phone-agent
```
#### Cloud Deployment

- Heroku: Add a `Procfile` → `web: python3 scripts/server.py`
- Railway.app: Auto-detects Python and builds
- AWS Lambda: Use WebSocket API Gateway + Lambda
- Google Cloud Run: Containerize and deploy
### License

MIT

### Contributing

Contributions welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Test thoroughly
4. Submit a pull request

### Support

- MCP Server: Deepgram | OpenAI | ElevenLabs
- Twilio Docs: Voice API
- Moltbot: Documentation
Requirements
ffmpegmust be in PATH (for converting ElevenLabs MP3 to Twilio mu-law audio)- If running as a systemd service, ensure PATH includes ffmpeg location:
ini Environment=PATH=/home/art/.nix-profile/bin:/usr/bin:/bin
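For reference, this requirement implies an ffmpeg invocation of roughly this shape: converting an ElevenLabs MP3 into the 8 kHz mono mu-law stream Twilio Media Streams expects. The helper name is illustrative, and the exact flags in `server.py` may differ:

```python
def mp3_to_mulaw_cmd() -> list[str]:
    """ffmpeg argument list: MP3 in on stdin, raw 8 kHz mono mu-law out on stdout."""
    return [
        "ffmpeg",
        "-i", "pipe:0",  # read the MP3 from stdin
        "-ar", "8000",   # resample to 8 kHz (what Twilio expects)
        "-ac", "1",      # downmix to mono
        "-f", "mulaw",   # raw mu-law output format
        "pipe:1",        # write to stdout
    ]

# Actual use (requires ffmpeg on PATH):
#   result = subprocess.run(mp3_to_mulaw_cmd(), input=mp3_bytes,
#                           capture_output=True, check=True)
#   mulaw_audio = result.stdout
```

If ffmpeg is missing from PATH (the systemd case above), `subprocess.run` raises `FileNotFoundError`, which is the symptom to check for first.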
# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents.

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.