Local speech-to-text with the Whisper CLI (no API key).
Generate and transcribe speech using Google's Gemini-TTS and Chirp 3 models. Supports Text-to-Speech (Single/Multi-speaker), Instant Custom Voice, and Speech-to-Text (Transcription/Diarization).
Expert in building voice AI applications, from real-time voice agents to voice-enabled apps. Covers the OpenAI Realtime API, Vapi for voice agents, Deepgram for transcription, ElevenLabs for...
Alibaba Cloud Text-to-Speech synthesis service.
ElevenLabs text-to-speech with mac-style say UX.
Read news from images using OCR and broadcast it via TTS. Use when the user sends news screenshots or images containing text that needs to be read aloud. Requires: tesseract (OCR) and edge-tts (TTS).
Stream high-fidelity music on TIDAL, manage playlists, and access exclusive content.
Expert in aggregating, processing, and synthesizing information from multiple sources into coherent insights. Use when building knowledge graphs, ontologies, RAG systems, or extracting insights...
Record high-quality podcasts and interviews with Riverside: manage recordings, transcripts, and exports.
Best practices for scikit-learn machine learning, model development, evaluation, and deployment in Python.
Master of voice-first interfaces, specializing in sub-300 ms latency, spatial hearing AI, and multimodal voice-haptic feedback.
Expert in AR/VR, WebXR, and spatial computing for Vision Pro and the web.
Senior Architect for @google/genai v1.35.0+. Specialist in Structured Intelligence, Context Caching, and Agentic Orchestration in 2026.
Develop examples for AI SDK functions. Use when creating, running, or modifying examples under examples/ai-functions/src to validate provider support, demonstrate features, or create test fixtures.