# llm-streaming-response-handler

by erichowens
# Install this skill

```bash
npx skills add erichowens/some_claude_skills --skill "llm-streaming-response-handler"
```

Installs a specific skill from a multi-skill repository.

# Description

Build production LLM streaming UIs with Server-Sent Events, real-time token display, cancellation, error recovery. Handles OpenAI/Anthropic/Claude streaming APIs. Use for chatbots, AI assistants, real-time text generation. Activate on "LLM streaming", "SSE", "token stream", "chat UI", "real-time AI". NOT for batch processing, non-streaming APIs, or WebSocket bidirectional chat.

# SKILL.md


```yaml
---
name: llm-streaming-response-handler
description: Build production LLM streaming UIs with Server-Sent Events, real-time token display, cancellation, error recovery. Handles OpenAI/Anthropic/Claude streaming APIs. Use for chatbots, AI assistants, real-time text generation. Activate on "LLM streaming", "SSE", "token stream", "chat UI", "real-time AI". NOT for batch processing, non-streaming APIs, or WebSocket bidirectional chat.
allowed-tools: Read,Write,Edit,Bash(npm:*)
---
```


## LLM Streaming Response Handler

Expert in building production-grade streaming interfaces for LLM responses that feel instant and responsive.

### When to Use

✅ Use for:
- Chat interfaces with typing animation
- Real-time AI assistants
- Code generation with live preview
- Document summarization with progressive display
- Any UI where users expect immediate feedback from LLMs

❌ NOT for:
- Batch document processing (no user watching)
- APIs that don't support streaming
- WebSocket-based bidirectional chat (use Socket.IO)
- Simple request/response (fetch is fine)

### Quick Decision Tree

```text
Does your LLM interaction:
├── Need immediate visual feedback? → Streaming
├── Display long-form content (>100 words)? → Streaming
├── User expects typewriter effect? → Streaming
├── Short response (<50 words)? → Regular fetch
└── Background processing? → Regular fetch
```

### Technology Selection

Why SSE over WebSockets for LLM streaming:
- Simplicity: HTTP-based, works with existing infrastructure
- Auto-reconnect: the browser's `EventSource` reconnects automatically
- Firewall-friendly: passes through proxies more easily than WebSockets
- One-way fits: LLM streaming only flows server → client
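
The auto-reconnect and simplicity claims come from the browser's native `EventSource` API. A minimal sketch, assuming a hypothetical GET endpoint `/api/stream` that emits SSE; note that `EventSource` only supports GET, which is why the fetch-based patterns later in this skill are used for POST chat endpoints:

```typescript
// Minimal sketch, assuming a hypothetical GET SSE endpoint /api/stream.
// EventSource reconnects automatically if the connection drops.
const source = new EventSource('/api/stream?prompt=' + encodeURIComponent('Hello'));

source.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.done) {
    source.close(); // Without close(), the browser would reconnect and re-request
  } else {
    console.log(data.content);
  }
};

source.onerror = () => {
  // The browser retries automatically; call source.close() here to opt out
};
```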

Timeline:
- 2015-2020: WebSockets for everything
- 2020: SSE adoption for streaming APIs
- 2023+: SSE standard for LLM streaming (OpenAI, Anthropic)
- 2024: Vercel AI SDK popularizes SSE patterns

### Streaming APIs

| Provider | Streaming Method | Response Format |
|---|---|---|
| OpenAI | SSE | `data: {"choices":[{"delta":{"content":"token"}}]}` |
| Anthropic (Claude) | SSE | `data: {"type":"content_block_delta","delta":{"text":"token"}}` |
| Vercel AI SDK | SSE | Normalized across providers |
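
Since the delta shapes differ per provider, client code that talks to more than one API usually normalizes them. A hedged sketch based only on the formats in the table above (real payloads carry additional fields and event types):

```typescript
// Illustrative only: pull the text delta out of either provider's SSE payload.
function extractToken(payload: any): string | null {
  // OpenAI shape: {"choices":[{"delta":{"content":"token"}}]}
  const openaiDelta = payload.choices?.[0]?.delta?.content;
  if (typeof openaiDelta === 'string') return openaiDelta;

  // Anthropic shape: {"type":"content_block_delta","delta":{"text":"token"}}
  if (payload.type === 'content_block_delta' && typeof payload.delta?.text === 'string') {
    return payload.delta.text;
  }

  return null; // Non-text events (message_start, usage, etc.) are ignored
}
```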

### Common Anti-Patterns

#### Anti-Pattern 1: Buffering Before Display

Novice thinking: "Collect all tokens, then show complete response"

Problem: Defeats the entire purpose of streaming.

Wrong approach:

```typescript
// ❌ Waits for the entire response before showing anything
const response = await fetch('/api/chat', { method: 'POST', body: prompt });
const fullText = await response.text();
setMessage(fullText); // User sees nothing until generation is done
```

Correct approach:

```typescript
// ✅ Display tokens as they arrive
const response = await fetch('/api/chat', {
  method: 'POST',
  body: JSON.stringify({ prompt })
});

const reader = response.body!.getReader();
const decoder = new TextDecoder();
let buffer = ''; // Holds a partial line when an SSE message is split across chunks

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // stream: true keeps multi-byte characters intact across chunk boundaries
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop() ?? ''; // Keep the trailing partial line for the next read

  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      setMessage(prev => prev + data.content); // Update immediately
    }
  }
}
```

Timeline:
- Pre-2023: Many apps buffered entire response
- 2023+: Token-by-token display expected


#### Anti-Pattern 2: No Stream Cancellation

Problem: User can't stop generation, wasting tokens and money.

Symptom: "Stop" button doesn't work or doesn't exist.

Correct approach:

```tsx
// ✅ AbortController for cancellation
const [abortController, setAbortController] = useState<AbortController | null>(null);

const streamResponse = async () => {
  const controller = new AbortController();
  setAbortController(controller);

  try {
    const response = await fetch('/api/chat', {
      signal: controller.signal,
      method: 'POST',
      body: JSON.stringify({ prompt })
    });

    // Stream handling...
  } catch (error) {
    if ((error as Error).name === 'AbortError') {
      console.log('Stream cancelled by user');
    }
  } finally {
    setAbortController(null);
  }
};

const cancelStream = () => {
  abortController?.abort();
};

return (
  <button onClick={cancelStream} disabled={!abortController}>
    Stop Generating
  </button>
);
```

#### Anti-Pattern 3: No Error Recovery

Problem: Stream fails mid-response, user sees partial text with no indication of failure.

Correct approach:

```tsx
// ✅ Error states and recovery
const [streamState, setStreamState] = useState<'idle' | 'streaming' | 'error' | 'complete'>('idle');
const [errorMessage, setErrorMessage] = useState<string | null>(null);

try {
  setStreamState('streaming');

  // Streaming logic...

  setStreamState('complete');
} catch (error) {
  setStreamState('error');
  const err = error as Error;

  if (err.name === 'AbortError') {
    setErrorMessage('Generation stopped');
  } else if (err.message.includes('429')) {
    setErrorMessage('Rate limit exceeded. Try again in a moment.');
  } else {
    setErrorMessage('Something went wrong. Please retry.');
  }
}

// UI feedback
{streamState === 'error' && (
  <div className="error-banner">
    {errorMessage}
    <button onClick={retryStream}>Retry</button>
  </div>
)}
```
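
The `retryStream` handler above is left undefined; here is a minimal sketch, assuming the last prompt is kept in state and that `streamResponse` is the hypothetical streaming call from Anti-Pattern 2, adapted to take the prompt as an argument:

```tsx
// Hypothetical retry handler: clears the error state and re-runs the last prompt
const [lastPrompt, setLastPrompt] = useState('');

const retryStream = () => {
  setErrorMessage(null);
  setStreamState('idle');
  streamResponse(lastPrompt); // Re-issue the same request (streamResponse is assumed)
};
```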

#### Anti-Pattern 4: Memory Leaks from Unclosed Streams

Problem: Streams not cleaned up, causing memory leaks.

Symptom: Browser slows down after multiple requests.

Correct approach:

```tsx
// ✅ Cleanup with useEffect
useEffect(() => {
  const controller = new AbortController();
  let reader: ReadableStreamDefaultReader | null = null;

  const streamResponse = async () => {
    const response = await fetch('/api/chat', { signal: controller.signal, /* ... */ });
    reader = response.body!.getReader();

    // Streaming...
  };

  streamResponse();

  // Cleanup on unmount: abort the request and release the stream
  return () => {
    controller.abort();
    reader?.cancel();
  };
}, [prompt]);
```

#### Anti-Pattern 5: No Typing Indicator Between Tokens

Problem: UI feels frozen between slow tokens.

Correct approach:

```tsx
// ✅ Animated cursor during generation
<div className="message">
  {content}
  {isStreaming && <span className="typing-cursor">▊</span>}
</div>
```

```css
.typing-cursor {
  animation: blink 1s step-end infinite;
}

@keyframes blink {
  50% { opacity: 0; }
}
```

### Implementation Patterns

#### Pattern 1: Basic SSE Stream Handler

```typescript
async function* streamCompletion(prompt: string) {
  const response = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt })
  });

  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = ''; // Partial SSE line carried over between chunks

  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      // stream: true keeps multi-byte characters intact across chunk boundaries
      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop() ?? ''; // Keep the trailing partial line for the next read

      for (const line of lines) {
        if (line.startsWith('data: ')) {
          const data = JSON.parse(line.slice(6));

          if (data.content) {
            yield data.content;
          }

          if (data.done) {
            return;
          }
        }
      }
    }
  } finally {
    reader.releaseLock(); // Release the stream even on early return or error
  }
}

// Usage
for await (const token of streamCompletion('Hello')) {
  console.log(token);
}
```

#### Pattern 2: React Hook for Streaming

```typescript
import { useState, useCallback } from 'react';

interface UseStreamingOptions {
  onToken?: (token: string) => void;
  onComplete?: (fullText: string) => void;
  onError?: (error: Error) => void;
}

export function useStreaming(options: UseStreamingOptions = {}) {
  const [content, setContent] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);
  const [error, setError] = useState<Error | null>(null);
  const [abortController, setAbortController] = useState<AbortController | null>(null);

  const stream = useCallback(async (prompt: string) => {
    const controller = new AbortController();
    setAbortController(controller);
    setIsStreaming(true);
    setError(null);
    setContent('');

    try {
      const response = await fetch('/api/chat', {
        method: 'POST',
        signal: controller.signal,
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ prompt })
      });

      if (!response.ok) {
        throw new Error(`HTTP ${response.status}`); // Surfaces 429s etc. to onError
      }

      const reader = response.body!.getReader();
      const decoder = new TextDecoder();

      let accumulated = '';
      let buffer = ''; // Partial SSE line carried over between chunks

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split('\n');
        buffer = lines.pop() ?? '';

        for (const line of lines) {
          if (line.startsWith('data: ')) {
            const data = JSON.parse(line.slice(6));

            if (data.content) {
              accumulated += data.content;
              setContent(accumulated);
              options.onToken?.(data.content);
            }
          }
        }
      }

      options.onComplete?.(accumulated);
    } catch (err) {
      if ((err as Error).name !== 'AbortError') {
        setError(err as Error);
        options.onError?.(err as Error);
      }
    } finally {
      setIsStreaming(false);
      setAbortController(null);
    }
  }, [options]);

  const cancel = useCallback(() => {
    abortController?.abort();
  }, [abortController]);

  return { content, isStreaming, error, stream, cancel };
}
```

```tsx
// Usage in a component
function ChatInterface() {
  const { content, isStreaming, stream, cancel } = useStreaming({
    onToken: (token) => console.log('New token:', token),
    onComplete: (text) => console.log('Done:', text)
  });

  return (
    <div>
      <div className="message">
        {content}
        {isStreaming && <span className="cursor">▊</span>}
      </div>

      <button onClick={() => stream('Tell me a story')} disabled={isStreaming}>
        Generate
      </button>

      {isStreaming && <button onClick={cancel}>Stop</button>}
    </div>
  );
}
```

#### Pattern 3: Server-Side Streaming (Next.js)

```typescript
// app/api/chat/route.ts
import { OpenAI } from 'openai';

export const runtime = 'edge'; // Edge runtime keeps latency low; recent Next.js Node runtimes can stream too

export async function POST(req: Request) {
  const { prompt } = await req.json();

  const openai = new OpenAI({
    apiKey: process.env.OPENAI_API_KEY
  });

  const stream = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: prompt }],
    stream: true
  });

  // Convert the OpenAI stream to SSE format
  const encoder = new TextEncoder();

  const readable = new ReadableStream({
    async start(controller) {
      try {
        for await (const chunk of stream) {
          const content = chunk.choices[0]?.delta?.content;

          if (content) {
            const sseMessage = `data: ${JSON.stringify({ content })}\n\n`;
            controller.enqueue(encoder.encode(sseMessage));
          }
        }

        // Send the completion signal
        controller.enqueue(encoder.encode('data: {"done":true}\n\n'));
        controller.close();
      } catch (error) {
        controller.error(error);
      }
    }
  });

  return new Response(readable, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive'
    }
  });
}
```
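
One gap in this route: if the user hits Stop (Anti-Pattern 2), the client's fetch aborts but the route keeps pulling tokens from OpenAI. A hedged addition to the handler above; in the `openai` SDK, the returned stream exposes a `controller` whose `abort()` cancels the upstream request:

```typescript
// Add inside POST, right after creating the stream:
// req.signal fires when the client-side AbortController aborts the fetch,
// so upstream token generation (and billing) stops too.
req.signal.addEventListener('abort', () => {
  stream.controller.abort();
});
```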

### Production Checklist

- [ ] AbortController for cancellation
- [ ] Error states with retry capability
- [ ] Typing indicator during generation
- [ ] Cleanup on component unmount
- [ ] Rate limiting on the API route
- [ ] Token usage tracking
- [ ] Non-streaming fallback (if the streaming API fails)
- [ ] Accessibility (screen reader announces updates; see the sketch below)
- [ ] Mobile-friendly (touch targets for the stop button)
- [ ] Network error recovery (auto-retry on disconnect)
- [ ] Max response length enforcement
- [ ] Cost estimation before generation
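
For the accessibility item, a minimal sketch: an `aria-live` region lets screen readers announce streamed text, and `polite` batches announcements rather than interrupting on every token. (`content` and `isStreaming` are the values from the `useStreaming` hook in Pattern 2.)

```tsx
<div className="message" aria-live="polite" aria-busy={isStreaming}>
  {content}
  {isStreaming && <span className="cursor" aria-hidden="true">▊</span>}
</div>
```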

### When to Use vs Avoid

| Scenario | Use Streaming? |
|---|---|
| Chat interface | ✅ Yes |
| Long-form content generation | ✅ Yes |
| Code generation with preview | ✅ Yes |
| Short completions (<50 words) | ❌ No, use regular fetch |
| Background jobs | ❌ No, use a job queue |
| Bidirectional chat | ⚠️ Use WebSockets instead |

### Technology Comparison

| Feature | SSE | WebSockets | Long Polling |
|---|---|---|---|
| Complexity | Low | Medium | High |
| Auto-reconnect | ✅ | ❌ | ❌ |
| Bidirectional | ❌ | ✅ | ❌ |
| Firewall-friendly | ✅ | ⚠️ | ✅ |
| Browser support | ✅ All modern | ✅ All modern | ✅ Universal |
| LLM API support | ✅ Standard | ❌ Rare | ❌ Not used |

### References

- `/references/sse-protocol.md` - Server-Sent Events specification details
- `/references/vercel-ai-sdk.md` - Vercel AI SDK integration patterns
- `/references/error-recovery.md` - Stream error handling strategies

### Scripts

- `scripts/stream_tester.ts` - Test SSE endpoints locally
- `scripts/token_counter.ts` - Estimate costs before generation

This skill guides: LLM streaming implementation | SSE protocol | Real-time UI updates | Cancellation | Error recovery | Token-by-token display

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents.

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.