🔗 Base URLs
| Environment | API Base URL | WebSocket URL |
|---|---|---|
| Local Development | http://localhost:8000/v1 | ws://localhost:8001/ws |
| Production | https://api.yourdomain.com/v1 | wss://ws.yourdomain.com/ws |
🔐 Authentication
Note: Authentication is currently optional for development. Production deployments should implement proper authentication.
// Example: Bearer token authentication (once implemented)
const headers = {
  'Authorization': 'Bearer YOUR_API_KEY',
  'Content-Type': 'application/json'
};
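The same headers can be assembled from Python. This is a minimal sketch, assuming the key lives in a hypothetical `AVATAR_API_KEY` environment variable and that the `Authorization` header should simply be left out while authentication is disabled:

```python
import os
from typing import Dict, Optional


def build_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return JSON request headers, adding a Bearer token only when a key is set."""
    # AVATAR_API_KEY is a hypothetical variable name; this API does not define one.
    key = api_key if api_key is not None else os.environ.get("AVATAR_API_KEY")
    headers = {"Content-Type": "application/json"}
    if key:
        headers["Authorization"] = f"Bearer {key}"
    return headers
```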
📍 Core Endpoints
Session Management
POST /v1/streaming.new
Create a new avatar session
Request Body
{
"quality": "medium", // "low" | "medium" | "high"
"avatar_id": "avatar_1", // Optional, defaults to "avatar_1"
"voice_id": "EXAVITQu4vr4xnSDxMaL" // Optional ElevenLabs voice ID
}
Response
{
"session_id": "550e8400-e29b-41d4-a716-446655440000",
"ice_servers": [
{
"urls": "stun:stun.l.google.com:19302"
}
],
"access_token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9..."
}
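A minimal sketch of building this request with Python's standard library. `new_session_request` is a hypothetical helper; it checks `quality` against the three documented values and omits the optional fields so the server can apply its own defaults:

```python
import json
import urllib.request
from typing import Dict, Optional

VALID_QUALITIES = {"low", "medium", "high"}


def new_session_request(base_url: str, quality: str = "medium",
                        avatar_id: Optional[str] = None,
                        voice_id: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/streaming.new request."""
    if quality not in VALID_QUALITIES:
        raise ValueError(f"quality must be one of {sorted(VALID_QUALITIES)}")
    body: Dict[str, str] = {"quality": quality}
    # Optional fields are left out entirely so server-side defaults apply.
    if avatar_id is not None:
        body["avatar_id"] = avatar_id
    if voice_id is not None:
        body["voice_id"] = voice_id
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/streaming.new",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` and passing the response to `json.load` should yield a dict with the `session_id`, `ice_servers`, and `access_token` fields shown above.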
POST /v1/streaming.start
Start avatar streaming for a session
Request Body
{
"session_id": "550e8400-e29b-41d4-a716-446655440000",
"sdp": {
"type": "offer",
"sdp": "v=0\r\no=- 46117..."
}
}
Response
{
"status": "started",
"sdp": {
"type": "answer",
"sdp": "v=0\r\no=- 46117..."
}
}
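The offer/answer exchange can be wrapped in two small helpers. This sketch assumes the offer SDP comes from your WebRTC stack (for example, a browser `RTCPeerConnection` or a Python library such as aiortc); both helper names are hypothetical:

```python
from typing import Any, Dict


def start_request_body(session_id: str, offer_sdp: str) -> Dict[str, Any]:
    """Wrap a local SDP offer in the /v1/streaming.start request shape."""
    return {"session_id": session_id, "sdp": {"type": "offer", "sdp": offer_sdp}}


def extract_answer(response: Dict[str, Any]) -> str:
    """Return the SDP answer string, failing loudly if streaming did not start."""
    if response.get("status") != "started":
        raise RuntimeError(f"unexpected status: {response.get('status')!r}")
    sdp = response.get("sdp") or {}
    if sdp.get("type") != "answer":
        raise RuntimeError("response did not contain an SDP answer")
    return sdp["sdp"]
```

The returned answer string is what you would hand back to the peer connection as its remote description.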
POST /v1/streaming.stop
Stop avatar streaming
Request Body
{
"session_id": "550e8400-e29b-41d4-a716-446655440000"
}
Response
{
"status": "stopped",
"duration": 125.4 // Session duration in seconds
}
GET /v1/streaming.sessions
List all active sessions
Response
{
"sessions": [
{
"session_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "active",
"created_at": "2024-01-22T10:30:00Z",
"duration": 45.2
}
],
"total": 1
}
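To act on this listing, for example to stop sessions that were left running, a client could filter it. This sketch assumes `duration` is in seconds (as in the `streaming.stop` response) and that the cutoff is your own policy rather than a server setting; `stale_sessions` is a hypothetical helper:

```python
from typing import Any, Dict, List


def stale_sessions(listing: Dict[str, Any], max_duration_s: float) -> List[str]:
    """Return IDs of active sessions that have run longer than max_duration_s."""
    # Assumes `duration` is in seconds, matching the streaming.stop response.
    return [
        s["session_id"]
        for s in listing.get("sessions", [])
        if s.get("status") == "active" and s.get("duration", 0) > max_duration_s
    ]
```

Each returned ID could then be passed to `POST /v1/streaming.stop`.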
Conversational AI
POST /v1/chat.completions
Send a message to the AI avatar
Request Body
{
"session_id": "550e8400-e29b-41d4-a716-446655440000",
"messages": [
{
"role": "user",
"content": "Hello, how are you today?"
}
],
"stream": true, // Enable streaming response
"model": "gpt-4" // Optional, defaults to configured model
}
Response (streaming, Server-Sent Events)
data: {"choices":[{"delta":{"content":"Hello!"}}]}
data: {"choices":[{"delta":{"content":" I'm"}}]}
data: {"choices":[{"delta":{"content":" doing"}}]}
data: [DONE]
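The streamed body uses the Server-Sent Events `data:` framing and ends with `data: [DONE]`. A minimal parser that collects the `delta.content` fragments might look like this (`iter_deltas` is a hypothetical name):

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from `data:` lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content
```

With `urllib`, the response object can be iterated line by line, so `"".join(iter_deltas(line.decode("utf-8") for line in resp))` would collect the full reply.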
POST /v1/avatar.speak
Make the avatar speak specific text
Request Body
{
"session_id": "550e8400-e29b-41d4-a716-446655440000",
"text": "Welcome to our interactive AI avatar system!",
"voice_id": "EXAVITQu4vr4xnSDxMaL", // Optional
"voice_settings": { // Optional
"stability": 0.75,
"similarity_boost": 0.75
}
}
Response
{
"task_id": "task_123456",
"status": "processing",
"duration_estimate": 2.5 // Estimated speech duration
}
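A sketch of building this body so that optional fields are only sent when set. The 0 to 1 range check is an assumption based on ElevenLabs' voice settings; this API does not document its own limits, and `speak_body` is a hypothetical helper:

```python
from typing import Any, Dict, Optional


def speak_body(session_id: str, text: str, voice_id: Optional[str] = None,
               stability: Optional[float] = None,
               similarity_boost: Optional[float] = None) -> Dict[str, Any]:
    """Build a /v1/avatar.speak body, including only the optional fields that are set."""
    if not text.strip():
        raise ValueError("text must not be empty")
    body: Dict[str, Any] = {"session_id": session_id, "text": text}
    if voice_id is not None:
        body["voice_id"] = voice_id
    settings: Dict[str, float] = {}
    for name, value in (("stability", stability), ("similarity_boost", similarity_boost)):
        if value is None:
            continue
        # Assumed range: ElevenLabs voice settings are fractions between 0 and 1.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
        settings[name] = value
    if settings:
        body["voice_settings"] = settings
    return body
```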
🔄 WebSocket Events
Connection
// Connect to WebSocket
const ws = new WebSocket('ws://localhost:8001/ws/SESSION_ID');
// Connection established
ws.onopen = () => {
console.log('Connected to avatar stream');
};
Client → Server Events
audio_buffer_append
Send audio data to be spoken by the avatar
{
"type": "audio_buffer_append",
"data": {
"audio": "base64_encoded_audio_data"
}
}
audio_buffer_commit
Commit the audio buffer and start playback
{
"type": "audio_buffer_commit"
}
audio_buffer_clear
Clear the audio buffer
{
"type": "audio_buffer_clear"
}
start_listening
Start listening for user speech
{
"type": "start_listening"
}
stop_listening
Stop listening for user speech
{
"type": "stop_listening"
}
interrupt
Interrupt current avatar speech
{
"type": "interrupt"
}
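The client events above are plain JSON text messages. A sketch of serializing them, assuming raw audio bytes are split into chunks before `audio_buffer_append` (the 32,000-byte chunk size is an arbitrary choice, the expected audio encoding is not specified here, and both helper names are hypothetical):

```python
import base64
import json
from typing import Iterator

CONTROL_EVENTS = {"audio_buffer_commit", "audio_buffer_clear",
                  "start_listening", "stop_listening", "interrupt"}


def control_event(event_type: str) -> str:
    """Serialize one of the payload-free client events."""
    if event_type not in CONTROL_EVENTS:
        raise ValueError(f"unknown event type: {event_type}")
    return json.dumps({"type": event_type})


def audio_append_events(audio: bytes, chunk_size: int = 32_000) -> Iterator[str]:
    """Split raw audio into base64-encoded audio_buffer_append messages."""
    # chunk_size is an illustrative value; the API does not document a limit.
    for start in range(0, len(audio), chunk_size):
        chunk = audio[start:start + chunk_size]
        yield json.dumps({
            "type": "audio_buffer_append",
            "data": {"audio": base64.b64encode(chunk).decode("ascii")},
        })
```

A typical sequence sends every `audio_buffer_append` message and then one `audio_buffer_commit`; with a WebSocket client such as the `websockets` package, that is a loop of `await ws.send(msg)` calls.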
Server → Client Events
frame
Video frame data
{
"type": "frame",
"data": "base64_encoded_jpeg",
"frame_id": 12345,
"timestamp": 1705920123.456
}
audio
Audio data to be played
{
"type": "audio",
"data": "base64_encoded_wav",
"duration": 2.5,
"sample_rate": 24000
}
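Since `data` is a base64-encoded WAV file, Python's standard `wave` module can decode it and confirm the advertised `sample_rate`. A sketch (`decode_audio_event` is a hypothetical helper):

```python
import base64
import io
import wave
from typing import Any, Dict, Tuple


def decode_audio_event(event: Dict[str, Any]) -> Tuple[bytes, int, int]:
    """Decode an `audio` event into (PCM frames, sample rate, channel count)."""
    wav_bytes = base64.b64decode(event["data"])
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.readframes(wav.getnframes())
    # Treat the WAV header as the source of truth and flag disagreement.
    if rate != event.get("sample_rate", rate):
        raise ValueError(f"WAV header says {rate} Hz, event says {event['sample_rate']} Hz")
    return frames, rate, channels
```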
chat
Chat messages (user/agent)
{
"type": "chat",
"role": "agent",
"content": "Hello! How can I help you today?"
}
status
Status updates
{
"type": "status",
"status": "listening", // "listening" | "speaking" | "idle"
"message": "Avatar is now listening..."
}
error
Error messages
{
"type": "error",
"code": "AUDIO_PROCESSING_ERROR",
"message": "Failed to process audio input"
}
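Every server event carries a `type` field, so a client can route messages with a small lookup table. This is a generic sketch, not the SDK's event API; the `EventRouter` class is hypothetical:

```python
import json
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], Any]


class EventRouter:
    """Route server-to-client messages to handlers keyed by their `type` field."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def on(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type] = handler

    def dispatch(self, raw_message: str) -> Any:
        event = json.loads(raw_message)
        handler = self._handlers.get(event.get("type"))
        if handler is None:
            return None  # ignore event types this client does not handle
        return handler(event)
```

Ignoring unknown types instead of raising keeps an older client working if the server later adds event types.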
🛠️ Utility Endpoints
GET /health
Health check endpoint
Response
{
"status": "healthy",
"version": "1.0.0",
"uptime": 3600,
"services": {
"api": "healthy",
"websocket": "healthy",
"redis": "healthy",
"gpu": "available"
}
}
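A monitoring script might reduce this response to a list of failing components. This sketch assumes `healthy` and `available` are the only OK states (other state strings are not documented), and the `"overall"` label is a hypothetical convention:

```python
from typing import Any, Dict, List

# Assumption: the only documented OK values are "healthy" and "available" (for gpu).
OK_STATES = {"healthy", "available"}


def unhealthy_services(health: Dict[str, Any]) -> List[str]:
    """List services whose reported state is not an OK value."""
    problems = [name for name, state in health.get("services", {}).items()
                if state not in OK_STATES]
    if health.get("status") != "healthy":
        problems.insert(0, "overall")
    return problems
```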
GET /metrics
System metrics
Response
{
"active_sessions": 5,
"total_sessions": 142,
"cpu_usage": 45.2,
"memory_usage": 62.8,
"gpu_usage": 38.5,
"websocket_connections": 5,
"redis_memory": "245MB",
"average_latency": 125 // milliseconds
}
❌ Error Responses
| Status Code | Error Type | Description |
|---|---|---|
| 400 | Bad Request | Invalid request parameters |
| 401 | Unauthorized | Missing or invalid authentication |
| 404 | Not Found | Session or resource not found |
| 429 | Too Many Requests | Rate limit exceeded |
| 500 | Internal Server Error | Server error, check logs |
| 503 | Service Unavailable | Service temporarily unavailable |
Error Response Format
{
"error": {
"code": "SESSION_NOT_FOUND",
"message": "The specified session does not exist",
"details": {
"session_id": "550e8400-e29b-41d4-a716-446655440000"
}
}
}
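A client can turn this error body into an exception and decide whether to retry. This sketch treats only 429 and 503 as retryable, which is an assumption drawn from the status table above, and falls back gracefully when the body is not the documented JSON; `AvatarAPIError` and `parse_error` are hypothetical names:

```python
import json
from typing import Any, Dict, Optional

# Assumption: only rate limiting and temporary unavailability are worth retrying.
RETRYABLE_STATUS = {429, 503}


class AvatarAPIError(Exception):
    """Raised for non-2xx responses that carry the documented error body."""

    def __init__(self, status: int, code: str, message: str,
                 details: Optional[Dict[str, Any]] = None) -> None:
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.details = details or {}

    @property
    def retryable(self) -> bool:
        return self.status in RETRYABLE_STATUS


def parse_error(status: int, body: str) -> AvatarAPIError:
    """Build an AvatarAPIError, tolerating bodies that are not the documented JSON."""
    try:
        err = json.loads(body)["error"]
        return AvatarAPIError(status, err.get("code", "UNKNOWN"),
                              err.get("message", ""), err.get("details"))
    except (ValueError, KeyError, TypeError, AttributeError):
        return AvatarAPIError(status, "UNKNOWN", body[:200])
```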
💻 SDK Examples
JavaScript/TypeScript
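import { AvatarClient } from '@avatar/sdk';

// Page elements (placeholder IDs). Frames are JPEG images, so render them in an <img>, not a <video>.
const frameImage = document.getElementById('avatar-frame');
const audioElement = document.getElementById('avatar-audio');

// Decode a base64 string into a Blob (audio events carry WAV data)
function base64ToBlob(base64, mimeType = 'audio/wav') {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return new Blob([bytes], { type: mimeType });
}

// Initialize client
const client = new AvatarClient({
  apiUrl: 'http://localhost:8000/v1',
  wsUrl: 'ws://localhost:8001/ws'
});

// Create session
const session = await client.createSession({
  quality: 'high',
  voice_id: 'EXAVITQu4vr4xnSDxMaL'
});

// Connect to WebSocket
await session.connect();

// Handle events
session.on('frame', (frameData) => {
  // Display the base64 JPEG frame
  frameImage.src = `data:image/jpeg;base64,${frameData}`;
});

session.on('audio', (audioData) => {
  // Release the previous object URL before replacing it, then play
  if (audioElement.src.startsWith('blob:')) {
    URL.revokeObjectURL(audioElement.src);
  }
  audioElement.src = URL.createObjectURL(base64ToBlob(audioData));
  audioElement.play().catch(console.error); // autoplay may be blocked until the user interacts
});

// Send chat message
await session.chat('Hello, how are you?');

// Make avatar speak
await session.speak('Welcome to our demo!');

// Clean up
await session.disconnect();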
Python
import asyncio
from avatar_sdk import AvatarClient


async def main():
    # Initialize client
    client = AvatarClient(
        api_url="http://localhost:8000/v1",
        ws_url="ws://localhost:8001/ws"
    )

    # Create session
    session = await client.create_session(
        quality="high",
        voice_id="EXAVITQu4vr4xnSDxMaL"
    )

    # Connect to WebSocket
    await session.connect()

    # Handle events
    @session.on("frame")
    async def on_frame(frame_data):
        # Process video frame
        pass

    @session.on("audio")
    async def on_audio(audio_data):
        # Process audio
        pass

    # Send chat message
    response = await session.chat("Hello, how are you?")
    print(f"Avatar: {response}")

    # Make avatar speak
    await session.speak("Welcome to our demo!")

    # Keep running
    await session.run_forever()


if __name__ == "__main__":
    asyncio.run(main())
cURL Examples
# Create session
curl -X POST http://localhost:8000/v1/streaming.new \
-H "Content-Type: application/json" \
-d '{
"quality": "medium",
"avatar_id": "avatar_1"
}'
# Get session list
curl http://localhost:8000/v1/streaming.sessions
# Make avatar speak
curl -X POST http://localhost:8000/v1/avatar.speak \
-H "Content-Type: application/json" \
-d '{
"session_id": "YOUR_SESSION_ID",
"text": "Hello from cURL!"
}'
# Health check
curl http://localhost:8000/health
✨ Best Practices
1. Session Management
- Always clean up sessions when done
- Implement reconnection logic for WebSocket
- Handle session timeouts gracefully
2. Performance
- Choose the `quality` level (`low`, `medium`, or `high`) that fits your bandwidth and latency budget
- Implement frame dropping for slow connections
- Buffer audio data before committing
3. Error Handling
- Implement exponential backoff for retries
- Log all errors with context
- Provide user-friendly error messages
4. Security
- Use HTTPS/WSS in production
- Implement rate limiting
- Validate all input data
- Use authentication tokens
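The retry advice above can be sketched as exponential backoff with "full jitter": each delay is drawn uniformly between zero and an exponentially growing, capped ceiling. The base delay, cap, and retry count are illustrative defaults, not values specified by this API, and both function names are hypothetical:

```python
import random
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int, base: float = 0.5, cap: float = 30.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Full-jitter delays: uniform in [0, min(cap, base * 2**attempt)]."""
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap, base * 2 ** attempt)) for attempt in range(retries)]


def with_retries(call: Callable[[], T], is_retryable: Callable[[Exception], bool],
                 retries: int = 5, sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, sleeping between attempts while failures are retryable."""
    for delay in backoff_delays(retries):
        try:
            return call()
        except Exception as exc:
            if not is_retryable(exc):
                raise
            sleep(delay)
    return call()  # final attempt; its exception propagates to the caller
```

The same loop can drive WebSocket reconnection by passing a synchronous connect function as `call`; jitter spreads out reconnect attempts when many clients drop at once.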
Rate Limits
| Endpoint | Rate Limit | Window |
|---|---|---|
| Session Creation | 10 requests | per minute |
| Avatar Speak | 30 requests | per minute |
| Chat Messages | 60 requests | per minute |
| WebSocket Messages | 100 messages | per second |
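To stay under these limits from the client side, a sliding-window counter is a simple option. Whether the server counts in fixed or sliding windows is not documented; a client that never exceeds the limit in any sliding window also stays within every fixed window of the same length, so this sketch errs on the safe side. `SlidingWindowLimiter` is a hypothetical class:

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Client-side guard: allow at most `limit` calls in any `window` seconds."""

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self._stamps: Deque[float] = deque()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Forget calls that have slid out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) < self.limit:
            self._stamps.append(now)
            return True
        return False
```

For example, `SlidingWindowLimiter(10, 60.0)` mirrors the Session Creation row, and `SlidingWindowLimiter(100, 1.0)` the WebSocket Messages row.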