AvaTar - Getting Started Guide

Overview

AvaTar is a real-time interactive AI avatar system. It offers HeyGen-like functionality for building conversational AI avatars with lip sync, voice synthesis, and streaming video.

🚀 Quick Start

Prerequisites

Docker and Docker Compose
8GB+ RAM recommended
NVIDIA GPU (optional, for advanced features)
Valid API keys for:
- ElevenLabs (for voice synthesis)
- OpenAI/Anthropic (for conversational AI)

Local Development Setup

# Clone the repository
git clone https://github.com/yourusername/AvaTar.git
cd AvaTar

# Copy environment template
cp env.example .env

# Edit .env and add your API keys
nano .env

# Start the system
docker-compose -f docker-compose-simple.yml up -d

# Check health
curl http://localhost:8000/health
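
Containers started with `up -d` can take a few seconds before the API answers, so a single `curl` may fail even when nothing is wrong. The sketch below polls the health URL from the step above until it returns HTTP 200. The function names, the retry count, and the one-second delay are illustrative assumptions and are not part of AvaTar; it also assumes that a 200 status is what signals a healthy backend.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code of a GET request, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, DNS failure, timeout, ...


def wait_until_healthy(
    url: str = "http://localhost:8000/health",
    attempts: int = 30,            # assumed retry budget
    delay_seconds: float = 1.0,    # assumed pause between polls
    fetch: Callable[[str], int] = http_status,
) -> bool:
    """Poll `url` until it returns 200 or the attempts are used up."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False
```

Calling `wait_until_healthy()` after `docker-compose ... up -d` returns `True` once the backend responds and `False` if it never does. The `fetch` parameter exists only so the polling logic can be exercised without a running server.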

Access the System

Interactive Demo: http://localhost:8080/conversational-avatar.html
API Documentation: http://localhost:8000/docs
WebSocket Endpoint: ws://localhost:8001/ws/{session_id}
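
The `{session_id}` placeholder in the WebSocket endpoint has to be filled in per session. This guide does not say how session IDs are issued, so the sketch below makes two assumptions: that a client may choose its own ID, and that a random UUID is an acceptable one. If the backend hands out IDs through its API, use those instead. Both helper names are hypothetical.

```python
import uuid
from urllib.parse import quote


def new_session_id() -> str:
    """Hypothetical ID scheme: a random UUID4 string (not confirmed by the guide)."""
    return str(uuid.uuid4())


def websocket_url(session_id: str, host: str = "localhost", port: int = 8001) -> str:
    """Fill the {session_id} placeholder of ws://host:port/ws/{session_id}."""
    if not session_id:
        raise ValueError("session_id must not be empty")
    # Percent-encode the ID so characters such as '/' or spaces cannot change the path.
    return f"ws://{host}:{port}/ws/{quote(session_id, safe='')}"
```

For example, `websocket_url("abc123")` gives `ws://localhost:8001/ws/abc123`, which can then be passed to any WebSocket client.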

🏗️ Architecture

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Web Client    │────▶│   FastAPI       │────▶│  Redis Queue    │
│  (JavaScript)   │◀────│   Backend       │◀────│  & Pub/Sub      │
└─────────────────┘     └─────────────────┘     └─────────────────┘
         │                       │                         │
         │                       ▼                         ▼
         │              ┌─────────────────┐     ┌─────────────────┐
         └─────────────▶│  WebSocket      │     │  Linly-Talker   │
                        │   Server        │     │  (Lip Sync)     │
                        └─────────────────┘     └─────────────────┘

🎯 Key Features

Real-time Streaming

WebSocket-based video and audio streaming

Voice Synthesis

ElevenLabs integration for natural voice generation

Lip Sync

Automated lip synchronization with speech

Conversational AI

Integration with OpenAI/Anthropic

Scalable Architecture

Microservices design with Redis queue

HeyGen Compatibility

Compatible with HeyGen Avatar API

🔧 Configuration

Environment Variables

# API Keys
ELEVENLABS_API_KEY=your_elevenlabs_key
OPENAI_API_KEY=your_openai_key

# Service URLs
REDIS_URL=redis://redis:6379
API_BASE_URL=http://localhost:8000

# Performance
FRAME_BUFFER_SIZE=100
MAX_CONCURRENT_SESSIONS=10
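
As an illustration of how a service might consume these variables, the sketch below reads them into a typed settings object and fails early when a key is missing. It is not AvaTar's actual configuration loader. The `Settings` class and `load_settings` function are hypothetical, the defaults simply mirror the example values above, and treating both API keys as required is an assumption (a deployment that uses Anthropic instead of OpenAI would need a different check).

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Assumption: both keys are mandatory; adjust if your deployment differs.
REQUIRED_KEYS = ("ELEVENLABS_API_KEY", "OPENAI_API_KEY")


@dataclass(frozen=True)
class Settings:
    elevenlabs_api_key: str
    openai_api_key: str
    redis_url: str
    api_base_url: str
    frame_buffer_size: int
    max_concurrent_sessions: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, using the guide's example values as defaults."""
    missing = [name for name in REQUIRED_KEYS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return Settings(
        elevenlabs_api_key=env["ELEVENLABS_API_KEY"],
        openai_api_key=env["OPENAI_API_KEY"],
        redis_url=env.get("REDIS_URL", "redis://redis:6379"),
        api_base_url=env.get("API_BASE_URL", "http://localhost:8000"),
        # int() raises ValueError on non-numeric input, which surfaces typos early.
        frame_buffer_size=int(env.get("FRAME_BUFFER_SIZE", "100")),
        max_concurrent_sessions=int(env.get("MAX_CONCURRENT_SESSIONS", "10")),
    )
```

Passing a plain dictionary instead of `os.environ` makes it easy to check a `.env` file's contents before starting the containers.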

📊 Performance & Scaling

Recommended EC2 Instance Types

Development: t3.large (2 vCPU, 8GB RAM)
Production: g4dn.xlarge (4 vCPU, 16GB RAM, GPU)
High Traffic: g4dn.2xlarge (8 vCPU, 32GB RAM, GPU)

Capacity Planning

Each session requires ~100 MB RAM
GPU recommended for 5+ concurrent sessions
Network bandwidth: ~2 Mbps per active session
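
The figures above can be turned into a quick back-of-the-envelope estimate. The sketch below multiplies them by a target session count. The names are illustrative, and the result covers per-session usage only: it assumes linear scaling and does not include the baseline memory of the containers themselves, which this guide does not quantify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapacityEstimate:
    sessions: int
    session_ram_mb: int      # RAM consumed by sessions only, excluding service baseline
    bandwidth_mbps: float
    gpu_recommended: bool


def estimate_capacity(
    sessions: int,
    ram_mb_per_session: int = 100,   # "~100MB RAM" per session
    mbps_per_session: float = 2.0,   # "~2Mbps per active session"
    gpu_threshold: int = 5,          # "GPU recommended for 5+ concurrent sessions"
) -> CapacityEstimate:
    """Scale the guide's per-session figures linearly to `sessions` concurrent sessions."""
    if sessions < 0:
        raise ValueError("sessions must be non-negative")
    return CapacityEstimate(
        sessions=sessions,
        session_ram_mb=sessions * ram_mb_per_session,
        bandwidth_mbps=sessions * mbps_per_session,
        gpu_recommended=sessions >= gpu_threshold,
    )
```

With the example setting `MAX_CONCURRENT_SESSIONS=10`, `estimate_capacity(10)` reports roughly 1000 MB of session RAM and 20 Mbps of bandwidth, with a GPU recommended.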

🛠️ Troubleshooting

Common Issues

1. Black screen / No video

Check that the WebSocket connection to ws://localhost:8001/ws/{session_id} is open
Verify Redis is running
Check browser console for errors

2. No audio playback

Verify the ElevenLabs API key (ELEVENLABS_API_KEY in .env)
Check browser audio permissions
Test with demo audio endpoint

3. High latency

Check Redis performance
Monitor CPU/RAM usage
Consider upgrading instance type
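
Two of the checks above depend on Redis being reachable. A minimal way to verify that without installing a client library is to send the Redis `PING` command over a plain socket and look for the `+PONG` reply, as sketched below. The function name is hypothetical. The sketch assumes the Redis port is reachable from where the script runs (the `redis` hostname in `REDIS_URL` resolves only inside the Docker network, so from the host this works only if port 6379 is published) and that Redis does not require a password.

```python
import socket


def redis_ping(host: str = "localhost", port: int = 6379, timeout: float = 2.0) -> bool:
    """Return True if a server at host:port answers the inline PING command with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")   # Redis accepts inline commands terminated by CRLF
            reply = conn.recv(64)
    except OSError:
        return False  # connection refused, timeout, unresolved hostname, ...
    # A password-protected server replies with an error line instead of +PONG.
    return reply.startswith(b"+PONG")
```

A `False` result means either that Redis is down, that the port is not exposed to the caller, or that authentication is required. Inside the Compose network, call it with `host="redis"`.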

📈 Monitoring

Health endpoint: GET /health
Metrics endpoint: GET /metrics
WebSocket status: GET /v1/streaming.sessions
