Overview
AvaTar is a real-time interactive AI avatar system that provides HeyGen-like functionality for creating conversational AI avatars with lip-sync, voice synthesis, and streaming video capabilities.
Quick Start
Prerequisites
- Docker and Docker Compose
- 8GB+ RAM recommended
- NVIDIA GPU (optional, for advanced features)
- Valid API keys for:
  - ElevenLabs (for voice synthesis)
  - OpenAI/Anthropic (for conversational AI)
Local Development Setup
# Clone the repository
git clone https://github.com/yourusername/AvaTar.git
cd AvaTar
# Copy environment template
cp env.example .env
# Edit .env and add your API keys
nano .env
# Start the system
docker-compose -f docker-compose-simple.yml up -d
# Check health
curl http://localhost:8000/health
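Because `docker-compose up -d` can return before the services are ready to accept requests, the first health check may fail. The following is a minimal Python sketch of a readiness poll against the same `/health` URL. The names `fetch_status` and `wait_for_health`, the retry settings, and the assumption that a healthy service answers with HTTP 200 are illustrative and not taken from the AvaTar codebase.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code of a GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status


def wait_for_health(
    url: str = "http://localhost:8000/health",
    attempts: int = 10,
    delay: float = 1.0,
    fetch: Callable[[str], int] = fetch_status,
) -> bool:
    """Poll `url` until it returns 200 (assumed healthy) or attempts run out."""
    for attempt in range(attempts):
        try:
            if fetch(url) == 200:
                return True
        except OSError:
            pass  # connection refused or timed out: service not up yet
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_for_health()` after starting the stack returns `True` once the endpoint responds, or `False` after roughly ten seconds with the defaults shown.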
Access the System
- Interactive Demo: http://localhost:8080/conversational-avatar.html
- API Documentation: http://localhost:8000/docs
- WebSocket Endpoint: ws://localhost:8001/ws/{session_id}
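The `{session_id}` placeholder in the WebSocket endpoint has to be filled in by the client. As a small sketch, assuming the server accepts a client-chosen identifier (this document does not say how session IDs are issued), the URL could be built as follows. `build_ws_url` and `new_session_id` are hypothetical helpers.

```python
import uuid
from urllib.parse import quote


def build_ws_url(session_id: str, host: str = "localhost", port: int = 8001) -> str:
    """Fill the {session_id} placeholder of ws://host:port/ws/{session_id}."""
    if not session_id:
        raise ValueError("session_id must not be empty")
    # Percent-encode the ID so characters such as '/' or spaces cannot alter the path.
    return f"ws://{host}:{port}/ws/{quote(session_id, safe='')}"


def new_session_id() -> str:
    """Assumption: a random UUID is an acceptable session identifier."""
    return uuid.uuid4().hex
```

For example, `build_ws_url("abc123")` gives `ws://localhost:8001/ws/abc123`.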
Architecture
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Web Client    │────▶│     FastAPI     │────▶│   Redis Queue   │
│  (JavaScript)   │◀────│     Backend     │◀────│    & Pub/Sub    │
└─────────────────┘     └─────────────────┘     └─────────────────┘
         │                       │                       │
         │                       ▼                       ▼
         │              ┌─────────────────┐     ┌─────────────────┐
         └─────────────▶│    WebSocket    │     │  Linly-Talker   │
                        │     Server      │     │   (Lip Sync)    │
                        └─────────────────┘     └─────────────────┘
Key Features
- Real-time Streaming: WebSocket-based video and audio streaming
- Voice Synthesis: ElevenLabs integration for natural voice generation
- Lip Sync: Automated lip synchronization with speech
- Conversational AI: Integration with OpenAI/Anthropic
- Scalable Architecture: Microservices design with a Redis queue
- HeyGen Compatibility: Compatible with the HeyGen Avatar API
Configuration
Environment Variables
# API Keys
ELEVENLABS_API_KEY=your_elevenlabs_key
OPENAI_API_KEY=your_openai_key
# Service URLs
REDIS_URL=redis://redis:6379
API_BASE_URL=http://localhost:8000
# Performance
FRAME_BUFFER_SIZE=100
MAX_CONCURRENT_SESSIONS=10
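To illustrate how a service might consume these variables, here is a hedged Python sketch that reads them into a typed settings object and fails early when a key is missing. The `Settings` class and `load_settings` function are hypothetical, not AvaTar's actual configuration code, and treating both API keys as required is an assumption (a deployment using Anthropic may not need `OPENAI_API_KEY`).

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    elevenlabs_api_key: str
    openai_api_key: str
    redis_url: str
    api_base_url: str
    frame_buffer_size: int
    max_concurrent_sessions: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing early on missing keys."""
    # Assumption: both API keys are mandatory for the stack to start.
    missing = [k for k in ("ELEVENLABS_API_KEY", "OPENAI_API_KEY") if not env.get(k)]
    if missing:
        raise RuntimeError("missing required variables: " + ", ".join(missing))
    return Settings(
        elevenlabs_api_key=env["ELEVENLABS_API_KEY"],
        openai_api_key=env["OPENAI_API_KEY"],
        # Defaults mirror the example values listed above.
        redis_url=env.get("REDIS_URL", "redis://redis:6379"),
        api_base_url=env.get("API_BASE_URL", "http://localhost:8000"),
        frame_buffer_size=int(env.get("FRAME_BUFFER_SIZE", "100")),
        max_concurrent_sessions=int(env.get("MAX_CONCURRENT_SESSIONS", "10")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests.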
Performance & Scaling
Recommended EC2 Instance Types
- Development: t3.large (2 vCPU, 8GB RAM)
- Production: g4dn.xlarge (4 vCPU, 16GB RAM, GPU)
- High Traffic: g4dn.2xlarge (8 vCPU, 32GB RAM, GPU)
Capacity Planning
- Each session requires ~100 MB of RAM
- A GPU is recommended for 5 or more concurrent sessions
- Network bandwidth: ~2 Mbps per active session
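The per-session figures above translate directly into a back-of-the-envelope estimate. The sketch below simply multiplies them out; `estimate_capacity` and `CapacityEstimate` are illustrative names, and the constants are the approximate values quoted in this section rather than measured limits.

```python
from dataclasses import dataclass

RAM_PER_SESSION_MB = 100         # ~100 MB RAM per session
BANDWIDTH_PER_SESSION_MBPS = 2   # ~2 Mbps per active session
GPU_SESSION_THRESHOLD = 5        # GPU recommended from 5 concurrent sessions


@dataclass(frozen=True)
class CapacityEstimate:
    ram_mb: int
    bandwidth_mbps: int
    gpu_recommended: bool


def estimate_capacity(sessions: int) -> CapacityEstimate:
    """Estimate session-related RAM and bandwidth for `sessions` concurrent users."""
    if sessions < 0:
        raise ValueError("sessions must be non-negative")
    return CapacityEstimate(
        ram_mb=sessions * RAM_PER_SESSION_MB,
        bandwidth_mbps=sessions * BANDWIDTH_PER_SESSION_MBPS,
        gpu_recommended=sessions >= GPU_SESSION_THRESHOLD,
    )
```

For example, `estimate_capacity(10)` yields 1000 MB of session RAM and 20 Mbps of bandwidth, with a GPU recommended. The estimate covers per-session overhead only; the baseline memory used by the services themselves is not stated here.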
Troubleshooting
Common Issues
1. Black screen / No video
   - Check the WebSocket connection
   - Verify Redis is running
   - Check the browser console for errors
2. No audio playback
   - Verify the ElevenLabs API key
   - Check browser audio permissions
   - Test with the demo audio endpoint
3. High latency
   - Check Redis performance
   - Monitor CPU/RAM usage
   - Consider upgrading the instance type
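One dependency-free way to carry out the "Verify Redis is running" step is to send Redis's inline `PING` command over a plain TCP socket and look for the `+PONG` reply. The sketch below assumes an unauthenticated Redis reachable at the given host and port; whether port 6379 is published to the host machine depends on the Compose file and is an assumption here. `redis_ping` is a hypothetical helper.

```python
import socket


def redis_ping(host: str = "localhost", port: int = 6379, timeout: float = 2.0) -> bool:
    """Return True if a Redis server at host:port answers PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")  # inline command form of the Redis protocol
            reply = sock.recv(64)
    except OSError:
        return False  # refused, unreachable, or timed out
    # A server that requires authentication replies with an error instead,
    # which this check deliberately reports as False.
    return reply.startswith(b"+PONG")
```

A `False` result narrows the problem to Redis or its network path before looking at the WebSocket server or the browser.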
Monitoring
- Health endpoint: GET /health
- Metrics endpoint: GET /metrics
- WebSocket status: GET /v1/streaming.sessions
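A single pass over these three endpoints can be scripted for a quick status snapshot. The sketch below assumes all three paths are served from the API base URL on port 8000; this document only confirms that for `/health`. `http_status` and `check_endpoints` are illustrative names, and the response bodies are not inspected because their format is not documented here.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Optional

MONITORING_PATHS = ("/health", "/metrics", "/v1/streaming.sessions")


def http_status(url: str, timeout: float = 2.0) -> Optional[int]:
    """Return the HTTP status for a GET to `url`, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code
    except OSError:
        return None


def check_endpoints(
    base_url: str = "http://localhost:8000",
    fetch: Callable[[str], Optional[int]] = http_status,
) -> Dict[str, Optional[int]]:
    """Map each monitoring path to its HTTP status (None means no response)."""
    base = base_url.rstrip("/")
    return {path: fetch(base + path) for path in MONITORING_PATHS}
```

Running `check_endpoints()` against a local stack returns a dictionary keyed by path, which can be logged or fed into an external alerting tool.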