File size: 6,941 Bytes
# Chatterbox-TTS Apple Silicon Adaptation Guide

## Overview
This document summarizes the key adaptations made to run Chatterbox-TTS successfully on Apple Silicon (M1/M2/M3) MacBooks with MPS GPU acceleration. The original Chatterbox-TTS models were trained on CUDA devices, requiring specific device mapping strategies for Apple Silicon compatibility.

## ✅ Confirmed Working Status
- **App Status**: ✅ Running successfully on port 7861
- **Device**: MPS (Apple Silicon GPU) 
- **Model Loading**: ✅ All components loaded successfully
- **Performance**: Optimized with text chunking for longer inputs

## Key Technical Challenges & Solutions

### 1. CUDA → MPS Device Mapping
**Problem**: Chatterbox-TTS models were saved with CUDA device references, causing loading failures on MPS-only systems.

**Solution**: Comprehensive `torch.load` monkey patch:
```python
# Monkey patch torch.load to handle device mapping for Chatterbox-TTS
original_torch_load = torch.load

def patched_torch_load(f, map_location=None, **kwargs):
    """Patched torch.load that automatically maps CUDA tensors to CPU/MPS"""
    if map_location is None:
        map_location = 'cpu'  # Default to CPU for compatibility
    logger.info(f"🔧 Loading with map_location={map_location}")
    return original_torch_load(f, map_location=map_location, **kwargs)

# Apply the patch immediately after torch import
torch.load = patched_torch_load
```

### 2. Device Detection & Model Placement
**Implementation**: Intelligent device detection with fallback hierarchy:
```python
# Device detection with MPS support
if torch.backends.mps.is_available():
    DEVICE = "mps"
    logger.info("🚀 Running on MPS (Apple Silicon GPU)")
elif torch.cuda.is_available():
    DEVICE = "cuda" 
    logger.info("🚀 Running on CUDA GPU")
else:
    DEVICE = "cpu"
    logger.info("🚀 Running on CPU")
```

### 3. Safe Model Loading Strategy
**Approach**: Load to CPU first, then move to target device:
```python
# Load model to CPU first to avoid device issues
MODEL = ChatterboxTTS.from_pretrained("cpu")

# Move to target device if not CPU
if DEVICE != "cpu":
    logger.info(f"Moving model components to {DEVICE}...")
    if hasattr(MODEL, 't3'):
        MODEL.t3 = MODEL.t3.to(DEVICE)
    if hasattr(MODEL, 's3gen'):
        MODEL.s3gen = MODEL.s3gen.to(DEVICE)
    if hasattr(MODEL, 've'):
        MODEL.ve = MODEL.ve.to(DEVICE)
    MODEL.device = DEVICE
```

### 4. Text Chunking for Performance
**Enhancement**: Intelligent text splitting at sentence boundaries:
```python
def split_text_into_chunks(text: str, max_chars: int = 250) -> List[str]:
    """Split text into chunks at sentence boundaries, respecting max character limit."""
    if len(text) <= max_chars:
        return [text]
    
    # Split by sentences first (period, exclamation, question mark)
    sentences = re.split(r'(?<=[.!?])\s+', text)
    # ... chunking logic
```

## Implementation Architecture

### Core Components
1. **Device Compatibility Layer**: Handles CUDA→MPS mapping
2. **Model Management**: Safe loading and device placement
3. **Text Processing**: Intelligent chunking for longer texts
4. **Gradio Interface**: Modern UI with progress tracking

### File Structure
```
app.py                 # Main application (PyTorch + MPS)
requirements.txt       # Dependencies with MPS-compatible PyTorch
README.md             # Setup and usage instructions
```

## Dependencies & Installation

### Key Requirements
```txt
torch>=2.0.0           # MPS support requires PyTorch 2.0+
torchaudio>=2.0.0      # Audio processing
chatterbox-tts         # Core TTS model
gradio>=4.0.0          # Web interface
numpy>=1.21.0          # Numerical operations
```

### Installation Commands
```bash
# Create virtual environment
python3.11 -m venv .venv
source .venv/bin/activate

# Install PyTorch with MPS support
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu

# Install remaining dependencies
pip install -r requirements.txt
```

## Performance Optimizations

### 1. MPS GPU Acceleration
- **Benefit**: ~2-3x faster inference vs CPU-only
- **Memory**: Efficient GPU memory usage on Apple Silicon
- **Compatibility**: Works across M1, M2, M3 chip families

### 2. Text Chunking Strategy
- **Smart Splitting**: Preserves sentence boundaries
- **Fallback Logic**: Handles long sentences gracefully
- **User Experience**: Progress tracking for long texts

### 3. Model Caching
- **Singleton Pattern**: Model loaded once, reused across requests
- **Device Persistence**: Maintains GPU placement between calls
- **Memory Efficiency**: Avoids repeated model loading

## Gradio Interface Features

### User Interface
- **Modern Design**: Clean, intuitive layout
- **Real-time Feedback**: Loading states and progress bars
- **Error Handling**: Graceful failure with helpful messages
- **Audio Preview**: Inline audio player for generated speech

### Parameters
- **Voice Cloning**: Reference audio upload support
- **Quality Control**: Temperature, exaggeration, CFG weight
- **Reproducibility**: Seed control for consistent outputs
- **Chunking**: Configurable text chunk size

## Deployment Notes

### Port Configuration
- **Default Port**: 7861 (configurable)
- **Conflict Resolution**: Automatic port detection
- **Local Access**: http://localhost:7861

### System Requirements
- **macOS**: 12.0+ (Monterey or later)
- **Python**: 3.9-3.11 (tested on 3.11)
- **RAM**: 8GB minimum, 16GB recommended
- **Storage**: ~5GB for models and dependencies

## Troubleshooting

### Common Issues
1. **Port Conflicts**: Use `GRADIO_SERVER_PORT` environment variable
2. **Memory Issues**: Reduce chunk size or use CPU fallback
3. **Audio Dependencies**: Install ffmpeg if audio processing fails
4. **Model Loading**: Check internet connection for initial download

### Debug Commands
```bash
# Check MPS availability
python -c "import torch; print(f'MPS available: {torch.backends.mps.is_available()}')"

# Monitor GPU usage
sudo powermetrics --samplers gpu_power -n 1

# Check port usage
lsof -i :7861
```

## Success Metrics
- ✅ **Model Loading**: All components load without CUDA errors
- ✅ **Device Utilization**: MPS GPU acceleration active
- ✅ **Audio Generation**: High-quality speech synthesis
- ✅ **Performance**: Responsive interface with chunked processing
- ✅ **Stability**: Reliable operation across different text inputs

## Future Enhancements
- **MLX Integration**: Native Apple Silicon optimization (separate implementation available)
- **Batch Processing**: Multiple text inputs simultaneously
- **Voice Library**: Pre-configured voice presets
- **API Endpoint**: REST API for programmatic access

---

**Note**: This adaptation maintains full compatibility with the original Chatterbox-TTS functionality while adding Apple Silicon optimizations. The core model weights and inference logic remain unchanged, ensuring consistent audio quality across platforms.