File size: 6,941 Bytes
3836582
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
# Chatterbox-TTS Apple Silicon Adaptation Guide

## Overview
This document summarizes the key adaptations made to run Chatterbox-TTS successfully on Apple Silicon (M1/M2/M3) MacBooks with MPS GPU acceleration. The original Chatterbox-TTS models were trained on CUDA devices, requiring specific device mapping strategies for Apple Silicon compatibility.

## βœ… Confirmed Working Status
- **App Status**: βœ… Running successfully on port 7861
- **Device**: MPS (Apple Silicon GPU) 
- **Model Loading**: βœ… All components loaded successfully
- **Performance**: Optimized with text chunking for longer inputs

## Key Technical Challenges & Solutions

### 1. CUDA β†’ MPS Device Mapping
**Problem**: Chatterbox-TTS models were saved with CUDA device references, causing loading failures on MPS-only systems.

**Solution**: Comprehensive `torch.load` monkey patch:
```python
# Monkey patch torch.load to handle device mapping for Chatterbox-TTS
original_torch_load = torch.load

def patched_torch_load(f, map_location=None, **kwargs):
    """Patched torch.load that automatically maps CUDA tensors to CPU/MPS"""
    if map_location is None:
        map_location = 'cpu'  # Default to CPU for compatibility
    logger.info(f"πŸ”§ Loading with map_location={map_location}")
    return original_torch_load(f, map_location=map_location, **kwargs)

# Apply the patch immediately after torch import
torch.load = patched_torch_load
```

### 2. Device Detection & Model Placement
**Implementation**: Intelligent device detection with fallback hierarchy:
```python
# Device detection with MPS support
if torch.backends.mps.is_available():
    DEVICE = "mps"
    logger.info("πŸš€ Running on MPS (Apple Silicon GPU)")
elif torch.cuda.is_available():
    DEVICE = "cuda" 
    logger.info("πŸš€ Running on CUDA GPU")
else:
    DEVICE = "cpu"
    logger.info("πŸš€ Running on CPU")
```

### 3. Safe Model Loading Strategy
**Approach**: Load to CPU first, then move to target device:
```python
# Load model to CPU first to avoid device issues
MODEL = ChatterboxTTS.from_pretrained("cpu")

# Move to target device if not CPU
if DEVICE != "cpu":
    logger.info(f"Moving model components to {DEVICE}...")
    if hasattr(MODEL, 't3'):
        MODEL.t3 = MODEL.t3.to(DEVICE)
    if hasattr(MODEL, 's3gen'):
        MODEL.s3gen = MODEL.s3gen.to(DEVICE)
    if hasattr(MODEL, 've'):
        MODEL.ve = MODEL.ve.to(DEVICE)
    MODEL.device = DEVICE
```

### 4. Text Chunking for Performance
**Enhancement**: Intelligent text splitting at sentence boundaries:
```python
def split_text_into_chunks(text: str, max_chars: int = 250) -> List[str]:
    """Split text into chunks at sentence boundaries, respecting max character limit."""
    if len(text) <= max_chars:
        return [text]
    
    # Split by sentences first (period, exclamation, question mark)
    sentences = re.split(r'(?<=[.!?])\s+', text)
    # ... chunking logic
```

## Implementation Architecture

### Core Components
1. **Device Compatibility Layer**: Handles CUDA→MPS mapping
2. **Model Management**: Safe loading and device placement
3. **Text Processing**: Intelligent chunking for longer texts
4. **Gradio Interface**: Modern UI with progress tracking

### File Structure
```
app.py                 # Main application (PyTorch + MPS)
requirements.txt       # Dependencies with MPS-compatible PyTorch
README.md             # Setup and usage instructions
```

## Dependencies & Installation

### Key Requirements
```txt
torch>=2.0.0           # MPS support requires PyTorch 2.0+
torchaudio>=2.0.0      # Audio processing
chatterbox-tts         # Core TTS model
gradio>=4.0.0          # Web interface
numpy>=1.21.0          # Numerical operations
```

### Installation Commands
```bash
# Create virtual environment
python3.11 -m venv .venv
source .venv/bin/activate

# Install PyTorch with MPS support
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu

# Install remaining dependencies
pip install -r requirements.txt
```

## Performance Optimizations

### 1. MPS GPU Acceleration
- **Benefit**: ~2-3x faster inference vs CPU-only
- **Memory**: Efficient GPU memory usage on Apple Silicon
- **Compatibility**: Works across M1, M2, M3 chip families

### 2. Text Chunking Strategy
- **Smart Splitting**: Preserves sentence boundaries
- **Fallback Logic**: Handles long sentences gracefully
- **User Experience**: Progress tracking for long texts

### 3. Model Caching
- **Singleton Pattern**: Model loaded once, reused across requests
- **Device Persistence**: Maintains GPU placement between calls
- **Memory Efficiency**: Avoids repeated model loading

## Gradio Interface Features

### User Interface
- **Modern Design**: Clean, intuitive layout
- **Real-time Feedback**: Loading states and progress bars
- **Error Handling**: Graceful failure with helpful messages
- **Audio Preview**: Inline audio player for generated speech

### Parameters
- **Voice Cloning**: Reference audio upload support
- **Quality Control**: Temperature, exaggeration, CFG weight
- **Reproducibility**: Seed control for consistent outputs
- **Chunking**: Configurable text chunk size

## Deployment Notes

### Port Configuration
- **Default Port**: 7861 (configurable)
- **Conflict Resolution**: Automatic port detection
- **Local Access**: http://localhost:7861

### System Requirements
- **macOS**: 12.0+ (Monterey or later)
- **Python**: 3.9-3.11 (tested on 3.11)
- **RAM**: 8GB minimum, 16GB recommended
- **Storage**: ~5GB for models and dependencies

## Troubleshooting

### Common Issues
1. **Port Conflicts**: Use `GRADIO_SERVER_PORT` environment variable
2. **Memory Issues**: Reduce chunk size or use CPU fallback
3. **Audio Dependencies**: Install ffmpeg if audio processing fails
4. **Model Loading**: Check internet connection for initial download

### Debug Commands
```bash
# Check MPS availability
python -c "import torch; print(f'MPS available: {torch.backends.mps.is_available()}')"

# Monitor GPU usage
sudo powermetrics --samplers gpu_power -n 1

# Check port usage
lsof -i :7861
```

## Success Metrics
- βœ… **Model Loading**: All components load without CUDA errors
- βœ… **Device Utilization**: MPS GPU acceleration active
- βœ… **Audio Generation**: High-quality speech synthesis
- βœ… **Performance**: Responsive interface with chunked processing
- βœ… **Stability**: Reliable operation across different text inputs

## Future Enhancements
- **MLX Integration**: Native Apple Silicon optimization (separate implementation available)
- **Batch Processing**: Multiple text inputs simultaneously
- **Voice Library**: Pre-configured voice presets
- **API Endpoint**: REST API for programmatic access

---

**Note**: This adaptation maintains full compatibility with the original Chatterbox-TTS functionality while adding Apple Silicon optimizations. The core model weights and inference logic remain unchanged, ensuring consistent audio quality across platforms.