# Model Name Updates in five_step_analysis.ipynb

## Changes Made

Updated the notebook to display cleaner, more readable model names in all plots while maintaining the correct cost lookups.

## Before → After Transformations

| Original Name | Display Name (with Version Date) |
|---------------|----------------------------------|
| `anthropic/claude-haiku:latest` | **Claude Haiku 4.5 (2025-10-01)** |
| `anthropic/claude-opus:latest` | **Claude Opus 4.1 (2025-08-05)** |
| `anthropic/claude-sonnet:latest` | **Claude Sonnet 4.5 (2025-09-29)** |
| `claude-3-5-haiku-latest` | **Claude 3.5 Haiku (2024-10-22)** |
| `google/gemini:latest` | **Gemini 2.5 Pro** |
| `google/gemini-flash` | **Gemini Flash** |
| `gemini-2.0-flash-lite` | **Gemini 2.0 Flash Lite** |
| `openai/o:latest` | **O3 (2025-04-16, Azure)** |
| `openai/gpt-5` | **GPT-5 (2025-08-07)** |
| `openai/gpt-5-mini` | **GPT-5 Mini** |
| `openai/o3` | **O3** |
| `openai/o3-mini` | **O3 Mini** |
| `openai/o4-mini` | **O4 Mini** |
| `xai/grok:latest` | **Grok-3** |
| `xai/grok-mini` | **Grok Mini** |
| `xai/grok-code-fast-1` | **Grok Code Fast 1** |
| `aws/llama-4-maverick` | **Llama-4 Maverick** |
| `aws/llama-4-scout` | **Llama-4 Scout** |
| `gpt-oss-120b` | **GPT-OSS-120B** |
| `gpt-5-codex` | **GPT-5 Codex** |
| `deepseek-r1` | **DeepSeek-R1** |
| `gcp/qwen-3` | **Qwen-3** |

**Note:** Version dates (e.g., 2025-10-01) reflect the actual underlying model versions discovered through CBORG API testing on October 29, 2025.

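
The table above amounts to a dictionary lookup. The following is a minimal sketch, not the notebook's actual code: it assumes `MODEL_NAME_MAPPING` is a flat `dict` and that `resolve_model_name()` returns the raw name unchanged when no mapping exists (that fallback is an assumption). Only a few rows of the table are reproduced.

```python
# Illustrative subset of the table above; the notebook's dictionary may differ.
MODEL_NAME_MAPPING = {
    "anthropic/claude-haiku:latest": "Claude Haiku 4.5 (2025-10-01)",
    "anthropic/claude-sonnet:latest": "Claude Sonnet 4.5 (2025-09-29)",
    "openai/o:latest": "O3 (2025-04-16, Azure)",
    "openai/gpt-5": "GPT-5 (2025-08-07)",
    "xai/grok:latest": "Grok-3",
}


def resolve_model_name(raw_name: str) -> str:
    """Return the display name for a raw model string.

    Assumption: names without a mapping pass through unchanged.
    """
    return MODEL_NAME_MAPPING.get(raw_name, raw_name)


print(resolve_model_name("xai/grok:latest"))
print(resolve_model_name("some/unlisted-model"))
```

Falling back to the raw name keeps plots working when a new model appears in the data before the mapping is updated.
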
## Technical Implementation

### What Changed

- Added `MODEL_NAME_MAPPING` dictionary based on CBORG API testing results
- Added `resolve_model_name()` function to convert aliases to display names
- Updated `create_pair_label()` to use resolved names instead of raw strings

### What Stayed the Same

- Cost tables still use original model names (correct behavior)
- Data loading and filtering logic unchanged
- Plot generation code unchanged
- Cost calculations work correctly with original column values

### Key Design Decision

The mapping only affects the `pair` column used for display in plots. The original `supervisor` and `coder` columns remain unchanged, so cost lookups keep matching the keys in the cost tables:

```python
# Cost lookup uses original columns (correct)
sup_model = row['supervisor']  # e.g., "anthropic/claude-haiku:latest"
sup_icost = input_cost.get(sup_model, 0)  # Finds correct price

# Display uses mapped pair column
pair_name = row['pair']  # e.g., "Claude Haiku 4.5"
```

## Benefits

1. **Clearer plot titles**: "Claude Haiku 4.5" instead of "anthropic/claude-haiku:latest"
2. **Easier comparison**: Names highlight the actual model versions
3. **Based on real data**: Names reflect actual underlying models from CBORG API testing
4. **Maintains correctness**: Cost calculations still work properly with original names

## Example Output

Before:

- `anthropic/claude-sonnet:latest`
- `xai/grok:latest`
- `openai/o:latest`
- `openai/gpt-5`

After (with version dates):

- `Claude Sonnet 4.5 (2025-09-29)`
- `Grok-3`
- `O3 (2025-04-16, Azure)`
- `GPT-5 (2025-08-07)`

The new names are much more readable in plot titles and legends, and the version dates show exactly which model snapshot was used.

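
To illustrate why the display names are kept out of the `supervisor` and `coder` columns, here is a small sketch using a plain dict in place of a DataFrame row. The signature and the `" / "` separator of `create_pair_label()` are assumptions, and the prices are placeholders, not real rates.

```python
# Hypothetical subset of the mapping, for illustration only.
MODEL_NAME_MAPPING = {
    "anthropic/claude-haiku:latest": "Claude Haiku 4.5 (2025-10-01)",
    "openai/gpt-5": "GPT-5 (2025-08-07)",
}

# Placeholder prices keyed by the ORIGINAL model names (not real rates).
input_cost = {
    "anthropic/claude-haiku:latest": 1.0,
    "openai/gpt-5": 2.0,
}


def create_pair_label(supervisor: str, coder: str) -> str:
    """Build a display label from two raw names (assumed signature and separator)."""
    sup = MODEL_NAME_MAPPING.get(supervisor, supervisor)
    cod = MODEL_NAME_MAPPING.get(coder, coder)
    return f"{sup} / {cod}"


row = {"supervisor": "anthropic/claude-haiku:latest", "coder": "openai/gpt-5"}
row["pair"] = create_pair_label(row["supervisor"], row["coder"])

# The raw column is untouched, so the cost lookup still matches its key.
sup_icost = input_cost.get(row["supervisor"], 0)

# Looking up by display name would miss the key and silently fall back to 0.
wrong_icost = input_cost.get(MODEL_NAME_MAPPING[row["supervisor"]], 0)

print(row["pair"])
print(sup_icost, wrong_icost)
```

Because `.get(..., 0)` hides missing keys, overwriting the raw columns with display names would not raise an error; it would report zero cost instead, which is why the mapping is confined to `pair`.
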