---
language: en
license: mit
library_name: sklearn
tags:
- trading
- finance
- gold
- xauusd
- forex
- algorithmic-trading
- smart-money-concepts
- smc
- xgboost
- machine-learning
- backtesting
- technical-analysis
- multi-timeframe
- intraday-trading
- high-frequency-trading
datasets:
- yahoo-finance-gc-f
metrics:
- accuracy
- precision
- recall
- f1
model-index:
- name: xauusd-trading-ai-smc-daily
  results:
  - task:
      type: binary-classification
      name: Daily Price Direction Prediction
    dataset:
      type: yahoo-finance-gc-f
      name: Gold Futures (GC=F)
    metrics:
    - type: accuracy
      value: 80.3
      name: Accuracy
    - type: precision
      value: 71
      name: Precision (Class 1)
    - type: recall
      value: 81
      name: Recall (Class 1)
    - type: f1
      value: 76
      name: F1-Score
- name: xauusd-trading-ai-smc-15m
  results:
  - task:
      type: binary-classification
      name: 15-Minute Price Direction Prediction
    dataset:
      type: yahoo-finance-gc-f
      name: Gold Futures (GC=F)
    metrics:
    - type: accuracy
      value: 77.0
      name: Accuracy
    - type: precision
      value: 76
      name: Precision (Class 1)
    - type: recall
      value: 77
      name: Recall (Class 1)
    - type: f1
      value: 76
      name: F1-Score
---
# XAUUSD Multi-Timeframe Trading AI Model
## Files Included
### Core Models
- `trading_model.pkl` - Original daily timeframe XGBoost model (85.4% win rate)
- `trading_model_15m.pkl` - 15-minute intraday model (77% validation accuracy)
- `trading_model_1m.pkl` - 1-minute intraday model (partially trained)
- `trading_model_30m.pkl` - 30-minute intraday model (ready for training)
### Documentation
- `README.md` - This comprehensive model card
- `XAUUSD_Trading_AI_Paper.md` - **Research paper with academic structure, literature review, and methodology**
- `XAUUSD_Trading_AI_Paper.docx` - **Word document version (professional format)**
- `XAUUSD_Trading_AI_Paper.html` - **HTML web version (styled and readable)**
- `XAUUSD_Trading_AI_Paper.tex` - **LaTeX source (for academic publishing)**
- `XAUUSD_Trading_AI_Technical_Whitepaper.md` - **Technical whitepaper with mathematical formulations and implementation details**
- `XAUUSD_Trading_AI_Technical_Whitepaper.docx` - **Word document version (professional format)**
- `XAUUSD_Trading_AI_Technical_Whitepaper.html` - **HTML web version (styled and readable)**
- `XAUUSD_Trading_AI_Technical_Whitepaper.tex` - **LaTeX source (for academic publishing)**
### Performance & Analysis
- `backtest_report.csv` - Daily model yearly backtesting performance results
- `backtest_multi_timeframe_results.csv` - Intraday model backtesting results
- `feature_importance_15m.csv` - 15-minute model feature importance analysis
### Scripts & Tools
- `train_multi_timeframe.py` - Multi-timeframe model training script
- `backtest_multi_timeframe.py` - Intraday model backtesting framework
- `multi_timeframe_summary.py` - Comprehensive performance analysis tool
- `fetch_data.py` - Enhanced data acquisition for multiple timeframes
### Dataset Files
- **Daily Data**: `daily_data.csv`, `processed_daily_data.csv`, `smc_features_dataset.csv`, `X_features.csv`, `y_target.csv`
- **Intraday Data**: `1m_data.csv` (5,204 samples), `15m_data.csv` (3,814 samples), `30m_data.csv` (1,910 samples)
## Recent Enhancements (v2.0)
### Visual Documentation
- **Dataset Flow Diagram**: Complete data processing pipeline from raw Yahoo Finance data to model training
- **Model Architecture Diagram**: XGBoost ensemble structure with decision flow visualization
- **Buy/Sell Workflow Diagram**: End-to-end trading execution process with risk management
### Advanced Formulas & Techniques
- **Position Sizing Formula**: Risk-adjusted position calculation with Kelly Criterion adaptation
- **Risk Metrics**: Sharpe Ratio, Sortino Ratio, Calmar Ratio, and Maximum Drawdown calculations
- **SMC Techniques**: Advanced Order Block detection with volume profile analysis
- **Dynamic Thresholds**: Market volatility-based prediction threshold adjustment
- **Ensemble Signals**: Multi-source signal confirmation (ML + Technical + SMC)
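As a rough illustration of the position-sizing item above, the classic Kelly fraction for a strategy with win probability `p` and average win/loss ratio `b` is `f* = p - (1 - p) / b`. The sketch below scales and caps that value. The function name and the `fraction` and `max_risk` parameters are illustrative assumptions, not the adaptation described in the whitepaper.

```python
def kelly_fraction(win_rate: float, win_loss_ratio: float,
                   fraction: float = 0.5, max_risk: float = 0.02) -> float:
    """Return the share of equity to risk on one trade.

    Classic Kelly: f* = p - (1 - p) / b, where p is the win probability and
    b is the average win divided by the average loss. `fraction` scales the
    result down (0.5 = half-Kelly) and `max_risk` caps it; both defaults are
    illustrative, not values taken from this project.
    """
    if not 0.0 <= win_rate <= 1.0:
        raise ValueError("win_rate must be between 0 and 1")
    if win_loss_ratio <= 0:
        raise ValueError("win_loss_ratio must be positive")
    full_kelly = win_rate - (1.0 - win_rate) / win_loss_ratio
    # A negative Kelly fraction means a negative edge: risk nothing.
    return max(0.0, min(full_kelly * fraction, max_risk))
```

Scaling full Kelly down is a common way to reduce drawdowns when the win rate is estimated from a small number of trades.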
### Performance Analytics
- **Monthly Performance Heatmap**: Visual representation of returns across all test years
- **Risk-Return Scatter Plot**: Performance comparison across different risk levels
- **Market Regime Analysis**: Performance breakdown by trending vs sideways markets
### Documentation Updates
- **Enhanced Technical Whitepaper**: Added comprehensive visual diagrams and mathematical formulations
- **Enhanced Research Paper**: Added Mermaid diagrams, advanced algorithms, and detailed performance analysis
- **Professional Exports**: Both documents now available in HTML, Word, and LaTeX formats
## Multi-Timeframe Trading System (Latest Addition)
### Overview
The system now supports intraday trading across multiple timeframes, enabling higher-frequency strategies while keeping the same SMC and technical-indicator feature approach used by the daily model.
### Supported Timeframes
- **1-minute (1m)**: Ultra-short-term scalping opportunities
- **15-minute (15m)**: Short-term swing trading
- **30-minute (30m)**: Medium-term position trading
- **Daily (1d)**: Original baseline model (85.4% win rate)
### Data Acquisition
- **Source**: Yahoo Finance API with enhanced intraday data fetching
- **Limitations**: Historical intraday data restricted (recent periods only)
- **Current Datasets**:
  - 1m: 5,204 samples (7 days of recent data)
  - 15m: 3,814 samples (60 days of recent data)
  - 30m: 1,910 samples (60 days of recent data)
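A minimal sketch of how the intraday datasets above could be fetched, assuming the third-party `yfinance` package is used (the card only says "Yahoo Finance API"; the actual `fetch_data.py` may work differently). The `INTRADAY_PERIODS` mapping and the `fetch_gold` helper are hypothetical names that mirror the lookback windows listed above.

```python
# Lookback window per interval, mirroring the dataset list above (assumed mapping).
INTRADAY_PERIODS = {"1m": "7d", "15m": "60d", "30m": "60d"}


def fetch_gold(interval: str, ticker: str = "GC=F"):
    """Download recent gold futures bars for one intraday interval."""
    if interval not in INTRADAY_PERIODS:
        raise ValueError(f"unsupported interval: {interval}")

    # Imported lazily so this module can be loaded without yfinance installed.
    import yfinance as yf

    data = yf.download(
        ticker,
        period=INTRADAY_PERIODS[interval],
        interval=interval,
        progress=False,
    )
    # Drop rows with missing prices before feature engineering.
    return data.dropna()
```

For example, `fetch_gold("15m")` would return a DataFrame of 15-minute bars covering roughly the last 60 days.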
### Model Architecture
- **Base Algorithm**: XGBoost Classifier (same as daily model)
- **Features**: 23 features (technical indicators + SMC elements)
- **Training**: Grid search hyperparameter optimization
- **Validation**: 80/20 train/test split with stratification
### Training Results
- **15m Model**: Successfully trained with 77% validation accuracy
- **Feature Importance**: Technical indicators dominant (SMA_50, EMA_12, BB_lower)
- **Training Status**: 1m model partially trained, 30m model interrupted (available for completion)
### Backtesting Performance
- **Framework**: Backtrader with realistic commission modeling
- **Risk Management**: Fixed stake sizing ($1,000 per trade)
- **15m Results**: -0.83% return from a single trade
- **Analysis**: The 15m model traded very conservatively; with only one trade, this result is not statistically meaningful
### Key Insights
- ✅ Successfully scaled daily model architecture to intraday timeframes
- ✅ Technical indicators remain most important across all timeframes
- ✅ Conservative prediction thresholds prevent excessive trading
- ⚠️ Limited historical data affects backtesting statistical significance
- ⚠️ Yahoo Finance API constraints limit comprehensive validation
### Files Added
- `train_multi_timeframe.py` - Multi-timeframe model training script
- `backtest_multi_timeframe.py` - Intraday model backtesting framework
- `multi_timeframe_summary.py` - Comprehensive performance analysis
- `trading_model_15m.pkl` - Trained 15-minute model
- `feature_importance_15m.csv` - Feature importance analysis
- `backtest_multi_timeframe_results.csv` - Backtesting performance data
### Next Steps
1. Complete 30m model training
2. Implement walk-forward optimization
3. Add extended historical data sources
4. Deploy best performing intraday model
5. Compare intraday vs daily performance
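For step 2, one possible shape of a walk-forward split is sketched below: train on a window, test on the bars that immediately follow, then slide both windows forward. The function name and window sizes are illustrative assumptions; the project does not yet specify its walk-forward scheme.

```python
from typing import Iterator, Tuple


def walk_forward_splits(n_samples: int, train_size: int, test_size: int
                        ) -> Iterator[Tuple[range, range]]:
    """Yield (train_indices, test_indices) windows that roll forward in time."""
    if train_size <= 0 or test_size <= 0:
        raise ValueError("train_size and test_size must be positive")
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # slide forward by one test window


for train_idx, test_idx in walk_forward_splits(10, train_size=4, test_size=2):
    print(list(train_idx), list(test_idx))
```

Because every test window lies strictly after its training window, the model is never evaluated on bars older than the ones it was fitted on.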
## Model Description
This is a machine-learning trading model for XAUUSD (gold vs. US dollar), trained on gold futures (GC=F) data with Smart Money Concepts (SMC) strategy elements. It predicts price direction 5 days ahead and generates trading signals; the daily model reached an 85.4% win rate in the 2015-2020 backtest.
### Key Features
- **Asset**: XAUUSD (Gold Futures)
- **Strategy**: Smart Money Concepts (SMC) with technical indicators
- **Prediction Horizon**: 5-day ahead price direction
- **Model Type**: XGBoost Classifier
## Romeo (V5) — Ensemble model
Romeo (codename V5) is the latest ensemble model combining tree-based learners (XGBoost / LightGBM) and an optional Keras head. The artifacts live in `models_romeo/` and include a canonical feature list used by the backtester to align unseen data.
Artifacts
- `models_romeo/trading_model_romeo_daily.pkl` — ensemble artifact (joblib) with `models`, `weights`, and `features` keys.
- `models_romeo/romeo_keras_daily.keras` — optional Keras model file when included in training.
- `models_romeo/MODEL_CARD.md` — this model's card with evaluation and transparency notes.
Evaluation (selected run on unseen daily data)
- Initial capital: 100
- Final capital: 484.8199
- CAGR: 4.44%
- Annual volatility: 41.18%
- Sharpe: 0.3119
- Max Drawdown: -47.66%
- Total trades: 3610
- Win rate: 49.47%
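For readers who want to recompute figures like the ones above from an equity curve, here is a minimal sketch. It assumes 252 periods per year, a zero risk-free rate, and the sample standard deviation; the backtester's own definitions may differ, so the numbers need not match exactly.

```python
import math
from typing import Sequence


def performance_summary(equity: Sequence[float], periods_per_year: int = 252) -> dict:
    """Compute CAGR, annualized volatility, Sharpe ratio, and max drawdown."""
    if len(equity) < 3:
        raise ValueError("need at least three equity points")
    returns = [equity[i] / equity[i - 1] - 1.0 for i in range(1, len(equity))]
    years = len(returns) / periods_per_year
    cagr = (equity[-1] / equity[0]) ** (1.0 / years) - 1.0

    mean = sum(returns) / len(returns)
    variance = sum((r - mean) ** 2 for r in returns) / (len(returns) - 1)
    volatility = math.sqrt(variance) * math.sqrt(periods_per_year)
    # Sharpe with a zero risk-free rate (an assumption of this sketch).
    sharpe = mean * periods_per_year / volatility if volatility > 0 else float("nan")

    peak, max_drawdown = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        max_drawdown = min(max_drawdown, value / peak - 1.0)  # always <= 0
    return {"cagr": cagr, "volatility": volatility,
            "sharpe": sharpe, "max_drawdown": max_drawdown}
```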
Uploading to Hugging Face
-------------------------
There is a helper script to upload the model artifacts to Hugging Face Hub:
1. Install dependencies:
```bash
pip install huggingface_hub
```
2. Set your HF token in the environment (Windows cmd.exe):
```cmd
set HF_TOKEN=hf_YourTokenHere
```
3. Upload:
```cmd
python v5\upload_model_v5_to_hf.py --repo-name your-username/romeo-v5 --model-dir models_romeo
```
The script will create the repo (if it doesn't exist) and upload all files from `models_romeo/`.
Usage example
-------------
Load the artifact and run predictions:
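```python
import joblib

# Load the ensemble artifact: a dict with `models`, `weights`, and `features` keys
artifact = joblib.load('models_romeo/trading_model_romeo_daily.pkl')
features = artifact['features']  # canonical feature list used to align unseen data

# Prepare X so that its columns match `features`, in the same order
# Model usage depends on the artifact['models'] layout; check MODEL_CARD.md for details
```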
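One way the ensemble could be combined is a weighted average of each model's class-1 probability. The sketch below assumes `artifact['models']` is a dict of estimators exposing a scikit-learn style `predict_proba`, and `artifact['weights']` is a dict with the same keys. Both are guesses about the layout, and the optional Keras head would need separate handling, so check `MODEL_CARD.md` before relying on it.

```python
def ensemble_up_probability(models: dict, weights: dict, X) -> list:
    """Weighted average of each model's P(class 1) for every row of X.

    Assumes `models` maps a name to an estimator with `predict_proba`, and
    `weights` maps the same names to non-negative floats. The real artifact
    layout may differ.
    """
    total = sum(weights[name] for name in models)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    blended = None
    for name, model in models.items():
        probs = [row[1] for row in model.predict_proba(X)]  # column 1 = P(class 1)
        scaled = [weights[name] / total * p for p in probs]
        blended = scaled if blended is None else [a + b for a, b in zip(blended, scaled)]
    return blended
```

With the assumed layout, a call would look like `ensemble_up_probability(artifact['models'], artifact['weights'], X)`.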
Notes & Next Steps
------------------
- Position sizing is simplified in the backtester; consider implementing fixed-risk sizing before live use.
- Consider re-running the robustness scan using the M2M metric as primary evaluation (recommended).
- **Daily XGBoost model, for comparison**: 80.3% accuracy on test data and an 85.4% win rate in backtesting (these figures describe the original daily model, not Romeo)
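The fixed-risk sizing suggested in the notes above can be sketched as follows: choose the position size so that hitting the stop loses a fixed fraction of equity. The function and parameter names are illustrative and are not part of the backtester.

```python
def fixed_risk_position_size(equity: float, risk_fraction: float,
                             entry_price: float, stop_price: float) -> float:
    """Units to trade so that hitting the stop loses `risk_fraction` of equity."""
    if not 0 < risk_fraction < 1:
        raise ValueError("risk_fraction must be between 0 and 1")
    risk_per_unit = abs(entry_price - stop_price)
    if risk_per_unit == 0:
        raise ValueError("entry and stop prices must differ")
    return equity * risk_fraction / risk_per_unit


# Risking 1% of a 10,000 account with a 20-point stop gives a size of 5 units.
print(fixed_risk_position_size(10_000, 0.01, entry_price=2000.0, stop_price=1980.0))
```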
## Intended Use
This model is designed for:
- Educational purposes in algorithmic trading
- Research on SMC strategies
- Backtesting trading strategies
- Understanding ML applications in financial markets
**⚠️ Warning**: This is not financial advice. Trading involves risk of loss. Use at your own discretion.
## Training Data
- **Source**: Yahoo Finance (GC=F - Gold Futures)
- **Period**: 2000-2020 (excluding recent months for efficiency)
- **Features**: 23 features including:
  - Price data (Open, High, Low, Close, Volume)
  - Technical indicators (SMA, EMA, RSI, MACD, Bollinger Bands)
  - SMC features (Fair Value Gaps, Order Blocks, Recovery patterns)
  - Lag features (Close prices from previous days)
- **Target**: Binary classification (1 if price rises in 5 days, 0 otherwise)
- **Dataset Size**: 8,816 samples
- **Class Distribution**: 54% down, 46% up (balanced with scale_pos_weight)
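The target definition above can be sketched as below. It assumes "5 days" means five bars ahead and that "rises" is a strict comparison of closing prices; the helper name is hypothetical.

```python
from typing import List, Optional, Sequence


def make_direction_target(close: Sequence[float], horizon: int = 5) -> List[Optional[int]]:
    """Label each bar 1 if the close `horizon` bars later is higher, else 0.

    The last `horizon` bars have no future close, so they are labeled None
    and should be dropped before training.
    """
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    labels: List[Optional[int]] = []
    for i in range(len(close)):
        if i + horizon < len(close):
            labels.append(1 if close[i + horizon] > close[i] else 0)
        else:
            labels.append(None)
    return labels
```

Dropping the unlabeled tail matters: filling those rows with 0 would quietly bias the class distribution toward "down".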
## Performance Metrics
### Model Performance
- **Accuracy**: 80.3%
- **Precision (Class 1)**: 71%
- **Recall (Class 1)**: 81%
- **F1-Score**: 76%
### Backtesting Results (2015-2020)
- **Overall Win Rate**: 85.4%
- **Total Return**: 18.2%
- **Sharpe Ratio**: 1.41
- **Yearly Win Rates**:
  - 2015: 62.5%
  - 2016: 100.0%
  - 2017: 100.0%
  - 2018: 72.7%
  - 2019: 76.9%
  - 2020: 94.1%
## Limitations
- Trained on historical data only (2000-2020)
- May not perform well in unprecedented market conditions
- Requires proper risk management
- No consideration of transaction costs, slippage, or market impact
- Model predictions are probabilistic, not guaranteed
## Usage
### Prerequisites
```bash
pip install joblib scikit-learn pandas numpy xgboost
```
### Loading the Model
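```python
import joblib
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Load the trained daily model (the xgboost package must be installed to unpickle it)
model = joblib.load('trading_model.pkl')

# Load scalers (you need to recreate or save them)
# ... preprocessing code ...

# Prepare features. `prepare_features` is a placeholder for your own preprocessing;
# it must return the 23 columns listed under "Features Required", in that order.
features = prepare_features(your_data)

prediction = model.predict(features)         # class labels: 1 = price rises, 0 = otherwise
probability = model.predict_proba(features)  # column 1 holds P(class 1)
```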
### Features Required
The model expects 23 features in this order:
1. Close
2. High
3. Low
4. Open
5. Volume
6. SMA_20
7. SMA_50
8. EMA_12
9. EMA_26
10. RSI
11. MACD
12. MACD_signal
13. MACD_hist
14. BB_upper
15. BB_middle
16. BB_lower
17. FVG_Size
18. FVG_Type_Encoded
19. OB_Type_Encoded
20. Recovery_Type_Encoded
21. Close_lag1
22. Close_lag2
23. Close_lag3
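Since the column order matters, a small guard like the one below can catch missing or misordered features before they reach the model. `FEATURE_ORDER` copies the list above; `to_feature_row` is a hypothetical helper, not part of the repository.

```python
FEATURE_ORDER = [
    "Close", "High", "Low", "Open", "Volume",
    "SMA_20", "SMA_50", "EMA_12", "EMA_26", "RSI",
    "MACD", "MACD_signal", "MACD_hist",
    "BB_upper", "BB_middle", "BB_lower",
    "FVG_Size", "FVG_Type_Encoded", "OB_Type_Encoded", "Recovery_Type_Encoded",
    "Close_lag1", "Close_lag2", "Close_lag3",
]


def to_feature_row(values: dict) -> list:
    """Return the values in FEATURE_ORDER, failing loudly on missing features."""
    missing = [name for name in FEATURE_ORDER if name not in values]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [float(values[name]) for name in FEATURE_ORDER]
```

With a pandas DataFrame, `df[FEATURE_ORDER]` gives the same ordering and also raises `KeyError` if a column is absent.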
## Training Details
- **Algorithm**: XGBoost Classifier
- **Hyperparameters**:
  - n_estimators: 200
  - max_depth: 7
  - learning_rate: 0.2
  - scale_pos_weight: 1.17 (for class balancing)
- **Cross-validation**: 3-fold
- **Optimization**: Grid search on hyperparameters
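The `scale_pos_weight` value is consistent with the 54% / 46% class split reported under Training Data, using the usual negatives-over-positives ratio. A short sketch (the helper name is illustrative):

```python
def scale_pos_weight(n_negative: int, n_positive: int) -> float:
    """XGBoost's usual class-balance ratio: negatives divided by positives."""
    if n_positive <= 0:
        raise ValueError("need at least one positive sample")
    return n_negative / n_positive


# Hyperparameters as listed above; the 54/46 split comes from the Training Data section.
params = {
    "n_estimators": 200,
    "max_depth": 7,
    "learning_rate": 0.2,
    "scale_pos_weight": round(scale_pos_weight(54, 46), 2),
}
print(params["scale_pos_weight"])  # 1.17
```

These keys can be passed directly as keyword arguments, for example `xgboost.XGBClassifier(**params)`.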
## SMC Strategy Elements
The model incorporates Smart Money Concepts:
- **Fair Value Gaps (FVG)**: Price imbalances between candles
- **Order Blocks (OB)**: Areas of significant buying/selling
- **Recovery Patterns**: Pullbacks in trending markets
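To make the Fair Value Gap idea concrete, the sketch below uses the common three-candle definition: a bullish gap exists when the third candle's low sits above the first candle's high, and a bearish gap when the third candle's high sits below the first candle's low. This is an assumption about how `FVG_Size` and `FVG_Type_Encoded` are derived; the project's exact rules may differ.

```python
from typing import List, Sequence, Tuple


def fair_value_gaps(high: Sequence[float], low: Sequence[float]
                    ) -> List[Tuple[int, str, float]]:
    """Find three-candle fair value gaps.

    Returns (index_of_third_candle, "bullish" or "bearish", gap_size) tuples.
    """
    if len(high) != len(low):
        raise ValueError("high and low must have the same length")
    gaps = []
    for i in range(2, len(high)):
        if low[i] > high[i - 2]:
            # Price jumped up and left an untraded zone between candles 1 and 3.
            gaps.append((i, "bullish", low[i] - high[i - 2]))
        elif high[i] < low[i - 2]:
            # Price dropped and left an untraded zone between candles 1 and 3.
            gaps.append((i, "bearish", low[i - 2] - high[i]))
    return gaps
```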
## Upload to Hugging Face
To share this model on Hugging Face:
1. Create a Hugging Face account at https://huggingface.co/join
2. Generate an access token at https://huggingface.co/settings/tokens with "Write" permissions
3. Test your token: `python test_token.py YOUR_TOKEN`
4. Upload: `python upload_to_hf.py YOUR_TOKEN`
The script will upload:
- `trading_model.pkl` - The trained XGBoost model
- `README.md` - This model card with metadata
- All dataset files (CSV format)
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{xauusd-trading-ai,
title={XAUUSD Trading AI Model with SMC Strategy},
author={AI Trading System},
year={2025},
url={https://huggingface.co/JonusNattapong/xauusd-trading-ai-smc}
}
```
### Academic Paper
For the complete academic research paper with methodology, results, and analysis:
**arXiv Paper**: [XAUUSD Trading AI: A Machine Learning Approach Using Smart Money Concepts](https://arxiv.org/abs/XXXX.XXXXX)
## License
This model is released under the MIT License. See LICENSE file for details.
## Contact
For questions or issues, please open an issue on the Hugging Face repository.