---
language: en
license: mit
library_name: sklearn
tags:
- trading
- finance
- gold
- xauusd
- forex
- algorithmic-trading
- smart-money-concepts
- smc
- xgboost
- machine-learning
- backtesting
- technical-analysis
- multi-timeframe
- intraday-trading
- high-frequency-trading
datasets:
- yahoo-finance-gc-f
metrics:
- accuracy
- precision
- recall
- f1
model-index:
- name: xauusd-trading-ai-smc-daily
  results:
  - task:
      type: binary-classification
      name: Daily Price Direction Prediction
    dataset:
      type: yahoo-finance-gc-f
      name: Gold Futures (GC=F)
    metrics:
    - type: accuracy
      value: 80.3
      name: Accuracy
    - type: precision
      value: 71
      name: Precision (Class 1)
    - type: recall
      value: 81
      name: Recall (Class 1)
    - type: f1
      value: 76
      name: F1-Score
- name: xauusd-trading-ai-smc-15m
  results:
  - task:
      type: binary-classification
      name: 15-Minute Price Direction Prediction
    dataset:
      type: yahoo-finance-gc-f
      name: Gold Futures (GC=F)
    metrics:
    - type: accuracy
      value: 77.0
      name: Accuracy
    - type: precision
      value: 76
      name: Precision (Class 1)
    - type: recall
      value: 77
      name: Recall (Class 1)
    - type: f1
      value: 76
      name: F1-Score
---
# XAUUSD Multi-Timeframe Trading AI Model
## Files Included
### Core Models
- `trading_model.pkl` - Original daily timeframe XGBoost model (85.4% win rate)
- `trading_model_15m.pkl` - 15-minute intraday model (77% validation accuracy)
- `trading_model_1m.pkl` - 1-minute intraday model (partially trained)
- `trading_model_30m.pkl` - 30-minute intraday model (ready for training)
### Documentation
- `README.md` - This comprehensive model card
- `XAUUSD_Trading_AI_Paper.md` - **Research paper with academic structure, literature review, and methodology**
- `XAUUSD_Trading_AI_Paper.docx` - **Word document version (professional format)**
- `XAUUSD_Trading_AI_Paper.html` - **HTML web version (styled and readable)**
- `XAUUSD_Trading_AI_Paper.tex` - **LaTeX source (for academic publishing)**
- `XAUUSD_Trading_AI_Technical_Whitepaper.md` - **Technical whitepaper with mathematical formulations and implementation details**
- `XAUUSD_Trading_AI_Technical_Whitepaper.docx` - **Word document version (professional format)**
- `XAUUSD_Trading_AI_Technical_Whitepaper.html` - **HTML web version (styled and readable)**
- `XAUUSD_Trading_AI_Technical_Whitepaper.tex` - **LaTeX source (for academic publishing)**
### Performance & Analysis
- `backtest_report.csv` - Daily model yearly backtesting performance results
- `backtest_multi_timeframe_results.csv` - Intraday model backtesting results
- `feature_importance_15m.csv` - 15-minute model feature importance analysis
### Scripts & Tools
- `train_multi_timeframe.py` - Multi-timeframe model training script
- `backtest_multi_timeframe.py` - Intraday model backtesting framework
- `multi_timeframe_summary.py` - Comprehensive performance analysis tool
- `fetch_data.py` - Enhanced data acquisition for multiple timeframes
### Dataset Files
- **Daily Data**: `daily_data.csv`, `processed_daily_data.csv`, `smc_features_dataset.csv`, `X_features.csv`, `y_target.csv`
- **Intraday Data**: `1m_data.csv` (5,204 samples), `15m_data.csv` (3,814 samples), `30m_data.csv` (1,910 samples)
## Recent Enhancements (v2.0)
### Visual Documentation
- **Dataset Flow Diagram**: Complete data processing pipeline from raw Yahoo Finance data to model training
- **Model Architecture Diagram**: XGBoost ensemble structure with decision flow visualization
- **Buy/Sell Workflow Diagram**: End-to-end trading execution process with risk management
### Advanced Formulas & Techniques
- **Position Sizing Formula**: Risk-adjusted position calculation with Kelly Criterion adaptation
- **Risk Metrics**: Sharpe Ratio, Sortino Ratio, Calmar Ratio, and Maximum Drawdown calculations
- **SMC Techniques**: Advanced Order Block detection with volume profile analysis
- **Dynamic Thresholds**: Market volatility-based prediction threshold adjustment
- **Ensemble Signals**: Multi-source signal confirmation (ML + Technical + SMC)
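This card does not spell out the position sizing formula. As a rough illustration only, the sketch below uses the textbook Kelly fraction `f* = p - (1 - p) / b` (win probability `p`, average win/loss ratio `b`), scaled down and capped. The function names, the half-Kelly `kelly_scale`, and the 2% `max_fraction` cap are assumptions made for this example, not values taken from the model.

```python
def kelly_fraction(win_prob: float, win_loss_ratio: float) -> float:
    """Textbook Kelly fraction f* = p - (1 - p) / b, floored at zero."""
    if not 0.0 <= win_prob <= 1.0:
        raise ValueError("win_prob must be in [0, 1]")
    if win_loss_ratio <= 0:
        raise ValueError("win_loss_ratio must be positive")
    return max(0.0, win_prob - (1.0 - win_prob) / win_loss_ratio)


def position_size(equity: float, win_prob: float, win_loss_ratio: float,
                  kelly_scale: float = 0.5, max_fraction: float = 0.02) -> float:
    """Capital to risk on one trade: scaled Kelly, capped at max_fraction of equity.

    kelly_scale and max_fraction are hypothetical defaults for illustration.
    """
    fraction = kelly_scale * kelly_fraction(win_prob, win_loss_ratio)
    return equity * min(fraction, max_fraction)
```

With `equity=10_000`, `win_prob=0.6` and `win_loss_ratio=1.0`, the raw Kelly fraction is 0.2, half-Kelly is 0.1, and the cap reduces the amount at risk to 2% of equity.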
### Performance Analytics
- **Monthly Performance Heatmap**: Visual representation of returns across all test years
- **Risk-Return Scatter Plot**: Performance comparison across different risk levels
- **Market Regime Analysis**: Performance breakdown by trending vs sideways markets
### Documentation Updates
- **Enhanced Technical Whitepaper**: Added comprehensive visual diagrams and mathematical formulations
- **Enhanced Research Paper**: Added Mermaid diagrams, advanced algorithms, and detailed performance analysis
- **Professional Exports**: Both documents now available in HTML, Word, and LaTeX formats
## Multi-Timeframe Trading System (Latest Addition)
### Overview
The system has been extended to intraday trading across multiple timeframes. The intraday models reuse the daily model's approach (SMC features plus technical indicators) at higher trading frequencies.
### Supported Timeframes
- **1-minute (1m)**: Ultra-short-term scalping opportunities
- **15-minute (15m)**: Short-term swing trading
- **30-minute (30m)**: Medium-term position trading
- **Daily (1d)**: Original baseline model (85.4% win rate)
### Data Acquisition
- **Source**: Yahoo Finance API with enhanced intraday data fetching
- **Limitations**: Historical intraday data restricted (recent periods only)
- **Current Datasets**:
  - 1m: 5,204 samples (7 days of recent data)
  - 15m: 3,814 samples (60 days of recent data)
  - 30m: 1,910 samples (60 days of recent data)
### Model Architecture
- **Base Algorithm**: XGBoost Classifier (same as daily model)
- **Features**: 23 features (technical indicators + SMC elements)
- **Training**: Grid search hyperparameter optimization
- **Validation**: 80/20 train/test split with stratification
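The training script is not reproduced here, so the following is only a sketch of what an 80/20 stratified split does: each class contributes the same share of its samples to the test set. In practice scikit-learn's `train_test_split(..., test_size=0.2, stratify=y)` is the usual tool; the pure-Python helper `stratified_split` below is a hypothetical stand-in that shows the mechanics.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) with each class split by test_fraction."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # seeded so the split is reproducible
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels with a 54/46 class balance
labels = [0] * 54 + [1] * 46
train_idx, test_idx = stratified_split(labels)
```

A shuffled split ignores time order, which is one reason walk-forward evaluation appears in the next steps for the intraday models.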
### Training Results
- **15m Model**: Successfully trained with 77% validation accuracy
- **Feature Importance**: Technical indicators dominant (SMA_50, EMA_12, BB_lower)
- **Training Status**: 1m model partially trained, 30m model interrupted (available for completion)
### Backtesting Performance
- **Framework**: Backtrader with realistic commission modeling
- **Risk Management**: Fixed stake sizing ($1,000 per trade)
- **15m Results**: -0.83% return from a single trade
- **Analysis**: The 15m model rarely issues a signal, which avoids overtrading, but one trade is too small a sample to judge performance
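How often the model trades depends on how its predicted probabilities are turned into signals. The sketch below assumes a simple rule: go long only when the predicted probability of an up move reaches a threshold. The function name and the 0.6 default are hypothetical; the thresholds used in `backtest_multi_timeframe.py` are not stated in this card.

```python
def signals_from_probabilities(probabilities, threshold=0.6):
    """Return 1 (enter long) where P(up) >= threshold, else 0 (stay flat).

    threshold=0.6 is an illustrative value, not the model's setting.
    """
    return [1 if p >= threshold else 0 for p in probabilities]
```

Raising the threshold produces fewer trades, which is the trade-off behind the low trade count reported above.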
### Key Insights
- ✅ Successfully scaled daily model architecture to intraday timeframes
- ✅ Technical indicators remain most important across all timeframes
- ✅ Conservative prediction thresholds prevent excessive trading
- ⚠️ Limited historical data affects backtesting statistical significance
- ⚠️ Yahoo Finance API constraints limit comprehensive validation
### Files Added
- `train_multi_timeframe.py` - Multi-timeframe model training script
- `backtest_multi_timeframe.py` - Intraday model backtesting framework
- `multi_timeframe_summary.py` - Comprehensive performance analysis
- `trading_model_15m.pkl` - Trained 15-minute model
- `feature_importance_15m.csv` - Feature importance analysis
- `backtest_multi_timeframe_results.csv` - Backtesting performance data
### Next Steps
1. Complete 30m model training
2. Implement walk-forward optimization
3. Add extended historical data sources
4. Deploy best performing intraday model
5. Compare intraday vs daily performance
## Model Description
This is a machine-learning trading model for XAUUSD (gold vs. US dollar), trained on gold futures data (GC=F) with features derived from Smart Money Concepts (SMC). It predicts whether the price will be higher five days ahead and turns those predictions into trading signals. The daily model reached an 85.4% win rate in backtesting.
### Key Features
- **Asset**: XAUUSD (Gold Futures)
- **Strategy**: Smart Money Concepts (SMC) with technical indicators
- **Prediction Horizon**: 5-day ahead price direction
- **Model Type**: XGBoost Classifier
## Romeo (V5) — Ensemble model
Romeo (codename V5) is the latest ensemble model combining tree-based learners (XGBoost / LightGBM) and an optional Keras head. The artifacts live in `models_romeo/` and include a canonical feature list used by the backtester to align unseen data.
### Artifacts
- `models_romeo/trading_model_romeo_daily.pkl` — ensemble artifact (joblib) with `models`, `weights`, and `features` keys.
- `models_romeo/romeo_keras_daily.keras` — optional Keras model file when included in training.
- `models_romeo/MODEL_CARD.md` — this model's card with evaluation and transparency notes.
### Evaluation (selected run on unseen daily data)
- Initial capital: 100
- Final capital: 484.82
- CAGR: 4.44%
- Annual volatility: 41.18%
- Sharpe: 0.31
- Max Drawdown: -47.66%
- Total trades: 3,610
- Win rate: 49.47%
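For readers unfamiliar with these figures, the sketch below shows the standard definitions of CAGR and maximum drawdown computed from an equity curve. It is not the backtester's code, and the toy curve is invented for illustration.

```python
def cagr(initial: float, final: float, years: float) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    return (final / initial) ** (1.0 / years) - 1.0


def max_drawdown(equity_curve) -> float:
    """Largest peak-to-trough decline as a negative fraction."""
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)                 # highest equity seen so far
        worst = min(worst, value / peak - 1.0)  # decline from that peak
    return worst


# Invented equity curve: the deepest fall is from 120 to 90, i.e. -25%
curve = [100.0, 120.0, 90.0, 150.0, 121.0]
```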
### Uploading to Hugging Face
There is a helper script to upload the model artifacts to Hugging Face Hub:
1. Install dependencies:
```bash
pip install huggingface_hub
```
2. Set your HF token in the environment (Windows cmd.exe):
```cmd
set HF_TOKEN=hf_YourTokenHere
```
3. Upload:
```cmd
python v5\upload_model_v5_to_hf.py --repo-name your-username/romeo-v5 --model-dir models_romeo
```
The script will create the repo (if it doesn't exist) and upload all files from `models_romeo/`.
### Usage example
Load the artifact and run predictions:
```python
import joblib
artifact = joblib.load('models_romeo/trading_model_romeo_daily.pkl')
features = artifact['features']
# prepare X matching features
# model usage depends on artifact['models'] layout; check MODEL_CARD.md for details
```
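The exact layout of `artifact['models']` is described in `MODEL_CARD.md`, not here. As a sketch only, the helper below assumes that `models` is a sequence of fitted classifiers exposing a scikit-learn style `predict_proba`, and that `weights` is a matching sequence of numbers. A Keras head would need its own prediction call. `ensemble_probability` is a hypothetical name.

```python
def ensemble_probability(models, weights, X):
    """Weighted average of each model's P(class 1) for every row of X.

    Assumes each model has predict_proba returning one
    [P(class 0), P(class 1)] row per sample (an assumption about the artifact).
    """
    if len(models) == 0 or len(models) != len(weights):
        raise ValueError("models and weights must be non-empty and the same length")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    combined = None
    for model, weight in zip(models, weights):
        p_up = [float(row[1]) for row in model.predict_proba(X)]
        if combined is None:
            combined = [0.0] * len(p_up)
        combined = [c + weight * p for c, p in zip(combined, p_up)]
    return [c / total for c in combined]


# Hypothetical usage once X holds the columns named in artifact['features']:
# p_up = ensemble_probability(artifact['models'], artifact['weights'], X)
```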
### Notes & Next Steps
- Position sizing is simplified in the backtester; consider implementing fixed-risk sizing before live use.
- Consider re-running the robustness scan using the M2M metric as primary evaluation (recommended).

The original daily XGBoost model, by contrast, reports:
- **Accuracy**: 80.3% on test data
- **Win Rate**: 85.4% in backtesting
## Intended Use
This model is designed for:
- Educational purposes in algorithmic trading
- Research on SMC strategies
- Backtesting trading strategies
- Understanding ML applications in financial markets
**⚠️ Warning**: This is not financial advice. Trading involves risk of loss. Use at your own discretion.
## Training Data
- **Source**: Yahoo Finance (GC=F - Gold Futures)
- **Period**: 2000-2020 (excluding recent months for efficiency)
- **Features**: 23 features including:
  - Price data (Open, High, Low, Close, Volume)
  - Technical indicators (SMA, EMA, RSI, MACD, Bollinger Bands)
  - SMC features (Fair Value Gaps, Order Blocks, Recovery patterns)
  - Lag features (Close prices from previous days)
- **Target**: Binary classification (1 if price rises in 5 days, 0 otherwise)
- **Dataset Size**: 8,816 samples
- **Class Distribution**: 54% down, 46% up (balanced with scale_pos_weight)
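The target definition above can be sketched as follows. This assumes a close-to-close comparison with a strict "greater than" and drops the final rows that have no close five bars ahead; the actual labeling code may differ on those details. `five_day_target` is a hypothetical name.

```python
def five_day_target(closes, horizon=5):
    """1 if the close `horizon` bars ahead is higher than the current close, else 0.

    The last `horizon` bars have no future close and are dropped.
    """
    return [
        1 if closes[i + horizon] > closes[i] else 0
        for i in range(len(closes) - horizon)
    ]


closes = [10, 11, 12, 13, 14, 15, 9, 16]
print(five_day_target(closes))  # [1, 0, 1]
```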
## Performance Metrics
### Model Performance
- **Accuracy**: 80.3%
- **Precision (Class 1)**: 71%
- **Recall (Class 1)**: 81%
- **F1-Score**: 76%
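The F1-score is the harmonic mean of precision and recall, so the reported value can be checked from the two figures above:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(0.71, 0.81), 3))  # 0.757, i.e. about 76%
```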
### Backtesting Results (2015-2020)
- **Overall Win Rate**: 85.4%
- **Total Return**: 18.2%
- **Sharpe Ratio**: 1.41
- **Yearly Win Rates**:
  - 2015: 62.5%
  - 2016: 100.0%
  - 2017: 100.0%
  - 2018: 72.7%
  - 2019: 76.9%
  - 2020: 94.1%
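This card does not say how the Sharpe ratio above was computed (return frequency, risk-free rate, annualization). A common convention, shown here as an assumption rather than the author's method, is mean excess return divided by its sample standard deviation, annualized with the square root of the number of periods per year.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free_per_period=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from periodic returns (needs at least 2 values).

    periods_per_year=252 assumes daily returns; both defaults are illustrative.
    """
    excess = [r - risk_free_per_period for r in returns]
    stdev = statistics.stdev(excess)  # sample standard deviation
    if stdev == 0:
        return 0.0
    return statistics.mean(excess) / stdev * math.sqrt(periods_per_year)
```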
## Limitations
- Trained on historical data only (2000-2020)
- May not perform well in unprecedented market conditions
- Requires proper risk management
- No consideration of transaction costs, slippage, or market impact
- Model predictions are probabilistic, not guaranteed
## Usage
### Prerequisites
```bash
pip install joblib scikit-learn xgboost pandas numpy
```
### Loading the Model
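```python
import joblib
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Load the trained daily model (unpickling requires the xgboost package)
model = joblib.load('trading_model.pkl')

# The scalers fitted during training are not shipped with the model;
# recreate them or save them alongside it before predicting.
# ... preprocessing code ...

# prepare_features is a placeholder for your own function. It must return the
# 23 columns listed under "Features Required", in that order.
features = prepare_features(your_data)
prediction = model.predict(features)         # 0/1 class labels
probability = model.predict_proba(features)  # per-class probabilities
```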
### Features Required
The model expects 23 features in this order:
1. Close
2. High
3. Low
4. Open
5. Volume
6. SMA_20
7. SMA_50
8. EMA_12
9. EMA_26
10. RSI
11. MACD
12. MACD_signal
13. MACD_hist
14. BB_upper
15. BB_middle
16. BB_lower
17. FVG_Size
18. FVG_Type_Encoded
19. OB_Type_Encoded
20. Recovery_Type_Encoded
21. Close_lag1
22. Close_lag2
23. Close_lag3
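Because the column order matters, it can help to check the input against the list above before predicting. The helpers below are a hypothetical sketch (`FEATURE_COLUMNS`, `missing_features`, and `order_row` are names introduced for this example); they only validate and order the columns and do not compute the indicators themselves.

```python
FEATURE_COLUMNS = [
    "Close", "High", "Low", "Open", "Volume",
    "SMA_20", "SMA_50", "EMA_12", "EMA_26", "RSI",
    "MACD", "MACD_signal", "MACD_hist",
    "BB_upper", "BB_middle", "BB_lower",
    "FVG_Size", "FVG_Type_Encoded", "OB_Type_Encoded", "Recovery_Type_Encoded",
    "Close_lag1", "Close_lag2", "Close_lag3",
]


def missing_features(columns):
    """Names from FEATURE_COLUMNS that are absent from `columns`."""
    present = set(columns)
    return [name for name in FEATURE_COLUMNS if name not in present]


def order_row(row):
    """Turn a {feature_name: value} mapping into a list in model order."""
    missing = missing_features(row)
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [row[name] for name in FEATURE_COLUMNS]
```

With a pandas DataFrame `df`, `missing_features(df.columns)` reports what is absent and `df[FEATURE_COLUMNS]` selects the columns in the expected order.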
## Training Details
- **Algorithm**: XGBoost Classifier
- **Hyperparameters**:
  - n_estimators: 200
  - max_depth: 7
  - learning_rate: 0.2
  - scale_pos_weight: 1.17 (for class balancing)
- **Cross-validation**: 3-fold
- **Optimization**: Grid search on hyperparameters
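The `scale_pos_weight` value is consistent with the usual XGBoost heuristic of dividing the number of negative samples by the number of positive samples. Using the 54% / 46% class distribution reported for the training data:

```python
def scale_pos_weight(n_negative: float, n_positive: float) -> float:
    """Heuristic weight for the positive class: negatives / positives."""
    if n_positive <= 0:
        raise ValueError("n_positive must be positive")
    return n_negative / n_positive


print(round(scale_pos_weight(54, 46), 2))  # 1.17
```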
## SMC Strategy Elements
The model incorporates Smart Money Concepts:
- **Fair Value Gaps (FVG)**: Price imbalances between candles
- **Order Blocks (OB)**: Areas of significant buying/selling
- **Recovery Patterns**: Pullbacks in trending markets
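As an illustration of the first element, the sketch below detects Fair Value Gaps with the commonly used three-candle rule: a bullish gap exists when the third candle's low is above the first candle's high, and a bearish gap when the third candle's high is below the first candle's low. The detection rules actually used to build `FVG_Size` and `FVG_Type_Encoded` are not given in this card, so treat the function and its output format as assumptions.

```python
def find_fair_value_gaps(highs, lows):
    """Return (index, kind, size) for each three-candle Fair Value Gap.

    index is the position of the third candle in the pattern.
    """
    gaps = []
    for i in range(2, len(highs)):
        if lows[i] > highs[i - 2]:
            # Bullish: price jumped up, leaving an untraded zone below candle i
            gaps.append((i, "bullish", lows[i] - highs[i - 2]))
        elif highs[i] < lows[i - 2]:
            # Bearish: price dropped, leaving an untraded zone above candle i
            gaps.append((i, "bearish", lows[i - 2] - highs[i]))
    return gaps


highs = [10, 13, 15, 14, 11]
lows = [9, 10, 12, 12, 8]
print(find_fair_value_gaps(highs, lows))
# [(2, 'bullish', 2), (4, 'bearish', 1)]
```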
## Upload to Hugging Face
To share this model on Hugging Face:
1. Create a Hugging Face account at https://huggingface.co/join
2. Generate an access token at https://huggingface.co/settings/tokens with "Write" permissions
3. Test your token: `python test_token.py YOUR_TOKEN`
4. Upload: `python upload_to_hf.py YOUR_TOKEN`
The script will upload:
- `trading_model.pkl` - The trained XGBoost model
- `README.md` - This model card with metadata
- All dataset files (CSV format)
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{xauusd-trading-ai,
  title={XAUUSD Trading AI Model with SMC Strategy},
  author={AI Trading System},
  year={2025},
  url={https://huggingface.co/JonusNattapong/xauusd-trading-ai-smc}
}
```
### Academic Paper
For the complete academic research paper with methodology, results, and analysis:
**arXiv Paper**: [XAUUSD Trading AI: A Machine Learning Approach Using Smart Money Concepts](https://arxiv.org/abs/XXXX.XXXXX)
## License
This model is released under the MIT License. See LICENSE file for details.
## Contact
For questions or issues, please open an issue on the Hugging Face repository.