	---
	language: en
	license: mit
	library_name: sklearn
	tags:
	- trading
	- finance
	- gold
	- xauusd
	- forex
	- algorithmic-trading
	- smart-money-concepts
	- smc
	- xgboost
	- machine-learning
	- backtesting
	- technical-analysis
	- multi-timeframe
	- intraday-trading
	- high-frequency-trading
	datasets:
	- yahoo-finance-gc-f
	metrics:
	- accuracy
	- precision
	- recall
	- f1
	model-index:
	- name: xauusd-trading-ai-smc-daily
	  results:
	  - task:
	      type: binary-classification
	      name: Daily Price Direction Prediction
	    dataset:
	      type: yahoo-finance-gc-f
	      name: Gold Futures (GC=F)
	    metrics:
	    - type: accuracy
	      value: 80.3
	      name: Accuracy
	    - type: precision
	      value: 71
	      name: Precision (Class 1)
	    - type: recall
	      value: 81
	      name: Recall (Class 1)
	    - type: f1
	      value: 76
	      name: F1-Score
	- name: xauusd-trading-ai-smc-15m
	  results:
	  - task:
	      type: binary-classification
	      name: 15-Minute Price Direction Prediction
	    dataset:
	      type: yahoo-finance-gc-f
	      name: Gold Futures (GC=F)
	    metrics:
	    - type: accuracy
	      value: 77.0
	      name: Accuracy
	    - type: precision
	      value: 76
	      name: Precision (Class 1)
	    - type: recall
	      value: 77
	      name: Recall (Class 1)
	    - type: f1
	      value: 76
	      name: F1-Score
	---

	# XAUUSD Multi-Timeframe Trading AI Model

	## Files Included

	### Core Models
	- `trading_model.pkl` - Original daily timeframe XGBoost model (85.4% win rate)
	- `trading_model_15m.pkl` - 15-minute intraday model (77% validation accuracy)
	- `trading_model_1m.pkl` - 1-minute intraday model (partially trained)
	- `trading_model_30m.pkl` - 30-minute intraday model (ready for training)

	### Documentation
	- `README.md` - This comprehensive model card
	- `XAUUSD_Trading_AI_Paper.md` - Research paper with academic structure, literature review, and methodology
	- `XAUUSD_Trading_AI_Paper.docx` - Word document version (professional format)
	- `XAUUSD_Trading_AI_Paper.html` - HTML web version (styled and readable)
	- `XAUUSD_Trading_AI_Paper.tex` - LaTeX source (for academic publishing)
	- `XAUUSD_Trading_AI_Technical_Whitepaper.md` - Technical whitepaper with mathematical formulations and implementation details
	- `XAUUSD_Trading_AI_Technical_Whitepaper.docx` - Word document version (professional format)
	- `XAUUSD_Trading_AI_Technical_Whitepaper.html` - HTML web version (styled and readable)
	- `XAUUSD_Trading_AI_Technical_Whitepaper.tex` - LaTeX source (for academic publishing)

	### Performance & Analysis
	- `backtest_report.csv` - Daily model yearly backtesting performance results
	- `backtest_multi_timeframe_results.csv` - Intraday model backtesting results
	- `feature_importance_15m.csv` - 15-minute model feature importance analysis

	### Scripts & Tools
	- `train_multi_timeframe.py` - Multi-timeframe model training script
	- `backtest_multi_timeframe.py` - Intraday model backtesting framework
	- `multi_timeframe_summary.py` - Comprehensive performance analysis tool
	- `fetch_data.py` - Enhanced data acquisition for multiple timeframes

	### Dataset Files
	- Daily Data: `daily_data.csv`, `processed_daily_data.csv`, `smc_features_dataset.csv`, `X_features.csv`, `y_target.csv`
	- Intraday Data: `1m_data.csv` (5,204 samples), `15m_data.csv` (3,814 samples), `30m_data.csv` (1,910 samples)

	## Recent Enhancements (v2.0)

	### Visual Documentation
	- Dataset Flow Diagram: Complete data processing pipeline from raw Yahoo Finance data to model training
	- Model Architecture Diagram: XGBoost ensemble structure with decision flow visualization
	- Buy/Sell Workflow Diagram: End-to-end trading execution process with risk management

	### Advanced Formulas & Techniques
	- Position Sizing Formula: Risk-adjusted position calculation with Kelly Criterion adaptation
	- Risk Metrics: Sharpe Ratio, Sortino Ratio, Calmar Ratio, and Maximum Drawdown calculations
	- SMC Techniques: Advanced Order Block detection with volume profile analysis
	- Dynamic Thresholds: Market volatility-based prediction threshold adjustment
	- Ensemble Signals: Multi-source signal confirmation (ML + Technical + SMC)
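
The position sizing item above refers to a Kelly Criterion adaptation; the exact formula is in the whitepaper, not in this README. As a rough sketch only, the textbook Kelly fraction `f* = p - (1 - p) / b` (win probability `p`, payoff ratio `b`) can be scaled down and capped as shown here. The function name and the `fraction` and `max_risk` parameters are illustrative assumptions, not the project's implementation.

```python
def kelly_position_size(capital, win_prob, payoff_ratio, fraction=0.5, max_risk=0.02):
    """Return the amount of capital to risk on one trade.

    Textbook Kelly: f* = p - (1 - p) / b, where p is the win probability and
    b is the average win divided by the average loss.
    `fraction` and `max_risk` are hypothetical safety parameters.
    """
    if not 0.0 <= win_prob <= 1.0:
        raise ValueError("win_prob must be in [0, 1]")
    if payoff_ratio <= 0:
        raise ValueError("payoff_ratio must be positive")
    kelly = win_prob - (1.0 - win_prob) / payoff_ratio
    # A negative Kelly fraction means a negative edge: risk nothing.
    kelly = max(kelly, 0.0)
    # Scale down (fractional Kelly) and cap the risk taken per trade.
    risk_fraction = min(kelly * fraction, max_risk)
    return capital * risk_fraction
```

With a 55% win probability and wins 1.5 times the size of losses, full Kelly is 0.25; half Kelly (0.125) exceeds the 2% cap, so a 10,000 account risks 200 per trade.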

	### Performance Analytics
	- Monthly Performance Heatmap: Visual representation of returns across all test years
	- Risk-Return Scatter Plot: Performance comparison across different risk levels
	- Market Regime Analysis: Performance breakdown by trending vs sideways markets

	### Documentation Updates
	- Enhanced Technical Whitepaper: Added comprehensive visual diagrams and mathematical formulations
	- Enhanced Research Paper: Added Mermaid diagrams, advanced algorithms, and detailed performance analysis
	- Professional Exports: Both documents now available in HTML, Word, and LaTeX formats

	## Multi-Timeframe Trading System (Latest Addition)

	### Overview
	The system has been extended to support intraday trading across multiple timeframes, enabling higher-frequency trading strategies while maintaining the proven SMC + technical indicator approach.

	### Supported Timeframes
	- 1-minute (1m): Ultra-short-term scalping opportunities
	- 15-minute (15m): Short-term swing trading
	- 30-minute (30m): Medium-term position trading
	- Daily (1d): Original baseline model (85.4% win rate)

	### Data Acquisition
	- Source: Yahoo Finance API with enhanced intraday data fetching
	- Limitations: Historical intraday data restricted (recent periods only)
	- Current Datasets:
	  - 1m: 5,204 samples (7 days of recent data)
	  - 15m: 3,814 samples (60 days of recent data)
	  - 30m: 1,910 samples (60 days of recent data)

	### Model Architecture
	- Base Algorithm: XGBoost Classifier (same as daily model)
	- Features: 23 features (technical indicators + SMC elements)
	- Training: Grid search hyperparameter optimization
	- Validation: 80/20 train/test split with stratification

	### Training Results
	- 15m Model: Successfully trained with 77% validation accuracy
	- Feature Importance: Technical indicators dominant (SMA_50, EMA_12, BB_lower)
	- Training Status: 1m model partially trained, 30m model interrupted (available for completion)
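
The three features named as most important (SMA_50, EMA_12, BB_lower) are standard indicators. A minimal dependency-free sketch of their usual definitions follows; the Bollinger window of 20, the 2 standard deviations, and the use of population standard deviation are assumptions, since the README does not state the parameters used in training.

```python
from statistics import fmean, pstdev


def sma(values, window):
    """Simple moving average; None until `window` values are available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(fmean(values[i + 1 - window:i + 1]))
    return out


def ema(values, span):
    """Exponential moving average with alpha = 2 / (span + 1), seeded with the first value."""
    alpha = 2.0 / (span + 1)
    out = []
    for v in values:
        out.append(v if not out else alpha * v + (1 - alpha) * out[-1])
    return out


def bollinger_lower(values, window=20, num_std=2.0):
    """Lower Bollinger Band: SMA minus `num_std` population standard deviations."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window:i + 1]
            out.append(fmean(chunk) - num_std * pstdev(chunk))
    return out
```

For example, `sma(closes, 50)`, `ema(closes, 12)`, and `bollinger_lower(closes)` would produce columns comparable to SMA_50, EMA_12, and BB_lower for a list of closing prices `closes`.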

	### Backtesting Performance
	- Framework: Backtrader with realistic commission modeling
	- Risk Management: Fixed stake sizing ($1,000 per trade)
	- 15m Results: -0.83% return from a single trade
	- Analysis: The model traded very rarely at the current prediction threshold; one trade is too small a sample to draw conclusions about profitability

	### Key Insights
	- ✅ Successfully scaled daily model architecture to intraday timeframes
	- ✅ Technical indicators remain most important across all timeframes
	- ✅ Conservative prediction thresholds prevent excessive trading
	- ⚠️ Limited historical data affects backtesting statistical significance
	- ⚠️ Yahoo Finance API constraints limit comprehensive validation
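
To make the statistical-significance caveat concrete, a win rate measured over very few trades carries a wide confidence interval. The sketch below uses the Wilson score interval; the choice of this interval and the function name are assumptions made for illustration and are not part of the project's scripts.

```python
import math


def wilson_interval(wins, trades, z=1.96):
    """Wilson score interval for a win rate (z=1.96 gives roughly 95% coverage)."""
    if trades <= 0:
        raise ValueError("trades must be positive")
    if not 0 <= wins <= trades:
        raise ValueError("wins must be between 0 and trades")
    p = wins / trades
    denom = 1.0 + z * z / trades
    center = (p + z * z / (2 * trades)) / denom
    half = z * math.sqrt(p * (1 - p) / trades + z * z / (4 * trades * trades)) / denom
    return max(0.0, center - half), min(1.0, center + half)
```

With a single trade the interval spans most of the range from 0 to 1 whatever the outcome, which is why a one-trade backtest says little about the true win rate.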

	### Files Added
	- `train_multi_timeframe.py` - Multi-timeframe model training script
	- `backtest_multi_timeframe.py` - Intraday model backtesting framework
	- `multi_timeframe_summary.py` - Comprehensive performance analysis
	- `trading_model_15m.pkl` - Trained 15-minute model
	- `feature_importance_15m.csv` - Feature importance analysis
	- `backtest_multi_timeframe_results.csv` - Backtesting performance data

	### Next Steps
	1. Complete 30m model training
	2. Implement walk-forward optimization
	3. Add extended historical data sources
	4. Deploy best performing intraday model
	5. Compare intraday vs daily performance

	## Model Description

	This is an AI-powered trading model for XAUUSD (Gold vs US Dollar) futures, trained using Smart Money Concepts (SMC) strategy elements. The model uses machine learning to predict 5-day ahead price movements and generate trading signals with high win rates.

	### Key Features
	- Asset: XAUUSD (Gold Futures)
	- Strategy: Smart Money Concepts (SMC) with technical indicators
	- Prediction Horizon: 5-day ahead price direction
	- Model Type: XGBoost Classifier

	## Romeo (V5) — Ensemble model

	Romeo (codename V5) is the latest ensemble model combining tree-based learners (XGBoost / LightGBM) and an optional Keras head. The artifacts live in `models_romeo/` and include a canonical feature list used by the backtester to align unseen data.

	Artifacts
	- `models_romeo/trading_model_romeo_daily.pkl` — ensemble artifact (joblib) with `models`, `weights`, and `features` keys.
	- `models_romeo/romeo_keras_daily.keras` — optional Keras model file when included in training.
	- `models_romeo/MODEL_CARD.md` — this model's card with evaluation and transparency notes.

	Evaluation (selected run on unseen daily data)
	- Initial capital: 100
	- Final capital: 484.8199
	- CAGR: 0.0444
	- Annual volatility: 0.4118
	- Sharpe: 0.3119
	- Max Drawdown: -47.66%
	- Total trades: 3610
	- Win rate: 49.47%
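
For readers who want to recompute metrics of this kind from their own backtest, here is a minimal sketch that derives CAGR, annualized volatility, Sharpe ratio, and maximum drawdown from an equity curve. It assumes 252 periods per year, a zero risk-free rate, and sample standard deviation; the backtester's actual conventions are not documented here and may differ.

```python
import math
from statistics import fmean, stdev


def performance_summary(equity, periods_per_year=252):
    """Summarize an equity curve (list of account values, one per period)."""
    if len(equity) < 3:
        raise ValueError("need at least three equity points")
    returns = [equity[i] / equity[i - 1] - 1.0 for i in range(1, len(equity))]
    years = len(returns) / periods_per_year
    cagr = (equity[-1] / equity[0]) ** (1.0 / years) - 1.0
    vol = stdev(returns) * math.sqrt(periods_per_year)
    # Sharpe ratio with a zero risk-free rate (assumption).
    sharpe = fmean(returns) * periods_per_year / vol if vol > 0 else float("nan")
    # Max drawdown: largest peak-to-trough decline, as a negative fraction.
    peak, max_dd = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        max_dd = min(max_dd, value / peak - 1.0)
    return {"cagr": cagr, "annual_vol": vol, "sharpe": sharpe, "max_drawdown": max_dd}
```

Calling `performance_summary(equity_values)` on the per-day account values of a backtest returns the four figures as fractions (for example, `-0.4766` for a 47.66% drawdown).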

	Uploading to Hugging Face
	-------------------------
	There is a helper script to upload the model artifacts to Hugging Face Hub:

	1. Install dependencies:
	```bash
	pip install huggingface_hub
	```

	2. Set your HF token in the environment (Windows cmd.exe):
	```cmd
	set HF_TOKEN=hf_YourTokenHere
	```

	3. Upload:
	```cmd
	python v5\upload_model_v5_to_hf.py --repo-name your-username/romeo-v5 --model-dir models_romeo
	```

	The script will create the repo (if it doesn't exist) and upload all files from `models_romeo/`.

	Usage example
	-------------
	Load the artifact and run predictions:

	```python
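	import joblib

	# Load the ensemble artifact: a dict with `models`, `weights`, and `features` keys
	artifact = joblib.load('models_romeo/trading_model_romeo_daily.pkl')
	features = artifact['features']  # canonical feature list used to align unseen data

	# Prepare X with exactly these columns, in this order (e.g. X = df[features] for a pandas DataFrame `df`)
	# Model usage depends on the artifact['models'] layout; check MODEL_CARD.md for details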
	```
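
Since the artifact stores `models` and `weights`, one plausible way to combine the learners is a weighted average of their class-1 probabilities. The sketch below assumes each model's probabilities have already been computed and that weights pair up with models in order; the real layout of `artifact['models']` is described in `MODEL_CARD.md` and may differ. Both function names are hypothetical.

```python
def weighted_ensemble_proba(model_probas, weights):
    """Combine per-model class-1 probabilities into one weighted average per sample.

    model_probas: list of lists, one inner list of probabilities per model.
    weights: one non-negative weight per model (normalized here).
    """
    if not model_probas or len(model_probas) != len(weights):
        raise ValueError("need one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    n_samples = len(model_probas[0])
    if any(len(p) != n_samples for p in model_probas):
        raise ValueError("all models must score the same number of samples")
    return [
        sum(w * p[i] for w, p in zip(weights, model_probas)) / total
        for i in range(n_samples)
    ]


def to_signals(probas, threshold=0.5):
    """Turn probabilities into 1 (long signal) or 0 (no signal) at a fixed threshold."""
    return [1 if p >= threshold else 0 for p in probas]
```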

	Notes & Next Steps
	------------------
	- Position sizing is simplified in the backtester; consider implementing fixed-risk sizing before live use.
	- Consider re-running the robustness scan using the M2M metric as primary evaluation (recommended).

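	For comparison, the original daily XGBoost model (`trading_model.pkl`, not the Romeo ensemble) reports:
	- Accuracy: 80.3% on test data
	- Win Rate: 85.4% in backtesting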

	## Intended Use

	This model is designed for:
	- Educational purposes in algorithmic trading
	- Research on SMC strategies
	- Backtesting trading strategies
	- Understanding ML applications in financial markets

	⚠️ Warning: This is not financial advice. Trading involves risk of loss. Use at your own discretion.

	## Training Data

	- Source: Yahoo Finance (GC=F - Gold Futures)
	- Period: 2000-2020 (excluding recent months for efficiency)
	- Features: 23 features including:
	  - Price data (Open, High, Low, Close, Volume)
	  - Technical indicators (SMA, EMA, RSI, MACD, Bollinger Bands)
	  - SMC features (Fair Value Gaps, Order Blocks, Recovery patterns)
	  - Lag features (Close prices from previous days)
	- Target: Binary classification (1 if price rises in 5 days, 0 otherwise)
	- Dataset Size: 8,816 samples
	- Class Distribution: 54% down, 46% up (balanced with scale_pos_weight)
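
The target definition and the class balancing above can be sketched in a few lines. This is an illustration under assumptions: the function names are hypothetical, and the strict "greater than" comparison is one reading of "price rises in 5 days". The `scale_pos_weight` heuristic (negative count divided by positive count) is the one XGBoost's documentation suggests, and a 54/46 split gives about 1.17.

```python
def make_target(closes, horizon=5):
    """Label each day 1 if the close `horizon` days later is higher, else 0.

    The last `horizon` days have no future price and are left out.
    """
    return [
        1 if closes[i + horizon] > closes[i] else 0
        for i in range(len(closes) - horizon)
    ]


def scale_pos_weight(labels):
    """Usual XGBoost heuristic: count(negative) / count(positive)."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0:
        raise ValueError("no positive samples")
    return neg / pos
```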

	## Performance Metrics

	### Model Performance
	- Accuracy: 80.3%
	- Precision (Class 1): 71%
	- Recall (Class 1): 81%
	- F1-Score: 76%
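
As a quick consistency check, F1 is the harmonic mean of precision and recall, so the reported F1 can be recomputed from the other two figures. The helper name below is illustrative.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

`f1_from_precision_recall(0.71, 0.81)` is about 0.757, which rounds to the 76% reported above.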

	### Backtesting Results (2015-2020)
	- Overall Win Rate: 85.4%
	- Total Return: 18.2%
	- Sharpe Ratio: 1.41
	- Yearly Win Rates:
	  - 2015: 62.5%
	  - 2016: 100.0%
	  - 2017: 100.0%
	  - 2018: 72.7%
	  - 2019: 76.9%
	  - 2020: 94.1%
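
Yearly and overall win rates of this kind can be derived from a list of closed trades. In the sketch below the `(year, pnl)` tuple format and the function name are assumptions; the actual columns of `backtest_report.csv` are not documented here. Note that the overall rate is weighted by the number of trades, so it generally differs from the plain average of the yearly rates.

```python
from collections import defaultdict


def win_rates_by_year(trades):
    """trades: iterable of (year, pnl) pairs.

    Returns ({year: win_rate}, overall_win_rate); a trade wins when pnl > 0.
    """
    wins = defaultdict(int)
    counts = defaultdict(int)
    for year, pnl in trades:
        counts[year] += 1
        if pnl > 0:
            wins[year] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no trades")
    per_year = {year: wins[year] / counts[year] for year in sorted(counts)}
    overall = sum(wins.values()) / total
    return per_year, overall
```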

	## Limitations

	- Trained on historical data only (2000-2020)
	- May not perform well in unprecedented market conditions
	- Requires proper risk management
	- No consideration of transaction costs, slippage, or market impact
	- Model predictions are probabilistic, not guaranteed

	## Usage

	### Prerequisites
	```bash
	pip install joblib scikit-learn pandas numpy xgboost
	```

	### Loading the Model
	```python
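	import joblib
	import pandas as pd
	from sklearn.preprocessing import StandardScaler  # only needed if you recreate the scalers

	# Load the trained daily XGBoost model (the xgboost package must be installed to unpickle it)
	model = joblib.load('trading_model.pkl')

	# The scalers used during training are not saved in this file; recreate them
	# or save them alongside the model, then apply the same preprocessing here.
	# ... preprocessing code ...

	# `prepare_features` and `your_data` are placeholders: supply your own function
	# that returns the 23 columns listed under "Features Required", in that order.
	features = prepare_features(your_data)
	prediction = model.predict(features)         # predicted class: 1 = up, 0 = otherwise
	probability = model.predict_proba(features)  # per-class probabilities, one row per sample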
	```

	### Features Required
	The model expects 23 features in this order:
	1. Close
	2. High
	3. Low
	4. Open
	5. Volume
	6. SMA_20
	7. SMA_50
	8. EMA_12
	9. EMA_26
	10. RSI
	11. MACD
	12. MACD_signal
	13. MACD_hist
	14. BB_upper
	15. BB_middle
	16. BB_lower
	17. FVG_Size
	18. FVG_Type_Encoded
	19. OB_Type_Encoded
	20. Recovery_Type_Encoded
	21. Close_lag1
	22. Close_lag2
	23. Close_lag3
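
Because the column order matters, it can help to keep the list in code and validate inputs against it. The sketch below copies the 23 names from the list above; the `align_features` helper is a hypothetical name, not part of the repository's scripts.

```python
FEATURES = [
    "Close", "High", "Low", "Open", "Volume",
    "SMA_20", "SMA_50", "EMA_12", "EMA_26", "RSI",
    "MACD", "MACD_signal", "MACD_hist",
    "BB_upper", "BB_middle", "BB_lower",
    "FVG_Size", "FVG_Type_Encoded", "OB_Type_Encoded", "Recovery_Type_Encoded",
    "Close_lag1", "Close_lag2", "Close_lag3",
]


def align_features(row):
    """Return the values of `row` (a dict keyed by feature name) in model order."""
    missing = [name for name in FEATURES if name not in row]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [row[name] for name in FEATURES]
```

With a pandas DataFrame, selecting `df[FEATURES]` reorders the columns in the same way and raises a `KeyError` if any are absent.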

	## Training Details

	- Algorithm: XGBoost Classifier
	- Hyperparameters:
	  - n_estimators: 200
	  - max_depth: 7
	  - learning_rate: 0.2
	  - scale_pos_weight: 1.17 (for class balancing)
	- Cross-validation: 3-fold
	- Optimization: Grid search on hyperparameters

	## SMC Strategy Elements

	The model incorporates Smart Money Concepts:
	- Fair Value Gaps (FVG): Price imbalances between candles
	- Order Blocks (OB): Areas of significant buying/selling
	- Recovery Patterns: Pullbacks in trending markets
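
To illustrate the first element, here is a minimal sketch of Fair Value Gap detection using the common three-candle definition: a bullish gap exists when the current low is above the high two candles earlier, a bearish gap when the current high is below the low two candles earlier. This is an assumption about the definition; the project's exact rule and the encoding behind `FVG_Type_Encoded` are not given in this README, so the 1 / -1 / 0 codes below are hypothetical.

```python
def detect_fvg(highs, lows):
    """Return a (size, kind) tuple per candle.

    kind is 1 for a bullish gap, -1 for a bearish gap, 0 for none.
    The gap is assigned to the third candle, so no future data is used.
    """
    if len(highs) != len(lows):
        raise ValueError("highs and lows must have the same length")
    result = []
    for i in range(len(highs)):
        if i >= 2 and lows[i] > highs[i - 2]:
            result.append((lows[i] - highs[i - 2], 1))
        elif i >= 2 and highs[i] < lows[i - 2]:
            result.append((lows[i - 2] - highs[i], -1))
        else:
            result.append((0.0, 0))
    return result
```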

	## Upload to Hugging Face

	To share this model on Hugging Face:

	1. Create a Hugging Face account at https://huggingface.co/join
	2. Generate an access token at https://huggingface.co/settings/tokens with "Write" permissions
	3. Test your token: `python test_token.py YOUR_TOKEN`
	4. Upload: `python upload_to_hf.py YOUR_TOKEN`

	The script will upload:
	- `trading_model.pkl` - The trained XGBoost model
	- `README.md` - This model card with metadata
	- All dataset files (CSV format)

	## Citation

	If you use this model in your research, please cite:

	```
	@misc{xauusd-trading-ai,
	title={XAUUSD Trading AI Model with SMC Strategy},
	author={AI Trading System},
	year={2025},
	url={https://huggingface.co/JonusNattapong/xauusd-trading-ai-smc}
	}
	```

	### Academic Paper
	For the complete academic research paper with methodology, results, and analysis:

	arXiv Paper: [XAUUSD Trading AI: A Machine Learning Approach Using Smart Money Concepts](https://arxiv.org/abs/XXXX.XXXXX)

	## License

	This model is released under the MIT License. See LICENSE file for details.

	## Contact

	For questions or issues, please open an issue on the Hugging Face repository.