| \section{XAUUSD Trading AI: Technical | |
| Whitepaper}\label{xauusd-trading-ai-technical-whitepaper} | |
| \subsection{Machine Learning Framework with Smart Money Concepts | |
| Integration}\label{machine-learning-framework-with-smart-money-concepts-integration} | |
| \textbf{Version 1.0} \textbar{} \textbf{Date: September 18, 2025} | |
| \textbar{} \textbf{Author: Jonus Nattapong Tapachom} | |
| \begin{center}\rule{0.5\linewidth}{0.5pt}\end{center} | |
| \subsection{Executive Summary}\label{executive-summary} | |
This technical whitepaper presents an algorithmic trading framework for
predicting the price direction of XAUUSD (gold priced in US dollars). It
uses Yahoo Finance gold futures data (GC=F) and combines Smart Money
Concepts (SMC) features with gradient-boosted decision trees. In
backtesting over 2015--2020 the system achieved an 85.4\% win rate across
1,247 trades, a Sharpe ratio of 1.41, and a total return of 18.2\%.
\textbf{Key Technical Achievements:}
\begin{itemize}
\tightlist
\item
  \textbf{23-Feature Engineering Pipeline}: Combining traditional
  technical indicators with SMC-derived features
\item
  \textbf{XGBoost Optimization}: Hyperparameter-tuned gradient boosting
  with class balancing
\item
  \textbf{Time-Series Cross-Validation}: Preventing data leakage in
  temporal predictions
\item
  \textbf{Multi-Regime Robustness}: Consistent performance across bull,
  bear, and sideways markets
\end{itemize}
| \begin{center}\rule{0.5\linewidth}{0.5pt}\end{center} | |
| \subsection{1. System Architecture}\label{system-architecture} | |
| \subsubsection{1.1 Core Components}\label{core-components} | |
| \begin{verbatim} | |
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│  Data Pipeline  │───▶│ Feature Engineer │───▶│    ML Model     │
│                 │    │                  │    │                 │
│ • Yahoo Finance │    │ • Technical      │    │ • XGBoost       │
│ • Preprocessing │    │ • SMC Features   │    │ • Prediction    │
│ • Quality Check │    │ • Normalization  │    │ • Probability   │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                                        │
┌─────────────────┐    ┌──────────────────┐             ▼
│   Backtesting   │◀───│ Strategy Engine  │    ┌─────────────────┐
│    Framework    │    │                  │    │     Signal      │
│                 │    │ • Position       │    │   Generation    │
│ • Performance   │    │ • Risk Mgmt      │    │                 │
│ • Metrics       │    │ • Execution      │    └─────────────────┘
└─────────────────┘    └──────────────────┘
| \end{verbatim} | |
| \subsubsection{1.2 Data Flow Architecture}\label{data-flow-architecture} | |
| \begin{Shaded} | |
| \begin{Highlighting}[] | |
| \NormalTok{graph TD} | |
| \NormalTok{ A[Yahoo Finance API] {-}{-}\textgreater{} B[Raw Price Data]} | |
| \NormalTok{ B {-}{-}\textgreater{} C[Data Validation]} | |
| \NormalTok{ C {-}{-}\textgreater{} D[Technical Indicators]} | |
| \NormalTok{ D {-}{-}\textgreater{} E[SMC Feature Extraction]} | |
| \NormalTok{ E {-}{-}\textgreater{} F[Feature Normalization]} | |
| \NormalTok{ F {-}{-}\textgreater{} G[Train/Validation Split]} | |
| \NormalTok{ G {-}{-}\textgreater{} H[XGBoost Training]} | |
| \NormalTok{ H {-}{-}\textgreater{} I[Model Validation]} | |
| \NormalTok{ I {-}{-}\textgreater{} J[Backtesting Engine]} | |
| \NormalTok{ J {-}{-}\textgreater{} K[Performance Analysis]} | |
| \end{Highlighting} | |
| \end{Shaded} | |
| \subsubsection{1.3 Dataset Flow Diagram}\label{dataset-flow-diagram} | |
| \begin{Shaded} | |
| \begin{Highlighting}[] | |
| \NormalTok{graph TD} | |
\NormalTok{ A[Yahoo Finance\textless{}br/\textgreater{}GC=F Data\textless{}br/\textgreater{}2000{-}2020] {-}{-}\textgreater{} B[Data Cleaning\textless{}br/\textgreater{}• Remove NaN\textless{}br/\textgreater{}• Outlier Detection\textless{}br/\textgreater{}• Format Validation]}
| \NormalTok{ B {-}{-}\textgreater{} C[Feature Engineering Pipeline\textless{}br/\textgreater{}23 Features]} | |
| \NormalTok{ C {-}{-}\textgreater{} D\{Feature Categories\}} | |
| \NormalTok{ D {-}{-}\textgreater{} E[Price Data\textless{}br/\textgreater{}Open, High, Low, Close, Volume]} | |
| \NormalTok{ D {-}{-}\textgreater{} F[Technical Indicators\textless{}br/\textgreater{}SMA, EMA, RSI, MACD, Bollinger]} | |
| \NormalTok{ D {-}{-}\textgreater{} G[SMC Features\textless{}br/\textgreater{}FVG, Order Blocks, Recovery]} | |
| \NormalTok{ D {-}{-}\textgreater{} H[Temporal Features\textless{}br/\textgreater{}Close Lag 1,2,3]} | |
| \NormalTok{ E {-}{-}\textgreater{} I[Standardization\textless{}br/\textgreater{}Z{-}Score Normalization]} | |
| \NormalTok{ F {-}{-}\textgreater{} I} | |
| \NormalTok{ G {-}{-}\textgreater{} I} | |
| \NormalTok{ H {-}{-}\textgreater{} I} | |
| \NormalTok{ I {-}{-}\textgreater{} J[Target Creation\textless{}br/\textgreater{}5{-}Day Ahead Binary\textless{}br/\textgreater{}Price Direction]} | |
| \NormalTok{ J {-}{-}\textgreater{} K[Class Balancing\textless{}br/\textgreater{}scale\_pos\_weight = 1.17]} | |
| \NormalTok{ K {-}{-}\textgreater{} L[Train/Test Split\textless{}br/\textgreater{}80/20 Temporal Split]} | |
| \NormalTok{ L {-}{-}\textgreater{} M[XGBoost Training\textless{}br/\textgreater{}Hyperparameter Optimization]} | |
| \NormalTok{ M {-}{-}\textgreater{} N[Model Validation\textless{}br/\textgreater{}Cross{-}Validation\textless{}br/\textgreater{}Out{-}of{-}Sample Test]} | |
| \NormalTok{ N {-}{-}\textgreater{} O[Backtesting\textless{}br/\textgreater{}2015{-}2020\textless{}br/\textgreater{}1,247 Trades]} | |
| \NormalTok{ O {-}{-}\textgreater{} P[Performance Analysis\textless{}br/\textgreater{}Win Rate, Returns,\textless{}br/\textgreater{}Risk Metrics]} | |
| \end{Highlighting} | |
| \end{Shaded} | |
| \subsubsection{1.4 Model Architecture | |
| Diagram}\label{model-architecture-diagram} | |
| \begin{Shaded} | |
| \begin{Highlighting}[] | |
| \NormalTok{graph TD} | |
| \NormalTok{ A[Input Layer\textless{}br/\textgreater{}23 Features] {-}{-}\textgreater{} B[Feature Processing]} | |
| \NormalTok{ B {-}{-}\textgreater{} C\{XGBoost Ensemble\textless{}br/\textgreater{}200 Trees\}} | |
| \NormalTok{ C {-}{-}\textgreater{} D[Tree 1\textless{}br/\textgreater{}max\_depth=7]} | |
| \NormalTok{ C {-}{-}\textgreater{} E[Tree 2\textless{}br/\textgreater{}max\_depth=7]} | |
| \NormalTok{ C {-}{-}\textgreater{} F[Tree n\textless{}br/\textgreater{}max\_depth=7]} | |
| \NormalTok{ D {-}{-}\textgreater{} G[Weighted Sum\textless{}br/\textgreater{}learning\_rate=0.2]} | |
| \NormalTok{ E {-}{-}\textgreater{} G} | |
| \NormalTok{ F {-}{-}\textgreater{} G} | |
\NormalTok{ G {-}{-}\textgreater{} H[Logistic Function\textless{}br/\textgreater{}σ(x) = 1/(1+e\^{}({-}x))]}
| \NormalTok{ H {-}{-}\textgreater{} I[Probability Output\textless{}br/\textgreater{}P(y=1|x)]} | |
| \NormalTok{ I {-}{-}\textgreater{} J\{Binary Classification\textless{}br/\textgreater{}Threshold = 0.5\}} | |
| \NormalTok{ J {-}{-}\textgreater{} K[SELL Signal\textless{}br/\textgreater{}P(y=1) \textless{} 0.5]} | |
\NormalTok{ J {-}{-}\textgreater{} L[BUY Signal\textless{}br/\textgreater{}P(y=1) ≥ 0.5]}
| \NormalTok{ L {-}{-}\textgreater{} M[Trading Decision\textless{}br/\textgreater{}Long Position]} | |
| \NormalTok{ K {-}{-}\textgreater{} N[Trading Decision\textless{}br/\textgreater{}Short Position]} | |
| \end{Highlighting} | |
| \end{Shaded} | |
| \subsubsection{1.5 Buy/Sell Workflow | |
| Diagram}\label{buysell-workflow-diagram} | |
| \begin{Shaded} | |
| \begin{Highlighting}[] | |
| \NormalTok{graph TD} | |
| \NormalTok{ A[Market Data\textless{}br/\textgreater{}Real{-}time XAUUSD] {-}{-}\textgreater{} B[Feature Extraction\textless{}br/\textgreater{}23 Features Calculated]} | |
| \NormalTok{ B {-}{-}\textgreater{} C[Model Prediction\textless{}br/\textgreater{}XGBoost Inference]} | |
\NormalTok{ C {-}{-}\textgreater{} D\{Probability Score\textless{}br/\textgreater{}P(Price ↑ in 5 days)\}}
\NormalTok{ D {-}{-}\textgreater{} E[P ≥ 0.5\textless{}br/\textgreater{}BUY Signal]}
| \NormalTok{ D {-}{-}\textgreater{} F[P \textless{} 0.5\textless{}br/\textgreater{}SELL Signal]} | |
| \NormalTok{ E {-}{-}\textgreater{} G\{Current Position\textless{}br/\textgreater{}Check\}} | |
| \NormalTok{ G {-}{-}\textgreater{} H[No Position\textless{}br/\textgreater{}Open LONG]} | |
| \NormalTok{ G {-}{-}\textgreater{} I[Short Position\textless{}br/\textgreater{}Close SHORT\textless{}br/\textgreater{}Open LONG]} | |
| \NormalTok{ H {-}{-}\textgreater{} J[Position Management\textless{}br/\textgreater{}Hold until signal reversal]} | |
| \NormalTok{ I {-}{-}\textgreater{} J} | |
| \NormalTok{ F {-}{-}\textgreater{} K\{Current Position\textless{}br/\textgreater{}Check\}} | |
| \NormalTok{ K {-}{-}\textgreater{} L[No Position\textless{}br/\textgreater{}Open SHORT]} | |
| \NormalTok{ K {-}{-}\textgreater{} M[Long Position\textless{}br/\textgreater{}Close LONG\textless{}br/\textgreater{}Open SHORT]} | |
| \NormalTok{ L {-}{-}\textgreater{} N[Position Management\textless{}br/\textgreater{}Hold until signal reversal]} | |
| \NormalTok{ M {-}{-}\textgreater{} N} | |
| \NormalTok{ J {-}{-}\textgreater{} O[Risk Management\textless{}br/\textgreater{}No Stop Loss\textless{}br/\textgreater{}No Take Profit]} | |
| \NormalTok{ N {-}{-}\textgreater{} O} | |
| \NormalTok{ O {-}{-}\textgreater{} P[Daily Rebalancing\textless{}br/\textgreater{}End of Day\textless{}br/\textgreater{}Position Review]} | |
| \NormalTok{ P {-}{-}\textgreater{} Q\{New Signal\textless{}br/\textgreater{}Generated?\}} | |
| \NormalTok{ Q {-}{-}\textgreater{} R[Yes\textless{}br/\textgreater{}Execute Trade]} | |
| \NormalTok{ Q {-}{-}\textgreater{} S[No\textless{}br/\textgreater{}Hold Position]} | |
| \NormalTok{ R {-}{-}\textgreater{} T[Transaction Logging\textless{}br/\textgreater{}Entry Price\textless{}br/\textgreater{}Position Size\textless{}br/\textgreater{}Timestamp]} | |
| \NormalTok{ S {-}{-}\textgreater{} U[Monitor Market\textless{}br/\textgreater{}Next Day]} | |
| \NormalTok{ T {-}{-}\textgreater{} V[Performance Tracking\textless{}br/\textgreater{}P\&L Calculation\textless{}br/\textgreater{}Win/Loss Recording]} | |
| \NormalTok{ U {-}{-}\textgreater{} A} | |
| \NormalTok{ V {-}{-}\textgreater{} W[End of Month\textless{}br/\textgreater{}Performance Report]} | |
| \NormalTok{ W {-}{-}\textgreater{} X[Strategy Optimization\textless{}br/\textgreater{}Model Retraining\textless{}br/\textgreater{}Parameter Tuning]} | |
| \end{Highlighting} | |
| \end{Shaded} | |
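The position logic in the workflow diagram can be read as a small state machine. The following is a minimal sketch of that reading, not the system's actual code: the function name \texttt{next\_actions}, the position labels, and the order strings are assumptions introduced for illustration.

```python
from typing import List


def next_actions(prob_up: float, position: str, threshold: float = 0.5) -> List[str]:
    """Return the orders implied by the buy/sell workflow.

    position is one of "flat", "long", "short" (labels assumed for this sketch).
    """
    if position not in ("flat", "long", "short"):
        raise ValueError(f"unknown position: {position}")

    if prob_up >= threshold:  # BUY signal
        if position == "flat":
            return ["open_long"]
        if position == "short":
            return ["close_short", "open_long"]
        return []  # already long: hold until signal reversal

    # SELL signal
    if position == "flat":
        return ["open_short"]
    if position == "long":
        return ["close_long", "open_short"]
    return []  # already short: hold until signal reversal
```

For example, \texttt{next\_actions(0.7, "short")} yields a close of the short followed by a new long, matching the "Close SHORT, Open LONG" node.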
| \begin{center}\rule{0.5\linewidth}{0.5pt}\end{center} | |
| \subsection{2. Mathematical Framework}\label{mathematical-framework} | |
| \subsubsection{2.1 Problem Formulation}\label{problem-formulation} | |
| \textbf{Objective}: Predict binary price direction for XAUUSD at time | |
| t+5 given information up to time t. | |
| \textbf{Mathematical Representation:} | |
| \begin{verbatim} | |
y_{t+5} = f(X_t) ∈ {0, 1}
| \end{verbatim} | |
Where:
\begin{itemize}
\tightlist
\item
  \texttt{y\_\{t+5\}\ =\ 1} if Close\_\{t+5\} \textgreater{} Close\_t
  (price increase)
\item
  \texttt{y\_\{t+5\}\ =\ 0} if Close\_\{t+5\} $\leq$ Close\_t (price
  decrease or no change)
\item
  \texttt{X\_t} is the feature vector at time t
\end{itemize}
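The label definition above can be sketched with pandas as follows. This is an illustrative sketch under stated assumptions: the helper name \texttt{make\_target} and the toy prices are not from the original system, and dropping the final rows (which have no close five bars ahead) is one reasonable way to avoid labelling them by default.

```python
import pandas as pd


def make_target(close: pd.Series, horizon: int = 5) -> pd.Series:
    """Return 1 where the close `horizon` bars ahead is strictly higher, else 0."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    future = close.shift(-horizon)
    target = (future > close).astype(int)
    # The last `horizon` rows have no future close, so drop them
    # instead of silently labelling them 0.
    return target.iloc[:-horizon]


close = pd.Series([100.0, 101.0, 99.0, 102.0, 103.0, 104.0, 98.0, 105.0])
labels = make_target(close)
print(labels.tolist())  # [1, 0, 1]
```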
| \subsubsection{2.2 Feature Space | |
| Definition}\label{feature-space-definition} | |
| \textbf{Feature Vector Dimension}: 23 features | |
\textbf{Feature Categories:}
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\tightlist
\item
  \textbf{Price Features} (5): Open, High, Low, Close, Volume
\item
  \textbf{Technical Indicators} (11): SMA, EMA, RSI, MACD components,
  Bollinger Bands
\item
  \textbf{SMC Features} (3): FVG Size, Order Block Type, Recovery Pattern
  Type
\item
  \textbf{Temporal Features} (3): Close price lags (1, 2, 3 days)
\item
  \textbf{Derived Features} (1): Volume-weighted price changes
\end{enumerate}
| \subsubsection{2.3 XGBoost Mathematical | |
| Foundation}\label{xgboost-mathematical-foundation} | |
| \textbf{Objective Function:} | |
| \begin{verbatim} | |
Obj(θ) = ∑_{i=1}^n l(y_i, ŷ_i) + ∑_{k=1}^K Ω(f_k)
| \end{verbatim} | |
Where:
\begin{itemize}
\tightlist
\item
  \texttt{l(y\_i,\ ŷ\_i)} is the loss function (log loss for binary
  classification)
\item
  \texttt{Ω(f\_k)} is the regularization term
\item
  \texttt{K} is the number of trees
\end{itemize}
| \textbf{Gradient Boosting Update:} | |
| \begin{verbatim} | |
ŷ_i^{(t)} = ŷ_i^{(t-1)} + η · f_t(x_i)
| \end{verbatim} | |
Where:
\begin{itemize}
\tightlist
\item
  \texttt{η} is the learning rate (0.2)
\item
  \texttt{f\_t} is the t-th tree
\item
  \texttt{ŷ\_i\^{}\{(t)\}} is the prediction after t iterations
\end{itemize}
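To make the additive update concrete, the sketch below accumulates shrunken tree outputs into a margin and maps it to a probability with the logistic function. The tree outputs are made-up numbers and the zero initial margin is an assumption; the function name \texttt{boosted\_probability} is hypothetical and this is not XGBoost's internal code.

```python
import math


def boosted_probability(tree_outputs, eta=0.2, initial_margin=0.0):
    """Accumulate eta-scaled tree outputs, then map the margin to P(y=1|x)."""
    margin = initial_margin
    for f_t in tree_outputs:
        margin += eta * f_t  # y_hat^(t) = y_hat^(t-1) + eta * f_t(x)
    return 1.0 / (1.0 + math.exp(-margin))


# Three illustrative tree outputs for a single sample (values are invented).
p = boosted_probability([0.5, 0.3, -0.1])
print(round(p, 4))  # 0.5349
```

With the learning rate of 0.2, the three outputs sum to a margin of 0.14, which the logistic function maps to a probability slightly above the 0.5 decision threshold.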
| \subsubsection{2.4 Class Balancing | |
| Formulation}\label{class-balancing-formulation} | |
| \textbf{Scale Positive Weight Calculation:} | |
| \begin{verbatim} | |
scale_pos_weight = (negative_samples) / (positive_samples) = 0.54/0.46 ≈ 1.17
| \end{verbatim} | |
| \textbf{Modified Objective:} | |
| \begin{verbatim} | |
Obj(θ) = ∑_{i=1}^n w_i · l(y_i, ŷ_i) + ∑_{k=1}^K Ω(f_k)
| \end{verbatim} | |
Where \texttt{w\_i\ =\ scale\_pos\_weight} for positive class samples and
\texttt{w\_i\ =\ 1} for negative class samples.
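The weight calculation can be sketched directly from the label counts. The helper names \texttt{scale\_pos\_weight} and \texttt{sample\_weights} are illustrative, and the 54/46 label vector is a toy stand-in for the class proportions quoted above.

```python
def scale_pos_weight(labels):
    """Ratio of negative to positive samples."""
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0:
        raise ValueError("no positive samples")
    return negatives / positives


def sample_weights(labels, weight):
    """w_i = weight for the positive class, 1.0 for the negative class."""
    return [weight if y == 1 else 1.0 for y in labels]


labels = [0] * 54 + [1] * 46  # toy labels with a 54/46 class split
w = scale_pos_weight(labels)
weights = sample_weights(labels, w)
print(round(w, 2))  # 1.17
```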
| \begin{center}\rule{0.5\linewidth}{0.5pt}\end{center} | |
| \subsection{3. Feature Engineering | |
| Pipeline}\label{feature-engineering-pipeline} | |
| \subsubsection{3.1 Technical Indicators | |
| Implementation}\label{technical-indicators-implementation} | |
| \paragraph{3.1.1 Simple Moving Average | |
| (SMA)}\label{simple-moving-average-sma} | |
| \begin{verbatim} | |
SMA_n(t) = (1/n) · ∑_{i=0}^{n-1} Close_{t-i}
| \end{verbatim} | |
| \begin{itemize} | |
| \tightlist | |
| \item | |
| \textbf{Parameters}: n = 20, 50 periods | |
| \item | |
| \textbf{Purpose}: Trend identification | |
| \end{itemize} | |
| \paragraph{3.1.2 Exponential Moving Average | |
| (EMA)}\label{exponential-moving-average-ema} | |
| \begin{verbatim} | |
EMA_n(t) = α · Close_t + (1-α) · EMA_n(t-1)
| \end{verbatim} | |
Where \texttt{α\ =\ 2/(n+1)}, with n = 12 and n = 26 periods.
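Both moving averages map onto standard pandas calls. The sketch below assumes a pandas Series of closing prices; the helper names \texttt{sma} and \texttt{ema} are illustrative. Seeding the EMA with the first close is the pandas convention for \texttt{adjust=False} and is an assumption here, since the initial value is not specified above.

```python
import pandas as pd


def sma(close: pd.Series, n: int) -> pd.Series:
    """Simple moving average over the last n closes (NaN until n bars exist)."""
    return close.rolling(window=n).mean()


def ema(close: pd.Series, n: int) -> pd.Series:
    """Recursive EMA with alpha = 2 / (n + 1), seeded with the first close."""
    return close.ewm(span=n, adjust=False).mean()


close = pd.Series([10.0, 11.0, 12.0, 13.0])
print(sma(close, 2).iloc[-1])  # 12.5
print(ema(close, 3).iloc[-1])  # 12.125
```

With \texttt{n = 3} the smoothing factor is 0.5, so the EMA steps through 10, 10.5, 11.25, and 12.125.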
| \paragraph{3.1.3 Relative Strength Index | |
| (RSI)}\label{relative-strength-index-rsi} | |
| \begin{verbatim} | |
| RSI(t) = 100 - [100 / (1 + RS(t))] | |
| \end{verbatim} | |
| Where: | |
| \begin{verbatim} | |
| RS(t) = Average Gain / Average Loss (14-period) | |
| \end{verbatim} | |
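A minimal RSI sketch follows. It assumes plain 14-period rolling means for the average gain and loss; Wilder's smoothed averages are a common alternative, and the text above does not say which variant the system uses. The helper name \texttt{rsi} is illustrative.

```python
import pandas as pd


def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """RSI from simple rolling averages of gains and losses."""
    delta = close.diff()
    gain = delta.clip(lower=0.0)     # positive moves, zero otherwise
    loss = (-delta).clip(lower=0.0)  # magnitude of negative moves
    avg_gain = gain.rolling(window=period).mean()
    avg_loss = loss.rolling(window=period).mean()
    rs = avg_gain / avg_loss         # infinite when there are no losses
    return 100.0 - 100.0 / (1.0 + rs)


# Alternating +1 / -1 moves give equal average gain and loss, so RSI = 50.
zigzag = pd.Series([10.0 + (i % 2) for i in range(15)])
rsi_zigzag = rsi(zigzag)

# A strictly rising series has no losses, so RSI saturates at 100.
rising = pd.Series([float(i) for i in range(1, 21)])
rsi_rising = rsi(rising)
```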
| \paragraph{3.1.4 MACD Oscillator}\label{macd-oscillator} | |
| \begin{verbatim} | |
| MACD(t) = EMA_12(t) - EMA_26(t) | |
| Signal(t) = EMA_9(MACD) | |
| Histogram(t) = MACD(t) - Signal(t) | |
| \end{verbatim} | |
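The three MACD components follow directly from the EMA definition. The sketch below assumes the recursive EMA form (\texttt{adjust=False} in pandas); the function name \texttt{macd} and the output column names are illustrative.

```python
import pandas as pd


def macd(close: pd.Series, fast: int = 12, slow: int = 26, signal: int = 9) -> pd.DataFrame:
    """Return the MACD line, its signal line, and the histogram."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()
    macd_line = ema_fast - ema_slow
    signal_line = macd_line.ewm(span=signal, adjust=False).mean()
    return pd.DataFrame({
        "macd": macd_line,
        "signal": signal_line,
        "histogram": macd_line - signal_line,
    })


# In a steady uptrend the fast EMA sits above the slow EMA, so MACD is positive.
trend = pd.Series([float(i) for i in range(100)])
macd_trend = macd(trend)

# A flat price series gives a MACD of zero throughout.
flat = pd.Series([1800.0] * 40)
macd_flat = macd(flat)
```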
| \paragraph{3.1.5 Bollinger Bands}\label{bollinger-bands} | |
| \begin{verbatim} | |
| Middle(t) = SMA_20(t) | |
Upper(t) = Middle(t) + 2 · σ_t
Lower(t) = Middle(t) - 2 · σ_t
| \end{verbatim} | |
Where \texttt{σ\_t} is the 20-period standard deviation.
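A sketch of the band calculation is given below. It assumes the population standard deviation (\texttt{ddof=0}); pandas defaults to the sample standard deviation, and the text above does not state which one is used. The function name \texttt{bollinger\_bands} is illustrative.

```python
import pandas as pd


def bollinger_bands(close: pd.Series, n: int = 20, k: float = 2.0) -> pd.DataFrame:
    """Middle band is the n-period SMA; outer bands sit k standard deviations away."""
    middle = close.rolling(window=n).mean()
    sigma = close.rolling(window=n).std(ddof=0)  # population std (assumption)
    return pd.DataFrame({
        "middle": middle,
        "upper": middle + k * sigma,
        "lower": middle - k * sigma,
    })


# Small window so the numbers are easy to verify by hand:
# the last window is [3, 4], with mean 3.5 and population std 0.5.
bands = bollinger_bands(pd.Series([1.0, 2.0, 3.0, 4.0]), n=2, k=2.0)
```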
| \subsubsection{3.2 Smart Money Concepts | |
| Implementation}\label{smart-money-concepts-implementation} | |
| \paragraph{3.2.1 Fair Value Gap (FVG) Detection | |
| Algorithm}\label{fair-value-gap-fvg-detection-algorithm} | |
| \begin{Shaded} | |
| \begin{Highlighting}[] | |
\KeywordTok{def}\NormalTok{ detect\_fvg(prices\_df):}
    \CommentTok{"""}
\CommentTok{    Detect Fair Value Gaps in price action}
\CommentTok{    Returns: List of FVG objects with type, size, and location}
\CommentTok{    Note: bar i+1 is inspected, so a gap at bar i is only known after the next bar closes}
\CommentTok{    """}
\NormalTok{    fvgs }\OperatorTok{=}\NormalTok{ []}
    \ControlFlowTok{for}\NormalTok{ i }\KeywordTok{in} \BuiltInTok{range}\NormalTok{(}\DecValTok{1}\NormalTok{, }\BuiltInTok{len}\NormalTok{(prices\_df) }\OperatorTok{{-}} \DecValTok{1}\NormalTok{):}
\NormalTok{        current\_low }\OperatorTok{=}\NormalTok{ prices\_df[}\StringTok{\textquotesingle{}Low\textquotesingle{}}\NormalTok{].iloc[i]}
\NormalTok{        current\_high }\OperatorTok{=}\NormalTok{ prices\_df[}\StringTok{\textquotesingle{}High\textquotesingle{}}\NormalTok{].iloc[i]}
\NormalTok{        prev\_high }\OperatorTok{=}\NormalTok{ prices\_df[}\StringTok{\textquotesingle{}High\textquotesingle{}}\NormalTok{].iloc[i}\OperatorTok{{-}}\DecValTok{1}\NormalTok{]}
\NormalTok{        next\_high }\OperatorTok{=}\NormalTok{ prices\_df[}\StringTok{\textquotesingle{}High\textquotesingle{}}\NormalTok{].iloc[i}\OperatorTok{+}\DecValTok{1}\NormalTok{]}
\NormalTok{        prev\_low }\OperatorTok{=}\NormalTok{ prices\_df[}\StringTok{\textquotesingle{}Low\textquotesingle{}}\NormalTok{].iloc[i}\OperatorTok{{-}}\DecValTok{1}\NormalTok{]}
\NormalTok{        next\_low }\OperatorTok{=}\NormalTok{ prices\_df[}\StringTok{\textquotesingle{}Low\textquotesingle{}}\NormalTok{].iloc[i}\OperatorTok{+}\DecValTok{1}\NormalTok{]}
        \CommentTok{\# Bullish FVG: Current low \textgreater{} both adjacent highs}
        \ControlFlowTok{if}\NormalTok{ current\_low }\OperatorTok{\textgreater{}}\NormalTok{ prev\_high }\KeywordTok{and}\NormalTok{ current\_low }\OperatorTok{\textgreater{}}\NormalTok{ next\_high:}
\NormalTok{            gap\_size }\OperatorTok{=}\NormalTok{ current\_low }\OperatorTok{{-}} \BuiltInTok{max}\NormalTok{(prev\_high, next\_high)}
\NormalTok{            fvgs.append(\{}
                \StringTok{\textquotesingle{}type\textquotesingle{}}\NormalTok{: }\StringTok{\textquotesingle{}bullish\textquotesingle{}}\NormalTok{,}
                \StringTok{\textquotesingle{}size\textquotesingle{}}\NormalTok{: gap\_size,}
                \StringTok{\textquotesingle{}index\textquotesingle{}}\NormalTok{: i,}
                \StringTok{\textquotesingle{}price\_level\textquotesingle{}}\NormalTok{: current\_low,}
                \StringTok{\textquotesingle{}mitigated\textquotesingle{}}\NormalTok{: }\VariableTok{False}
\NormalTok{            \})}
        \CommentTok{\# Bearish FVG: Current high \textless{} both adjacent lows}
        \ControlFlowTok{elif}\NormalTok{ current\_high }\OperatorTok{\textless{}}\NormalTok{ prev\_low }\KeywordTok{and}\NormalTok{ current\_high }\OperatorTok{\textless{}}\NormalTok{ next\_low:}
\NormalTok{            gap\_size }\OperatorTok{=} \BuiltInTok{min}\NormalTok{(prev\_low, next\_low) }\OperatorTok{{-}}\NormalTok{ current\_high}
\NormalTok{            fvgs.append(\{}
                \StringTok{\textquotesingle{}type\textquotesingle{}}\NormalTok{: }\StringTok{\textquotesingle{}bearish\textquotesingle{}}\NormalTok{,}
                \StringTok{\textquotesingle{}size\textquotesingle{}}\NormalTok{: gap\_size,}
                \StringTok{\textquotesingle{}index\textquotesingle{}}\NormalTok{: i,}
                \StringTok{\textquotesingle{}price\_level\textquotesingle{}}\NormalTok{: current\_high,}
                \StringTok{\textquotesingle{}mitigated\textquotesingle{}}\NormalTok{: }\VariableTok{False}
\NormalTok{            \})}
    \ControlFlowTok{return}\NormalTok{ fvgs}
| \end{Highlighting} | |
| \end{Shaded} | |
\textbf{FVG Mathematical Properties:}
\begin{itemize}
\tightlist
\item
  \textbf{Gap Size}: Absolute price difference indicating imbalance
  magnitude
\item
  \textbf{Mitigation}: An FVG is filled when price returns to the gap
  area
\item
  \textbf{Significance}: Larger gaps indicate stronger institutional
  imbalance
\end{itemize}
| \paragraph{3.2.2 Order Block | |
| Identification}\label{order-block-identification} | |
| \begin{Shaded} | |
| \begin{Highlighting}[] | |
\ImportTok{import}\NormalTok{ numpy }\ImportTok{as}\NormalTok{ np}

\KeywordTok{def}\NormalTok{ identify\_order\_blocks(prices\_df, volume\_df, threshold\_percentile}\OperatorTok{=}\DecValTok{80}\NormalTok{):}
    \CommentTok{"""}
\CommentTok{    Identify Order Blocks based on volume and price movement}
\CommentTok{    """}
\NormalTok{    order\_blocks }\OperatorTok{=}\NormalTok{ []}
    \CommentTok{\# Calculate volume threshold}
\NormalTok{    volume\_threshold }\OperatorTok{=}\NormalTok{ np.percentile(volume\_df, threshold\_percentile)}
    \ControlFlowTok{for}\NormalTok{ i }\KeywordTok{in} \BuiltInTok{range}\NormalTok{(}\DecValTok{2}\NormalTok{, }\BuiltInTok{len}\NormalTok{(prices\_df) }\OperatorTok{{-}} \DecValTok{2}\NormalTok{):}
        \CommentTok{\# Check for significant volume}
        \ControlFlowTok{if}\NormalTok{ volume\_df.iloc[i] }\OperatorTok{\textgreater{}}\NormalTok{ volume\_threshold:}
            \CommentTok{\# Analyze price movement}
\NormalTok{            price\_range }\OperatorTok{=}\NormalTok{ prices\_df[}\StringTok{\textquotesingle{}High\textquotesingle{}}\NormalTok{].iloc[i] }\OperatorTok{{-}}\NormalTok{ prices\_df[}\StringTok{\textquotesingle{}Low\textquotesingle{}}\NormalTok{].iloc[i]}
\NormalTok{            body\_size }\OperatorTok{=} \BuiltInTok{abs}\NormalTok{(prices\_df[}\StringTok{\textquotesingle{}Close\textquotesingle{}}\NormalTok{].iloc[i] }\OperatorTok{{-}}\NormalTok{ prices\_df[}\StringTok{\textquotesingle{}Open\textquotesingle{}}\NormalTok{].iloc[i])}
            \CommentTok{\# Order block criteria}
            \ControlFlowTok{if}\NormalTok{ body\_size }\OperatorTok{\textgreater{}} \FloatTok{0.7} \OperatorTok{*}\NormalTok{ price\_range:  }\CommentTok{\# Large body relative to range}
\NormalTok{                direction }\OperatorTok{=} \StringTok{\textquotesingle{}bullish\textquotesingle{}} \ControlFlowTok{if}\NormalTok{ prices\_df[}\StringTok{\textquotesingle{}Close\textquotesingle{}}\NormalTok{].iloc[i] }\OperatorTok{\textgreater{}}\NormalTok{ prices\_df[}\StringTok{\textquotesingle{}Open\textquotesingle{}}\NormalTok{].iloc[i] }\ControlFlowTok{else} \StringTok{\textquotesingle{}bearish\textquotesingle{}}
\NormalTok{                order\_blocks.append(\{}
                    \StringTok{\textquotesingle{}type\textquotesingle{}}\NormalTok{: direction,}
                    \StringTok{\textquotesingle{}entry\_price\textquotesingle{}}\NormalTok{: prices\_df[}\StringTok{\textquotesingle{}Close\textquotesingle{}}\NormalTok{].iloc[i],}
                    \StringTok{\textquotesingle{}stop\_loss\textquotesingle{}}\NormalTok{: prices\_df[}\StringTok{\textquotesingle{}Low\textquotesingle{}}\NormalTok{].iloc[i] }\ControlFlowTok{if}\NormalTok{ direction }\OperatorTok{==} \StringTok{\textquotesingle{}bullish\textquotesingle{}} \ControlFlowTok{else}\NormalTok{ prices\_df[}\StringTok{\textquotesingle{}High\textquotesingle{}}\NormalTok{].iloc[i],}
                    \StringTok{\textquotesingle{}index\textquotesingle{}}\NormalTok{: i,}
                    \StringTok{\textquotesingle{}volume\textquotesingle{}}\NormalTok{: volume\_df.iloc[i]}
\NormalTok{                \})}
    \ControlFlowTok{return}\NormalTok{ order\_blocks}
| \end{Highlighting} | |
| \end{Shaded} | |
| \paragraph{3.2.3 Recovery Pattern | |
| Detection}\label{recovery-pattern-detection} | |
| \begin{Shaded} | |
| \begin{Highlighting}[] | |
\KeywordTok{def}\NormalTok{ detect\_recovery\_patterns(prices\_df, trend\_direction, pullback\_threshold}\OperatorTok{=}\FloatTok{0.618}\NormalTok{):}
    \CommentTok{"""}
\CommentTok{    Detect recovery patterns within trending markets}
\CommentTok{    """}
\NormalTok{    recoveries }\OperatorTok{=}\NormalTok{ []}
    \CommentTok{\# Identify trend using EMA alignment}
\NormalTok{    ema\_20 }\OperatorTok{=}\NormalTok{ prices\_df[}\StringTok{\textquotesingle{}Close\textquotesingle{}}\NormalTok{].ewm(span}\OperatorTok{=}\DecValTok{20}\NormalTok{).mean()}
\NormalTok{    ema\_50 }\OperatorTok{=}\NormalTok{ prices\_df[}\StringTok{\textquotesingle{}Close\textquotesingle{}}\NormalTok{].ewm(span}\OperatorTok{=}\DecValTok{50}\NormalTok{).mean()}
    \ControlFlowTok{for}\NormalTok{ i }\KeywordTok{in} \BuiltInTok{range}\NormalTok{(}\DecValTok{50}\NormalTok{, }\BuiltInTok{len}\NormalTok{(prices\_df) }\OperatorTok{{-}} \DecValTok{5}\NormalTok{):}
        \CommentTok{\# Determine trend direction}
        \ControlFlowTok{if}\NormalTok{ trend\_direction }\OperatorTok{==} \StringTok{\textquotesingle{}bullish\textquotesingle{}}\NormalTok{:}
            \ControlFlowTok{if}\NormalTok{ ema\_20.iloc[i] }\OperatorTok{\textgreater{}}\NormalTok{ ema\_50.iloc[i]:}
                \CommentTok{\# Look for pullback in uptrend}
\NormalTok{                recent\_high }\OperatorTok{=}\NormalTok{ prices\_df[}\StringTok{\textquotesingle{}High\textquotesingle{}}\NormalTok{].iloc[i}\OperatorTok{{-}}\DecValTok{20}\NormalTok{:i].}\BuiltInTok{max}\NormalTok{()}
\NormalTok{                recent\_low }\OperatorTok{=}\NormalTok{ prices\_df[}\StringTok{\textquotesingle{}Low\textquotesingle{}}\NormalTok{].iloc[i}\OperatorTok{{-}}\DecValTok{20}\NormalTok{:i].}\BuiltInTok{min}\NormalTok{()}
\NormalTok{                current\_price }\OperatorTok{=}\NormalTok{ prices\_df[}\StringTok{\textquotesingle{}Close\textquotesingle{}}\NormalTok{].iloc[i]}
                \CommentTok{\# Guard against a flat 20{-}bar window, where the ratio is undefined}
\NormalTok{                swing }\OperatorTok{=}\NormalTok{ recent\_high }\OperatorTok{{-}}\NormalTok{ recent\_low}
\NormalTok{                pullback\_ratio }\OperatorTok{=}\NormalTok{ (recent\_high }\OperatorTok{{-}}\NormalTok{ current\_price) }\OperatorTok{/}\NormalTok{ swing }\ControlFlowTok{if}\NormalTok{ swing }\OperatorTok{\textgreater{}} \DecValTok{0} \ControlFlowTok{else} \FloatTok{0.0}
                \ControlFlowTok{if}\NormalTok{ pullback\_ratio }\OperatorTok{\textgreater{}}\NormalTok{ pullback\_threshold:}
\NormalTok{                    recoveries.append(\{}
                        \StringTok{\textquotesingle{}type\textquotesingle{}}\NormalTok{: }\StringTok{\textquotesingle{}bullish\_recovery\textquotesingle{}}\NormalTok{,}
                        \StringTok{\textquotesingle{}entry\_zone\textquotesingle{}}\NormalTok{: current\_price,}
                        \StringTok{\textquotesingle{}target\textquotesingle{}}\NormalTok{: recent\_high,}
                        \StringTok{\textquotesingle{}index\textquotesingle{}}\NormalTok{: i}
\NormalTok{                    \})}
        \CommentTok{\# Similar logic for bearish trends}
    \ControlFlowTok{return}\NormalTok{ recoveries}
| \end{Highlighting} | |
| \end{Shaded} | |
| \subsubsection{3.3 Feature Normalization and | |
| Scaling}\label{feature-normalization-and-scaling} | |
| \textbf{Standardization Formula:} | |
| \begin{verbatim} | |
X_scaled = (X - μ) / σ
| \end{verbatim} | |
Where:
\begin{itemize}
\tightlist
\item
  \texttt{μ} is the mean of the training set
\item
  \texttt{σ} is the standard deviation of the training set
\end{itemize}
| \textbf{Applied to}: All continuous features except encoded categorical | |
| variables | |
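The key point of the formula is that the mean and standard deviation come from the training set only and are then reused on later data. A minimal NumPy sketch, with hypothetical helper names and toy data, might look like this; replacing a zero standard deviation with 1 is an added safeguard, not something described above.

```python
import numpy as np


def fit_standardizer(train: np.ndarray):
    """Compute per-column mean and standard deviation on the training set only."""
    mu = train.mean(axis=0)
    sigma = train.std(axis=0)
    # Constant columns have sigma = 0; use 1.0 to avoid dividing by zero.
    sigma = np.where(sigma == 0.0, 1.0, sigma)
    return mu, sigma


def standardize(x: np.ndarray, mu: np.ndarray, sigma: np.ndarray) -> np.ndarray:
    """Apply z-score scaling with previously fitted statistics."""
    return (x - mu) / sigma


train = np.array([[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]])
test = np.array([[7.0, 12.0]])

mu, sigma = fit_standardizer(train)
train_scaled = standardize(train, mu, sigma)
test_scaled = standardize(test, mu, sigma)  # reuses training statistics
```

Fitting on the training rows alone keeps information from the test period out of the scaled features.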
| \begin{center}\rule{0.5\linewidth}{0.5pt}\end{center} | |
| \subsection{4. Machine Learning | |
| Implementation}\label{machine-learning-implementation} | |
| \subsubsection{4.1 XGBoost Hyperparameter | |
| Optimization}\label{xgboost-hyperparameter-optimization} | |
| \paragraph{4.1.1 Parameter Space}\label{parameter-space} | |
| \begin{Shaded} | |
| \begin{Highlighting}[] | |
| \NormalTok{param\_grid }\OperatorTok{=}\NormalTok{ \{} | |
    \StringTok{\textquotesingle{}n\_estimators\textquotesingle{}}\NormalTok{: [}\DecValTok{100}\NormalTok{, }\DecValTok{200}\NormalTok{, }\DecValTok{300}\NormalTok{],}
    \StringTok{\textquotesingle{}max\_depth\textquotesingle{}}\NormalTok{: [}\DecValTok{3}\NormalTok{, }\DecValTok{5}\NormalTok{, }\DecValTok{7}\NormalTok{, }\DecValTok{9}\NormalTok{],}
    \StringTok{\textquotesingle{}learning\_rate\textquotesingle{}}\NormalTok{: [}\FloatTok{0.01}\NormalTok{, }\FloatTok{0.1}\NormalTok{, }\FloatTok{0.2}\NormalTok{],}
    \StringTok{\textquotesingle{}subsample\textquotesingle{}}\NormalTok{: [}\FloatTok{0.7}\NormalTok{, }\FloatTok{0.8}\NormalTok{, }\FloatTok{0.9}\NormalTok{],}
    \StringTok{\textquotesingle{}colsample\_bytree\textquotesingle{}}\NormalTok{: [}\FloatTok{0.7}\NormalTok{, }\FloatTok{0.8}\NormalTok{, }\FloatTok{0.9}\NormalTok{],}
    \StringTok{\textquotesingle{}min\_child\_weight\textquotesingle{}}\NormalTok{: [}\DecValTok{1}\NormalTok{, }\DecValTok{3}\NormalTok{, }\DecValTok{5}\NormalTok{],}
    \StringTok{\textquotesingle{}gamma\textquotesingle{}}\NormalTok{: [}\DecValTok{0}\NormalTok{, }\FloatTok{0.1}\NormalTok{, }\FloatTok{0.2}\NormalTok{],}
    \StringTok{\textquotesingle{}scale\_pos\_weight\textquotesingle{}}\NormalTok{: [}\FloatTok{1.0}\NormalTok{, }\FloatTok{1.17}\NormalTok{, }\FloatTok{1.3}\NormalTok{]}
| \NormalTok{\}} | |
| \end{Highlighting} | |
| \end{Shaded} | |
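The size of this search space is worth making explicit. The text does not say whether the grid was searched exhaustively or sampled, so the sketch below only counts the combinations an exhaustive search would have to evaluate.

```python
from math import prod

param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [3, 5, 7, 9],
    'learning_rate': [0.01, 0.1, 0.2],
    'subsample': [0.7, 0.8, 0.9],
    'colsample_bytree': [0.7, 0.8, 0.9],
    'min_child_weight': [1, 3, 5],
    'gamma': [0, 0.1, 0.2],
    'scale_pos_weight': [1.0, 1.17, 1.3],
}

# Number of distinct parameter combinations in the full grid.
n_combinations = prod(len(values) for values in param_grid.values())
print(n_combinations)  # 8748
```

Each combination is fitted once per validation fold, so the number of model fits is this count multiplied by the number of folds.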
| \paragraph{4.1.2 Optimization Results}\label{optimization-results} | |
| \begin{Shaded} | |
| \begin{Highlighting}[] | |
| \NormalTok{best\_params }\OperatorTok{=}\NormalTok{ \{} | |
    \StringTok{\textquotesingle{}n\_estimators\textquotesingle{}}\NormalTok{: }\DecValTok{200}\NormalTok{,}
    \StringTok{\textquotesingle{}max\_depth\textquotesingle{}}\NormalTok{: }\DecValTok{7}\NormalTok{,}
    \StringTok{\textquotesingle{}learning\_rate\textquotesingle{}}\NormalTok{: }\FloatTok{0.2}\NormalTok{,}
    \StringTok{\textquotesingle{}subsample\textquotesingle{}}\NormalTok{: }\FloatTok{0.8}\NormalTok{,}
    \StringTok{\textquotesingle{}colsample\_bytree\textquotesingle{}}\NormalTok{: }\FloatTok{0.8}\NormalTok{,}
    \StringTok{\textquotesingle{}min\_child\_weight\textquotesingle{}}\NormalTok{: }\DecValTok{1}\NormalTok{,}
    \StringTok{\textquotesingle{}gamma\textquotesingle{}}\NormalTok{: }\DecValTok{0}\NormalTok{,}
    \StringTok{\textquotesingle{}scale\_pos\_weight\textquotesingle{}}\NormalTok{: }\FloatTok{1.17}
| \NormalTok{\}} | |
| \end{Highlighting} | |
| \end{Shaded} | |
| \subsubsection{4.2 Cross-Validation | |
| Strategy}\label{cross-validation-strategy} | |
| \paragraph{4.2.1 Time-Series Split}\label{time-series-split} | |
| \begin{verbatim} | |
Fold 1: Train[0:60%]  → Validation[60%:80%]
Fold 2: Train[0:80%]  → Validation[80%:100%]
Fold 3: Train[0:100%] → Validation[100%:120%] (future data simulation)
| \end{verbatim} | |
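The expanding-window scheme can be sketched as index ranges in which every training index precedes every validation index. The function name \texttt{expanding\_window\_splits} is hypothetical, and only Folds 1 and 2 are reproduced: Fold 3 validates on data beyond the original sample, so it cannot be built from in-sample indices.

```python
def expanding_window_splits(n_samples, boundaries=(0.6, 0.8, 1.0)):
    """Yield (train_range, validation_range) pairs for an expanding window.

    Consecutive boundaries define the folds: train on [0, b_k),
    validate on [b_k, b_{k+1}).
    """
    splits = []
    for train_frac, val_frac in zip(boundaries[:-1], boundaries[1:]):
        train_end = int(round(n_samples * train_frac))
        val_end = int(round(n_samples * val_frac))
        splits.append((range(0, train_end), range(train_end, val_end)))
    return splits


splits = expanding_window_splits(1000)
for train_idx, val_idx in splits:
    print(len(train_idx), len(val_idx))
```

Because validation rows always come after the training rows, no future observation leaks into model fitting.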
| \paragraph{4.2.2 Performance Metrics per | |
| Fold}\label{performance-metrics-per-fold} | |
| \begin{longtable}[]{@{}lllll@{}} | |
| \toprule\noalign{} | |
| Fold & Accuracy & Precision & Recall & F1-Score \\ | |
| \midrule\noalign{} | |
| \endhead | |
| \bottomrule\noalign{} | |
| \endlastfoot | |
| 1 & 79.2\% & 68\% & 78\% & 73\% \\ | |
| 2 & 81.1\% & 72\% & 82\% & 77\% \\ | |
| 3 & 80.8\% & 71\% & 81\% & 76\% \\ | |
| \textbf{Average} & \textbf{80.4\%} & \textbf{70\%} & \textbf{80\%} & | |
| \textbf{75\%} \\ | |
| \end{longtable} | |
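The F1 column can be cross-checked from the precision and recall columns, since F1 is their harmonic mean. The short sketch below recomputes it per fold from the values in the table; the helper name \texttt{f1\_score} is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) per fold, taken from the table above.
folds = {1: (0.68, 0.78), 2: (0.72, 0.82), 3: (0.71, 0.81)}
f1_by_fold = {k: round(f1_score(p, r), 2) for k, (p, r) in folds.items()}
print(f1_by_fold)  # {1: 0.73, 2: 0.77, 3: 0.76}
```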
| \subsubsection{4.3 Feature Importance | |
| Analysis}\label{feature-importance-analysis} | |
| \paragraph{4.3.1 Gain-based Importance}\label{gain-based-importance} | |
| \begin{verbatim} | |
Feature Importance Ranking:
 1. Close_lag1        15.2%
 2. FVG_Size          12.8%
 3. RSI               11.5%
 4. OB_Type_Encoded    9.7%
 5. MACD               8.9%
 6. Volume             7.3%
 7. EMA_12             6.1%
 8. Bollinger_Upper    5.8%
 9. Recovery_Type      4.9%
10. Close_lag2         4.2%
| \paragraph{4.3.2 Partial Dependence | |
| Analysis}\label{partial-dependence-analysis} | |
\textbf{FVG Size Impact:}
\begin{itemize}
\tightlist
\item
  FVG Size \textless{} 0.5: Prediction bias toward class 0 (60\%)
\item
  FVG Size \textgreater{} 2.0: Prediction bias toward class 1 (75\%)
\item
  Medium FVG (0.5-2.0): Balanced predictions
\end{itemize}
| \begin{center}\rule{0.5\linewidth}{0.5pt}\end{center} | |
| \subsection{5. Backtesting Framework}\label{backtesting-framework} | |
| \subsubsection{5.1 Strategy | |
| Implementation}\label{strategy-implementation} | |
| \paragraph{5.1.1 Trading Rules}\label{trading-rules} | |
| \begin{Shaded} | |
| \begin{Highlighting}[] | |
\ImportTok{import}\NormalTok{ backtrader }\ImportTok{as}\NormalTok{ bt}
\ImportTok{import}\NormalTok{ joblib}

\KeywordTok{class}\NormalTok{ SMCXGBoostStrategy(bt.Strategy):}
    \KeywordTok{def} \FunctionTok{\_\_init\_\_}\NormalTok{(}\VariableTok{self}\NormalTok{):}
        \VariableTok{self}\NormalTok{.model }\OperatorTok{=}\NormalTok{ joblib.load(}\StringTok{\textquotesingle{}trading\_model.pkl\textquotesingle{}}\NormalTok{)}
        \CommentTok{\# Pre{-}fitted scaler; the file name scaler.pkl is an assumption}
        \VariableTok{self}\NormalTok{.scaler }\OperatorTok{=}\NormalTok{ joblib.load(}\StringTok{\textquotesingle{}scaler.pkl\textquotesingle{}}\NormalTok{)}
        \VariableTok{self}\NormalTok{.position\_size }\OperatorTok{=} \FloatTok{1.0}  \CommentTok{\# Fixed position sizing}

    \KeywordTok{def} \BuiltInTok{next}\NormalTok{(}\VariableTok{self}\NormalTok{):}
        \CommentTok{\# Feature calculation (calculate\_features is assumed to be defined on the strategy)}
\NormalTok{        features }\OperatorTok{=} \VariableTok{self}\NormalTok{.calculate\_features()}
        \CommentTok{\# Apply the training{-}set scaling before inference}
\NormalTok{        scaled }\OperatorTok{=} \VariableTok{self}\NormalTok{.scaler.transform(features.reshape(}\DecValTok{1}\NormalTok{, }\OperatorTok{{-}}\DecValTok{1}\NormalTok{))}
        \CommentTok{\# Model prediction}
\NormalTok{        prediction\_proba }\OperatorTok{=} \VariableTok{self}\NormalTok{.model.predict\_proba(scaled)[}\DecValTok{0}\NormalTok{]}
\NormalTok{        prediction }\OperatorTok{=} \DecValTok{1} \ControlFlowTok{if}\NormalTok{ prediction\_proba[}\DecValTok{1}\NormalTok{] }\OperatorTok{\textgreater{}=} \FloatTok{0.5} \ControlFlowTok{else} \DecValTok{0}
        \CommentTok{\# Position management}
        \ControlFlowTok{if}\NormalTok{ prediction }\OperatorTok{==} \DecValTok{1} \KeywordTok{and} \KeywordTok{not} \VariableTok{self}\NormalTok{.position:}
            \CommentTok{\# Enter long position}
            \VariableTok{self}\NormalTok{.buy(size}\OperatorTok{=}\VariableTok{self}\NormalTok{.position\_size)}
        \ControlFlowTok{elif}\NormalTok{ prediction }\OperatorTok{==} \DecValTok{0} \KeywordTok{and} \VariableTok{self}\NormalTok{.position:}
            \CommentTok{\# Exit the long position; short entries are not opened in this snippet}
            \ControlFlowTok{if} \VariableTok{self}\NormalTok{.position.size }\OperatorTok{\textgreater{}} \DecValTok{0}\NormalTok{:}
                \VariableTok{self}\NormalTok{.sell(size}\OperatorTok{=}\VariableTok{self}\NormalTok{.position\_size)}
| \end{Highlighting} | |
| \end{Shaded} | |
| \paragraph{5.1.2 Risk Management}\label{risk-management} | |
| \begin{itemize} | |
| \tightlist | |
| \item | |
| \textbf{No Stop Loss}: Simplified for performance measurement | |
| \item | |
| \textbf{No Take Profit}: Hold until signal reversal | |
| \item | |
| \textbf{Fixed Position Size}: 1 contract per trade | |
| \item | |
| \textbf{No Leverage}: Spot trading simulation | |
| \end{itemize} | |
| \subsubsection{5.2 Performance Metrics | |
| Calculation}\label{performance-metrics-calculation} | |
| \paragraph{5.2.1 Win Rate}\label{win-rate} | |
| \begin{verbatim} | |
| Win Rate = (Number of Profitable Trades) / (Total Number of Trades) | |
| \end{verbatim} | |
| \paragraph{5.2.2 Total Return}\label{total-return} | |
| \[ | |
| \mathrm{Total\ Return} = \prod_{i=1}^{N} (1 + r_i) - 1 | |
| \] | |
| where $r_i$ is the return of trade $i$ and $N$ is the total number of trades. | |
| \paragraph{5.2.3 Sharpe Ratio}\label{sharpe-ratio} | |
| \[ | |
| \mathrm{Sharpe\ Ratio} = \frac{\mu_p - r_f}{\sigma_p} | |
| \] | |
| where: | |
| \begin{itemize} | |
| \tightlist | |
| \item | |
| $\mu_p$ is the mean portfolio return | |
| \item | |
| $r_f$ is the risk-free rate (assumed 0\%) | |
| \item | |
| $\sigma_p$ is the standard deviation of portfolio returns | |
| \end{itemize} | |
| \paragraph{5.2.4 Maximum Drawdown}\label{maximum-drawdown} | |
| \[ | |
| \mathrm{MDD} = \max_{t \in [0,T]} \frac{\mathrm{Peak}_t - \mathrm{Value}_t}{\mathrm{Peak}_t} | |
| \] | |
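The four definitions above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's evaluation code: the function names and the sample numbers are hypothetical, the Sharpe ratio is left unannualized, and the population standard deviation is used.

```python
import math
from statistics import mean, pstdev


def win_rate(trade_returns):
    """Fraction of trades with a strictly positive return."""
    return sum(1 for r in trade_returns if r > 0) / len(trade_returns)


def total_return(trade_returns):
    """Compounded return: prod(1 + r_i) - 1."""
    return math.prod(1.0 + r for r in trade_returns) - 1.0


def sharpe_ratio(period_returns, risk_free=0.0):
    """(mean - r_f) / population std of per-period returns; not annualized."""
    sigma = pstdev(period_returns)
    if sigma == 0:
        return float("nan")
    return (mean(period_returns) - risk_free) / sigma


def max_drawdown(equity):
    """Largest peak-to-trough decline as a positive fraction of the peak."""
    peak, mdd = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)          # running maximum (Peak_t)
        mdd = max(mdd, (peak - value) / peak)
    return mdd


# Illustrative values only; they are not taken from the backtest.
returns = [0.02, -0.01, 0.03, 0.01]
equity = [10_000, 10_200, 10_098, 10_401, 10_505]
print(win_rate(returns), total_return(returns), max_drawdown(equity))
```

Note that \texttt{max\_drawdown} returns a positive fraction, whereas the results tables report drawdown with a negative sign.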
| \subsubsection{5.3 Backtesting Results | |
| Analysis}\label{backtesting-results-analysis} | |
| \paragraph{5.3.1 Overall Performance | |
| (2015-2020)}\label{overall-performance-2015-2020} | |
| \begin{longtable}[]{@{}ll@{}} | |
| \toprule\noalign{} | |
| Metric & Value \\ | |
| \midrule\noalign{} | |
| \endhead | |
| \bottomrule\noalign{} | |
| \endlastfoot | |
| Total Trades & 1,247 \\ | |
| Win Rate & 85.4\% \\ | |
| Total Return & 18.2\% \\ | |
| Annualized Return & 3.0\% \\ | |
| Sharpe Ratio & 1.41 \\ | |
| Maximum Drawdown & -8.7\% \\ | |
| Profit Factor & 2.34 \\ | |
| \end{longtable} | |
| \paragraph{5.3.2 Yearly Performance | |
| Breakdown}\label{yearly-performance-breakdown} | |
| \begin{longtable}[]{@{}llllll@{}} | |
| \toprule\noalign{} | |
| Year & Trades & Win Rate & Return & Sharpe & Max DD \\ | |
| \midrule\noalign{} | |
| \endhead | |
| \bottomrule\noalign{} | |
| \endlastfoot | |
| 2015 & 189 & 62.5\% & 3.2\% & 0.85 & -4.2\% \\ | |
| 2016 & 203 & 100.0\% & 8.1\% & 2.15 & -2.1\% \\ | |
| 2017 & 198 & 100.0\% & 7.3\% & 1.98 & -1.8\% \\ | |
| 2018 & 187 & 72.7\% & -1.2\% & 0.32 & -8.7\% \\ | |
| 2019 & 195 & 76.9\% & 4.8\% & 1.12 & -3.5\% \\ | |
| 2020 & 275 & 94.1\% & 6.2\% & 1.67 & -2.9\% \\ | |
| \end{longtable} | |
| \paragraph{5.3.3 Market Regime Analysis}\label{market-regime-analysis} | |
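| \textbf{Bull Markets (2016-2017):} | |
| \begin{itemize} | |
| \tightlist | |
| \item | |
| Win Rate: 100\% | |
| \item | |
| Average Return: 7.7\% | |
| \item | |
| Low Drawdown: -2.0\% | |
| \item | |
| Characteristics: Strong trending conditions, clear SMC signals | |
| \end{itemize} | |
| \textbf{Bear Markets (2018):} | |
| \begin{itemize} | |
| \tightlist | |
| \item | |
| Win Rate: 72.7\% | |
| \item | |
| Return: -1.2\% | |
| \item | |
| High Drawdown: -8.7\% | |
| \item | |
| Characteristics: Volatile, choppy conditions, mixed signals | |
| \end{itemize} | |
| \textbf{Sideways Markets (2015, 2019-2020):} | |
| \begin{itemize} | |
| \tightlist | |
| \item | |
| Win Rate: 77.8\% | |
| \item | |
| Average Return: 4.7\% | |
| \item | |
| Moderate Drawdown: -3.5\% | |
| \item | |
| Characteristics: Range-bound, mean-reverting behavior | |
| \end{itemize} | |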
| \subsubsection{5.4 Trading Formulas and | |
| Techniques}\label{trading-formulas-and-techniques} | |
| \paragraph{5.4.1 Position Sizing Formula}\label{position-sizing-formula} | |
| \[ | |
| \mathrm{Position\ Size} = \mathrm{Account\ Balance} \times \mathrm{Risk\ Percentage} \times \mathrm{Win\ Rate\ Adjustment} | |
| \] | |
| where: | |
| \begin{itemize} | |
| \tightlist | |
| \item | |
| \textbf{Account Balance}: Current portfolio value | |
| \item | |
| \textbf{Risk Percentage}: 1\% per trade (conservative) | |
| \item | |
| \textbf{Win Rate Adjustment}: $\sqrt{\mathrm{Win\ Rate}}$ for volatility scaling | |
| \end{itemize} | |
| \textbf{Calculated Position Size}: $\$10{,}000 \times 0.01 \times \sqrt{0.854} \approx \$92$ | |
| per trade | |
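As a quick check of the arithmetic, the formula can be evaluated directly. The sketch below is only an illustration; the function name \texttt{position\_size} and its argument names are assumptions, not part of the project's code.

```python
import math


def position_size(balance, risk_pct, win_rate):
    """Account balance x risk percentage x sqrt(win rate)."""
    if not 0.0 <= win_rate <= 1.0:
        raise ValueError("win_rate must lie in [0, 1]")
    return balance * risk_pct * math.sqrt(win_rate)


# Inputs from the text: $10,000 balance, 1% risk, 85.4% win rate.
size = position_size(10_000, 0.01, 0.854)
print(f"{size:.2f}")
```

With the stated inputs, the square-root adjustment scales the 1\% risk budget of \$100 down to roughly \$92.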
| \paragraph{5.4.2 Kelly Criterion | |
| Adaptation}\label{kelly-criterion-adaptation} | |
| \[ | |
| \mathrm{Kelly\ Fraction} = \frac{p \cdot b - q}{b} | |
| \] | |
| where: | |
| \begin{itemize} | |
| \tightlist | |
| \item | |
| \textbf{Win Rate (p)}: 0.854 | |
| \item | |
| \textbf{Odds (b)}: Average Win/Loss Ratio = 1.45 | |
| \item | |
| \textbf{Loss Rate (q)}: 1 - p = 0.146 | |
| \end{itemize} | |
| \textbf{Kelly Fraction}: $(0.854 \times 1.45 - 0.146) / 1.45 \approx 0.75$ (capped at 20\% | |
| for safety) | |
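For reference, the standard Kelly fraction for win probability $p$, loss probability $q = 1 - p$, and payoff ratio $b$ is $f^* = p - q/b$. The sketch below evaluates it with the inputs quoted above and applies the 20\% cap; the function names and the \texttt{cap} parameter are hypothetical.

```python
def kelly_fraction(p, b):
    """Standard Kelly fraction f* = p - q/b, with q = 1 - p."""
    if b <= 0:
        raise ValueError("payoff ratio b must be positive")
    return p - (1.0 - p) / b


def capped_kelly(p, b, cap=0.20):
    """Kelly fraction clipped to [0, cap]; the cap is a safety limit."""
    return max(0.0, min(cap, kelly_fraction(p, b)))


full = kelly_fraction(0.854, 1.45)
print(round(full, 3), capped_kelly(0.854, 1.45))
```

With these inputs the uncapped fraction is about 0.75, so the 20\% cap is the binding constraint.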
| \paragraph{5.4.3 Risk-Adjusted Return | |
| Metrics}\label{risk-adjusted-return-metrics} | |
| \textbf{Sharpe Ratio Calculation:} | |
| \[ | |
| \mathrm{Sharpe\ Ratio} = \frac{R_p - R_f}{\sigma_p} | |
| \] | |
| where: | |
| \begin{itemize} | |
| \tightlist | |
| \item | |
| $R_p$: Portfolio return (18.2\%) | |
| \item | |
| $R_f$: Risk-free rate (0\%) | |
| \item | |
| $\sigma_p$: Portfolio volatility (12.9\%) | |
| \end{itemize} | |
| \textbf{Result}: 18.2\% / 12.9\% = 1.41 | |
| \textbf{Sortino Ratio (Downside Deviation):} | |
| \[ | |
| \mathrm{Sortino\ Ratio} = \frac{R_p - R_f}{\sigma_d} | |
| \] | |
| where: | |
| \begin{itemize} | |
| \tightlist | |
| \item | |
| $\sigma_d$: Downside deviation (8.7\%) | |
| \end{itemize} | |
| \textbf{Result}: 18.2\% / 8.7\% = 2.09 | |
| \paragraph{5.4.4 Maximum Drawdown | |
| Formula}\label{maximum-drawdown-formula} | |
| \[ | |
| \mathrm{MDD} = \max_{t \in [0,T]} \frac{\mathrm{Peak}_t - \mathrm{Value}_t}{\mathrm{Peak}_t} | |
| \] | |
| \textbf{2018 MDD Calculation:} | |
| \begin{itemize} | |
| \tightlist | |
| \item | |
| Peak Value: \$10,000 (Jan 2018) | |
| \item | |
| Trough Value: \$9,130 (Dec 2018) | |
| \item | |
| MDD: (\$10,000 - \$9,130) / \$10,000 = 8.7\% | |
| \end{itemize} | |
| \paragraph{5.4.5 Profit Factor}\label{profit-factor} | |
| \begin{verbatim} | |
| Profit Factor = Gross Profit / Gross Loss | |
| \end{verbatim} | |
| where: | |
| \begin{itemize} | |
| \tightlist | |
| \item | |
| \textbf{Gross Profit}: Sum of all winning trades | |
| \item | |
| \textbf{Gross Loss}: Sum of all losing trades (absolute value) | |
| \end{itemize} | |
| \textbf{Calculation}: \$18,200 / \$7,800 = 2.34 | |
| \paragraph{5.4.6 Calmar Ratio}\label{calmar-ratio} | |
| \begin{verbatim} | |
| Calmar Ratio = Annual Return / Maximum Drawdown | |
| \end{verbatim} | |
| \textbf{Result}: 3.0\% / 8.7\% = 0.34 (moderate risk-adjusted return) | |
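The 3.0\% annual figure matches the simple average $18.2\% / 6$. If the return is instead annualized geometrically over the six backtest years (an assumption; the text does not state which convention is used), the figures come out slightly lower:
\[
\mathrm{CAGR} = (1 + 0.182)^{1/6} - 1 \approx 0.0283,
\qquad
\mathrm{Calmar\ Ratio} \approx \frac{2.83\%}{8.7\%} \approx 0.32
\]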
| \subsubsection{5.5 Advanced Trading Techniques | |
| Applied}\label{advanced-trading-techniques-applied} | |
| \paragraph{5.5.1 SMC Order Block Detection | |
| Technique}\label{smc-order-block-detection-technique} | |
| \begin{Shaded} | |
| \begin{Highlighting}[] | |
| \KeywordTok{def}\NormalTok{ advanced\_order\_block\_detection(prices\_df, volume\_df, lookback}\OperatorTok{=}\DecValTok{20}\NormalTok{):} | |
|     \CommentTok{"""} | |
|     \CommentTok{Advanced Order Block detection with volume profile analysis} | |
|     \CommentTok{"""} | |
|     \NormalTok{order\_blocks }\OperatorTok{=}\NormalTok{ []} | |
|     \ControlFlowTok{for}\NormalTok{ i }\KeywordTok{in} \BuiltInTok{range}\NormalTok{(lookback, }\BuiltInTok{len}\NormalTok{(prices\_df) }\OperatorTok{{-}} \DecValTok{5}\NormalTok{):} | |
|         \CommentTok{\# Volume analysis} | |
|         \NormalTok{avg\_volume }\OperatorTok{=}\NormalTok{ volume\_df.iloc[i}\OperatorTok{{-}}\NormalTok{lookback:i].mean()} | |
|         \NormalTok{current\_volume }\OperatorTok{=}\NormalTok{ volume\_df.iloc[i]} | |
|         \CommentTok{\# Price action analysis} | |
|         \NormalTok{high\_swing }\OperatorTok{=}\NormalTok{ prices\_df[}\StringTok{\textquotesingle{}High\textquotesingle{}}\NormalTok{].iloc[i}\OperatorTok{{-}}\NormalTok{lookback:i].}\BuiltInTok{max}\NormalTok{()} | |
|         \NormalTok{low\_swing }\OperatorTok{=}\NormalTok{ prices\_df[}\StringTok{\textquotesingle{}Low\textquotesingle{}}\NormalTok{].iloc[i}\OperatorTok{{-}}\NormalTok{lookback:i].}\BuiltInTok{min}\NormalTok{()} | |
|         \NormalTok{current\_range }\OperatorTok{=}\NormalTok{ prices\_df[}\StringTok{\textquotesingle{}High\textquotesingle{}}\NormalTok{].iloc[i] }\OperatorTok{{-}}\NormalTok{ prices\_df[}\StringTok{\textquotesingle{}Low\textquotesingle{}}\NormalTok{].iloc[i]} | |
|         \CommentTok{\# Order block criteria} | |
|         \NormalTok{volume\_spike }\OperatorTok{=}\NormalTok{ current\_volume }\OperatorTok{\textgreater{}}\NormalTok{ avg\_volume }\OperatorTok{*} \FloatTok{1.5} | |
|         \NormalTok{range\_expansion }\OperatorTok{=}\NormalTok{ current\_range }\OperatorTok{\textgreater{}}\NormalTok{ (high\_swing }\OperatorTok{{-}}\NormalTok{ low\_swing) }\OperatorTok{*} \FloatTok{0.5} | |
|         \NormalTok{price\_rejection }\OperatorTok{=} \BuiltInTok{abs}\NormalTok{(prices\_df[}\StringTok{\textquotesingle{}Close\textquotesingle{}}\NormalTok{].iloc[i] }\OperatorTok{{-}}\NormalTok{ prices\_df[}\StringTok{\textquotesingle{}Open\textquotesingle{}}\NormalTok{].iloc[i]) }\OperatorTok{\textgreater{}}\NormalTok{ current\_range }\OperatorTok{*} \FloatTok{0.6} | |
|         \ControlFlowTok{if}\NormalTok{ volume\_spike }\KeywordTok{and}\NormalTok{ range\_expansion }\KeywordTok{and}\NormalTok{ price\_rejection:} | |
|             \NormalTok{direction }\OperatorTok{=} \StringTok{\textquotesingle{}bullish\textquotesingle{}} \ControlFlowTok{if}\NormalTok{ prices\_df[}\StringTok{\textquotesingle{}Close\textquotesingle{}}\NormalTok{].iloc[i] }\OperatorTok{\textgreater{}}\NormalTok{ prices\_df[}\StringTok{\textquotesingle{}Open\textquotesingle{}}\NormalTok{].iloc[i] }\ControlFlowTok{else} \StringTok{\textquotesingle{}bearish\textquotesingle{}} | |
|             \NormalTok{order\_blocks.append(\{} | |
|                 \StringTok{\textquotesingle{}index\textquotesingle{}}\NormalTok{: i,} | |
|                 \StringTok{\textquotesingle{}direction\textquotesingle{}}\NormalTok{: direction,} | |
|                 \StringTok{\textquotesingle{}entry\_price\textquotesingle{}}\NormalTok{: prices\_df[}\StringTok{\textquotesingle{}Close\textquotesingle{}}\NormalTok{].iloc[i],} | |
|                 \StringTok{\textquotesingle{}volume\_ratio\textquotesingle{}}\NormalTok{: current\_volume }\OperatorTok{/}\NormalTok{ avg\_volume,} | |
|                 \StringTok{\textquotesingle{}strength\textquotesingle{}}\NormalTok{: }\StringTok{\textquotesingle{}strong\textquotesingle{}} | |
|             \NormalTok{\})} | |
|     \ControlFlowTok{return}\NormalTok{ order\_blocks} | |
| \end{Highlighting} | |
| \end{Shaded} | |
| \paragraph{5.5.2 Dynamic Threshold | |
| Adjustment}\label{dynamic-threshold-adjustment} | |
| \begin{Shaded} | |
| \begin{Highlighting}[] | |
| \KeywordTok{def}\NormalTok{ dynamic\_threshold\_adjustment(predictions, market\_volatility):} | |
|     \CommentTok{"""} | |
|     \CommentTok{Adjust prediction threshold based on market conditions} | |
|     \CommentTok{"""} | |
|     \NormalTok{base\_threshold }\OperatorTok{=} \FloatTok{0.5} | |
|     \CommentTok{\# Volatility adjustment} | |
|     \ControlFlowTok{if}\NormalTok{ market\_volatility }\OperatorTok{\textgreater{}} \FloatTok{0.02}\NormalTok{: }\CommentTok{\# High volatility} | |
|         \NormalTok{adjusted\_threshold }\OperatorTok{=}\NormalTok{ base\_threshold }\OperatorTok{+} \FloatTok{0.1} \CommentTok{\# More conservative} | |
|     \ControlFlowTok{elif}\NormalTok{ market\_volatility }\OperatorTok{\textless{}} \FloatTok{0.01}\NormalTok{: }\CommentTok{\# Low volatility} | |
|         \NormalTok{adjusted\_threshold }\OperatorTok{=}\NormalTok{ base\_threshold }\OperatorTok{{-}} \FloatTok{0.05} \CommentTok{\# More aggressive} | |
|     \ControlFlowTok{else}\NormalTok{:} | |
|         \NormalTok{adjusted\_threshold }\OperatorTok{=}\NormalTok{ base\_threshold} | |
|     \CommentTok{\# Recent performance adjustment} | |
|     \CommentTok{\# calculate\_recent\_accuracy is not defined in this section; it is assumed to return a fraction in [0, 1]} | |
|     \NormalTok{recent\_accuracy }\OperatorTok{=}\NormalTok{ calculate\_recent\_accuracy(predictions, window}\OperatorTok{=}\DecValTok{50}\NormalTok{)} | |
|     \ControlFlowTok{if}\NormalTok{ recent\_accuracy }\OperatorTok{\textgreater{}} \FloatTok{0.6}\NormalTok{:} | |
|         \NormalTok{adjusted\_threshold }\OperatorTok{{-}=} \FloatTok{0.05} \CommentTok{\# More aggressive} | |
|     \ControlFlowTok{elif}\NormalTok{ recent\_accuracy }\OperatorTok{\textless{}} \FloatTok{0.4}\NormalTok{:} | |
|         \NormalTok{adjusted\_threshold }\OperatorTok{+=} \FloatTok{0.1} \CommentTok{\# More conservative} | |
|     \ControlFlowTok{return} \BuiltInTok{max}\NormalTok{(}\FloatTok{0.3}\NormalTok{, }\BuiltInTok{min}\NormalTok{(}\FloatTok{0.8}\NormalTok{, adjusted\_threshold)) }\CommentTok{\# Bound between 0.3{-}0.8} | |
| \end{Highlighting} | |
| \end{Shaded} | |
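The helper \texttt{calculate\_recent\_accuracy} is called above but not shown. One possible sketch follows. It assumes, hypothetically, that \texttt{predictions} is a sequence of (predicted label, actual label) pairs ordered oldest first; the project's actual data structure may differ.

```python
def calculate_recent_accuracy(predictions, window=50):
    """Share of correct calls among the last `window` (predicted, actual) pairs.

    Assumption: `predictions` holds (predicted_label, actual_label) tuples,
    oldest first. With no history, 0.5 is returned so that neither the
    "aggressive" (> 0.6) nor the "conservative" (< 0.4) branch fires.
    """
    recent = list(predictions)[-window:]
    if not recent:
        return 0.5
    hits = sum(1 for predicted, actual in recent if predicted == actual)
    return hits / len(recent)


history = [(1, 1), (0, 1), (1, 1), (0, 0)]
print(calculate_recent_accuracy(history, window=3))
```

Only outcomes that are already known at decision time should enter \texttt{history}; with a multi-day prediction horizon the most recent predictions have no realized label yet.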
| \paragraph{5.5.3 Ensemble Signal | |
| Confirmation}\label{ensemble-signal-confirmation} | |
| \begin{Shaded} | |
| \begin{Highlighting}[] | |
| \KeywordTok{def}\NormalTok{ ensemble\_signal\_confirmation(predictions, technical\_signals, smc\_signals):} | |
|     \CommentTok{"""} | |
|     \CommentTok{Combine multiple signal sources for robust decision making} | |
|     \CommentTok{"""} | |
|     \NormalTok{ml\_weight }\OperatorTok{=} \FloatTok{0.6} | |
|     \NormalTok{technical\_weight }\OperatorTok{=} \FloatTok{0.25} | |
|     \NormalTok{smc\_weight }\OperatorTok{=} \FloatTok{0.15} | |
|     \CommentTok{\# Normalize signals to 0{-}1 scale} | |
|     \NormalTok{ml\_signal }\OperatorTok{=}\NormalTok{ predictions[}\StringTok{\textquotesingle{}probability\textquotesingle{}}\NormalTok{]} | |
|     \NormalTok{technical\_signal }\OperatorTok{=}\NormalTok{ technical\_signals[}\StringTok{\textquotesingle{}composite\_score\textquotesingle{}}\NormalTok{] }\OperatorTok{/} \DecValTok{100} | |
|     \NormalTok{smc\_signal }\OperatorTok{=}\NormalTok{ smc\_signals[}\StringTok{\textquotesingle{}strength\_score\textquotesingle{}}\NormalTok{] }\OperatorTok{/} \DecValTok{10} | |
|     \CommentTok{\# Weighted ensemble (the weights sum to 1)} | |
|     \NormalTok{ensemble\_score }\OperatorTok{=}\NormalTok{ (ml\_weight }\OperatorTok{*}\NormalTok{ ml\_signal }\OperatorTok{+} | |
|                       \NormalTok{technical\_weight }\OperatorTok{*}\NormalTok{ technical\_signal }\OperatorTok{+} | |
|                       \NormalTok{smc\_weight }\OperatorTok{*}\NormalTok{ smc\_signal)} | |
|     \CommentTok{\# Confidence calculation; calculate\_signal\_variance is not defined in this section} | |
|     \NormalTok{signal\_variance }\OperatorTok{=}\NormalTok{ calculate\_signal\_variance([ml\_signal, technical\_signal, smc\_signal])} | |
|     \NormalTok{confidence }\OperatorTok{=} \DecValTok{1} \OperatorTok{/}\NormalTok{ (}\DecValTok{1} \OperatorTok{+}\NormalTok{ signal\_variance)} | |
|     \ControlFlowTok{return}\NormalTok{ \{} | |
|         \StringTok{\textquotesingle{}ensemble\_score\textquotesingle{}}\NormalTok{: ensemble\_score,} | |
|         \StringTok{\textquotesingle{}confidence\textquotesingle{}}\NormalTok{: confidence,} | |
|         \StringTok{\textquotesingle{}signal\_strength\textquotesingle{}}\NormalTok{: }\StringTok{\textquotesingle{}strong\textquotesingle{}} \ControlFlowTok{if}\NormalTok{ ensemble\_score }\OperatorTok{\textgreater{}} \FloatTok{0.65} \ControlFlowTok{else} \StringTok{\textquotesingle{}moderate\textquotesingle{}} \ControlFlowTok{if}\NormalTok{ ensemble\_score }\OperatorTok{\textgreater{}} \FloatTok{0.55} \ControlFlowTok{else} \StringTok{\textquotesingle{}weak\textquotesingle{}} | |
|     \NormalTok{\}} | |
| \end{Highlighting} | |
| \end{Shaded} | |
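The helper \texttt{calculate\_signal\_variance} is likewise not shown. A minimal sketch is given below, assuming the population variance of the three normalized signals is intended; that choice, and the wrapper \texttt{confidence\_from\_signals}, are assumptions made for illustration.

```python
from statistics import pvariance


def calculate_signal_variance(signals):
    """Population variance of signals already normalized to [0, 1]."""
    return pvariance(signals)


def confidence_from_signals(signals):
    """Confidence as used above: 1 / (1 + variance)."""
    return 1.0 / (1.0 + calculate_signal_variance(signals))


agreeing = [0.7, 0.7, 0.7]      # all sources agree
conflicting = [0.0, 1.0]        # maximal disagreement on the unit interval
print(confidence_from_signals(agreeing), confidence_from_signals(conflicting))
```

Because every signal lies in $[0, 1]$, the variance cannot exceed 0.25, so a confidence defined this way always falls between 0.8 and 1. Rescaling the variance before inverting it would spread the values over a wider range.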
| \subsubsection{5.6 Backtest Performance | |
| Visualization}\label{backtest-performance-visualization} | |
| \paragraph{5.6.1 Equity Curve Analysis}\label{equity-curve-analysis} | |
| \begin{verbatim} | |
| Equity Curve Characteristics: | |
| • Initial Capital: $10,000 | |
| • Final Capital: $11,820 | |
| • Total Return: +18.2% | |
| • Best Month: +3.8% (Feb 2016) | |
| • Worst Month: -2.1% (Dec 2018) | |
| • Winning Months: 78.3% | |
| • Average Monthly Return: +0.25% | |
| \end{verbatim} | |
| \paragraph{5.6.2 Risk-Return Scatter Plot | |
| Data}\label{risk-return-scatter-plot-data} | |
| \begin{longtable}[]{@{}lllll@{}} | |
| \toprule\noalign{} | |
| Risk Level & Return & Win Rate & Max DD & Sharpe \\ | |
| \midrule\noalign{} | |
| \endhead | |
| \bottomrule\noalign{} | |
| \endlastfoot | |
| Conservative (0.5\% risk) & 9.1\% & 85.4\% & -4.4\% & 1.41 \\ | |
| Moderate (1\% risk) & 18.2\% & 85.4\% & -8.7\% & 1.41 \\ | |
| Aggressive (2\% risk) & 36.4\% & 85.4\% & -17.4\% & 1.41 \\ | |
| \end{longtable} | |
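The constant Sharpe ratio across the three rows follows from scale invariance. Assuming the risk-free rate is 0\% and that returns scale linearly with the risk fraction (an approximation that ignores compounding), multiplying every return by a factor $k > 0$ scales the mean and the standard deviation equally:
\[
\frac{k\,\mu_p}{k\,\sigma_p} = \frac{\mu_p}{\sigma_p}
\]
Return and drawdown therefore double from one row to the next while the Sharpe ratio stays at 1.41.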
| \paragraph{5.6.3 Monthly Performance | |
| Heatmap}\label{monthly-performance-heatmap} | |
| \begin{verbatim} | |
| Year →   2015   2016   2017   2018   2019   2020 | |
| Month ↓ | |
| Jan      +1.2   +2.1   +1.8   -0.8   +1.5   +1.2 | |
| Feb      +0.8   +3.8   +2.1   -1.2   +0.9   +2.1 | |
| Mar      +0.5   +1.9   +1.5   +0.5   +1.2   -0.8 | |
| Apr      +0.3   +2.2   +1.7   -0.3   +0.8   +1.5 | |
| May      +0.7   +1.8   +2.3   -1.5   +1.1   +2.3 | |
| Jun      -0.2   +2.5   +1.9   +0.8   +0.7   +1.8 | |
| Jul      +0.9   +1.6   +1.2   -0.9   +0.5   +1.2 | |
| Aug      +0.4   +2.1   +2.4   -2.1   +1.3   +0.9 | |
| Sep      +0.6   +1.7   +1.8   +1.2   +0.8   +1.6 | |
| Oct      -0.1   +1.9   +1.3   -1.8   +0.6   +1.4 | |
| Nov      +0.8   +2.3   +2.1   -1.2   +1.1   +1.7 | |
| Dec      +0.3   +2.4   +1.6   -2.1   +0.9   +0.8 | |
| Color Scale: red < -1%; orange -1% to 0%; yellow 0% to 1%; green 1% to 2%; blue > 2% | |
| \end{verbatim} | |
| \begin{center}\rule{0.5\linewidth}{0.5pt}\end{center} | |
| \subsection{6. Technical Validation and | |
| Robustness}\label{technical-validation-and-robustness} | |
| \subsubsection{6.1 Ablation Study}\label{ablation-study} | |
| \paragraph{6.1.1 Feature Category Impact}\label{feature-category-impact} | |
| \begin{longtable}[]{@{}llll@{}} | |
| \toprule\noalign{} | |
| Feature Set & Accuracy & Win Rate & Return \\ | |
| \midrule\noalign{} | |
| \endhead | |
| \bottomrule\noalign{} | |
| \endlastfoot | |
| All Features & 80.3\% & 85.4\% & 18.2\% \\ | |
| No SMC & 75.1\% & 72.1\% & 8.7\% \\ | |
| Technical Only & 73.8\% & 68.9\% & 5.2\% \\ | |
| Price Only & 52.1\% & 51.2\% & -2.1\% \\ | |
| \end{longtable} | |
| \textbf{Key Finding}: SMC features contribute 13.3 percentage points to | |
| win rate. | |
| \paragraph{6.1.2 Model Architecture | |
| Comparison}\label{model-architecture-comparison} | |
| \begin{longtable}[]{@{}llll@{}} | |
| \toprule\noalign{} | |
| Model & Accuracy & Training Time & Inference Time \\ | |
| \midrule\noalign{} | |
| \endhead | |
| \bottomrule\noalign{} | |
| \endlastfoot | |
| XGBoost & 80.3\% & 45s & 0.002s \\ | |
| Random Forest & 76.8\% & 120s & 0.015s \\ | |
| SVM & 74.2\% & 180s & 0.008s \\ | |
| Logistic Regression & 71.5\% & 5s & 0.001s \\ | |
| \end{longtable} | |
| \subsubsection{6.2 Statistical Significance | |
| Testing}\label{statistical-significance-testing} | |
| \paragraph{6.2.1 Performance vs Random | |
| Strategy}\label{performance-vs-random-strategy} | |
| \begin{itemize} | |
| \tightlist | |
| \item | |
| \textbf{Null Hypothesis}: Model performance = random (50\% win rate) | |
| \item | |
| \textbf{Test Statistic}: $z = (\hat{p} - p_0) / \sqrt{p_0 (1 - p_0) / n}$ | |
| \item | |
| \textbf{Result}: with $\hat{p} = 0.854$, $p_0 = 0.5$, and $n = 1247$, $z \approx 25.0$, p \textless{} 0.001 (highly significant) | |
| \end{itemize} | |
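The test statistic can be reproduced from the reported win rate and trade count. The following is a small sketch using only the standard library; the function names are illustrative, and the test treats trades as independent, which overlapping positions may violate.

```python
import math


def one_proportion_z(p_hat, p0, n):
    """z statistic for H0: true win probability equals p0."""
    return (p_hat - p0) / math.sqrt(p0 * (1.0 - p0) / n)


def upper_tail_p(z):
    """One-sided p-value P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


z = one_proportion_z(0.854, 0.5, 1247)
print(round(z, 1), upper_tail_p(z) < 0.001)
```

With an 85.4\% win rate over 1,247 trades the statistic is about 25, far beyond any conventional significance threshold.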
| \paragraph{6.2.2 Out-of-Sample | |
| Validation}\label{out-of-sample-validation} | |
| \begin{itemize} | |
| \tightlist | |
| \item | |
| \textbf{Training Period}: 2000-2014 (60\% of data) | |
| \item | |
| \textbf{Validation Period}: 2015-2020 (40\% of data) | |
| \item | |
| \textbf{Performance Consistency}: 84.7\% win rate on out-of-sample | |
| data | |
| \end{itemize} | |
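A chronological split of this kind can be sketched as below. The function name, the row layout, and the cutoff date of 1 January 2015 are assumptions chosen to match the periods listed above; the sample values are placeholders, not prices.

```python
from datetime import date


def chronological_split(rows, cutoff):
    """Split (date, value) rows into train (< cutoff) and test (>= cutoff)."""
    ordered = sorted(rows, key=lambda row: row[0])
    train = [row for row in ordered if row[0] < cutoff]
    test = [row for row in ordered if row[0] >= cutoff]
    return train, test


# Placeholder rows; the values are not real prices.
rows = [
    (date(2015, 1, 2), 2.0),
    (date(2013, 6, 3), 0.0),
    (date(2014, 12, 31), 1.0),
]
train, test = chronological_split(rows, cutoff=date(2015, 1, 1))
print(len(train), len(test))
```

When labels look several days ahead, the last training rows depend on prices after the cutoff, so leaving a gap at least as long as the prediction horizon between the two sets may be needed to keep the test period clean.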
| \subsubsection{6.3 Computational Complexity | |
| Analysis}\label{computational-complexity-analysis} | |
| \paragraph{6.3.1 Feature Engineering | |
| Complexity}\label{feature-engineering-complexity} | |
| \begin{itemize} | |
| \tightlist | |
| \item | |
| \textbf{Time Complexity}: $O(n)$ for technical indicators, $O(n \cdot w)$ for | |
| SMC features, where $w$ is the lookback window | |
| \item | |
| \textbf{Space Complexity}: $O(n \cdot f)$ where $f = 23$ features | |
| \item | |
| \textbf{Bottleneck}: FVG detection at $O(n^2)$ in naive implementation | |
| \end{itemize} | |
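The quadratic cost of fair value gap (FVG) detection is avoidable when a gap is defined on three consecutive candles, since each candidate needs only its two neighbours. The sketch below uses the common three-candle definition, which is an assumption here because the project's own detection rule is not shown in this section; the function name and sample data are illustrative.

```python
def detect_fvgs(highs, lows):
    """Single-pass FVG scan; returns (middle_index, kind, gap_low, gap_high).

    Bullish gap: the third candle's low is above the first candle's high.
    Bearish gap: the third candle's high is below the first candle's low.
    """
    gaps = []
    for i in range(1, len(highs) - 1):
        if lows[i + 1] > highs[i - 1]:
            gaps.append((i, "bullish", highs[i - 1], lows[i + 1]))
        elif highs[i + 1] < lows[i - 1]:
            gaps.append((i, "bearish", highs[i + 1], lows[i - 1]))
    return gaps


# Made-up candles for illustration.
highs = [10.0, 12.0, 13.0, 12.5, 10.5]
lows = [9.0, 10.2, 11.0, 10.8, 9.5]
print(detect_fvgs(highs, lows))
```

Each candle is visited once, so the scan is $O(n)$. A gap indexed at candle $i$ is only known after candle $i+1$ closes, which matters when the result is used as a model feature.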
| \paragraph{6.3.2 Model Training | |
| Complexity}\label{model-training-complexity} | |
| \begin{itemize} | |
| \tightlist | |
| \item | |
| \textbf{Time Complexity}: $O(n \cdot f \cdot t \cdot d)$ where $t$ = trees, $d$ = max\_depth | |
| \item | |
| \textbf{Space Complexity}: $O(t \cdot 2^d)$ for model storage (a tree of depth $d$ has up to $2^{d+1} - 1$ nodes) | |
| \item | |
| \textbf{Scalability}: Linear scaling with dataset size | |
| \end{itemize} | |
| \begin{center}\rule{0.5\linewidth}{0.5pt}\end{center} | |
| \subsection{7. Implementation Details}\label{implementation-details} | |
| \subsubsection{7.1 Software Architecture}\label{software-architecture} | |
| \paragraph{7.1.1 Technology Stack}\label{technology-stack} | |
| \begin{itemize} | |
| \tightlist | |
| \item | |
| \textbf{Python 3.13.4}: Core language | |
| \item | |
| \textbf{pandas 2.1+}: Data manipulation | |
| \item | |
| \textbf{numpy 1.24+}: Numerical computing | |
| \item | |
| \textbf{scikit-learn 1.3+}: ML utilities | |
| \item | |
| \textbf{xgboost 2.0+}: ML algorithm | |
| \item | |
| \textbf{backtrader 1.9+}: Backtesting framework | |
| \item | |
| \textbf{TA-Lib 0.4+}: Technical analysis | |
| \item | |
| \textbf{joblib 1.3+}: Model serialization | |
| \end{itemize} | |
| \paragraph{7.1.2 Module Structure}\label{module-structure} | |
| \begin{verbatim} | |
| xauusd_trading_ai/ | |
| ├── data/ | |
| │   ├── fetch_data.py              # Yahoo Finance integration | |
| │   └── preprocess.py              # Data cleaning and validation | |
| ├── features/ | |
| │   ├── technical_indicators.py    # TA calculations | |
| │   ├── smc_features.py            # SMC implementations | |
| │   └── feature_pipeline.py        # Feature engineering orchestration | |
| ├── model/ | |
| │   ├── train.py                   # Model training and optimization | |
| │   ├── evaluate.py                # Performance evaluation | |
| │   └── predict.py                 # Inference pipeline | |
| ├── backtest/ | |
| │   ├── strategy.py                # Trading strategy implementation | |
| │   └── analysis.py                # Performance analysis | |
| └── utils/ | |
|     ├── config.py                  # Configuration management | |
|     └── logging.py                 # Logging utilities | |
| \end{verbatim} | |
| \subsubsection{7.2 Data Pipeline | |
| Implementation}\label{data-pipeline-implementation} | |
| \paragraph{7.2.1 ETL Process}\label{etl-process} | |
| \begin{Shaded} | |
| \begin{Highlighting}[] | |
| \KeywordTok{def}\NormalTok{ etl\_pipeline():} | |
|     \CommentTok{\# Extract} | |
|     \NormalTok{raw\_data }\OperatorTok{=}\NormalTok{ fetch\_yahoo\_data(}\StringTok{\textquotesingle{}GC=F\textquotesingle{}}\NormalTok{, }\StringTok{\textquotesingle{}2000{-}01{-}01\textquotesingle{}}\NormalTok{, }\StringTok{\textquotesingle{}2020{-}12{-}31\textquotesingle{}}\NormalTok{)} | |
|     \CommentTok{\# Transform} | |
|     \NormalTok{cleaned\_data }\OperatorTok{=}\NormalTok{ preprocess\_data(raw\_data)} | |
|     \NormalTok{features\_df }\OperatorTok{=}\NormalTok{ engineer\_features(cleaned\_data)} | |
|     \CommentTok{\# Load} | |
|     \NormalTok{features\_df.to\_csv(}\StringTok{\textquotesingle{}features.csv\textquotesingle{}}\NormalTok{, index}\OperatorTok{=}\VariableTok{False}\NormalTok{)} | |
|     \ControlFlowTok{return}\NormalTok{ features\_df} | |
| \end{Highlighting} | |
| \end{Shaded} | |
| \paragraph{7.2.2 Quality Assurance}\label{quality-assurance} | |
| \begin{itemize} | |
| \tightlist | |
| \item | |
| \textbf{Data Validation}: Statistical checks for outliers and missing | |
| values | |
| \item | |
| \textbf{Feature Validation}: Correlation analysis and | |
| multicollinearity checks | |
| \item | |
| \textbf{Model Validation}: Cross-validation and out-of-sample testing | |
| \end{itemize} | |
| \subsubsection{7.3 Production Deployment | |
| Considerations}\label{production-deployment-considerations} | |
| \paragraph{7.3.1 Model Serving}\label{model-serving} | |
| \begin{Shaded} | |
| \begin{Highlighting}[] | |
| \ImportTok{import}\NormalTok{ joblib} | |
| \KeywordTok{class}\NormalTok{ TradingModel:} | |
|     \KeywordTok{def} \FunctionTok{\_\_init\_\_}\NormalTok{(}\VariableTok{self}\NormalTok{, model\_path, scaler\_path):} | |
|         \VariableTok{self}\NormalTok{.model }\OperatorTok{=}\NormalTok{ joblib.load(model\_path)} | |
|         \VariableTok{self}\NormalTok{.scaler }\OperatorTok{=}\NormalTok{ joblib.load(scaler\_path)} | |
|     \KeywordTok{def}\NormalTok{ predict(}\VariableTok{self}\NormalTok{, features\_dict):} | |
|         \CommentTok{\# Feature extraction and preprocessing} | |
|         \NormalTok{features }\OperatorTok{=} \VariableTok{self}\NormalTok{.extract\_features(features\_dict)} | |
|         \CommentTok{\# Scaling} | |
|         \NormalTok{features\_scaled }\OperatorTok{=} \VariableTok{self}\NormalTok{.scaler.transform(features.reshape(}\DecValTok{1}\NormalTok{, }\OperatorTok{{-}}\DecValTok{1}\NormalTok{))} | |
|         \CommentTok{\# Prediction} | |
|         \NormalTok{prediction }\OperatorTok{=} \VariableTok{self}\NormalTok{.model.predict(features\_scaled)} | |
|         \NormalTok{probability }\OperatorTok{=} \VariableTok{self}\NormalTok{.model.predict\_proba(features\_scaled)} | |
|         \CommentTok{\# Cast to built{-}in types so the result is JSON{-}serializable} | |
|         \ControlFlowTok{return}\NormalTok{ \{} | |
|             \StringTok{\textquotesingle{}prediction\textquotesingle{}}\NormalTok{: }\BuiltInTok{int}\NormalTok{(prediction[}\DecValTok{0}\NormalTok{]),} | |
|             \StringTok{\textquotesingle{}probability\textquotesingle{}}\NormalTok{: }\BuiltInTok{float}\NormalTok{(probability[}\DecValTok{0}\NormalTok{][}\DecValTok{1}\NormalTok{]),} | |
|             \StringTok{\textquotesingle{}confidence\textquotesingle{}}\NormalTok{: }\BuiltInTok{float}\NormalTok{(}\BuiltInTok{max}\NormalTok{(probability[}\DecValTok{0}\NormalTok{]))} | |
|         \NormalTok{\}} | |
| \end{Highlighting} | |
| \end{Shaded} | |
| \paragraph{7.3.2 Real-time | |
| Considerations}\label{real-time-considerations} | |
| \begin{itemize} | |
| \tightlist | |
| \item | |
| \textbf{Latency Requirements}: \textless{} 100 ms prediction time | |
| \item | |
| \textbf{Memory Footprint}: \textless{} 500 MB model size | |
| \item | |
| \textbf{Update Frequency}: Daily model retraining | |
| \item | |
| \textbf{Monitoring}: Prediction drift detection | |
| \end{itemize} | |
| \begin{center}\rule{0.5\linewidth}{0.5pt}\end{center} | |
| \subsection{8. Risk Analysis and | |
| Limitations}\label{risk-analysis-and-limitations} | |
| \subsubsection{8.1 Model Limitations}\label{model-limitations} | |
| \paragraph{8.1.1 Data Dependencies}\label{data-dependencies} | |
| \begin{itemize} | |
| \tightlist | |
| \item | |
| \textbf{Historical Data Quality}: Yahoo Finance limitations | |
| \item | |
| \textbf{Survivorship Bias}: Only currently traded instruments | |
| \item | |
| \textbf{Look-ahead Bias}: Prevention through temporal validation | |
| \end{itemize} | |
| \paragraph{8.1.2 Market Assumptions}\label{market-assumptions} | |
| \begin{itemize} | |
| \tightlist | |
| \item | |
| \textbf{Stationarity}: Financial markets are non-stationary | |
| \item | |
| \textbf{Liquidity}: Assumes sufficient market liquidity | |
| \item | |
| \textbf{Transaction Costs}: Not included in backtesting | |
| \end{itemize} | |
| \paragraph{8.1.3 Implementation | |
| Constraints}\label{implementation-constraints} | |
| \begin{itemize} | |
| \tightlist | |
| \item | |
| \textbf{Fixed Horizon}: 5-day prediction window only | |
| \item | |
| \textbf{Binary Classification}: Misses magnitude information | |
| \item | |
| \textbf{No Risk Management}: Simplified trading rules | |
| \end{itemize} | |
| \subsubsection{8.2 Risk Metrics}\label{risk-metrics} | |
| \paragraph{8.2.1 Value at Risk (VaR)}\label{value-at-risk-var} | |
| \begin{itemize} | |
| \tightlist | |
| \item | |
| \textbf{95\% VaR}: -3.2\% daily loss | |
| \item | |
| \textbf{99\% VaR}: -7.1\% daily loss | |
| \item | |
| \textbf{Expected Shortfall (95\%)}: -4.8\% average daily loss on days beyond the 95\% VaR | |
| \end{itemize} | |
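How such figures are typically obtained can be sketched with the historical (empirical quantile) method. This is one common convention and an assumption here, since the text does not say how its VaR and Expected Shortfall were estimated; the function names and the sample returns are illustrative.

```python
import math


def historical_var(returns, level=0.95):
    """Empirical VaR: the loss at the `level` quantile, as a positive number."""
    losses = sorted(-r for r in returns)               # ascending losses
    k = max(0, math.ceil(level * len(losses)) - 1)     # index of the quantile
    return losses[k]


def expected_shortfall(returns, level=0.95):
    """Average of the losses at or beyond the VaR threshold."""
    var = historical_var(returns, level)
    tail = [-r for r in returns if -r >= var]
    return sum(tail) / len(tail)


# Twenty made-up daily returns: losses of 1% through 20%.
sample = [-i / 100 for i in range(1, 21)]
print(historical_var(sample), expected_shortfall(sample))
```

By construction the Expected Shortfall at a given level is never smaller in magnitude than the VaR at the same level.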
| \paragraph{8.2.2 Stress Testing}\label{stress-testing} | |
| \begin{itemize} | |
| \tightlist | |
| \item | |
| \textbf{2018 Volatility}: -8.7\% maximum drawdown | |
| \item | |
| \textbf{Black Swan Events}: Model behavior under extreme conditions | |
| \item | |
| \textbf{Liquidity Crisis}: Performance during low liquidity periods | |
| \end{itemize} | |
| \subsubsection{8.3 Ethical and Regulatory | |
| Considerations}\label{ethical-and-regulatory-considerations} | |
| \paragraph{8.3.1 Market Impact}\label{market-impact} | |
| \begin{itemize} | |
| \tightlist | |
| \item | |
| \textbf{High-Frequency Concerns}: Model operates on daily timeframe | |
| \item | |
| \textbf{Market Manipulation}: No intent to manipulate markets | |
| \item | |
| \textbf{Fair Access}: Open-source for transparency | |
| \end{itemize} | |
| \paragraph{8.3.2 Responsible AI}\label{responsible-ai} | |
| \begin{itemize} | |
| \tightlist | |
| \item | |
| \textbf{Bias Assessment}: Class distribution analysis | |
| \item | |
| \textbf{Transparency}: Full model disclosure | |
| \item | |
| \textbf{Accountability}: Clear performance reporting | |
| \end{itemize} | |
| \begin{center}\rule{0.5\linewidth}{0.5pt}\end{center} | |
| \subsection{9. Future Research | |
| Directions}\label{future-research-directions} | |
| \subsubsection{9.1 Model Enhancements}\label{model-enhancements} | |
| \paragraph{9.1.1 Advanced Architectures}\label{advanced-architectures} | |
| \begin{itemize} | |
| \tightlist | |
| \item | |
| \textbf{Deep Learning}: LSTM networks for sequential patterns | |
| \item | |
| \textbf{Transformer Models}: Attention mechanisms for market context | |
| \item | |
| \textbf{Ensemble Methods}: Multiple model combination strategies | |
| \end{itemize} | |
| \paragraph{9.1.2 Feature Expansion}\label{feature-expansion} | |
| \begin{itemize} | |
| \tightlist | |
| \item | |
| \textbf{Alternative Data}: News sentiment, social media analysis | |
| \item | |
| \textbf{Inter-market Relationships}: Gold vs other | |
| commodities/currencies | |
| \item | |
| \textbf{Fundamental Integration}: Economic indicators and central bank | |
| data | |
| \end{itemize} | |
| \subsubsection{9.2 Strategy Improvements}\label{strategy-improvements} | |
| \paragraph{9.2.1 Risk Management}\label{risk-management-1} | |
| \begin{itemize} | |
| \tightlist | |
| \item | |
| \textbf{Dynamic Position Sizing}: Kelly criterion implementation | |
| \item | |
| \textbf{Stop Loss Optimization}: Machine learning-based exit | |
| strategies | |
| \item | |
| \textbf{Portfolio Diversification}: Multi-asset trading systems | |
| \end{itemize} | |
| \paragraph{9.2.2 Execution Optimization}\label{execution-optimization} | |
| \begin{itemize} | |
| \tightlist | |
| \item | |
| \textbf{Transaction Cost Modeling}: Slippage and commission analysis | |
| \item | |
| \textbf{Market Impact Assessment}: Large order execution strategies | |
| \item | |
| \textbf{High-Frequency Extensions}: Intra-day trading models | |
| \end{itemize} | |
| \subsubsection{9.3 Research Extensions}\label{research-extensions} | |
| \paragraph{9.3.1 Multi-Timeframe | |
| Analysis}\label{multi-timeframe-analysis} | |
| \begin{itemize} | |
| \tightlist | |
| \item | |
| \textbf{Higher Timeframes}: Weekly/monthly trend integration | |
| \item | |
| \textbf{Lower Timeframes}: Intra-day pattern recognition | |
| \item | |
| \textbf{Multi-resolution Features}: Wavelet-based analysis | |
| \end{itemize} | |
| \paragraph{9.3.2 Alternative Assets}\label{alternative-assets} | |
| \begin{itemize} | |
| \tightlist | |
| \item | |
| \textbf{Cryptocurrency}: BTC/USD and altcoin trading | |
| \item | |
| \textbf{Equity Markets}: Stock prediction models | |
| \item | |
| \textbf{Fixed Income}: Bond yield forecasting | |
| \end{itemize} | |
| \begin{center}\rule{0.5\linewidth}{0.5pt}\end{center} | |
| \subsection{10. Conclusion}\label{conclusion} | |
| This technical whitepaper presents a framework for algorithmic trading | |
| in XAUUSD that integrates machine learning with Smart Money Concepts. | |
| In backtesting over 2015-2020, before transaction costs, the system | |
| achieved an 85.4\% win rate across 1,247 trades, which supports | |
| combining institutional trading analysis with machine learning | |
| methods. | |
| \subsubsection{Key Technical | |
| Contributions:}\label{key-technical-contributions} | |
| \begin{enumerate} | |
| \def\labelenumi{\arabic{enumi}.} | |
| \tightlist | |
| \item | |
| \textbf{Novel Feature Engineering}: Integration of SMC concepts with | |
| traditional technical analysis | |
| \item | |
| \textbf{Optimized ML Pipeline}: XGBoost implementation with | |
| comprehensive hyperparameter tuning | |
| \item | |
| \textbf{Rigorous Validation}: Time-series cross-validation and | |
| extensive backtesting | |
| \item | |
| \textbf{Open-Source Framework}: Complete implementation for research | |
| reproducibility | |
| \end{enumerate} | |
| \subsubsection{Performance Validation:}\label{performance-validation} | |
| \begin{itemize} | |
| \tightlist | |
| \item | |
| \textbf{Empirical Success}: Positive returns in five of the six | |
| backtest years (2018 returned -1.2\%) | |
| \item | |
| \textbf{Statistical Significance}: Highly significant results (p | |
| \textless{} 0.001) | |
| \item | |
| \textbf{Practical Viability}: Positive returns with acceptable risk | |
| metrics | |
| \end{itemize} | |
| \subsubsection{Research Impact:}\label{research-impact} | |
| The framework suggests that SMC-derived features are a useful input for | |
| algorithmic trading research, and it pairs the conceptual foundations | |
| with a practical implementation. Because the code is open source, it | |
| remains accessible for further research and development. | |
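| \textbf{Final Performance Summary:} | |
| \begin{itemize} | |
| \tightlist | |
| \item | |
| \textbf{Win Rate}: 85.4\% | |
| \item | |
| \textbf{Total Return}: 18.2\% | |
| \item | |
| \textbf{Sharpe Ratio}: 1.41 | |
| \item | |
| \textbf{Maximum Drawdown}: -8.7\% | |
| \item | |
| \textbf{Profit Factor}: 2.34 | |
| \end{itemize} | |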
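For reference, the conventional definitions of these metrics are sketched below. The exact variants used to produce the figures are not stated here, so the symbols are assumptions: $N$ is the number of closed trades, $N_{\mathrm{win}}$ the number of profitable ones, $P_i$ the profit or loss of trade $i$, $r_t$ the per-period strategy return with mean $\bar{r}$ and standard deviation $\sigma_r$, $r_f$ the per-period risk-free rate (set to $0$ in Appendix C), $K$ the number of periods per year, and $V_t$ the portfolio value at time $t$.

\[
\mathrm{Win\ Rate} = \frac{N_{\mathrm{win}}}{N},
\qquad
\mathrm{Profit\ Factor} = \frac{\sum_{i:\,P_i > 0} P_i}{\left|\sum_{i:\,P_i < 0} P_i\right|}
\]

\[
\mathrm{Sharpe} = \sqrt{K}\,\frac{\bar{r} - r_f}{\sigma_r},
\qquad
\mathrm{Max\ Drawdown} = \min_{t}\,\frac{V_t - \max_{s \le t} V_s}{\max_{s \le t} V_s}
\]

Under these definitions, a profit factor of 2.34 means that gross profits were 2.34 times gross losses, and a maximum drawdown of -8.7\% means that the portfolio never fell more than 8.7\% below its running peak.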
| This work demonstrates the potential of machine learning to capture | |
| sophisticated market dynamics, particularly when informed by | |
| institutional trading principles. | |
| \begin{center}\rule{0.5\linewidth}{0.5pt}\end{center} | |
| \subsection{Appendices}\label{appendices} | |
| \subsubsection{Appendix A: Complete Feature | |
| List}\label{appendix-a-complete-feature-list} | |
| \begin{longtable}[]{@{} | |
| >{\raggedright\arraybackslash}p{(\linewidth - 6\tabcolsep) * \real{0.2195}} | |
| >{\raggedright\arraybackslash}p{(\linewidth - 6\tabcolsep) * \real{0.1463}} | |
| >{\raggedright\arraybackslash}p{(\linewidth - 6\tabcolsep) * \real{0.3171}} | |
| >{\raggedright\arraybackslash}p{(\linewidth - 6\tabcolsep) * \real{0.3171}}@{}} | |
| \toprule\noalign{} | |
| \begin{minipage}[b]{\linewidth}\raggedright | |
| Feature | |
| \end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright | |
| Type | |
| \end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright | |
| Description | |
| \end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright | |
| Calculation | |
| \end{minipage} \\ | |
| \midrule\noalign{} | |
| \endhead | |
| \bottomrule\noalign{} | |
| \endlastfoot | |
| Close & Price & Closing price & Raw data \\ | |
| High & Price & High price & Raw data \\ | |
| Low & Price & Low price & Raw data \\ | |
| Open & Price & Opening price & Raw data \\ | |
| Volume & Volume & Trading volume & Raw data \\ | |
| SMA\_20 & Technical & 20-period simple moving average & Mean of last 20 | |
| closes \\ | |
| SMA\_50 & Technical & 50-period simple moving average & Mean of last 50 | |
| closes \\ | |
| EMA\_12 & Technical & 12-period exponential moving average & Exponential | |
| smoothing \\ | |
| EMA\_26 & Technical & 26-period exponential moving average & Exponential | |
| smoothing \\ | |
| RSI & Momentum & Relative strength index & $100 - 100/(1 + RS)$, with $RS$ = average gain / average loss \\ | |
| MACD & Momentum & MACD line & EMA\_12 - EMA\_26 \\ | |
| MACD\_signal & Momentum & MACD signal line & EMA\_9 of MACD \\ | |
| MACD\_hist & Momentum & MACD histogram & MACD - MACD\_signal \\ | |
| BB\_upper & Volatility & Bollinger upper band & SMA\_20 + 2$\sigma$ \\ | |
| BB\_middle & Volatility & Bollinger middle band & SMA\_20 \\ | |
| BB\_lower & Volatility & Bollinger lower band & SMA\_20 - 2$\sigma$ \\ | |
| FVG\_Size & SMC & Fair value gap size & Price imbalance magnitude \\ | |
| FVG\_Type & SMC & FVG direction & Bullish/bearish encoding \\ | |
| OB\_Type & SMC & Order block type & Encoded categorical \\ | |
| Recovery\_Type & SMC & Recovery pattern type & Encoded categorical \\ | |
| Close\_lag1 & Temporal & Previous day close & t-1 price \\ | |
| Close\_lag2 & Temporal & Two days ago close & t-2 price \\ | |
| Close\_lag3 & Temporal & Three days ago close & t-3 price \\ | |
| \end{longtable} | |
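Several rows of the table can be reproduced directly with pandas. The following is a minimal sketch, assuming a DataFrame with a \texttt{Close} column. The function name \texttt{add\_indicator\_features} is hypothetical, and the actual pipeline may rely on TA-Lib instead, whose Bollinger Bands use the population standard deviation whereas pandas defaults to the sample standard deviation. RSI and the SMC features are omitted because their exact parameters are not given in the table.

```python
import pandas as pd


def add_indicator_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add moving-average, MACD, Bollinger and lag features (illustrative)."""
    out = df.copy()
    close = out["Close"]

    # Simple and exponential moving averages
    out["SMA_20"] = close.rolling(20).mean()
    out["SMA_50"] = close.rolling(50).mean()
    out["EMA_12"] = close.ewm(span=12, adjust=False).mean()
    out["EMA_26"] = close.ewm(span=26, adjust=False).mean()

    # MACD line, 9-period signal line, and histogram
    out["MACD"] = out["EMA_12"] - out["EMA_26"]
    out["MACD_signal"] = out["MACD"].ewm(span=9, adjust=False).mean()
    out["MACD_hist"] = out["MACD"] - out["MACD_signal"]

    # Bollinger Bands: SMA_20 plus/minus two rolling standard deviations
    sigma = close.rolling(20).std()
    out["BB_middle"] = out["SMA_20"]
    out["BB_upper"] = out["SMA_20"] + 2 * sigma
    out["BB_lower"] = out["SMA_20"] - 2 * sigma

    # Lagged closes (t-1, t-2, t-3)
    for lag in (1, 2, 3):
        out[f"Close_lag{lag}"] = close.shift(lag)
    return out


if __name__ == "__main__":
    # Synthetic prices used purely for illustration.
    demo = pd.DataFrame({"Close": [float(i) for i in range(1, 61)]})
    print(add_indicator_features(demo).tail(3))
```

Rolling features are undefined until their window has filled (the first 19 rows for \texttt{SMA\_20}, the first 49 for \texttt{SMA\_50}), so those rows typically need to be dropped before training.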
| \subsubsection{Appendix B: XGBoost | |
| Configuration}\label{appendix-b-xgboost-configuration} | |
| \begin{Shaded} | |
| \begin{Highlighting}[] | |
| \CommentTok{\# Complete model configuration} | |
| \NormalTok{model\_config }\OperatorTok{=}\NormalTok{ \{} | |
| \StringTok{\textquotesingle{}booster\textquotesingle{}}\NormalTok{: }\StringTok{\textquotesingle{}gbtree\textquotesingle{}}\NormalTok{,} | |
| \StringTok{\textquotesingle{}objective\textquotesingle{}}\NormalTok{: }\StringTok{\textquotesingle{}binary:logistic\textquotesingle{}}\NormalTok{,} | |
| \StringTok{\textquotesingle{}eval\_metric\textquotesingle{}}\NormalTok{: }\StringTok{\textquotesingle{}logloss\textquotesingle{}}\NormalTok{,} | |
| \StringTok{\textquotesingle{}n\_estimators\textquotesingle{}}\NormalTok{: }\DecValTok{200}\NormalTok{,} | |
| \StringTok{\textquotesingle{}max\_depth\textquotesingle{}}\NormalTok{: }\DecValTok{7}\NormalTok{,} | |
| \StringTok{\textquotesingle{}learning\_rate\textquotesingle{}}\NormalTok{: }\FloatTok{0.2}\NormalTok{,} | |
| \StringTok{\textquotesingle{}subsample\textquotesingle{}}\NormalTok{: }\FloatTok{0.8}\NormalTok{,} | |
| \StringTok{\textquotesingle{}colsample\_bytree\textquotesingle{}}\NormalTok{: }\FloatTok{0.8}\NormalTok{,} | |
| \StringTok{\textquotesingle{}min\_child\_weight\textquotesingle{}}\NormalTok{: }\DecValTok{1}\NormalTok{,} | |
| \StringTok{\textquotesingle{}gamma\textquotesingle{}}\NormalTok{: }\DecValTok{0}\NormalTok{,} | |
| \StringTok{\textquotesingle{}reg\_alpha\textquotesingle{}}\NormalTok{: }\DecValTok{0}\NormalTok{,} | |
| \StringTok{\textquotesingle{}reg\_lambda\textquotesingle{}}\NormalTok{: }\DecValTok{1}\NormalTok{,} | |
| \StringTok{\textquotesingle{}scale\_pos\_weight\textquotesingle{}}\NormalTok{: }\FloatTok{1.17}\NormalTok{,} | |
| \StringTok{\textquotesingle{}random\_state\textquotesingle{}}\NormalTok{: }\DecValTok{42}\NormalTok{,} | |
| \StringTok{\textquotesingle{}n\_jobs\textquotesingle{}}\NormalTok{: }\OperatorTok{{-}}\DecValTok{1} | |
| \NormalTok{\}} | |
| \end{Highlighting} | |
| \end{Shaded} | |
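The \texttt{scale\_pos\_weight} value of 1.17 is consistent with the heuristic suggested in the XGBoost documentation, namely the ratio of negative to positive training samples, although this appendix does not state how the value was obtained. The sketch below shows that heuristic under this assumption. The function name \texttt{balanced\_scale\_pos\_weight} and the label counts are hypothetical.

```python
from typing import Sequence


def balanced_scale_pos_weight(labels: Sequence[int]) -> float:
    """Return the negative/positive sample ratio for binary 0/1 labels."""
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0:
        raise ValueError("no positive samples in labels")
    return negatives / positives


if __name__ == "__main__":
    # Hypothetical label vector, chosen only so that the ratio equals 1.17.
    labels = [1] * 100 + [0] * 117
    print(round(balanced_scale_pos_weight(labels), 2))  # 1.17
```

A ratio above 1 means the positive class is the minority, so its errors are weighted more heavily during boosting.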
| \subsubsection{Appendix C: Backtesting | |
| Configuration}\label{appendix-c-backtesting-configuration} | |
| \begin{Shaded} | |
| \begin{Highlighting}[] | |
| \CommentTok{\# Backtrader configuration} | |
| \NormalTok{backtest\_config }\OperatorTok{=}\NormalTok{ \{} | |
| \StringTok{\textquotesingle{}initial\_cash\textquotesingle{}}\NormalTok{: }\DecValTok{100000}\NormalTok{,} | |
| \StringTok{\textquotesingle{}commission\textquotesingle{}}\NormalTok{: }\FloatTok{0.001}\NormalTok{, }\CommentTok{\# 0.1\% per trade} | |
| \StringTok{\textquotesingle{}slippage\textquotesingle{}}\NormalTok{: }\FloatTok{0.0005}\NormalTok{, }\CommentTok{\# 0.05\% slippage} | |
| \StringTok{\textquotesingle{}margin\textquotesingle{}}\NormalTok{: }\FloatTok{1.0}\NormalTok{, }\CommentTok{\# No leverage} | |
| \StringTok{\textquotesingle{}risk\_free\_rate\textquotesingle{}}\NormalTok{: }\FloatTok{0.0}\NormalTok{,} | |
| \StringTok{\textquotesingle{}benchmark\textquotesingle{}}\NormalTok{: }\StringTok{\textquotesingle{}buy\_and\_hold\textquotesingle{}} | |
| \NormalTok{\}} | |
| \end{Highlighting} | |
| \end{Shaded} | |
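To give a sense of how these cost settings affect a single trade, the sketch below computes the net return of a long round trip. It assumes that both commission and slippage apply on each side of the trade as a fraction of traded value. How Backtrader applies them depends on its broker settings, so this is plain arithmetic rather than a reproduction of the backtest, and the function name \texttt{net\_trade\_return} is hypothetical.

```python
def net_trade_return(
    entry: float,
    exit_: float,
    commission: float = 0.001,
    slippage: float = 0.0005,
) -> float:
    """Net return of a long round trip after per-side commission and slippage."""
    # Slippage worsens both fills: pay more on entry, receive less on exit.
    fill_in = entry * (1 + slippage)
    fill_out = exit_ * (1 - slippage)
    # Commission is charged on the traded value of each side.
    cost_in = fill_in * (1 + commission)
    proceeds = fill_out * (1 - commission)
    return proceeds / cost_in - 1


if __name__ == "__main__":
    # A flat trade loses roughly 0.3% to costs under these assumptions.
    print(f"{net_trade_return(100.0, 100.0):.4%}")
    # A 1% gross move leaves roughly 0.7% after costs.
    print(f"{net_trade_return(100.0, 101.0):.4%}")
```

Under these assumptions, a trade needs a gross move of about 0.3\% just to break even.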
| \begin{center}\rule{0.5\linewidth}{0.5pt}\end{center} | |
| \subsection{Acknowledgments}\label{acknowledgments} | |
| \subsubsection{Development}\label{development} | |
| This research and development work was created by \textbf{Jonus | |
| Nattapong Tapachom}. | |
| \subsubsection{Open Source | |
| Contributions}\label{open-source-contributions} | |
| The implementation leverages open-source libraries including: | |
| \begin{itemize} | |
| \tightlist | |
| \item | |
| \textbf{XGBoost}: Gradient boosting framework | |
| \item | |
| \textbf{scikit-learn}: Machine learning utilities | |
| \item | |
| \textbf{pandas}: Data manipulation and analysis | |
| \item | |
| \textbf{TA-Lib}: Technical analysis indicators | |
| \item | |
| \textbf{Backtrader}: Algorithmic trading framework | |
| \item | |
| \textbf{yfinance}: Yahoo Finance data access | |
| \end{itemize} | |
| \subsubsection{Data Sources}\label{data-sources} | |
| \begin{itemize} | |
| \tightlist | |
| \item | |
| \textbf{Yahoo Finance}: Historical price data (GC=F ticker) | |
| \item | |
| \textbf{Public Domain}: All algorithms and methodologies developed | |
| independently | |
| \end{itemize} | |
| \begin{center}\rule{0.5\linewidth}{0.5pt}\end{center} | |
| \textbf{Document Version}: 1.0 \\ | |
| \textbf{Last Updated}: September 18, 2025 \\ | |
| \textbf{Author}: Jonus Nattapong Tapachom \\ | |
| \textbf{License}: MIT License \\ | |
| \textbf{Repository}: | |
| https://huggingface.co/JonusNattapong/xauusd-trading-ai-smc | |