How to avoid overfitting?
Overfitting is a common challenge in algorithmic trading: a strategy is fine-tuned to perform exceptionally well on historical data but fails to generalize to future, unseen market conditions. Guarding against it helps your trading strategies stay robust in real-world scenarios. Here are some effective techniques to avoid overfitting in strategy development, especially when using tools like Express Generator.
1. Use a Large and Diverse Dataset
Overfitting often occurs when a strategy is tailored too closely to a small set of historical data. To avoid this:
- Use as much data as possible: Ensure that the strategy is tested on multiple years of data, including periods with different market conditions (bullish, bearish, ranging).
- Include diverse market conditions: Use data from different market regimes (volatility, trends, crises) to ensure your strategy isn’t just optimized for one specific type of environment.
- Avoid cherry-picking data: Use complete data sets rather than excluding certain periods that might make your strategy look worse.
2. Out-of-Sample Testing
Out-of-sample testing is a critical technique to assess how well your strategy performs on data it hasn’t been trained on.
- Split your data: Use part of your historical data for strategy optimization (in-sample) and the rest for testing (out-of-sample). For example, if you have 10 years of data, use the first 7 years for optimization and the remaining 3 years for testing.
- Monitor performance: If your strategy performs well in-sample but poorly out-of-sample, it’s likely overfitted. A strategy that performs consistently in both periods is more likely to generalize well.
In Express Generator, you can set out-of-sample testing using these parameters:
data_start_percent = 0
data_end_percent = 100
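The two values give the start and end of the data range in percent, so 0 and 100 select all of it. Adjust them to control which portion of the data is used for optimization and which is held back for testing.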
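The 7-year / 3-year split described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Express Generator code: the function name `split_in_out_of_sample` and its `in_sample_percent` argument are hypothetical, and the list of years stands in for real bar data.

```python
from typing import Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_in_out_of_sample(
    bars: Sequence[T], in_sample_percent: float = 70.0
) -> Tuple[Sequence[T], Sequence[T]]:
    """Split chronologically ordered bars into in-sample and out-of-sample parts.

    Hypothetical helper: the oldest `in_sample_percent` of the data is used for
    optimization, the most recent remainder is held back for testing.
    """
    if not 0 < in_sample_percent < 100:
        raise ValueError("in_sample_percent must be between 0 and 100 (exclusive)")
    cut = int(len(bars) * in_sample_percent / 100)
    return bars[:cut], bars[cut:]


# Ten years of placeholder data, oldest first.
years = list(range(2014, 2024))
in_sample, out_of_sample = split_in_out_of_sample(years, 70)
print(in_sample)      # [2014, 2015, 2016, 2017, 2018, 2019, 2020]
print(out_of_sample)  # [2021, 2022, 2023]
```

The split is chronological rather than random, so the out-of-sample part is always newer than anything the strategy was optimized on.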
3. Forward Testing
After backtesting, forward testing is essential to evaluate how the strategy behaves on unseen, real-time data.
- Paper trade in real time: Forward testing involves running the strategy in a demo or paper trading environment with real-time data. This tests the strategy’s adaptability to live market conditions.
- Monitor performance over weeks or months: Make sure the strategy maintains consistent performance across different market conditions, and avoid tweaking the strategy based on short-term results.
In Express Generator, you can enable forward testing with these parameters:
use_forward_testing = true
preload_data_bars = 0
4. Limit Strategy Complexity
Overly complex strategies are more prone to overfitting because they tend to adapt too closely to the historical data, including noise or random market fluctuations.
- Limit the number of indicators: Use only a few key indicators that are known to work well in various conditions rather than a large combination of signals.
- Avoid too many parameters: The more adjustable parameters a strategy has, the higher the risk of overfitting. Keep it simple by avoiding excessive optimization of stop-loss, take-profit, or indicator parameters.
In Express Generator, you can limit the number of indicators with these parameters:
max_entry_slots = 3
max_exit_slots = 2
This restricts the number of entry and exit rules, reducing the chance of overfitting.
5. Set Strict Acceptance Criteria
Setting meaningful acceptance criteria ensures that strategies are not just performing well due to random luck in backtesting but have sound risk-reward characteristics.
- Focus on essential performance metrics: Use metrics such as the Profit Factor, Return to Drawdown, Win/Loss Ratio, and Stagnation, and set a minimum threshold for each. Strategies that meet all of the thresholds are more likely to be robust.
Example of criteria setup:
min_count_of_trades = 100
min_profit_factor = 1.5
min_return_to_drawdown = 2.0
These values ensure that the strategy has a reasonable number of trades and a good balance between risk and reward.
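To make the criteria concrete, here is a minimal Python sketch of how such thresholds could be checked against a list of per-trade profits. It rests on assumptions: profit factor is taken as gross profit divided by gross loss, and return to drawdown as net profit divided by the maximum drawdown of the cumulative profit curve. Express Generator may compute these differently, and all function names are hypothetical.

```python
from typing import List


def profit_factor(trades: List[float]) -> float:
    """Gross profit divided by gross loss (assumed definition)."""
    gross_profit = sum(t for t in trades if t > 0)
    gross_loss = -sum(t for t in trades if t < 0)
    return float("inf") if gross_loss == 0 else gross_profit / gross_loss


def max_drawdown(trades: List[float]) -> float:
    """Largest peak-to-valley drop of the cumulative profit curve."""
    equity = peak = worst = 0.0
    for t in trades:
        equity += t
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst


def return_to_drawdown(trades: List[float]) -> float:
    """Net profit divided by maximum drawdown (assumed definition)."""
    net = sum(trades)
    drawdown = max_drawdown(trades)
    if drawdown == 0:
        return float("inf") if net > 0 else 0.0
    return net / drawdown


def passes_criteria(
    trades: List[float],
    min_count_of_trades: int = 100,
    min_profit_factor: float = 1.5,
    min_return_to_drawdown: float = 2.0,
) -> bool:
    """Return True only if every threshold is met."""
    return (
        len(trades) >= min_count_of_trades
        and profit_factor(trades) >= min_profit_factor
        and return_to_drawdown(trades) >= min_return_to_drawdown
    )


# Made-up trade results: 60 winners of 30 alternating with 60 losers of 10.
sample_trades = [30.0, -10.0] * 60
print(profit_factor(sample_trades))        # 3.0
print(max_drawdown(sample_trades))         # 10.0
print(passes_criteria(sample_trades))      # True
print(passes_criteria(sample_trades[:20])) # False (too few trades)
```

The last line shows why the trade count matters: the same win/loss pattern over only 20 trades is rejected, because so few trades say little about future behavior.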
6. Monte Carlo Simulation
Monte Carlo simulation stress-tests a strategy by repeating the backtest many times with randomized market conditions, execution problems, and parameter variations.
- Randomize market conditions: Introduce random spreads, slippage, and skipped trade entries to simulate real-world conditions and test if the strategy is robust against them.
- Simulate parameter variations: Introduce small random changes to the strategy’s parameters (like indicator periods) and see if the strategy still performs well. A strategy that drastically changes its behavior with small parameter tweaks is likely overfitted.
Example setup for Monte Carlo:
enable_monte_carlo = true
count_of_tests = 20
spread_max = 30
slippage_max = 20
skip_entries_percent = 2
ind_params_max_change_percent = 20
This setup runs 20 tests, introduces spread and slippage variations, randomly skips about 2% of trade entries, and varies indicator parameters by up to 20% to test the strategy’s resilience.
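The execution-related part of such a simulation can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not a description of how Express Generator works internally: trade results are given in points, one random spread is drawn per test run, a random slippage is drawn per trade, and both are subtracted from each trade. The function name is hypothetical, and indicator parameter variation is left out because it would require re-running the strategy itself.

```python
import random
from typing import List


def monte_carlo_net_profits(
    trades_points: List[float],
    count_of_tests: int = 20,
    spread_max: int = 30,
    slippage_max: int = 20,
    skip_entries_percent: float = 2.0,
    seed: int = 42,
) -> List[float]:
    """Return the net profit (in points) of each randomized test run."""
    rng = random.Random(seed)  # fixed seed so the runs are reproducible
    results = []
    for _ in range(count_of_tests):
        spread = rng.randint(0, spread_max)  # one spread level per test run
        net = 0.0
        for trade in trades_points:
            if rng.random() * 100 < skip_entries_percent:
                continue  # simulate a missed entry
            slippage = rng.randint(0, slippage_max)
            net += trade - spread - slippage
        results.append(net)
    return results


# Made-up backtest: 50 trades that each earned 100 points before extra costs.
runs = monte_carlo_net_profits([100.0] * 50)
profitable = sum(1 for r in runs if r > 0)
print(f"{profitable} of {len(runs)} randomized runs stayed profitable")
```

A strategy whose results collapse in a large share of the runs depends on execution conditions that are unlikely to hold in live trading.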
7. Use Conservative Backtest Settings
Ensure that backtesting assumptions are realistic, especially regarding trade execution:
- Include slippage and spread: Ensure that backtests reflect actual trading conditions by accounting for spreads and slippage.
- Use realistic stop-loss and take-profit: Don’t optimize these parameters excessively. Use ranges that are based on reasonable market expectations rather than fine-tuning them to historical data.
Example in Express Generator:
spread = 20
slippage_max = 10
This builds in the small execution disadvantage that real trades suffer compared to an idealized backtest.
8. Cross-Validation
Cross-validation is a technique borrowed from machine learning in which the data is divided into several subsets that take turns serving as the training set and the validation set.
- Split the data into multiple parts: Rotate the in-sample and out-of-sample periods several times. Each split tests a different portion of the data, ensuring that the strategy isn’t overfitting to a single set of conditions.
In Express Generator, you can create multiple data files from different periods or markets and test the strategy on each of them.
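The rotation of in-sample and out-of-sample periods can be sketched in Python like this. The helper `rotating_splits` is hypothetical and only produces the data splits; optimizing and testing the strategy on each split is left to whatever tool you use.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def rotating_splits(
    bars: Sequence[T], folds: int
) -> Iterator[Tuple[List[T], List[T]]]:
    """Yield (in_sample, out_of_sample) pairs, holding out one block per fold.

    The data is cut into `folds` contiguous blocks; each block is used once as
    the out-of-sample period while the remaining blocks form the in-sample data.
    """
    n = len(bars)
    if folds < 2 or folds > n:
        raise ValueError("folds must be between 2 and the number of bars")
    for k in range(folds):
        start = k * n // folds
        end = (k + 1) * n // folds
        out_of_sample = list(bars[start:end])
        in_sample = list(bars[:start]) + list(bars[end:])
        yield in_sample, out_of_sample


# Ten years of placeholder data split into five rotations of two years each.
years = list(range(2014, 2024))
for in_sample, out_of_sample in rotating_splits(years, 5):
    print("test on", out_of_sample, "optimize on", in_sample)
```

For time-series data, some practitioners prefer walk-forward splits in which the in-sample period always precedes the out-of-sample period; the rotation above follows the simpler scheme described in this section.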
9. Avoid Data Snooping and Over-Optimization
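Over-optimization occurs when a strategy is excessively fine-tuned to the historical data. This can result in a strategy that fits past data perfectly but fails in real markets. Data snooping is the related problem of testing many variations against the same data until one of them happens to look good.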
- Set reasonable parameter ranges: Instead of fine-tuning a strategy to exact levels, use broader ranges for indicators like Moving Averages or RSI periods.
- Avoid adjusting the strategy after each failure: Every strategy will have losses, but adjusting the rules after each poor performance is likely to lead to overfitting.
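One simple way to spot an over-tuned parameter is to compare the result at the chosen value with the results at neighbouring values. The Python sketch below assumes you already have a backtest score (for example, profit factor) for each tested parameter value; the function name and the numbers are made up for illustration.

```python
from typing import Dict


def neighbourhood_ratio(results: Dict[int, float], best: int, radius: int = 2) -> float:
    """Average score of the neighbours of `best`, divided by the score at `best`.

    A ratio close to 1 suggests a broad plateau; a low ratio suggests an
    isolated spike that is likely the product of over-optimization.
    """
    neighbours = [
        score
        for value, score in results.items()
        if value != best and abs(value - best) <= radius
    ]
    if not neighbours or results[best] <= 0:
        raise ValueError("need neighbouring results and a positive best score")
    return (sum(neighbours) / len(neighbours)) / results[best]


# Hypothetical profit factors for different RSI periods.
spiky = {12: 1.0, 13: 1.0, 14: 2.0, 15: 1.0, 16: 1.0}
plateau = {12: 1.8, 13: 1.9, 14: 2.0, 15: 1.9, 16: 1.8}

print(round(neighbourhood_ratio(spiky, best=14), 3))    # 0.5
print(round(neighbourhood_ratio(plateau, best=14), 3))  # 0.925
```

In the first case, moving the period by a single step halves the score, which is the kind of fragile fit this section warns about. In the second, nearby values perform almost as well, so the exact setting matters much less.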
10. Regular Re-Evaluation
Markets evolve, and a strategy that worked well in the past might need adjustments or even replacement over time. Re-evaluate the strategy periodically:
- Monitor performance continuously: Keep track of key metrics like drawdown, win/loss ratio, and profit factor, and compare them to historical performance.
- Update data regularly: Refresh your data sets to ensure your strategy is working with the most up-to-date information.
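A periodic check of this kind can be as simple as the following Python sketch. The function name and both tolerance values are hypothetical choices for illustration, not recommendations; pick thresholds that suit your own risk limits.

```python
def needs_review(
    backtest_max_drawdown: float,
    live_drawdown: float,
    backtest_profit_factor: float,
    live_profit_factor: float,
    drawdown_tolerance: float = 1.5,
    min_profit_factor_ratio: float = 0.7,
) -> bool:
    """Flag a strategy whose live metrics drift too far from its backtest.

    Both tolerances are illustrative assumptions.
    """
    drawdown_breached = live_drawdown > backtest_max_drawdown * drawdown_tolerance
    profit_factor_decayed = (
        live_profit_factor < backtest_profit_factor * min_profit_factor_ratio
    )
    return drawdown_breached or profit_factor_decayed


# Made-up numbers: the live drawdown is 60% deeper than anything in the backtest.
print(needs_review(500.0, 800.0, 1.8, 1.7))  # True
print(needs_review(500.0, 400.0, 1.8, 1.7))  # False
```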
Conclusion
To avoid overfitting, it’s important to use diverse data, set realistic acceptance criteria, and validate your strategy with techniques like Monte Carlo simulation and forward testing. Keep strategies simple, focus on robustness, and continuously re-evaluate performance over time. By following these practices, you are more likely to develop trading strategies that perform well in both historical and real-world markets.