Nearly every backtest can be misleading; how solid a strategy really is only becomes clear once you test it on out-of-sample data. A backtest run successfully on in-sample data alone means very little. And even performing well out of sample does not, by itself, validate the strategy. What you look for in a good system is not only how it does out of sample, but whether it behaves out of sample in a similar manner to how it did in sample. If the volatility drastically changes from in-sample to out-of-sample, I would question its integrity. The more robust strategies behave out of sample just as they did in sample.

Forget the arbitrary rules that >100 trades or 10,000 trades are needed for a test to count... that again is meaningless. What is most important is looking at the strategy from a fundamental standpoint, not basing it on arbitrary rules. The best models are adaptive. Period. The market is ever-changing, on all levels, because it is fractal in nature. Regime shifts are real, and they can destroy static, non-adaptive models very easily. Just because you test a strategy over 4-5 years doesn't mean it is a great model; the market may have been in a specific mode during that period, and a regime shift could cause the whole thing to blow up. After years of testing and working with models, it has become clearly evident that the best models are those that are the simplest in nature. That is the key to robustness. I often see people create models that perform beautifully for a given period, but due to the fragile nature of their system or rules, a random event blows the thing up. In my mind, the best models can be applied successfully to most trade/investment time frames, and across most or all common tradable financial assets (currencies, futures, etc.). The key to this, again, is keeping the "rules" simple.

A final thing to note: any model using popular technical analysis methods such as RSI, SMAs (crossovers, for example), etc., is laggy, and that can be dangerous in and of itself. Most of these systems just can't react quickly enough to be effective in an ever-changing environment. If you go lower in their periods/number of bars, you may be reducing lag, but you're increasing noise... another issue that plagues models. Unbeknownst to many, an SMA of, say, 20 is inherently lagged by roughly 10 bars [(length-1)/2 = 9.5]. Do you want to be operating on information 10 bars old? I certainly don't. Exponential Moving Averages (EMA) and Front Weighted Moving Averages (FWMA) suffer from lag as well: an EMA of 10 is technically lagging by 4.5 bars [(length-1)/2], while a FWMA of 10 will be lagging by about 3 bars at a minimum.

So my point with all this is: look to develop a system, or systems, that don't suffer from lag and that don't follow too much noise either. Make sure it performs out of sample as it did in sample; the performance metrics should be very similar. There is a balance to strike, but adaptivity is the key to nearly every successful strategy.
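To put rough numbers on the lag point, here is a minimal sketch (Python/NumPy; the `average_lag` helper is just illustrative, not anything standard) that computes the average age of the data each moving average actually weights. It reproduces the ~10-bar, 4.5-bar, and 3-bar figures quoted above.

```python
import numpy as np

def average_lag(weights):
    """Average age (in bars) of the data a weighted average uses:
    sum(age_i * w_i) / sum(w_i), where age 0 is the newest bar."""
    w = np.asarray(weights, dtype=float)
    ages = np.arange(len(w))            # 0 = most recent bar, 1 = one bar back, ...
    return np.dot(ages, w) / w.sum()

# SMA(20): equal weights -> lag = (20 - 1) / 2 = 9.5 bars (~10)
print("SMA(20) lag :", average_lag(np.ones(20)))

# EMA(10) with the usual alpha = 2 / (N + 1): lag -> (N - 1) / 2 = 4.5 bars
n = 10
alpha = 2.0 / (n + 1)
ema_w = alpha * (1 - alpha) ** np.arange(500)   # long tail approximates the infinite sum
print("EMA(10) lag :", average_lag(ema_w))

# Front-weighted (linear) MA(10): weights 10, 9, ..., 1 -> lag = (N - 1) / 3 = 3 bars
print("FWMA(10) lag:", average_lag(np.arange(10, 0, -1)))
```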
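And on the in-sample vs. out-of-sample point, here is a rough sketch of the kind of side-by-side check I mean, assuming you have a series of per-bar strategy returns. The function names, the 70/30 time split, and the random demo data are placeholders for illustration, not a prescription.

```python
import numpy as np

def summarize(returns, bars_per_year=252):
    """Basic performance metrics for a series of per-bar strategy returns."""
    r = np.asarray(returns, dtype=float)
    ann_ret = r.mean() * bars_per_year
    ann_vol = r.std(ddof=1) * np.sqrt(bars_per_year)
    sharpe = ann_ret / ann_vol if ann_vol > 0 else np.nan
    equity = np.cumprod(1 + r)
    max_dd = (1 - equity / np.maximum.accumulate(equity)).max()
    return {"ann_return": ann_ret, "ann_vol": ann_vol,
            "sharpe": sharpe, "max_drawdown": max_dd}

def compare_is_oos(is_returns, oos_returns):
    """Print in-sample vs. out-of-sample metrics side by side; large gaps,
    especially in volatility, are a warning sign about robustness."""
    is_m, oos_m = summarize(is_returns), summarize(oos_returns)
    print(f"{'metric':<14}{'in-sample':>12}{'out-of-sample':>16}")
    for k in is_m:
        print(f"{k:<14}{is_m[k]:>12.3f}{oos_m[k]:>16.3f}")

# Demo on random data purely to show the output format (not a real strategy):
rng = np.random.default_rng(0)
fake_returns = rng.normal(0.0003, 0.01, 2000)
split = int(len(fake_returns) * 0.7)      # e.g. first 70% in-sample, rest out-of-sample
compare_is_oos(fake_returns[:split], fake_returns[split:])
```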
Here's one I threw together to check whether something that was working live (discretionary) on stocks would also work on index data.
You can get a lot of backtests here. Choose the good ones and proceed. From this blog post, a must read.