I heard from others that picking the highest-priority stock/commodity according to special rules, such as: 1. every day, re-select into a basket all stocks where priceclose(bar) > sma(bar,close,10); 2. order the stock in the basket with the max RSI(bar,#close,5), is actually a statistical trap. I also heard that data mining is just another kind of curve-fitting. Subjectively I don't agree, but I don't know how to tell the difference between a normal statistical phenomenon and a trap, and I couldn't find any books or references about this topic. Any suggestions, please? PS: I saw the same thread opened in the discussion forum at www.wealth-lab.com; it's somewhat interesting and meaningful to me. Any reply is welcome. Thanks in advance.
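For reference, the two rules above can be sketched in plain Python. This is a minimal, hypothetical illustration, not anyone's actual system: the function names (`sma`, `rsi`, `pick`) and the simple-average RSI variant (rather than Wilder's smoothing) are my own assumptions.

```python
def sma(closes, n):
    """Simple moving average of the last n closes."""
    return sum(closes[-n:]) / n

def rsi(closes, n):
    """RSI over the last n price changes (simple averages, not Wilder's smoothing)."""
    changes = [closes[i] - closes[i - 1] for i in range(len(closes) - n, len(closes))]
    gains = sum(c for c in changes if c > 0)
    losses = sum(-c for c in changes if c < 0)
    if losses == 0:
        return 100.0  # no down moves in the window
    return 100.0 - 100.0 / (1.0 + gains / losses)

def pick(prices):
    """prices: dict mapping symbol -> list of daily closes (hypothetical input)."""
    # Rule 1: basket = symbols whose latest close is above the 10-day SMA.
    basket = [s for s, c in prices.items() if c[-1] > sma(c, 10)]
    # Rule 2: order the basket by 5-period RSI, highest first.
    return sorted(basket, key=lambda s: rsi(prices[s], 5), reverse=True)
```

The question is then whether ranking by an indicator like this finds a real edge or just fits noise, which is what the replies below address.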

I got really tired of the in-sample/out-of-sample/curve-fitting/statistics arguments. If you think your system should work, then trade it with very small size at IB. Their commission structure allows that. Real trading tells you a lot more than further "cyber/math-space mental trading".

It's real simple (in principle). To be worthwhile, a trading system has to do two things. First, it has to make a profit AFTER fees, slippage, etc. Second, it has to beat buy-and-hold; otherwise, why bother? What you need is a backtest long enough that the number of trades minus the number of parameters you're optimizing is 30 or greater. Then you can do a meaningful statistical analysis of your results. All that bear-market/bull-market stuff is nonsense. The market goes up and down in every kind of market, and if your system is getting you in and out 30 or more times in your backtest, you're experiencing all the different types of market by default. Then perform the standard statistical tests, and modify the system if necessary, or trash it and test a better one. When you get a system that produces valid backtest results you can live with, that's the one you trade. Simple.
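As a rough sketch of the "trades minus parameters >= 30" rule combined with a standard statistical test: the function below computes a one-sample t-statistic for the mean trade P/L being above zero, refusing to run when the effective sample size is too small. The threshold and the function name are illustrative assumptions, not an official procedure.

```python
import math

def backtest_t_stat(trade_pnls, n_params):
    """t-statistic for mean trade P/L > 0, gated on trades - parameters >= 30."""
    n = len(trade_pnls)
    dof = n - n_params  # each optimized parameter eats one degree of freedom
    if dof < 30:
        raise ValueError(f"only {dof} effective samples; need at least 30")
    mean = sum(trade_pnls) / n
    var = sum((x - mean) ** 2 for x in trade_pnls) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)
```

A t-statistic around 2 or above would suggest the average trade profit is unlikely to be pure noise; below that, the backtest has not demonstrated anything, however good the equity curve looks.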

Data mining, by definition, explores the statistical relationships among the data you have. So of course a system built around it is curve-fitting; so are basket reselection, on-the-fly reoptimization, etc. The main point is whether the results (i.e. performance) obtained over a long period of time, including periods with different characteristics, show the necessary "consistency" across various measures. That tells you whether the "edge" you mined out of your data is short-term randomness or really part of the data's overall behaviour. So there is really nothing wrong with curve-fitting; how you fit the curve correctly is the key.

A simple example of what I mean. Say a daytrading system produces about 20 trades a month, so you get about 240 trades a year. Most beginners in system development think that if the system's performance optimized over last year's data goes through the roof, they've hit the jackpot. Wrong. The minimal requirement for most basic statistics is at least 30 samples, but each partition of 30 trades yields 1, yes, 1, single observation of performance statistics like average P/L, max DD%, etc. for that period. So in a year, all you get is 8 observations. To give yourself any confidence in the performance, you need at least 30 observations, which implies you need at least 4 years of results to show that the system is stable. And we have not even taken into account using drastically different time periods to compare the performance. Oops, I talked like a math geek again.