If I divide my data into an in-sample period (up to Sept. 2007) and an out-of-sample period (Sept. 2007 to 2010), most strategies that perform well in-sample fall like a rock out-of-sample. But if I exclude the extreme year 2008 from my out-of-sample test data, I am cheating myself - what if another black swan event happens? Any thoughts on testing the robustness of an optimized trading strategy? And any thoughts/pointers on discovering trading strategies that hold up across all cycles, e.g. doing well in-sample (data prior to 2008) and also performing well out-of-sample in 2008/2009? Thanks a lot!

Please qualify that as follows: "Most of my strategies that perform well in-sample fall like a rock out-of-sample." You can only speak for your own strategies, nobody else's.

IMO the only way to test the robustness of a trading system is by using a genetic optimizer to walk forward, with one of the variables being the in-sample period in number of trading days. I hope this helps.
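To make the idea concrete, here is a minimal sketch of walk-forward testing where the in-sample window length is itself one of the variables being searched. Everything here is hypothetical: the moving-average crossover is a toy stand-in for a real strategy, and an exhaustive search over a small parameter grid stands in for the genetic optimizer mentioned above.

```python
import random

def crossover_pnl(fast, slow, prices, start):
    """Toy strategy: fast/slow moving-average crossover.
    Scores only the bars from index `start` onward, so earlier
    bars serve as warm-up history for the averages."""
    pnl, position = 0.0, 0
    for i in range(max(start, slow), len(prices)):
        fast_ma = sum(prices[i - fast:i]) / fast
        slow_ma = sum(prices[i - slow:i]) / slow
        pnl += position * (prices[i] - prices[i - 1])
        position = 1 if fast_ma > slow_ma else -1
    return pnl

def walk_forward(prices, param_grid, is_lengths, oos_len):
    """For each candidate in-sample window length, walk forward:
    fit parameters on the in-sample window, then score them on the
    next out-of-sample window only. Returns total out-of-sample
    PnL per in-sample length, so the window length itself can be
    chosen by its out-of-sample performance."""
    results = {}
    for is_len in is_lengths:
        total_oos, start = 0.0, 0
        while start + is_len + oos_len <= len(prices):
            is_end = start + is_len
            # choose parameters on the in-sample window only
            best_p = max(param_grid,
                         key=lambda p: crossover_pnl(p[0], p[1],
                                                     prices[:is_end], start))
            # evaluate them on the following out-of-sample window
            total_oos += crossover_pnl(best_p[0], best_p[1],
                                       prices[:is_end + oos_len], is_end)
            start += oos_len
        results[is_len] = total_oos
    return results
```

A real implementation would replace the grid search with a genetic optimizer and the toy crossover with the actual system, but the window mechanics are the same.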

Maybe in-sample data should be interspersed with out-of-sample data: break the series into the smallest units practical and let the backtester randomly pick which time periods are in or out of sample...
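The interspersed split above could be sketched like this: cut the series into contiguous blocks and randomly assign a fraction of the blocks to out-of-sample. The function name, block size, and 30% fraction are all illustrative assumptions, not part of the original post.

```python
import random

def random_block_split(n_bars, block_len, oos_frac=0.3, seed=None):
    """Split a series of n_bars into contiguous blocks of block_len
    bars, then randomly mark roughly oos_frac of the blocks as
    out-of-sample. Returns (in_sample_idx, oos_idx) as lists of
    bar indices. Hypothetical sketch of the 'interspersed' idea."""
    rng = random.Random(seed)
    n_blocks = (n_bars + block_len - 1) // block_len
    order = list(range(n_blocks))
    rng.shuffle(order)
    oos_blocks = set(order[:max(1, int(n_blocks * oos_frac))])
    is_idx, oos_idx = [], []
    for b in range(n_blocks):
        idxs = range(b * block_len, min((b + 1) * block_len, n_bars))
        (oos_idx if b in oos_blocks else is_idx).extend(idxs)
    return is_idx, oos_idx
```

One caveat worth hedging: with serially correlated price data, information can leak across block boundaries, so in practice you may want the blocks long relative to your indicators' lookback, or a gap of bars dropped at each boundary.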

Yes...even better with OHLC bar structuring. Walk-forward optimization minimizes the tendency of optimization techniques to curve fit.

So in your findings, it is not always the case that a larger training sample produces better out-of-sample results?

I have an automated system that trades futures. The size of the training sample varies with what is being traded and changes over time. The shortest I've seen was 36 trading days, the longest 160...remember, that's just one of the variables being optimized.