If your strategy has 50 parameters, either you're really good and doing some esoteric stuff, or you have little idea how the markets work. You'll overfit your data. One ex-RenTec guy I knew used to tell stories about the complex HMM models they built back in the day. I believe you can capture essential market structure in non-obvious ways like HMMs, but I've always preferred the elegant solution that generalizes.
In this book you can find how long your test set should be for a given number of parameters. I read it a few years ago, so I don't remember the exact formula: http://www.amazon.com/Algorithmic-Trading-Winning-Strategies-Rationale/dp/1118460146
Genetic optimization is just an alternative to brute-forcing all parameter combinations; it doesn't answer my question about the methodology. So, 50+ parameters: do you optimize the first one while keeping the other 49+ at default or predetermined values, then optimize the parameters one by one sequentially until you've done all of them? Genetic optimization might return a global maximum, but how do you test the sensitivity/robustness of the parameter combination it returns?
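One common way to answer the robustness question is to perturb each parameter around the optimizer's answer and check that performance degrades gracefully rather than falling off a cliff. A minimal sketch, assuming a hypothetical `backtest_sharpe` (here a toy quadratic standing in for a real backtest) and made-up parameter names:

```python
# Stand-in for a real backtest: a smooth peak at (lookback=20, threshold=0.5).
# In practice this would run the strategy and return its Sharpe ratio.
def backtest_sharpe(lookback, threshold):
    return 2.0 - 0.002 * (lookback - 20) ** 2 - 3.0 * (threshold - 0.5) ** 2

def sensitivity(optimum, steps, n=5):
    """Perturb one parameter at a time, +/- n steps around the optimum,
    holding the others fixed, and record the resulting scores."""
    report = {}
    for name, base in optimum.items():
        scores = []
        for k in range(-n, n + 1):
            params = dict(optimum)
            params[name] = base + k * steps[name]
            scores.append(backtest_sharpe(**params))
        report[name] = scores
    return report

optimum = {"lookback": 20, "threshold": 0.5}   # what the GA returned
steps = {"lookback": 2, "threshold": 0.05}     # perturbation step sizes
rep = sensitivity(optimum, steps)
for name, scores in rep.items():
    # A robust optimum: neighbors score close to the center value.
    print(name, [round(s, 3) for s in scores])
```

If the neighboring scores collapse relative to the center, the GA found a spike in the surface, not a plateau, and the combination is likely overfit.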
My question wasn't about how long my test and training sets should be; it was about the fashion in which the parameters were optimized. If you have 2 parameters, you can just brute-force every possible combination and plot the results as a surface, but with 3 or more parameters you have to add extra dimensions to the results, and humans can only interpret up to surface plots visually. Another way to optimize 3 or more parameters is to optimize them sequentially, but that might make you miss the most optimal/robust combination.
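The two-parameter brute-force surface described above can be sketched as follows; `toy_objective` and the parameter names are stand-ins for a real backtest metric:

```python
# Brute-force every combination of two parameters and collect the results
# as a grid -- the "surface" you inspect for robust plateaus vs lone spikes.
def toy_objective(fast, slow):
    # Toy stand-in for a backtest score, peaking at fast=10, slow=50.
    return -(fast - 10) ** 2 - 0.5 * (slow - 50) ** 2

fast_grid = range(5, 16)        # candidate values for parameter 1
slow_grid = range(40, 61, 2)    # candidate values for parameter 2

# Full surface: one row per fast value, one column per slow value.
surface = [[toy_objective(f, s) for s in slow_grid] for f in fast_grid]

# The best cell of the grid.
best = max(
    ((f, s, toy_objective(f, s)) for f in fast_grid for s in slow_grid),
    key=lambda t: t[2],
)
print("best:", best)
```

With 3+ parameters the same loop still works, but `surface` becomes a tensor you can no longer eyeball, which is exactly the problem raised in the post.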
Market microstructure trading seems to require a lot of parameters. There are just lots of things you have to keep track of and account for: counting trades on the bid/offer, watching for orders pulling at the front of the book vs. the back of the book, tagging certain orders as manual, others as spreaders, etc. The list goes on and on. So you start with 10-20 parameters - many of which vary by time of day - and then you come across all the situations where accounting for other activity would help your strategy, and so you need more parameters. My approach has generally been to find a robust set of non-essential parameters per product/expiration, then work through the 10 or so that remain by manually constraining the search space and running a genetic optimization. I can do about 10,000 simultaneous simulations, so 'shotgunning' a search space with a few million simulations doesn't take terribly long. Maybe there's a better solution, but this is what works for me.
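The "constrain the space, then run a genetic optimization" step might look roughly like this. A minimal sketch, not the poster's actual setup: the parameter names, box bounds, and toy fitness (a quadratic in place of a full market simulation) are all assumptions for illustration:

```python
import random

random.seed(0)

# Toy fitness standing in for a full market simulation; peaks at a=3, b=-1.
def fitness(p):
    return -sum((p[k] - mid) ** 2 for k, mid in (("a", 3.0), ("b", -1.0)))

# The manually constrained search space: box bounds per parameter.
bounds = {"a": (0.0, 6.0), "b": (-4.0, 2.0)}

def random_individual():
    return {k: random.uniform(lo, hi) for k, (lo, hi) in bounds.items()}

def mutate(p, rate=0.3, scale=0.2):
    # Gaussian perturbation, clipped back into the allowed box.
    child = dict(p)
    for k, (lo, hi) in bounds.items():
        if random.random() < rate:
            child[k] = min(hi, max(lo, child[k] + random.gauss(0, scale * (hi - lo))))
    return child

def crossover(a, b):
    # Uniform crossover: each gene taken from either parent.
    return {k: random.choice((a[k], b[k])) for k in a}

pop = [random_individual() for _ in range(50)]
for generation in range(40):
    pop.sort(key=fitness, reverse=True)
    elite = pop[:10]                      # keep the best 20% unchanged
    pop = elite + [mutate(crossover(random.choice(elite), random.choice(elite)))
                   for _ in range(40)]

best = max(pop, key=fitness)
print(best, fitness(best))
```

In practice each `fitness` call is a simulation, so the population evaluations are what you would farm out across those 10,000 simultaneous simulations.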
A system genetically optimized for a global maximum will almost always look robust to small changes in parameters, so that test may fool you. See the relevant links: "Fooled by machine learning" and the Bonferroni correction. If you do optimization, you are essentially doing many multiple comparisons, and you effectively lose significance. That is the main problem.