Just trying to reconcile something in my head regarding sample distributions as described in the book: 1) To test a one-rule strategy for 'edge' or statistical significance over random, Aronson recommends using Monte Carlo or bootstrap to estimate a sample distribution and compare your test result to it. 2) When testing a number of rules — in this case, let's say one rule optimized over many parameters — he recommends performing the MC over all N rules, taking the highest random return across the N rules in each trial and including that in the sample distribution.... So my question: if you optimize a strategy, you'll likely pick the rule with the highest return (or lowest drawdown, etc.). What if you just used #1 on this optimized rule? How would that be any better than coming up with a single rule and using #1 to test for significance? Or is that one rule inherently biased (as in, you were just lucky picking that one rule)? Kinda late at night, so let me know if this needs clarification.

#1 is the unbiased estimator, since you are comparing an arbitrary rule to a sampling distribution of many possible outcomes. Case #2, where you are optimizing, is a biased estimator: think of it as cherry-picking the best outcomes of the rules you test (systematic bias). He shows some methods to strip out this bias so you can fairly compare your best return to the sampling distribution of best returns. Careful with your terminology, by the way: a sampling distribution (the distribution of a statistic over repeated samples) is significantly different from a sample distribution (the distribution of values within a single sample). Hope that helps. It's a long book to digest.
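For concreteness, procedure #1 can be sketched in a few lines of Python. Everything here is hypothetical — the returns are simulated and `positions` just stands in for a real rule's long/short signals — but the shape of the test (build a null sampling distribution by randomly permuting the signals, then locate the observed mean in it) is the point:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: simulated daily market returns and a rule's +1/-1
# positions (a stand-in for a real rule's signals).
n_days = 1000
market = rng.normal(0.0003, 0.01, n_days)
positions = rng.choice([-1, 1], n_days)

observed_mean = np.mean(positions * market)

# Procedure #1: sampling distribution of the mean return under the null
# hypothesis (the rule has no edge), built by Monte Carlo permutation
# of the positions against the same return series.
n_trials = 5000
null_means = np.empty(n_trials)
for i in range(n_trials):
    shuffled = rng.permutation(positions)
    null_means[i] = np.mean(shuffled * market)

# One-sided p-value: fraction of random trials at least as good as observed.
p_value = np.mean(null_means >= observed_mean)
print(f"observed mean: {observed_mean:.6f}, p-value: {p_value:.3f}")
```

Since the stand-in positions here are themselves random, the p-value should come out unremarkable; with a real rule, a small p-value is the evidence of edge.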

Let's say I chose an arbitrary rule. I would then perform test #1 to check statistical significance. Now say that this arbitrary rule, I find out later, also happens by luck to be the best optimized rule. Which method should be used? For example, I take an MA crossover, 5 x 20, and obtain a sample mean. I then perform an MC using procedure #1 to find the p-value, find it statistically significant, and trade it. BUT if I took the MA crossover, optimized it over a number of different lengths, and still found 5 x 20 to be the optimal solution, I would have to perform procedure #2 to build the sampling distribution? How are these two different? I guess, wouldn't I have to optimize any arbitrary rule and perform the procedure #2 MC, since otherwise my arbitrary rule's parameters may have just been lucky? Does that make sense?
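The mechanical difference between the two procedures is exactly where the max is taken. Here is a hedged sketch of procedure #2 (simulated data again; `rules` stands in for the N parameterizations of the crossover): each Monte Carlo trial scrambles the returns once, evaluates all N rules against the scrambled series, and records only the best mean return, so the null distribution is a distribution of best-of-N outcomes rather than single-rule outcomes:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: simulated daily returns and N candidate rules' +1/-1
# signals (e.g. one MA crossover evaluated over N parameter pairs).
n_days, n_rules = 1000, 50
market = rng.normal(0.0003, 0.01, n_days)
rules = rng.choice([-1, 1], (n_rules, n_days))

best_observed = np.max(rules @ market / n_days)  # the "optimized" rule's mean

# Procedure #2: each trial permutes the returns ONCE (so the correlation
# between rules is preserved) and keeps only the best of the N means.
n_trials = 2000
null_best = np.empty(n_trials)
for i in range(n_trials):
    perm = rng.permutation(market)
    null_best[i] = np.max(rules @ perm / n_days)

# The observed best is compared to a distribution of best-of-N returns.
p_value = np.mean(null_best >= best_observed)
print(f"best observed mean: {best_observed:.6f}, p-value: {p_value:.3f}")
```

Because the null distribution is built from maxima, the bar your best rule has to clear is much higher than under procedure #1 — which is the whole point of the bias correction.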

In theory, if you happened to stumble upon the best rule by chance (i.e. case #1), it would be like winning the lottery -- BUT -- it would correctly be statistically significant, given that it cleared the p-value threshold of your MC/bootstrap sampling distribution. If you somehow, additionally, arrived at the same rule using method #2, you would not be required to strip out the bias, since you have already shown it was unbiased and statistically significant as a single rule. Stripping out the bias in the best rule set is a method of compensating for intentionally excluding information in the rule universe; the first rule does not exclude information. However, theoretically, the best rule is the best rule, regardless of how you find it.

Think I understand — one last example for clarification: I test the 5x20 MA cross, find it statistically significant, and begin trading. I later decide to optimize the parameters and find 7x20 yields a higher mean return. Since 5x20 is already significant, when analyzing 7x20 can I take it as a single rule and test the p-value based only on a 7x20 MC, without performing the MC over all rules?... Edit: Thank you for your assistance, btw!

The simple answer is no, because, as you just mentioned, you "optimized" it again. Optimize = cherry-picking = bias. Since you hypothetically proved 5x20 was previously statistically significant, that may deflect the bias somewhat (there could be a way to model it as a type of conditional relationship, but I'd have to think more about that), but better to stick to the rule of thumb — optimize = bias — when in doubt, IMO.
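A quick simulated illustration of why "optimize = bias" is worth keeping as the rule of thumb: below, all 100 rules are pure noise with no edge at all, yet the best one of them — wrongly tested with the single-rule procedure #1, as if it had been chosen in advance — tends to look significant. (All names and numbers here are made up for the demo.)

```python
import numpy as np

rng = np.random.default_rng(2)

# 100 rules of pure noise: none has any real edge over the market series.
n_days, n_rules, n_trials = 500, 100, 2000
market = rng.normal(0.0, 0.01, n_days)
rules = rng.choice([-1, 1], (n_rules, n_days))

means = rules @ market / n_days
best = means.argmax()  # "optimize": pick the luckiest rule after the fact

# Single-rule (procedure #1) null distribution, wrongly applied to the
# winner as if it had been the only rule ever considered.
null = np.array([np.mean(rules[best] * rng.permutation(market))
                 for _ in range(n_trials)])
naive_p = np.mean(null >= means[best])
print(f"naive single-rule p-value for the best of {n_rules} rules: {naive_p:.3f}")
```

The naive p-value comes out small even though every rule is noise — that inflation is exactly the cherry-picking bias that procedure #2's best-of-N sampling distribution is designed to remove.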