Maybe I misread his original post, but I was under the impression that he takes a subset of the strategy returns and runs some descriptive statistics on it. One can argue that this has some merit: e.g. you can compute a mean, median and 1st-percentile Sharpe ratio, which would tell you how bad things would be if everything went wrong. My personal problem with it is that any additional metric is likely to fuel parameter tweaking, and a metric with a lot of degrees of freedom is guaranteed to do that.
Thanks for all the valuable feedback so far. To answer Elf's last question first, this is one single strategy that has a few input parameters that can be varied (like entry timing, trigger level, take profit level, stop loss level, etc.). With a grid search I test thousands of settings, rank them, and select all strategies between the 89th and 97th percentiles (thus ignoring the extremely lucky outliers). The ranking of the individual strategies is done using the Monte Carlo analysis I described. I then trade the portfolio of the selected strategies in the next time interval and repeat the process (Walk Forward Analysis). I am now trying to find the best hyperparameters (WFA reoptimize interval, WFA lookback period, percentile range of selected strategies, etc.). To do that, I would like an objective, measurable metric to evaluate the WFA result and compare it against other WFA runs. As mentioned, intuitively it doesn't feel right to use the individual trades of all the strategies in the portfolio as if they were independent. Hence my post, to seek advice on how else to do a Monte Carlo analysis (e.g. use 1-day returns of the equity curve instead of individual trades). Of course I am open to other metrics that can be useful to compare WFA results; it is just that I am familiar with MC and have learned to distrust Sharpe. I guess, if all you have is a hammer, then ...
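For readers following along, the "rank and keep the 89th to 97th percentile band" step described above could look roughly like the sketch below. This is only an illustration, not the poster's actual code: the function name `select_percentile_band` and the made-up scores are assumptions, and the score could be any ranking metric (for instance the Monte Carlo result mentioned in the post).

```python
import random
import statistics

def select_percentile_band(scores, lower_pct=89, upper_pct=97):
    """Return the indices of settings whose score lies inside the percentile band.

    Hypothetical helper: `scores` holds one ranking score per parameter setting.
    """
    # 99 cut points; cuts[p - 1] is the p-th percentile (linear interpolation).
    cuts = statistics.quantiles(scores, n=100, method="inclusive")
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [i for i, s in enumerate(scores) if lo <= s <= hi]

# Made-up scores for 5000 grid-search settings, purely for illustration.
rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(5000)]
selected = select_percentile_band(scores)
print(len(selected), "of", len(scores), "settings selected")
```

Making the band limits explicit arguments keeps them easy to vary when they are themselves treated as WFA hyperparameters.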
Yes, that is what my intuition tells me. Hence, I don't want to treat each trade as independent and would rather look at them as clustered events. The input parameters have a strong effect on the behavior of the strategy (much more than the moving averages you mentioned as an example), and seem to do a good job of complementing each other. Here is an example on GBPCHF (with a portfolio of about 500 strategies running in parallel):
Having a fixed statistical calculation as opposed to a computationally heavy, probabilistic simulation is certainly a plus. I will have some reading to do... Thanks for the reference. What I take from this is that I need to adjust my code to do sampling with replacement.
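As a rough sketch of what "sampling with replacement" means here: a plain bootstrap draws as many returns as the original series has, with repeats allowed, and recomputes the statistic each time. The code below is a minimal illustration under assumptions, not anyone's actual implementation: `sharpe` and `bootstrap_sharpe` are hypothetical names, the Sharpe ratio is per-period (not annualized), and the input returns are simulated. Note that this plain version still treats the observations as independent, which is exactly the concern raised elsewhere in this thread.

```python
import random
import statistics

def sharpe(returns):
    """Per-period Sharpe ratio (mean / sample stdev), no annualization."""
    sd = statistics.stdev(returns)
    return statistics.fmean(returns) / sd if sd > 0 else 0.0

def bootstrap_sharpe(returns, n_resamples=1000, seed=0):
    """Sorted Sharpe ratios of resamples drawn WITH replacement."""
    rng = random.Random(seed)
    n = len(returns)
    # rng.choices draws with replacement, so a return can appear several times.
    dist = [sharpe(rng.choices(returns, k=n)) for _ in range(n_resamples)]
    dist.sort()
    return dist

# Simulated daily returns, purely for illustration.
rng = random.Random(42)
returns = [rng.gauss(0.0005, 0.01) for _ in range(500)]
dist = bootstrap_sharpe(returns)
print("mean Sharpe:          ", statistics.fmean(dist))
print("median Sharpe:        ", statistics.median(dist))
print("1st-percentile Sharpe:", dist[int(0.01 * len(dist))])
```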
Given a hyper-grid of various parameters, what's the standard deviation of Sharpe ratios across that space? Because if your SR varies from 0.3 to 0.9 and you barely have 6 years of results, it's all spurious. Even if it were not curve-fit, the t-stat for a Sharpe of 1 with 6 years of results is about 2.4 (i.e. NOT statistically significant).
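The t-stat figure quoted above can be reproduced with the usual back-of-the-envelope approximation, t ≈ SR × sqrt(years), which assumes i.i.d. returns and an annualized Sharpe ratio. The helper name `sharpe_t_stat` is made up for this sketch.

```python
import math

def sharpe_t_stat(annual_sharpe, years):
    """Approximate t-statistic of an annualized Sharpe ratio.

    Assumes i.i.d. returns; ignores higher-order corrections.
    """
    return annual_sharpe * math.sqrt(years)

print(round(sharpe_t_stat(1.0, 6.0), 2))  # 2.45
for sr in (0.3, 0.9):
    print(sr, "->", round(sharpe_t_stat(sr, 6.0), 2))
```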
You could replace "monte carlo" in your original post with "math" or "computer code" and not be wrong. It is just not informative to do so. You could argue that the bootstrap, jackknife and resampling in general are all subcategories of Monte Carlo methods. It is a broad class of methods. I would think of it like this: if we have an imaginary die that could have between 10 and 100 sides, with a lookup table of payouts based on which side comes up, and data on runs of the die, we could use resampling to try to guess at the state of the die. That would work because the throws are independent, identically distributed, stationary, ergodic and non-path-dependent. Resampling from observations can provide information about the model we think we have. If you apply this, though, to a process that is dependent, non-ergodic, non-stationary and path-dependent, with random correlations that come and go, you have nothing. Maybe even something worse than nothing, if you believe you have something and act on it beyond what you would have done if you had nothing. To me, it is a case of "all models are wrong but some are useful". Sharpe ratio with an assumption of normality is obviously wrong but is still useful. It also has the benefit that you are not fooling yourself like this.
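To make the independence point concrete, here is a small toy experiment, offered only as a sketch under stated assumptions. It simulates an autocorrelated AR(1) series (a stand-in for dependent returns, not real market data) and compares the standard error of the mean from a plain i.i.d. bootstrap with that from a simple moving-block bootstrap, which resamples contiguous chunks to keep short-range dependence. All function names, the block length of 50 and phi = 0.8 are arbitrary choices for illustration. A block bootstrap does nothing about non-stationarity or path dependence; it only shows how badly the i.i.d. version can understate uncertainty when observations are dependent.

```python
import random
import statistics

def ar1_series(n, phi, seed):
    """Simulate x_t = phi * x_{t-1} + noise, a simple dependent process."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

def iid_bootstrap_se(data, n_resamples, seed):
    """Std. error of the mean, resampling single observations with replacement."""
    rng = random.Random(seed)
    n = len(data)
    means = [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)]
    return statistics.stdev(means)

def block_bootstrap_se(data, block_len, n_resamples, seed):
    """Std. error of the mean, resampling contiguous blocks with replacement."""
    rng = random.Random(seed)
    n = len(data)
    n_blocks = -(-n // block_len)  # ceiling division
    means = []
    for _ in range(n_resamples):
        sample = []
        for _ in range(n_blocks):
            start = rng.randrange(n - block_len + 1)
            sample.extend(data[start:start + block_len])
        means.append(statistics.fmean(sample[:n]))  # trim to original length
    return statistics.stdev(means)

series = ar1_series(2000, phi=0.8, seed=0)
print("iid bootstrap SE:  ", iid_bootstrap_se(series, 300, seed=1))
print("block bootstrap SE:", block_bootstrap_se(series, 50, 300, seed=1))
```

On a positively autocorrelated series like this one, the block version should report a noticeably wider standard error than the i.i.d. version.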