Open-ended vs. range-constrained parameters and curve-fitting

Random.Capital · Apr 21, 2012

Quote from logic_man:

Are either of these two types more susceptible to curve-fitting?

Why worry about curve-fitting?

logic_man · Apr 21, 2012

Quote from dom993:

1- do you have enough setups in the "bad" part of the range to think it is statistically significant?

2- how stable in time is the performance in the "bad" part of the range? BE overall means little, for example it could have been negative in the 1st half of your backtesting & positive in the 2nd half, which could have several implications

3- do you have any market dynamics "theory" that can explain the results of that filter?

Hi,

1. I think so, but the set-up isn't exactly easy to backtest over a decade or something, so I've really only got accurate data with all of the relevant metrics for 81 examples from the "bad" part and 149 from the "good" part. Those 81 examples from the "bad" part average ~-.25 ES points per trade, while the 149 examples from the "good" part of the range average ~3 ES points per trade. This leads me to conclude that the mean outcomes of these two populations are actually different, since I think if the means were the same, they would have converged by now. That's based on the heuristic I was taught in stats classes that 30 examples of something were typically sufficient to start drawing some tentative conclusions.

2. It's actually very stable. The average outcome for the "bad" part of the range has been negative virtually from the beginning of the data series.

3. Yes, if I were to explain to you why I think this phenomenon exists, it would be very intuitive to you. I've explained it to two people, one trader and one non-trader, and both "got it" very easily. In fact, that's why I begin my strategy optimization by focusing on this specific parameter.
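One hedged way to check point 1 without leaning on the 30-sample rule of thumb is a permutation test on the difference in mean outcome between the two groups. The sketch below is illustrative only: the per-trade results are synthetic, drawn with an assumed standard deviation of 8 ES points (the thread gives only the sample sizes and the approximate means), and the function name `permutation_test_mean_diff` is made up for this example.

```python
import random


def permutation_test_mean_diff(good, bad, n_permutations=10_000, seed=0):
    """One-sided permutation test for mean(good) - mean(bad).

    Returns (observed_diff, p_value), where p_value estimates how often a
    random relabelling of the pooled trades gives a difference at least as
    large as the observed one.
    """
    rng = random.Random(seed)
    observed = sum(good) / len(good) - sum(bad) / len(bad)
    pooled = list(good) + list(bad)
    n_good = len(good)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        g, b = pooled[:n_good], pooled[n_good:]
        diff = sum(g) / len(g) - sum(b) / len(b)
        if diff >= observed:
            at_least_as_large += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (at_least_as_large + 1) / (n_permutations + 1)
    return observed, p_value


# Synthetic stand-ins for the real trade results (ASSUMPTION: normal
# outcomes with a standard deviation of 8 ES points; not from the thread).
data_rng = random.Random(42)
good_trades = [data_rng.gauss(3.0, 8.0) for _ in range(149)]
bad_trades = [data_rng.gauss(-0.25, 8.0) for _ in range(81)]

observed_diff, p_value = permutation_test_mean_diff(good_trades, bad_trades)
print(f"observed difference: {observed_diff:.2f} ES points, p = {p_value:.4f}")
```

With real per-trade results in place of the synthetic lists, a small p-value would suggest the gap between the two groups is unlikely to come from random labelling alone. It says nothing about whether the filter value itself was chosen after looking at the data.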

logic_man · Apr 21, 2012

Quote from Random.Capital:

Why worry about curve-fitting?

I worry about everything. It's genetic.

Seriously, though, it's because I don't like to take on more risk than I think I'm taking on. If I don't curve-fit, I think that I have a fairly accurate estimate of how much risk I am taking on, whereas if I do curve-fit, I am highly likely to be underestimating how much risk I am taking on. The possibility that I am taking on more risk than I think I am is unnerving to me.

dom993 · Apr 22, 2012

Quote from logic_man:

Hi,

1. I think so, but the set-up isn't exactly easy to backtest over a decade or something, so I've really only got accurate data with all of the relevant metrics for 81 examples from the "bad" part and 149 from the "good" part. Those 81 examples from the "bad" part average ~-.25 ES points per trade, while the 149 examples from the "good" part of the range average ~3 ES points per trade. This leads me to conclude that the mean outcomes of these two populations are actually different, since I think if the means were the same, they would have converged by now. That's based on the heuristic I was taught in stats classes that 30 examples of something were typically sufficient to start drawing some tentative conclusions.

2. It's actually very stable. The average outcome for the "bad" part of the range has been negative virtually from the beginning of the data series.

3. Yes, if I were to explain to you why I think this phenomenon exists, it would be very intuitive to you. I've explained it to two people, one trader and one non-trader, and both "got it" very easily. In fact, that's why I begin my strategy optimization by focusing on this specific parameter.

I doubt the 30-sample heuristic applies to trading, especially to backtesting. But your filter impacts 35% of the setups, and it appears to be stable throughout the backtesting period, which is good.

What is somewhat contradictory is that, although you believe the filter is based on a real market phenomenon, the average outcome for the filtered trades is about BE (breakeven), which says "no better than random". One reading of this could be that the negative edge spotted by the filter balances the positive edge of your basic system. Another reading could be that your basic system has no edge, but the backtesting is lucky on the subset of trades outside the filter.

I suggest doing some additional work to assess the value of that filter ... if the "bad" part of the filter is detrimental to your basic system, it could be good for a system working off the "opposite" paradigm (if your system looks for reversals, try using the bad part of the filter on a trend-continuation system).

One last comment ... are these 230 setups all you have for your basic system, or is this just the subset for which you have access to the information required for the filter? If it is only a subset, then the obvious thing to do would be to get the information required by the filter for your entire backtesting period. THAT would be good OOS testing for that filter.

logic_man · Apr 22, 2012

Quote from dom993:

I doubt the 30-sample heuristic applies to trading, especially to backtesting. But your filter impacts 35% of the setups, and it appears to be stable throughout the backtesting period, which is good.

What is somewhat contradictory is that, although you believe the filter is based on a real market phenomenon, the average outcome for the filtered trades is about BE (breakeven), which says "no better than random". One reading of this could be that the negative edge spotted by the filter balances the positive edge of your basic system. Another reading could be that your basic system has no edge, but the backtesting is lucky on the subset of trades outside the filter.

I suggest doing some additional work to assess the value of that filter ... if the "bad" part of the filter is detrimental to your basic system, it could be good for a system working off the "opposite" paradigm (if your system looks for reversals, try using the bad part of the filter on a trend-continuation system).

One last comment ... are these 230 setups all you have for your basic system, or is this just the subset for which you have access to the information required for the filter? If it is only a subset, then the obvious thing to do would be to get the information required by the filter for your entire backtesting period. THAT would be good OOS testing for that filter.

Yes, I don't put a huge amount of stock on the idea of 30 samples, just mentioning it as one suggested number where you start to get away from purely random outcomes.

I am starting to think that once one gets beyond the filtering value and up through to the maximum value, the outcomes are essentially random, so that the real edge is in the range of values from the minimum value up through the filtering value. My initial hypothesis was that the edge extended from the minimum to the maximum values, but the actual edge has been shown to be valid for a smaller range of values than I thought.

While it is possible that the "good" part of the filter has just been lucky, the profit factor of that part is nearly 7, so that is one heck of a lucky streak. I've been watching markets for a fairly long time and I can't say that I've seen anything so unique about the time period during which I've been collecting this data that would lead to such an outcome.

I've been applying the filters with approximately the same values to a new market (Euro) for a little over a month now and the profit factor is near 5. I don't assume that the "good" part of the filter will be exactly the same for a different market, but I am happy to see that it is approximately the same. Again, the "bad" part of the filter for the Euro is slightly negative, on average. If the profit factors were closer to 1.5 to 2, I'd be more concerned, but these results strike me as supportive of the idea that this is a real phenomenon and something intrinsic to the market's functioning.
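For readers unfamiliar with the metric, profit factor is conventionally gross profit divided by gross loss. A minimal sketch follows; the trade list is invented purely to show the arithmetic and is not data from this thread.

```python
def profit_factor(trade_results):
    """Gross profit divided by gross loss for a list of per-trade results."""
    gross_profit = sum(r for r in trade_results if r > 0)
    gross_loss = -sum(r for r in trade_results if r < 0)
    if gross_loss == 0:
        # No losing trades: the ratio is unbounded (or undefined if there
        # are no winners either).
        return float("inf") if gross_profit > 0 else float("nan")
    return gross_profit / gross_loss


# Made-up results in ES points: winners sum to 14.0, losers sum to 2.0.
example_trades = [4.0, -1.0, 6.5, 3.5, -1.0, 0.0]
print(profit_factor(example_trades))  # 14.0 / 2.0 = 7.0
```

Because the ratio is driven by a handful of large winners or losers in small samples, it is worth reading alongside the trade count rather than on its own.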

While I like the suggestion of possibly building another system off of the "bad" part of the filter, I think that since it is breakeven it wouldn't really provide the basis for a new system, which might be the case if it were strongly negative.

I actually have information for 400 set-ups in total, but the other 170 set-ups have been filtered by another filter as invalid (there is some overlap of the filters). Again, the results from the time of my discovery of those filters through today have been very consistently at slightly negative, much like the first filter.

dom993 · Apr 22, 2012

Have you tried inverting the other filter and testing your new filter on the 170 setups that are currently discarded? I would expect the "good" part of the new filter to perform better than the "bad" part on those 170 setups, even though "better" might just be "not as negative" in that case.

I would also re-analyze the old filter in light of the new one, i.e., does the old filter do any good assuming you use the new filter?

logic_man · Apr 22, 2012

Quote from dom993:

Have you tried inverting the other filter and testing your new filter on the 170 setups that are currently discarded? I would expect the "good" part of the new filter to perform better than the "bad" part on those 170 setups, even though "better" might just be "not as negative" in that case.

I would also re-analyze the old filter in light of the new one, i.e., does the old filter do any good assuming you use the new filter?

Yes, and you are correct that using the other filter, both subsets which rely on the second filter are negative, with the "good" part of the second filter being slightly less negative than the "bad" part. So, the "good" and "bad" are actually a second filter.

The filter which discards the 170 setups takes precedence over even the "good" and "bad" parts of the range, precisely because even the "good" part of the range is negative using that filter.
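The precedence described above can be sketched as a two-stage check. Everything named here is hypothetical: the thread never says what the two parameters are, so `precedence_value`, `range_value`, and both limits are placeholders, and the sketch assumes that "beyond the filter value" means "greater than".

```python
from dataclasses import dataclass


@dataclass
class Setup:
    precedence_value: float  # hypothetical input to the filter that discards setups outright
    range_value: float       # hypothetical input split into "good" and "bad" parts


def classify_setup(setup, precedence_limit, range_limit):
    """Apply the discarding filter first, then the good/bad range filter."""
    # Stage 1: this filter wins even if the setup is in the "good" range.
    if setup.precedence_value > precedence_limit:
        return "skip: precedence filter"
    # Stage 2: only setups that survive stage 1 are split into good/bad.
    if setup.range_value > range_limit:
        return "skip: bad part of range"
    return "take"


# Illustrative limits only; not values from the thread.
print(classify_setup(Setup(5.0, 1.0), precedence_limit=3.0, range_limit=2.0))
print(classify_setup(Setup(1.0, 4.0), precedence_limit=3.0, range_limit=2.0))
print(classify_setup(Setup(1.0, 1.0), precedence_limit=3.0, range_limit=2.0))
```

Ordering the checks this way means a setup in the "good" part of the range is still skipped when the first filter rejects it, which matches the precedence described in the post.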

So, in a sense, if my initial hypothesis had contained slightly different values for these two parameters, I would not have had to filter anything. However, since that could have led to a different set of trades, I prefer to retain the filtering approach rather than to go forward with the optimized parameters. I will simply ignore the trades which trigger beyond the filter values, let them play out and then take trades which meet the filter values. I actually like this approach because what it ends up doing is making it more difficult to enter a trade, thus reducing the risk of overtrading.