I think bounded and unbounded are equivalent and that the same rules apply. You can often transform a bounded variable into an unbounded one and vice versa. For instance, for a moving average you can look at the period from 1 to infinity. If you instead look at frequency (which is equivalent), you're looking at the range from 1 down to 1/infinity = 0, i.e. the bounded half-open interval (0, 1]. Both are valid and mean the same thing.
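Just to illustrate the transformation I mean (a toy sketch, not anyone's actual tooling):

```python
# Illustrative only: mapping an unbounded parameter (MA period >= 1)
# to a bounded one (frequency in (0, 1]) and back.

def period_to_frequency(period: float) -> float:
    """period in [1, inf) -> frequency in (0, 1]."""
    return 1.0 / period

def frequency_to_period(freq: float) -> float:
    """frequency in (0, 1] -> period in [1, inf)."""
    return 1.0 / freq

# Optimizing over periods 1..infinity is the same search as optimizing
# over frequencies in (0, 1]; only the parameterization changes.
print(period_to_frequency(20))    # 0.05
print(frequency_to_period(0.05))  # 20.0
```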
But when you are optimizing, why would you take an indicator reading on, to stick with the moving average example, a moving average forced to be anywhere from 10 to 20 periods? You would, at some point, pick the best possible value for the number of periods. If 17 gives you the best results, you would only use 17. I've not heard of anyone using a moving average strategy which says to enter if any moving average from the 48 to the 52 crosses over any moving average from the 196 to the 204, for example. Sure, some people try to get cute and use the 49 over the 199 as the "golden cross", and I get that, but it is still a single value for each parameter in the optimization. I suppose you could have multiple instances of an MA crossover strategy, each with a different set of optimal values, though. It still seems like that is different from optimizing to find a range of values which can serve as action triggers, as in the example I used of entering if the odds of something are above 50% and not entering if they are below 50%. There, whether the odds are 51% or 95%, you take the same action, because the expected value of that action is positive. If the odds are 49% or 0%, you don't take action, because the expected value is negative. Admittedly, this is a little different from the original contrast I laid out, but this is really what I was getting at.
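To make the expected-value point concrete, a toy calculation (the symmetric win-1/lose-1 payoff is my assumption for the illustration, which is what puts the break-even at 50%):

```python
# Toy expected-value check: with a symmetric payoff (win +1, lose -1),
# the EV of taking the trade is positive exactly when p > 0.5, so every
# p in (0.5, 1] maps to the same action.

def expected_value(p_win: float, win: float = 1.0, loss: float = 1.0) -> float:
    return p_win * win - (1.0 - p_win) * loss

for p in (0.49, 0.51, 0.95):
    action = "enter" if expected_value(p) > 0 else "stay out"
    print(f"p={p:.2f}  EV={expected_value(p):+.2f}  -> {action}")
```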
Seeing the discussion and clarifying what I meant for myself, maybe I should have entitled this thread "Single values vs. ranges as model parameters and curve-fitting". Maybe the discussion would have been the same, though, since I'm not saying that I fully buy into the distinction. But take a relatively simple example: a strategy that says to go long when the odds the Fed will cut rates are over 50%, stay flat if they are between 49% and 10%, and go short if they are under 10% (let's say you've found that, historically, there is a 10% chance the bond market is reading the Fed wrong, and that opens up a shorting opportunity). That strategy takes ranges as its parameter values. That seems like a different kind of approach from one which says to go long if the short-term Fed funds rate goes 100 basis points lower than the rate on some corporate bond fund index (on the idea that those corporate rates would come down and the market would be positive, yadda, yadda, yadda) and go short if the Fed funds rate goes 25 basis points lower. In this second strategy, the only triggers for action are the single exact values "100 basis points" and "25 basis points". Assume that you can time your entry so that you can enter at those exact moments, although that does assume the market is available 24/7 in some sense. Not that that assumption is correct, but I think the issue of execution on such a strategy is secondary to my question. Is the risk of "bad" curve-fitting equally present in both approaches?
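Just to restate the contrast in rule form (a sketch; the thresholds are the illustrative numbers from the example, and the function names are mine):

```python
# Sketch of the two rule types from the example above. The numbers are the
# illustrative ones from the post, not real research.

def range_based_signal(p_fed_cut: float) -> str:
    """A whole range of parameter values maps to each action."""
    if p_fed_cut > 0.50:
        return "long"
    elif p_fed_cut >= 0.10:
        return "flat"
    else:
        return "short"

def point_trigger_signal(spread_bps: float) -> str:
    """Only the exact values 100 and 25 trigger an action."""
    if spread_bps == 100:
        return "long"
    elif spread_bps == 25:
        return "short"
    return "no action"
```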
What you're saying there is the same thing as in the earlier post; I see what you're asking. That's why I said: how you determine what is a fundamental market mechanism is an entirely different issue. It's an issue of statistics. If you take one instrument, one bar size, one parameter of an indicator or whatever, and you test it, let's say that over all the data you have you get 1,000 sampling points upon which to base your conclusion. However, across the range of selections there were millions of possible values in between. If you get a "perfect number" within that range, there's (to simplify things to the max) a 1,000-in-a-few-million chance that it's a truly meaningful value (even if it's a range of values). So if you wanted to really test your parameter you'd need a few more tonnes of data to make it valid.

Then you'll realise that, based on the descriptive statistics of the instrument, the value ranges seem to change. For example, the sustainability of positive trends is correlated with slightly higher values, etc. Then you'd have to create correlations within that huge pile of data and try to draw conclusions out of that - so that's basically combinations within the data, increasing the amount of data by several factors. So, for example, if you see a link between two descriptive statistics of the market and one parameter value - three values put on a 3D graph - you can see planes forming where the "meaningful" value ranges might statistically emerge. These planes in the data signify a possible robustness, which can itself later be computed into single values and correlated with whatever else. You can also then move into an n-dimensional search for planes, and if you're into math, that's a great hobby.

What I'm trying to say is that in order to find truly meaningful, non-curve-fit parameters, it takes 1- about 100 GB of data, 2- a great statistical platform, and 3- an understanding of statistics and quantitative analysis. I saw a website recently that was kind of interesting (this discussion reminded me of it), though I didn't really read into the details, that might help you out: http://meyersanalytics.com/ The "walk forward surface explorer" looks interesting to me, if it does what I think it does - read the "Data Mining and Curve Fitting" section on its page.

Ultimately, it's nowhere near as easy as reading off a good range of parameter values for a single parameter on a single market. The probability is very high that with 1,000 samples you will get curve-fits all over the place. You need millions at least - in which case you'll probably get no meaningful data because of the differences between markets - and then you're back to square one, trying to link those markets somehow and find what could work with all of that in mind. Further, statistics, if applied incorrectly, will give you garbage. It takes quite a bit more math and a fundamental understanding of how everything links together to draw such conclusions. Even then, you have to take into consideration the framework (model) you are trying to put your output values into. Trying to predict price is often a futile exercise, but finding good parameters for one component of a model (i.e. detecting the probability of mean reversion) can pay off.
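To make the "planes in the data" idea a bit more concrete, here is a toy sketch of plateau detection on a 2-D parameter surface. Everything in it - the grid, the metric, the thresholds - is made up for illustration; it is not the Meyers tool or anyone's real workflow:

```python
import numpy as np

# Toy plateau detection on a 2-D parameter grid. A "robust" region is one
# where the performance metric is good AND the neighborhood around it is
# flat, rather than a single spiky optimum. The metric here is random
# noise plus an artificial bump, purely to show the mechanics.

rng = np.random.default_rng(0)
fast_periods = np.arange(5, 55, 5)       # hypothetical fast MA periods
slow_periods = np.arange(50, 250, 20)    # hypothetical slow MA periods
surface = rng.normal(0.0, 0.2, (len(fast_periods), len(slow_periods)))
surface[3:6, 4:7] += 1.0                 # fake "good" plateau for illustration

def neighborhood_stats(grid, i, j, k=1):
    """Mean and std of the (2k+1) x (2k+1) window around (i, j)."""
    window = grid[max(i - k, 0):i + k + 1, max(j - k, 0):j + k + 1]
    return window.mean(), window.std()

robust = []
for i in range(len(fast_periods)):
    for j in range(len(slow_periods)):
        mean, std = neighborhood_stats(surface, i, j)
        if mean > 0.5 and std < 0.3:     # arbitrary thresholds for the sketch
            robust.append((int(fast_periods[i]), int(slow_periods[j])))

print(robust)   # parameter pairs sitting on a flat, elevated region
```

The point of the neighborhood check is that a genuinely robust region should stay good when you perturb the parameters slightly, whereas a curve-fit optimum tends to be a lone spike.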
Hah, I was at that same site a few weeks ago, based on some Google searches I was doing. What if (and this is a big what-if, and maybe the biggest issue of them all), rather than having "millions of values in between", you actually were testing ALL of the opportunities which existed to test that range of values for that parameter? In other words, outside of the context of your sample data gathering to determine the optimal range of values, you had found a way to treat all other data points as noise and irrelevant? I make about 0.5 trades per day, based on one specific set-up, so I'm really only interested in less than (let's say) 1% of the data the market generates. Within that 1%, I can test 100% of the opportunities to validate and optimize my range. Think of people who only trade the opening range, or something like that. As long as the opening range validates their parameter values, the rest of the day can do what it pleases. Then, what you'd really be saying is "Only this specific context matters for my test, because only when this specific context exists am I going to trade anyway". In essence, you would be setting up a scientific experiment under controlled conditions, which, if the EMH is correct, you probably shouldn't be able to do, right? But what if you could? And what if it was an experiment no one else was running? Wouldn't that be the ultimate edge, if it worked? It would essentially be "private information" about the market. I think, if that were possible, the whole "out of sample" issue would disappear, for the most part, because you would be dealing with the entire population of instances of the set-up, not just a sample of the population. Or maybe not.
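In code terms, the filtering idea would look something like the sketch below. The "opening range" window and the column names are placeholders I'm making up; the real setup condition is whatever you actually trade:

```python
import pandas as pd

# Sketch: keep only the ~1% of bars where the setup context exists and
# evaluate the parameter range on that subset alone. The 09:30-10:00
# "opening range" window is a stand-in for the real setup condition.

def in_setup_context(bars: pd.DataFrame) -> pd.Series:
    t = bars.index.time  # assumes a DatetimeIndex of bar timestamps
    mask = (t >= pd.Timestamp("09:30").time()) & (t < pd.Timestamp("10:00").time())
    return pd.Series(mask, index=bars.index)

def evaluate_in_context(bars: pd.DataFrame, signal_col: str, ret_col: str) -> float:
    relevant = bars[in_setup_context(bars)]   # the full "population" of setups
    return float((relevant[signal_col] * relevant[ret_col]).mean())
```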
His argument contains a logical, and also mathematical, flaw: circularity. He claims that a forward test should involve many OOS tests, not just one. Let us define those as oos1, oos2, oos3, ..., oosN. He argues that "we must repeat the walk forward out-of-sample (oos) analysis over many test/oos sections and take the average of our weekly results over all out-of-sample sections. This average gives us an expected weekly return and a standard deviation of weekly returns which allows us to statistically estimate the expected equity and its range for N weeks in the future." But the mean of the means should equal the mean of the combined sample oos1 + oos2 + oos3 + ... + oosN = oos. Then you can just use that one oos, and walk-forward does not escape the problem. One may still get a significant mean return in oos as defined above but fail in oos(N+1) if conditions change completely. I am afraid his whole talk is circular, and he is trying to push some tools that do not solve the problem, because the argument he uses in their favor comes back to haunt him.
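A quick numerical check of the "mean of the means" point, with toy numbers; note it holds exactly only when the OOS sections are the same length, which is the usual walk-forward setup:

```python
import numpy as np

# With equal-length OOS sections, the average of the section means equals
# the mean of the pooled returns, so averaging over sections adds nothing
# beyond one combined OOS. With unequal section lengths the two generally differ.

rng = np.random.default_rng(1)
equal_sections = [rng.normal(0.1, 1.0, 52) for _ in range(5)]   # 5 sections x 52 weeks
pooled = np.concatenate(equal_sections)
mean_of_means = np.mean([s.mean() for s in equal_sections])
print(np.isclose(mean_of_means, pooled.mean()))                 # True

unequal_sections = [rng.normal(0.1, 1.0, n) for n in (10, 52, 200)]
pooled_u = np.concatenate(unequal_sections)
print(np.isclose(np.mean([s.mean() for s in unequal_sections]), pooled_u.mean()))  # False, in general
```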
If I may, I suggest you focus on the stability of the fitted range or optimized point value in terms of predictive power. As your example points out, it often makes sense to optimize for ranges rather than single values. But what seems most important to me is the level of robustness such optimization results provide; in my book, that is what makes the difference between curve fitting and proper optimization. If your predictive power remains stable over the optimized parameter range when moving OOS, then that is what you want to aim for.
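As a rough illustration of that kind of stability check (toy code; the metric and any thresholds you apply to the output are your own choices, nothing here is prescriptive):

```python
import numpy as np

# Sketch of a stability check: compute the performance metric across the
# whole parameter range in-sample and out-of-sample, then ask whether the
# shape of the curve holds up OOS, not just whether the single best point does.

def range_stability(param_values, metric_in_sample, metric_oos):
    in_s = np.asarray(metric_in_sample, dtype=float)
    oos = np.asarray(metric_oos, dtype=float)
    # Rank correlation of the two curves: does the good region stay good OOS?
    rank_corr = np.corrcoef(in_s.argsort().argsort(), oos.argsort().argsort())[0, 1]
    best = int(np.argmax(in_s))
    return {
        "best_param_in_sample": param_values[best],
        "oos_decay_at_best": float(in_s[best] - oos[best]),
        "rank_correlation": float(rank_corr),
    }
```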
Sure, if you focus on means only. If you applied the same approach to risk, it would not hold; you would get entirely different results.
Yes, I am monitoring the boundaries between the good part of the range and the bad part. What I really like is that the "bad part" of the range is still break-even, rather than a complete loser. I don't want to trade it per se, because I prefer the meatier part of the range, but the fact that it's break-even seems to say a lot about the robustness of the approach. I've also started paper-trading the Euro with it and am getting approximately the same results I've had over time in the ES, with approximately the same parameter values. It's early yet, but I have to like what I am seeing: not only is it working out of sample, it's working on a different instrument as well.
1- Do you have enough setups in the "bad" part of the range to think it is statistically significant?
2- How stable over time is the performance in the "bad" part of the range? Break-even overall means little; for example, it could have been negative in the first half of your backtest and positive in the second half, which could have several implications (a quick way to check this is sketched below).
3- Do you have any market-dynamics "theory" that can explain the results of that filter?
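On questions 1 and 2, a toy sketch of the mechanical check I have in mind; the one-sample t-test is just one possible choice of significance test, and the split into halves mirrors the point above:

```python
import numpy as np
from scipy import stats

# Toy check: is the "bad" part of the range really break-even (and not just
# too thin to tell), and is it break-even in both halves of the backtest
# rather than negative-then-positive?

def check_bad_range(trade_pnls):
    pnls = np.asarray(trade_pnls, dtype=float)
    n = len(pnls)
    t_stat, p_value = stats.ttest_1samp(pnls, 0.0)   # H0: mean PnL == 0
    first, second = pnls[: n // 2], pnls[n // 2:]
    return {
        "n_trades": n,
        "mean_pnl": float(pnls.mean()),
        "p_value_vs_zero": float(p_value),
        "first_half_mean": float(first.mean()),
        "second_half_mean": float(second.mean()),
    }
```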