anyone else struggling with cointegration vectors that break down quickly in out-of-sample backtests? I'm currently backtesting a stat arb strategy - basket trading ETFs and some of their underlyings. it's based on a model that only trades at the close of each trading day. in every one of my tests, the cointegration vector in my model crumbles quickly out-of-sample, and any mean-reverting behavior the spread had quickly evaporates - making the model useless. it's worth noting that in some tests, a slightly distorted version of the in-sample mean-reverting spread will emerge out-of-sample, but it doesn't persist long at all. on top of that, whether this happens seems completely random from one test to the next. could the issue be in the selection of the underlying stocks? I've tried selecting purely based on how well each of the underlyings cointegrated with the ETF over, say, a three-year period. doing just this seems to exacerbate the vector breakdowns out-of-sample. I've experimented with telescoping time-frames. that is, how well does XYZ cointegrate with the ETF over 1 year? 2 years? 3 years? doing this DID reduce some of the noise and create a more stable spread out-of-sample, but not to such a degree that I'd feel comfortable trading the model. I realize that out-of-sample performance is never what it is in-sample, but this can't be right. there doesn't seem to be any edge over just guessing or throwing darts at the wall. any ideas? thanks in advance.
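
For readers following along, the telescoping check described above could be sketched roughly as below. This is only an illustration: it swaps in a crude proxy (the AR(1) coefficient of the regression residual) for a proper cointegration test, and the lookbacks of 252/504/756 trading days, the 0.95 cutoff, and the function names are all assumptions, not the poster's actual method.

```python
import numpy as np

def residual_phi(etf, stock):
    """AR(1) coefficient of the residual from regressing the ETF on one stock."""
    slope, intercept = np.polyfit(stock, etf, 1)    # etf ~ slope * stock + intercept
    resid = etf - (slope * stock + intercept)
    phi, _ = np.polyfit(resid[:-1], resid[1:], 1)   # resid[t] ~ phi * resid[t-1] + c
    return phi

def passes_all_horizons(etf, stock, lookbacks=(252, 504, 756), max_phi=0.95):
    """True only if the residual looks mean-reverting over every lookback."""
    etf = np.asarray(etf, dtype=float)
    stock = np.asarray(stock, dtype=float)
    return all(residual_phi(etf[-n:], stock[-n:]) < max_phi for n in lookbacks)

# Toy data (made up): a linear "stock" and two ETFs built from it.
t = np.arange(800)
stock = 50 + 0.05 * t
etf_stable = 2 * stock + 0.5 * (-1.0) ** t                   # residual flips sign daily
etf_drifting = 2 * stock + 5 * np.sin(2 * np.pi * t / 400)   # residual wanders slowly

print(passes_all_horizons(etf_stable, stock))    # True
print(passes_all_horizons(etf_drifting, stock))  # False
```

Requiring a pass at every horizon is stricter than ranking on a single three-year window, which may be why it cuts some of the noise. It still says nothing about whether the relationship will hold going forward.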

You did not provide enough information to back your conclusions. Did you say the model trades at each close? Why do you do that? What is the exit? What do you trade specifically? Your post is way too vague.

There is stat arb going on. Pairs trading is the numero uno strategy that launched stat arb. And this is what the OP is trying to do: using a cointegration vector to find some mean-reverting pairs of stocks. Read Statistical Arbitrage: Algorithmic Trading Insights and Techniques.

Would help to know how you judge that it is not working out of sample. Do you do another cointegration test or do you have a strategy that was profitable in sample and failed out of sample?

hypothetically, the model trades 'at the close' because I'm only using daily close prices in my backtesting. in the future, my aim is to focus on intraday price data. for example, 1-minute bars. the model would then be set up to look for trades 'at the close' of each minute. the spread of each potential trade is monitored in real-time, be it a spread based on daily data, or 1-minute data. the model looks to enter and exit when, say, the spread is two standard deviations above the mean of the spread, or two standard deviations below the mean of the spread, respectively. specifically, I'm looking to trade the spread between an ETF and a basket of, say, ten of its underlying stocks. ideally, this is a non-directional market strategy. in essence, I'm creating a synthetic asset (the 'spread') that looks something like this: synthetic asset = ETF - v*(weighted basket) where 'v' is the cointegration vector (a 10X1 matrix), and the 'weighted basket' is a linear combination of ten of the underlying stocks (a 1X10 matrix). the synthetic asset should theoretically always have close to zero value. however, when market imperfections (i.e., 'mispricings') are present, trading opportunities may present themselves - but it is only possible to capitalize on these opportunities if your spread is mean-reverting. I am able to create a spread that mean-reverts in-sample, but out of sample, things change. either the amplitude of the in-sample spread is changed, or the spread no longer mean-reverts - it may start trending. both scenarios make trading the model near-impossible.
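
A minimal sketch of the synthetic asset and the two-standard-deviation rule described above, under some assumptions: prices arrive as a (T, N) array with one column per underlying, `v` is a length-N weight vector, and the mean and standard deviation come from the in-sample period. The function names and the +1/-1 signal convention are made up for illustration.

```python
import numpy as np

def spread_series(etf, basket, v):
    """Synthetic asset: ETF price minus the v-weighted basket of underlyings."""
    etf = np.asarray(etf, dtype=float)         # shape (T,)
    basket = np.asarray(basket, dtype=float)   # shape (T, N)
    v = np.asarray(v, dtype=float)             # shape (N,)
    return etf - basket @ v                    # one spread value per bar

def zscore_signals(spread, mean, std, entry=2.0):
    """-1 = short the spread, +1 = long the spread, 0 = no signal."""
    z = (np.asarray(spread, dtype=float) - mean) / std
    signals = np.zeros(len(z), dtype=int)
    signals[z >= entry] = -1    # spread rich: short ETF, long basket
    signals[z <= -entry] = 1    # spread cheap: long ETF, short basket
    return signals

# Toy numbers (made up), one underlying for brevity.
etf = [10.0, 12.0, 8.0, 10.0]
basket = [[5.0], [5.0], [5.0], [5.0]]
spread = spread_series(etf, basket, v=[2.0])
print(zscore_signals(spread, mean=0.0, std=1.0).tolist())  # [0, -1, 1, 0]
```

Because `mean` and `std` are frozen from the in-sample data, a change in the out-of-sample amplitude shifts where the two-sigma bands effectively sit, which is one way the breakdown shows up in the signals.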

the latter - I have, for example, a strategy that was profitable in-sample, but fails out-of-sample. by 'fails', I mean that the amplitude of the spread changes drastically out-of-sample, the spread begins trending out-of-sample, or a combination of both. these changes to the spread result from the cointegration vector breaking down. my question is, how do I minimize this breakdown so as to obtain a more mean-reverting spread out-of-sample? in other words, are there more effective techniques of selecting the underlying stocks in the basket that I'm not aware of? or are the vector breakdowns the result of something else? thanks to all in advance for your time on this.
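
One rough way to put numbers on "the amplitude changes" and "the spread starts trending" is to fit an AR(1) model to the spread on each side of the in-sample/out-of-sample split. The sketch below is a diagnostic only, not a formal unit-root or cointegration test, and `ar1_diagnostics` and `compare_in_out` are hypothetical names.

```python
import numpy as np

def ar1_diagnostics(spread):
    """Fit spread[t] ~ phi * spread[t-1] + c and summarise mean reversion."""
    s = np.asarray(spread, dtype=float)
    phi, c = np.polyfit(s[:-1], s[1:], 1)
    # Half-life is only defined here for 0 < phi < 1; anything else is reported as inf.
    half_life = -np.log(2) / np.log(phi) if 0 < phi < 1 else float("inf")
    return {"phi": float(phi), "half_life": float(half_life),
            "std": float(s.std(ddof=1))}

def compare_in_out(spread, split):
    """Diagnostics before and after index `split`, plus the amplitude ratio."""
    ins = ar1_diagnostics(spread[:split])
    out = ar1_diagnostics(spread[split:])
    return {"in_sample": ins, "out_of_sample": out,
            "std_ratio": out["std"] / ins["std"]}

# Toy spread (made up): same decay speed, but twice the amplitude after the split.
toy = [8.0, 4.0, 2.0, 1.0, 0.5, 16.0, 8.0, 4.0, 2.0, 1.0]
report = compare_in_out(toy, split=5)
print(report["std_ratio"])   # about 2.0: the amplitude doubled out-of-sample
```

A `phi` drifting toward 1 out-of-sample would correspond to the trending case, and a `std_ratio` far from 1 to the amplitude change.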

Have you tried a rolling window? To explain: instead of taking the cointegration ratio from all the in-sample data and then using it out of sample, recalculate the value every day, always looking xx days behind. This should/could help to adapt to the changing behaviour of the spread. This is how it is done in basic hedge fund replication papers. I don't really have an answer except that it might not be cointegrated. What you are talking about looks a lot like what is in A. Chang's book. If it's in a book, do you think it really works that easily?
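
The rolling-window idea above could look something like this, assuming the ratio is estimated by ordinary least squares of the ETF on the basket prices (the thread does not say which estimator is used, so OLS is a stand-in). The window length is left as a parameter and `rolling_hedge_weights` is a hypothetical name.

```python
import numpy as np

def rolling_hedge_weights(etf, basket, window):
    """Re-estimate the basket weights each day from the previous `window` days."""
    etf = np.asarray(etf, dtype=float)         # shape (T,)
    basket = np.asarray(basket, dtype=float)   # shape (T, N)
    T, N = basket.shape
    weights = np.full((T, N), np.nan)          # NaN until a full window exists
    for t in range(window, T):
        # Rows t-window .. t-1 only, so day t never sees its own price.
        X = np.column_stack([np.ones(window), basket[t - window:t]])
        beta, *_ = np.linalg.lstsq(X, etf[t - window:t], rcond=None)
        weights[t] = beta[1:]                  # drop the intercept
    return weights

# Toy data (made up): the ETF is exactly 1 + 2*a + 3*b.
days = np.arange(10, dtype=float)
basket = np.column_stack([days, days ** 2])
etf = 1 + 2 * basket[:, 0] + 3 * basket[:, 1]
w = rolling_hedge_weights(etf, basket, window=5)
print(np.round(w[5:], 6))   # every row is [2. 3.]
```

With real prices the rows of `w` would drift from day to day. How fast they drift relative to the holding period is what decides whether the re-estimation is adapting to the spread or just chasing noise.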

I've considered using a rolling window, but it seems that one must be careful about recalibrating the cointegration vector too often or else the transaction costs become prohibitive. I figured I'd first try to see if there were any other techniques available to stabilize the spread aside from that. Are you referring to Ernie Chan? I've read the book - and do realize that the insight provided is for demonstration purposes only. The situation I presented in this post is just an example, not necessarily my exact strategy.
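
To get a feel for the transaction-cost concern, here is a back-of-the-envelope sketch that compares refreshing the weights every day with refreshing them every few days. It assumes weights are expressed in shares per unit of ETF, contain no missing values, and that cost is proportional to dollars traded. The `cost_per_dollar` value is a made-up placeholder, and the ETF leg and entry/exit trades are ignored.

```python
import numpy as np

def hold_between_recalibrations(weights, every):
    """Keep weights fixed and only refresh them every `every` rows."""
    weights = np.asarray(weights, dtype=float)
    idx = np.arange(len(weights))
    return weights[idx - idx % every]          # row of the most recent refresh

def rebalancing_cost(weights, prices, cost_per_dollar=0.0005):
    """Total cost of trading the basket legs whenever the weights change."""
    weights = np.asarray(weights, dtype=float)   # shape (T, N)
    prices = np.asarray(prices, dtype=float)     # shape (T, N)
    traded = np.abs(np.diff(weights, axis=0)) * prices[1:]   # dollars traded per leg
    return cost_per_dollar * traded.sum()

# Toy numbers (made up): one leg whose weight creeps up each day.
weights = [[1.0], [1.1], [1.2], [1.3]]
prices = [[100.0]] * 4
daily = rebalancing_cost(weights, prices)
sparse = rebalancing_cost(hold_between_recalibrations(weights, every=2), prices)
print(daily, sparse)   # roughly 0.015 vs 0.01
```

When the weights drift steadily, as in this toy case, refreshing less often saves only part of the cost. When they mostly jitter around a level, skipping refreshes avoids trading the noise, so the saving should be larger.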

I was indeed referring to Chan's book. I don't have any insights and I doubt the guys who have them are around here. Try Wilmott or nuclearphynance? You can have a look at Burgess's famous paper on pairs trading and cointegration and maybe you can find some insights. I don't think a rolling window is going to increase transaction costs if the window is not too short and your trade timeframe is short.