Any good book on Statistical Arbitrage?

Discussion in 'Strategy Building' started by ezbentley, Apr 16, 2009.

  1. So I was studying epchan's blog on pairs trading and came across this interesting paradox that he and a reader discussed in this post:

    http://epchan.blogspot.com/2007/02/in-looking-for-pairs-of-financial.html

    I don't know if the policy allows a direct quote from another source, but here it is:

    "As an example, consider the limit when, as you say, the cointegration is very "good": for an index of N components you have included N-1 components in your basket. Now, this basket will approximate the index very well indeed. However, the difference (index - basket) is the one component left out and it is manifestly not mean reverting (being a process corresponding to a single stock). If that Nth component were mean reverting, we would just trade it directly and not bother with the synthetic hedge..."

    The discussion digressed into in-sample vs. out-of-sample testing, but what the reader said remains a paradox to me.

    It seems to make sense that any deviation between the ETF and a basket of stocks would come from the other components NOT included in the basket. So why not just trade those components? Maybe I am such an amateur that I am missing some essential idea. Does anyone care to elaborate on this paradox?

    Thanks
     
    #41     Jun 3, 2009
  2. i think what they're saying is there are two scenarios you can look at:

    1) one day XOM rises 10%, for example, but the peer group, XLE, the energy index, rises only 1%.

    you could take a view, based on various tests, that the price difference has gone out of proportion, and will later contract.

    so you would make a bet that the price difference will contract - either XLE will catch up with XOM's spike, or XOM will go back down, or both. So you would buy XLE and short XOM.

    2) i think what they're saying is that you may be better off trading several energy stocks - not only XOM, for example - against XLE.

    so you would select, say, 10 energy stocks from XLE to trade against XLE. you may, one day, detect a strong divergence from the relationship between this basket and XLE (assuming there is a more or less stable relationship, and you think it will continue, going forward)

    you make a bet that this divergence will revert to some estimated mean value (which i guess people get from backtesting).
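
    Both scenarios come down to measuring how far a spread sits from its estimated mean. A minimal sketch of one common way to do that, a z-score against recent history, is below. The spread values, the 10-day window and the 2.0 entry threshold are all made up for illustration, and `zscore_last` and `signal` are hypothetical helper names, not anything from the blog post.

```python
import statistics

def zscore_last(spread, window):
    """z-score of the latest spread value against the preceding `window` values."""
    history = spread[-window - 1:-1]
    mu = statistics.fmean(history)
    sd = statistics.stdev(history)
    return (spread[-1] - mu) / sd

def signal(z, entry=2.0):
    """Map a z-score to a trade idea; the entry threshold is an assumption."""
    if z > entry:
        return "short spread"   # e.g. short the stock, buy the index
    if z < -entry:
        return "long spread"    # e.g. buy the stock, short the index
    return "no trade"

# Made-up daily spread values (stock return minus index return, in percent).
# The last value mimics scenario 1: the stock is up 10%, the index up 1%.
spread = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -0.3, 0.1, 0.0, -0.1, 9.0]

z = zscore_last(spread, window=10)
print(f"z-score = {z:.1f}, action: {signal(z)}")
```

    A large positive z-score here corresponds to the "buy XLE and short XOM" bet in scenario 1. Whether the spread really reverts is exactly what the backtests are supposed to tell you.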

    so you have two extremes: one is your basket is 29 stocks out of 30 (XLE). this won't be tradable, because any divergence will be very small

    the other extreme is you have a basket of only, say, 2 energy stocks out of 30. maybe it's too risky because 2 stocks can drift away and stay there, whereas a basket of 10 stocks is more likely to come back, and then you make money

    the trick is to find the 'right' stocks for the basket. and i think you have to use various statistical tests for this

    i actually don't know if trading pairs is a good idea for a retail investor, because you have so many stocks and you can never be 100% sure where they will go

    i mean, what happens if you're wrong and they all go in different directions?
     
    #42     Jun 4, 2009
  3. I think the seeming paradox (to me at least) is that constructing a basket of component stocks and trading it against the ETF appears to be fundamentally flawed.

    Let's say the ETF has N components. You can form two time series:
    1. the ETF itself
    2. a basket of all N stocks with proper relative weights
    These two time series will cointegrate almost perfectly: their difference (ETF - N stocks) is just the tracking error, which we assume is too small to be tradeable.

    Then consider these two new time series:
    1. the ETF itself
    2. a basket of a subset of the N component stocks. Let's call this series n. And let's call all the other stocks NOT in this subset e(for error).
    These two new time series will cointegrate only if some linear combination of them is a stationary time series. So if we form the linear combination by taking their difference (since the spread is what we try to trade), we get a series that is basically ETF - n. (I am not being mathematically strict with my notation.) But the resulting series, ETF - n, is by construction equal to all the other components left out of the basket, that is, the series e. And since e consists of just some ordinary stocks, there is no reason to believe that e is a stationary process. So ETF - n is not a stationary process either.
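
    A quick numerical sketch of this identity, under made-up assumptions: five simulated stocks following independent random walks, arbitrary index weights, and an "ETF" that tracks the weighted sum exactly. None of the numbers come from real data.

```python
import random

random.seed(0)

N = 5                                      # hypothetical number of components
T = 250                                    # hypothetical number of daily observations
weights = [0.30, 0.25, 0.20, 0.15, 0.10]   # made-up index weights

# Each stock price follows an independent random walk (an assumption).
prices = []
for _stock in range(N):
    p, path = 100.0, []
    for _day in range(T):
        p += random.gauss(0.0, 1.0)
        path.append(p)
    prices.append(path)

def weighted_sum(members, t):
    """Index-weighted value of the given member stocks at time t."""
    return sum(weights[i] * prices[i][t] for i in members)

included = [0, 1, 2]   # the subset "n"
left_out = [3, 4]      # the remainder "e"

etf = [weighted_sum(range(N), t) for t in range(T)]
basket = [weighted_sum(included, t) for t in range(T)]
e = [weighted_sum(left_out, t) for t in range(T)]

# With index weights, ETF - n is exactly the left-out series e.
spread = [etf[t] - basket[t] for t in range(T)]
max_gap = max(abs(spread[t] - e[t]) for t in range(T))
print(f"max |(ETF - n) - e| = {max_gap:.2e}")
```

    The gap is zero up to floating-point rounding, so with the index weights held fixed the spread can only be as stationary as the left-out stocks themselves.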

    Above is my line of thought. Since I am an amateur, I would be happy to learn whether my reasoning is right or wrong and how the paradox is resolved.

    Thanks,
     
    #43     Jun 4, 2009
    TraDaToR likes this.
  4. Hi Matt,

    Can you tell us which topic exactly talks about daily vs overnight ranges in gummy? As you know, his site has many topics and I couldn't find it by searching for "range" or "overnight."

    Thank you.
     
    #44     Jun 7, 2009
  5. The point is that there may exist a subset of n of the ETF's N component stocks that explains, let's say, 90% of the ETF's evolution, because the remaining N-n stocks are highly correlated with an appropriately weighted sum of those n stocks. In effect, the n stocks are enough to mimic the ETF.

    In this case, the arbitrage would work.
    The question is how to find the n stocks.
    One way is to try all possible subsets and pick the one that best fits the ETF (by best fit I mean the subset with the minimum regression error).

    The next step is to run a walk-forward backtest on historical data and check whether the cointegration persists.
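
    A rough sketch of that search, under loud assumptions: simulated prices driven by one common random-walk factor, subsets restricted to pairs so the least-squares fit can be solved by hand, and a single in-sample/out-of-sample split standing in for one step of a walk-forward test. The helper names `fit_pair` and `rms_error` are hypothetical.

```python
import itertools
import random

random.seed(2)
T, SPLIT = 600, 300                   # made-up sample length and in-sample cutoff
w = [0.30, 0.25, 0.20, 0.15, 0.10]    # made-up index weights
b = [1.0, 1.2, 0.8, 1.1, 0.9]         # made-up loadings on a common factor
noise_sd = [0.3, 1.0, 0.5, 2.0, 0.4]  # made-up stock-specific noise levels

level, factor = 100.0, []
for _ in range(T):
    level += random.gauss(0.0, 1.0)   # common factor is a random walk
    factor.append(level)

prices = [[bi * f + random.gauss(0.0, sd) for f in factor]
          for bi, sd in zip(b, noise_sd)]
etf = [sum(wi * p[t] for wi, p in zip(w, prices)) for t in range(T)]

def fit_pair(x0, x1, y):
    """Least-squares weights (no intercept) for y ~ a0*x0 + a1*x1."""
    s00 = sum(u * u for u in x0)
    s11 = sum(v * v for v in x1)
    s01 = sum(u * v for u, v in zip(x0, x1))
    s0y = sum(u * t for u, t in zip(x0, y))
    s1y = sum(v * t for v, t in zip(x1, y))
    det = s00 * s11 - s01 * s01
    return (s0y * s11 - s1y * s01) / det, (s1y * s00 - s0y * s01) / det

def rms_error(x0, x1, y, a0, a1):
    """Root-mean-square residual of y against the weighted pair."""
    resid = [t - a0 * u - a1 * v for u, v, t in zip(x0, x1, y)]
    return (sum(r * r for r in resid) / len(resid)) ** 0.5

results = []
for i, j in itertools.combinations(range(len(prices)), 2):
    xi, xj = prices[i], prices[j]
    # Fit on the first part of the sample only.
    a0, a1 = fit_pair(xi[:SPLIT], xj[:SPLIT], etf[:SPLIT])
    in_err = rms_error(xi[:SPLIT], xj[:SPLIT], etf[:SPLIT], a0, a1)
    # Score the same weights on data the fit never saw.
    out_err = rms_error(xi[SPLIT:], xj[SPLIT:], etf[SPLIT:], a0, a1)
    results.append((in_err, out_err, (i, j)))

results.sort()
best_in, best_out, best_pair = results[0]
print(f"best pair {best_pair}: in-sample RMS {best_in:.3f}, out-of-sample RMS {best_out:.3f}")
```

    Trying every subset of every size grows exponentially with N, so in practice the search has to be restricted somehow. Comparing the in-sample and out-of-sample errors is the part that tells you whether the fit persists.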
     
    #45     Jun 7, 2009
  6. ezbentley
    Look at the two articles DSA and DSA2 (Daily Stock Activity) plus DSO (Daily Stock Oscillation).
    Also plot your own graph of overnight change vs. daily change to see which stocks have a bias towards one or the other
     
    #46     Jun 8, 2009
  7. bearmf

    There is no paradox.

    You are assuming we take the same weights for the n stocks in the second case as in the first case (their proper weights in the index). In that case the difference will indeed be the weighted sum of the other stocks, which is non-stationary.

    But instead we recompute the weights of the n stocks so that they approximate the index well without the other stocks. These weights will be different, and hopefully we will get a stationary residual.
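
    A small simulated sketch of the difference between the two cases. Everything here is an assumption for illustration: four stocks sharing one random-walk factor plus stationary AR(1) noise, made-up index weights, and a two-stock basket so the least-squares refit can be solved by hand with Cramer's rule. Comparing standard deviations is only a crude indicator; a formal unit-root test such as augmented Dickey-Fuller would be the usual check and is not included.

```python
import random
import statistics

random.seed(1)
T = 500
w = [0.4, 0.3, 0.2, 0.1]   # made-up index weights
b = [1.0, 1.2, 0.8, 1.1]   # made-up loadings on a common factor

# Common factor: a random walk starting at 100 (non-stationary).
level, factor = 100.0, []
for _ in range(T):
    level += random.gauss(0.0, 1.0)
    factor.append(level)

def ar1_noise(n, phi=0.5, sigma=0.5):
    """Stationary AR(1) noise standing in for stock-specific moves."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + random.gauss(0.0, sigma)
        out.append(x)
    return out

prices = []
for bi in b:
    noise = ar1_noise(T)
    prices.append([bi * f + u for f, u in zip(factor, noise)])

index = [sum(w[i] * prices[i][t] for i in range(4)) for t in range(T)]
p0, p1 = prices[0], prices[1]   # the two basket stocks

# Case A: keep the index weights for the basket stocks.
spread_fixed = [index[t] - (w[0] * p0[t] + w[1] * p1[t]) for t in range(T)]

# Case B: refit the two weights by least squares (no intercept),
# solving the 2x2 normal equations with Cramer's rule.
s00 = sum(x * x for x in p0)
s11 = sum(y * y for y in p1)
s01 = sum(x * y for x, y in zip(p0, p1))
s0y = sum(x * z for x, z in zip(p0, index))
s1y = sum(y * z for y, z in zip(p1, index))
det = s00 * s11 - s01 * s01
a0 = (s0y * s11 - s1y * s01) / det
a1 = (s1y * s00 - s0y * s01) / det
spread_refit = [index[t] - (a0 * p0[t] + a1 * p1[t]) for t in range(T)]

sd_fixed = statistics.pstdev(spread_fixed)
sd_refit = statistics.pstdev(spread_refit)
print(f"index weights  {w[0]:.2f}, {w[1]:.2f} -> spread std {sd_fixed:.3f}")
print(f"refit weights  {a0:.2f}, {a1:.2f} -> spread std {sd_refit:.3f}")
```

    With the index weights, the spread still carries the left-out stocks' exposure to the random-walk factor. The refit weights scale the basket up to absorb that exposure, leaving mostly the stationary noise. This only works in the simulation because the stocks share a common factor by construction; real components may not.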
     
    #47     Jun 9, 2009
  8. academic

    Hi Matt,

    A few posters on ET have said that individuals have no chance competing in the high-frequency pair trading space. Do you disagree with that? Do you consider intra-day pair strategies to be in competition with the big arbitrageurs?

    You talked mostly about doing stat arb between stock baskets and ETFs, but do you also use futures?

    Thanks for your great posts.
     
    #48     Jun 22, 2009
  9. #49     Jun 29, 2009
  10. Are you sure you did not compute returns by applying the standard formula to pair prices? What is the annualized volatility of a typical pair which yields this kind of return? Is it cointegrated if it moves so much more than bonds? If so, then how many such high-yield trades can one make in real time without monitoring the entire stock universe (or even with a 100-ticker limit imposed by a broker such as IB)?

    How sensitive to lack of diversification is this strategy? I mean: does it work at all below 20-30 pairs run at any one time? Or else does the drawdown get beyond -50% for some unlucky portfolios? So is it at all appropriate to pitch it to indy traders who on average can only afford to monitor 10 and invest in 1 such basket?

    And why not disclose some performance measure appropriate for hedge funds, with kurtosis and/or drawdown included?

    Are retail costs really competitive in such liquid issues despite the disadvantages in all areas: the antiquated margining model, high commission rates, no access to dark liquidity pools or low-latency access providers? After all, the lack of retail edge would be very consistent with a profitable strategy being taught and published.
     
    #50     Jul 2, 2009