Hi everyone, I'm hoping to get some clarification on how to use PCA to hedge a bond futures butterfly against direction and slope, leaving curvature as the part you are trying to trade. I'm looking at a 2yr/5yr/10yr futures butterfly. One way to construct the fly is simply 2yr yield - 2 * 5yr yield + 10yr yield. But then you have exposure to direction and slope, which I would like to hedge out by changing the ratio of the legs. This is the most relevant article I can find online: https://quant.stackexchange.com/questions/25342/calculating-pca-hedge-ratio-for-3-leg-spread I'm using the Analyse-It add-in in Excel and believe it is working correctly, but when I try to reproduce the example, I don't get the same PC eigenvectors that Chris Taylor does, which is confusing. Additionally, when I try a real-world example and use a year's worth of daily changes in the yields (as calculated by CQG) for the three tenors, the PC3 weightings I get look wrong (10yr factor larger than the 2yr factor), and in fact the fly becomes more unstable. So I'm definitely going wrong somewhere. Is it correct to convert the futures price to yield, and then use the daily changes in yield? Are the PCA factors then only applicable to the yield-converted contracts? How are they then switched back to the underlying futures price? Any help / comments very much appreciated. Euribored.

Sweet. But, it's at the edge of my own experience... (Did this with SPX options -- very nice results in a horrifically low-vol environment, where *any* bump was a good bump.) Which is where I would direct you. So, "No help here." Because of all the repetition, it's probably not all that much more work to write your own analysis sections. I've never used Analyse-It, but right now (and in all ignorance), I see you working a lot more to get Analyse-It to fly than you would to build your own section. But either way, I would retreat a bit, get the smallest template of analysis to work properly, and then scale it up to your full data set. A missing exponent, an erroneous time vector, and the whole thing is baloney. And so (in a perfect world), I would shrink it to nothing, work out by hand results that I know to be sound, replicate those results (if you still wish it) in Analyse-It, and then incrementally scale up, bit by bit. For me, that's the only way I can trust the results into which I'm going to be pouring capital. This one, though, doesn't surprise me so much. Before convicting your results as inaccurate, paper-trade such a butterfly and see if it (both as a whole entity and in its component parts) behaves as the numbers suggest. We have had a *crazy* market over these last few years, when prior times (and rates) are used as comparison. What's behind us has changed, what's in front of us is *known* to be changing -- why should it be any surprise that we have 'non-robust', unreliable results in front of us? Yes -- it may be your spreadsheet. AND, it may just be the market, with some pricing in danger, and some pricing in placid growth. Not so much help, then -- but a path of what I'd do (having faced something similar in SPX land): go back to method basics and scale up, and come to the numbers-in-front-of-you realization of just how crazy things are/have-been/will-be.

To answer the question of how to get ratios that leave you flat in one or more of the principal components, imagine the following. Let A be a 3x3 matrix composed of 3 column vectors, each holding the factor exposures to component 1, 2, 3 respectively. Consider A^T, where T denotes the transpose, and a column vector p where each entry is your dollar exposure to each future (ensure the ordering aligns with your component matrix). Then A^T * p gives a column vector where each entry is your dollar exposure to each factor. You want to choose a p (which is equivalent to choosing ratios) such that this vector fits your desired exposure profile (in this case you'd like to hedge out the direction and slope components). This should be a fairly straightforward algebra exercise from there.
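
A minimal sketch of that algebra, assuming A is built from the eigenvectors of the covariance matrix of the 10-row sample series quoted elsewhere in this thread (ordered 2y, 5y, 10y). The variable names and the target exposure profile [0, 0, 1] are illustrative choices, not anything from the original example. Note the resulting weights are in the units of the input series (yield changes here), so they are risk weights rather than contract counts.

```python
import numpy as np

# Sample changes for the 2y, 5y, 10y legs (the small data set quoted in this thread).
changes = np.array([
    [ 0.0143,  0.0910,  0.1451],
    [ 0.1791,  0.3505,  0.4588],
    [ 0.0572,  0.1358,  0.0120],
    [ 0.0357,  0.1809,  0.2884],
    [-0.0571, -0.1096, -0.0719],
    [ 0.0286,  0.0710,  0.1319],
    [ 0.0429,  0.1806,  0.2754],
    [-0.0357, -0.0579, -0.1075],
    [ 0.0714,  0.2513,  0.4304],
    [-0.0214, -0.0771, -0.1667],
])

# A: columns are PC1, PC2, PC3 (largest variance first).
cov = np.cov(changes, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]
A = eigvecs[:, order]

# Desired factor exposures: flat PC1 (direction) and PC2 (slope), unit PC3.
target = np.array([0.0, 0.0, 1.0])

# Solve A^T p = target for the leg exposures p.
p = np.linalg.solve(A.T, target)

# Rescale so the body (5y) leg is -2, the usual way a fly is quoted.
p_scaled = p * (-2.0 / p[1])

exposures = A.T @ p_scaled               # first two entries should be ~0
print("leg weights (2y, 5y, 10y):", p_scaled)
print("factor exposures (PC1, PC2, PC3):", exposures)
```

Because the eigenvectors are orthonormal, with exactly three legs and three components the solution is just the third eigenvector up to scale; the linear solve only becomes interesting when the target profile or the number of instruments changes.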

Looks like he is getting the PCA of the returns themselves instead of the returns correlation, which is what I have always done. Code:
from sklearn.decomposition import PCA
import pandas as pd
# Sample changes for the three tenors
returns = pd.DataFrame([
    [ 0.0143,  0.0910,  0.1451],
    [ 0.1791,  0.3505,  0.4588],
    [ 0.0572,  0.1358,  0.0120],
    [ 0.0357,  0.1809,  0.2884],
    [-0.0571, -0.1096, -0.0719],
    [ 0.0286,  0.0710,  0.1319],
    [ 0.0429,  0.1806,  0.2754],
    [-0.0357, -0.0579, -0.1075],
    [ 0.0714,  0.2513,  0.4304],
    [-0.0214, -0.0771, -0.1667],
], columns=['2y', '5y', '10y'])
# Correlation-based variant (what I normally do), left commented out:
# pca_fit = PCA(n_components = len(returns.columns)).fit(returns.corr())
# expvar = pca_fit.explained_variance_ratio_.cumsum()
# loadings = pca_fit.components_
# PCA on the returns themselves:
pca_fit = PCA(n_components=len(returns.columns)).fit(returns)
expvar = pca_fit.explained_variance_ratio_.cumsum()
loadings = pca_fit.components_
-----
loadings
array([[ 0.21689632,  0.53916968,  0.81378869],
       [ 0.55112255,  0.62044122, -0.55795755],
       [ 0.80574184, -0.56951624,  0.16257715]])
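
For comparing the two conventions side by side, here is a hedged sketch that eigen-decomposes the covariance and the correlation matrix of the same sample directly with NumPy. One caveat, as far as I can tell: passing `returns.corr()` to `PCA.fit` treats the 3x3 correlation matrix as if it were a data set, which is not the same thing as taking the eigenvectors of the correlation matrix, so this sketch does the latter. The helper name `eig_loadings` is made up for illustration.

```python
import numpy as np
import pandas as pd

returns = pd.DataFrame([
    [ 0.0143,  0.0910,  0.1451],
    [ 0.1791,  0.3505,  0.4588],
    [ 0.0572,  0.1358,  0.0120],
    [ 0.0357,  0.1809,  0.2884],
    [-0.0571, -0.1096, -0.0719],
    [ 0.0286,  0.0710,  0.1319],
    [ 0.0429,  0.1806,  0.2754],
    [-0.0357, -0.0579, -0.1075],
    [ 0.0714,  0.2513,  0.4304],
    [-0.0214, -0.0771, -0.1667],
], columns=["2y", "5y", "10y"])


def eig_loadings(matrix):
    """Eigenvalues (descending) and eigenvectors as rows of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(matrix)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order].T


# Covariance PCA: components in the original units of the series.
cov_vals, cov_loadings = eig_loadings(returns.cov().to_numpy())

# Correlation PCA: equivalent to standardising each series to unit variance first.
corr_vals, corr_loadings = eig_loadings(returns.corr().to_numpy())

print("covariance loadings:\n", cov_loadings)
print("correlation loadings:\n", corr_loadings)
```

The two sets of loadings generally differ when the legs have different volatilities, which could be one reason a reproduction does not match someone else's eigenvectors.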

He has a choice of either doing SVD on the return series or getting eigenvectors/values of the covariance matrix. The result would be similar (identical up to sign if the series is demeaned first). Now, there is really no point in doing PCA only on the series you are hedging (i.e. 2s/5s/10s) - it's a dimensionality reduction technique, right?
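
A quick sketch of that equivalence on the sample data from this thread, assuming the series is demeaned before the SVD (variable names are illustrative):

```python
import numpy as np

changes = np.array([
    [ 0.0143,  0.0910,  0.1451],
    [ 0.1791,  0.3505,  0.4588],
    [ 0.0572,  0.1358,  0.0120],
    [ 0.0357,  0.1809,  0.2884],
    [-0.0571, -0.1096, -0.0719],
    [ 0.0286,  0.0710,  0.1319],
    [ 0.0429,  0.1806,  0.2754],
    [-0.0357, -0.0579, -0.1075],
    [ 0.0714,  0.2513,  0.4304],
    [-0.0214, -0.0771, -0.1667],
])

# Route 1: SVD of the demeaned series. Rows of vt are the components.
centered = changes - changes.mean(axis=0)
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
svd_loadings = vt
svd_variances = singular_values**2 / (len(changes) - 1)

# Route 2: eigen-decomposition of the sample covariance matrix.
cov = np.cov(changes, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # ascending order
order = np.argsort(eigvals)[::-1]
eig_loadings = eigvecs[:, order].T
eig_variances = eigvals[order]

# The loadings agree up to the sign of each component.
print(np.allclose(np.abs(svd_loadings), np.abs(eig_loadings), atol=1e-6))
print(np.allclose(svd_variances, eig_variances))
```

The sign of each component is arbitrary in both routes, so comparing absolute values (or fixing a sign convention) is needed before concluding that two implementations disagree.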

Not sure I follow you -- but for me, I wanted to see sensitivities -- I wanted to see the market's expectation(s) so that (in this case) I wouldn't be trapped in one leg or another. Extreme peaks were a giveaway that there was an event to which I needed to pay attention.

My point is that if you are trying to hedge 2s/5s/10s and that's the only data you are using, you don't need PCA. Instead, you can do something simpler, like a regression of the body on the two wings (b ~ x * w1 + y * w2). Where PCA really shines is if you have a lot of data and want to reduce the variance to a few key components. E.g. if you are looking at rates, you can use every annual rate from 1 to 30 and then see which components explain the majority of the variance. Then you hedge to those factors only and enjoy the residual.
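
A rough sketch of the regression alternative on the sample data from this thread, assuming ordinary least squares with an intercept (the intercept and the variable names are my own choices; the post only specifies b ~ x * w1 + y * w2):

```python
import numpy as np

changes = np.array([
    [ 0.0143,  0.0910,  0.1451],
    [ 0.1791,  0.3505,  0.4588],
    [ 0.0572,  0.1358,  0.0120],
    [ 0.0357,  0.1809,  0.2884],
    [-0.0571, -0.1096, -0.0719],
    [ 0.0286,  0.0710,  0.1319],
    [ 0.0429,  0.1806,  0.2754],
    [-0.0357, -0.0579, -0.1075],
    [ 0.0714,  0.2513,  0.4304],
    [-0.0214, -0.0771, -0.1667],
])

body = changes[:, 1]            # 5y
wings = changes[:, [0, 2]]      # 2y and 10y

# Design matrix with an intercept column: b ~ c + x * w1 + y * w2
X = np.column_stack([np.ones(len(body)), wings])
coef, *_ = np.linalg.lstsq(X, body, rcond=None)
intercept, x, y = coef

# The residual is what is left of the body after hedging with the wings.
residual = body - X @ coef

print("wing hedge ratios (2y, 10y):", x, y)
print("body variance:", body.var(ddof=1))
print("residual variance:", residual.var(ddof=1))
```

The fitted x and y play the role of the wing ratios per unit of body, and the residual series is the part of the body move the wings do not explain in-sample.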