Let's say I wanted to test two commodities to see how much they correlated over time. What's the starting point? I mean I could take today's prices, or any day's prices as long as they're the same day and say there's no divergance. I'll take that day's difference in prices as 0% divergence in price. Is there a more correct way to do this? It seems as if I start the analysis on an arbitrary day I may skew the whole study. For example if I started the analysis today and the two commodities have moved away from each other significantly in the past few time periods, wouldn't the results be totally screwed up. I may be thinking about this too much though. A historical test of price divergence may hold true no matter what day the test starts, right? This brings me to another question: What to do about scaling the y-axis for price difference? ex. comm A trades @ 2500 and comm B trades at 600. Is it best to mult 600 and all future prices of comm. B to match comm. A prices?

Why not take the log of the prices as a starting point to your analysis? Solves your scaling issue. As for your correlation question, you have to pick a time frame that's relevant to the time frame within which you are trading.

I'm not really following what you're saying. Sorry. Let's say ND is at 2036.5 gold is at 651.30. What's the next step?

I've found the most effective method is to take historical one day (/ one hour / five minute...what ever duration you like) % price change of the insturments you're comparing and enter the historic prices in excel. Then run the 'CORREL' function.

So your suggestion is to line up each days percent change in two columns in excel. Then run CORREL function?

He suggests taking the logarithm of the price, because the lognormal distribution is bounded from below by zero, just like asset prices (they can't have a negative price). (Someone correct me if I remember wrong) Take the log of each price, find the Covariace between asset A and B and their STD. Correlation = Cov(A,B)/ (STD(A)*STD(B)). -Neo

first, scaling is not relevant when using correlation, since correlation subtracts the mean of every variable and divides by the std. second, i don't understand the log suggestion somebody gave you. either you're interested in the variables correlation or in their logs correlation. the log-normal distribution isn't relevant IMO. third, the first value issue you've mentioned is not relevant if your data is long enough. (you can try and see) fourth, i would suggest calculating a "sliding correlation": decide the time period which is relevant (according to your system e.g. 20 days or so) and calculate the correlation of each 20 values. meaning: correlation(var1(i:i+19),var2(i:i+19)) and than correlation(var1(i:i+19),var2(i:i+19)) etc. this way you could find areas in time where the correlation had skew and maybe see the reasons why this happed. it's easy to do even with excel. hope this helps

Neo dude is correct. The formula he refers to is the same as the function 'Correl' in Excel. the function just saves the bother of typing it out in the worksheet. it's a good methodology and I've written pairs trading models (for equities) using this.

Alright so I got correlation down. Gives you values between 1 and -1. So I'm assuming the higher the hist. correl. and higher the price divergence . . . the better chances the gap should close?