Hello Robert,

First, I'd like to point out a couple of issues with the code examples, and then I will discuss my experiences and findings with the original handcrafting correlation method.

In the new correlation article, two additional imports are needed to run the provided examples (at least I had to add them, using Python 3.7): from collections import namedtuple and from scipy.stats import norm. In the improved Sharpe ratio adjustment article there is a small bug in the code example: the function "mini_bootstrap_ratio_given_SR_diff" contains the line "dist_points = np.arange(p_step, stop=(1-pstep)+0.000001, step=p_step)", where "pstep" should be "p_step".
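Concretely, these are the only changes I had to make to get the examples running:

from collections import namedtuple
from scipy.stats import norm

# corrected line inside mini_bootstrap_ratio_given_SR_diff ("pstep" -> "p_step"):
dist_points = np.arange(p_step, stop=(1 - p_step) + 0.000001, step=p_step)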
I wanted to implement the handcrafting candidate-matrix matching method with interpolation. I ran the code to test it with the correlation set [0 0.4 0], which is close to the perfect candidate match [0 0.5 0], and got the following results:

>>> cmatrix = np.array([[1.,0.4,0.],[0.4,1.,0.],[0.,0.,1.]])
>>> get_weights_using_candidate_method(cmatrix)
[0.2791967258738014, 0.3936510514477296, 0.32715222267846916]

I immediately spotted an issue: the weights for A and C must be symmetrical in this case, but they aren't. I debugged it to see why this was happening and found that the symmetry is broken because the correlations are matched against only one, asymmetrical, permutation of each candidate. For instance, take the candidate [0.5 0 0.5]. Our case of [0 0.4 0] is matched against the [0.5 0.5 0] permutation, which doesn't produce symmetrical weights, but it is not matched against [0 0.5 0.5], which lies at the same distance from [0 0.4 0] and would counter the introduced asymmetry. In other words, only one match per candidate matrix is done. This problem can be solved by matching against all permutations of all candidates: bluntly put, 6 permutations for each of the 10 candidates, so 60 weight vectors to average instead of 10. This modification solves the symmetry issue (a rough sketch of the permutation matching is at the end of this post).

The next, even bigger, issue I encountered is with the actual interpolation method itself. It produces quite pathological results. For example, take the correlation matrix [0 x 0], run x from 0 to 0.99, execute the interpolation and plot the resulting weights. We get the following:

This doesn't look good. We can see odd convergences at correlations [0 0.5 0] and [0 0.9 0]. This happens because the candidate weights are weighted by 1/distance, so at an exact candidate match the weight for that match dwarfs the rest (if the distance is 0, then 1/distance is infinity).

I tried the same plot using the improved method that matches all permutations of all candidate matrices, to fix the issue of the weird asymmetric matches:

It looks somewhat better. The weights are symmetric (the red and blue lines are identical, so only red is visible). But it is still unsatisfactory, because the 1/distance weighting of the candidate weights messes things up. In the next post I'll provide a solution I came up with.
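As promised, here is a rough sketch of matching against all permutations of all candidates (my own function and variable names, not code from the article; the 10 candidate triples and their weight vectors are assumed to be supplied by the caller):

import numpy as np
from itertools import permutations

def weights_matching_all_permutations(corr_triple, candidate_triples, candidate_weights):
    # corr_triple: estimated correlations [AB, AC, BC]
    # candidate_triples / candidate_weights: the 10 handcrafting candidates and their weights
    matched_weights, distances = [], []
    for cand, wts in zip(candidate_triples, candidate_weights):
        # rebuild the candidate as a matrix so that permuting the assets permutes
        # the correlation triple and the weight vector consistently
        c = np.array([[1.0, cand[0], cand[1]],
                      [cand[0], 1.0, cand[2]],
                      [cand[1], cand[2], 1.0]])
        w = np.array(wts)
        for perm in permutations(range(3)):
            p = list(perm)
            perm_triple = [c[p[0], p[1]], c[p[0], p[2]], c[p[1], p[2]]]
            distance = np.linalg.norm(np.array(perm_triple) - np.array(corr_triple))
            matched_weights.append(w[p])
            distances.append(max(distance, 1e-6))  # exact matches would otherwise give 1/0
    # the original 1/distance averaging, but over 60 matches (10 candidates x 6 permutations)
    relative = 1.0 / np.array(distances)
    avg = np.average(np.stack(matched_weights), axis=0, weights=relative)
    return avg / avg.sum()

By construction this gives the same weight to A and C for an input like [0 0.4 0], but it still suffers from the 1/distance problem described above.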
Since there was still a lot of time before the new November article was published, I started thinking about how to solve the interpolation problem. I could not reproduce the method used to produce the candidate matrices and their weights (you gave a clue that it was shrinkage, but the shrinkage factor appeared to be all over the place, without any codeable rhyme or reason). After many experiments I stumbled upon a quite simple method, based on a kernel smoothing idea. It consists of 3 steps (a rough sketch is at the end of this post):

1. Round the correlations as per the handcrafting method: if c <= 0.25, it becomes 0; if c > 0.7, it becomes 0.9; otherwise it becomes 0.5.
2. Match the rounded correlations to a candidate matrix (it will obviously be an exact match) to get the weights.
3. Do steps 1 and 2 for the neighbouring correlation values and average the resulting weights.

That's it. For example, say we have correlations [a b c]. We iterate through neighbouring values (using some sensible step size, like 0.05):

x in [a - 0.2, a + 0.2]
y in [b - 0.2, b + 0.2]
z in [c - 0.2, c + 0.2]

and do steps 1 and 2 for each [x y z].

The result looks much more sensible than the (1 / Euclidean distance) method from the previous post. This is the same experiment of interpolating [0 x 0], with x running from 0 to 0.99:

This basic method is akin to a very naive kernel smoother (taking a simple average of all neighbouring points up to a distance of 0.2). I tried a more "academic" approach, using a Gaussian kernel, but the results were almost identical, so it's not worth bothering with that additional complication.
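A minimal sketch of the idea (my own names, not code from the article; exact_match_weights is a stand-in for whatever looks up the handcrafting weights for a rounded, exactly-matched correlation triple):

import numpy as np

def round_corr(c):
    # rounding as per the handcrafting method
    if c <= 0.25:
        return 0.0
    if c > 0.7:
        return 0.9
    return 0.5

def smoothed_weights(corr_triple, exact_match_weights, max_dist=0.2, step=0.05):
    # average the exact-match weights over a grid of neighbouring correlation triples
    offsets = np.arange(-max_dist, max_dist + 1e-9, step)
    all_weights = []
    for dx in offsets:
        for dy in offsets:
            for dz in offsets:
                rounded = tuple(round_corr(c + d)
                                for c, d in zip(corr_triple, (dx, dy, dz)))
                all_weights.append(exact_match_weights(rounded))
    return np.mean(all_weights, axis=0)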
By the way, one small tip: use a max distance of 0.19 instead of 0.20. Then the method reproduces the original handcrafting table exactly. This is simply down to the rounding function: with a max distance of 0.2, a correlation of exactly 0.5 has a "neighbour" at 0.5 + 0.2 = 0.7, which ends up rounded towards the 0.9 bucket and drags the averaged weights away from the table. So I use a "neighbour" max distance of 0.19 with a step size of 0.0475, instead of the 0.2/0.05 setting (the weights chart in the previous comment was actually produced with 0.19/0.0475).

This method has only 2 parameters: the max distance and the step size. The step size does not influence the outcome much (do we really care if the weights differ by 0.002?), just the execution speed. Varying the max distance (as long as it's a sensible value) does not bring about any massive changes either.

I was also looking at your method recently. It seems to have 3 parameters:

1. The correlation estimation stdev multiplier ("4")
2. The estimation lookback window length (100 data points, or ~2 years of weekly data)
3. The minimum weight of 0.09

I don't have any objections against #1 and #3, but #2 did make me feel a bit suspicious (only 2 years of data?). When I get a few free moments later today, I'll post some of my findings with regard to this issue.
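P.S. With the sketch from my previous post, the two settings are just different arguments (exact_match_weights is the same hypothetical lookup as before):

corrs = (0.0, 0.5, 0.0)   # an exactly-rounded input
w_wide  = smoothed_weights(corrs, exact_match_weights, max_dist=0.20, step=0.05)
w_tight = smoothed_weights(corrs, exact_match_weights, max_dist=0.19, step=0.0475)  # matches the handcrafting table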
I'd be suspicious as well. Having thought about it some more: predictability was still going up with 10 years of data (not for the underlying instruments, but for both forecast and instrument weights). The only real reason to use a shorter window was that I was uncomfortable with the zero weights produced by the longer data periods. But I already 'solved' that with the minimum weight. In fact, the data is telling me that with 10 years of data you can be darn sure that the correct portfolio does indeed have a zero weight in some cases. It strikes me as better to use the explicit hack, rather than achieving it through the back door and pretending it's the correct thing to do.

Note that were we to consider the uncertainty of Sharpe ratios and correlations jointly, that would justify having at least some weight even when an asset is highly correlated, as there would be some outcomes where having a zero weight would be suboptimal. This could be shown by doing a double pass: stepping through SR values and correlation values together, and optimising across them all. Whilst this is a neat idea, it would slow things down a lot and we'd lose the intuition of the two step process (here are the weights for this correlation matrix; now let's adjust them for SR). Something for another blog post.

So I'd revert to using all the available data for correlations, or maybe 10 years; it doesn't make much difference. I've changed the post accordingly.

GAT
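Very roughly, the double pass would be something like this (just an untested sketch to show the shape of it; all the names are made up, the perturbation of the correlation matrix is crude, and it ignores things like keeping it positive definite):

import numpy as np
from scipy.optimize import minimize

def max_sr_weights(sr, corr):
    # long only, fully invested weights that maximise portfolio SR, unit volatilities assumed
    n = len(sr)
    def neg_sr(w):
        return -w.dot(sr) / np.sqrt(w.dot(corr).dot(w))
    res = minimize(neg_sr, np.repeat(1.0 / n, n), bounds=[(0.0, 1.0)] * n,
                   constraints=[{'type': 'eq', 'fun': lambda w: w.sum() - 1.0}])
    return res.x

def joint_uncertainty_weights(sr_est, corr_est, sr_stdev, corr_stdev, n_points=5):
    # step through plausible SR and correlation values together, optimise for each
    # combination, and average the optimised weights across all of them
    grid = np.linspace(-1.0, 1.0, n_points)   # in units of estimation standard deviation
    all_weights = []
    for s in grid:
        for c in grid:
            sr_scenario = sr_est + s * sr_stdev
            corr_scenario = np.clip(corr_est + c * corr_stdev, -0.99, 0.99)
            np.fill_diagonal(corr_scenario, 1.0)
            all_weights.append(max_sr_weights(sr_scenario, corr_scenario))
    return np.mean(all_weights, axis=0)

The point is just that the SR and correlation scenarios are sampled together and the optimisation runs across every combination, rather than the current two step process.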
With regard to 2 year rolling windows for correlation estimation, I first looked at weighting the rules. I looked at rolling 2 year correlations between different rules and their variations (ewmac, normalized momentum, breakout, it doesn't matter which). I found that the 2 year correlations swing from place to place, but seemingly around their long-term means; of course, the closer the mean is to 1, the narrower the swings.

Then I thought I'd simply look at the numbers. Here's a report I did on 37 futures markets for the period 1990-2020, using weekly data:

It simply compares what happens in rolling 2 year periods with the plain old correlation using all the data (30 years in this example). The interesting parts are "Min - Full Data C[orrelation]" and "Max - Full Data C[orrelation]", and the "Largest deviation", which is just the minimum of the former and the maximum of the latter. It doesn't look significant, right? Well, I guess we could expect the rule correlations to remain stable.

I haven't looked into instrument weights yet. I have a hunch that, unlike rules, there may be some secular trends in the correlations between some instruments, so a rolling correlation estimate may be useful there. However, this is just a guess for now (at least for me).
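In case anyone wants to reproduce this kind of comparison, something along these lines should do it (my own reconstruction of the idea, not the code behind the report above; it assumes a DataFrame of weekly returns with one column per rule or instrument):

import pandas as pd

def rolling_vs_full_correlation(weekly_returns, window=104):
    # compare rolling ~2 year (104 week) pairwise correlations with the full-sample correlation
    full = weekly_returns.corr()
    rolling = weekly_returns.rolling(window).corr()   # MultiIndex: (date, column)
    rows = []
    cols = weekly_returns.columns
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            series = rolling.xs(a, level=1)[b].dropna()
            rows.append(dict(pair=f"{a}/{b}",
                             full=full.loc[a, b],
                             min_minus_full=series.min() - full.loc[a, b],
                             max_minus_full=series.max() - full.loc[a, b]))
    report = pd.DataFrame(rows)
    print("Largest deviations:",
          report.min_minus_full.min(), report.max_minus_full.max())
    return report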
Optimal stopping is a great idea. Funny, this post and one other are the only mentions of "optimal stopping" on elitetrader. See https://www.diva-portal.org/smash/get/diva2:1066467/FULLTEXT01.pdf
Hello Robert,

I've been looking at one particular example of correlations between 3 assets: 0.9976, 0.9417, 0.9333 (AB, AC, BC). A and B are almost 100% correlated, so it makes sense that the largest weight should go to C. Additionally, the correlations are quite high and based on a lot of data, therefore the uncertainty is relatively low.

According to the original handcrafting method, the correlations are rounded to [0.9 0.9 0.9] and the optimal weights are [0.333 0.333 0.333]. However, with the new method we get the following (500 data points, ~10 years):

>>> apply_min_weight(optimised_weights_given_correlation_uncertainty(three_asset_corr_matrix(labelledCorrelations(0.9976, 0.9417, 0.9333)), 500))
array([0.21472084, 0.28415071, 0.50112846])

This returns quite different weights, [0.21 0.28 0.50], from the original equal-weights result. Which one would you prefer, and why?

P.S. If we cluster A, B and C hierarchically into two groups, we get one group [A B] and the other [C], and then the weights [0.25 0.25 0.5]. This shows how sensitive the method is to the clustering outcome. (I have encountered many more and better examples in my research, where a slight difference in clustering causes not so slight differences in weights.)
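Regarding the P.S., this is roughly what I mean by the clustering route (a quick sketch, the function name is mine): cluster on 1 - correlation, give each of the two groups half the portfolio, and split equally within each group.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def two_group_weights(corr):
    dist = 1.0 - corr                 # correlation distance, zero diagonal
    clusters = fcluster(linkage(squareform(dist), method="average"), 2, criterion="maxclust")
    weights = np.zeros(len(corr))
    for c in np.unique(clusters):
        members = clusters == c
        weights[members] = 0.5 / members.sum()   # half the portfolio per group, split equally
    return weights

corr = np.array([[1.0, 0.9976, 0.9417],
                 [0.9976, 1.0, 0.9333],
                 [0.9417, 0.9333, 1.0]])
print(two_group_weights(corr))   # [0.25, 0.25, 0.5]: A and B end up in one group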
My heart says equal weights. My head says the new method is correct. My gut, which is very pragmatic, says it won't matter very much either way. What are the two assets, out of curiosity? GAT