Reversion To the Mean (RTM) Intraday Strategies

MAESTRO · Mar 28, 2009

Quote from knocks420:

I'll take a guess that MAESTRO is referring to exponential and weighted moving averages. MA's are used to eliminate 'noise' in a time series and these act as low-pass filters in EE parlance.

EDIT:

Upon further thought I should include tests for Cointegration.
More...

No, that's not it!

dtrader98 · Mar 28, 2009

Quote from MAESTRO:

Right on the money, dtrader98! cubic spline interpolations is the family of curves that GUARANTEE Gaussian distribution around each point! Congratulations! Very good! Now, the challenge is to design a practical algorithm to exploit this phenomenon and you have your Holy Grail!
More...

I think you've dropped enough clues in other areas (albeit, unintentionally) to figure out what you are inferring.

Always look forward to finding a thread with some of your sage insights.

knocks420 · Mar 28, 2009

Quote from MAESTRO:

Right on the money, dtrader98! cubic spline interpolations is the family of curves that GUARANTEE Gaussian distribution around each point! Congratulations! Very good! Now, the challenge is to design a practical algorithm to exploit this phenomenon and you have your Holy Grail!
More...

Guaranteed for the past but how does it respond to out-of-sample data? If its closer fit does this result in less robustness?

ASusilovic · Mar 28, 2009

Quote from MAESTRO:

Right on the money, dtrader98! cubic spline interpolations is the family of curves that GUARANTEE Gaussian distribution around each point! Congratulations! Very good! Now, the challenge is to design a practical algorithm to exploit this phenomenon and you have your Holy Grail!
More...

Cubic spline interpolations ? OMG, that goes back to higher mathematics 15 years ago !

Quadratic spline interpolation

The quadratic spline can be constructed as

S_i(x) = y_i + z_i(x-x_i) + \frac{z_{i+1}-z_i}{2(x_{i+1}-x_i)}(x-x_i)^2

The coefficients can be found by choosing a z0 and then using the recurrence relation:

z_{i+1} = -z_i + 2 \frac{y_{i+1}-y_i}{x_{i+1}-x_i}

The coefficients z above are basically a running derivative approximation. Since only two points are used to calculate the next iteration's curve (instead of three), this method is susceptible to severe oscillation effects when signal change is quickly followed by a steady signal. Without some sort of dampening, these oscillation effects make the above method a poor choice.

A more stable alternative to this method is a cubic spline passing through the middle points of the original data instead. To calculate Si(x) first select a j such that xj−1 < x < xj+1 and that |xj−1 − x| + |xj+1 − x| is minimized. Then, find a, b, c and d of S(x) = ax3 + bx2 + cx + d such that:

\begin{align} S \left( \frac{x_{j-1} + x_j}{2} \right) = \frac{y_{j-1} + y_j}{2}\\ S'\left( \frac{x_{j-1} + x_j}{2} \right) = \frac{y_j-y_{j-1}}{x_j-x_{j-1}}\\ S \left( \frac{x_j + x_{j+1}}{2} \right) = \frac{y_j + y_{j+1}}{2}\\ S'\left( \frac{x_j + x_{j+1}}{2} \right) = \frac{y_j-y_{j+1}}{x_{j+1}-x_j} \end{align}

In effect, this yields a set of curves that are continuous in the first degree, highly stable (i.e. aren't subject to oscillation effects) and does not require huge matrix solving.

[edit] Cubic spline interpolation

For a data set {xi} of n+1 points, we can construct a cubic spline with n piecewise cubic polynomials between the data points. If

S(x)=\left\{\begin{matrix} S_0(x),\ x\in[x_0,x_1]\\ S_1(x),\ x\in[x_1,x_2]\\ \cdots \\ S_{n-1}(x),\ x\in[x_{n-1},x_n]\end{matrix}\right.

represents the spline function interpolating the function f, we require:

* the interpolating property, S(xi)=f(xi)
* the splines to join up, Si-1(xi) = Si(xi), i=1,...,n-1
* twice continuous differentiable, S'i-1(xi) = S'i(xi) and S''i-1(xi) = S''i(xi), i=1,...,n-1.

For the n cubic polynomials comprising S, this means to determine these polynomials, we need to determine 4n conditions (since for one polynomial of degree three, there are four conditions on choosing the curve). However, the interpolating property gives us n + 1 conditions, and the conditions on the interior data points give us n + 1 − 2 = n − 1 data points each, summing to 4n − 2 conditions. We require two other conditions, and these can be imposed upon the problem for different reasons.

One such choice results in the so-called clamped cubic spline, with

S'(x_0) = u \,\!
S'(x_k) = v \,\!

for given values u and v.

Alternately, we can set

S''(x_0) = S''(x_n) = 0 \,\!.

resulting in the natural cubic spline. The natural cubic spline is approximately the same curve as created by the spline device.

Amongst all twice continuously differentiable functions, clamped and natural cubic splines yield the least oscillation about the function f which is interpolated.

Another choice gives the periodic cubic spline if

S(x_0) = S(x_n) \,\!
S'(x_0) = S'(x_n) \,\!
S''(x_0) = S''(x_n) \,\!

Another choice gives the complete cubic spline if

S(x_0) = S(x_n) \,\!
S'(x_0) = S'(x_n) \,\!
S''(x_0) = f'(x_0),\quad S''(x_n)=f'(x_n) \,\!

[edit] Minimality of the cubic splines

The cubic spline has a very important variational interpretation, in fact it is the function that minimizes the functional

J(f)=\int_a^b |f''(x)|^2 dx,

over the functions in the Sobolev space H2([a;b]).

The functional J contains an approximation of the total curvature \left|\frac{f''(x)}{(1+f'(x)^2)^{\frac{3}{2}}}\right| of the graph of f(x) and then the spline is the approximation of f(x) with minimal curvature. It is also aesthetically pleasing.

Since the total energy of an elastic strip is proportional to the curvature, the spline is the configuration of minimal energy of an elastic strip constrained to n points. A spline is also an instrument to design based on an elastic strip.

[edit] Interpolation using natural cubic spline

It can be defined as

S_i(x) = \frac{z_{i+1} (x-x_i)^3 + z_i (x_{i+1}-x)^3}{6h_i} + \left(\frac{y_{i+1}}{h_i} - \frac{h_i}{6} z_{i+1}\right)(x-x_i) + \left(\frac{y_{i}}{h_i} - \frac{h_i}{6} z_i\right) (x_{i+1}-x)

and

h_i = x_{i+1} - x_i \,\!.

The coefficients can be found by solving this system of equations:

\begin{align} z_0 &= 0 \\ h_{i-1} z_{i-1} + 2(h_{i-1} + h_i) z_i + h_i z_{i+1} &= 6 \left( \frac{y_{i+1}-y_i}{h_i} - \frac{y_i-y_{i-1}}{h_{i-1}} \right) \qquad \mbox{ , } i=1,\ldots n -1\\ z_n &= 0 \end{align}

MAESTRO · Mar 28, 2009

Quote from ASusilovic:

The cubic spline has a very important variational interpretation, in fact it is the function that minimizes the functional

J(f)=\int_a^b |f''(x)|^2 dx,

over the functions in the Sobolev space H2([a;b]).....
More...

Indeed! That is what it makes so perfect for the RTM methods! More so, for different markets one can use different spline rigidness resulting in different parameters of the price distribution around any predicted point. In fact, there is an optimization procedure (could be reduced to a linear programming task in Sobolev space) that could optimize spline coefficients to match desired dispersion and density of the target distribution making it stable and very much predictable!

Kevin Schmit · Mar 28, 2009

Quote from ASusilovic:

\begin{align} z_0 &= 0 \\ h_{i-1} z_{i-1} + 2(h_{i-1} + h_i) z_i + h_i z_{i+1} &= 6 \left( \frac{y_{i+1}-y_i}{h_i} - \frac{y_i-y_{i-1}}{h_{i-1}} \right) \qquad \mbox{ , } i=1,\ldots n -1\\ z_n &= 0 \end{align}
More...

Is this LaTex?

I can't make heads or tails of your formulae!

Surely there is an HTML formula font or tag you could use to display these in a readable format.

Also, regarding the graph of the spline fit: it looks like the fit breaks down on the edges, particularly the right edge (last data/price point in the series), where the residual looks much larger than those corresponding to the rest of points in the series.

Regarding "guaranteed" guaussianity of the spline-fit residuals: that would be true of any least-sqaures curvilinear fit.

IronFist · Mar 28, 2009

So do you enter when price gets a certain distance away from the spline?

And does your system require averaging down when price does not immediately go in your favor?

lolatency · Mar 28, 2009

Quote from dtrader98:

I think you've dropped enough clues in other areas (albeit, unintentionally) to figure out what you are inferring.

Always look forward to finding a thread with some of your sage insights.
More...

Splines don't guarantee a guassian distribution around the data, unless the spline function itself happens to mimic approximating the mean of a guassian at a given level. Splines are just another variation on regression, but even in linear regression, it can never be assumed that the distribution of the residuals (post-fit) are normal. That has to be tested for, unless you know in advance with 100% certainty that distributions at a level are normal.

It may be that some stocks have this, but it's not mathematically guaranteed. i.e., it's not a given.

But, for the sake discussion, would you care to execute the Shapiro-Wilks test on all of the residuals on some of your plots? R has this function in one of its core libraries, I believe.

lolatency · Mar 28, 2009

Quote from MAESTRO:

Indeed! That is what it makes so perfect for the RTM methods! More so, for different markets one can use different spline rigidness resulting in different parameters of the price distribution around any predicted point. In fact, there is an optimization procedure (could be reduced to a linear programming task in Sobolev space) that could optimize spline coefficients to match desired dispersion and density of the target distribution making it stable and very much predictable!
More...

The problem with your explanation is that you've only been able to describe the behavior of the residuals in a spline fit with known, past data. You haven't actually addressed how you are going to forecast a prediction point.

If you're doing a straight linear approximation 1 step into the future, you're asking for trouble; particularly because intra-day data may have negative autocovariance at particular time-lags. Market-makers also induce negative AR(1) coefficients.

Forecasting the future has to be done using information about auto-covariance (and/or cross-auto-covariance from a leading time-series.) You need to have either the gamma or rho functions for ACF or ACVF. But even that is not simple, because you may have an integrated/cointegrated series and non-stationarity. If your volatility shifts while you have made the assumption of homoscedascity in your residuals, you are fucked.

bone · Mar 29, 2009

stat arb. highly correlated pairs.