Just an update on my efforts: there's a well-known problem of a correlation matrix being invalid (for decomposition). This can happen not only with hand-crafted correlation matrices, it can also happen with program-generated ones. The solution to the problem is to find/compute "the nearest correlation matrix". One of the leading researchers in this field is Nicholas J. Higham from the Uni of Manchester. I'm currently studying his papers and his algorithm named "shrinking" ( http://eprints.ma.man.ac.uk/2331/01/covered/MIMS_ep2014_54.pdf ) as well as the other "compute the nearest correlation matrix" algorithms.

Here's an example of a defective correlation matrix:

Code:
1.0  0.6  0.9
0.6  1.0  0.9
0.9  0.9  1.0

http://blogs.sas.com/content/iml/20...relation-matrix-not-a-correlation-matrix.html
"the resulting matrix of pairwise correlations is not positive definite and therefore does not represent a valid correlation matrix. How can you tell? Positive semidefinite matrices always have nonnegative eigenvalues. As shown by the output of following program, this matrix has a negative eigenvalue [...]"

As said, the solution is to find and work with the "nearest correlation matrix", but that is not so easy to find, it seems. See also: https://en.wikipedia.org/wiki/Positive-definite_matrix#Positive-semidefinite

Hmm, very complicated stuff... Has anybody else studied this problem, or is anyone working in this field?
This is my detection routine for a defective correlation matrix:

Code:
bool matrix::is_positive_definite(const vector<vector<double>>& a, double& det)
{
    /* CHECK:
       - "A^(-1) is positive definite" (positive.pdf, p. 2)
       - Paper "ON THE EXISTENCE OF A CHOLESKY FACTORIZATION" by MARKUS GRASMAIR (cholesky.pdf):
         Theorem 2. An invertible matrix A in R^(n x n) admits a Cholesky factorization
         A = L*L^T with a lower triangular matrix L in R^(n x n), if and only if A is
         symmetric and positive definite.
         Proof. Assume that A = L*L^T. Then A^T = (L*L^T)^T = (L^T)^T * L^T = L*L^T = A,
         proving that A is symmetric. Moreover, if x in R^n \ {0}, then
         x^T * A * x = x^T * L * L^T * x = (L^T * x)^T * (L^T * x) = ||L^T * x||_2^2.
         Since A is assumed to be invertible, so is the matrix L and therefore also L^T
         (this follows from the fact that 0 != det A = det(L*L^T) = (det L)^2).
         Since x != 0, this implies that also L^T * x != 0, and consequently
         ||L^T * x||_2^2 > 0, proving that A is positive definite. */
    return is_invertible(a, det) && (det > 0.0);
}

Ie. the correlation matrix must be invertible and its determinant must be > 0. (Caveat: for a symmetric matrix, det > 0 alone is a necessary but not a sufficient condition; by Sylvester's criterion all leading principal minors must be positive. The watertight test is to attempt the Cholesky factorization itself and check that every pivot is positive.) As said, the part still missing is finding (computing) the "replacement matrix" in case a defective matrix is detected.
Here's another example of a defective correlation matrix, this time one that also uses negative correlations: http://www.quantumforest.com/2011/10/simulating-data-following-a-given-covariance-structure/ (the comment of RRekka, 2015/10/17 at 5:08 am, and the reply to it):

Code:
 1.0   0.6   0.6   0.6
 0.6   1.0  -0.2   0.0
 0.6  -0.2   1.0   0.0
 0.6   0.0   0.0   1.0

Quote from the reply of the blog author Luis there:
"The simple answer is that it works with any positive-definite covariance matrix, which includes matrices with negative correlations. One way to check is that the determinant of the matrix has to be positive, which is not for your matrix det(M) is -0.25. This suggests that the correlation structure represented by your matrix does not make sense. For example, you have a positive correlation between vars 1 & 2 and vars 1 & 3, but a negative correlation between vars 2 & 3."
If your goal is to make money trading, re-implementing well-studied math routines just gets you farther and farther off track.
I'll add this to my system-testing framework to make the simulations more realistic, because there is undoubtedly some correlation among the titles, especially for testing some black swan events (ie. when the herd behaviour shows up big). This matrix stuff is unfortunately needed for generating correlated data.
I need more. I'm not interested in historical data; I need live data, and for testing (that's forward-testing) simulated data (GBM) is king for me, as I can generate and test as much of it as I need. I need to generate such correlated data for other scenarios beyond system testing, too.
Here are the very first results. And here is the correlation matrix used for the 2nd chart (hand-crafted, symmetric), plus its Cholesky decomposition (the second one was used, ie. the upper triangle):

Code:
Name=mcorr rows=5 cols=5:
 1.00000  0.50000  0.60000  0.70000  0.80000
 0.50000  1.00000  0.90000  0.10000  0.20000
 0.60000  0.90000  1.00000  0.30000  0.40000
 0.70000  0.10000  0.30000  1.00000  0.50000
 0.80000  0.20000  0.40000  0.50000  1.00000

Name=cholesky rows=5 cols=5:
 1.00000  0.00000  0.00000  0.00000  0.00000
 0.50000  0.86603  0.00000  0.00000  0.00000
 0.60000  0.69282  0.40000  0.00000  0.00000
 0.70000 -0.28868  0.20000  0.62183  0.00000
 0.80000 -0.23094  0.20000 -0.26803  0.44139

Name=cholesky rows=5 cols=5:
 1.00000  0.50000  0.60000  0.70000  0.80000
 0.00000  0.86603  0.69282 -0.28868 -0.23094
 0.00000  0.00000  0.40000  0.20000  0.20000
 0.00000  0.00000  0.00000  0.62183 -0.26803
 0.00000  0.00000  0.00000  0.00000  0.44139