Generating correlated stock prices for simulating herd behaviour

Discussion in 'App Development' started by botpro, Feb 12, 2016.

  1. botpro

    Just an update of my efforts:

    There's a well-known problem: a correlation matrix can be invalid, i.e. not positive definite, in which case its Cholesky decomposition fails.

    This can happen not only with hand-crafted correlation matrices; it can also happen with program-generated ones.

    The solution to the problem is to find/compute "the nearest correlation matrix".

    One of the leading researchers in this field is Nicholas J. Higham from the University of Manchester.
    I'm currently studying his papers and the algorithm named "shrinking" ( http://eprints.ma.man.ac.uk/2331/01/covered/MIMS_ep2014_54.pdf ),
    as well as the other "compute the nearest correlation matrix" algorithms.

    Here's an example of a defective correlation matrix:
    1.0 0.6 0.9
    0.6 1.0 0.9
    0.9 0.9 1.0
    http://blogs.sas.com/content/iml/20...relation-matrix-not-a-correlation-matrix.html
    "the resulting matrix of pairwise correlations is not positive definite and therefore does not represent a valid correlation matrix.
    How can you tell? Positive semidefinite matrices always have nonnegative eigenvalues.
    As shown by the output of following program, this matrix has a negative eigenvalue [...]"

    As said, the solution is to find and work with the "nearest correlation matrix", but it seems that one is not easy to find.
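
    A minimal sketch of the shrinking idea, under stated assumptions: the target matrix is the identity, definiteness is tested by attempting a Cholesky factorization, and the smallest workable shrinking parameter alpha is located by bisection. The names cholesky_succeeds, shrink_towards_identity and find_shrink_alpha are hypothetical and not taken from the paper or from any library. The result is a valid correlation matrix close to the input, but in general it is not the nearest one in the Frobenius norm.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// True if the symmetric matrix a admits a Cholesky factorization with
// strictly positive pivots, i.e. if it is (numerically) positive definite.
// Works in place on a copy; only the lower triangle is read.
bool cholesky_succeeds(Matrix a)
{
    const std::size_t n = a.size();
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t k = 0; k < j; ++k)
            a[j][j] -= a[j][k] * a[j][k];
        if (a[j][j] <= 1e-12) return false;   // non-positive pivot
        a[j][j] = std::sqrt(a[j][j]);
        for (std::size_t i = j + 1; i < n; ++i) {
            for (std::size_t k = 0; k < j; ++k)
                a[i][j] -= a[i][k] * a[j][k];
            a[i][j] /= a[j][j];
        }
    }
    return true;
}

// S(alpha) = alpha * I + (1 - alpha) * m0. The unit diagonal is preserved.
Matrix shrink_towards_identity(const Matrix& m0, double alpha)
{
    const std::size_t n = m0.size();
    Matrix s(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        assert(m0[i].size() == n);            // matrix must be square
        for (std::size_t j = 0; j < n; ++j)
            s[i][j] = (i == j ? alpha : 0.0) + (1.0 - alpha) * m0[i][j];
    }
    return s;
}

// Bisection for (approximately) the smallest alpha in [0, 1] such that
// S(alpha) passes the Cholesky test. tol is an illustrative accuracy.
double find_shrink_alpha(const Matrix& m0, double tol = 1e-6)
{
    if (cholesky_succeeds(m0)) return 0.0;    // nothing to repair
    double lo = 0.0, hi = 1.0;                // S(0) = m0 fails, S(1) = I passes
    while (hi - lo > tol) {
        const double mid = 0.5 * (lo + hi);
        if (cholesky_succeeds(shrink_towards_identity(m0, mid))) hi = mid;
        else lo = mid;
    }
    return hi;
}
```

    For the 3x3 example above, the bisection should settle at an alpha a little below 0.01, so the off-diagonal entries are scaled down only slightly before the matrix becomes decomposable.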

    See also:
    https://en.wikipedia.org/wiki/Positive-definite_matrix#Positive-semidefinite

    Hmm, very complicated stuff...

    Has anybody else studied this problem, or is anyone working in this field?
     
    Last edited: Feb 17, 2016
    #11     Feb 17, 2016
  2. botpro

    This is my detection routine for a defective correlation matrix:
    Code:
    bool matrix::is_positive_definite(const vector<vector<double>>& a, double& det)
      { /* CHECK:
           - "A^(-1) is positive definite" positive.pdf p.2
           - paper "ON THE EXISTENCE OF A CHOLESKY FACTORIZATION" by MARKUS GRASMAIR (cholesky.pdf):
             Theorem 2. An invertible matrix A ∈ R^(n×n) admits a Cholesky factorization A = LL^T with a lower triangular matrix L ∈ R^(n×n),
             if and only if A is symmetric and positive definite.
             Proof. Assume that A = LL^T. Then
               A^T = (LL^T)^T = (L^T)^T L^T = LL^T = A,
             proving that A is symmetric. Moreover, if x ∈ R^n \ {0}, then
               x^T A x = x^T L L^T x = (L^T x)^T (L^T x) = ||L^T x||_2^2.
             Since A is assumed to be invertible, so is the matrix L and therefore also L^T (this
             follows from the fact that 0 != det A = det(LL^T) = (det L)^2). Since x != 0, this
             implies that also L^T x != 0, and consequently ||L^T x||_2 > 0, proving that A is positive definite.
           NOTE: det > 0 is a necessary condition for positive definiteness, but not a
           sufficient one on its own (e.g. diag(-1, -1) has det = 1).
        */
    
        return is_invertible(a, det) && (det > 0.0);
      }
    
    I.e. the correlation matrix must be invertible and its determinant must be > 0 (a necessary condition for positive definiteness, though not a sufficient one on its own).
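
    The theorem quoted in the code comment suggests a stricter test: attempt the Cholesky factorization itself and see whether every pivot stays positive. A minimal sketch, assuming a symmetric input stored as nested vectors; try_cholesky and is_positive_definite_cholesky are hypothetical names, not part of the matrix class above, and the tolerance is an illustrative choice.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Attempts A = L * L^T. Returns false as soon as a pivot is not strictly
// positive (within tol), which means A is not positive definite.
// Only the lower triangle of a is read, so a is assumed to be symmetric.
bool try_cholesky(const Matrix& a, Matrix& L, double tol = 1e-12)
{
    const std::size_t n = a.size();
    L.assign(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        assert(a[i].size() == n);             // matrix must be square
        for (std::size_t j = 0; j <= i; ++j) {
            double sum = a[i][j];
            for (std::size_t k = 0; k < j; ++k)
                sum -= L[i][k] * L[j][k];
            if (i == j) {
                if (sum <= tol) return false; // non-positive pivot
                L[i][i] = std::sqrt(sum);
            } else {
                L[i][j] = sum / L[j][j];
            }
        }
    }
    return true;
}

bool is_positive_definite_cholesky(const Matrix& a)
{
    Matrix L;                                 // factor is discarded here
    return try_cholesky(a, L);
}
```

    Unlike the determinant test, this rejects a matrix such as diag(-1, -1), whose determinant is positive although the matrix is not positive definite.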

    As said, the missing part yet is finding (computing) the "replacement matrix" in case of detecting a defective matrix.
     
    Last edited: Feb 17, 2016
    #12     Feb 17, 2016
  3. botpro

    Here's another example of a defective correlation matrix, this time one that also contains negative correlations:
    http://www.quantumforest.com/2011/10/simulating-data-following-a-given-covariance-structure/
    (the comment of RRekka, 2015/10/17 at 5:08 am, and the reply to it):
    Code:
    1,    0.6,  0.6,  0.6
    0.6,  1,   -0.2,  0
    0.6, -0.2,  1,    0
    0.6,  0,    0,    1
    
    Quote from the reply of the blog author Luis there:
    "The simple answer is that it works with any positive-definite covariance matrix, which includes matrices with negative correlations.
    One way to check is that the determinant of the matrix has to be positive, which is not for your matrix det(M) is -0.25.
    This suggests that the correlation structure represented by your matrix does not make sense.
    For example, you have a positive correlation between vars 1 & 2 and vars 1 & 3, but a negative correlation between vars 2 & 3."
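
    The determinant quoted in the reply can be reproduced with a few lines of Gaussian elimination. This is only a sketch with a hypothetical helper named determinant; for the 4x4 matrix above it gives about -0.2496, which rounds to the quoted -0.25.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Determinant via Gaussian elimination with partial pivoting.
// Takes the matrix by value because the elimination overwrites it.
double determinant(Matrix a)
{
    const std::size_t n = a.size();
    double det = 1.0;
    for (std::size_t col = 0; col < n; ++col) {
        assert(a[col].size() == n);           // matrix must be square
        // Pick the row with the largest entry in this column as pivot.
        std::size_t pivot = col;
        for (std::size_t r = col + 1; r < n; ++r)
            if (std::fabs(a[r][col]) > std::fabs(a[pivot][col])) pivot = r;
        if (a[pivot][col] == 0.0) return 0.0; // singular matrix
        if (pivot != col) {
            std::swap(a[pivot], a[col]);
            det = -det;                       // a row swap flips the sign
        }
        det *= a[col][col];
        // Eliminate the entries below the pivot.
        for (std::size_t r = col + 1; r < n; ++r) {
            const double f = a[r][col] / a[col][col];
            for (std::size_t c = col; c < n; ++c)
                a[r][c] -= f * a[col][c];
        }
    }
    return det;
}
```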
     
    Last edited: Feb 17, 2016
    #13     Feb 17, 2016
  4. If your goal is to make money trading, re-implementing well-studied math routines is just getting you farther and farther off track.
     
    #14     Feb 17, 2016
  5. botpro

    I'll add this to my system testing framework to make the simulations more realistic, because there is undoubtedly
    some correlation among the stocks, and especially for testing black swan events (i.e. when the herd behaviour shows up in a big way).
    This matrix stuff is unfortunately needed for generating correlated data.
     
    #15     Feb 17, 2016
  6. 2rosy

    most people use real historical data
     
    #16     Feb 17, 2016
  7. botpro

    I need more. I'm not interested in historical data; I need live data, and for testing (that is, forward testing) simulated data (GBM) is king for me,
    as I can generate and test as much as I need.

    I need to generate such correlated data also for other scenarios beyond system testing.
     
    Last edited: Feb 17, 2016
    #17     Feb 17, 2016
  8. botpro

    Here are the very first results:
    Correlated_GBM_1.png

    Correlated_GBM_2.png


    And here is the correlation matrix used for the 2nd chart (hand-crafted, symmetric), together with its Cholesky decomposition (the second factor was used, i.e. the upper triangle):
    Code:
    Name=mcorr rows=5 cols=5:
          1.00000      0.50000      0.60000      0.70000      0.80000
          0.50000      1.00000      0.90000      0.10000      0.20000
          0.60000      0.90000      1.00000      0.30000      0.40000
          0.70000      0.10000      0.30000      1.00000      0.50000
          0.80000      0.20000      0.40000      0.50000      1.00000
    Name=cholesky rows=5 cols=5:
          1.00000      0.00000      0.00000      0.00000      0.00000
          0.50000      0.86603      0.00000      0.00000      0.00000
          0.60000      0.69282      0.40000      0.00000      0.00000
          0.70000     -0.28868      0.20000      0.62183      0.00000
          0.80000     -0.23094      0.20000     -0.26803      0.44139
    Name=cholesky rows=5 cols=5:
          1.00000      0.50000      0.60000      0.70000      0.80000
          0.00000      0.86603      0.69282     -0.28868     -0.23094
          0.00000      0.00000      0.40000      0.20000      0.20000
          0.00000      0.00000      0.00000      0.62183     -0.26803
          0.00000      0.00000      0.00000      0.00000      0.44139
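
    A hedged sketch of how such a factor can be used to generate correlated GBM prices; it is not the code behind the charts above. The names cholesky_lower, correlate and gbm_step, and the parameters mu, sigma and dt, are assumptions for illustration. The sketch uses the lower factor L with x = L * z. Multiplying a row vector z^T by the upper factor U = L^T gives the same correlated values.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Lower-triangular Cholesky factor L with corr = L * L^T.
// Assumes corr is symmetric and positive definite.
Matrix cholesky_lower(const Matrix& corr)
{
    const std::size_t n = corr.size();
    Matrix L(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j <= i; ++j) {
            double sum = corr[i][j];
            for (std::size_t k = 0; k < j; ++k)
                sum -= L[i][k] * L[j][k];
            if (i == j) {
                assert(sum > 0.0);            // fails for a defective matrix
                L[i][i] = std::sqrt(sum);
            } else {
                L[i][j] = sum / L[j][j];
            }
        }
    }
    return L;
}

// Turns independent standard normals z into correlated normals x = L * z.
std::vector<double> correlate(const Matrix& L, const std::vector<double>& z)
{
    std::vector<double> x(z.size(), 0.0);
    for (std::size_t i = 0; i < z.size(); ++i)
        for (std::size_t j = 0; j <= i; ++j)
            x[i] += L[i][j] * z[j];
    return x;
}

// One GBM step for all assets, using the same drift mu and volatility sigma
// for every asset (a simplification):
//   S_i <- S_i * exp((mu - sigma^2 / 2) * dt + sigma * sqrt(dt) * x_i)
void gbm_step(std::vector<double>& prices, const Matrix& L,
              double mu, double sigma, double dt, std::mt19937& rng)
{
    std::normal_distribution<double> normal(0.0, 1.0);
    std::vector<double> z(prices.size());
    for (double& v : z) v = normal(rng);      // independent draws
    const std::vector<double> x = correlate(L, z);
    for (std::size_t i = 0; i < prices.size(); ++i)
        prices[i] *= std::exp((mu - 0.5 * sigma * sigma) * dt
                              + sigma * std::sqrt(dt) * x[i]);
}
```

    Calling gbm_step repeatedly on a vector of start prices, with L computed once from mcorr, yields one correlated path per asset. Applied to mcorr, cholesky_lower should reproduce the lower-triangular factor listed above.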
    
     
    Last edited: Feb 18, 2016
    #18     Feb 18, 2016