Just an update on my efforts: there's a well-known problem of a correlation matrix being invalid (for decomposition). This can happen not only with hand-crafted correlation matrices, it can also happen with program-generated ones. The solution to the problem is to find/compute "the nearest correlation matrix". One of the leading researchers in this field is Nicholas J. Higham from the Uni of Manchester. I'm currently studying his papers and his algorithm named "shrinking" ( http://eprints.ma.man.ac.uk/2331/01/covered/MIMS_ep2014_54.pdf ) as well as the other "compute the nearest correlation matrix" algorithms.

Here's an example of a defective correlation matrix:

Code:
1.0  0.6  0.9
0.6  1.0  0.9
0.9  0.9  1.0

http://blogs.sas.com/content/iml/20...relation-matrix-not-a-correlation-matrix.html
"the resulting matrix of pairwise correlations is not positive definite and therefore does not represent a valid correlation matrix. How can you tell? Positive semidefinite matrices always have nonnegative eigenvalues. As shown by the output of following program, this matrix has a negative eigenvalue [...]"

As said, the solution is to find and work with the "nearest correlation matrix", but that is not so easy to find, it seems. See also: https://en.wikipedia.org/wiki/Positive-definite_matrix#Positive-semidefinite

Hmm, very complicated stuff... Has anybody else studied this problem, or is anyone working in this field?
This is my detection routine for a defective correlation matrix:

Code:
bool matrix::is_positive_definite(const vector<vector<double>>& a, double& det)
{
    /* CHECK:
       - "A^(-1) is positive definite" (positive.pdf, p. 2)
       - Paper "ON THE EXISTENCE OF A CHOLESKY FACTORIZATION" by MARKUS GRASMAIR (cholesky.pdf):
         Theorem 2. An invertible matrix A in R^(n x n) admits a Cholesky factorization
         A = L*L^T with a lower triangular matrix L in R^(n x n), if and only if A is
         symmetric and positive definite.
         Proof. Assume that A = L*L^T. Then A^T = (L*L^T)^T = (L^T)^T * L^T = L*L^T = A,
         proving that A is symmetric. Moreover, if x in R^n \ {0}, then
         x^T * A * x = x^T * L * L^T * x = (L^T * x)^T * (L^T * x) = ||L^T * x||_2^2.
         Since A is assumed to be invertible, so is the matrix L and therefore also L^T
         (this follows from the fact that 0 != det A = det(L*L^T) = (det L)^2).
         Since x != 0, this implies that also L^T * x != 0, and consequently
         ||L^T * x||_2^2 > 0, proving that A is positive definite. */
    return is_invertible(a, det) && (det > 0.0);
}

Ie. the correlation matrix must be invertible and its determinant must be > 0. (Caveat: for a symmetric matrix, det > 0 alone is a necessary but not a sufficient condition; by Sylvester's criterion all leading principal minors must be positive. The watertight test is to attempt the Cholesky factorization itself and check that every pivot is positive.) As said, the part still missing is finding (computing) the "replacement matrix" in case a defective matrix is detected.
Here's another example of a defective correlation matrix, this time one that also uses negative correlations: http://www.quantumforest.com/2011/10/simulating-data-following-a-given-covariance-structure/ (the comment of RRekka, 2015/10/17 at 5:08 am, and the reply to it):

Code:
 1.0   0.6   0.6   0.6
 0.6   1.0  -0.2   0.0
 0.6  -0.2   1.0   0.0
 0.6   0.0   0.0   1.0

Quote from the reply of the blog author Luis there:
"The simple answer is that it works with any positive-definite covariance matrix, which includes matrices with negative correlations. One way to check is that the determinant of the matrix has to be positive, which is not for your matrix det(M) is -0.25. This suggests that the correlation structure represented by your matrix does not make sense. For example, you have a positive correlation between vars 1 & 2 and vars 1 & 3, but a negative correlation between vars 2 & 3."
If your goal is to make money trading, re-implementing well-studied math routines just gets you farther and farther off track.
I'll add this to my system-testing framework to make the simulations more realistic, because there is undoubtedly some correlation among the titles, especially for testing some black swan events (ie. when the herd behaviour shows up big). This matrix stuff is unfortunately needed for generating correlated data.
I need more. I'm not interested in historical data; I need live data, and for testing (that's forward-testing) simulated data (GBM) is king for me, as I can generate and test as much of it as I need. I need to generate such correlated data for other scenarios beyond system testing, too.
Here are the very first results. And here is the correlation matrix used for the 2nd chart (hand-crafted, symmetric), plus its Cholesky decomposition (the second one was used, ie. the upper triangle):

Code:
Name=mcorr rows=5 cols=5:
 1.00000  0.50000  0.60000  0.70000  0.80000
 0.50000  1.00000  0.90000  0.10000  0.20000
 0.60000  0.90000  1.00000  0.30000  0.40000
 0.70000  0.10000  0.30000  1.00000  0.50000
 0.80000  0.20000  0.40000  0.50000  1.00000

Name=cholesky rows=5 cols=5:
 1.00000  0.00000  0.00000  0.00000  0.00000
 0.50000  0.86603  0.00000  0.00000  0.00000
 0.60000  0.69282  0.40000  0.00000  0.00000
 0.70000 -0.28868  0.20000  0.62183  0.00000
 0.80000 -0.23094  0.20000 -0.26803  0.44139

Name=cholesky rows=5 cols=5:
 1.00000  0.50000  0.60000  0.70000  0.80000
 0.00000  0.86603  0.69282 -0.28868 -0.23094
 0.00000  0.00000  0.40000  0.20000  0.20000
 0.00000  0.00000  0.00000  0.62183 -0.26803
 0.00000  0.00000  0.00000  0.00000  0.44139