high level statistics stuff

bookish · Apr 25, 2017

Suppose I have 5 variables that I hope will predict, or at least correlate with a sixth and seventh.
1) I want to find out how to weight them.
2) But also, I suspect that some of the variables might "weigh" more when other variables are high or low.
3) In addition, some of the variables may have a predictive lead that is more or less (in time) than the others.

I have an idea how to go about doing it in a tedious fashion in Excel, but does anyone know of any software or plugins that do all this in a more automated fashion?

beerntrading · Apr 25, 2017

Actuaries go to school for 6-8 years to answer that...and the tedium of Excel is why they're so well paid. If you show me two of them who agree with each other, I'll show you a sycophant or corporate lackey.

Edit: ...which isn't to poo-poo your idea, just to suggest you're on the right track with tedious work in Excel.

xandman · Apr 25, 2017

You can probably do elementary applications with multivariate regression add-ins for Excel.

When you do a multivariate regression, each dependent variable is expressed as an algebraic function of your independent variables.

For instance, let x, y, z be your variables. Suppose we get the following formulas by running multivariate regressions on your data.

x=2y+3z
y=(x-3z)/2
z=(x-2y)/3

This is your predictive formula and this gives you the coefficients as the weighting for your predictive formula.

x: (1,2,3)
y: (1/2,1,-3/2)
z: (1/3,-2/3,1)

Some programs will output coefficients. Some will give you the formula to get the coefficients.

As of now, Excel can only auto-generate linear regression formulas for each variable independently. A proper multivariate regression will tie everything together by incorporating the covariance and standard deviation of the data sets to determine the coefficients.

Here is your Excel resource. http://www.real-statistics.com/multivariate-statistics/

I am not sure about incorporating lead/lag. Perhaps in a flavor of ANOVA? idk.

xandman · Apr 25, 2017

beerntrading said:
Actuaries go to school for 6-8 years to answer that...and the tedium of Excel is why they're so well paid. If you show me two of them who agree with each other, I'll show you a sycophant or corporate lackey.

Edit: ...which isn't to poo-poo your idea, just to suggest you're on the right track with tedious work in Excel.

Agree. Statistics is a somewhat messed up profession/practice. Even professors sometimes sound like they are unsure of what they are saying.

Funny thing is. These same concepts are expressed in other branches of math and physical sciences using different notation and language.

Nonetheless, I think it is the path by which Finance tries to achieve mathematical rigor. At least, in this era.

Niten Doraku · Apr 25, 2017

Why not split your data into a training and testing set?

Then use a machine learning algorithm such as a neural net: assign random weights, then run gradient descent on the training set.

Use the test set and see if you like the model.

Unless you're mining for relationships that you don't know exist, in which case, it is a bit more complicated.
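A minimal sketch of the split-then-fit idea, under loud assumptions: the data is synthetic, `true_w` is a made-up set of weights used only to generate it, and the "network" is a single linear layer rather than a full neural net. The split is chronological (first 80% train, last 20% test), which is usually the safer choice for time-ordered data.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic data (assumption): 5 predictors, 1 target, hypothetical true weights.
n_obs = 1000
X = rng.normal(size=(n_obs, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + 0.1 * rng.normal(size=n_obs)

# Chronological split: first 80% for training, last 20% for testing.
split = int(0.8 * n_obs)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Start from random weights, then run plain gradient descent on mean squared error.
w = rng.normal(size=5)
learning_rate = 0.1
for _ in range(500):
    residual = X_train @ w - y_train
    gradient = 2 * X_train.T @ residual / len(y_train)
    w -= learning_rate * gradient

train_mse = np.mean((X_train @ w - y_train) ** 2)
test_mse = np.mean((X_test @ w - y_test) ** 2)
print("fitted weights:", np.round(w, 2))
print("train MSE:", train_mse, "test MSE:", test_mse)
```

If the test error is much worse than the training error, the model is probably fitting noise in the training period.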

beerntrading · Apr 25, 2017

xandman said:
Agree. Statistics is a somewhat messed up profession/practice. Even professors sometimes sound like they are unsure of what they are saying.

Funny thing is. These same concepts are expressed in other branches of math and physical sciences using different notation and language.

Nonetheless, I think it is the path by which Finance tries to achieve mathematical rigor. At least, in this era.

I'd say they're mostly useful, but misused mostly. The problem isn't so much that they can't inform, it's that people draw incorrect inferences (ask someone what a 30% chance of rain means...)

My stat professor (there was only 1) told me on his first day that he had a PhD in lying. That was my only take-away from the class. I passed with an A. I decide what to do with actuaries' statistics for a living.

2rosy · Apr 25, 2017

Agree with xandman. Use regression. If you want to delve into the buzzwords, look at feature selection in an ML library to narrow things down.

globalarbtrader · Apr 26, 2017

bookish said:
Suppose I have 5 variables that I hope will predict, or at least correlate with a sixth and seventh.
1) I want to find out how to weight them.
2) But also, I suspect that some of the variables might "weigh" more when other variables are high or low.
3) In addition, some of the variables may have a predictive lead that is more or less (in time) than the others.

I have an idea how to go about doing it in a tedious fashion in Excel, but does anyone know of any software or plugins that do all this in a more automated fashion?

The statistical model you want is either the VAR or the VECM (basically a regression with multiple lags). Any decent statistical package will do it, or you can use a language like Python, Matlab or R with the appropriate libraries installed.
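To make "a regression with multiple lags" concrete, here is a rough sketch of a one-lag VAR fitted by ordinary least squares using only numpy. Everything in it is an assumption for illustration: the two simulated series, the coefficient matrix `true_A`, and the choice of a single lag. A statistics package (statsmodels has a VAR class, for example) would also handle lag selection and diagnostics.

```python
import numpy as np

rng = np.random.default_rng(3)
n_obs, n_vars = 5000, 2

# Hypothetical stable coefficient matrix, used only to simulate data.
true_A = np.array([[0.5, 0.2],
                   [0.0, 0.3]])
Y = np.zeros((n_obs, n_vars))
for t in range(1, n_obs):
    Y[t] = true_A @ Y[t - 1] + 0.1 * rng.normal(size=n_vars)

# Regress Y(t) on [Y(t-1), 1] for all variables at once.
lagged = np.column_stack([Y[:-1], np.ones(n_obs - 1)])
B, *_ = np.linalg.lstsq(lagged, Y[1:], rcond=None)

# est_A[i, j] is the estimated effect of variable j at t-1 on variable i at t.
est_A = B[:-1].T
est_const = B[-1]
print("estimated lag-1 matrix:\n", np.round(est_A, 2))
print("estimated constants:", np.round(est_const, 3))
```

Adding more lags just means appending Y(t-2), Y(t-3), ... as extra columns of `lagged`.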

GAT

Laissez Faire · Apr 26, 2017

Interesting thread. I find myself in a similar predicament, but for now, I've been trying to do it 'manually'.

Without having to learn programming, are there any good alternatives to Excel for analyzing and sorting tabular data?

I'm thinking Excel probably does what I need already, but figured I could ask.

quant1 · Apr 26, 2017

If you'd like to post a formalized construction of the model you're considering I would be happy to give more details. But as I read it, your initial model is simply addressing the relationship between 7 variables at the same time. Call each variable Xi(t), where i is an index from 1 to 7 and t is the time.

Model 1: a1*X1(t)+...+a7*X7(t)+b=0, where ai is a weight for variable Xi(t) and b is a constant. Note that this is equivalent to relating X1 through X5 directly to X6 and X7, since we can move them to the right hand side. In this model, finding optimal ai and b values is easy. Take observations of X1,...,X6 several times, place each observation as a row of a matrix, and append a 1 to the end of each row for the constant. Multiply this matrix by a column vector of the ai and b (the unknown values) and set this equal to a column vector of the corresponding X7 values. This is a linear system. Its least squares solution is found by taking the pseudo-inverse of the observation matrix and multiplying it by the X7 observation vector. The result is the set of weights that minimizes the squared error for those observations. You must then actively monitor how stable these values are over a period of interest that is not in the set.
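The Model 1 recipe can be sketched in a few lines of numpy. The data here is synthetic and the names `true_weights` and `true_constant` are made up purely to generate it; with real data you would load your own observations into `X` and `x7`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): 200 observations of X1..X6.
n_obs = 200
X = rng.normal(size=(n_obs, 6))

# Hypothetical "true" weights and constant, used only to generate X7.
true_weights = np.array([0.5, -1.0, 2.0, 0.0, 1.5, -0.7])
true_constant = 3.0
x7 = X @ true_weights + true_constant + 0.1 * rng.normal(size=n_obs)

# Observation matrix: one row per observation, with a 1 appended for the constant.
A = np.column_stack([X, np.ones(n_obs)])

# Least squares solution via the pseudo-inverse: [a1..a6, b] = pinv(A) @ x7.
solution = np.linalg.pinv(A) @ x7
weights, constant = solution[:-1], solution[-1]

# np.linalg.lstsq solves the same problem without forming the pseudo-inverse.
solution_lstsq, *_ = np.linalg.lstsq(A, x7, rcond=None)

print("weights:", np.round(weights, 2))
print("constant:", round(float(constant), 2))
```

To check stability, refit on a later window of observations and compare the weights with the ones from the original window.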

Model 2: This is more open ended. Your current model states that weights may be a function of the other Xi values. This is a very complex setup and will be nonlinear. Results from nonlinear dynamics and manifold learning might be of use here.
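One much simpler special case, which is not the nonlinear machinery mentioned above but a swapped-in technique, is to add interaction (product) terms to an ordinary regression. If the weight on X1 is assumed to vary linearly with X2, the product X1*X2 captures that while the fit stays linear in the unknowns. A sketch on synthetic data, with a made-up data-generating process:

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs = 500
x1 = rng.normal(size=n_obs)
x2 = rng.normal(size=n_obs)

# Assumed process: the weight on x1 is (1 + 2*x2), so x1 "weighs" more when x2 is high.
y = (1.0 + 2.0 * x2) * x1 + 0.5 * x2 + 0.1 * rng.normal(size=n_obs)

# Columns: x1, x2, the interaction x1*x2, and a constant.
A = np.column_stack([x1, x2, x1 * x2, np.ones(n_obs)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

print("x1, x2, x1*x2, constant:", np.round(coef, 2))
```

A clearly nonzero coefficient on the product column suggests that the weight of one variable depends on the level of the other.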

Model 3: A mix of 1 or 2 with time lags. This is an easier barrier to overcome. Treat observations of X(t) and X(t-1) as separate variables and you can try the method of model 1 I described above. Note that autocorrelation may make things odd here, but it's a start.
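A small sketch of the lag trick, assuming synthetic data in which the target depends on both the current and the previous value of a single predictor. The coefficients 0.3 and 0.9 are invented for the example. The lagged series is simply shifted by one step and stacked as its own column:

```python
import numpy as np

rng = np.random.default_rng(2)
n_obs = 400
x = rng.normal(size=n_obs)
noise = 0.1 * rng.normal(size=n_obs)

# Assumed process: y(t) = 0.3*x(t) + 0.9*x(t-1) + noise, defined for t >= 1.
y = np.zeros(n_obs)
y[1:] = 0.3 * x[1:] + 0.9 * x[:-1] + noise[1:]

# Treat x(t) and x(t-1) as separate variables; the first observation is dropped
# because it has no lagged value.
A = np.column_stack([x[1:], x[:-1], np.ones(n_obs - 1)])
coef, *_ = np.linalg.lstsq(A, y[1:], rcond=None)

print("x(t), x(t-1), constant:", np.round(coef, 2))
```

Comparing the fitted weights across lags gives a rough picture of which variable leads by how much.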

Good luck with your research!