Can linear regression analysis really predict the future?

Discussion in 'Strategy Development' started by tradrejoe, Nov 4, 2009.

  1. For those of you who went through the exercise of using historical data and linear regression analysis to predict the future prices of trading instruments, have you run into situations where the beta coefficients that generate the best curve fit *do not* really predict the future? In fact, oftentimes if you go back in history and pretend you were operating the prediction system in the past, the more you test, the more your accuracy converges to just 50%?

    What is the correlation between the ability of a set of time series data to fit a price curve and its ability to truly forecast the future with greater than 50% accuracy? Do we just pile up everything closely related to what we try to forecast (even sunspot movements) and go as far back on the time lag as we can without crashing the supercomputer? Does anyone have any experience to share? Thanks for your insights ahead of time.
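
    The test described above (pretend you ran the system in the past and count how often it was right) can be sketched roughly as follows. This is only an illustration under loud assumptions: the prices are a synthetic random walk, not market data, the model is a straight line fitted to a trailing window, and the names `ols_fit`, `r_squared`, `walk_forward` and the window of 20 are made up for the example. On data like this the in-sample fit can look respectable while the out-of-sample direction calls hover near a coin flip.

```python
import random

def ols_fit(xs, ys):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def r_squared(xs, ys, a, b):
    """In-sample R-squared of the fitted line."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def walk_forward(prices, window):
    """Refit on each trailing window, then score the next-bar direction call."""
    xs = list(range(window))
    hits = 0
    trials = 0
    r2_sum = 0.0
    for t in range(window, len(prices)):
        ys = prices[t - window:t]
        a, b = ols_fit(xs, ys)
        r2_sum += r_squared(xs, ys, a, b)
        predicted_up = (a + b * window) > ys[-1]   # line extended one bar ahead
        actual_up = prices[t] > ys[-1]
        hits += predicted_up == actual_up
        trials += 1
    return r2_sum / trials, hits / trials

# Synthetic random walk (assumption: stands in for a price series).
random.seed(0)
prices = [100.0]
for _ in range(5000):
    prices.append(prices[-1] + random.gauss(0.0, 1.0))

mean_r2, hit_rate = walk_forward(prices, window=20)
print(f"mean in-sample R^2: {mean_r2:.2f}")
print(f"out-of-sample directional hit rate: {hit_rate:.3f}")
```

    Swapping in real closes for `prices` gives the same kind of comparison; the point is only that goodness of fit and forecasting accuracy are measured on different data.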
  2. Simple methods rarely outperform. And this smells like blind backtesting, which is rarely useful in stable, good risk, highly profitable future trading.

    The major financial institutions would be WAY ahead of Nevil Newbie and his 2-monitor workstation with Excel... They have $$$ hundreds of millions in CompSci, engineering and financial experts and computing power.

    Amazing how many people wander into trading with dreams of 500% annual returns with minimal risk.
  3. I took a cursory look at the regression. I gave up on linear. Non-linear looks promising. I am still trying to understand it.
  4. I use LR. I have never seen it (or any method) predict the future. But it keeps me on the right side of the trade most of the time, which I enjoy.
  5. Linear regression as I use it is for one of two things:

    1) To assign a value that the market may place on 1 more dollar of revenue, assets, 1 more percent of ROE, etc.

    2) The other way is by regressing closing prices over recent time periods to predict tomorrow's value, the value two days from now, and the value three days from now, not because I actually use that value, but because linear regression shows the "trend", and that's how it's meant to be used in the context of trading system development. If your predicted value is higher two days from now, the trend points higher; if your predicted value for tomorrow is lower, the trend is also lower, and you shouldn't fight it. For a history of QLD's predicted values, you can see my thread BWolinsky trading. Today, I will most likely post more predicted values, as I said, not because they mean anything to me other than that they keep you from fighting trends in the market. Its corollary is testing fitness. Using R-squared, we can see how accurately those values hold, but it's only relevant over short time periods.
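
    A rough sketch of the second use, assuming a short list of made-up closing prices (not QLD data) and hypothetical helper names `fit_line` and `trend_forecast`. It regresses the closes on the bar index, extends the line one, two and three days ahead, reads the trend from the sign of the slope, and reports R-squared as the fitness check.

```python
def fit_line(closes):
    """Least-squares line through (0, c0), (1, c1), ...; returns slope, intercept, R^2."""
    n = len(closes)
    xs = range(n)
    mx = (n - 1) / 2
    my = sum(closes) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, closes))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, closes))
    ss_tot = sum((y - my) ** 2 for y in closes)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return slope, intercept, r2

def trend_forecast(closes, days_ahead=(1, 2, 3)):
    """Extend the fitted line past the last close and label the trend by slope sign."""
    slope, intercept, r2 = fit_line(closes)
    last_x = len(closes) - 1
    forecasts = {d: intercept + slope * (last_x + d) for d in days_ahead}
    direction = "up" if slope > 0 else "down" if slope < 0 else "flat"
    return forecasts, direction, r2

# Made-up closes for illustration only.
closes = [50.0, 50.4, 51.1, 51.3, 52.2]
forecasts, direction, r2 = trend_forecast(closes)
for d, value in forecasts.items():
    print(f"{d} day(s) ahead: {value:.2f}")
print(f"trend: {direction}, R^2 over the window: {r2:.3f}")
```

    The forecast values themselves are not the output that matters here; as the post says, the direction is the signal, and R-squared only says how well a line describes the recent window.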
  6. Splines are types of curves, originally developed for ship-building in the days before computer modeling. Naval architects needed a way to draw a smooth curve through a set of points. The solution was to place metal weights (called knots) at the control points, and bend a thin metal or wooden beam (called spline) through the weights. The physics of the bending spline meant that the influence of each weight was greatest at the point of contact and decreased smoothly further along the spline. To get more control over a certain region of the spline, the draftsman simply added more weights.
    The surface produced by splines always appears smooth and pleasant looking. The reason for that effect is that while our eyes roll along a spline line we subconsciously anticipate (following our genetically embedded sense of inertia) where the next point should be, and if we indeed see it at the anticipated location it creates in us a feeling of unconscious satisfaction and a sense of pleasant symmetry. As a matter of fact, what we were able to discover is that we consider the motion of objects normal and almost unnoticeable if their behavior in our field of view follows some sort of a spline line. It appears that our visual anticipations are very much based on the same technique that the old craftsmen used to draw smooth lines. What is more interesting is that not too long ago splines had an explosion in their usage thanks to the film industry. Before the 1990s, special effects in motion pictures and animations that change (or morph) one image into another through a seamless transition were achieved through cross-fading techniques on film. However, since the early 1990s, this has been replaced by computer software to create more realistic transitions. At the heart of this software were splines. Thanks to splines a new era of computer animation has begun and truly amazing and realistic characters such as Shrek were born.

    One of the special types of splines, the cubic spline, became the most popular tool to interpolate data. Mathematically, a cubic spline can be described as a special function defined piecewise by third-degree polynomials. A cubic spline with a linear extension of its ending point is called a “natural spline”. Natural splines have three basic properties:

    • They pass through all given data points, with a unique cubic piece between each pair of adjacent points.

    • They are smooth, meaning that at the points where the pieces merge, their first and second derivatives are equal.

    • And finally, the second derivative of a natural spline at its endpoints is always equal to zero.

    These unique properties of natural splines make them very useful in designing anticipation tools that could accurately “extend” an existing set of data into the future. Our research showed us that in any set of data that represents a movement governed by inertia natural splines predict the future position of a center of gravity with unprecedented accuracy.
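
    To make the three properties concrete, here is a minimal pure-Python sketch of a natural cubic spline with the linear extension past the last point. The function names and the sample points are assumptions for illustration, and the sketch only shows the mechanics of extending the curve; it says nothing about how accurate such an extension is on market data.

```python
import bisect

def natural_spline_second_derivs(xs, ys):
    """Second derivatives M[i] at each knot of a natural cubic spline."""
    n = len(xs) - 1                        # number of intervals
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)                    # natural condition: M[0] = M[n] = 0
    if n < 2:
        return m
    # Tridiagonal system for interior knots i = 1..n-1 (row k = i - 1).
    diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
    rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
           for i in range(1, n)]
    # Forward elimination; the off-diagonal linking rows k-1 and k is h[k].
    for k in range(1, n - 1):
        w = h[k] / diag[k - 1]
        diag[k] -= w * h[k]
        rhs[k] -= w * rhs[k - 1]
    # Back substitution.
    m[n - 1] = rhs[n - 2] / diag[n - 2]
    for k in range(n - 3, -1, -1):
        m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]
    return m

def spline_eval(xs, ys, m, x):
    """Evaluate the spline at x, extending linearly beyond either end."""
    if x >= xs[-1]:
        h = xs[-1] - xs[-2]
        slope = (ys[-1] - ys[-2]) / h + m[-2] * h / 6.0   # end slope with M[n] = 0
        return ys[-1] + slope * (x - xs[-1])
    if x <= xs[0]:
        h = xs[1] - xs[0]
        slope = (ys[1] - ys[0]) / h - m[1] * h / 6.0      # start slope with M[0] = 0
        return ys[0] + slope * (x - xs[0])
    i = bisect.bisect_right(xs, x) - 1                    # xs[i] <= x < xs[i + 1]
    h = xs[i + 1] - xs[i]
    a = xs[i + 1] - x
    b = x - xs[i]
    return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h)
            + (ys[i] / h - m[i] * h / 6.0) * a
            + (ys[i + 1] / h - m[i + 1] * h / 6.0) * b)

# Made-up sample points for illustration.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 2.0, 1.5, 3.0, 3.5]
m = natural_spline_second_derivs(xs, ys)
print("second derivatives at knots:", [round(v, 3) for v in m])
print("value at x = 2.5:", round(spline_eval(xs, ys, m, 2.5), 3))
print("linear extension to x = 5:", round(spline_eval(xs, ys, m, 5.0), 3))
```

    Because the second derivative is zero at the last knot, the extension beyond it is a straight line with the spline's end slope, so the "forecast" depends entirely on the last few points.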
  7. Mean reversion is always an edge if you know how to use it, and it sounds advanced enough to be an acceptable method of measuring that distance away from the "center of gravity."
  8. The concept of regression comes from genetics and was popularized by Sir Francis Galton in the late 19th century with the publication of “Regression Towards Mediocrity in Hereditary Stature”. Galton observed that extreme characteristics (e.g., height) in parents were not fully passed on to their offspring. Rather, the characteristic in the offspring regressed towards a mediocre point (a point which has since been mathematically shown to be the mean). By measuring the heights of hundreds of people, he was able to quantify regression to the mean, and estimate the size of the effect. Galton wrote that, "The average regression of the offspring is a constant fraction of their respective mid-parental deviations." This means that the difference between a child and her parents on some characteristic was proportional to her parents’ deviation from typical people in the population. So if her parents were each two inches taller than the averages for men and women, on average she would be shorter than her parents by some factor (which today we would call one minus the regression coefficient) times two inches. For height, Galton estimated this regression coefficient to be around 2/3: the height of an individual will center around 2/3 of the parents’ deviation from the population average.
    Although Galton popularized the concept of regression, he fundamentally misunderstood the phenomenon; thus, his understanding of regression differs from that of modern statisticians. Galton was correct in his observation that the characteristics of an individual are not fully determined by their parents; there must be another source. However, he explains this by arguing that, "A child inherits partly from his parents, partly from his ancestors. Speaking generally, the further his genealogy goes back, the more numerous and varied will his ancestry become, until they cease to differ from any equally numerous sample taken at haphazard from the race at large." In other words, Galton believed that regression to the mean was simply an inheritance of characteristics from ancestors that are not expressed in the parents; he did not understand regression to the mean as a statistical phenomenon. In contrast to this view, it is now known that regression to the mean is a mathematical inevitability: if there is any random variance between the height of an individual and parents (providing the correlation is not exactly equal to 1) then the predictions must regress to the mean regardless of the underlying mechanisms of inheritance, race or culture.
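
    The "mathematical inevitability" can be checked with a small simulation. This is a sketch under stated assumptions, not Galton's data: parent and child deviations from the population average are each modeled as a shared component plus independent noise, with variances (2 and 1) chosen by hand so that the theoretical regression slope is 2/3. No "pull towards the mean" is built into the model, yet the children of tall parents still come out closer to average.

```python
import random

def simulate_families(n, seed=1):
    """Parent and child height deviations that share a common component."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        shared = rng.gauss(0.0, 2.0 ** 0.5)       # variance 2 (assumed)
        parent = shared + rng.gauss(0.0, 1.0)     # independent noise, variance 1
        child = shared + rng.gauss(0.0, 1.0)
        pairs.append((parent, child))
    return pairs

def regression_slope(pairs):
    """Least-squares slope of child deviation on parent deviation."""
    n = len(pairs)
    mp = sum(p for p, _ in pairs) / n
    mc = sum(c for _, c in pairs) / n
    sxy = sum((p - mp) * (c - mc) for p, c in pairs)
    sxx = sum((p - mp) ** 2 for p, _ in pairs)
    return sxy / sxx

pairs = simulate_families(20000)
slope = regression_slope(pairs)

# Families whose parents are well above average (threshold is arbitrary).
tall = [(p, c) for p, c in pairs if p > 2.0]
mean_tall_parent = sum(p for p, _ in tall) / len(tall)
mean_tall_child = sum(c for _, c in tall) / len(tall)

print(f"estimated slope: {slope:.3f} (theory: 2/3)")
print(f"tall parents average deviation: {mean_tall_parent:.2f}")
print(f"their children average deviation: {mean_tall_child:.2f}")
```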

    It is very interesting that Galton missed the true meaning of mean reversion and yet came up with a device that demonstrates this principle with amazing clarity. Galton’s “Bean Machine”, also known as the “Galton Box”, consists of a vertical board with interleaved rows of pins. Balls are dropped from the top, and bounce left and right as they hit the pins. Eventually, they are collected into one-ball-wide bins at the bottom.
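
    The board is easy to simulate. A minimal sketch, assuming each pin sends the ball left or right with equal probability and that 12 rows and 10,000 balls are reasonable illustrative sizes (both are arbitrary choices):

```python
import random

def galton_board(rows, balls, seed=0):
    """Drop balls through `rows` of pins; the bin index is the number of right bounces."""
    rng = random.Random(seed)
    bins = [0] * (rows + 1)
    for _ in range(balls):
        position = sum(rng.random() < 0.5 for _ in range(rows))
        bins[position] += 1
    return bins

bins = galton_board(rows=12, balls=10000)
for i, count in enumerate(bins):
    # One '#' per 50 balls gives a rough text histogram.
    print(f"{i:2d} {'#' * (count // 50)}")
```

    The bin counts follow a binomial distribution, which piles up in the middle bins and looks increasingly bell-shaped as the number of rows grows.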

    Aside from vividly demonstrating the principle of regression to the mean, Galton’s Box provides an analogue proof that a normal mixture of normal distributions is itself normal! It, of course, was a stroke of genius. It was perhaps the most important breakthrough in statistics in the last half of the nineteenth century.

    Mean reversion theory has been used to create market trading strategies for many years. Typically, the trading algorithms that are based on mean reversion assume that prices and returns eventually move back towards their mean or average. This mean or average can be the historical average of the price or return or another relevant average such as the growth in the economy or the average return of an industry. This theory has led to many investing strategies involving the purchase or sale of stocks or other securities whose recent performance has greatly differed from their historical averages. However, a change in returns could be a sign that the company no longer has the same prospects it once did, in which case it is less likely that mean reversion will occur. Moreover, in the event of drastic market price moves caused by the “flocking behavior” or “spontaneous synchronization” discussed above, mean reversion might lead to significant losses, as that reversion might not occur for a long period of time.
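
    One common way to turn "recent performance has greatly differed from its historical average" into a rule is a rolling z-score. The sketch below is an assumption-heavy illustration, not a tested strategy: the trailing window of 20 bars, the entry threshold of 2 standard deviations, the function name `zscore_signals`, and the toy prices are all made up for the example.

```python
from statistics import mean, pstdev

def zscore_signals(prices, window=20, entry=2.0):
    """Flag prices far from their trailing mean, measured in trailing standard deviations."""
    signals = []
    for t in range(window, len(prices)):
        history = prices[t - window:t]         # trailing window, excludes bar t
        mu = mean(history)
        sd = pstdev(history)
        z = (prices[t] - mu) / sd if sd > 0 else 0.0
        if z > entry:
            action = "sell"                    # far above average: bet on a move down
        elif z < -entry:
            action = "buy"                     # far below average: bet on a move up
        else:
            action = "hold"
        signals.append((t, z, action))
    return signals

# Toy series: 20 bars oscillating around 100, then a spike and a drop.
prices = [99.0, 101.0] * 10 + [110.0, 100.0, 90.0]
signals = zscore_signals(prices)
for t, z, action in signals:
    print(f"bar {t}: z = {z:+.2f} -> {action}")
```

    Nothing in the rule knows whether the average itself has shifted, which is exactly the risk described above: a signal can stay "wrong" for as long as the reversion fails to happen.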
  9. As with any highly non-linear system, the further you move away from the operating point, the worse your model becomes. I'm skeptical that linear regression can consistently give you an edge (although I'm sure there are those who make it work; hats off to you, as I never found it useful).
    As a school project, I did some work in recurrent neural networks as predictors for non-linear time series. These are highly adaptive (they were re-trained after every data point) and actually modeled the time series (I grabbed some speech waveforms) and I always meant to go back and see what would happen with stock data.
    It was kind of a CPU hog and there was an art to training the networks, so I never pursued it.
    Anybody else ever use neural networks for this type of thing? (Hope this isn't too far off topic, so OP let me know and I'll move to another thread if necessary)
  10. I don't know but my linear regression model told me I would see a thread like this one day.
    #10     Nov 4, 2009