In the simplest case, you will predict the S&P 500 based on the history of the S&P 500 (close/volume). I am very interested in exploring combinations of different data predictors (either user-selected or automatically searched). Genetic programming essentially builds a program/expression/formula automatically to model or predict some problem. It is very open-ended in its applicability.
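To make "automatically builds an expression" a bit more concrete, here is a minimal sketch in Python. It is not the code behind my site: the function set, the terminal names `close` and `volume`, the toy data and all helper names are assumptions for illustration. It only shows the tree representation and fitness scoring, with plain random search standing in for the crossover and mutation steps that real genetic programming adds.

```python
import operator
import random

# Function set and terminal set (illustrative choices, not a real configuration).
FUNCTIONS = [(operator.add, "+"), (operator.sub, "-"), (operator.mul, "*")]
TERMINALS = ["close", "volume", 1.0]


def random_tree(depth, rng):
    """Grow a random expression tree as nested tuples (function, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    return (rng.choice(FUNCTIONS),
            random_tree(depth - 1, rng),
            random_tree(depth - 1, rng))


def evaluate(tree, row):
    """Evaluate a tree against one data row (a dict of predictor values)."""
    if isinstance(tree, tuple):
        (fn, _symbol), left, right = tree
        return fn(evaluate(left, row), evaluate(right, row))
    if isinstance(tree, str):
        return row[tree]      # named predictor
    return tree               # numeric constant


def to_string(tree):
    """Render a tree as a readable formula."""
    if isinstance(tree, tuple):
        (_fn, symbol), left, right = tree
        return f"({to_string(left)} {symbol} {to_string(right)})"
    return str(tree)


def fitness(tree, rows, targets):
    """Mean squared error of the tree's output against the targets (lower is better)."""
    errors = [(evaluate(tree, r) - t) ** 2 for r, t in zip(rows, targets)]
    return sum(errors) / len(errors)


# Toy, made-up data: the target happens to be close + volume.
rows = [
    {"close": 1.0, "volume": 2.0},
    {"close": 2.0, "volume": 1.0},
    {"close": 3.0, "volume": 4.0},
]
targets = [r["close"] + r["volume"] for r in rows]

# Random search over candidate formulas; real GP would evolve this population.
rng = random.Random(0)
population = [random_tree(3, rng) for _ in range(200)]
best = min(population, key=lambda t: fitness(t, rows, targets))
print(to_string(best), fitness(best, rows, targets))
```

Swapping in other predictors just means adding names to the terminal list, which is where the "combinations of different data predictors" idea comes in.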
Also, I try to summarize GP in a blog post beginning here: https://connect.gpsignals.com/2017/02/what-is-genetic-programming/ Thanks, Dave
Interesting that you mention close + volume. What volume do you use? Consolidated tape or primary-listing venue? Do you incorporate pre- and post-market volume? Do you reduce volume for cancelled trades? Understanding the source and validity of your data is VERY important.
The only volume data point I currently make available is part of https://www.quandl.com/data/YAHOO/INDEX_GSPC-S-P-500-Index (all of the data comes from Quandl at the moment, and references to the data source are included). I have not included individual listings yet.
Yes, but do you have any idea of the volume that they use? You are using garbage data for volume, which will change once you get a data source you understand. Just quoting the source as Quandl doesn't make it good. Let me give you a hint. Volume on Quandl for the S&P 500 for 2017-03-17 shows 5,178,040,000. The actual volume of the S&P 500 was nowhere near this! In case you can't work it out, Yahoo's S&P 500 volume is actually the total volume of the NYSE-listed shares (not including NYSE MKT/NYSE Arca). The volume data has nothing to do with the S&P 500! It's complete and utter garbage to publish a volume figure against an index when that figure has nothing to do with the index! Quandl, Yahoo and CSI should be ashamed.
Thanks, I will look into this further. If it is a reasonable proxy, it could be a valid predictor in any case.
Of course it's correlated. But it's still wrong. Some would argue that volume on an index is garbage anyway, since it weights all stocks equally, no matter their price or weighting in the index, and it rises disproportionately for trades in low-priced stocks. For example, consider the effect of AAPL's 7-for-1 stock split in 2014. This would have caused a significant increase in the share volume for no real change in the value traded.
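The arithmetic behind the split point can be sketched in a few lines of Python. The price and dollar turnover below are made-up round numbers, not actual AAPL figures, and the helper names are hypothetical; only the 7-for-1 ratio comes from the split itself.

```python
SPLIT_RATIO = 7  # AAPL's 2014 split was 7-for-1


def share_volume(dollar_turnover, price):
    """Number of shares traded for a given dollar value at a given price."""
    return dollar_turnover / price


def split_adjusted(pre_split_volume, ratio):
    """Restate a pre-split share count in post-split shares."""
    return pre_split_volume * ratio


# Hypothetical inputs chosen for easy arithmetic.
dollar_turnover = 7_000_000_000   # same value traded before and after the split
pre_price = 700.0                 # assumed pre-split price
post_price = pre_price / SPLIT_RATIO

pre_volume = share_volume(dollar_turnover, pre_price)
post_volume = share_volume(dollar_turnover, post_price)

print(f"pre-split shares traded:  {pre_volume:,.0f}")
print(f"post-split shares traded: {post_volume:,.0f}")
print(f"ratio: {post_volume / pre_volume:.1f}x for the same dollar turnover")
print(f"pre-split volume, split-adjusted: {split_adjusted(pre_volume, SPLIT_RATIO):,.0f}")
```

With the dollar value traded held constant, the raw share count jumps sevenfold at the split, so any summed share volume that is not split-adjusted (or not converted to dollar volume) picks up a step change that reflects no change in trading activity.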