Hi, In an effort to pick better trades, I set up an optimization for the estimated profit of each trade. Over a few thousand trades in my backtest, the correlation between model profit and actual profit is 0.18; applying the model forward in time, the correlation is 0.14. So the good thing is that my model is reasonably predictive of the future, though the magnitude of the correlation seems awfully low. Has anyone else done a similar analysis? Are those correlation values reasonably good considering how chaotic/unpredictable the stock market is? Thanks very much! Kubilai

I experimented with some simple neural network models a few years back, using data series ranging from a few hundred to a few thousand observations. Just like you, I computed the correlation between the out-of-sample predicted returns and the actual returns of the time series. My recollection is that I was getting correlations in the high teens as well. At the time I determined that those levels of correlation were worthless. When I tried to build systems using the model, they never achieved significant returns in the simulations. A while later, I read a general rule of thumb for determining whether a correlation is "useful" in Niederhoffer's Practical Speculation book. He wrote that when looking at correlations in markets, he multiplies the correlation coefficient by the number of observations, and if the result is greater than 10, he considers the correlation to be "useful". So I may look again at those models some day... and see if I can get anything out of them, since according to that rule of thumb my correlation could be "useful"... whatever that means.

Hmm, that's my feeling so far too: my trade-picking model isn't adding significantly to the system. I looked up Niederhoffer's rule of thumb too. It's surprising, though, that the formula isn't correlation * sqrt(sample size), similar to the formula for the standard error.
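The sqrt(sample size) intuition lines up with the standard significance test for a Pearson correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2). A minimal sketch of that calculation, assuming n = 3000 as a stand-in for "a few thousand trades" (the thread does not give the exact count):

```python
import math


def correlation_t_stat(r, n):
    """t-statistic for testing whether a Pearson correlation r over n pairs differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


n_trades = 3000  # assumption: the exact number of trades is not stated in the thread

for label, r in [("backtest", 0.18), ("forward", 0.14)]:
    t = correlation_t_stat(r, n_trades)
    print(f"{label}: r={r:.2f}  r*sqrt(n)={r * math.sqrt(n_trades):.1f}  t={t:.1f}")
```

With these assumed inputs both t-values come out far above the usual threshold of about 2, so the correlation is very unlikely to be zero. That says nothing about whether it is large enough to trade profitably.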

Kubilai, You said: --------------------------------------- In an effort to pick better trades, I set up an optimization for the estimated profit of each trade. Over a few thousand trades in my backtest, the correlation between model profit and actual profit is 0.18; applying the model forward in time, the correlation is 0.14. So the good thing is that my model is reasonably predictive of the future, though the magnitude of the correlation seems awfully low. Has anyone else done a similar analysis? Are those correlation values reasonably good considering how chaotic/unpredictable the stock market is? ------------------------------------------- In my experience those values are fairly low to trade on by themselves. However, there are some things you can do. First of all, what predictive methodologies are you using to generate the models? There are many: neural networks, decision trees, genetic algorithms, Bayesian methods, MARS, CART, agent swarms and so on. Second, what kind of application are you using? My best experience is with general-purpose predictive modeling applications that are typically used in the corporate world to model everything from direct mail to process control and fraud detection. Things like S-Plus, JMP and SPSS give much better results than the packages cobbled together by tiny companies and marketed to traders for trading system development. Those companies focus on cool marketing rather than on a real ability to create a working predictive model. They know most traders who enter the markets don't make money in the long term, so failure costs them little... most people don't succeed in the long term, be they chartists or modelers. There is not much potential for bad press, and there is always a stream of new prospects.
In contrast, if a general-purpose modeling application has bugs or gives poor results while monitoring a production process at, say, an oil refinery, you can have a really bad outcome that will percolate through the industry and hurt the software vendor. Have you tried ensemble modeling, which is creating multiple models and combining them, either directly or by voting? I typically get correlations about twice yours for the stock indexes using this approach. Jerry
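The two ways of combining models mentioned above, directly or by voting, can be sketched as follows. This is only an illustration: the function names and the forecast numbers are hypothetical, and "directly" is taken here to mean a plain average of the forecasts.

```python
def average_ensemble(predictions):
    """Combine models directly: mean of the per-model forecasts for each observation."""
    n_models = len(predictions)
    return [sum(column) / n_models for column in zip(*predictions)]


def vote_ensemble(predictions):
    """Combine by voting: +1 if most models forecast a gain, -1 if most forecast a loss, 0 on a tie."""
    votes = []
    for column in zip(*predictions):
        score = sum(1 if p > 0 else -1 if p < 0 else 0 for p in column)
        votes.append(1 if score > 0 else -1 if score < 0 else 0)
    return votes


# Hypothetical forecasts from three models (rows) for four observations (columns)
model_forecasts = [
    [0.5, -0.2, 0.1, -0.4],
    [0.3, 0.1, -0.1, -0.6],
    [0.4, -0.3, 0.2, 0.2],
]

averaged = average_ensemble(model_forecasts)
voted = vote_ensemble(model_forecasts)
print(averaged)
print(voted)
```

Averaging keeps the size of the forecast, while voting keeps only the direction, so which one fits depends on whether the trade strategy uses the magnitude.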

Hi Jerry, Thanks for chiming in, you have a lot more experience than I do here. My idea is pretty simple. Given a set of trades, I try to predict the profit for each trade using a set of input values such as the market cap of the stock, liquidity, etc. The predicted profit is a linear combination of the logs of the inputs, and my objective function is the least-squares error between the predicted and actual profit for the trades. All the code is written in Python and I use the SciPy package to do the math. Of the general-purpose modeling packages you mention, is there one that can be acquired fairly cheaply? Most of what you mentioned is pretty far over my head. What do you suggest that I learn for the next step? Thanks!
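For readers who want to see the shape of such a model, here is a minimal sketch of a least-squares fit of profit on the logs of the inputs. It is not the poster's actual code: it is kept dependency-free rather than using SciPy, it adds an intercept term (an assumption), and the market cap, liquidity and weight values are invented so that the fit can be checked against known weights.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_log_linear(inputs, profits):
    """Least-squares weights for profit ~ w0 + sum_k w_k * log(input_k), via the normal equations."""
    rows = [[1.0] + [math.log(v) for v in row] for row in inputs]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, profits)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights, row):
    """Predicted profit for one trade from its raw (un-logged) inputs."""
    return weights[0] + sum(w * math.log(v) for w, v in zip(weights[1:], row))


# Hypothetical data: (market cap, liquidity) per trade, with noise-free profits
# generated from known weights so the recovered weights can be compared.
true_weights = [1.0, 0.5, -0.25]
trade_inputs = [(1e9, 2e6), (5e8, 1e6), (2e10, 5e7), (3e9, 4e5), (8e8, 9e6)]
trade_profits = [predict(true_weights, row) for row in trade_inputs]

weights = fit_log_linear(trade_inputs, trade_profits)
print(weights)
```

On real trades the profits are noisy, so the fitted weights will not match any "true" values; the comparison here only confirms the fitting code is consistent.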

Kubilai, Hi Jerry, Thanks for chiming in, you have a lot more experience than I do here. My idea is pretty simple. Given a set of trades, I try to predict the profit for each trade using a set of input values such as the market cap of the stock, liquidity, etc. The predicted profit is a linear combination of the logs of the inputs, and my objective function is the least-squares error between the predicted and actual profit for the trades. ---------- What you are predicting is fairly difficult, in that it requires predicting the results of the entire trade strategy: entry, stop, exit. My approach is just to predict the change in market price between the current bar and some number of bars into the future. I then add a trade strategy after the predictive model has made its forecast. So my R2 is computed on the predicted change in price relative to the actual change. I suspect you'd get better results using a non-linear approach such as a neural network. Markets have strong non-linear characteristics. -------------- All the code is written in Python and I use the SciPy package to do the math. Of the general-purpose modeling packages you mention, is there one that can be acquired fairly cheaply? ------ Yes, RapidMiner is pretty good and it's free (open source). ------ Most of what you mentioned is pretty far over my head. What do you suggest that I learn for the next step? ------ Get a good textbook on data mining. Get something like RapidMiner. Functionally decompose your current approach into specific functions: entry signals, exit signals, time periods NOT to trade, and explore modeling them in turn. Then build a trading system out of these sub-solutions/models. You may find your current trading vastly improved just by modeling a better method for one of these sub-functions.
Also, be encouraged: a sophisticated model-based system can produce some pretty astounding profits compared to what can be had from traditional methods like charting or indicator trading. Jerry
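The approach described above, forecasting the price change some number of bars ahead and only then applying a trade strategy, can be sketched like this. The prices, the horizon of 2 bars, the threshold and the signal rule are all hypothetical; the post does not say which rule is layered on the forecast.

```python
def forward_changes(prices, horizon):
    """Price change from each bar to the bar `horizon` bars later (the modeling target)."""
    return [prices[i + horizon] - prices[i] for i in range(len(prices) - horizon)]


def signal_from_forecast(forecast, threshold):
    """Hypothetical trade rule applied after the forecast: +1 long, -1 short, 0 stay flat."""
    if forecast > threshold:
        return 1
    if forecast < -threshold:
        return -1
    return 0


closes = [100.0, 101.5, 101.0, 102.0, 103.5, 103.0]  # hypothetical closing prices
targets = forward_changes(closes, horizon=2)

# Stand-in forecasts; in practice these would come from the predictive model.
forecasts = [0.8, -0.1, 1.9, -1.2]
signals = [signal_from_forecast(f, threshold=0.5) for f in forecasts]
print(targets)
print(signals)
```

The last `horizon` bars have no target yet, which is why the target list is shorter than the price list.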

Kubilai, With a really good set of models, returns on total account in the neighborhood of 200 to 300% annually are being achieved in futures trading. Jerry

Thanks for the encouragement, Jerry. I can certainly see such returns being possible, though the capacity of such systems would be fairly low, correct? Strong market inefficiencies must be small/niche/limited in scope... Data mining has a bad name among traders, so you must have some preconditions for applying these techniques successfully to the trading world. The book I'm focusing on these days, Design, Testing and Optimization of Trading Systems, suggests that two conditions must be in place to ensure successful optimization: a logical basis for the strategy and out-of-sample testing. Do you apply these? Is anything else needed to ensure the resulting model is predictive of the future? Do you use data mining only for optimization, or is it useful for coming up with the initial strategy too? Cheers, Kubilai
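Out-of-sample testing in its simplest form means fitting on the earlier trades only and then checking the correlation on the later ones. A minimal sketch, assuming the records are (predicted, actual) profit pairs already sorted by time; the numbers and the split point are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def chronological_split(records, n_in_sample):
    """Split time-ordered records without shuffling: earlier part to fit on, later part to test on."""
    return records[:n_in_sample], records[n_in_sample:]


# Hypothetical (predicted profit, actual profit) pairs in time order
trades = [
    (0.4, 1.2), (-0.1, -0.6), (0.3, -0.2), (0.2, 0.9), (-0.3, -1.1),
    (0.1, 0.4), (0.5, 0.3), (-0.2, 0.5), (0.3, 0.7), (-0.4, -0.2),
]

in_sample, out_of_sample = chronological_split(trades, n_in_sample=7)
for label, part in [("in-sample", in_sample), ("out-of-sample", out_of_sample)]:
    predicted = [p for p, _ in part]
    actual = [a for _, a in part]
    print(f"{label}: n={len(part)}  correlation={pearson(predicted, actual):.2f}")
```

A large drop from the in-sample to the out-of-sample correlation is the usual warning sign that the optimization has fitted noise.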