Do Neural Networks overfit less than Decision Trees?

Discussion in 'Strategy Building' started by damien.dualsigma, Oct 28, 2017.

  1. If provided with good training data, and not overtrained, are neural networks capable of “muting” features that are not statistically relevant? If so, they would stand a better chance once tried out of sample, whereas statistically irrelevant features corrupt the statistical basis of decision trees (sketched below).
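    A minimal sketch of that premise, using scikit-learn (the thread names no library, and every parameter here is illustrative): train an unconstrained decision tree on data where most features are noise and inspect how much importance it assigns to them.

        from sklearn.datasets import make_classification
        from sklearn.tree import DecisionTreeClassifier

        # 2 informative features, 8 pure-noise features
        X, y = make_classification(n_samples=2000, n_features=10,
                                   n_informative=2, n_redundant=0,
                                   random_state=0)

        tree = DecisionTreeClassifier(random_state=0).fit(X, y)
        print(tree.feature_importances_.round(3))
        # An unpruned tree typically assigns nonzero importance to the
        # noise features, splitting on chance patterns in them.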
     
  2. ET180

    A neural network is a function approximator. Given enough complexity (layers, hidden nodes), a neural network can fit any function. Technically, a decision tree can too, if allowed a possibly infinite number of decision nodes, but it's much harder. Therefore, it's easier to curve-fit a neural network than a decision tree. If you have enough data and a good cross-validation scheme, you should be able to eliminate most of the noise from the learning algorithm.
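    A minimal sketch of that cross-validation idea, using scikit-learn (an assumption; the post names no library). Scores that look good on the training folds but collapse on the held-out folds are the signature of curve fitting.

        from sklearn.datasets import make_classification
        from sklearn.model_selection import cross_val_score
        from sklearn.neural_network import MLPClassifier
        from sklearn.tree import DecisionTreeClassifier

        X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

        # Compare a small tree and a small net on the same 5-fold split.
        for name, model in [("tree", DecisionTreeClassifier(max_depth=4)),
                            ("net", MLPClassifier(hidden_layer_sizes=(32,),
                                                  max_iter=1000, random_state=0))]:
            scores = cross_val_score(model, X, y, cv=5)
            print(name, round(scores.mean(), 3), "+/-", round(scores.std(), 3))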
     
  3. What about the overfitting risk though?
     
  4. sle

    Are you using a neural network to solve for your factor weights, or to find the actual alpha factors? In the former case, the overfit risk is minimal; in the latter, it's pretty high (and will require you to iterate through in/out-of-sample tests to find anything stable).
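    A hedged sketch of that distinction; every name and number here is invented for illustration. In the first case the factors are given and the model only learns a handful of weights; in the second the model must discover structure in wide, noisy raw data.

        import numpy as np
        from sklearn.linear_model import Ridge
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(0)
        factors = rng.normal(size=(500, 5))    # pre-defined alpha factors
        raw = rng.normal(size=(500, 100))      # raw inputs, mostly noise
        returns = factors @ np.array([0.5, -0.3, 0.2, 0.0, 0.1]) \
                  + rng.normal(scale=0.5, size=500)

        # Case 1: solve for factor weights -- small, well-posed, low overfit risk.
        weights_model = Ridge().fit(factors, returns)

        # Case 2: hunt for the alpha itself in raw data -- many parameters,
        # little signal, hence the in/out-of-sample iteration described above.
        alpha_model = MLPRegressor(hidden_layer_sizes=(64, 64),
                                   max_iter=500).fit(raw, returns)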
     
  5. Thanks. I am referring to actual alpha factors. Indeed, high risk, but is it higher than with decision trees / SVM / KNN? And is there a difference between regression and classification with respect to overfitting risk?
     
  6. sle

    It's a tricky topic. Personally, I don't touch almost any ML-derived "strategies" with a vaulter's pole, simply because I don't like trading without a strong prior hypothesis (I have looked at some NLP-based stuff, but that doesn't really count). My sense is that, because of the non-linear nature of NN learning, you should have higher overfit risk, but I don't have anything to base that on.
     
  7. In theory, no. Neural nets and decision trees are equally capable of overfitting. However, in practice, most popular decision tree packages (e.g., xgboost) have a good deal of regularization/shrinkage built in by default. Popular NN packages do not (at least not by default).

    Therefore, in actual practice, you're significantly less likely to overfit with decision trees.
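    A minimal sketch of that point, assuming the xgboost and Keras packages (the post mentions only xgboost). XGBoost's booster defaults include shrinkage (eta = 0.3) and an L2 penalty on leaf weights (lambda = 1.0), so regularization is active with no arguments at all; a plain Keras stack has none unless you add it.

        from xgboost import XGBClassifier
        from tensorflow import keras

        # Regularization/shrinkage on by default:
        xgb = XGBClassifier()

        # No penalty, no dropout, no early stopping unless you ask:
        net = keras.Sequential([
            keras.layers.Dense(64, activation="relu"),
            keras.layers.Dense(1, activation="sigmoid"),
        ])
        net.compile(optimizer="adam", loss="binary_crossentropy")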
     
  8. gon

    You can use, for instance, dropout layers within your neural network to avoid overfitting.

    However, if you have sufficient data, you should also be able to hold out a random subset of it and avoid overfitting.

    I think that for time series, if you train your system on 1,000 past days, applying dropout during that training (say, with a keep factor of 0.8), and then test the model on 2,000 future days, your model's total accuracy over those 2,000 days won't be overfitted (sketched after the link below).

    https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/dropout_layer.html
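    A minimal Keras sketch of that setup (the post names no library, so the package choice is an assumption). Note that Keras's Dropout takes the fraction dropped, so a keep factor of 0.8 corresponds to Dropout(0.2).

        import numpy as np
        from tensorflow import keras

        # Placeholder data: 10 daily features, binary up/down label.
        X = np.random.randn(3000, 10)
        y = (np.random.randn(3000) > 0).astype(int)
        X_train, y_train = X[:1000], y[:1000]    # 1000 past days
        X_test, y_test = X[1000:], y[1000:]      # 2000 future days

        model = keras.Sequential([
            keras.layers.Dense(32, activation="relu"),
            keras.layers.Dropout(0.2),           # keep probability 0.8
            keras.layers.Dense(1, activation="sigmoid"),
        ])
        model.compile(optimizer="adam", loss="binary_crossentropy",
                      metrics=["accuracy"])
        model.fit(X_train, y_train, epochs=10, verbose=0)
        print(model.evaluate(X_test, y_test, verbose=0))  # out-of-sample loss/accuracy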
     
  9. Thank you! Good, helpful reply.
    Any chance this can be implemented in MATLAB?
    Or is Python more advanced and required for this level of detail?
     
  10. gon

    I don't use MATLAB, nor do I know how it works. I have mainly used R and Python for machine learning. I don't know, but why NNs? Are you using convolutional networks to analyze charts or something?

    Random forests are much easier to configure and perform very well; why don't you give them a try? If you get a good model with one, you can then move on to NNs, which are a pain in the ass if you are not used to them.

    Random forests also use bagging to prevent overfitting, and you should always split your data into train and test subsets. Even better, use train, test, and a third set of data to audit your model after you have decided that it works fine with your train and test data (see the sketch below).
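    A minimal scikit-learn sketch of that train/test/audit split (the library choice is an assumption; the post mentions R and Python generally). Bagging comes built in: each tree in the forest is fit on a bootstrap sample of the rows.

        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split

        X, y = make_classification(n_samples=3000, random_state=0)

        # Carve out the audit set first, then split the remainder.
        X_rest, X_audit, y_rest, y_audit = train_test_split(
            X, y, test_size=0.2, random_state=0)
        X_train, X_test, y_train, y_test = train_test_split(
            X_rest, y_rest, test_size=0.25, random_state=0)

        rf = RandomForestClassifier(n_estimators=200, random_state=0)
        rf.fit(X_train, y_train)
        print("test:", rf.score(X_test, y_test))
        # Touch the audit set only once, after settling on the model.
        print("audit:", rf.score(X_audit, y_audit))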

    The algorithms are generic, so the language you use is not relevant as long as your data fits on one computer and you do not need parallel processing across a data lake or something like that.

    A last word.

    Knowing how to configure a model's hyperparameters is only enough to start walking with it. But IMO, you need a minimum grounding in statistical methodology before trying to go into machine learning.

    Check the table of contents of this book and consider whether you know all of that, or are willing to learn it, before going into business with statistics:

    https://www.crcpress.com/Introducti...ition/Watt-McCleery-Hart/p/book/9781584886525
     