Ok. By patterns, I didn't just mean visual patterns. But I understand. Still, I'm not a fan of the NN architecture: it forces formulas onto data that is not necessarily rooted in formulas. For example:

input .... output
1 ........ 1
2 ........ 4
3 ........ 9

A formula can be found, and it would fit. But if the inputs are essentially random numbers and the outputs also seem to be random:

2343143 ........ 3452452
456745674 ...... 6789789
2342 ........... 9869
2343143 ........ 3452452
456745674 ...... 6789789
2342 ........... 9869
2343143 ........ 3452452
456745674 ...... 6789789
2342 ........... 9869

there is no underlying formula ... but there is an obvious pattern. A NN will generate a formula even for lottery numbers. It will complicate the latter example by generating a network/formula that merely 'fits' the data.
So, what is the author of that paper doing today? https://www.google.com/search?q=Gen...er&aqs=chrome..69i57&sourceid=chrome&ie=UTF-8 "Those who can, do. Those who can't, teach."
I see what you mean. More than finding a formula, what a NN does is weight the coefficients to adjust the triggering of the activation functions so as to minimize the loss function of the prediction. Isn't that similar to finding k-NNs based upon weighted Euclidean distances? I mean, the number of neurons within each layer is fixed and the activation functions are also preconfigured. Regards.
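The analogy in the question can be made concrete: a weighted Euclidean distance scales each input dimension before comparing points, somewhat like learned coefficients scaling inputs before an activation fires. A toy sketch, with made-up points and weights:

```python
import math

def weighted_euclidean(a, b, w):
    """Distance between vectors a and b with per-dimension weights w."""
    return math.sqrt(sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w)))

# Made-up example: the second dimension counts four times as much.
print(weighted_euclidean((1.0, 2.0), (4.0, 6.0), (1.0, 4.0)))
```

With all weights equal to 1 this reduces to the ordinary Euclidean distance.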
Not at all. Using the same data:

2343143 ........ 3452452
456745674 ...... 6789789
2342 ........... 9869
2343143 ........ 3452452
456745674 ...... 6789789
2342 ........... 9869
2343143 ........ 3452452
456745674 ...... 6789789
2342 ........... 9869

Suppose we have a new input of 2343144. The kNN will simply output 3452452. It won't try to figure out a formula that matches all of the data and then plug this new input into it. And whatever the correct output turns out to be, it is then assimilated into the ever-growing data set. These are simple examples; it gets more complicated, and in my custom implementation, proprietary.
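A minimal sketch of the lookup described above, using the post's pairs and query value (the `predict` function name is mine, and k = 1 is assumed):

```python
# 1-nearest-neighbour lookup (k = 1) over the post's example data.
# No formula is fitted; the model IS the stored data set.

data = [
    (2343143, 3452452),
    (456745674, 6789789),
    (2342, 9869),
] * 3  # the post repeats the same three pairs

def predict(x, pairs):
    """Return the output of the stored pair whose input is closest to x."""
    nearest = min(pairs, key=lambda p: abs(p[0] - x))
    return nearest[1]

print(predict(2343144, data))  # nearest stored input is 2343143 -> 3452452

# A new observation is simply assimilated into the ever-growing data set:
data.append((2343144, 3452452))
```

Nothing is generalized here: prediction is pure lookup, and learning is pure accumulation.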
After tutoring HS kids in Math and Science during summer breaks; and after debating/explaining things in forums like this one; and after reading 1,000's of posts in the same forums ... I am absolutely clueless as to wth is going on in some human brains.
Broadly, neither technique is more likely to overfit than the other. Overfitting (and underfitting, for that matter) may be avoided in any type of empirical model by appropriately establishing model complexity. This is usually tested via error resampling (holdout testing, cross-validation or bootstrapping, for instance). In the case of feedforward neural networks, complexity is typically controlled through early stopping of training or varying the number of hidden nodes. Growth of decision trees is most often constrained via pruning, small sample adjustment of test probabilities or requiring a minimum number of samples for splitting.
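The complexity-control idea above can be sketched with a holdout test in plain Python, using k itself as the complexity knob of a k-NN regressor (the data set is synthetic, made up for illustration):

```python
import random

# Holdout testing to choose a complexity parameter: here k for a
# k-NN regressor on a noisy 1-D problem (synthetic data).
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [x * x + random.gauss(0, 5) for x in xs]  # quadratic signal + noise

split = 150
train = list(zip(xs[:split], ys[:split]))
test = list(zip(xs[split:], ys[split:]))

def knn_predict(x, pairs, k):
    """Average the outputs of the k stored points nearest to x."""
    nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
    return sum(p[1] for p in nearest) / k

def mse(pairs, train_pairs, k):
    """Mean squared error of k-NN predictions on a held-out set."""
    return sum((knn_predict(x, train_pairs, k) - y) ** 2
               for x, y in pairs) / len(pairs)

# Small k chases the noise (high variance); very large k oversmooths
# (high bias). Holdout error exposes both failure modes.
for k in (1, 5, 25, 100):
    print(k, round(mse(test, train, k), 1))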
This is true when k = 1, but such models can be extremely noisy: when an input value changes only slightly, the prediction can swing wildly.