Do Neural Networks overfit less than Decision Trees?

Discussion in 'Strategy Building' started by damien.dualsigma, Oct 28, 2017.

  1. userque

    Hi PredictorY,

    Yes, for this specific example--an example where there is no non-random relation between the input and the output.

    (Regarding market data) What is noise? And how do you differentiate "noise" from error? On what basis do you measure "noise" such that you can compare it with the "noise" produced by a different, supposedly 'non-noisy' algorithm? Suppose there is no noise, and what you've concluded to be noise is simply the result of external, random factors causing the forecasts to appear "noisy"?

    I'm not sure what model you're looking at that shows such "wild" swings, but perhaps that model needs fixing, rather than concluding so generally that those results apply to all kNN models. How do you know that the forecasts should not swing "wildly" with a slight change in an input? Shouldn't a model of, say, the function y = x^99 swing wildly with a slight change in x?
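
    To make that concrete, here's a throwaway numeric check (mine, for illustration): a 10% change in the input moves the output of y = x^99 by a factor of roughly 12,500.

    Code:
    # y = x**99 swings enormously with a slight change in x
    for x in (1.00, 1.05, 1.10):
        print(x, x**99)
    # 1.00 -> 1.0
    # 1.05 -> ~125.2
    # 1.10 -> ~12,527.8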
     
    Last edited: Mar 31, 2018
    #41     Mar 31, 2018
  2. Sorry, I should have been more specific. I was trying to make the point that this assertion of yours:

    "Suppose we have a new input of 2343144. The kNN will simply output 3452452. It won't try to figure out a formula to match all of the data, and then plug this new input into it."

    ...is strictly only true when k=1. With any other value of k, a k-nearest neighbor model must somehow combine the dependent-variable values of the k selected historic cases. In most cases this combination is an averaging or a voting; regardless, the generated prediction is typically not a value drawn directly from the training examples. Such smoothing (averaging, voting, etc.), although not an explicit "formula," as you say, is in fact an interpolation among the training cases.
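
    A minimal sketch of that point in scikit-learn (the data is made up for illustration): with k=1 the prediction is a stored training target, while with k=3 it is a mean that appears nowhere in the training set.

    Code:
    import numpy as np
    from sklearn.neighbors import KNeighborsRegressor

    # Illustrative training data: four cases with distinct targets.
    X = np.array([[1.0], [2.0], [3.0], [4.0]])
    y = np.array([10.0, 20.0, 40.0, 80.0])

    # k=1: the prediction is a value drawn directly from the training targets.
    knn1 = KNeighborsRegressor(n_neighbors=1).fit(X, y)
    print(knn1.predict([[2.4]]))  # [20.] -- the stored target of the nearest case

    # k=3: the prediction is the mean of the three nearest targets
    # (20, 40, 10), i.e. ~23.33 -- an interpolation, not a lookup.
    knn3 = KNeighborsRegressor(n_neighbors=3).fit(X, y)
    print(knn3.predict([[2.4]]))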
     
    #42     Jun 1, 2018
  3. I would say rather that "a neural network can be badly trained to 'predict' all sorts of random stuff." A competent analyst, however, would easily see that this is happening via error resampling (holdout testing, k-fold cross-validation, etc.). An important part of the analyst's job is to rigorously validate the constructed models.
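
    As a sketch of what that validation catches (scikit-learn; the pure-noise data is my assumption for illustration): an in-sample fit on random data can look fine, but the k-fold cross-validated score hovers around zero or below, exposing that nothing generalizable was learned.

    Code:
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))   # random inputs
    y = rng.normal(size=200)        # random targets: no real relationship

    model = MLPRegressor(hidden_layer_sizes=(50,), max_iter=2000, random_state=0)

    # In-sample R^2 can look deceptively good...
    print(model.fit(X, y).score(X, y))

    # ...but the 5-fold cross-validated R^2 sits near zero or negative,
    # revealing that the "predictions" are just memorized noise.
    print(cross_val_score(model, X, y, cv=5).mean())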
     
    #43     Jun 1, 2018
  4. userque

    Typically (not strictly) true only when k=1. Yes.

    Additionally, the 'averaging' in a kNN keeps its outputs tied to the training data: an (unweighted) kNN prediction can never fall outside the range of its neighbors' training targets, whereas NNs can produce outputs far exceeding the bounds of the training data.
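
    A quick sketch of that bound (scikit-learn; the linear toy data is my assumption): asked to extrapolate far outside the training range, the kNN stays pinned near the largest training target, while even a small neural net can run well past it.

    Code:
    import numpy as np
    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.neural_network import MLPRegressor

    # Made-up linear data on [0, 10]; training targets top out at 30.
    X = np.linspace(0, 10, 50).reshape(-1, 1)
    y = 3.0 * X.ravel()

    knn = KNeighborsRegressor(n_neighbors=3).fit(X, y)
    nn = MLPRegressor(hidden_layer_sizes=(20,), max_iter=5000,
                      random_state=0).fit(X, y)

    # Query far outside the training range.
    print(knn.predict([[100.0]]))  # ~29: the mean of the nearest training targets
    print(nn.predict([[100.0]]))   # typically far beyond 30 (exact value depends on the fit)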
     
    #44     Jun 2, 2018
  5. #45     Jun 2, 2018
  6. userque

    Here's what I think is going on:

    Something similar to the link, but going forward rather than training on past data. (There is still a little training with recent (but still 'past') data, imo.)

    https://en.wikipedia.org/wiki/Ensemble_learning#Bucket_of_models

    I've briefly built and tested this type of setup before.

    They run multiple systems going forward, and use some algorithm (they call it 'Hive') to determine which system to follow currently. A 'follow the recent leader' sort of thing. Past data is probably only used going back far enough to break ties.

    IMO, it is still optimizing, but much, much less so than a NN. Therefore, it is much, much less likely to overfit.

    However, I suspect that their custom adaptive indicators are actually optimized to a much greater degree. But since one or more of their parameters are left unset and included in 'the bucket,' overfitting is still much less likely than with a traditional NN.
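
    A minimal sketch of that 'follow the recent leader' idea (the window length, scoring rule, and names here are my assumptions, not their actual 'Hive' algorithm): score each system over a recent window, follow the leader, and look further back only to break ties.

    Code:
    import numpy as np

    def pick_leader(returns: np.ndarray, window: int = 20) -> int:
        """Bucket-of-models selector: follow the recent leader.

        `returns` has shape (n_systems, n_days); each row is one
        system's daily returns. Scores each system over the last
        `window` days and returns the index of the current leader.
        Ties extend the lookback until they break.
        """
        n_days = returns.shape[1]
        window = min(window, n_days)
        while window <= n_days:
            scores = returns[:, -window:].sum(axis=1)
            leaders = np.flatnonzero(scores == scores.max())
            if len(leaders) == 1:
                return int(leaders[0])
            window += 1  # extend the lookback to break the tie
        return int(leaders[0])  # tied over all history: pick arbitrarily

    The only thing being 'optimized' day to day is which row wins over a short window, which is the sense in which this setup has far fewer degrees of freedom to overfit than a trained NN.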
     
    Last edited: Jun 3, 2018
    #46     Jun 3, 2018
  7. TY userque for taking a look at this.

    ES

     
    #47     Jun 3, 2018