Do Neural Networks overfit less than Decision Trees?

Discussion in 'Strategy Building' started by damien.dualsigma, Oct 28, 2017.

  1. userque

    Hi PredictorY,

    Yes, for this specific example--an example where there is no non-random relation between the input and the output.

    (Regarding market data) What is noise? And how do you differentiate "noise" from error? On what basis do you measure "noise" such that you can compare it with the "noise" produced by a different, supposedly 'non-noisy' algorithm? Suppose there is no noise, and what you've concluded to be noise is simply the result of external, random factors causing the forecasts to appear "noisy"?

    I'm not sure what model you're looking at that shows such "wild" swings, but perhaps that model needs fixing, rather than concluding so generally that those results apply to all kNN models. How do you know that the forecasts should not swing "wildly" with a slight change in an input? Shouldn't a model of, say, the function y = x^99 swing wildly with a slight change in x?
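
    To make that concrete, here's a throwaway numeric check (mine, for illustration): a 10% change in the input moves the output of y = x^99 by a factor of roughly 12,500.

    Code:
    # y = x**99 swings enormously with a slight change in x
    for x in (1.00, 1.05, 1.10):
        print(x, x**99)
    # 1.00 -> 1.0
    # 1.05 -> ~125.2
    # 1.10 -> ~12,527.8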
     
    Last edited: Mar 31, 2018
    #41     Mar 31, 2018
  2. Sorry, I should have been more specific. I was trying to make the point that this assertion of yours:

    "Suppose we have a new input of 2343144. The kNN will simply output 3452452. It won't try to figure out a formula to match all of the data, and then plug this new input into it."

    ...is strictly only true when k=1. With any other value of k, a k-nearest neighbor model must somehow combine the dependent-variable values of the k selected historic cases. In most cases this combination is an averaging or a voting; regardless, the generated prediction is typically not a value drawn directly from the training examples. Such smoothing (averaging, voting, etc.), although not an explicit "formula," as you say, is in fact an interpolation among the training cases.
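
    A minimal sketch of that point in scikit-learn (the data is made up for illustration): with k=1 the prediction is a stored training target, while with k=3 it is a mean that appears nowhere in the training set.

    Code:
    import numpy as np
    from sklearn.neighbors import KNeighborsRegressor

    # Illustrative training data: four cases with distinct targets.
    X = np.array([[1.0], [2.0], [3.0], [4.0]])
    y = np.array([10.0, 20.0, 40.0, 80.0])

    # k=1: the prediction is a value drawn directly from the training targets.
    knn1 = KNeighborsRegressor(n_neighbors=1).fit(X, y)
    print(knn1.predict([[2.4]]))  # [20.] -- the stored target of the nearest case

    # k=3: the prediction is the mean of the three nearest targets
    # (20, 40, 10), i.e. ~23.33 -- an interpolation, not a lookup.
    knn3 = KNeighborsRegressor(n_neighbors=3).fit(X, y)
    print(knn3.predict([[2.4]]))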
     
    #42     Jun 1, 2018
  3. I would say rather that "a neural network can be badly trained to 'predict' all sorts of random stuff." A competent analyst, however, would easily see that this is happening via error resampling (holdout testing, k-fold cross-validation, etc.). An important part of the analyst's job is to rigorously validate the constructed models.
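
    As a sketch of what that validation catches (scikit-learn; the pure-noise data is my assumption for illustration): an in-sample fit on random data can look fine, but the k-fold cross-validated score hovers around zero or below, exposing that nothing generalizable was learned.

    Code:
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))   # random inputs
    y = rng.normal(size=200)        # random targets: no real relationship

    model = MLPRegressor(hidden_layer_sizes=(50,), max_iter=2000, random_state=0)

    # In-sample R^2 can look deceptively good...
    print(model.fit(X, y).score(X, y))

    # ...but the 5-fold cross-validated R^2 sits near zero or negative,
    # revealing that the "predictions" are just memorized noise.
    print(cross_val_score(model, X, y, cv=5).mean())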
     
    #43     Jun 1, 2018
  4. userque

    Typically (not strictly) true only when k=1. Yes.

    Additionally, the 'averaging' in a kNN keeps its outputs tied to the training data: an (unweighted) kNN prediction can never fall outside the range of its neighbors' training targets, whereas NNs can produce outputs far exceeding the bounds of the training data.
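
    A quick sketch of that bound (scikit-learn; the linear toy data is my assumption): asked to extrapolate far outside the training range, the kNN stays pinned near the largest training target, while even a small neural net can run well past it.

    Code:
    import numpy as np
    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.neural_network import MLPRegressor

    # Made-up linear data on [0, 10]; training targets top out at 30.
    X = np.linspace(0, 10, 50).reshape(-1, 1)
    y = 3.0 * X.ravel()

    knn = KNeighborsRegressor(n_neighbors=3).fit(X, y)
    nn = MLPRegressor(hidden_layer_sizes=(20,), max_iter=5000,
                      random_state=0).fit(X, y)

    # Query far outside the training range.
    print(knn.predict([[100.0]]))  # ~29: the mean of the nearest training targets
    print(nn.predict([[100.0]]))   # typically far beyond 30 (exact value depends on the fit)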
     
    #44     Jun 2, 2018
  5. #45     Jun 2, 2018
  6. userque

    Here's what I think is going on:

    Something similar to the link, but going forward rather than training on past data. (There is still a little training with recent (but still 'past') data, imo.)

    https://en.wikipedia.org/wiki/Ensemble_learning#Bucket_of_models

    I've briefly built and tested this type of setup before.

    They run multiple systems going forward, and use some algorithm (they call it 'Hive') to determine which system to follow currently. A 'follow the recent leader' sort of thing. Past data is probably only used going back far enough to break ties.

    IMO, it is still optimizing, but much, much less so than a NN. Therefore, it is much, much less likely to overfit.

    However, I suspect that their custom adaptive indicators are actually optimized to a much greater degree. But since one or more of their parameters are left unset and included in 'the bucket,' overfitting is still much less likely than with a traditional NN.
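
    A minimal sketch of that 'follow the recent leader' idea (the window length, scoring rule, and names here are my assumptions, not their actual 'Hive' algorithm): score each system over a recent window, follow the leader, and look further back only to break ties.

    Code:
    import numpy as np

    def pick_leader(returns: np.ndarray, window: int = 20) -> int:
        """Bucket-of-models selector: follow the recent leader.

        `returns` has shape (n_systems, n_days); each row is one
        system's daily returns. Scores each system over the last
        `window` days and returns the index of the current leader.
        Ties extend the lookback until they break.
        """
        n_days = returns.shape[1]
        window = min(window, n_days)
        while window <= n_days:
            scores = returns[:, -window:].sum(axis=1)
            leaders = np.flatnonzero(scores == scores.max())
            if len(leaders) == 1:
                return int(leaders[0])
            window += 1  # extend the lookback to break the tie
        return int(leaders[0])  # tied over all history: pick arbitrarily

    The only thing being 'optimized' day to day is which row wins over a short window, which is the sense in which this setup has far fewer degrees of freedom to overfit than a trained NN.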
     
    Last edited: Jun 3, 2018
    #46     Jun 3, 2018
  7. TY userque for taking a look at this.

    ES

     
    #47     Jun 3, 2018