Recommendations on time-series price prediction models?

Discussion in 'Automated Trading' started by jublin, Apr 5, 2021.

  1. userque

    [Attached image: upload_2021-4-21_22-26-39.png]
    Ok, stopped it here. Here's what it came up with:
    Code:
    0.100239*(row-(1.87231*tan(cos(sqrt(5.69164+row)+0.832774)/0.662045)+atanh(cos(-250.997*row-0.00765016))+1.89037*atanh(cos(-0.2448*row))))+56.9427
    Max. complexity left at default setting: 60
    No validation.

    Next, I'll see what the forecast looks like, and compare it to actual data.
    I'm expecting horrible results.

    Later, I'll try it with validation, and then with the model limited to whatever you use (a few sine or cosine waves?), both with and without validation.
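
    For anyone who wants to evaluate the expression above outside the tool, here is a rough Python transcription. It is only a sketch: the function name `evolved_model` is made up for illustration, and it assumes `row` is a bar index starting at 1 (at `row = 0` the last term becomes atanh(1), which is undefined).

```python
import math

def evolved_model(row):
    """Evaluate the evolved expression from the post at a given row number."""
    # tan(cos(sqrt(5.69164 + row) + 0.832774) / 0.662045)
    tan_term = math.tan(math.cos(math.sqrt(5.69164 + row) + 0.832774) / 0.662045)
    # atanh(cos(...)) terms: one very fast oscillation, one slow oscillation
    fast_wave = math.atanh(math.cos(-250.997 * row - 0.00765016))
    slow_wave = math.atanh(math.cos(-0.2448 * row))
    correction = 1.87231 * tan_term + fast_wave + 1.89037 * slow_wave
    return 0.100239 * (row - correction) + 56.9427

# Assumption: rows are numbered from 1; row 0 would raise a math domain error.
for row in (1, 50, 100):
    print(row, round(evolved_model(row), 2))
```

    The atanh(cos(...)) terms blow up whenever the cosine gets close to 1 or -1, which is one reason an expression like this can behave badly out of sample.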
     
    #41     Apr 21, 2021
  2. userque

    Actual in blue this time, out-of-sample forecast in orange.

    [Attached image: upload_2021-4-21_22-42-17.png]

    Code:
    CLOSE  FORECAST  ERROR
    66.68  66.48     0.20
    66.41  66.83     0.42
    66.53  66.51     0.02
    66.86  66.44     0.42
    66.99  66.40     0.59
    67.12  66.36     0.76
    67.25  66.15     1.10
    66.67  66.41     0.26
    67.12  66.53     0.59
    67.44  66.61     0.83
    68.00  66.68     1.32
    67.91  66.73     1.18
    67.83  66.76     1.07
    67.74  66.71     1.03
    TOTAL ERROR= 9.80
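
    The ERROR column above is the absolute difference between CLOSE and FORECAST. A small sketch that recomputes it from the listed values follows; the variable names are illustrative. Summing the two-decimal values as printed gives 9.79, so the posted 9.80 presumably comes from unrounded data.

```python
# (close, forecast) pairs copied from the table in the post
rows = [
    (66.68, 66.48), (66.41, 66.83), (66.53, 66.51), (66.86, 66.44),
    (66.99, 66.40), (67.12, 66.36), (67.25, 66.15), (66.67, 66.41),
    (67.12, 66.53), (67.44, 66.61), (68.00, 66.68), (67.91, 66.73),
    (67.83, 66.76), (67.74, 66.71),
]

# Absolute error per bar, then the total and the mean absolute error
abs_errors = [abs(close - forecast) for close, forecast in rows]
total_error = sum(abs_errors)
mean_abs_error = total_error / len(rows)

print(f"total error = {total_error:.2f}, mean absolute error = {mean_abs_error:.2f}")
```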
    
    Your fitted data had only 101 rows, while the seen and unseen data total 103. Since your data included only the close, I wasn't sure how to line it up, so I left it out of the chart.

    *** My Apologies to the OP! I've moved my discussion to:
    https://www.elitetrader.com/et/thre...symbolic-regression-model-experiments.357998/
     
    Last edited: Apr 22, 2021
    #42     Apr 21, 2021
  3. ph1l

    Thanks for posting this.

    I agree that genetic programming that creates models like these (mine too) probably won't extrapolate well. It's just too easy for the model to overfit. The parabolic trend plus the sum of a few sinusoids method is harder to overfit with and has some theory behind it (prices oscillate to form a channel around a trend).
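
    As a rough illustration of the model form mentioned above (a parabolic trend plus a sum of a few sinusoids), here is a minimal sketch. The parameterization by amplitude, period, and phase and the name `trend_plus_sinusoids` are assumptions for illustration, not necessarily the exact form used in the thread.

```python
import math

def trend_plus_sinusoids(t, a, b, c, waves):
    """Parabolic trend a*t^2 + b*t + c plus a sum of cosine waves.

    waves is a list of (amplitude, period, phase) tuples; these names are
    illustrative and not taken from the thread.
    """
    trend = a * t * t + b * t + c
    cycles = sum(
        amplitude * math.cos(2 * math.pi * t / period + phase)
        for amplitude, period, phase in waves
    )
    return trend + cycles

# Hypothetical parameters: a gentle upward parabola with two cycles
waves = [(0.5, 20.0, 0.0), (0.2, 7.0, 1.0)]
fitted = [trend_plus_sinusoids(t, 0.0005, 0.05, 60.0, waves) for t in range(101)]
forecast = [trend_plus_sinusoids(t, 0.0005, 0.05, 60.0, waves) for t in range(101, 115)]
```

    With only three trend coefficients and three numbers per wave, the model has far fewer degrees of freedom than a free-form evolved expression, which is the sense in which it is harder to overfit.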
     
    #43     Apr 22, 2021
  4. userque

    BTW, I'll be updating this at:
    https://www.elitetrader.com/et/thre...symbolic-regression-model-experiments.357998/

    Don't want to trample all over OP's thread. I just realized that the only input I can give it is the row number, not additional features like I planned. If I did give it features, then I wouldn't be able to forecast more than one bar ahead (with random access into any bar in the future), so I see why you did it that way now. Plus, it reduces the overfitting potential.

    Currently running the algo using the last 20% of the seen data to validate.
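
    Holding out the last 20% of the seen data for validation can be sketched as below. The helper name `split_tail` is hypothetical; the important detail is that the split is chronological, with no shuffling, so the validation bars always come after the training bars.

```python
def split_tail(series, validation_fraction=0.2):
    """Split a time-ordered series into (training, validation) parts.

    The validation part is the most recent fraction of the data.
    """
    n_val = max(1, round(len(series) * validation_fraction))
    return series[:-n_val], series[-n_val:]

# Example with 101 placeholder bars (row numbers stand in for closes)
seen = list(range(1, 102))
train, validation = split_tail(seen)
print(len(train), len(validation))
```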
     
    #44     Apr 22, 2021
  5. jublin

    #45     May 18, 2021
  6. jublin

    Hey Phil, I tried to replicate the parabolic + cosines method you outlined. However, I got stuck on the GA step and wonder if you can help. I'm using Python's geneticalgorithm package (https://pypi.org/project/geneticalgorithm/), and the results (after 1 hour) are much worse than the results you got. I attached a comparison.

    I thought this was a plain nonconvex optimization problem that a commonly available package like geneticalgorithm, run with default parameters, would crack easily. After reading a few of your other posts, I realize that you are probably using a hand-tuned GA solver. What kind of tweaks or knowledge do you think are essential to add to the solver in order to get a good fit?
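
    For context, the quantity a GA solver would minimize here can be written as a sum of squared errors over a flat parameter vector. The sketch below uses a single cosine for brevity; the parameter order, the name `make_objective`, and the synthetic data are all assumptions for illustration, and it is not tied to any particular solver's API.

```python
import math

def make_objective(closes):
    """Return f(params) -> sum of squared errors for parabola + one cosine.

    params = (a, b, c, amplitude, period, phase); this ordering is an
    illustrative choice, not something specified in the thread.
    """
    def objective(params):
        a, b, c, amplitude, period, phase = params
        sse = 0.0
        for t, close in enumerate(closes):
            fit = a * t * t + b * t + c
            fit += amplitude * math.cos(2 * math.pi * t / period + phase)
            sse += (close - fit) ** 2
        return sse
    return objective

# Synthetic closes generated from known (made-up) parameters
true_params = (0.001, 0.02, 60.0, 0.5, 25.0, 0.3)
closes = [
    0.001 * t * t + 0.02 * t + 60.0 + 0.5 * math.cos(2 * math.pi * t / 25.0 + 0.3)
    for t in range(100)
]
objective = make_objective(closes)
print(objective(true_params))
```

    Whatever solver is used, the search bounds matter a lot for a surface like this: the period has to be kept strictly positive, and tight, sensible ranges for each parameter usually help more than extra run time.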


     
    #46     May 18, 2021
  7. jublin

    Yes, I did. I was debating whether I should normalize it to a normal distribution or something else.
    Eventually I did:

    Xmean = X.mean()           # mean of the input X
    X = (X - Xmean) / Xmean
    Y = (Y - Xmean) / Xmean    # output Y, scaled by the same input mean

    so everything is normalized to the mean of the input data. But I'm not sure if this is the best way to normalize it.
     
    #47     May 18, 2021
  8. userque

    I would standardize, rather than normalize, each input independently.
    If you decide to go with multiple outputs, I would also standardize those independently before processing, then invert the standardization to obtain the proper forecast values.
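
    Standardizing here means subtracting the mean and dividing by the standard deviation, and the inverse step maps model outputs back to price units. A minimal sketch using only the standard library follows; the helper names are made up, and it assumes the series is not constant (a zero standard deviation would divide by zero).

```python
import statistics

def fit_standardizer(values):
    """Return (mean, std) computed from the data the model is fitted on."""
    return statistics.fmean(values), statistics.pstdev(values)

def standardize(values, mean, std):
    """Map values to zero mean and unit standard deviation."""
    return [(v - mean) / std for v in values]

def invert_standardization(z_values, mean, std):
    """Map standardized values back to the original units."""
    return [z * std + mean for z in z_values]

closes = [66.68, 66.41, 66.53, 66.86, 66.99, 67.12, 67.25]
mean, std = fit_standardizer(closes)
z = standardize(closes, mean, std)
restored = invert_standardization(z, mean, std)
```

    Each input and each output gets its own (mean, std) pair, fitted on the seen data only, so that no information from the unseen bars leaks into the scaling.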
     
    Last edited: May 18, 2021
    #48     May 18, 2021
  9. userque

    BTW, to your original question, and since you use Python: I believe sktime will forecast time series recursively.
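
    Recursive forecasting means predicting one step ahead, appending that prediction to the history, and repeating. The sketch below shows the idea in plain Python; it does not use sktime's API, and the names `recursive_forecast` and `linear_extrapolation` are hypothetical stand-ins for a real one-step model.

```python
def recursive_forecast(history, one_step_model, window, horizon):
    """Forecast `horizon` steps by feeding each prediction back as input."""
    values = list(history)
    forecasts = []
    for _ in range(horizon):
        next_value = one_step_model(values[-window:])
        forecasts.append(next_value)
        values.append(next_value)  # the prediction becomes part of the input
    return forecasts

def linear_extrapolation(last_values):
    """Toy one-step model: continue the line through the last two points."""
    return 2 * last_values[-1] - last_values[-2]

print(recursive_forecast([1.0, 2.0, 3.0], linear_extrapolation, window=2, horizon=3))
# prints [4.0, 5.0, 6.0]
```

    Because each step consumes earlier predictions, errors compound as the horizon grows, which is worth keeping in mind when comparing it with a model that takes the row number directly.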

     
    #49     May 18, 2021
  10. jublin

    Got it, thanks, let me take a look.
     
    #50     May 18, 2021