OK, stopped it here. Here's what it came up with:

Code:
0.100239*(row - (1.87231*tan(cos(sqrt(5.69164 + row) + 0.832774)/0.662045) + atanh(cos(-250.997*row - 0.00765016)) + 1.89037*atanh(cos(-0.2448*row)))) + 56.9427

Max. complexity left at the default setting: 60. No validation.

Next, I'll see what the forecast looks like and compare it to actual data. I'm expecting horrible results. Later, I'll try it with validation, and then limit it to whatever you use (a few sine or cosine waves?), with and without validation.
Actual in blue this time, out-of-sample forecast in orange.

Code:
CLOSE   FORECAST   ERROR
66.68   66.48      0.20
66.41   66.83      0.42
66.53   66.51      0.02
66.86   66.44      0.42
66.99   66.40      0.59
67.12   66.36      0.76
67.25   66.15      1.10
66.67   66.41      0.26
67.12   66.53      0.59
67.44   66.61      0.83
68.00   66.68      1.32
67.91   66.73      1.18
67.83   66.76      1.07
67.74   66.71      1.03

TOTAL ERROR = 9.80

Your fitted data only had 101 rows, while the seen and unseen data total 103, so I'm not sure how to line up your data; since it only included the close, I left it out of the graph/chart.

*** My apologies to the OP! I've moved my discussion to: https://www.elitetrader.com/et/thre...symbolic-regression-model-experiments.357998/
Thanks for posting this. I agree that genetic programming that creates models like these (mine included) probably won't extrapolate well; it's just too easy for the model to overfit. The parabolic trend plus a few sinusoids method is harder to overfit and has some theory behind it (prices oscillate to form a channel around a trend).
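For anyone who wants to try that approach, here's a minimal numpy sketch of such a model: a quadratic (parabolic) trend plus a few sinusoids. One simplifying assumption on my part: the frequencies are fixed in advance, which makes the fit an ordinary linear least-squares problem (a GA would presumably also search over the frequencies). The function names and the `freqs` parameter are mine, for illustration only.

```python
import numpy as np

def fit_parabola_plus_sinusoids(y, freqs):
    """Least-squares fit of y(t) = a + b*t + c*t^2 + sum_i (A_i*cos(w_i*t) + B_i*sin(w_i*t)).

    For fixed frequencies the model is linear in its coefficients,
    so ordinary least squares solves it directly (no GA needed).
    """
    t = np.arange(len(y), dtype=float)
    cols = [np.ones_like(t), t, t**2]
    for w in freqs:
        cols += [np.cos(w * t), np.sin(w * t)]
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict(coef, n, freqs):
    """Evaluate the fitted model at rows 0..n-1 (extrapolates past the fit window)."""
    t = np.arange(n, dtype=float)
    cols = [np.ones_like(t), t, t**2]
    for w in freqs:
        cols += [np.cos(w * t), np.sin(w * t)]
    return np.column_stack(cols) @ coef
```

Forecasting is just evaluating `predict` past the fitted window: the quadratic term carries the trend and the sinusoids carry the channel oscillation around it.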
BTW, I'll be updating this at: https://www.elitetrader.com/et/thre...symbolic-regression-model-experiments.357998/ (don't want to trample all over the OP's thread).

I just realized that the only input I can give it is the row number, not the additional features I had planned. If I did give it features, I wouldn't be able to forecast more than one bar ahead (with random access into any future bar), so I see now why you did it that way. Plus, it reduces the overfitting potential. Currently running the algo using the last 20% of the seen data for validation.
Hey Phil, I tried to replicate the parabolic + cosines method you outlined, but I got stuck on the GA step and wonder if you can give me any help. I'm using Python's geneticalgorithm package (https://pypi.org/project/geneticalgorithm/), and the results (after 1 hour) are much worse than the results you got; I attached a comparison. I thought this was a plain nonconvex optimization problem and that a commonly available package like geneticalgorithm, with default parameters, would crack it easily. After reading a few of your other posts, I realized that you are probably using a hand-tuned GA solver. What kind of tweaks or knowledge do you think is essential to add to the solver in order to fit it nicely?
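For context, here's roughly the sort of GA loop I mean, in case one of these knobs is the thing you tuned. This is a toy numpy sketch of my own (all names and parameter values are mine, not your solver): elitism, tournament selection, blend crossover, and a Gaussian mutation step that decays over generations. The geneticalgorithm package exposes similar knobs (population size, mutation probability, elitism ratio) in its algorithm_parameters dict, if I recall correctly.

```python
import numpy as np

def simple_ga(fitness, bounds, pop_size=100, generations=200,
              elite=2, tournament=3, mut_sigma=0.1, seed=0):
    """Toy real-valued GA that minimizes `fitness`.
    `bounds` is an (n_params, 2) array-like of [lo, hi] per parameter."""
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)
    lo, hi = bounds[:, 0], bounds[:, 1]
    pop = rng.uniform(lo, hi, size=(pop_size, len(bounds)))
    scores = np.array([fitness(p) for p in pop])
    for gen in range(generations):
        order = np.argsort(scores)                 # best first
        pop, scores = pop[order], scores[order]
        new_pop = [pop[i].copy() for i in range(elite)]      # elitism
        sigma = mut_sigma * (1 - gen / generations) + 1e-3   # decaying mutation
        while len(new_pop) < pop_size:
            # tournament selection: lowest index in the sorted pop wins
            a = pop[rng.choice(pop_size, tournament).min()]
            b = pop[rng.choice(pop_size, tournament).min()]
            w = rng.uniform(size=len(bounds))                # blend crossover
            child = w * a + (1 - w) * b
            child += rng.normal(0, sigma * (hi - lo))        # Gaussian mutation
            new_pop.append(np.clip(child, lo, hi))
        pop = np.array(new_pop)
        scores = np.array([fitness(p) for p in pop])
    best = np.argmin(scores)
    return pop[best], scores[best]
```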
Yes I did. I was debating whether I should normalize it to a normal distribution or something else. Eventually I did:

Code:
Xmean = mean of input X
X = (X - Xmean) / Xmean
output Y = (Y - Xmean) / Xmean

so everything is normalized to the mean of the input data. But I'm not sure if this is the best way to normalize it.
I would standardize, rather than normalize, each input independently. If you decide to go with multiple outputs, I would also standardize those independently before processing, then invert the standardization to obtain the proper forecasted values.
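A minimal numpy sketch of what I mean (helper names are mine): standardize each column with its own mean and standard deviation, keep those statistics, and apply them in reverse to the model's output to get forecasts back on the original scale.

```python
import numpy as np

def standardize(X):
    """Standardize each column independently: zero mean, unit variance.
    Returns the transformed array plus the (mean, std) needed to invert."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)  # guard against constant columns
    return (X - mu) / sigma, mu, sigma

def unstandardize(Z, mu, sigma):
    """Invert standardization to recover values on the original scale."""
    return Z * sigma + mu
```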
BTW, to your original question: since you use Python, I believe sktime will forecast time series recursively.
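I haven't pinned down the exact sktime API here, so rather than guess at it, here's the idea recursive forecasting implements, in plain numpy (helper names are mine; a linear one-step model stands in for whatever regressor you'd actually use): fit a one-step-ahead model on lagged windows, then feed each prediction back in as an input to roll the forecast forward.

```python
import numpy as np

def fit_ar(y, lags):
    """Fit a linear one-step-ahead model on the last `lags` values (OLS)."""
    rows = [y[i - lags:i] for i in range(lags, len(y))]
    X = np.column_stack([np.ones(len(rows)), np.array(rows)])
    coef, *_ = np.linalg.lstsq(X, y[lags:], rcond=None)
    return coef

def recursive_forecast(y, coef, lags, horizon):
    """Roll the one-step model forward, feeding each prediction back in."""
    hist = list(y[-lags:])
    out = []
    for _ in range(horizon):
        pred = coef[0] + np.dot(coef[1:], hist[-lags:])
        out.append(pred)
        hist.append(pred)
    return np.array(out)
```

This is the same pattern as forecasting one bar ahead by hand: each forecasted bar becomes part of the "seen" window for the next one, which is also why errors compound over the horizon.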