Genetic Programming / Symbolic Regression Model Experiments

Discussion in 'Strategy Building' started by userque, Apr 22, 2021.

  1. userque


  2. userque



    The last 17/18 points were validation points.

    Unseen data. Series 2 in the attached chart is the forecast.


    Last edited: Apr 22, 2021
  3. userque


    Did a run with 6000+ days of SPY data.

    Limited function to only use Sine and Cosine.

    Instead of only using the row number, I used features derived from the date (month, day, etc.) ... as well as the row number.

    As expected, results weren't good enough in the short time I allotted for the run.

    Will try again later, but with much less data, around 200 rows or so. 90 rows of SPY wouldn't contain enough price action diversity, imo.

    After this test, I may one-hot encode some of the features, and try again.
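A minimal sketch of the kind of feature derivation and one-hot encoding described above (the exact features and layout here are my assumptions, not userque's actual pipeline):

```python
from datetime import date

def date_features(d: date, row: int) -> list[float]:
    """Derive numeric features from a bar's date plus its row number,
    with the month one-hot encoded into 12 indicator columns."""
    month_onehot = [1.0 if d.month == m else 0.0 for m in range(1, 13)]
    return [float(row), float(d.day), float(d.isoweekday())] + month_onehot

# e.g. the first row of a dataset starting Apr 22, 2021 (a Thursday)
features = date_features(date(2021, 4, 22), row=0)
```

One-hot encoding the month (and possibly the weekday) keeps the GP from treating categorical values as if they were ordered magnitudes.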
  4. ph1l


    That sounded like a good idea to me.

    So, I tried some more experimenting with generated code by starting with a template of partial instructions and genetically optimizing the missing parts. Using the same input data in this post, I ran this 10 times where the only difference between any two runs was the sequence of pseudorandom numbers.

    The result of the fits and 14-bar future predictions for the 10 models is shown in the attached chart. All 10 models follow the input prices (+ signs) about the same, and the predictions all have similar shapes and seem OK at predicting short-term turning points for this input data.

    A sample generated model is
    y =
           0: R0 = 56.2733
           1: R1 = 0.106131 * x
           2: R0 = R0 + R1
           3: R1 = x * x
           4: R1 = 0.000229828 * R1
           5: R0 = R0 + R1
           6: R2 = 0.000227567 * x
           7: R1 = cosh (R2)
           8: R1 = R2 * R1
           9: R0 = R0 + R1
          10: R1 = 0.0315204 * x
          11: R1 = asinh (R1)
          12: R2 = -0.9609
          13: R2 = sign (R2)
          14: R1 = R2 * R1
          15: R0 = R0 + R1
          16: R1 = 0.371744 * x
          17: R1 = R1 + 0.602163
          18: R1 = cos (R1)
          19: R1 = 0.716786 * R1
          20: R0 = R0 + R1
          21: R1 = 0.102066 * x
          22: R1 = R1 + 0.0979074
          23: R1 = cos (R1)
          24: R1 = 0.825596 * R1
          25: R0 = R0 + R1
          26: R1 = 0.215541 * x
          27: R1 = R1 + 4.59353
          28: R1 = cos (R1)
          29: R1 = 1.10603 * R1
          30: R0 = R0 + R1
          return R0
    The code is similar to the other code in this post except it uses the input data (offset in bars from the starting point of the input data) as operands to instructions (x in the code) instead of using initialized values of registers.
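The listing above is straight-line register code, so it collapses to a single closed-form expression. As a sanity check, here is my own Python rendering of it (not ph1l's actual interpreter), with x the bar offset from the start of the input data:

```python
import math

def model(x: float) -> float:
    """Python rendering of the generated register listing (x = bar offset)."""
    y = 56.2733                                           # instr 0: constant
    y += 0.106131 * x                                     # 1-2: linear term
    y += 0.000229828 * x * x                              # 3-5: quadratic term
    r = 0.000227567 * x                                   # 6
    y += r * math.cosh(r)                                 # 7-9: small hyperbolic term
    y -= math.asinh(0.0315204 * x)                        # 10-15: sign(-0.9609) == -1
    y += 0.716786 * math.cos(0.371744 * x + 0.602163)    # 16-20: cycle 1
    y += 0.825596 * math.cos(0.102066 * x + 0.0979074)   # 21-25: cycle 2
    y += 1.10603 * math.cos(0.215541 * x + 4.59353)      # 26-30: cycle 3
    return y
```

Read this way, the evolved model is a quadratic trend plus three cosine cycles (the cosh and asinh terms contribute little near x = 0), which matches the trend-plus-cycles framing later in the thread.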

    The fit for this model, plus a parabolic, least-squares trend of the fitted curve, is shown in the attached chart.

    The prices, and the fitted curve with a parabolic, least-squares trend of the fitted curve subtracted (which shows the cyclic turning points), are shown in the attached chart.
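The detrending step described here (subtracting a degree-2 least-squares fit to expose the cycles) can be sketched in a few lines; the 21-day cycle below is just an illustrative input, not data from the thread:

```python
import numpy as np

def detrend_parabolic(y: np.ndarray) -> np.ndarray:
    """Subtract a parabolic (degree-2) least-squares trend from a series,
    leaving the cyclic component centered around zero."""
    x = np.arange(len(y), dtype=float)
    trend = np.polyval(np.polyfit(x, y, deg=2), x)
    return y - trend

x = np.arange(89, dtype=float)
cycle = np.cos(2 * np.pi * x / 21.0)                 # hypothetical ~21-day cycle
residual = detrend_parabolic(50.0 + 0.1 * x + cycle)  # trend + cycle input
```

Because the constant, linear, and quadratic terms are all in the fitted basis, a pure parabola detrends to (numerically) zero, and what survives is the cyclic part whose turning points the chart highlights.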
  5. userque


    That looks very promising! Especially how the models converged to correlated outputs!

    I have yet to run my new tests with 89 bars. The 200+ bar tests weren't promising, imo. I'll post when I get a chance to run the next one.
  6. userque


    If you don't mind telling, what's your reasoning behind using an ~89-bar sliding window? The result of backtests?
  7. ph1l


    89 is a Fibonacci number. I'm not a believer that Fibonacci numbers are magic. Start a Fibonacci sequence with any two numbers with at least one != 0, and the ratio of successive numbers quickly converges to the golden ratio. For example,
    perl -e '
    use warnings;
    use strict;
    my $v1 = 28.745; my $v2 = -103.02;
    for (my $i = 3; $i < 70; ++$i) {
        my $vn = $v1 + $v2;
        my $ratio = $vn / $v2;
        print "$i $ratio\n";
        $v1 = $v2; $v2 = $vn;
    }
    ' | tail
    On my computer, the result is
    60 1.61803398874989
    61 1.61803398874989
    62 1.61803398874989
    63 1.61803398874989
    64 1.61803398874989
    65 1.61803398874989
    66 1.61803398874989
    67 1.61803398874989
    68 1.61803398874989
    69 1.61803398874989
    And that's math.

    Three lunar months == 88.59177 days, and rounding to the nearest day == 89 calendar days. The Moon obviously influences Earth, but does it really make much difference with all the computers doing trading? I don't think so.
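The lunar arithmetic above checks out against the mean synodic month:

```python
SYNODIC_MONTH_DAYS = 29.530589        # mean synodic (new-moon-to-new-moon) month
three_lunar_months = 3 * SYNODIC_MONTH_DAYS   # ~88.59177 days
nearest_day = round(three_lunar_months)       # 89 calendar days
```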

    So, I was thinking asset prices can be modeled by adding a trend with cycles. About three months seems to be enough time to be able to detect the trend and cycles (even if some cycles are incomplete) without the trend and cycles changing too much. I chose calendar days because the cycles literature sometimes implies calendar time is better for cyclic analysis; "The Profit Magic of Stock Transaction Timing" by J.M. Hurst makes this point, for example.
    My goal is to be able to capture enough of market swings lasting a few days to about three weeks. 89 calendar days is at least four times these durations, so it might be a good enough amount of history.
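The trend-plus-cycles idea over an 89-day window can be sketched on synthetic data: once a cycle frequency is fixed, trend plus sinusoid is linear in its coefficients, so ordinary least squares recovers them (the 21-day period, trend, and noise level here are all illustrative assumptions, not ph1l's actual fitting code):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(89.0)                          # ~three months of daily bars
omega = 2 * np.pi / 21.0                     # hypothetical ~21-day cycle
y = 100.0 + 0.05 * t + 1.5 * np.sin(omega * t) + rng.normal(0.0, 0.1, t.size)

# With the frequency fixed, trend + cycle is linear in its coefficients,
# so ordinary least squares recovers intercept, slope, and amplitudes.
A = np.column_stack([np.ones_like(t), t, np.sin(omega * t), np.cos(omega * t)])
(intercept, slope, a_sin, a_cos), *_ = np.linalg.lstsq(A, y, rcond=None)
```

With ~4 complete cycles in the window, the trend and the cycle separate cleanly; with much less than one complete cycle they would be nearly collinear, which is one way to motivate the "at least four times the swing duration" rule of thumb above.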

    I haven't done any other tests yet with this latest method. I'll probably try something eventually.
  8. LM3886


    Thanks for the interesting study. However, the prediction error still seems too large to be useful in the trading strategies that I can think of.
  9. userque


    If you are referring to me, I don't use this. I was/am curious about what @ph1l posted, and ran a couple of non-rigorous experiments myself.

    Sometimes, even if I don't think something is viable, my mind won't rest about it until I test it out anyway ... even if superficially.

    Nevertheless, unless I'm mistaken, I think @ph1l has something along these lines that works for him?
  10. ph1l


    I'm currently testing variations of the approach above in this thread, genetically optimizing the missing parts of interpreted code generated from a template. So far, the results for timing swing trades lasting a few days to a few weeks don't look as good as those from the similar trend-plus-sinusoids models optimized genetically as in this post.
    #10     May 5, 2021