Recommendations on time-series price prediction models?

Discussion in 'Automated Trading' started by jublin, Apr 5, 2021.

  1. jublin

    jublin

    Thanks for the info! Yeah, the training I did already includes volume information. But the trained models are probably not as sophisticated at characterizing volatility as semi-manually designed models.

    Sniffing out larger orders is also very interesting. In manual trading, I have found it pretty useful too (dark pools, etc.). Some trading platforms, like Webull, provide inferred capital flows categorized into buckets of large, medium and small orders, but those don't seem very useful. A search online turns up some articles on methods to sniff out large orders, like this one: https://exegy-signum.com/insights/hiding-and-seeking-with-iceberg-orders/, but they require level 2 or 3 data, which I don't have.
     
    #21     Apr 15, 2021
  2. ph1l

    ph1l

    As I mentioned in this post, The Profit Magic of Stock Transaction Timing by J.M. Hurst covers the concept. "Technical Analysis of the Financial Markets," by John J. Murphy covers this in the "Time Cycles" chapter.

    "Decoding The Hidden Market Rhythm - Part 1: Dynamic Cycles," by Lars von Thienen covers a similar concept with detrended sums of sinusoids projecting turning points.



    The cycles in the sinusoids and the trend of asset prices continually change, so they need to be recalculated (e.g., for each new bar).

    I'd think an LSTM as you proposed in the first post of this thread could be used to do something similar.
     
    #22     Apr 15, 2021
  3. userque

    userque

    You can use multiple outputs, instead of just one.

    You can train a model to forecast multiple bars ahead, non-recursively.

    The loss function would take into account the multiple bars ahead, rather than just one.

    You can "take into account" more than one bar ahead, without being recursive.
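
    A minimal sketch of what a non-recursive, multi-output setup could look like, assuming Python with NumPy. A linear least-squares model stands in for the LSTM discussed in this thread purely to keep the example short. The helper `make_windows`, the lookback of 20 bars, and the horizon of 5 bars are illustrative assumptions, and the data is a synthetic random walk rather than real prices.

```python
import numpy as np

def make_windows(series, lookback, horizon):
    """Build (X, Y): each row of X holds `lookback` past values,
    each row of Y holds the `horizon` values that follow them."""
    series = np.asarray(series, dtype=float)
    n = len(series) - lookback - horizon + 1
    X = np.stack([series[i:i + lookback] for i in range(n)])
    Y = np.stack([series[i + lookback:i + lookback + horizon] for i in range(n)])
    return X, Y

# Synthetic placeholder data (assumption), not real prices.
rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(0, 1, 300))

lookback, horizon = 20, 5
X, Y = make_windows(prices, lookback, horizon)

# Add a bias column and fit all horizons jointly by least squares.
# The squared-error loss sums over every column of Y, i.e. over all 5 bars ahead.
A = np.hstack([X, np.ones((len(X), 1))])
W, *_ = np.linalg.lstsq(A, Y, rcond=None)

# One evaluation yields the whole 5-bar forecast; no prediction is fed back in.
last_window = np.append(prices[-lookback:], 1.0)
forecast = last_window @ W
print(forecast.shape)
```

    With a neural network the same idea would mean a final layer with `horizon` outputs and a loss averaged over all of them.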
     
    #23     Apr 16, 2021
  4. jublin

    jublin


    Got it, thanks!
     
    #24     Apr 16, 2021
  5. jublin

    jublin

    Yeah, I forgot to mention I tried that, like using 5 days of data instead of 1 in the training. The results were bad. I can probably tune the network to improve the results, but I don't think that's a good investment of time at this point. I was looking for an online article describing a widely accepted starting point for predicting multiple days ahead using deep learning, but I can't find any. Everything I found only predicts one day ahead. Anyone can conceive such a network, but I'm looking for an article showing an example where it actually works.
     
    #25     Apr 16, 2021
  6. userque

    userque

    Did you standardize/normalize the multiple outputs ... as well as the inputs?
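
    For illustration, a minimal sketch of standardizing both the inputs and the multiple outputs, assuming Python with NumPy. The function names `fit_scaler`, `scale`, and `unscale`, the array shapes, and the random placeholder data are all assumptions made for this example.

```python
import numpy as np

def fit_scaler(a):
    """Column-wise mean and standard deviation, computed on training data only."""
    mean = a.mean(axis=0)
    std = a.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid division by zero for constant columns
    return mean, std

def scale(a, mean, std):
    return (a - mean) / std

def unscale(a, mean, std):
    return a * std + mean

# Placeholder arrays (assumption): 200 samples, 20 input bars, 5 output bars.
rng = np.random.default_rng(1)
X_train = 100 + rng.normal(0, 5, (200, 20))
Y_train = 100 + rng.normal(0, 5, (200, 5))

x_mean, x_std = fit_scaler(X_train)
y_mean, y_std = fit_scaler(Y_train)

X_scaled = scale(X_train, x_mean, x_std)
Y_scaled = scale(Y_train, y_mean, y_std)  # a model would be trained on these targets

# Predictions come out in scaled units and have to be mapped back to prices.
Y_restored = unscale(Y_scaled, y_mean, y_std)
```

    The statistics come from the training set only, so validation and test data would be scaled with the same `x_mean`, `x_std`, `y_mean`, and `y_std`.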
     
    #26     Apr 16, 2021
  7. ph1l

    ph1l

    This might be one.
    https://analyticsindiamag.com/hands...t-neural-network-for-stock-market-prediction/
    [Attached image: upload_2021-4-16_21-41-53.png]

    I am illiterate in Python, so I can't tell how long "a few days" is. This line
    Code:
    X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
    makes me think few == 1.
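
    A small sketch of what that reshape does, assuming NumPy and a Keras-style recurrent layer that expects input shaped (samples, timesteps, features). The sizes 4 and 60 are placeholders, not values from the article. Under that assumption the trailing 1 is the number of features per time step, so the line by itself does not reveal how many days ahead are predicted; that would be set by how the targets are built and by the size of the output layer.

```python
import numpy as np

# Placeholder (assumption): 4 test samples, each a window of 60 past values.
X_test = np.zeros((4, 60))

# Same call as the quoted line: append an axis of length 1.
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))

samples, timesteps, features = X_test.shape
print(samples, timesteps, features)  # 4 60 1
```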
     
    #27     Apr 16, 2021
  8. ph1l

    ph1l

    For the same data I used in this post, I tried to fit a curve using genetic programming.

    The generated, fitted function for the close prices is
    Code:
    y =
           0: R4 = R2 * cos (-81.5485)
           1: R0 = 50.7399 - R4
           2: R4 = R4 * cos (0.199104)
           3: R3 = sqrt (R4)
           4: R4 = 41.8048 * cos (R3)
           5: R4 = 4.3255 / R4
           6: R4 = atan2 (R4 / R0)
           7: R0 = R0 * cos (87.6408)
           8: R0 = R4 + R0
           9: R4 = R4 * sin (-4.13973)
          10: R0 = R4 + R0
          11: R2 = -44.0889 * sinh (R0)
          12: R4 = abs (R0)
          13: R3 = log (R4)
          14: R4 = atan2 (R2 / 70.0589)
          15: R0 = R4 + R0
          16: R2 = asinh (R0)
          17: R1 = 13.707 * sin (R3)
          18: R1 = R1 * sin (-37.162)
          19: R4 = 1.39025 * sin (R4)
          20: R1 = R1 * cos (68.1254)
          21: R3 = tanh (R1 * R3 + R4)
          22: R2 = R2 - R3
          23: R4 = asin (R1)
          24: R1 = sigmoid (-12.9007 * R1 + R4)
          25: R1 = R1 * cosh (R3)
          26: R2 = R1 + R2
          27: R2 = R2 * sin (1.16991)
          28: R2 = R2 * sin (-79.8879)
          29: R0 = 62.65 - R2
          return R0
    
    As before, the only input value is time represented as the offset in calendar days from the start of the data.
    R0, R1, etc. are registers which are initialized to the input value and get operated on by mathematical functions.
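
    To make the register mechanism concrete, here is a rough Python sketch that runs only the first three instructions of the listing (0 through 2). It assumes five registers, R0 to R4, since those are the only ones the listing uses. The function name `run_prefix` is made up for this example. The remaining instructions are left out because the listing does not say how out-of-domain arguments (for example the square root of a negative number) are handled.

```python
import math

def run_prefix(t, n_registers=5):
    """Run the first three listed instructions on registers initialized to t."""
    R = [float(t)] * n_registers        # R0..R4 all start at the input value
    R[4] = R[2] * math.cos(-81.5485)    # 0: R4 = R2 * cos (-81.5485)
    R[0] = 50.7399 - R[4]               # 1: R0 = 50.7399 - R4
    R[4] = R[4] * math.cos(0.199104)    # 2: R4 = R4 * cos (0.199104)
    return R

# Evaluate the partial program for a few day offsets.
for day in (0, 1, 88):
    print(day, run_prefix(day))
```

    Because every register starts at the same input, each later value is a function of time alone, even though the listing shows only constants and registers.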

    The fit to the data is comparable in closeness to the fit before, but the predicted future prices are very different.
    [Attached image: upload_2021-4-19_20-18-15.png]
    The overall predicted direction for the fit before and this fit are both still up.

    The prices and fitted curve with a parabolic, least squares trend of the fitted curve subtracted are:
    [Attached image: upload_2021-4-19_20-18-39.png]
    Here, the detrended, fitted curve is pointing downward which is the opposite of the detrended, fitted curve from before.
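
    Subtracting a parabolic, least-squares trend can be sketched as follows, assuming Python with NumPy. The function name `detrend_parabolic` and the test curve (a parabola plus one sinusoid) are illustrative assumptions, not the data or code used for the charts.

```python
import numpy as np

def detrend_parabolic(t, y):
    """Subtract the least-squares degree-2 polynomial from y."""
    coeffs = np.polyfit(t, y, deg=2)   # coefficients, highest power first
    trend = np.polyval(coeffs, t)
    return y - trend, trend

# Placeholder curve (assumption): a parabola plus one sinusoid.
t = np.arange(101, dtype=float)
y = 50 + 0.2 * t - 0.001 * t**2 + 2.0 * np.sin(2 * np.pi * t / 25)

residual, trend = detrend_parabolic(t, y)
print(residual[:5])
```

    The trend is refitted to whatever curve it is given, so the detrended result depends on which bars are included in the fit.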
     
    Last edited: Apr 19, 2021
    #28     Apr 19, 2021
  9. userque

    userque

    Thanks.

    Looks like you're using a Python Library for the genetic programming?

    I don't see where your function takes in any past closing prices as inputs, nor any other 'x' values as inputs; only constants?! What is 'y' a function of? The R variables are simply a way to simplify the notation for a long function.

    I have a standalone genetic programming app. I'm tempted to run your data and see what pops out.

    Did you hold out any data for validation? Doesn't seem likely with so little data.
     
    #29     Apr 19, 2021
  10. ph1l

    ph1l

    I wrote the genetic programming part in C++ and OpenCL. The calculations for the function are done in OpenCL with single-precision floating point arithmetic. The controlling part is Perl and shell (bash). The images are from gnuplot.

    The only input to the function is time in the form of number of bars relative to the start of the data (0 through 88 calendar days for the example's data that was fitted). This allows the function to be applied for any time.

    The attached inputData.csv has the input data with comma-separated format
    <TICKER>,<DTYYYYMMDD>,<TIME>,<OPEN>,<HIGH>,<LOW>,<CLOSE>,<VOLUME>,<UNADJCLOSE>,<UNADJVOLUME>
    Calendar days in the data when U.S. stock markets were closed are linearly interpolated from the previous trading close price.
    The candlestick chart has the <OPEN>,<HIGH>,<LOW>,<CLOSE> columns.
    The function is fitted on the <CLOSE> column only.
    The fitted data and parabolic, least squares trend of the fitted data past the candlesticks is the predicted data (12 bars).
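
    One way to picture the calendar-day fill and the bar offsets, as a sketch in plain Python. It assumes the gaps are filled by linear interpolation between the closes of the trading days on either side, which is one reading of the description above. The function name `fill_calendar_days` and the three closes are made up for the example.

```python
from datetime import date, timedelta

def fill_calendar_days(closes):
    """closes: dict mapping date -> close for trading days only.
    Returns (bar_index, date, close) for every calendar day, with closed-market
    days filled by linear interpolation between neighboring trading days."""
    days = sorted(closes)
    filled = []
    for left, right in zip(days, days[1:]):
        gap = (right - left).days
        for k in range(gap):
            frac = k / gap
            value = closes[left] + frac * (closes[right] - closes[left])
            filled.append((left + timedelta(days=k), value))
    filled.append((days[-1], closes[days[-1]]))
    start = days[0]
    # The bar index is the offset in calendar days from the start of the data.
    return [((d - start).days, d, v) for d, v in filled]

# Made-up closes around a weekend: Friday, Monday, Tuesday.
sample = {
    date(2021, 4, 9): 100.0,
    date(2021, 4, 12): 103.0,
    date(2021, 4, 13): 102.0,
}
for bar, d, close in fill_calendar_days(sample):
    print(bar, d.isoformat(), close)
```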

    The raw, fitted data including the extra 12 predicted bars is in the attached fitted.txt. This data looks like it has more precision than the actual data because Perl converts the single-precision floating point to double-precision.
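
    The apparent extra precision can be reproduced with a short sketch. It uses Python's standard `struct` module rather than Perl, purely for illustration, and the helper name `through_float32` is made up. The value 62.65 is a constant from the listing earlier in this thread.

```python
import struct

def through_float32(x):
    """Round x to single precision, then read it back as a Python float (a double)."""
    return struct.unpack("f", struct.pack("f", x))[0]

price = 62.65
widened = through_float32(price)

# The widened value prints with more digits, but they carry no extra information.
print(price, widened)
```

    The extra digits come from printing the nearest single-precision value at double precision, not from any additional accuracy in the calculation.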

    The actual future data is in the attached unseendata.csv. Since this is recent data for an ETF, there isn't too much of it. This data wasn't used in any calculations or measurements.
     
    #30     Apr 20, 2021