How to calculate real-time stats?

Discussion in 'Automated Trading' started by xappppp, Jul 24, 2021.

  1. xappppp

    For example, I want the average price change or standard deviation over the past 5 minutes, updated from the real-time ticker feed. What would be the way to design the data pipeline (not sure if that's even the right term)?
     
  2. Excel. You can use IEX data or pay for something integrated. Some software will also let you calculate it in the app.
     
  3. xappppp

    Sorry, I'm using Python + SQLite; I'll adjust my question.
     
  4. kmiklas

    Spin up one thread to fill a circular FIFO queue with data, and another to read the data and do the math (mean, median, mode, SD, etc.) every t time units, find the delta, and determine whether it's good enough for a buy/sell signal.
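    A minimal Python sketch of that two-thread layout, assuming `collections.deque(maxlen=...)` as the circular FIFO and a hypothetical `get_price` callable standing in for the real feed. `TickWindow`, `compute_stats`, `run_feed`, and `run_stats` are illustrative names, not from any library. Mode is left out because it is rarely meaningful on continuous prices.

```python
import statistics
import threading
from collections import deque


class TickWindow:
    """Circular FIFO of the most recent prices, shared by a feed thread and a stats thread."""

    def __init__(self, size):
        self._buf = deque(maxlen=size)  # appending to a full deque drops the oldest item
        self._lock = threading.Lock()

    def push(self, price):
        with self._lock:
            self._buf.append(price)

    def snapshot(self):
        with self._lock:
            return list(self._buf)  # copy, so the math runs outside the lock


def compute_stats(prices):
    """Mean, median, sample SD and first-to-last delta of one window (needs >= 2 prices)."""
    return {
        "mean": statistics.fmean(prices),
        "median": statistics.median(prices),
        "sd": statistics.stdev(prices),
        "delta": prices[-1] - prices[0],
    }


def run_feed(window, get_price, stop, tick_seconds):
    """Producer thread: poll the (hypothetical) price source and fill the buffer."""
    while not stop.is_set():
        window.push(get_price())
        stop.wait(tick_seconds)


def run_stats(window, on_stats, stop, every_seconds):
    """Consumer thread: every `every_seconds`, compute stats on a snapshot of the buffer."""
    while not stop.wait(every_seconds):
        prices = window.snapshot()
        if len(prices) >= 2:
            on_stats(compute_stats(prices))
```

    For example, start `threading.Thread(target=run_feed, args=(window, get_price, stop, 1.0), daemon=True)` plus a second thread running `run_stats` with `every_seconds=10.0`, then call `stop.set()` to shut both down. Copying the buffer under the lock keeps the feed thread from waiting while the statistics are computed.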

    That being said, in my opinion, having been down this path in 2016, you're barking up the wrong tree. The hedge funds have Ph.D. physicists, mathematicians, and computer scientists doing the same thing, and they have the fastest market-data and news feeds and systems on earth crunching these numbers. They'll eat up that alpha before it even reaches your network card.

    Unless you think Citadel hasn't thought of this, you need to be more clever. There's not a lot of room for the retail investor in today's market. It's like playing chess against a computer on the highest level.
     
    Last edited: Jul 24, 2021
  5. Girija

    Search Google/Git for SD (standard deviation) indicator code; that's probably what you're looking for, or you can take that code and modify it.
     
  6. xappppp

    Thanks a lot. I understand the opportunity there is pretty small, but this is more for my own learning. Maybe it's not a bad idea to go through what people did 20 years ago to understand more.



     
  7. xappppp

    Or to be more specific, I want to set up an experiment to test whether the alpha is there but just can't be captured in time. I assume this can be shown even with delayed data.

     
  8. kmiklas

    Can you imagine and define a simple statement of the indicator that you think is there?

    Such as: "Over 1,000 trials, in cases where the volume-weighted average price (VWAP) increased by 0.5% from t0 to t1, the price continued upward by at least 0.1% from t1 to t2 57% of the time. This percentage is beyond the first standard deviation and statistically significant; so, if an equity is found to increase at least 0.5% in time t, it makes sense to take a position."
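    One way to turn a statement like that into a testable experiment, sketched in Python under simplifying assumptions: `prices` is a hypothetical list of evenly spaced bar prices (last price rather than VWAP), `lookback` and `horizon` are bar counts for t0 to t1 and t1 to t2, and the 0.5%/0.1% thresholds come from the example statement. A one-proportion z-test then compares the conditional hit rate with a baseline rate `p0`. The function names are illustrative only.

```python
import math


def conditional_hit_rate(prices, lookback, horizon, trigger=0.005, target=0.001):
    """Count how often a rise of >= `trigger` over `lookback` bars is followed
    by a further rise of >= `target` over the next `horizon` bars.

    Returns (hits, trials, base_hits, base_trials). The base counts use every
    bar, giving the unconditional rate to compare against.
    """
    hits = trials = base_hits = base_trials = 0
    for t1 in range(lookback, len(prices) - horizon):
        t0, t2 = t1 - lookback, t1 + horizon
        past = prices[t1] / prices[t0] - 1.0     # return from t0 to t1
        future = prices[t2] / prices[t1] - 1.0   # return from t1 to t2
        success = int(future >= target)
        base_trials += 1
        base_hits += success
        if past >= trigger:                      # the condition in the statement
            trials += 1
            hits += success
    return hits, trials, base_hits, base_trials


def z_score(hits, trials, p0):
    """One-proportion z-test of hits/trials against a baseline rate p0."""
    p_hat = hits / trials
    return (p_hat - p0) / math.sqrt(p0 * (1.0 - p0) / trials)
```

    With the example's numbers and a placeholder baseline of 50%, `z_score(570, 1000, 0.5)` is about 4.43. In practice `p0` should be the unconditional rate `base_hits / base_trials`, and because overlapping windows make the trials correlated, the z-score overstates significance; treat it as a rough screen rather than proof.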

    I was once offered a job doing matrix calculations at an institutional investment house that was modeling the entire market: at the time, over 23,000,000 variables across all markets worldwide. It took all their computing power 3 minutes to solve, using the fastest linear-algebra methods available at the time.
     
    Last edited: Jul 24, 2021
  9. xappppp

    I'm just trying to find a way to calculate the SD, mean, etc. of SPY between t0 and t1, where the window length needs to vary (5 minutes, 1 hour, or 10 days) and t1 is defined as "now".
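    Since the stack here is Python + SQLite, a minimal sketch using the standard-library `sqlite3` and `statistics` modules: a hypothetical `ticks(ts, price)` table with Unix-epoch timestamps, queried for `ts` in `(now - window, now]` so that t1 is always "now" and the window length is just a parameter. The statistics are computed in Python because core SQLite has no built-in standard-deviation aggregate. Table, index, and function names are assumptions for illustration.

```python
import sqlite3
import statistics
import time

# Hypothetical windows, in wall-clock seconds (not trading time).
WINDOWS = {"5m": 5 * 60, "1h": 60 * 60, "10d": 10 * 24 * 60 * 60}


def open_db(path=":memory:"):
    """Create the (hypothetical) tick table plus an index for time-range scans."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS ticks (ts REAL NOT NULL, price REAL NOT NULL)")
    conn.execute("CREATE INDEX IF NOT EXISTS ticks_ts ON ticks (ts)")
    return conn


def window_stats(conn, window_seconds, now=None):
    """Count, mean and sample SD of prices with ts in (now - window, now]."""
    now = time.time() if now is None else now
    rows = conn.execute(
        "SELECT price FROM ticks WHERE ts > ? AND ts <= ? ORDER BY ts",
        (now - window_seconds, now),
    ).fetchall()
    prices = [r[0] for r in rows]
    if len(prices) < 2:
        return None  # not enough data for a sample SD
    return {"n": len(prices), "mean": statistics.fmean(prices), "sd": statistics.stdev(prices)}
```

    Usage would be something like `{name: window_stats(conn, secs) for name, secs in WINDOWS.items()}`. The 10-day window here spans nights and weekends; counting trading sessions instead would need extra logic. For 10 days of raw ticks, pre-aggregated bars or a running (Welford) update would be cheaper than re-reading every row on each refresh.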

    I want to use these kinds of indicators (computed over t0 to t1) to predict the price change over the next interval (say, to t2) with a simple linear regression. That might very well turn out to be pure noise around zero, but the confidence band of the prediction interval might also be very useful by itself.
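    A stdlib-only Python sketch of that regression, assuming `x` holds the indicator values (e.g. the t0-to-t1 return for each past window) and `y` the realized change over the following interval to t2. The band is the standard OLS prediction interval, y_hat +/- q * s * sqrt(1 + 1/n + (x_new - x_bar)^2 / Sxx), but `q` is taken from the normal distribution rather than Student's t, so it is a large-sample approximation that comes out too narrow when n is small. The function name is hypothetical.

```python
import math
from statistics import NormalDist, fmean


def fit_with_prediction_interval(x, y, x_new, confidence=0.95):
    """OLS fit y = a + b*x, plus an approximate prediction interval at x_new.

    Returns (slope, intercept, y_hat, (lower, upper)).
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points")
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))                    # residual standard error
    y_hat = intercept + slope * x_new
    se_pred = s * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
    q = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return slope, intercept, y_hat, (y_hat - q * se_pred, y_hat + q * se_pred)
```

    If SciPy is available, `scipy.stats.t.ppf(0.5 + confidence / 2, n - 2)` gives the exact t quantile to use in place of `q`. A slope near zero with a wide band is exactly the "pure noise" outcome, but the band width alone is still an estimate of how far the next move typically strays.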



     
  10. kmiklas

    Note that you probably don't want your t0 and t1 to each be a single price quote, or you risk running your maths on an outlier.

    I did it by first storing 15 minutes of price data in a circular FIFO. Then I'd compute my t0 as an average of the first n minutes, and my t1 as an average of the remaining 15 - n minutes. Ex: t0 as an average of 10 minutes of prices; t1 as an average of the last 5 minutes.

    I was getting price data from IB (Interactive Brokers), where each quote was a VWAP of 250 Nasdaq price points arriving at 1/ms (Nasdaq runs at 1,000 ticks/s), so one quote every 250 ms. Four IB quotes/second * 60 seconds * 15 mins = 3600 price points in memory, per instrument tracked. FIFO array: 0 to 3599 XD
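    The buffer arithmetic above maps onto a fixed, preallocated array with a wrapping write index. A Python sketch, assuming 4 quotes/s and a 15-minute window as in the post; `RingBuffer` and `t0_t1_averages` are hypothetical names, and splitting the oldest-to-newest contents at n minutes gives the older average (t0) and the newer one (t1).

```python
QUOTES_PER_SECOND = 4
WINDOW_MINUTES = 15
CAPACITY = QUOTES_PER_SECOND * 60 * WINDOW_MINUTES  # 3600 slots: indices 0..3599


class RingBuffer:
    """Fixed-size circular FIFO backed by a preallocated list."""

    def __init__(self, capacity=CAPACITY):
        self.data = [0.0] * capacity
        self.capacity = capacity
        self.head = 0   # next slot to overwrite (also the oldest slot once full)
        self.count = 0  # number of valid samples, at most capacity

    def push(self, price):
        self.data[self.head] = price
        self.head = (self.head + 1) % self.capacity  # wrap from the last index back to 0
        self.count = min(self.count + 1, self.capacity)

    def ordered(self):
        """Valid samples from oldest to newest."""
        start = (self.head - self.count) % self.capacity
        return [self.data[(start + i) % self.capacity] for i in range(self.count)]


def t0_t1_averages(buf, n_minutes, quotes_per_second=QUOTES_PER_SECOND):
    """t0 = mean of the oldest n minutes; t1 = mean of the newer remainder."""
    prices = buf.ordered()
    split = n_minutes * 60 * quotes_per_second
    older, newer = prices[:split], prices[split:]
    if not older or not newer:
        raise ValueError("buffer does not yet cover both segments")
    return sum(older) / len(older), sum(newer) / len(newer)
```

    For the post's example, `t0_t1_averages(buf, 10)` on a full buffer averages the oldest 2,400 samples for t0 and the newest 1,200 for t1; the delta between those two averages is far less sensitive to one outlier quote than comparing two raw prices.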

    I then computed the slope and some other statistics, recomputing every 10 seconds. If the stars aligned, code I wrote automatically took a position with a limit sell and a stop.

    The data collection is entirely separate from the statistical calculations. Nowadays there are stream handlers to make this easier; I had to write a lot of it myself in C++. Today I'd use Python or Kotlin, unless I had a screaming fast data feed (which is effing expensive).

    I did a ton of work on this stuff.
     
    Last edited: Jul 24, 2021
    #10     Jul 24, 2021