Vectorized backtesting with pandas

Discussion in 'App Development' started by nooby_mcnoob, Apr 30, 2019.

  1. jharmon

    It seems to be a big concern of yours, given that your first post said you want to backtest 10 years of data in 15 seconds.

    15 seconds per security is too long. 15 seconds for several thousand securities is difficult without giving serious thought to parallel processing.

    It is hard to figure out what you really want.
     
    #21     May 1, 2019
  2. Well, clearly what I did was good enough for me. I haven't needed to go beyond it. Hard to know if I need to spell out everything for everyone.
     
    #22     May 1, 2019
  3. jharmon

    Err OK - still not sure what your universe is.
     
    #23     May 1, 2019
  4. Why is there always someone litigating words on these threads?
     
    #24     May 1, 2019
  5. jharmon

    Your blanket statement about backtesting taking a maximum of 15 seconds started it, dude.

    Explain yourself - what are you trading? 1 stock/security? 3000? All listed stocks? A handful of futures contracts? Clearly if you are talking performance you need to give some further metrics on your universe.
     
    #25     May 1, 2019
    GRULSTMRNN likes this.
  6. I started this thread not to talk about what I need, but how I solved a technical problem that may be useful to others. It is a tool in your belt and not a holy grail. Why you keep going on your pet tangent is probably something to do with you.
     
    #26     May 1, 2019
    IAS_LLC likes this.
  7. Most of these answers are crap. Either use the multiprocessing package (split the data with groupby and run the vectorized groups on multiple cores) or Dask.
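
    A minimal sketch of the groupby-plus-multiprocessing idea, not anyone's actual setup from this thread. It assumes a long-format DataFrame with hypothetical `symbol` and `close` columns, and a made-up moving-average rule standing in for a real strategy. The names `backtest_symbol`, `run_backtests` and `make_prices` are illustrative only.

```python
from multiprocessing import Pool

import numpy as np
import pandas as pd


def backtest_symbol(item):
    """Run one independent backtest; item is a (symbol, DataFrame) pair."""
    symbol, df = item
    close = df["close"]
    # Hypothetical rule: long when close is above its 20-bar moving average.
    signal = (close > close.rolling(20).mean()).astype(float)
    # Trade on the next bar so the signal never sees its own return.
    strategy_returns = signal.shift(1) * close.pct_change()
    return symbol, float(strategy_returns.fillna(0.0).sum())


def run_backtests(prices, workers=4):
    """Split a long-format frame by symbol and backtest each group separately."""
    groups = list(prices.groupby("symbol"))
    if workers <= 1:
        # Serial fallback, handy for debugging and for timing comparisons.
        results = list(map(backtest_symbol, groups))
    else:
        # Each worker process receives whole per-symbol groups.
        with Pool(workers) as pool:
            results = pool.map(backtest_symbol, groups)
    return pd.Series(dict(results), name="summed_return")


def make_prices(symbols, n_bars=500, seed=0):
    """Generate random-walk prices so the sketch runs without real data."""
    rng = np.random.default_rng(seed)
    frames = []
    for s in symbols:
        close = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, n_bars)))
        frames.append(
            pd.DataFrame({"symbol": s, "bar": np.arange(n_bars), "close": close})
        )
    return pd.concat(frames, ignore_index=True)
```

    When run as a script, the call `run_backtests(make_prices(["AAA", "BBB", "CCC"]), workers=2)` should sit under an `if __name__ == "__main__":` guard, since some platforms start worker processes by re-importing the main module. For small universes the cost of shipping each group to a worker can outweigh the gain, so it is worth timing against `workers=1`.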
     
    #27     May 5, 2019
  8. Irony of ironies: I started doing this yesterday after claiming I didn't need it. Thanks, ET, for continually calling me out.
     
    #28     May 5, 2019
  9. May I recommend that you go back to the drawing board and rethink your entire premise? You want to vectorize a backtest. Vectorization only works when a process is not path dependent. So you can vectorize a backtest over multiple symbols, provided that the performance of each symbol is independent of the performance of the other symbols and that there are no other dependencies. In that case you iterate over your data set in time order and, at each step, feed the data points to the individual per-symbol backtests in parallel. You cannot vectorize the data feed itself, because you would then be assuming that performance tomorrow is independent of performance today, which is inherently wrong in financial trading. Imagine you enter a position today: tomorrow your backtest needs to know that you are in that position, or else the entire backtest is flawed.

    Long story short, you cannot vectorize a path-dependent process.
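
    The split described above can be sketched roughly as follows: an explicit loop over time carries the position state forward, while each step is vectorized across symbols. The entry and exit rule, the parameter names and the function name are hypothetical and chosen only to make the state dependence visible. The sketch also assumes the symbols do not interact.

```python
import numpy as np


def backtest_across_symbols(close, entry_z=1.0, exit_z=0.0, lookback=20):
    """close: 2-D array of shape (n_bars, n_symbols). Returns summed PnL per symbol."""
    n_bars, n_symbols = close.shape
    position = np.zeros(n_symbols)  # state carried from one bar to the next
    pnl = np.zeros(n_symbols)
    for t in range(lookback, n_bars):
        # The position held coming into this bar earns this bar's price change.
        pnl += position * (close[t] - close[t - 1])
        window = close[t - lookback:t]
        z = (close[t] - window.mean(axis=0)) / window.std(axis=0)
        # Path dependence: the next position depends on the current one.
        enter = (position == 0) & (z < -entry_z)
        leave = (position == 1) & (z > exit_z)
        position = np.where(enter, 1.0, np.where(leave, 0.0, position))
    return pnl
```

    Each pass through the loop touches every symbol at once, so the Python-level cost scales with the number of bars rather than with bars times symbols.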

     
    #29     May 9, 2019
  10. What exactly are you vectorizing? As I mentioned in my post above, you can vectorize multiple independent backtests over identical data sets, but you cannot vectorize a single backtest over a data set when that backtest is path dependent. I explained why that is the case in my post above.

     
    #30     May 9, 2019