Vectorized backtesting with pandas

Discussion in 'App Development' started by nooby_mcnoob, Apr 30, 2019.

  1. R1234

    R1234

Gotta vectorize. I did an initial backtest in Python that looped over lots of large dataframes, and it took almost an hour to complete.

    Then I vectorized everything and it ran in 4 minutes.
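A minimal sketch of the kind of rewrite being described (the moving-average strategy, column names, and random data here are made up for illustration, not the poster's actual code). The same signal is computed once bar-by-bar and once with a shifted boolean mask, which replaces the whole loop:

```python
import numpy as np
import pandas as pd

# Hypothetical price series standing in for real market data.
rng = np.random.default_rng(0)
prices = pd.Series(100 + rng.normal(0, 1, 1000).cumsum(), name="close")

fast = prices.rolling(10).mean()
slow = prices.rolling(50).mean()
returns = prices.pct_change()

# Looped version: decide position bar by bar (slow on large frames).
looped = []
for i in range(len(prices)):
    if i > 0 and fast.iloc[i - 1] > slow.iloc[i - 1]:
        looped.append(returns.iloc[i])
    else:
        looped.append(0.0)
looped = pd.Series(looped, index=prices.index)

# Vectorized version: one shifted boolean mask replaces the loop.
signal = (fast > slow).shift(1, fill_value=False)
vectorized = returns.where(signal, 0.0).fillna(0.0)
```

Both versions produce identical strategy returns; the vectorized one just hands the per-row work to NumPy instead of the Python interpreter.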
     
    #11     Apr 30, 2019
    nooby_mcnoob likes this.
  2. d08

    d08

Depends how you loop. Sounds like you were using iterrows instead of itertuples or iteritems. iterrows constructs a full Series for every row, inferring a dtype for every item, and that's ridiculously slow.
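A small illustration of the dtype point (the frame and column names are made up): iterrows materializes a Series per row, upcasting mixed columns to one common dtype, while itertuples yields lightweight namedtuples that keep each column's original dtype.

```python
import pandas as pd

# Tiny frame with mixed dtypes; names are illustrative.
df = pd.DataFrame({"price": [10.0, 11.5], "qty": [100, 200]})

# iterrows builds a Series for every row; the int column is
# silently upcast to float to share the Series' single dtype.
row_series = next(df.iterrows())[1]
print(type(row_series["qty"]))  # a float type after upcasting

# itertuples yields namedtuples and preserves the integer dtype.
row_tuple = next(df.itertuples())
print(type(row_tuple.qty))
```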
     
    #12     Apr 30, 2019
    quant1 likes this.
  3. I don't use iterrows but nice to learn about itertuples!
     
    #13     Apr 30, 2019
  4. Nice post on the performance differences with various options. Not sure why itertuples isn't there:

     
    #14     Apr 30, 2019
  5. fan27

    fan27

I don't have anything to add that is Python specific. I find optimizing code one of the great joys of programming. First, I never optimize unless I find myself saying..."This is slow. This is painful!" So first, there must be pain. Next, I look for obvious performance bottlenecks. If after that I am still feeling pain, I start to question my original design, sometimes puzzling for days over the problem. The key is that you have to be willing to scrap code already written. I once had to optimize some JavaScript (Node.js) code and told the product owner we needed to rewrite the entire application in GoLang and it would take 9 months. Of course I was joking...he was not amused.
     
    #15     Apr 30, 2019
    d08 and nooby_mcnoob like this.
  6. Haha go and nodejs.
     
    #16     Apr 30, 2019
  7. quant1

    quant1

    See how you feel after backtesting a strategy on a day's worth of raw NASDAQ feed.

OP, I use pandas extensively for backtesting and it certainly helps circumvent the shortcomings of Python. The process you outlined above is a simple and effective workflow.
     
    #17     Apr 30, 2019
    nooby_mcnoob likes this.
  8. I've been thinking of keeping bid/ask data around for experimentation. Grows at the rate of 1G per week. Should do it...
     
    #18     Apr 30, 2019
  9. IAS_LLC

    IAS_LLC

If speed is your biggest concern....check out dask and/or pytorch. Vectorization on crack....if you have an Nvidia CUDA-enabled GPU.

Dask distributed is also nice if you want to split your work across multiple machines.
     
    #19     Apr 30, 2019
    nooby_mcnoob and d08 like this.
  10. Not so far... But good to know about it, thanks!
     
    #20     May 1, 2019