Vectorized backtesting with pandas

Discussion in 'App Development' started by nooby_mcnoob, Apr 30, 2019.

  1. IAS_LLC

    You CAN vectorize a backtest, to a certain extent. Generate your signals in serial, then vectorize the fill modeling and do path-length dependency correction.

    I do it all the time, and it saves me a ton of time. And no, I don't care to elaborate further.
     
    #51     May 9, 2019
    nooby_mcnoob likes this.
  2. Then what's the thread or the entire site good for? It's not like anyone is asking you for your "trade secrets". It's impossible to imagine what you might mean with

    * generate signals in serial
    * vectorize the fill modeling
    * path-length dependency correction

    But it surely sounds cute. A simple example would make it crystal clear.

    I am by the way not asking for myself; I run backtests in a pure event-driven architecture with tens of millions of tick-based data points per symbol, and it performs perfectly fine. I am not interested in a vectorized approach because it is technically impossible for me: I use a portfolio backtest approach where multiple strategies are even dependent on each other.

     
    #52     May 9, 2019
  3. Hey @GRULSTMRNN, tell us more about this event-based backtester. What does it mean to "perform perfectly fine", and do you use it to hypothesize or to run full tests? To be clear, the original post was meant to help me hypothesize, not to run full backtests.
     
    #53     May 9, 2019
  4. IAS_LLC

    I'm not going to recreate what's already in the open-source literature (for free) because you lack the imagination to think about ways of doing things differently from your current methods. See one of the first few chapters of the Marcos López de Prado book for some inspiration on the subject.

    If you'd like, I can provide a minimal example with fully commented source code. Just Google-Pay me $1489.95 first.
     
    #54     May 9, 2019
    Translation: you are talking out of your ass and are just too damn proud to admit it. Thanks for the confirmation. It's just sad there are so many posers and liars on this site.
     
    #55     May 10, 2019
  6. For the benefit of others, here is a link to someone who implemented a vectorized backtest in Python. In the following I outline everything that went wrong with this approach:

    https://tim-zhang.com/2016/06/12/algo-series-2-vectorized-backtesting-module/

    The astute reader will notice that the backtest results are completely unrealistic. I am not talking about a deviation of a few percent, or even dozens of percent, from realistic performance metrics, but completely and utterly wrong results: one year of 1-minute data, 95,000+ trades (on average a trade every 4 minutes), with double-digit Sharpe ratios. Anyone who can think a little further immediately realizes the origin of the problem. Every vectorized backtest assumes that there are no path dependencies, which in plain English means that what happened yesterday or a minute ago has zero bearing on decision-making today. In this particular example the algorithm trades at each point in time without any knowledge of whether a trade was taken a minute before or an hour before. This leads to tons of trades and hence ties up tons of margin/capital.
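
    To make the failure mode concrete, here is a toy sketch of what such a vectorized backtest typically looks like (my own illustration with made-up data, not the code from the linked post):

    Code:
    import numpy as np
    import pandas as pd
    
    # Toy "vectorized backtest": a moving-average crossover signal
    # evaluated at every bar, with no notion of an existing position.
    rng = np.random.default_rng(0)
    prices = pd.Series(100 + rng.standard_normal(10_000).cumsum())
    
    fast = prices.rolling(10).mean()
    slow = prices.rolling(50).mean()
    signal = np.sign(fast - slow)   # +1 long, -1 short, at EVERY bar
    
    returns = prices.pct_change()
    # Shift to avoid look-ahead, then multiply: that's the whole "backtest".
    pnl = (signal.shift(1) * returns).cumsum()
    
    # The flaw: every bar is treated as a fresh, fully funded trade.
    # Nothing here knows whether we are already long, how much capital
    # is tied up, or what happened a minute ago.
    round_trips = signal.diff().abs().sum() / 2
    print(f"implied round trips: {round_trips:.0f}")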

    That is why I said in my first post in this thread that vectorized backtests only make sense when there is no path dependency, and that such an assumption is completely unrealistic in financial trading. Capital is limited, and so are risk limits. That means that when an algorithm decides whether to buy, sell, or do nothing, it must know how much capital is currently in use and whether it is currently long, short, or flat. But a vectorized backtest does not allow for knowledge of state.
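
    For contrast, a minimal stateful loop (again just a sketch of the idea, reusing the series from the snippet above) performs exactly the check that a vectorized pass cannot:

    Code:
    def stateful_backtest(prices, fast, slow):
        # Toy event-style loop: every decision sees the current
        # position, so a signal that is already expressed in the
        # book is ignored instead of stacking a new trade each bar.
        position, entry, pnl = 0, 0.0, 0.0
        for t in range(len(prices)):
            if np.isnan(slow.iloc[t]):
                continue  # skip the indicator warm-up period
            desired = int(np.sign(fast.iloc[t] - slow.iloc[t]))
            if desired == position:
                continue  # the state check vectorization cannot make
            if position != 0:
                pnl += position * (prices.iloc[t] - entry)  # close old
            position, entry = desired, prices.iloc[t]       # open new
        return pnl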

    In summary, my point was that a vectorized backtest does not make sense in financial trading other than for some rudimentary initial idea profiling, for example to visualize when signals were triggered. For any backtest that incorporates performance metrics and involves the utilization of capital and risk/reward information, a vectorized approach will always fail.

    Don't believe anyone who only uses buzzwords and is otherwise unable to walk through a simple example. Chances are high that such a person either does not know what he is talking about, or is intentionally misleading, or both.
     
    #56     May 10, 2019
    jharmon likes this.
  7. Wow, pretty much what I said: that it was about idea validation. I think this guy just reads what he wants to read. He may not only be retarded but blind.
     
    #57     May 10, 2019
  8. IAS_LLC

    You're right. My mistake. It was my ass talking again.
     
    #58     May 10, 2019
  9. I was using Dask on and off (just converting from pandas when necessary), but I found the overhead when running with multiple processes a bit too much. I found a happy medium by just writing my own grouped apply. Thought it could be helpful to someone.

    In particular, I cannot use the chunksize argument of pool.starmap/map because it quickly runs out of memory (I think it processes the arguments eagerly or something). It isn't optimally parallel, since at times it sits more idle than it ideally would, but it does let me backtest an intraday strategy over 10 years in ~25 seconds on my Threadripper vs. about 5 minutes serially.

    Code:
    from multiprocessing import cpu_count, Pool
    import pandas as pd
    
    def _func(func, name, group):
        # Return the result together with its group key so results
        # can be matched back up after the parallel map.
        return func(group), name
    
    def df_parallel_apply(grouped, func):
        # Dispatch one batch of groups at a time; handing all groups
        # to starmap at once materializes the arguments eagerly and
        # runs out of memory.
        chunksize = cpu_count()
        results = []  # list of (result, key) tuples
        with Pool(chunksize) as p:
            args = [(func, name, group) for name, group in grouped]
            while args:
                batch, args = args[:chunksize], args[chunksize:]
                results += p.starmap(_func, batch)
        if not results:
            return pd.DataFrame()
        ret_list, index = zip(*results)
        # groupby('col') stores a str key, groupby(['col']) a list.
        keys = grouped.keys if isinstance(grouped.keys, list) else [grouped.keys]
        if not isinstance(ret_list[0], (pd.DataFrame, pd.Series)):
            # Scalar results: one row per group, indexed by group key.
            df = pd.DataFrame(list(ret_list), index=list(index))
            df.index.names = keys
        else:
            # Frame/Series results: stack them with the group keys as
            # the outer index levels.
            df = pd.concat(ret_list, keys=index)
            df.index.names = keys + list(ret_list[0].index.names)
        return df
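
    For completeness, a quick usage sketch (made-up data and a module-level helper, since multiprocessing cannot pickle lambdas; on Windows, wrap the call in an if __name__ == '__main__' guard):

    Code:
    def mean_over_std(g):
        return g['ret'].mean() / g['ret'].std()
    
    df = pd.DataFrame({
        'symbol': ['AAPL', 'AAPL', 'AAPL', 'MSFT', 'MSFT', 'MSFT'],
        'ret':    [0.010, -0.020, 0.004, 0.005, 0.003, -0.001],
    })
    grouped = df.groupby('symbol')
    result = df_parallel_apply(grouped, mean_over_std)
    print(result)   # one scalar row per symbol, indexed by 'symbol'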
    
    Some Threadripper porn (note that significant data-copying overhead still exists - that's roughly the red). I think this is due to suboptimal grouping, as far as computation is concerned.

    [attached screenshot: Selection_736.png]
     
    #59     May 19, 2019
  10. It went to $274. Not exactly $260, but I did make money on the way down, aside from the obviously retarded way I went about it <3
     
    #60     Nov 3, 2019