Backtester for C++

Discussion in 'App Development' started by thefairarbiter, Jul 29, 2020.

  1. guru

    #11     Jul 29, 2020
  2. At my shop, we have developed and use both a separate C++ engine for backtesting high frequency strategies (though it's a bit of a tricky game and is a "glimpse" at best) and a separate python backtest engine for lower turnover strategies. Sadly I can't share or contribute, but I can give my (biased but professional) opinion on the state of backtesting engines in the open source space as well as opine on what you are trying to build.

    • Pretty much every open source backtesting engine I've looked at is not really suitable for hard-core use. That includes various venture-sponsored projects like Lean or Zipline, as well as hobby projects like backtrader. Because most developers (and startup founders) lack actual quantitative trading experience, they have built features that are kinda useless and left out features that are a must-have, at least in the institutional setting. @globalarbtrader's https://github.com/robcarver17/pysystemtrade is probably the closest I've seen to an institutional-quality product, but it's geared toward a very specific type of strategy.

    • There are two distinct types of backtesting engines. A fixed-interval (bar-based) backtester creates a target position based on a snapshot of the market at some given point in time. An event-based backtester processes updates to the market state as they arrive. The latter usually consumes tick data and lets you build a variety of market state simulations and latency responses, which makes it suitable for simulating higher-turnover intraday strategies. The former takes bar data, allows for more complex computations and thus is more suitable for lower-turnover statistical strategies. It's very difficult to combine these two in a single product and there really is no reason to; decide on one or the other. Unless you have real-life LL development experience, I'd avoid anything order-book based.

    • A lot of what you are trying to build does not belong in the backtesting engine. Indicators are part of your alpha and should live in a separate, unrelated library. The state of your account and broker should not be part of your backtesting process. Also, do not try to combine the backtesting engine with actual live execution; it's silly and will give you more problems than benefits.

    • Unless you are building an event-driven tool to test HFT strategies, do not bother with simulating the fills and trading costs. The general idea for lower-turnover strategies is to backtest at mid, establish your PnL/tradevalue and use various portfolio formation/trade reduction techniques to increase it until you get above your expected cost of execution. I.e. you are breaking TCA and alpha into two separate threads and can use TCA from your prior experience instead of building uncertain costs of execution into the actual backtest.

    • It makes sense to spend a lot of time and effort on backtest analysis, especially on turnover, PnL/trade analysis, drawdowns (both depth and length) and market correlation. This said, I am not sure whether it makes sense to write all that in C++ or to export the backtest into a separate analysis tool. We have built a pretty Excel tool for it and I am using it for both HF and LF strategies.

    • Make it easy to store backtests together with the relevant version of the alpha code and all of the parameters (we dump the whole thing into Mongo, for example, with code and parameters). Create a tool that allows you to read and compare multiple backtests graphically, and also to compare your backtest with live trading results. As you start running multiple live strategies and making changes/revisions, you will appreciate these features.
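
    A minimal sketch of the "store the backtest with its code and parameters" idea from the last bullet. Everything here is an assumption made for illustration: the record layout, the function names (`make_backtest_record`, `compare_records`, `to_document`) and the use of plain JSON rather than Mongo. It only shows how a content hash of the alpha source can tie results to a code version.

```python
import hashlib
import json


def make_backtest_record(alpha_source: str, params: dict, daily_pnl: list) -> dict:
    """Bundle results with the exact code and parameters that produced them."""
    return {
        # Hash of the alpha source text identifies the code version (hypothetical field name).
        "code_sha256": hashlib.sha256(alpha_source.encode("utf-8")).hexdigest(),
        "code": alpha_source,
        "params": params,
        "daily_pnl": daily_pnl,
    }


def compare_records(a: dict, b: dict) -> dict:
    """Report what differs between two stored backtests."""
    changed = sorted(
        key
        for key in set(a["params"]) | set(b["params"])
        if a["params"].get(key) != b["params"].get(key)
    )
    return {
        "same_code": a["code_sha256"] == b["code_sha256"],
        "changed_params": changed,
        "total_pnl_diff": sum(b["daily_pnl"]) - sum(a["daily_pnl"]),
    }


def to_document(record: dict) -> str:
    """Serialize to JSON, the kind of document a document store could accept."""
    return json.dumps(record, sort_keys=True)
```

    The same record shape could hold live trading results, so a backtest and its live counterpart can be diffed with the same comparison function.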
     
    #12     Jul 29, 2020
    peechu, Elji, Metamega and 3 others like this.
  3. First, awesome post. Thank you.

    I think the larger narrative you're conveying here is that the professional world of quantitative finance has a much harder time building a backtester that can actually emulate live trading. I totally believe that. But I'm not convinced that an amateur, do-it-yourself'er like me should be quite as concerned with the constraints of strategies like yours. In short, I've tried to build a live trading simulator as a backtester (hence the account and broker and stuff), and I believe that the strategies and technical implementations of my IB paper trader are simple enough for this to be possible. Let me elaborate:

    I definitely fall into this category. This kind of backtesting is simple, and it's really just an abstraction of a live trader, but with pre-existing data. Hence my integration of the backtesting framework into a live trading application. Wouldn't you say that having a simple setting like this minimizes the drawbacks that I think you're talking about?
    Touched on this above. I suspect that this piece of advice comes from professional experience at a real investment firm with complicated strategies. I imagine that the strategies you're trying to make are difficult to simulate in the first place, and may not be possible at all unless you start going crazy with randomization algorithms, since the latency of your data stream has a high variance in proportion to the mean (whereas my simple model has a variance in much lower proportion to the mean). Milliseconds just don't matter in the latter case, but matter a lot in the former. Is this the main motivation of your perspective here?

    This is a great insight. I believe what you're trying to say is that the backtester is really just supposed to tell you what your strategy's expectations (as in probability) are in terms of trade volume, trade profit, available capital, and relative performance, for a given piece of data in a known market setting. It will also make 1:1 comparisons like you mentioned easy. This kind of approach will definitely make its way into my current project.

    Is order book-based (a synonym for depth-of-book or L2 data, right?) supposed to mean HF strategies? Also, what is LL development?

    Two more questions: how many trades a day does your typical "low frequency" strategy make? How many for the high frequency strategies?
     
    #13     Jul 29, 2020
  4. traider

    I don't see any advantage to developing non-HFT strategies in C++, especially when Python can easily do the job much faster. Also, as your ideas become more complex and require machine learning, it will be very tough to implement this in C++.
    IB has a Python API, so you might want to check that out.
     
    #14     Jul 30, 2020
  5. You don't need C++ for writing a backtester.
    Python is much faster to develop in, and it's easier thanks to the ample third-party libraries available.

    C++ should only be used for trade execution.
     
    #15     Jul 30, 2020
    For the most part, yes, you'd use the full order book (at least a few levels away from the touch) to figure out order book pressures, overhangs etc. It's HFT's bread and butter. LL = low latency. Some trade/book pressure strategies would only use top of the book, especially for situations where you are doing it across related products.

    I think it's best to think of this in turnover terms (since my target positions are sliced for execution, sometimes I get a lot of trades that are not really "trades"). Stuff that turns over from once a day and higher is "non-HFT" for the purposes of this discussion. To be honest, even higher turnover strategies can be simulated well enough using bar data (e.g. secondly bars) as long as you make some assumptions about the market microstructure.
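
    A rough sketch of turning tick data into the one-second bars mentioned above. The tick layout `(timestamp_seconds, bid, ask)` and the function name `ticks_to_second_bars` are assumptions for illustration, and using mid for the bar prices is just one possible microstructure assumption.

```python
def ticks_to_second_bars(ticks):
    """Aggregate (timestamp_seconds, bid, ask) ticks into one-second mid-price bars.

    Ticks are assumed to be sorted by time with non-negative timestamps.
    Seconds with no ticks produce no bar; a real engine would need a
    policy for those gaps.
    """
    bars = {}  # second -> bar dict; insertion order follows tick order
    for ts, bid, ask in ticks:
        second = int(ts)  # bucket by whole second
        mid = (bid + ask) / 2.0
        bar = bars.get(second)
        if bar is None:
            bars[second] = {"open": mid, "high": mid, "low": mid, "close": mid, "ticks": 1}
        else:
            bar["high"] = max(bar["high"], mid)
            bar["low"] = min(bar["low"], mid)
            bar["close"] = mid
            bar["ticks"] += 1
    return [dict(second=s, **b) for s, b in bars.items()]
```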

    The best way to think of your backtest is that it's a Tinder profile. The ones that look ugly you swipe left right away and that's that. The ones that look passable (depending on how desperate you are) you swipe right, only to frequently discover flaming pimples and hairy armpits. Whenever that happens, you always wish that you had more pictures beforehand. Similarly, the task of a good backtesting framework is to show you the potential issues with the strategy in every way possible. Sometimes it's possible to fix these issues without overfitting and go on. Sometimes you drop the idea altogether after seeing these issues.
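
    One of the "pictures" mentioned earlier in the thread is drawdown depth and length. A minimal sketch, assuming the equity curve is a non-empty list of account values per bar; the function name `drawdown_stats` is made up for this example.

```python
def drawdown_stats(equity):
    """Return (max_depth, max_length) for an equity curve.

    max_depth is the largest peak-to-trough drop, in equity units.
    max_length is the longest run of consecutive bars spent below a
    prior peak. The two may come from different drawdown episodes.
    """
    peak = equity[0]
    max_depth = 0.0
    max_length = 0
    current_run = 0
    for value in equity:
        if value >= peak:
            # New high (or back at the old one): the drawdown is over.
            peak = value
            current_run = 0
        else:
            current_run += 1
            max_depth = max(max_depth, peak - value)
            max_length = max(max_length, current_run)
    return max_depth, max_length
```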
     
    #16     Jul 30, 2020
  7. 931

    If we can find common ground for future plans maybe develop some aspects together?

    I would not think of open sourcing in this type of competitive field.

    Had to cut corners like documentation to do things quicker but still took years.

    I also use common timings for all instruments to save on memory and get faster seek times if accessing instruments by timing.

    But also keep tick file position references to do accurate simulation of orders.

    In developing a backtester I'd recommend implementing bid and ask data, at least for order simulation if working on lower timeframes or penny stocks.

    [IMG]
    The right chart displays bid-ask bars of AAPL, for example.
    White is bid; the center of the bar is shared, etc.
    Which tone you focus on determines whether you see bid or ask.

    With visible spreads like these, it's important to use both in simulation IMO.
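
    To illustrate the point about spreads, here is a small sketch that prices the same round trip at bid/ask and at mid. The function names and the `(bid, ask)` quote layout are assumptions for the example, not anything from an actual engine.

```python
def fill_price(side, bid, ask, use_spread=True):
    """Price a market order: buys lift the ask, sells hit the bid.

    With use_spread=False the order fills at mid, which ignores the spread.
    """
    if side not in ("buy", "sell"):
        raise ValueError("side must be 'buy' or 'sell'")
    if not use_spread:
        return (bid + ask) / 2.0
    return ask if side == "buy" else bid


def round_trip_pnl(entry_quote, exit_quote, qty, use_spread=True):
    """PnL of buying qty at entry_quote and later selling it at exit_quote.

    Each quote is a (bid, ask) pair.
    """
    buy = fill_price("buy", *entry_quote, use_spread=use_spread)
    sell = fill_price("sell", *exit_quote, use_spread=use_spread)
    return (sell - buy) * qty
```

    With made-up penny-stock quotes of (1.00, 1.10) on entry and (1.05, 1.15) on exit, the mid-based round trip shows a gain while the bid/ask version shows a loss of the same size.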
     
    Last edited: Jul 31, 2020
    #17     Jul 31, 2020
  8. 931

    If using Python ML libs, then those are C anyway.

    But if a large portion of the bottleneck code runs in Python, then the slow Python runtime could slow development, for example if model generation takes days.

    In this scenario C++ both runs faster and allows faster development.
     
    Last edited: Jul 31, 2020
    #18     Jul 31, 2020
  9. I would say exactly the opposite, unless you are dealing with very short holding times or doing any sort of market making. Avoid using bid/ask spread, order simulation or simulating fills directly in the backtest process and simulate everything at mid, while maintaining good alpha/trade and volume participation statistics.

    Execution is highly uncertain and you would rather separate it into your implementation analysis that includes TCA/volume/slicing etc. This way you can also optimize execution holistically instead of doing it on a per-strategy basis.
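
    A minimal sketch of what "simulate everything at mid, while maintaining alpha/trade and volume participation statistics" could look like. The input layout, the function names and the basis-point convention are assumptions for illustration; the expected cost is taken as an outside input from separate TCA work, as described above.

```python
def mid_backtest_stats(mids, positions, volumes):
    """Backtest target positions at mid and summarize alpha per traded value.

    mids[i]      : mid price at bar i
    positions[i] : target position (units) held from bar i to bar i + 1
    volumes[i]   : market volume (units) at bar i
    All three lists are assumed to have the same length.
    """
    pnl = 0.0
    traded_value = 0.0
    max_participation = 0.0
    prev_pos = 0.0
    for i in range(len(mids)):
        trade = positions[i] - prev_pos
        traded_value += abs(trade) * mids[i]
        if volumes[i] > 0:
            max_participation = max(max_participation, abs(trade) / volumes[i])
        if i + 1 < len(mids):
            # Position taken at bar i earns the mid-to-mid move to bar i + 1.
            pnl += positions[i] * (mids[i + 1] - mids[i])
        prev_pos = positions[i]
    pnl_bps = 1e4 * pnl / traded_value if traded_value > 0 else 0.0
    return {
        "pnl": pnl,
        "traded_value": traded_value,
        "pnl_per_traded_value_bps": pnl_bps,
        "max_participation": max_participation,
    }


def clears_cost(stats, expected_cost_bps):
    """True if alpha per traded value exceeds the separately estimated cost."""
    return stats["pnl_per_traded_value_bps"] > expected_cost_bps
```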
     
    #19     Jul 31, 2020
    eternaldelight likes this.
  10. 931

    Without spreads it's much easier to find too much unrealistic opportunity IMO.

    Especially in penny stocks, where spreads are enormous, or even in low-spread S&P 500 names if going to lower timeframes.

    Generating signals and simulation can be separated, and that's great for rendering tests fast.
    But even at this stage you could use a bid-ask strength multiplier, or something that pulls toward the midprice or further, using both bid and ask data for more accurate(?) results.

    I am thinking about retail-level spreads...
    It takes a lot to even overcome those on lower timeframes IMO.
    That's why I include them in tests.

    For algos I use mid in many places.
    But I can't fully understand why midprice would be better in historical order simulation.

    So far I have gone from bid -> mid -> bid-ask simulation,
    while keeping backward compatibility in the form of preprocessor macros to disable ask data.

    Using mid + simulated spread will make it easier but does not seem realistic if using SL & TP.
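
    A small sketch of why SL & TP can behave differently on bid data than on mid. It assumes a long position that exits by selling, so the bid is the price tested against both levels; the function name `first_exit` and the quote layout are made up for the example.

```python
def first_exit(quotes, stop, target, use_bid=True):
    """Find the first stop-loss or take-profit exit for a long position.

    quotes is a list of (bid, ask) pairs in time order. With use_bid=True
    the bid is tested against both levels; with use_bid=False the mid is
    tested instead.
    Returns (index, reason), or (None, None) if neither level is reached.
    """
    for i, (bid, ask) in enumerate(quotes):
        price = bid if use_bid else (bid + ask) / 2.0
        if price <= stop:
            return i, "stop"
        if price >= target:
            return i, "target"
    return None, None
```

    On a path where the spread widens briefly, the bid can touch the stop while the mid never does, so the two simulations end with opposite outcomes for the same trade.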
     
    Last edited: Jul 31, 2020
    #20     Jul 31, 2020