Backtester for C++

Discussion in 'App Development' started by thefairarbiter, Jul 29, 2020.

  1. Good question. In both cases you will have to create a model that will get over the execution threshold, obviously, and thus have to apply transaction costs to the backtest. If you are backtesting it at mid as I am suggesting, you will know your strategy risk parameters as well as trade value (i.e. changes in target quantity), PnL per tradevalue and periodic volume participation. Taking the TCA data for your execution setup (which can be arbitrarily complex even if you are a retail trader), you will see if your strategy is making enough to overcome your expected execution cost. The benefits are (in no particular order):

    • being able to use a much larger set of TCA data to optimize execution. If you want to be anal, you can have your strategy also output tradevalue per intraday bucket so you can use TCA specific to the volume skew.

    • take into account recent changes in liquidity or microstructure. Imagine that something has changed fairly recently, but you are backtesting for the last 10 years. You can use the most recent data to figure out what the expected T-costs are currently and apply that to your full backtest.

    • you will be instantly able to re-evaluate viability of each strategy (live or in-conservation) when you change your execution setup by adding new broker algos or new order types. That also includes the case where you add another strategy and start crossing positions between the two strategies when necessary.

    • finally, it will give you an easy set of metrics to watch if you are tweaking parameters to increase PnL/tradevalue. There are various techniques for increasing PnL per trade without overfitting the strategy, such as adding hysteresis or cost-of-risk parameters.
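
    The viability check described in this post (backtest at mid, then compare PnL per trade value against the expected execution cost) can be sketched roughly as follows. This is only a minimal illustration: the `Period` struct, the function names, and the idea of passing a single `expected_cost_bps` number taken from TCA data are all assumptions, not something taken from a particular backtester.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One backtest period simulated at mid. Field names are hypothetical.
struct Period {
    double pnl;          // PnL at mid over the period, in currency units
    double trade_value;  // value traded in the period (change in target quantity * mid)
};

// Gross PnL per unit of trade value over the whole backtest, in basis points.
double pnl_per_trade_value_bps(const std::vector<Period>& periods) {
    double pnl = 0.0;
    double traded = 0.0;
    for (const Period& p : periods) {
        pnl += p.pnl;
        traded += std::fabs(p.trade_value);  // buys and sells both count as turnover
    }
    assert(traded > 0.0 && "strategy never traded");
    return 1e4 * pnl / traded;
}

// Edge left after subtracting an expected execution cost (assumed to come from TCA data).
double net_edge_bps(const std::vector<Period>& periods, double expected_cost_bps) {
    return pnl_per_trade_value_bps(periods) - expected_cost_bps;
}
```

    Because the cost is applied after the fact, re-evaluating a strategy under a new execution setup only means calling `net_edge_bps` again with a different cost figure; the mid-price backtest itself does not need to be rerun.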
     
    #21     Jul 31, 2020
  2. 931

    I guess with mid it would also be possible to collect spread info from bid-ask data and create avg spread tables for each stock, for various days and times, without actually storing all the bid-ask info.

    Also, building the tables from a few years of data while still backtesting over 10 years might work... If conditions changed in recent years, it might make old data more valid.

    The buckets could be hours and weekdays, with an avg spread for each?

    Bucketing by month would get complex, as there are not many samples to get in a few years.

    I'd guess the avg might be quite accurate this way if using an avg spread pricing table for each stock, unless the edges found are low-spread related and the actual spread is higher than avg under those circumstances.

    That type of table would take a lot less memory compared to full bid-ask data.
    And it would be easy to compare its accuracy vs full bid-ask: make 2 versions and run them with the same parameters.
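
    A rough sketch of such a per-stock table, bucketed by weekday and hour, might look like this. The class name `SpreadTable`, the 7 x 24 bucket layout, and the half-spread fill rule are illustrative assumptions only; the bucket granularity would need to be checked against the full bid-ask data as described above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Average quoted spread per (weekday, hour) bucket for a single symbol.
// Hypothetical sketch: one instance per stock, filled once from bid-ask history.
class SpreadTable {
public:
    // weekday: 0..6, hour: 0..23 (which clock to use is left as an assumption).
    void add(int weekday, int hour, double bid, double ask) {
        Cell& c = cells_[index(weekday, hour)];
        c.sum += ask - bid;
        ++c.count;
    }

    // Mean spread observed in the bucket, or `fallback` if the bucket is empty.
    double average(int weekday, int hour, double fallback = 0.0) const {
        const Cell& c = cells_[index(weekday, hour)];
        return c.count ? c.sum / static_cast<double>(c.count) : fallback;
    }

    // Simulated fill: mid plus (buy) or minus (sell) half the average spread.
    double fill_price(double mid, bool is_buy, int weekday, int hour) const {
        const double half = 0.5 * average(weekday, hour);
        return is_buy ? mid + half : mid - half;
    }

private:
    struct Cell {
        double sum = 0.0;
        std::size_t count = 0;
    };

    static std::size_t index(int weekday, int hour) {
        assert(weekday >= 0 && weekday < 7 && hour >= 0 && hour < 24);
        return static_cast<std::size_t>(weekday) * 24 + static_cast<std::size_t>(hour);
    }

    std::array<Cell, 7 * 24> cells_{};
};
```

    The table holds 168 cells per stock regardless of how many quotes went into it, which is where the memory saving over full bid-ask data comes from.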

    Some NN dev might make a neural net that estimates the spread, not just based on time and avg values.

    It's probably a good example of what would be easy for a custom backtester but hard, if not hopeless, on a proprietary one.

    A simpler solution would be bid or midprice + a fixed spread per stock, but that is a bad idea imo.
     
    Last edited: Aug 1, 2020
    #22     Aug 1, 2020
  3. 931

    Actually, I think it might not be valid.
    If the spread was higher before, then the price would have reflected the strategies available at that time.
     
    #23     Aug 1, 2020
  4. SteveH

    You could use Amibroker and access the backtester through its COM interface. You won't find anything faster or more robust.
     
    #24     Aug 1, 2020
  5. Elji

    @thefairarbiter
    Maybe you could consider running your backtests on a trading platform that runs on C++.
    I would recommend Sierra: it is fast, stable, and has a C++ interface.
     
    #25     Aug 1, 2020
  6. It would only be true if you are leveraging some form of liquidity premium or somehow exploiting the microstructure. Imagine that you have found that the new moon influences the overnight returns of the French stock market - the moon has been around forever, the French stock market has been around for a fair bit, but tight spreads have only been a thing for the last ten-fifteen years. There is no reason to assume that the effect of the moon is liquidity driven, so you can reasonably assume that your 30-40 year backtest is valid, while assuming current transaction costs.

    Well, ideally, if you have been trading for a while and have been saving your fills vs arrival prices, you can create a reasonable execution data-set that would reflect your specific setup (access to venues, algos, latency etc) and use that. In the absence of that, I'd ask the broker for some TCA data (any broker that has some institutional presence would have that available).
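
    As a rough illustration of turning saved fills into an execution cost figure, the sketch below computes a value-weighted cost versus arrival price in basis points. The `Fill` struct and its field names are hypothetical, and using the mid at order arrival as the benchmark is an assumption; a real TCA data-set would normally be broken down further (by order size, time of day, algo and so on).

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One saved fill. Field names are hypothetical.
struct Fill {
    double arrival_mid;  // mid price when the parent order was sent
    double fill_price;   // price actually obtained
    double quantity;     // signed: positive = buy, negative = sell
};

// Value-weighted cost versus arrival, in basis points.
// Positive means the fills were worse than the arrival mid.
double arrival_cost_bps(const std::vector<Fill>& fills) {
    double cost = 0.0;
    double notional = 0.0;
    for (const Fill& f : fills) {
        // Buying above arrival or selling below arrival both give a positive cost.
        cost += f.quantity * (f.fill_price - f.arrival_mid);
        notional += std::fabs(f.quantity) * f.arrival_mid;
    }
    assert(notional > 0.0 && "no fills supplied");
    return 1e4 * cost / notional;
}
```

    The resulting number is what would be subtracted from the mid-price backtest's PnL per trade value.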
     
    #26     Aug 2, 2020
  7. 931

    For constructing the midprice, what type of data do you use?
     
    #27     Aug 3, 2020
  8. It's a tricky question, especially if you only have access to top-of-book information. For markets that are always quoted a single tick wide and have a nice thick book, I'd use weighted mid, i.e. bid*ask_size/(ask_size+bid_size) + ask*bid_size/(ask_size+bid_size). However, for something that is frequently quoted wide I'd use a simple arithmetic mid - otherwise, someone pennying a large order will actually bias your mid the wrong way.
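
    The two mids above can be sketched as follows. The `Quote` struct, the function names, and the 1.5-tick tolerance used to decide whether a quote counts as "one tick wide" are assumptions made for illustration; in practice the choice is more likely made per instrument than per quote.

```cpp
#include <cassert>

// Top-of-book snapshot. Field names are hypothetical.
struct Quote {
    double bid;
    double ask;
    double bid_size;
    double ask_size;
};

// Simple arithmetic mid.
double simple_mid(const Quote& q) {
    return 0.5 * (q.bid + q.ask);
}

// Size-weighted mid: bid*ask_size/(ask_size+bid_size) + ask*bid_size/(ask_size+bid_size).
// A large bid size pulls the result toward the ask, and vice versa.
double weighted_mid(const Quote& q) {
    const double total = q.bid_size + q.ask_size;
    if (total <= 0.0) {
        return simple_mid(q);  // no size information, fall back to arithmetic mid
    }
    return (q.bid * q.ask_size + q.ask * q.bid_size) / total;
}

// Use the weighted mid only when the quote is one tick wide; otherwise use the
// arithmetic mid so a pennying order inside a wide spread does not bias the result.
double mid_price(const Quote& q, double tick_size) {
    assert(tick_size > 0.0);
    // 1.5 ticks rather than exactly 1 tick to tolerate floating-point rounding (assumption).
    const bool one_tick_wide = (q.ask - q.bid) <= 1.5 * tick_size;
    return one_tick_wide ? weighted_mid(q) : simple_mid(q);
}
```

    For example, a quote of 100.00 x 100.01 with 300 on the bid and 100 on the offer gives a weighted mid of 100.0075, above the arithmetic mid of 100.005.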

    If you happen to have the full order book dataset, there are several clever ways to construct a probabilistic micro-mid (the most notable one is from Sasha Stoikov, it's on SSRN). I'd only engage in implementing something like that if you are locked out of Netflix and cut off from porn sites.
     
    #28     Aug 3, 2020
  9. 931

    Nothing to be locked out of, no Netflix or porn accounts.

    Probably not doing an order book based sim at this stage: too much learning, data and processing.

    For positions where the spread is a small fraction, it seems wise to use mid prices + some way to generate spreads. 2x less memory needed.

    But I have a few ideas involving scalping.

    In that case, could historic prices and wider spreads reflect opportunities that are unavailable and unrealistic with current spreads, since price could also be affected by spreads?

    Penny stocks have gigantic spreads, so perhaps prices + spreads reflect opportunities.
     
    Last edited: Aug 6, 2020
    #29     Aug 6, 2020
  10. Algokd

    Sorry to resurrect this thread, but I'm curious for your opinion on why the open source solutions are unsuitable for hardcore use. What are some of the features you find lacking in them?
     
    #30     Mar 23, 2021