Advanced Backtesting

Discussion in 'Strategy Development' started by UMU, Apr 18, 2004.

  1. UMU


    In real trading, IMO nearly everybody watches the actual Bid/Ask prices in addition to the current price (i.e. the last trade price) and sets his own order price based on all three. Usually Bid <= Last <= Ask, but this isn't always the case.

    While developing my own RT system I made the following observation: if historical data also included the Bid/Ask, one could do more realistic backtesting. Usually we add a slippage % to the Last price. But if Bid/Ask were available too, and that price is better than Last +/- the calculated slippage, then one could use the better Bid/Ask price instead, much like in real trading.
    IMO this procedure would give not only more realistic backtest results but also improved performance, since the simulated fill is never worse than Last +/- slippage.
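
A minimal sketch of the fill rule described above, in Python. The function name, the "buy"/"sell" convention and the slippage expressed as a fraction of Last are all assumptions made for illustration; nothing here comes from a particular backtesting package.

```python
def fill_price(side, last, bid, ask, slippage_pct):
    """Simulated fill price for a market order.

    side: "buy" or "sell".
    slippage_pct: fraction of Last, e.g. 0.001 for 0.1% (assumed convention).
    """
    if side == "buy":
        slipped = last * (1 + slippage_pct)  # the usual Last + slippage assumption
        return min(slipped, ask)             # take the ask when it is cheaper
    if side == "sell":
        slipped = last * (1 - slippage_pct)  # the usual Last - slippage assumption
        return max(slipped, bid)             # take the bid when it pays more
    raise ValueError("side must be 'buy' or 'sell'")
```

With a tight spread the quote wins: buying with Last 10.00, Ask 10.01 and 0.5% slippage fills at 10.01 rather than about 10.05. With a wide spread the slippage estimate is kept, which may be optimistic compared with what the market would really have paid.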

    The problem is that many data vendors don't understand what data is really necessary for realistic backtests. Besides Date+Time and OHLCV, one also needs the Bid and Ask prices in historical data.

    What do other system developers and testers think?
  2. prophet


    I use pure Bid/Ask price levels and depth to simulate market orders. If you have bid/ask data, why use the last at all? The last price can be a very poor indication of the true market. It will lag in periods of illiquidity. Execution at the last depends on orders crossing the spread and on limit order queues, all of which are unpredictable. On the other hand, the bid/ask market represents easily executable prices and available depth… much more reliable to backtest against.
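
One way to simulate a market order against quoted levels and depth is to "walk the book": consume size at the best level, then the next, until the order is filled or the depth runs out. The sketch below is only an illustration of that idea, not the poster's actual code; the function name and the `(price, size)` tuple layout are assumptions.

```python
def simulate_market_order(qty, levels):
    """Fill `qty` shares against one side of the book.

    levels: list of (price, size) tuples, best price first
            (asks ascending for a buy, bids descending for a sell).
    Returns (filled_qty, average_price); average_price is None if nothing fills.
    """
    remaining = qty
    cost = 0.0
    for price, size in levels:
        if remaining <= 0:
            break
        take = min(remaining, size)  # cannot take more than this level shows
        cost += take * price
        remaining -= take
    filled = qty - remaining
    return filled, (cost / filled if filled else None)
```

For example, a 500-share buy against asks of 300 @ 10.01 and 500 @ 10.02 takes 300 at the first level and 200 at the second. An order larger than the visible depth comes back partially filled, which the caller has to handle.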

    I log all my data in real-time for this reason. L2 depth can be a ton of data depending on time resolution. However, I compress it better than most. Many data vendors seem to be limited by their inefficient data compression. You’d be surprised how few bits it takes to encode a tick, including T&S, L2 prices and depth.
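
The post does not say how its compression works. As a hedged illustration of why a tick can be small, here is one common generic approach: store each tick as the difference from the previous one and write the differences as variable-length integers. The `(time_ms, price_in_cents, size)` layout and all function names are assumptions for this sketch.

```python
def zigzag(n):
    """Map signed ints to unsigned so small magnitudes stay small."""
    return (n << 1) if n >= 0 else ((-n << 1) - 1)

def unzigzag(z):
    """Inverse of zigzag()."""
    return (z >> 1) if z % 2 == 0 else -((z + 1) >> 1)

def write_varint(value, out):
    """Append an unsigned int to bytearray `out`, 7 bits per byte."""
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # high bit set: more bytes follow
        value >>= 7
    out.append(value)

def read_varint(buf, pos):
    """Read one varint from `buf` at `pos`; return (value, new_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7

def encode_ticks(ticks):
    """ticks: list of (time_ms, price_in_cents, size) integer tuples."""
    out = bytearray()
    prev_t = prev_p = 0
    for t, p, size in ticks:
        write_varint(zigzag(t - prev_t), out)  # time delta (signed: data may be out of order)
        write_varint(zigzag(p - prev_p), out)  # price delta
        write_varint(size, out)                # size stored as-is
        prev_t, prev_p = t, p
    return bytes(out)

def decode_ticks(buf):
    """Rebuild the original tuples from encode_ticks() output."""
    ticks, pos = [], 0
    prev_t = prev_p = 0
    while pos < len(buf):
        dt, pos = read_varint(buf, pos)
        dp, pos = read_varint(buf, pos)
        size, pos = read_varint(buf, pos)
        prev_t += unzigzag(dt)
        prev_p += unzigzag(dp)
        ticks.append((prev_t, prev_p, size))
    return ticks
```

Because consecutive ticks are usually close in time and price, most deltas fit in one or two bytes, so a tick after the first typically needs only a handful of bytes before any general-purpose compressor is applied on top.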

    3rd party historical data can be a risk. Are the real time data characteristics consistent with your historical data?
  3. UMU


    Actually I too am using my own collected data, but I would like to have much more data over a longer timeframe. The only vendor I found is TickData, but they ask $18 per year per stock, and their complete DB costs about $35k. I don't think it even includes the Bid/Ask data.
    Self-collected data is of course better, but one can't do that for the 1000 or 2000 stocks I'm mainly interested in.

    BTW, I've heard about "ftp'ing TAQ data", which seems to be all of the day's tick data for all NYSE(?) stocks, but I couldn't find any further info. Does anybody know more, e.g. pricing?

    I must admit I'm not using any compression yet. I store the ticks primarily in memory and then dump them to simple text files for later backtesting. I know compression should be applied to save HD space.
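
If the dumps stay as simple text files, the least invasive way to save disk space is probably to write them through a general-purpose compressor. A minimal sketch using Python's standard `gzip` module; the file layout (one comma-separated tick per line) and the function names are assumptions, not a description of any existing logger.

```python
import gzip

def dump_ticks(path, ticks):
    """Write ticks as gzip-compressed text, one comma-separated tick per line."""
    with gzip.open(path, "wt") as f:
        for tick in ticks:
            f.write(",".join(str(field) for field in tick) + "\n")

def load_ticks(path):
    """Read the file back as a list of string-field lists."""
    with gzip.open(path, "rt") as f:
        return [line.rstrip("\n").split(",") for line in f]
```

The fields come back as strings, so the backtester still has to convert them to numbers, exactly as it would with an uncompressed text file.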

    Which data would you recommend collecting?
    What exactly is meant by Time&Sales data?
    How deep into L2 should one go?
  4. WinSum



    What software are you using for backtesting that can use bid/ask price levels to simulate market orders?

    Also, how do you log real-time Level 2 data? Is there software that will capture the live data so it can be used later for backtesting?


  5. prophet


    You should be able to log real time data for 2K stocks using Realtick over a T1 or from a co-located server, or maybe over a fast cable/dsl connection. I’m not sure, having never done that many stocks in real time. You can also do an end-of-day download of Realtick ticks for between 8K and 10K stocks. That takes roughly 30 minutes when downloading only T&S, with no bid/ask or L2. Adding inside bid/ask quotes and sizes would at least double the amount of data. Outside levels (L2) are not available historically due to the bandwidth required.

    NYSE TAQ can be a gold mine of data. I purchased 3 years of TAQ data a few years ago to help backtest 3K to 5K stock systems. TAQ has some errors, including missing ticks, out-of-order trades and missing/invalid CUSIPs, that complicated the data linking process and degraded my system results. Splits are not corrected and there is no accurate table of splits… I had to hack around this. I was using end-of-day Realtick to incrementally patch my database. Realtick’s stock data suffers from some of the same problems. Despite some advanced filters I eventually got tired of dealing with the data and CPU complexity issues (being on a budget) and scrapped the systems to work instead on fewer-market (10 to 20 symbols) futures systems based on much cleaner data.
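
The post does not describe its filters or its split workaround. As a rough illustration of the kind of cleanup involved, the sketch below restores time order and flags days whose close-to-close price ratio sits near a common split ratio. The ratio list, the 3% tolerance, the tuple layouts and the function names are all assumptions; flagged dates are candidates to check by hand, not confirmed splits.

```python
COMMON_RATIOS = (2.0, 3.0, 1.5, 0.5)  # assumed set: 2:1, 3:1, 3:2 splits and a 1:2 reverse split

def sort_trades(trades):
    """Restore time order for (timestamp, ...) tuples.

    Python's sort is stable, so trades with equal timestamps keep file order.
    """
    return sorted(trades, key=lambda trade: trade[0])

def suspect_splits(daily_closes, tolerance=0.03):
    """Yield (date, ratio) where prev_close / close is near a common split ratio.

    daily_closes: list of (date, close) tuples in date order.
    """
    for (_, prev), (date, cur) in zip(daily_closes, daily_closes[1:]):
        if prev <= 0 or cur <= 0:
            continue  # skip invalid prices rather than divide by them
        observed = prev / cur
        for ratio in COMMON_RATIOS:
            if abs(observed - ratio) / ratio <= tolerance:
                yield date, ratio
                break
```

A genuine 50% one-day crash would be flagged as a 2:1 split by this heuristic, which is one reason an accurate external split table is preferable when one exists.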

    If you are involved in an academic project through a member college/professor, the entire TAQ dataset is available here:

    IB’s real time data is excellent.

    T&S is the time, price and size of executed trades.

    Required L2 depth depends on your methods or models. The downside of L2 is the data density. It will eat up bandwidth and limit you to fewer symbols.
  6. prophet


    Everything I use is custom code, written in C and Matlab. Real time data is accessed through the IB TwsSocketClient API. There is probably 3rd party software that logs real time L2 data, but sorry, I don't know of any. I would guess that most who use L2 in systems write their own code to access it.

    I’m fairly sure Tradestation and Wealth Lab can use bid/ask prices to simulate market orders.
  7. maxpi


    TS2000 can use historical bid/ask data; I got mine from Esignal. TS8 has realtime bid/ask only, so you would have to collect your own data to build a history.

  8. ddog



    Did Tradestation actually include this in the 8.0 release? I had heard rumors that it wasn't going to be in 8.0 but something later.