Trade data discrepancy (Tick Data)

Discussion in 'Data Sets and Feeds' started by rudi20, Jan 9, 2020.

  1. rudi20

    I've noticed that data from Tick Data differs from TradingView and Interactive Brokers. TV and IB match exactly, but Tick Data has many differences each day.

    For example:
    For AAPL at 2019-11-25 16:37:34, there are two trades on Tick Data that are not present on TV or IB:

    Price 264, Volume 40, Exchange: NASD ADF (FINRA)
    Price 264, Volume 30, Exchange: NASD ADF (FINRA)

    The lowest price for this candle on TV and IB is 266.25, so that is a significant difference.

    What would be the reason for such a discrepancy, and how should I approach this with regard to backtesting?
     
  2. AndyM

    TradingView uses the BATS exchange as its primary source. There are 16 exchanges in the US plus dark pools, so depending on which exchanges a provider uses as its data source, your tick data can differ a little.
     
  3. rudi20

    Tick Data specifies the Exchange where the trade took place. Should I exclude entries from certain Exchanges so I can obtain a match with IB and TV?

    If so, where can I get a list of the Exchanges I should include/exclude?
     
  4. Metamega

    https://www.interactivebrokers.com/en/index.php?f=1562
    A quick Google search for exchanges turned up this list, but these are only the lit exchanges. You'd still have dark pools and other venues.

    Stock data is a nightmare. If you're trying to build a good data set, a good chunk of your time goes into scrubbing and maintaining the data you've got. Many tick data vendors and data providers supply raw data, meaning they send everything and you have to manage it. When I used IQFeed tick data, I had to use an ATR filter in Amibroker to help filter the "bad ticks" out. Yahoo and many EOD providers filter these out, or rather the exchanges filter them out of the closing data.

    I know that in Amibroker there's a simple utility to help with bad ticks that's based on "Average True Range". You can turn it up or down to your liking, and it just picks up those ticks that are way out of line with the average. It's a simple concept that helps.
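
A rough Python sketch of that kind of filter, for illustration only. This is not Amibroker's actual algorithm: the function name `filter_bad_ticks`, the `window` and `multiplier` parameters, and the one-cent floor are all assumptions. It uses the average absolute tick-to-tick move of recently accepted prices as a tick-level stand-in for ATR and rejects any tick that sits too far from their median.

```python
from statistics import median


def filter_bad_ticks(prices, window=20, multiplier=5.0):
    """Split prices into (kept, rejected) lists of (index, price).

    Hypothetical helper: a tick is rejected when it is further than
    `multiplier` average tick-to-tick moves from the median of the
    last `window` accepted prices.
    """
    kept, rejected = [], []
    accepted = []  # accepted prices only, so a bad tick cannot shift the reference
    for i, price in enumerate(prices):
        recent = accepted[-window:]
        if len(recent) < 3:
            # Not enough history yet: accept unconditionally (assumption).
            accepted.append(price)
            kept.append((i, price))
            continue
        reference = median(recent)
        moves = [abs(b - a) for a, b in zip(recent, recent[1:])]
        avg_move = sum(moves) / len(moves)
        limit = multiplier * max(avg_move, 0.01)  # floor of one cent (assumption)
        if abs(price - reference) > limit:
            rejected.append((i, price))
        else:
            accepted.append(price)
            kept.append((i, price))
    return kept, rejected


# Made-up prices around the AAPL example, with two prints at 264.
ticks = [266.30, 266.28, 266.31, 266.29, 264.00, 266.27, 264.00, 266.30]
kept, rejected = filter_bad_ticks(ticks)
```

Raising `multiplier` makes the filter more tolerant. One known weakness of this sketch: after a genuine price gap it keeps rejecting ticks, because the reference only moves with accepted prices, so a real implementation would need some way to re-anchor.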

    I've kind of come to the conclusion that the perfect stock database is just really hard to maintain. On one hand you want raw tick data, as it paints the whole story, but on the other you want to filter out those bad ticks from some off-exchange block trade that you couldn't have participated in even if you wanted to. Then you have de-listings, stock splits, mergers, name changes, re-listings, etc.

    Then you've got dividends to deal with. The simple solution is to back-adjust your prices on the ex-dividend date, but then the price levels in the preceding bars no longer reflect the prices that actually traded.
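
To make the trade-off concrete, here is a minimal sketch of proportional back-adjustment, assuming daily closes and a known cash dividend per ex-date. The function name `back_adjust` and the sample figures are hypothetical; data vendors may use a different convention.

```python
def back_adjust(closes, dividends):
    """Back-adjust closes for cash dividends.

    closes:    list of (date, close) in ascending date order.
    dividends: dict mapping ex-dividend date -> cash amount per share.
    Every bar before an ex-date is scaled by (1 - dividend / prior close).
    """
    adjusted = []
    factor = 1.0
    # Walk backwards so each ex-date affects only the bars before it.
    for i in range(len(closes) - 1, -1, -1):
        date, close = closes[i]
        adjusted.append((date, round(close * factor, 4)))
        if date in dividends and i > 0:
            prior_close = closes[i - 1][1]
            factor *= 1.0 - dividends[date] / prior_close
    adjusted.reverse()
    return adjusted


# Made-up data: a 1.00 dividend goes ex on the third day.
closes = [("2019-11-06", 100.0), ("2019-11-07", 100.0), ("2019-11-08", 99.0)]
adjusted = back_adjust(closes, {"2019-11-08": 1.0})
```

In this made-up series the two bars before the ex-date are rewritten from 100.0 to 99.0, which removes the artificial gap but means those bars no longer show prices anyone traded at.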

    Throw in the fact that the daily open and close prices we all see don't reflect the opening and closing auction prices, as they're just the first trade after 9:30 and the last trade at 4:00.

    Once you start getting into it, half the battle is getting a database that's been scanned for holes, missed splits, etc. Once that's done, you put that sucker on a backup hard drive because you don't want to do it again.

    If you're lucky, you just want EOD data and can use something like Norgate Data with Amibroker and let the plugin do the hard work for you. Get the de-listed database, let the plugin handle the monthly historical constituent lists, and watch Amibroker's backtester handle historical index constituents with a simple line of code:
    "NorgateIndexConstituentTimeSeries("S&P 500")"

    Futures guys just have to worry about how to stitch contracts together. Forex guys just need to figure out what their data is and where it's from lol.
     
  5. AndyM

    It's particularly challenging to reconcile tick data from two different data providers, as you don't really know which exchanges they use for their tick data. You just have to take their word for it. A rule of thumb is that the more exchanges a feed covers, the more accurate your data.

    You can also check out https://finnhub.io/docs/api#stock-tick . They have tick data from all US exchanges plus other trading venues (dark pools) at a very affordable price. I believe their websocket for live updates is free as well. This can guarantee the accuracy of your strategy and save you quite a bit of money compared to Tick Data.
     
  6. rudi20

    Exactly!

    I'm considering two strategies:
    1) I'll get historical 5m candles from InteractiveBrokers, then run each trade against the High and Low of that 5m candle. Anything that falls outside the high or low I'll discard.
    2) I'll run each trade against the Level 1 Quote data of the time and if the trade is far from the bid/ask, I'll discard.
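
Both checks could be sketched roughly as follows, using only the standard library. The tuple layouts, the function names, and the `tolerance` value are assumptions for illustration; the 264 prints and the 266.25 low come from the AAPL example, while the candle high, the quote, and the third trade are made up.

```python
from bisect import bisect_right
from datetime import datetime


def bar_start(ts, minutes=5):
    """Floor a datetime to the start of its N-minute bar."""
    return ts.replace(minute=ts.minute - ts.minute % minutes, second=0, microsecond=0)


def filter_by_candle(trades, candles, minutes=5):
    """Strategy 1: keep trades priced inside the reference candle's low..high.

    trades:  list of (datetime, price, volume)
    candles: dict mapping bar start -> (high, low)
    """
    kept = []
    for ts, price, volume in trades:
        bar = candles.get(bar_start(ts, minutes))
        if bar is None:
            continue  # no reference candle: drop the trade (assumption)
        high, low = bar
        if low <= price <= high:
            kept.append((ts, price, volume))
    return kept


def filter_by_quote(trades, quotes, tolerance=0.05):
    """Strategy 2: keep trades within `tolerance` of the prevailing bid/ask.

    quotes: list of (datetime, bid, ask) sorted by time.
    """
    times = [q[0] for q in quotes]
    kept = []
    for ts, price, volume in trades:
        i = bisect_right(times, ts) - 1  # latest quote at or before the trade
        if i < 0:
            continue  # no quote yet: drop the trade (assumption)
        _, bid, ask = quotes[i]
        if bid - tolerance <= price <= ask + tolerance:
            kept.append((ts, price, volume))
    return kept


t = datetime(2019, 11, 25, 16, 37, 34)
trades = [(t, 264.00, 40), (t, 264.00, 30), (t, 266.30, 100)]
candles = {datetime(2019, 11, 25, 16, 35): (266.50, 266.25)}
quotes = [(datetime(2019, 11, 25, 16, 37, 30), 266.28, 266.32)]

by_candle = filter_by_candle(trades, candles)
by_quote = filter_by_quote(trades, quotes)
```

With this sample data both filters discard the two 264 prints and keep the 266.30 trade. The candle check only works if the reference candles are themselves built from a feed you trust, and the quote check depends on trade and quote timestamps being aligned closely enough.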

    I'm open to other suggestions or comments on the above.

    Regarding simply dropping trades reported by a particular exchange, would it be sensible to drop trades reported by these 'Exchanges'?

    • NASD ADF (FINRA)
    • Market Independent (SIP - Generated)
    • Consolidated Tape System

    I've been able to match up all the other Exchanges with those on the list of Interactive Brokers, only the above three entries don't tally with anything on the IB end.
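
If dropping by venue turns out to be the right call, the filter itself is simple. A minimal sketch, assuming each trade is a dict with an `exchange` field holding the labels listed above (the field name, the function name, and the "NASDAQ" sample row are hypothetical):

```python
# Venue labels taken from the list above; whether excluding them is
# appropriate is the open question, not something this code settles.
EXCLUDED_EXCHANGES = {
    "NASD ADF (FINRA)",
    "Market Independent (SIP - Generated)",
    "Consolidated Tape System",
}


def drop_excluded(trades, excluded=EXCLUDED_EXCHANGES):
    """Return only the trades whose exchange label is not in `excluded`."""
    return [trade for trade in trades if trade["exchange"] not in excluded]


sample = [
    {"price": 264.00, "volume": 40, "exchange": "NASD ADF (FINRA)"},
    {"price": 264.00, "volume": 30, "exchange": "NASD ADF (FINRA)"},
    {"price": 266.30, "volume": 100, "exchange": "NASDAQ"},  # made-up row
]
remaining = drop_excluded(sample)
```

Keeping the excluded set as a parameter makes it easy to compare the filtered result against IB or TV candles with different venue combinations.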

    Thanks.