Question about what trades print to the tape

Discussion in 'Automated Trading' started by two_iron, Sep 19, 2020.

  1. two_iron


    Hi there. My son and I are building an app to trade automatically using the TD Ameritrade API. I'm stuck on one issue and really need some help. I've tried other places, including TDA, with no luck, so I thought I'd give it a try here.

    The API gives us streaming data and historical data. We trade 1-min candles. The streaming (on-the-fly) volume data does not add up to the volume posted at the close of the 1-min candle. For instance, say we're monitoring ROKU and it's 12:05pm EDT. The streaming data, which provides every transaction (last price and trade volume) as it occurs, may show a total volume of 35,000 shares traded between 12:05 and 12:06. However, the historical data, as the 12:05 candle closes and becomes history, may show a total volume of 31,440 shares for that same minute. They are parsing out some of the trades that come in. The historical volume data from this TDA API agrees with all other historical charts (Yahoo, ThinkorSwim, Schwab, Barchart.com, etc.). In other words, the parsing is a protocol that all brokers comply with.
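    A minimal sketch of the comparison described above, assuming each streamed tick has already been reduced to a (timestamp, size) pair. The tick values are made up for illustration so that they sum to the 35,000 figure in the post, and the function names are hypothetical, not part of the TDA API.

```python
from collections import defaultdict
from datetime import datetime


def minute_volumes(ticks):
    """Sum tick sizes into 1-minute buckets keyed by the minute's start time."""
    totals = defaultdict(int)
    for ts, size in ticks:
        totals[ts.replace(second=0, microsecond=0)] += size
    return dict(totals)


def compare(streamed, historical):
    """Return {minute: (streamed_volume, historical_volume, difference)}."""
    return {
        minute: (streamed.get(minute, 0), hist, streamed.get(minute, 0) - hist)
        for minute, hist in historical.items()
    }


# Made-up ticks for illustration only; not real ROKU prints.
ticks = [
    (datetime(2020, 9, 18, 12, 5, 1), 20_000),
    (datetime(2020, 9, 18, 12, 5, 30), 15_000),
    (datetime(2020, 9, 18, 12, 6, 2), 500),  # belongs to the 12:06 candle
]
# Historical volume for the 12:05 candle, using the figure from the post.
historical = {datetime(2020, 9, 18, 12, 5): 31_440}

report = compare(minute_volumes(ticks), historical)
for minute, (streamed, hist, diff) in report.items():
    print(minute.strftime("%H:%M"), streamed, hist, diff)
```

    Logging this difference for every candle makes it easy to see whether the gap is always positive, which is the pattern described in the post.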

    I spoke with TDA and their best guess was that "odd lots" are parsed out (not counted) in the official trade history. That sounded great, but after experimenting with it, that is not the answer. Odd lots are any trades of fewer than 100 shares. There are not enough odd lots in a minute of trading to make up the difference we are seeing between the sum of the streaming volume and the historical volume that is posted 1 second after the end of the 1-min candle. I've also looked at "mixed lots" (over 100 shares but not a multiple of 100, such as 3,140 shares) and dropped the odd shares, but that doesn't work either.
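    The two lot-size experiments described above can be sketched as follows, assuming a round lot of 100 shares. The trade sizes and function names are hypothetical; the point is only to show the three counting policies side by side.

```python
ROUND_LOT = 100  # assumption: a round lot is 100 shares


def volume_all(sizes):
    """Count every trade in full."""
    return sum(sizes)


def volume_without_odd_lots(sizes):
    """Drop trades of fewer than 100 shares; keep everything else in full."""
    return sum(s for s in sizes if s >= ROUND_LOT)


def volume_round_lots_only(sizes):
    """Drop odd lots and the odd-share remainder of mixed lots (3140 -> 3100)."""
    return sum((s // ROUND_LOT) * ROUND_LOT for s in sizes)


# Made-up trade sizes for one candle, for illustration only.
sizes = [50, 99, 100, 3140, 250, 7]

print("all trades:      ", volume_all(sizes))
print("no odd lots:     ", volume_without_odd_lots(sizes))
print("round lots only: ", volume_round_lots_only(sizes))
```

    Comparing each of these totals with the historical candle volume shows quickly whether either lot-size rule can account for the gap.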

    My question in a nutshell: what data "prints to the tape", i.e. is officially incorporated into all the stock charts that agree with each other? Moreover, what data is excluded and does not print to the tape? Is it related to odd lots? Lot sizes? Do they parse out different exchanges? Is dark pool data not counted?

    Thanks much for your help!
     
  2. terr


    One thing: could it be a slight shift in time that causes you to record volumes in a candle that AMTD would sort to be in an adjacent candle?

    One way to check would be to build 5-min or even 30-min candles two ways: one aggregated from the streaming data and the other from backfills. If the difference is still fairly slight, then it is the time shift. If the difference is 5 or 30 times higher, then there are some streaming ticks that are not included in the historical data.
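    A rough sketch of that check, using made-up per-minute volumes. In the first backfill series, shares only leak into the next minute (a pure time shift); in the second, 10% of each minute's volume is dropped. All names and numbers are hypothetical.

```python
def aggregate(volumes, width):
    """Sum consecutive per-minute volumes into candles of `width` minutes."""
    return [sum(volumes[i:i + width]) for i in range(0, len(volumes), width)]


def discrepancy(streamed, backfill, width):
    """Streamed minus backfill volume for each `width`-minute candle."""
    wide_streamed = aggregate(streamed, width)
    wide_backfill = aggregate(backfill, width)
    return [s - b for s, b in zip(wide_streamed, wide_backfill)]


streamed = [100, 120, 90, 110, 80, 100]

# Time shift: some shares are booked one minute later, so totals still agree.
backfill_shift = [90, 125, 85, 120, 75, 105]
# Excluded ticks: the backfill omits 10% of every minute's volume.
backfill_drop = [90, 108, 81, 99, 72, 90]

for width in (1, 3, 6):
    print(width,
          discrepancy(streamed, backfill_shift, width),
          discrepancy(streamed, backfill_drop, width))
```

    With a pure time shift the per-candle gap changes sign and does not grow as the candles widen, while excluded ticks produce a one-sided gap that scales with the candle width.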

    On edit: no, that is not a good explanation. The discrepancies are too big and always in one direction.

    example: for PTON Friday, one minute candles from TDA API:

    Time/Vol from Stream/Vol from Backfill

    15:10 / 70,866 / 65,558
    15:15 / 86,108 / 75,346
    15:20 / 71,091 / 62,689
    15:25 / 120,561 / 103,825
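
    A small sketch that quantifies the gap in the four candles above. The volumes are copied from the table; the helper name is hypothetical.

```python
# (time, volume from stream, volume from backfill), copied from the table above.
rows = [
    ("15:10", 70_866, 65_558),
    ("15:15", 86_108, 75_346),
    ("15:20", 71_091, 62_689),
    ("15:25", 120_561, 103_825),
]


def excess(rows):
    """Return (time, stream minus backfill, excess as a percent of backfill)."""
    return [(t, s - b, 100.0 * (s - b) / b) for t, s, b in rows]


for t, diff, pct in excess(rows):
    print(f"{t}  +{diff:,} shares  ({pct:.1f}% above backfill)")
```

    Every candle shows the stream above the backfill, by thousands of shares each time.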

    On the other hand, from the Google Finance interactive chart, and exactly the same from the Yahoo Finance interactive chart:

    15:10 / 70.89k
    15:15 / 88.04k
    15:20 / 71.07k
    15:25 / 115.98k

    Doesn't match either, but is closer to the streaming data.

    Barchart.com matches TDA backfill data almost exactly.

    TradingView is completely off, by the way, at least the non-subscription version. The candles are not the same, the volumes are far smaller than they should be, etc. Where are they getting their data from? For example, for 15:10 the volume on the TradingView chart is 700.

    So the answer may be: there is no such thing as "tape", at least not universally. Different data repositories have different data.
     
    Last edited: Sep 19, 2020
  3. two_iron


    Thanks for the reply, terr. We have tried to sync backwards and forwards to no avail. The trades are all timestamped, so we know exactly which minute candle they belong to. Also, there is always more volume in the streaming minute than in the closed minute. If it were out of sync, the streaming volume would sometimes be higher and sometimes lower.

    Thanks again.
     
  4. qlai


    You are getting too deep. You can't take a cheap/crappy data source and expect it to be precise. Basically, when exchange data consolidators (SIPs) send trades, they are marked as to whether they should be included in volume calculations or not. These indicators are usually not passed through. If you are getting all four trade/sale conditions, you can figure it out using the specs (search for UTDF and CTS).
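    A sketch of what such a filter could look like, assuming each trade arrives with its size and a tuple of four sale-condition codes. The codes "X1" and "X2" are placeholders, not real UTDF or CTS values; the real volume-eligibility table has to be taken from those specs. Treating a trade as excluded when any one of its conditions is in the set is also a simplifying assumption.

```python
# PLACEHOLDER codes, not real UTDF/CTS sale conditions. Replace them with the
# conditions that the specs mark as not updating consolidated volume.
EXCLUDED_FROM_VOLUME = frozenset({"X1", "X2"})


def counts_toward_volume(conditions, excluded=EXCLUDED_FROM_VOLUME):
    """True if none of the trade's sale conditions is volume-ineligible."""
    return not any(code in excluded for code in conditions)


def eligible_volume(trades, excluded=EXCLUDED_FROM_VOLUME):
    """Sum the sizes of the trades that count toward candle volume."""
    return sum(
        size for size, conditions in trades
        if counts_toward_volume(conditions, excluded)
    )


# Made-up trades: (size, four sale-condition codes).
trades = [
    (500, ("", "", "", "")),
    (1200, ("X1", "", "", "")),
    (300, ("", "X2", "", "")),
    (75, ("", "", "", "")),
]

print("raw volume:     ", sum(size for size, _ in trades))
print("eligible volume:", eligible_volume(trades))
```

    If the feed does not pass the condition codes through, this kind of filter cannot be reproduced on the client side.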
     
  5. guru



    Your question includes wrong assumptions since you are assuming that "printing to the tape" means "officially incorporating into all the stock charts". That's simply not true.
    The tape includes EVERYTHING, and then the various brokers & stock charts can decide what and how they want to include, each one potentially interpreting the tape/data differently. At least the process of cleaning the data may be subject to interpretation, and bugs.
    The tape includes trades/ticks where each one may be subject to different conditions, including trade cancellations, corrections, Form-T, odd lots, some trade summaries, opening and closing prices, off-exchange trades, dark pool trades, and late-reported trades. Here are most of those conditions, though probably not all or missing some details (because I've seen more elsewhere):
    https://polygon.io/glossary/us/stocks/trade-conditions

    Some dark pool trades may also be reported late with wrong time-stamp, though those usually come after hours. Though the day tape can also be messy and it's up to you to put it back together. You may also not be getting all the data from all the exchanges and dark pools. I don't know what TDA provides.
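
    As one illustration of "putting it back together", here is a sketch that nets cancellations and corrections out of a message stream. The message layout (dicts with "type", "id", and "size") is hypothetical; real feeds encode these events in their own formats.

```python
def net_volume(messages):
    """Volume after applying cancels and corrections to earlier trades."""
    live = {}  # trade id -> current size
    for msg in messages:
        kind, trade_id = msg["type"], msg["id"]
        if kind == "trade":
            live[trade_id] = msg["size"]
        elif kind == "cancel":
            live.pop(trade_id, None)  # ignore cancels for unknown trades
        elif kind == "correct" and trade_id in live:
            live[trade_id] = msg["size"]
    return sum(live.values())


# Made-up message stream for illustration only.
messages = [
    {"type": "trade", "id": 1, "size": 1000},
    {"type": "trade", "id": 2, "size": 500},
    {"type": "trade", "id": 3, "size": 200},
    {"type": "cancel", "id": 2},
    {"type": "correct", "id": 3, "size": 150},
]

gross = sum(m["size"] for m in messages if m["type"] == "trade")
print("gross:", gross, "net:", net_volume(messages))
```

    A client that simply sums every print as it arrives reports the gross figure, while a cleaned history would be closer to the net one.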
     
    Last edited: Sep 19, 2020
  6. two_iron


    Thanks for the replies! Much to think about....
     
  7. ValeryN


    What you think is a tape might not be a tape. Many brokers try to stay away from providing true tape data. They might call it real time, but it only looks real time to humans. In practice it is either consolidated a few times a second or comes from only one exchange, let's say BATS.

    While I don't know what TD provides exactly my guess would be it is not a real tape.

    The reason brokers do that is that true tick data is expensive, puts far more pressure on I/O and servers, and gives no benefit to the 99% of traders who use their apps.

    I recommend checking a vendor that specializes in data. IQFeed is one of the cheaper ones. Nanex Core is around 10x the price but a good deal compared with other options.

    Val
     
  8. two_iron


    Hey Val, the API is quite comprehensive, including identifying the exchange each transaction went through. We can easily filter out the more obscure exchanges (and have), and that usually gets us close to the historical data. But then along comes a candle that blows us out of the water: say 50,000 shares for the minute on the fly, and then the candle closes with 17,000 shares.

    We have not figured out the proper combination of filters to obtain the volume data that the rest of the world sees on their charts. I'm sure it's a universal filtering protocol because all the charts agree with each other on historical price/volume. Unfortunately our strategy is very sensitive to the current volume and relies on trading against what everybody else sees on their charts. I'll check out IQFeed and the other one. Thanks for the help!
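
    One way to search for such a combination is to brute-force every subset of exchanges and keep the subsets whose summed volume matches the historical candle on every minute tested. This is only a sketch: the exchange labels "A" to "D" and the volumes are made up, and the search assumes the filter is purely per-exchange, which may not be the case.

```python
from itertools import combinations


def matching_exchange_sets(candles, tolerance=0):
    """Find exchange subsets whose summed volume matches every candle.

    candles: list of (per_exchange_volume dict, historical_volume) pairs.
    """
    exchanges = sorted({ex for per_ex, _ in candles for ex in per_ex})
    matches = []
    for r in range(1, len(exchanges) + 1):
        for subset in combinations(exchanges, r):
            if all(
                abs(sum(per_ex.get(ex, 0) for ex in subset) - hist) <= tolerance
                for per_ex, hist in candles
            ):
                matches.append(subset)
    return matches


# Made-up candles: streamed volume per exchange, plus the historical volume.
candles = [
    ({"A": 10_000, "B": 5_000, "C": 2_000, "D": 33_000}, 17_000),
    ({"A": 8_000, "B": 4_000, "C": 1_500, "D": 900}, 13_500),
]

print(matching_exchange_sets(candles))
```

    The number of subsets doubles with each added exchange, so this only stays practical for a modest number of venues. If no subset matches across many candles, the filter is probably not exchange-based.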
     
  9. stepan7


    It's a known issue with the TDA data feed. They do not report equities volume from all exchanges in real time, only in backfill. It has been suggested to use a third-party data feed like IQFeed for correct real-time volume.
     
  10. terr


    As I showed, though, both Google Finance and Yahoo Finance have values for candle volumes that are closer to TDA's streaming data than to TDA's backfill. Could it be they use the same feed as TDA and aggregate the data?
     
    #10     Sep 21, 2020