DATA FEED: Totally Shocked!!! Who Can You Trust???

Discussion in 'Data Sets and Feeds' started by ET70424, Jan 9, 2008.

  1. ET70424

    Hello, All:

    I was truly SHOCKED to find the great disparity between the 1-minute data from DTN (IQFeed) and that from Quote.com/QCharts.com.

    For example, for AAPL (Apple Inc.), you get the following from DTN:

    Date,Time,Open,High,Low,Close,Volume
    20070702,09:30:00,121.0500,121.1200,120.7000,120.9600,1091368
    20070702,09:31:00,120.9700,121.5000,120.7200,121.5000,387602
    20070702,09:32:00,121.5000,121.7200,121.0100,121.4200,359397
    20070702,09:33:00,121.4100,121.4983,121.2500,121.3400,230790
    20070702,09:34:00,121.3400,121.3600,120.5000,120.5150,441636
    20070702,09:35:00,120.5000,120.7100,120.4100,120.4400,368037
    20070702,09:36:00,120.4400,121.3900,119.8100,119.8400,582012
    20070702,09:37:00,119.8400,120.0000,119.6600,119.8400,497708
    20070702,09:38:00,119.8500,119.8600,119.3000,119.5500,440919
    20070702,09:39:00,119.5700,120.0300,119.3400,119.9000,511771
    20070702,09:40:00,119.9000,120.4500,119.8700,120.3300,400258
    20070702,09:41:00,120.3300,120.3800,119.9300,120.2000,265160
    20070702,09:42:00,120.2100,120.4200,120.1000,120.1500,168681
    20070702,09:43:00,120.1400,120.2900,120.0500,120.1700,180477
    20070702,09:44:00,120.1700,120.3400,120.1200,120.1800,190279
    20070702,09:45:00,120.1700,120.2000,120.0000,120.0300,178249
    20070702,09:46:00,120.0200,120.1300,119.9800,120.0800,201295
    20070702,09:47:00,120.0800,120.4400,120.0200,120.3595,197620
    20070702,09:48:00,120.3600,120.4000,120.3000,120.3500,154873
    20070702,09:49:00,120.3500,120.8000,120.3500,120.8000,346926
    20070702,09:50:00,120.8095,121.0000,120.4150,121.0000,299239

    Now, compare that with the data covering the same stock and time period from Quote.com, shown below (same column order as above):

    20070702,09:30:00,121.08,121.12,120.7,120.94,203522
    20070702,09:31:00,121.04,121.5,120.835,121.47,149807
    20070702,09:32:00,121.497,121.72,121.01,121.3,134121
    20070702,09:33:00,121.32,121.49,121.25,121.35,91263
    20070702,09:34:00,121.35,121.41,120.56,120.61,152710
    20070702,09:35:00,120.55,120.71,120.44,120.56,118052
    20070702,09:36:00,120.5501,120.5501,119.83,119.84,229697
    20070702,09:37:00,119.85,120,119.66,119.86,178190
    20070702,09:38:00,119.87,119.87,119.3,119.399,204212
    20070702,09:39:00,119.54,120.03,119.36,119.96,160439
    20070702,09:40:00,119.97,120.45,119.88,120.39,143295
    20070702,09:41:00,120.39,120.409,119.93,120.19,84366
    20070702,09:42:00,120.18,120.42,120.1,120.13,86411
    20070702,09:43:00,120.13,120.29,120.05,120.181,61818
    20070702,09:44:00,120.18,120.339,120.13,120.18,79605
    20070702,09:45:00,120.1895,120.2,120,120.018,45673
    20070702,09:46:00,120.01,120.13,119.98,120.09,95859
    20070702,09:47:00,120.09,120.42,120.03,120.4,56599
    20070702,09:48:00,120.39,120.4,120.3,120.36,82845
    20070702,09:49:00,120.36,120.8,120.35,120.8,139759
    20070702,09:50:00,120.8,121,120.79,120.99,113031

    Notice the big differences, especially in volume? Surprising, isn't it? Obviously, both can NOT be right. Or maybe BOTH are wrong?!!

    I wonder why? Any ideas? Any explanations?

    If you had to pick one or the other, which would you pick? Or is there a third option?

    I'd like to see what users of other datafeeds find. Let's compare notes.

    Regards
    ET
     
  2. I$land

    If you compare any 2 data feeds you will get differences like these. Sometimes your data feed will be the right one and sometimes it won't. If it's 50/50 then you're okay. No data provider is always right or wrong.

    Trust me I have been down this road before...
     
  3. ET70424

    I don't expect a perfect match.

    I'll even tolerate differences in O,H,L,C, if they are not too large.

    I'll even tolerate up to 30% difference in volume.

    But an error margin of 60% to more than 100% in volume is just too much.

    So we have huge error margins. The only solace would be if the error margins were at least consistent. If they're not, then the data is only worth a fraction of its purported value, and one doesn't even know how big that fraction is: 10%, 30%, 50%?
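
    A minimal sketch of how such error margins could be put into numbers, using the first three AAPL bars quoted above. The function names and the choice of Quote.com as the baseline are assumptions made for illustration; taking DTN (the larger figure) as the baseline instead gives smaller percentages, so the baseline has to be stated whenever a margin is quoted.

```python
# First three 1-minute AAPL bars (20070702) from the two feeds quoted in the thread.
# Each tuple is (bar time, volume).
dtn = [("09:30:00", 1091368), ("09:31:00", 387602), ("09:32:00", 359397)]
quote_com = [("09:30:00", 203522), ("09:31:00", 149807), ("09:32:00", 134121)]


def volume_gap_pct(vol_a, vol_b):
    """Percent by which vol_a exceeds vol_b; vol_b is the (assumed) baseline."""
    return (vol_a - vol_b) / vol_b * 100.0


def compare(feed_a, feed_b):
    """Pair bars by timestamp and return {time: percent gap of feed_a over feed_b}."""
    b_by_time = dict(feed_b)
    return {t: volume_gap_pct(v, b_by_time[t]) for t, v in feed_a if t in b_by_time}


gaps = compare(dtn, quote_com)
for t, pct in gaps.items():
    print(f"{t}  DTN volume is {pct:.0f}% above Quote.com")

# Spread between the largest and smallest gap: a rough check on whether
# the disagreement is at least consistent from bar to bar.
spread = max(gaps.values()) - min(gaps.values())
print(f"gap spread across bars: {spread:.0f} percentage points")
```

    Running the same comparison over the full 09:30 to 09:50 window would show whether the gap stays in a narrow band or jumps around from minute to minute.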

    Regards.
     
  4. thrunner

    Tradestation: I suspect TS is reporting the data at the close of each minute (e.g. the close of the 0930 bar is reported as 0931), while the others are reporting the data as of the beginning of each minute.

    Please note that if you start the TS data at 07/02/07, the volume will be zero for the first minute, but if you start the TS data at 06/29/07, the volume will be 1098451 for the first minute, presumably due to trades outside of RTH.

    If the data is started before the weekend, on 06/29/07:
    07/02/2007,0931,121.08,121.12,120.70,120.97,831018,267433,1098451.00


    "Date","Time","Open","High","Low","Close","Up","Down","Volume"
    07/02/2007,0931,121.08,121.12,120.70,120.97,831018,267433,0.00
    07/02/2007,0932,120.96,121.50,120.88,121.50,218784,164185,382969.00
    07/02/2007,0933,121.50,121.72,121.21,121.42,189072,169675,358747.00
    07/02/2007,0934,121.41,121.45,121.25,121.34,98246,126553,224799.00
    07/02/2007,0935,121.34,121.36,120.50,120.50,174740,267586,442326.00
    07/02/2007,0936,120.50,120.71,120.41,120.44,160598,204964,365562.00
    07/02/2007,0937,120.44,120.46,119.81,119.82,209969,342614,552583.00
    07/02/2007,0938,119.82,120.00,119.66,119.84,266834,229209,496043.00
    07/02/2007,0939,119.86,119.86,119.30,119.57,193214,242405,435619.00
    07/02/2007,0940,119.57,120.03,119.34,119.88,279953,233018,512971.00
    07/02/2007,0941,119.89,120.45,119.88,120.33,225920,172938,398858.00
    07/02/2007,0942,120.33,120.38,120.05,120.20,122790,136640,259430.00
    07/02/2007,0943,120.21,120.27,120.10,120.14,78711,90070,168781.00
    07/02/2007,0944,120.14,120.29,120.05,120.17,91967,88610,180577.00
    07/02/2007,0945,120.17,120.34,120.12,120.18,97036,93043,190079.00
    07/02/2007,0946,120.17,120.20,120.00,120.03,82100,96149,178249.00
    07/02/2007,0947,120.02,120.13,119.98,120.08,77434,123861,201295.00
    07/02/2007,0948,120.08,120.44,120.02,120.40,111266,85604,196870.00
    07/02/2007,0949,120.35,120.39,120.30,120.35,80889,73334,154223.00
    07/02/2007,0950,120.35,120.80,120.35,120.80,228543,113283,341826.00
    07/02/2007,0951,120.81,121.00,120.79,120.99,190437,103402,293839.00
    07/02/2007,0952,120.99,121.01,120.69,120.97,144767,128611,273378.00
    07/02/2007,0953,120.88,120.95,120.72,120.77,59413,83397,142810.00
    07/02/2007,0954,120.81,120.85,120.58,120.71,186023,144122,330145.00
    07/02/2007,0955,120.69,120.83,120.66,120.81,49981,55274,105255.00
    07/02/2007,0956,120.80,120.82,120.66,120.73,45926,68285,114211.00
    07/02/2007,0957,120.75,121.13,120.68,121.11,200952,162194,363146.00
    07/02/2007,0958,121.13,121.13,120.83,121.02,144476,137511,281987.00
    07/02/2007,0959,121.05,121.34,121.00,121.28,141705,106981,248686.00
    07/02/2007,1000,121.28,122.09,121.14,121.70,247518,148334,395852.00
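
    A rough sketch of the alignment suggested above: if TS labels each bar with its closing minute, shifting the TS labels back by one minute should line them up with the DTN bars. The one-minute shift is only the suspicion voiced in this post, not a documented TradeStation behaviour, and the helper name is made up for illustration. The volumes are taken from the data posted in this thread.

```python
from datetime import datetime, timedelta

# Volumes quoted in the thread. TradeStation rows are labelled HHMM,
# DTN rows HH:MM:SS.
tradestation = {"0932": 382969, "0933": 358747, "0934": 224799}
dtn = {"09:31:00": 387602, "09:32:00": 359397, "09:33:00": 230790}


def close_label_to_open_label(hhmm):
    """Shift a bar-close label such as '0932' back one minute, as 'HH:MM:SS'."""
    t = datetime.strptime(hhmm, "%H%M") - timedelta(minutes=1)
    return t.strftime("%H:%M:%S")


# Re-key the TradeStation bars so they use the bar-open convention.
aligned = {close_label_to_open_label(k): v for k, v in tradestation.items()}

diffs = {}
for label, ts_vol in aligned.items():
    dtn_vol = dtn[label]
    diffs[label] = abs(ts_vol - dtn_vol) / dtn_vol * 100.0
    print(f"{label}  TS={ts_vol}  DTN={dtn_vol}  diff={diffs[label]:.1f}%")
```

    With the shift applied, these three TS volumes land within a few percent of the DTN volumes, which is consistent with the labelling explanation for these two feeds.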
     
  5. FWIW, Tradestation shows totally different volume numbers for the same period.

    9:30 (eastern) - ~55k shares
    9:31 (eastern) - ~84k shares

    Differences in regional exchange inclusion may account for major differences. Minor differences can be accounted for by the exact timing of when one minute starts and ends in their aggregation scheme.

    If you're really hard up for accurate numbers, compare ticks during that period and see who's missing what data.
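
    A small sketch of that tick-level check, under loud assumptions: the ticks, prices, sizes and exchange codes below are invented for illustration and do not come from any of the feeds discussed. It shows how leaving out one venue changes the per-minute volume, and how a set difference exposes which ticks a feed is missing.

```python
from collections import defaultdict

# Hypothetical ticks: (timestamp "HH:MM:SS", exchange code, price, size).
# All values and exchange codes are invented for illustration only.
ticks = [
    ("09:30:01", "Q", 121.08, 500),
    ("09:30:15", "P", 121.10, 300),
    ("09:30:59", "Q", 120.94, 200),
    ("09:31:00", "Q", 121.04, 400),
    ("09:31:30", "P", 121.20, 100),
]


def minute_volume(ticks, exchanges=None):
    """Sum tick sizes per minute, optionally keeping only the given exchanges."""
    totals = defaultdict(int)
    for ts, exch, _price, size in ticks:
        if exchanges is not None and exch not in exchanges:
            continue
        totals[ts[:5]] += size  # bucket by "HH:MM", i.e. bar labelled by its start
    return dict(totals)


def missing_from(feed_a, feed_b):
    """Ticks present in feed_a but absent from feed_b."""
    return sorted(set(feed_a) - set(feed_b))


all_venues = minute_volume(ticks)
one_venue = minute_volume(ticks, exchanges={"Q"})

# Simulate a second feed that does not carry exchange "P" at all.
feed_without_p = [t for t in ticks if t[1] != "P"]
gaps = missing_from(ticks, feed_without_p)

print(all_venues)
print(one_venue)
print(gaps)
```

    Note that the tick stamped exactly 09:31:00 falls into the 09:31 bar here; a feed that closes its bars on the boundary instead would count it in the 09:30 bar, which is the kind of minor timing difference mentioned above.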
     
  6. ET70424

    OK, maybe AAPL was not the best choice for comparisons.

    One should pick a stock that is traded, if possible, on one exchange only, and preferably highly active, i.e. high volume.

    I suspect this would be a stock traded on either AMEX or NYSE.

    Besides the stock (ticker symbol) we finalize on, I am also open as to which day and which time window. Let's keep it to a 30-minute window of 1-minute data, just so that it won't take up too much space.

    A recent date is fine, although I wouldn't use the latest date since there may be trade corrections or false spikes that different vendors may take different amounts of time to correct. So something like a month ago or older would be better. I have no problem using July 2, 2007 (i.e. first trading day of July, 2007, 6 months ago) as any errors would have been corrected by now by all data vendors (if they do correct errors).

    As for time window, the first half hour of trading would seem to be a good choice, e.g. From opening bell to 10:00 a.m. New York Time. I suggest using 1-minute data from opening bell of 9:30 to 10:00 a.m., unless someone has a better suggestion.

    So for now, tentatively, Date & Time Window is:
    Monday, July 2, 2007, from opening bell (9:30 a.m.) to 10:00 a.m., New York Time.

    Obviously, the most important thing is to decide on the best stock (i.e. the ticker symbol) for this comparison project. Is SPY or DIA a good choice?

    Does someone have a better suggestion? I am all ears.

    As soon as we can agree on the stock, and if the time window above is acceptable, we can then post the data results and compare.


    Regards.