Unreliable historical intraday data

Discussion in 'Data Sets and Feeds' started by I$land, Feb 24, 2006.

  1. I$land


    I downloaded historical intraday (1 min., 5 min.) data from both ESignal and Prophet.net. To my great surprise, every stock had missing bars! Sometimes 1 bar in a day, but other times more than 20 bars in a day! By the way, I determined this using a script I wrote in Wealth-Lab.

    It is pretty hard to backtest any system if the data used is unreliable...

    What data provider do you guys use to backtest your systems?
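    For illustration, here is a minimal Python sketch of the kind of missing-bar check described above (the original Wealth-Lab script is not shown in the thread). It assumes a regular 09:30-16:00 session with no half days or holidays, and that each bar is stamped with its start time; the helper names `expected_bars` and `find_missing_bars` are hypothetical. Keep in mind that a missing bar is not always a data error: for thinly traded stocks, many feeds simply produce no bar for an interval with no trades.

```python
from datetime import datetime, timedelta


def expected_bars(day, step=1, start="09:30", end="16:00"):
    """Return the start time of every bar in an assumed regular session.

    step is the bar length in minutes (1 or 5 here). The last 1-minute
    bar starts at 15:59 and covers 15:59-16:00.
    """
    t = datetime.strptime(f"{day} {start}", "%Y-%m-%d %H:%M")
    stop = datetime.strptime(f"{day} {end}", "%Y-%m-%d %H:%M")
    out = []
    while t < stop:
        out.append(t)
        t += timedelta(minutes=step)
    return out


def find_missing_bars(day, bar_times, step=1):
    """Hypothetical helper: expected bar start times absent from bar_times."""
    present = set(bar_times)
    return [t for t in expected_bars(day, step) if t not in present]


if __name__ == "__main__":
    day = "2006-02-23"
    # Simulate a feed that dropped the 10:15 and 10:16 one-minute bars.
    bars = [t for t in expected_bars(day)
            if t.strftime("%H:%M") not in ("10:15", "10:16")]
    missing = find_missing_bars(day, bars)
    print(f"{len(expected_bars(day))} expected, {len(missing)} missing")
    for t in missing:
        print(t.strftime("%H:%M"))
```

    Running the same check with `step=5` on 5-minute bars expects 78 bars per regular session.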
  2. I record the data myself from my broker; of course, I have to know the symbols I want to record ahead of time. You can get historical files from INET at data.inetats.com, and also from ARCA, but they are pretty expensive last I checked (at least $1000 a month).

  3. High-quality data is absolutely key. I use tick data recorded directly from the exchanges, so I paid a heavy price (exchange connectivity, colocation, etc.) for around 5-6 months just for the data.
  4. Yep, exactly what I do as well. I don't connect directly to the exchanges, but I get a very raw feed through Lime Brokerage's API, which has some nice tooling around it that puts the data from all the different exchanges into the same data structure. It's very fast, and the timestamps on the data are as accurate as what the exchange provides.

  5. Well, those guys at Lime Brokerage know what they are doing; Mark Gorton (?) and Alastair (the CTO) are pretty sharp guys. I met with them when I was with the ibank back in '01 to early '02. It was an interesting meeting; they were a bit ahead of the curve back then. But they were also very small then. I remember they were renovating their office when we met: a nice high corner on Broadway.
  6. Digressing here, but they are also owned by the Lime Group, creators of the popular file-sharing program LimeWire. Very diverse business ventures, it seems. :p And all the guys I've talked to there are really at the top of their game.

    Back on topic: data from INET used to be very cheap ($100 a year), but got very expensive when Nasdaq started taking over. Also, the staff seems to have gotten more arrogant and probably doesn't care about you unless you are some big institution. It's probably not a good route to go; there are likely better data providers, customer-service-wise.

  7. One


    I've used several sources for minute and tick data and found TickData's database to be excellent and the most consistent with my needs. You may find it informative to read the white paper on their site about the inherent problems in data collection and their general approach to them.
  8. GTG


    I see people recommending this place a lot on this board:


    They have an interesting white paper about filtering high-frequency data on their main page, which is informative and at the very least shows that they understand the problem.

    I haven't used them yet, but probably will for my next project. A minor concern I have: if their data filtering is as sophisticated as they claim, my backtests might be invalid, because the filtering I can implement on my own real-time data feed would likely not be as good as theirs at eliminating false signals.
  9. I'll take a look at their paper. But are their filters causal or not? If not, then I'd say they're useless for this, since you couldn't apply the same filter to a live feed; if so, that would be great.
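    To make the distinction concrete: a causal filter decides on each tick using only ticks that arrived before it, so identical code can run on a live feed and on historical data, and the backtest sees the same cleaned prices that live trading would. A filter that also looks at later ticks cannot be reproduced in real time. Below is a minimal Python sketch of one simple causal bad-tick filter (a rolling-median deviation check); the class name, the 20-tick window, and the 2% threshold are arbitrary assumptions, not anything TickData or the posters describe.

```python
from collections import deque
from statistics import median


class CausalTickFilter:
    """Reject ticks that deviate too far from the median of recent accepted ticks.

    Causal: each decision uses only ticks seen *before* the current one, so the
    same object can be fed a live stream or replayed over history.
    Defaults (window=20, max_dev=0.02) are illustrative assumptions.
    """

    def __init__(self, window=20, max_dev=0.02):
        self.recent = deque(maxlen=window)  # last `window` accepted prices
        self.max_dev = max_dev              # max fractional deviation from median

    def accept(self, price):
        # Need a few accepted ticks before the median is a meaningful reference.
        if len(self.recent) >= 3:
            ref = median(self.recent)
            if abs(price - ref) / ref > self.max_dev:
                return False  # likely a bad print; keep it out of the window
        self.recent.append(price)
        return True


if __name__ == "__main__":
    ticks = [50.00, 50.01, 50.02, 5.02, 50.03, 50.00, 60.00, 50.04]
    f = CausalTickFilter()
    print([p for p in ticks if f.accept(p)])
```

    One known weakness of this sketch: a genuine, sustained price move larger than the threshold (say, a gap on news) would keep being rejected, because the median never catches up. A usable version needs some way to re-anchor, for example accepting a new level after several consecutive ticks agree on it.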

  10. I$land


    Thank you all for your input.

    stephencrowley:

    About Lime Brokerage and their API: how much software development was required? Is the problem that you cannot get historical data, only current-day data? If so, getting 6 months of 5-min. bars for 20 companies would not be possible, unless you are willing to wait 6 months to actually collect the data.

    rufus_4000, one, GTC:

    I guess anyone serious about systems trading who wants a solid database for backtesting must use a service like TickData. It is a bit pricey ($18 per ticker per year of data).


    Someone (privately) suggested that I use QCollector with ESignal, which is another option I will have to look into.

    With access to a raw datafeed, does this mean you only get tick data and have to do the rest (create 1-min. and 5-min. bars) yourself?

    Thanks!

    #10     Feb 24, 2006
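    On the last question: with a raw tick feed you usually do build the bars yourself. Here is a minimal Python sketch of aggregating ticks into OHLCV bars, assuming ticks arrive as time-sorted `(timestamp, price, size)` tuples that have already been cleaned, and that the bar length divides 60 minutes evenly (1, 5, 15, ...). The `Bar` type and `ticks_to_bars` function are hypothetical names for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Bar:
    start: datetime  # bar start time, floored to the bar interval
    open: float
    high: float
    low: float
    close: float
    volume: int


def ticks_to_bars(ticks, minutes=1):
    """Aggregate time-sorted (timestamp, price, size) ticks into OHLCV bars.

    Assumes `minutes` divides 60. Intervals with no trades produce no bar,
    which may be one source of the "missing bars" discussed in this thread.
    """
    bars = []
    for ts, price, size in ticks:
        # Floor the timestamp to the start of its bar interval.
        start = ts.replace(minute=ts.minute - ts.minute % minutes,
                           second=0, microsecond=0)
        if bars and bars[-1].start == start:
            b = bars[-1]  # same interval: update the open bar
            b.high = max(b.high, price)
            b.low = min(b.low, price)
            b.close = price
            b.volume += size
        else:
            bars.append(Bar(start, price, price, price, price, size))
    return bars


if __name__ == "__main__":
    ticks = [
        (datetime(2006, 2, 24, 9, 30, 5), 50.00, 100),
        (datetime(2006, 2, 24, 9, 31, 40), 50.10, 200),
        (datetime(2006, 2, 24, 9, 34, 59), 49.95, 300),
        (datetime(2006, 2, 24, 9, 35, 0), 50.05, 100),
    ]
    for bar in ticks_to_bars(ticks, minutes=5):
        print(bar)
```

    With `minutes=1`, the same ticks yield bars only for 09:30, 09:31, 09:34, and 09:35, so a gap check against a session calendar would flag 09:32 and 09:33 even though nothing was lost in transmission.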