On a quest for better data ...

Discussion in 'Data Sets and Feeds' started by abattia, Apr 26, 2010.

  1. I autotrade US equities, mainly ETFs, intraday. I use eSignal’s real-time data feed (NYSE + AMEX + NASDAQ, Level 1) to generate entry/exit signals (I essentially ignore my broker’s data feed from the perspective of signal generation). I backtest/optimize systems using historical data also from eSignal.

    As I wrote previously in another post, I’ve observed that historical eSignal data available after the fact does not match real-time eSignal data. OHLC and volume can differ substantially (e.g. a 15% volume difference in a 1-min bar between real-time and historical), to the extent that the signals generated in real time can differ from those generated when playing back the historical data.
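A minimal sketch of how such a discrepancy can be quantified, assuming the real-time bars were logged while the system ran and can be matched minute by minute against the historical download. The `Bar` type, the numbers and the volume-threshold "signal" are hypothetical. They only show how a 15% volume gap in a single bar is enough to flip a threshold-based rule.

```python
from dataclasses import dataclass


@dataclass
class Bar:
    """One 1-minute OHLCV bar (hypothetical structure, not an eSignal type)."""
    open: float
    high: float
    low: float
    close: float
    volume: int


def volume_diff_pct(realtime: Bar, historical: Bar) -> float:
    """Historical volume relative to real-time volume, in percent.

    Assumes the real-time bar has non-zero volume.
    """
    return 100.0 * (historical.volume - realtime.volume) / realtime.volume


def volume_signal(bar: Bar, threshold: int) -> bool:
    """Hypothetical signal: fire when the bar's volume exceeds a fixed threshold."""
    return bar.volume > threshold


# The same minute as seen live and as downloaded after the fact (made-up numbers).
rt = Bar(100.00, 100.20, 99.95, 100.15, 20_000)
hist = Bar(100.00, 100.25, 99.95, 100.15, 23_000)

print(f"volume difference: {volume_diff_pct(rt, hist):.1f}%")
print("real-time signal: ", volume_signal(rt, threshold=21_000))
print("historical signal:", volume_signal(hist, threshold=21_000))
```

Here the historical bar shows 15% more volume, so the rule fires in playback but not live. Logging the live bars alongside the orders is what makes this kind of comparison possible at all.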

    As I “train” my systems on historical data, I want to use historical data that’s as close as practical to what the systems will work with in real-time.

    I have had several helpful responses to my prior post, one in particular suggesting that some of the differences might be due to real-time delays that occur when eSignal consolidates data across different exchanges. Before looking at alternatives to eSignal, I therefore wondered whether I could improve the situation by restricting my eSignal data feed to a single exchange and ignoring data from the others.

    If that isn’t madness, which exchange should I focus on: NYSE, AMEX or NASDAQ?

  2. thstart


    Under the new market structure (Regulation NMS), NYSE-listed securities, for example, are not traded only on the NYSE.
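To illustrate the point, a minimal sketch that builds the same 1-min bar twice from one set of trade prints: once from every venue and once from NYSE prints only. The trades, prices and venue labels are invented for illustration and are not any feed's real exchange codes.

```python
from collections import namedtuple

# Hypothetical trade prints for one NYSE-listed symbol within a single minute,
# in time order. Venue labels are illustrative only.
Trade = namedtuple("Trade", ["price", "size", "venue"])

trades = [
    Trade(50.10, 300, "NYSE"),
    Trade(50.12, 500, "NASDAQ"),
    Trade(50.15, 200, "ARCA"),
    Trade(50.08, 400, "NYSE"),
    Trade(50.11, 100, "BATS"),
]


def build_bar(prints):
    """Aggregate time-ordered prints into (open, high, low, close, volume)."""
    prices = [t.price for t in prints]
    volume = sum(t.size for t in prints)
    return (prices[0], max(prices), min(prices), prices[-1], volume)


consolidated = build_bar(trades)
nyse_only = build_bar([t for t in trades if t.venue == "NYSE"])

print("consolidated:", consolidated)
print("NYSE only:   ", nyse_only)
```

In this made-up minute the NYSE-only bar sees 700 of the 1,500 shares traded and has a different high and close, so a single-exchange feed would give a system a materially different picture of the same minute.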
  3. If you are back-testing over a very large sample set, any unique qualities of the data should normalize over the testing period. For example, 1-min data over 5 years for the S&P 500 is a very large set; 15-min data over 2 years for the Dow 30 is not.

    If you design your system around any one data feed, then you are curve-fitting (BAD). No historical data feed will match another 100%, and no real-time broker or quote system will match 100% of the time. Your system should therefore be robust enough to handle multiple feeds. A sound idea will work across multiple feeds/brokers and not be locked to just one.

    I do a great deal of testing for my own auto-trading systems, and if I think I have a method that works, I will test on at least 3 historical data feeds.
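A sketch of the multi-feed check described above: run the identical rule over each feed's version of the same bars and see how far the results spread. The three close series, the feed names and the SMA rule are all hypothetical stand-ins; a real comparison would run the full backtest on each vendor's data.

```python
def sma(values, n):
    """Simple moving average; None until n values are available."""
    return [
        None if i + 1 < n else sum(values[i + 1 - n:i + 1]) / n
        for i in range(len(values))
    ]


def backtest(closes, n=3):
    """Hypothetical long-only rule: hold for the next bar while close > SMA(n).

    Returns the total result in price points.
    """
    avg = sma(closes, n)
    pnl = 0.0
    for i in range(len(closes) - 1):
        if avg[i] is not None and closes[i] > avg[i]:
            pnl += closes[i + 1] - closes[i]
    return pnl


# The "same" bars as delivered by three hypothetical historical feeds.
feeds = {
    "feed_a": [100, 102, 104, 101, 106, 108],
    "feed_b": [100, 102, 104, 102, 106, 108],
    "feed_c": [100, 101, 104, 101, 106, 107],
}

results = {name: backtest(closes) for name, closes in feeds.items()}
for name, pnl in results.items():
    print(f"{name}: {pnl:+.1f} points")

spread = max(results.values()) - min(results.values())
print(f"spread across feeds: {spread:.1f} points")
```

Small per-bar differences between feeds are enough to change which bars trigger the rule. A spread that is large relative to the typical result suggests the system is keying on feed-specific quirks rather than a real effect.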
  4. thstart


    Another part of the market structure changed recently.

    When you place an order, the chain works as follows:
    1) your broker/dealer
    2) pools
    3) exchange

    If 1) can fill your order, it doesn't go to 2) or 3). If neither 1) nor 2) takes your order and it ends up at 3), then there is a good reason for this.

    So, where you get the feeds from matters a lot.
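A toy model of the chain described in this post: each stage in turn gets a chance to take the order, and only what the earlier stages pass on reaches the exchange. The size cutoffs and prices are invented, and real routing and fill logic is far more involved; the sketch only shows the ordering.

```python
from typing import Callable, List, Optional, Tuple

# A venue is modeled as a function that either fills the order (returns a price)
# or declines it (returns None). Purely illustrative: cutoffs and prices are made up.
Venue = Tuple[str, Callable[[int], Optional[float]]]


def route(order_size: int, venues: List[Venue]) -> Tuple[str, Optional[float]]:
    """Offer the order to each venue in turn and stop at the first one that fills."""
    for name, try_fill in venues:
        price = try_fill(order_size)
        if price is not None:
            return name, price
    return "unfilled", None


chain: List[Venue] = [
    ("broker/dealer", lambda size: 25.01 if size <= 100 else None),
    ("pools", lambda size: 25.02 if size <= 500 else None),
    ("exchange", lambda size: 25.03),
]

print(route(100, chain))
print(route(300, chain))
print(route(5_000, chain))
```

In this toy model only the largest order is handled by the exchange; the two smaller ones are filled before they get there, which is one way to read the point that the source of your feed matters.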