For those experienced with Genesis API/data feed

Discussion in 'Data Sets and Feeds' started by Kohanz, Oct 31, 2007.

  1. Kohanz


    Over the past couple of years, we (a small group) have developed an automated intraday trading program in C++ using the Genesis API, which has the ability to execute a given strategy in real-time (live) and test the strategy over historical data (backtesting).

    One problem that we have had for quite some time, and have not yet been able to solve, is outlined below. For those who have experience with Genesis or E-signal historical data, or who can shed any other light on this issue, your input is much appreciated. If you are interested in a more detailed discussion, feel free to PM me.

    The problem: (bear with me, this is long)

    We get our real-time tick data through the Genesis API and build OHLC bars from this data to run our strategies. They might be 1-, 3-, or 5-minute bars, for example; it all depends on the strategy. We then execute our strategy based, at least partially, on these candles.
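    A minimal sketch of the bar-building step described above, assuming each tick carries an exchange timestamp expressed as seconds since midnight and that ticks arrive sorted by that timestamp. The `Tick` and `Bar` types and the `build_bars` function are hypothetical names for illustration; they are not part of the Genesis API.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical tick record: exchange timestamp as seconds since midnight.
struct Tick {
    int secs;
    double price;
    long size;
};

struct Bar {
    int start_secs;  // bar covers [start_secs, start_secs + bar_secs)
    double open, high, low, close;
    long volume;
};

// Build OHLC bars from ticks that are already sorted by exchange timestamp.
std::vector<Bar> build_bars(const std::vector<Tick>& ticks, int bar_secs) {
    assert(bar_secs > 0);
    std::vector<Bar> bars;
    for (const Tick& t : ticks) {
        // Integer division snaps the timestamp to the start of its bar.
        const int start = (t.secs / bar_secs) * bar_secs;
        if (bars.empty() || bars.back().start_secs != start) {
            bars.push_back(Bar{start, t.price, t.price, t.price, t.price, t.size});
        } else {
            Bar& b = bars.back();
            b.high = std::max(b.high, t.price);
            b.low = std::min(b.low, t.price);
            b.close = t.price;
            b.volume += t.size;
        }
    }
    return bars;
}
```

    With 60-second bars, a tick stamped 9:30:59 (34259) lands in the 9:30 bar and a tick stamped 9:31:00 (34260) opens the next one, so the choice of boundary decides which bar a print belongs to.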

    We cannot get historical data from Genesis, but we are able to get it from E-signal. We can get up to 6 months of 1-minute bars from them for any stock, so we use this data for our backtesting, since we can build any size bar from 1-minute bars. We realize that by using this data, our exit price may sometimes be a bit off, and we accept that, but our entries, if they are based on bar data only (and often they are), should be perfectly reproducible.

    The problem is that we find that the bars we generate from the Genesis feed are always somewhat different from the E-signal historical data, so that if we test a day we just traded, the results sometimes match well, but other times the bars are just different enough that the backtesting trades differ greatly from the live trades, rendering the backtesting nearly useless.

    We first suspected synchronization to be an issue; however, we now use the ECN timestamp on each print to build our bars, so our local clock is not involved in the system any more.

    We also suspected that the "filtering" of certain prints (out of the money, etc.) in the historical data may result in the differences, but we don't see BIG differences in a few spots, we often see VERY SLIGHT differences in almost every candle... almost as if the time window for each bar was shifted by a few seconds, which made us suspect synchronization in the first place.

    So, the question is, has anyone experienced a similar situation? For those who use the Genesis API, what historical data source do you use and do you find that it matches your Genesis tick data perfectly? Do you do any sort of additional tick filtering? For those who use E-signal data for backtesting your system that operates on a non-E-signal data feed, do you find that your historical data matches your live data?

    Any help would be much appreciated, and if anyone has other problems with a Genesis-based trading system, or anything else I can help with, I would be glad to offer my advice in return.

  2. squeeze


    This happens and is just one of the reasons that systems tested on back-tested intra-day bar data never quite match real-time trading.
    Small shifts in timestamping can create quite different trading patterns.
  3. ryleg


    This sucks. We run into data inconsistencies all the time too.. but not exactly like this.
  4. Bergen288


    Did you try to filter out pre/post-market data? I believe there is no pre- or post-market data in the historical data, but your broker may provide pre- and post-market data in live trading, which might cause different outcomes for your trading indicators. At least that is a big reason for the difference between my real-time auto trading and my backtesting. Try filtering out pre-market data and see how it works.

    My 2 cents.

  5. Kohanz


    Thanks for the tip, but I'm pretty sure that's not our issue - as I said, we use the ECN timestamp of each print/tick to identify at what time that trade was executed, so if it was pre/post market, it would not be included in our data.
  6. Bergen288


    Oh, another thing I can think of is a different time definition. When I bought some 1-minute historical data for backtesting, the bars all started at 9:31am every day and there was nothing at 9:30am. I asked the seller why and was told that the price bar at 9:31am actually runs from 9:30:01am to 9:31:00am (I need to double-check his explanation). But in real-time data, I think the 1-minute bar runs from 9:30:00am to 9:30:59am. If that is the case, the OHLC information could be totally different when prices are moving fast. But I am not 100% sure about your data source. You should check with your historical and real-time data vendors to see whether their time definitions are the same.
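    To make the two time definitions concrete, here is a small sketch, assuming integer-second timestamps counted from midnight. The `Convention` enum and `interval_start` function are hypothetical names for illustration. Which convention a given vendor actually uses is something to confirm with that vendor.

```cpp
#include <cassert>

// Two possible ways to assign a print to a bar of bar_secs seconds.
enum class Convention {
    StartInclusive,  // bar covers [start, start + bar_secs), e.g. 9:30:00 to 9:30:59
    EndInclusive     // bar covers (start, start + bar_secs], e.g. 9:30:01 to 9:31:00
};

// Returns the start of the interval (seconds since midnight) that contains
// a print stamped `secs` under the given convention.
int interval_start(int secs, int bar_secs, Convention c) {
    assert(bar_secs > 0 && secs > 0);
    if (c == Convention::StartInclusive) {
        return (secs / bar_secs) * bar_secs;
    }
    // Shifting by one second moves a print on the boundary into the earlier bar.
    return ((secs - 1) / bar_secs) * bar_secs;
}
```

    A print stamped exactly 9:31:00 (34260 seconds) falls in the interval starting at 9:31:00 under the first convention and in the one starting at 9:30:00 under the second, while a print at 9:30:30 falls in the same interval either way. Only boundary prints move, which would produce slight differences in many candles rather than large differences in a few.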

    My 2 new cents.

  7. Bergen288


    I read your description again, and it looks like your symptoms can be explained perfectly by different time definitions.

  8. toe


    haven't used esignal for a while, but don't they live-filter their data for outliers and such?

    also, one prerequisite for having two identical data streams is that both should source their data from the same exchange list; if the data includes ECNs etc. then that's probably not the case. an easy way to tell would be to count the daily volume on both data sources: if the two are repeatedly very different, then it's likely they have different sources of data (well, you could ask the providers, but the data itself doesn't lie).
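    The daily volume check could be sketched as follows, assuming each source's bars for one symbol and one day have already been loaded into a vector of per-bar volumes. The function names are hypothetical, and what counts as "very different" is left to the reader.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Sum the per-bar volumes of one source for one trading day.
long total_volume(const std::vector<long>& bar_volumes) {
    return std::accumulate(bar_volumes.begin(), bar_volumes.end(), 0L);
}

// Relative gap between two daily totals, as a fraction of the larger one.
// Returns 0.0 when both totals are zero.
double relative_volume_gap(const std::vector<long>& a, const std::vector<long>& b) {
    const long va = total_volume(a);
    const long vb = total_volume(b);
    assert(va >= 0 && vb >= 0);
    const long larger = std::max(va, vb);
    if (larger == 0) return 0.0;
    return static_cast<double>(larger - std::min(va, vb)) / static_cast<double>(larger);
}
```

    A gap that stays near zero day after day suggests the two feeds cover the same venues; a persistent gap suggests one feed includes prints the other does not.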

    my personal opinion on this is that it's not worth the effort to make your data that precise, given the uncertainty of slippage and so on. for me a good backtest only gives me the confidence to start trading a system in a small way, then i get more confident if the slip between live trades and simulated trades is small. that slip would include differences between data sources as well as normal slippage.

    also, you could try collecting a one-week sample of genesis data and running a simulation of the same timeframe on each of the data sets. if the results are not very different then you know the issue is with slippage; if they are very different then the data is an issue for you.
  9. fatrat


    Let me tell you right now that Genesis will never give you every quote from the exchange correctly. If you take an ARCABook/ITCH feed straight from the exchange and run it in comparison to Genesis, it will not give you everything. It drops quotes and L2 book updates from time to time.

    In high-speed situations, Genesis will also drop time and sales information. There are other issues with their infrastructure as well. Have you ever noticed in LASER that some quote servers are slower than others? Well, in certain situations, some quote servers get bogged down and their data output isn't the same as that of other quote servers.

    Don't design your trading systems on Genesis data unless your system is designed to mathematically manage and transform the genesis/e-signal data into a general picture you can use.

    I had to learn this the hard way.
  10. I read this with great interest. I think the basic problem is that streaming data feeds are problematic because data can be lost in one of two ways:
    1) the volume of quotes/prints exceeds the server's ability to send them, either because of CPU load or TCP/IP traffic load.
    2) time-sampling of quotes/prints naturally drops some of the data (i.e. the Interactive Brokers approach)

    What's really needed here is a BarInterval server. I believe Interactive has solved this problem with their latest API, which has a reqRealTimeBars function. This function, once called, establishes a "subscription" whereby OHLC data with volume and tick count is sent every 5 seconds to a callback procedure. This is probably the best approach. However, as with all new Interactive API implementations, their first attempt is rough: the only interval that can be specified is 5 seconds. Optimally, their servers should derive 1-minute, 5-minute, and 20-minute bars, for instance. Instead, one must now set up a one-minute timer in the trading app so that a new interval instance is built from the OHLC data collected by the 5-second callbacks, using logic to determine the high, low, total volume, tick count, and VWAP. Unfortunately, you cannot rely on receiving 12 callbacks per minute, because a callback notification is only sent if there was a print in that 5-second interval. Hope springs eternal, however, and hopefully future releases of Interactive's API will improve. Genesis should naturally take advantage of Interactive's concept here and do it "right" the first time. Extending this approach to support tick intervals would go a long way in that regard.
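    The roll-up logic described above might look like the sketch below. It groups 5-second bars by their timestamps instead of counting 12 callbacks, so a missing interval does no harm. The `FiveSecBar` and `RolledBar` types and the `roll_up` function are hypothetical and are not the Interactive Brokers API types; the sketch assumes each small bar carries its own volume-weighted average price and arrives in time order.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical 5-second bar as delivered to a callback.
struct FiveSecBar {
    int start_secs;  // seconds since midnight
    double open, high, low, close;
    long volume;
    double wap;      // volume-weighted average price of this small bar
    int count;       // number of prints in this small bar
};

struct RolledBar {
    int start_secs;
    double open, high, low, close;
    long volume;
    double notional;  // running sum of wap * volume
    double vwap;
    int count;
};

// Combine time-ordered small bars into bars of out_secs seconds.
std::vector<RolledBar> roll_up(const std::vector<FiveSecBar>& in, int out_secs) {
    assert(out_secs > 0);
    std::vector<RolledBar> out;
    for (const FiveSecBar& s : in) {
        const int start = (s.start_secs / out_secs) * out_secs;
        if (out.empty() || out.back().start_secs != start) {
            out.push_back(RolledBar{start, s.open, s.high, s.low, s.close,
                                    0, 0.0, 0.0, 0});
        }
        RolledBar& b = out.back();
        b.high = std::max(b.high, s.high);
        b.low = std::min(b.low, s.low);
        b.close = s.close;
        b.volume += s.volume;
        b.notional += s.wap * static_cast<double>(s.volume);
        b.count += s.count;
        // Guard against a bar with no traded volume.
        b.vwap = (b.volume > 0) ? b.notional / static_cast<double>(b.volume) : 0.0;
    }
    return out;
}
```

    Keying on the timestamp means a minute with only two callbacks still yields one correct bar, at the cost of not emitting any bar for a minute with no prints at all.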
    Comments please.
    #10     Feb 2, 2008