time & sales, testing setup

Discussion in 'Trading Software' started by jaypaul, Feb 11, 2002.

  1. jaypaul


    I mainly have a question about time & sales vendors, but would love to discuss testing methods and setup. Let me first explain what I’m doing.

    I have designed and back-tested a few automatic trading systems that require a good-sized chunk of time & sales data. Why use time & sales? I initially tried ordinary price and volume, but could not demonstrate acceptable risk-adjusted returns for my required position length, level of risk and statistical stability of trained weights. So I purchased some time & sales data. Unlike the price-and-volume systems, my time & sales systems meet my statistical requirements and generate low-risk returns for both long and short positions.

    All this testing has given me a few opinions, likely controversial. For one, back-testing setup and methods seem infinitely more important than the actual quantitative trade determination. So far I’ve identified a few whole-market inefficiencies that seem stable over my entire 36-month testing period. But I don’t think I could have done that with a less robust or less data-intensive experimental setup. I develop a quantitative hypothesis and try vigorously to disprove it. What I can’t disprove immediately, I try to analyze and understand, attempting to identify the underlying market inefficiency, and then hopefully discard or refine the original hypothesis. I have a log of over 500 failed hypotheses versus about 7 that seem fairly stable, altogether representing about 100K lines of vectorized MATLAB code, 98% of it tested, scrapped and archived as examples of what doesn’t work.

    I force my systems to pick thousands of long-term trades spanning 3000 stocks and 36 months (8/98 through 7/01), with equity risk spread uniformly across time (index-fund style, no leverage or margin required) and across any number of individual issues (diversification). Those tests eliminate a lot of hypotheses. Then I examine multidimensional statistics over independent trials and over rolling, out-of-set train/test periods, using survivorship-free stock sets and considering various portfolio sizes, capitalization biases, sector biases, normalizations, and other factors either set by the user or determined by the system. The biggest challenge is trying to structure the networks to capture the inefficiency I want without employing too many degrees of freedom. Another challenge is figuring out how best to normalize for volatility, both per-stock and over time, so that no group of stocks and no single-stock or whole-market price shock can disproportionately influence trained weights. Individual stock effects and news-related events represent a highly unpredictable component that can destroy perfectly good systems. Ordinary price changes, especially the after-hours and pre-open components, seem highly unpredictable as well.
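
    To make the rolling, out-of-set train/test idea and the volatility normalization a bit more concrete, here is a rough Python sketch (not my actual MATLAB code). The window lengths, the median-absolute-return volatility estimate, the clipping cap and the function names are all assumptions chosen for illustration, and the normalization shown covers only the per-stock part, not the over-time part.

```python
from statistics import median


def rolling_windows(n_months, train_len, test_len, step):
    """Yield (train, test) month-index ranges for walk-forward testing.

    Each test window starts right after its training window ends, so
    test months are never seen during training.
    """
    start = 0
    while start + train_len + test_len <= n_months:
        train = range(start, start + train_len)
        test = range(start + train_len, start + train_len + test_len)
        yield train, test
        start += step


def normalize_returns(returns, cap=3.0):
    """Scale one stock's returns by a robust volatility estimate, then clip.

    Assumption: volatility is estimated as the median absolute return, so
    a single price shock barely moves the scale. Scaled values are clipped
    to +/- cap so no single observation can dominate trained weights.
    """
    scale = median(abs(r) for r in returns)
    if scale == 0:
        # Degenerate case (at least half the returns are zero): no usable scale.
        return [0.0 for _ in returns]
    return [max(-cap, min(cap, r / scale)) for r in returns]


# 36 months (8/98 through 7/01); 12-month train, 3-month test, 3-month step
# are hypothetical settings, not the ones used in the tests described above.
windows = list(rolling_windows(36, train_len=12, test_len=3, step=3))

# A 50% one-day shock ends up clipped to the cap instead of swamping the rest.
normalized = normalize_returns([0.01, -0.02, 0.01, 0.50])
```

    With these settings the 36 months yield 8 train/test pairs, the last one testing on months 33 through 35 (0-based). A rolling rather than whole-history volatility estimate would be one way to handle the over-time part.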

    To make a long story short… in order to start paper-trading or real-funds trading of my systems, this is the minimum time & sales data I will need:

    3000 (top-liquidity) stocks, up to 1 month delayed (e.g. NYSE TAQ)
    500 stocks at a 50-stock/month turnover rate, delivered at most a few days delayed, preferably end-of-week (e.g. Bloomberg, Esignal or Qfeed)
    After hours time & sales would be helpful.

    NYSE TAQ is perfect except for the delay. Qfeed, whenever it becomes available, may not be reliable enough for the amount of data I need. That leaves Bloomberg and Esignal, which can get pricey if I want more than my minimum amount of data. Although I have plenty of private capital, I’m simply trying to avoid huge data expenses. I’ll probably just bite the bullet and buy the data. But does anyone have any better ideas or advice, or know of other vendors?

    Thanks,
    Jay
     
  2. jay,

    it looks like we are doing the same thing. i cannot help you with a data provider, but you can contact me via email if you are interested in comparing notes.

    regards,
    sascha3@web.de
     
  3. jaan


    well, we use TAL (www.taltrade.com) to retrieve T&S data for about 2000 nasdaq stocks daily. professional subscription costs about $200/month, non-pro is probably considerably less.

    - jaan