Formatting my data for backtests

Discussion in 'Data Sets and Feeds' started by mcrip, Sep 21, 2006.

  1. mcrip


    At the moment I am developing software that stores data from my data provider and runs backtests on it.
    Now I face a choice: which data format to use, and especially how to generate it.
    I get tick data from my data provider (historical and real-time). I want to develop a system that trades intraday, but not so frequently that it amounts to scalping.
    At the moment I am using OHLC values based on executed trades. I think this is fine if the backtested symbol trades frequently, but what if the symbol is rarely traded?
    Because of this issue I think I will now build OHLC values from the quote ticks (changes in bid/ask and bid/ask volume). But how should I form, e.g., the open? Should I use the mid-point between bid and ask of the first tick in the timeframe as the open? And what about the high/low? Use the lowest bid as the low and the highest ask as the high?
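
One possible reading of the mid-point idea, as a minimal sketch rather than a recommendation: every quote tick is reduced to its bid/ask mid-point, and the mid-points are aggregated into fixed-length bars. The names `Quote`, `Bar`, `mid_bars` and the `bar_seconds` parameter are hypothetical, and timestamps are assumed to be plain seconds.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Quote:
    ts: float   # timestamp in seconds (assumed representation)
    bid: float
    ask: float


@dataclass
class Bar:
    start: float  # timestamp of the start of the bar
    open: float
    high: float
    low: float
    close: float


def mid_bars(quotes: Iterable[Quote], bar_seconds: int = 60) -> List[Bar]:
    """Aggregate time-ordered quote ticks into bars of the bid/ask mid-point."""
    bars: List[Bar] = []
    current: Optional[Bar] = None
    for q in quotes:
        mid = (q.bid + q.ask) / 2.0
        start = q.ts - (q.ts % bar_seconds)  # floor the timestamp to the bar boundary
        if current is None or start != current.start:
            # First tick of a new bar: its mid-point is the open.
            current = Bar(start, mid, mid, mid, mid)
            bars.append(current)
        else:
            current.high = max(current.high, mid)
            current.low = min(current.low, mid)
            current.close = mid
    return bars
```

Taking the lowest bid as the low and the highest ask as the high instead would make each bar's range wider than the mid-point range, so indicators based on the range would see part of the spread as price movement. Intervals without any quote produce no bar in this sketch.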

    So I am asking you what to do. Is it really best to run the backtests on OHLC values generated from tick data? Or should I backtest on the raw tick data itself? But then, on what basis should I calculate my indicators? Compress, e.g., 200 ticks into one and use the mid-point?
    Questions upon questions... :)
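
The "compress 200 ticks into one" idea could look roughly like the sketch below, assuming the ticks have already been reduced to mid-points. The function name `compress_ticks` and its parameter `n` are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def compress_ticks(
    mids: Iterable[float], n: int = 200
) -> Iterator[Tuple[float, float, float, float]]:
    """Yield (open, high, low, close) for each run of n consecutive ticks.

    The last group may contain fewer than n ticks; it is still emitted.
    """
    it = iter(mids)
    while True:
        chunk: List[float] = list(islice(it, n))
        if not chunk:
            return
        yield chunk[0], max(chunk), min(chunk), chunk[-1]
```

Bars built this way cover equal numbers of ticks rather than equal amounts of time, so indicators computed on them speed up and slow down with market activity.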

    Best regards and thank you.
  2. man


    first of all, if you don't want to scalp, the details you mention won't matter too much. second, i would use traded prices only; going any deeper is overkill if you want to trade less than 20 times a day.

    format is always an issue. i think text files are easiest, binary files are fastest. if everything has to be very fast you might want to save only the changes between values rather than the full numbers ... we don't do it so far, but it should save time and storage capacity.
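
A minimal sketch of the "save only the changes" idea in a binary format, under several assumptions that are not from the post: prices are multiples of a known `tick_size`, the list is non-empty, and no single move exceeds the signed 16-bit range. The function names are hypothetical; only Python's standard `struct` module is used.

```python
import struct
from typing import List


def encode_deltas(prices: List[float], tick_size: float = 0.01) -> bytes:
    """Store the first price as an int64 tick count, then one int16 delta per tick."""
    ticks = [round(p / tick_size) for p in prices]
    out = struct.pack("<q", ticks[0])
    for prev, cur in zip(ticks, ticks[1:]):
        # struct.error is raised if a delta does not fit into 16 bits.
        out += struct.pack("<h", cur - prev)
    return out


def decode_deltas(blob: bytes, tick_size: float = 0.01) -> List[float]:
    """Rebuild the price series from the first value and the stored deltas."""
    (first,) = struct.unpack_from("<q", blob, 0)
    ticks = [first]
    for (delta,) in struct.iter_unpack("<h", blob[8:]):
        ticks.append(ticks[-1] + delta)
    return [round(t * tick_size, 10) for t in ticks]
```

In this layout each price after the first takes 2 bytes instead of the 8 bytes of a full double, at the cost of having to read a series from its start.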

    if you do not want to scalp and seem to be just starting out with backtesting, why don't you go for tradestation, wealthlab, amibroker or the like? you can get lost for decades programming a better and better testing platform without ever coming close to having a tradeable system. this is a very real threat! and i know what i am talking about here ... :)

    the best.
  3. fader


    indeed, you are going to a significant level of detail here - the right answers to your questions will depend on the requirements of your trading system, so in reality only you will be able to decide what level of detail and what data aggregation logic is appropriate.

    for example, you have a question on the opening price - well, how to calc. it depends on what you need it for - if you need to trade right at the open, you may decide to use the first print, bid / ask quotes, and market depth to estimate where you will be executed - or you may just take the opening print and estimate ballpark slippage value - how much will the differences matter for your system? only you know.

    if you use the opening price for an indicator calc., then you may decide it's more logical to take an average of prints over the first few seconds or minutes - the period will again depend on the requirements of your system.
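
as a rough sketch of that last idea, assuming trades arrive as (timestamp, price) pairs with timestamps in seconds; the name `averaged_open` and the 30-second default are made up for illustration:

```python
from typing import List, Optional, Tuple


def averaged_open(
    prints: List[Tuple[float, float]],
    session_start: float,
    window_seconds: float = 30.0,
) -> Optional[float]:
    """Mean price of the trades in [session_start, session_start + window_seconds)."""
    end = session_start + window_seconds
    prices = [price for ts, price in prints if session_start <= ts < end]
    # Return None when nothing traded inside the window.
    return sum(prices) / len(prices) if prices else None
```

this is an unweighted mean; whether to weight by volume, and how long the window should be, again depends on the system.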

    basically, the approach is not: how do i collect the data and how much? - it is more: what data do i need for my system? from all available data, what is the relevant subset that will accommodate the needs of my system - then you go and grab that data.

    there is not one answer to whether you need tick level or ohlc - your system should tell you what you need - or, if you still can't decide, get a sample of both and run your system off both subsets and you will see how much difference that produces, then make a decision.

    all the best.
  4. trady1



    From my many years of experience in computer architecture and software development, plus several years of very intensive day trading (for a living) and stock research, I would recommend storing both price (in REAL format, with 3 digits after the decimal point) and volume (INTEGER; I recommend INT64 if you can store data that way). Other parameters such as change, change in % and so on can be calculated on the fly; there is no need to store them.
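
A minimal sketch of such a record, assuming "REAL" is taken to mean an 8-byte float and the record is stored little-endian. The layout and the names `RECORD`, `pack_tick`, `unpack_tick` and `pct_change` are illustrative assumptions, not a format described in the post.

```python
import struct
from typing import Tuple

# Assumed layout: float64 price followed by int64 volume, little-endian.
RECORD = struct.Struct("<dq")


def pack_tick(price: float, volume: int) -> bytes:
    """Serialize one tick, keeping 3 digits after the decimal point."""
    return RECORD.pack(round(price, 3), volume)


def unpack_tick(blob: bytes) -> Tuple[float, int]:
    """Read one tick back as (price, volume)."""
    price, volume = RECORD.unpack(blob)
    return price, volume


def pct_change(prev_price: float, price: float) -> float:
    """Percentage change, derived on the fly instead of being stored."""
    return (price - prev_price) / prev_price * 100.0
```

Each record takes 16 bytes in this layout, and derived values such as `pct_change` cost one subtraction and one division when needed.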

    Buky (Trady1)

    TradySupport @ Trady1.Com
  5. mcrip


    I thought about purchasing tradestation etc. and got my hands on a tradestation 2000i. But it is quite limited in its functionality.
    I do not only want to use technical indicators which, e.g., cross over and generate a buy signal; I want to build a neural network from a combination of technical data (simple indicators that represent the technical side of the market) and some fundamentals (e.g. the difference between interest rates in two countries). I think investox is able to do that, but the "clicking-together-a-neural-network" approach is too opaque for me. So I have to build the software myself. On the other hand, it is a very interesting task :)

    I know that in the end I have to decide for myself which data to backtest on, but I think there are some experienced system builders around here who have faced the same problem and could give me some hints, e.g. how to build an OHLC time series out of a bid/ask tick time series :)
    I agree that it is overkill to backtest EVERY tick, but I am thinking about only passing on a tick if the bid/ask price has changed, and ignoring the bid/ask size, which changes much more frequently. Only when I enter or close a trade would I look at the bid/ask volume, to calculate slippage and to estimate whether the order is likely to be filled.
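
The filtering step described above could be sketched like this. The `Quote` fields and the function name `price_changes_only` are assumptions for illustration, not the layout of any particular feed.

```python
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple


class Quote(NamedTuple):
    ts: float
    bid: float
    ask: float
    bid_size: int
    ask_size: int


def price_changes_only(quotes: Iterable[Quote]) -> Iterator[Quote]:
    """Pass a quote on only if its bid or ask price differs from the previous quote."""
    last: Optional[Tuple[float, float]] = None
    for q in quotes:
        key = (q.bid, q.ask)
        if key != last:
            last = key
            yield q  # sizes are kept on the quote but do not trigger an update
```

Because the sizes on a forwarded quote may be stale by the time an order is placed, the slippage and fill estimate would still need the most recent unfiltered quote.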
  6. trady1


    Take a look at my article "Automatic Day Trading - Is There any thing like that ?"; it should be on this board or nearby. I've built a fully multi-computing system that tracks symbols 10 times per minute... I guess you'll find further info there and in the articles yet to come.

    Buky Carmeli

    TradySupport @ Trady1.Com