Project: Intraday Historical Data Mining (storing in files)

Discussion in 'Data Sets and Feeds' started by InvestVision, May 1, 2011.

  1. <b>Project: Intraday Historical Data Mining (storing in files)</b>

    Hi, I am starting a project to store historical intraday data in files so that different charting programs can use it
    for technical analysis and backtesting.

    I have done a fair amount of research in this area over some time and came up with this plan.

    What I am looking for:
    ----------------------------
    - Since data collection is a tedious process, I am looking for collaboration with like-minded people who
    have a similar need for historical intraday data.
    - We can share the cost and the work to get this done right, and of course share the results.
    - This is not a one-time quick thing; I am looking for people who are committed to LONG-term, CLEAN historical data.
    - My background: computer engineer, so I can take part in the programming side.

    - I am also looking for people who are ALREADY mining historical data; I can SHARE the COSTS, and we can take it further if any improvements are needed.

    From everybody:
    ---------------------
    - Your input, suggestions and experiences are welcome, as some of you have spent a lot of time on this subject.
    - I list some possible resources below; please suggest any better alternatives and <b>critique any blunders in my approach</b>.
    - Thanks in advance for all your suggestions and input.

    In my view, historical data mining is a 2-step process.

    Part 1: first get a snapshot of intraday historical data (say, the last 10 years).

    Part 2: have an automated program run every day to add the new intraday data as it comes in, keeping the intraday data set up to date.

    Part 2:
    ---------
    I will handle Part 2 first. In my opinion Part 2 is easy, since we (traders) all have access to FREE real-time data from our brokers: an automated API program run at the end of each day can add that day's data to the existing historical data.

    For Interactive Brokers customers, here are links to the API for mining intraday data:
    http://www.interactivebrokers.com/php/apiUsersGuide/apiguide.htm
    http://www.interactivebrokers.com/en/p.php?f=api&ib_entity=llc
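
    A minimal sketch of the end-of-day append step in Python, assuming the day's bars have already been fetched from the broker API (that call is not shown) and that the history lives in one CSV file per symbol. The function name `append_new_bars`, the column layout and the file name are assumptions for illustration, not part of any broker API.

```python
import csv
import os

# Assumed column layout for the per-symbol history file.
FIELDS = ["timestamp", "open", "high", "low", "close", "volume"]


def append_new_bars(path, bars):
    """Append bars (dicts keyed by FIELDS) to a CSV file.

    Bars whose timestamp is already stored are skipped, so running the
    job twice on the same day does not duplicate data.
    Returns the number of bars actually written.
    """
    has_data = os.path.exists(path) and os.path.getsize(path) > 0

    # Collect the timestamps that are already in the file.
    seen = set()
    if has_data:
        with open(path, newline="") as f:
            seen = {row["timestamp"] for row in csv.DictReader(f)}

    # Keep one bar per new timestamp (later duplicates overwrite earlier ones).
    fresh = {}
    for bar in bars:
        if bar["timestamp"] not in seen:
            fresh[bar["timestamp"]] = bar

    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if not has_data:
            writer.writeheader()
        # Timestamps like "2011-05-02 09:30" sort correctly as plain strings.
        for ts in sorted(fresh):
            writer.writerow(fresh[ts])
    return len(fresh)
```

    Skipping timestamps that are already stored means the job can be rerun after a failed or partial download without corrupting the file.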


    Part 1:
    --------
    To get the initial snapshot of intraday data, say for the last 10 years, I identified a few sources.

    Source 1: eSignal desktop API: http://www.esignal.com/esignal/features/activex/
    - It seems this will cost $150/month for the eSignal subscription + $100 additional.
    - It seems one can take a one-month subscription, store the intraday data in files and then CANCEL the subscription.
    - This assumes that with just a 1-month subscription we can get 10 years of historical intraday data (am I getting GREEDY here? Is there a catch? I need to verify :p)

    Source 2: a cheap provider of acceptable-quality data, $100 (one time) for at least 10 years of intraday data.
    - It covers forex, futures (continuous contracts) and the major indices.
    - URL: Http://disktrading.is99.com/disktrading/#order1
    - The data quality may not be as good as eSignal's.

    Some more details:
    ------------------------
    Markets: crude oil CL, S&P 500 _ES, currencies (euro and dollar index USD), etc. <b>(Once we have a program, we can get data for any stock or futures contract.)</b>
    Time frame: 1 minute, 5 minutes (tick data may be overkill).
    (I think once you have 1- and 5-minute data, charting programs can construct 15-, 30-, 60-minute bars, etc.)
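
    To illustrate why the 1-minute series is enough to derive the longer time frames, here is a rough Python sketch that rolls 1-minute bars up into N-minute bars. The tuple layout (start time, open, high, low, close, volume) and the function names are assumptions for illustration; buckets are aligned to midnight, which may not match how a particular charting program aligns its bars.

```python
from datetime import datetime


def bucket_start(ts, minutes):
    """Floor a timestamp to the start of its N-minute bucket (aligned to midnight)."""
    total = ts.hour * 60 + ts.minute
    floored = total - total % minutes
    return ts.replace(hour=floored // 60, minute=floored % 60,
                      second=0, microsecond=0)


def resample_bars(bars, minutes):
    """Aggregate time-sorted 1-minute bars into `minutes`-minute bars.

    Each bar is a tuple (timestamp, open, high, low, close, volume),
    where timestamp is the datetime at which the bar starts.
    """
    out = []
    for ts, o, h, l, c, v in bars:
        start = bucket_start(ts, minutes)
        if out and out[-1][0] == start:
            # Same bucket: keep the first open, extend high/low,
            # take the latest close and add up the volume.
            _, po, ph, pl, _, pv = out[-1]
            out[-1] = (start, po, max(ph, h), min(pl, l), c, pv + v)
        else:
            # First bar of a new bucket.
            out.append((start, o, h, l, c, v))
    return out
```

    Calling `resample_bars(one_minute_bars, 15)` would give 15-minute bars; 30 and 60 work the same way.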

    Format: ASCII (not tied to any platform); I am planning a format conversion program so that MANY charting programs can use the data.
    Contracts: monthly contract symbol data and continuous contract data.

    <b>Thanks in advance for all your suggestions and input.</b>
     
  2. Al_Ano1

    Intraday tick data is tricky. It contains a lot of noise: spikes in prices and volumes, including typos. You have to filter it anyway, and it is difficult to filter properly and to make sense of it.
    Commercial data suppliers filter the data for you, but that is expensive.
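
    As one possible starting point for such filtering, here is a small Python sketch that flags prices that sit far from the median of their neighbours. This is a generic rolling-median check, not what any commercial supplier does; the function name, the window size and the 2% threshold are assumptions that would need tuning per market.

```python
from statistics import median


def find_spikes(prices, window=5, max_dev=0.02):
    """Return the indices of prices that look like isolated spikes.

    A price is flagged when it differs from the median of its
    neighbours (up to window // 2 ticks on each side, excluding the
    price itself) by more than `max_dev`, expressed as a fraction.
    """
    half = window // 2
    flagged = []
    for i, price in enumerate(prices):
        lo = max(0, i - half)
        hi = min(len(prices), i + half + 1)
        neighbours = prices[lo:i] + prices[i + 1:hi]
        if not neighbours:
            continue  # a single tick cannot be judged
        mid = median(neighbours)
        if mid > 0 and abs(price - mid) / mid > max_dev:
            flagged.append(i)
    return flagged
```

    The median is used instead of the mean so that one bad tick does not drag the reference level toward itself. A check like this only catches isolated outliers; a run of several bad ticks in a row would pass.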

    Ten years of history can be useless because market dynamics have changed, mostly due to high-frequency trading. One year may be enough.

    Instead of tick data we can take a more statistical approach. Minute-by-minute daily data from your broker makes it possible to cover a vast number of stocks and options, and the quantity of data is still manageable.

    I have collected some daily data, both during the trading session and using historical data queries from Interactive Brokers, with mixed success. From a user's point of view, their historical data system architecture is not commercial quality. It is full of unexpected, strange behaviors. For example: throttling without warning, which is not acceptable at all; queries that suddenly stop working; wrong timestamps, which make it dangerous to use the data without checking the timing; a simple query for one ticker of 360 rows that may take 5 minutes and return empty data sheets; etc. It is a random-success system. But for statistics it is enough, when it happens to work.
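
    Since the timestamps cannot be trusted blindly, a basic sanity check on every downloaded batch is worth having. Below is a minimal Python sketch, assuming the bars arrive as a list of `datetime` stamps at a fixed spacing; the function name and the problem labels are made up for illustration. Gaps are reported rather than rejected, because session breaks and illiquid minutes produce legitimate gaps.

```python
from datetime import datetime, timedelta


def check_timestamps(stamps, step=timedelta(minutes=1)):
    """Return a list of (index, problem) pairs for a batch of bar timestamps.

    An empty list means the batch is non-empty, strictly increasing
    and evenly spaced at `step`.
    """
    if not stamps:
        # An empty result is itself suspicious and should trigger a retry.
        return [(None, "empty result")]

    problems = []
    for i in range(1, len(stamps)):
        delta = stamps[i] - stamps[i - 1]
        if delta <= timedelta(0):
            problems.append((i, "duplicate or out of order"))
        elif delta > step:
            problems.append((i, "gap"))
        elif delta < step:
            problems.append((i, "irregular spacing"))
    return problems
```

    Running this on each query result before the data is appended to the files makes silent failures visible instead of letting them leak into the history.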

    A very sensitive system creates too many triggers, and they wipe you out too early. Success is more probable if you have accurate statistical measures that trigger trades in favorable conditions.

    Options are priced on a daily close-to-close basis. Therefore we can use daily data and a pricing model to begin with. Then we need the minute-by-minute price of the underlying. For backtesting we should have some months of data collected. For the time being I share your problem.