Searching historical data in Sql

Discussion in 'Data Sets and Feeds' started by morphtrade, Feb 15, 2008.

  1. That would be Celko. With a "c."
     
    #11     Feb 17, 2008
  2. I may look at the option of saving historical data in binary files but will it not take long time to load a big binary into memory and searching through them unless there is a tool to stream binary data (without loading a whole file) and query them. I would definitely like to explore that area if someone points me to right tools or ideas.
    Table structure for one min data:
    [Date] [Time] [Open] [High], [Low], [Close], [Adjusted_Close], [MA], [DI].....

    Tick Data:
    [Date] [Time] [Trade]
    I have clustered index on [Time] and [Date] columns as they are primary key.
    Most time consuming queries are with lots of inner joins. So for example if I have to compare first few mins data then I have to do inner join like:

    With IntervalData AS
    (
    SELECT [Date], Sum(CASE WHEN 1430 = [Time] THEN [PriceRange] END) AS '1430',
    Sum(CASE WHEN 1431 = [Time] THEN [PriceRange] END) AS '1431',
    Sum(CASE WHEN 1432 = [Time] THEN [PriceRange] END) AS '1432'
    FROM [INDU_1] GROUP BY [Date]
    )
    --Select * from IntervalData order by [Date] DESC;
    SELECT [Date] ,[1430], [1431], [1432], [1431] - [1430] As 'Range' from IntervalData
    WHERE ([1430] > 0 AND [1431] < 0 AND [1432] < 0) OR ([1430] < 0 AND [1431] > 0 AND [1430] > 0)
    ------------------------------------------------------------------------

    select ind1.[Time], ind1.PriceRange,ind2.[Time], ind2.PriceRange from INDU_1 ind1
    INNER JOIN INDU_1 ind2 ON ind1.[Time] = ind2.[Time] - 1 AND ind1.[Date] = ind2.[Date]
    where (ind1.[Time] = 2058) AND ((ind1.PriceRange > 0 AND ind2.PriceRange >0) OR (ind2.PriceRange < 0 AND ind1.PriceRange < 0))
    ORDER BY ind1.[Date] DESC;

    This query might end up with quite a few inner joins if you have to compare several mins of data and will severely effect execution time.

    And for that reason, I might have to start looking at time series database. Any suggestion on which one I should start with?
     
    #12     Feb 17, 2008
  3. Here's what I would argue is the best. It's used by very many in scientists (ie: remote sensing satellite data) as well as the engineering and aerospace industries. Not to mention me...

    HDF5
     
    #13     Feb 17, 2008
  4. andread

    andread

    doesn't really say that much, does it? What's this technology they talk about?
     
    #14     Feb 17, 2008
  5. rosy2

    rosy2

    i use the python interface to this. its fast enough for me and i can slice and dice as needed. for a commercial product look at

    http://kx.com/
     
    #15     Feb 17, 2008
  6. Search around the site a little bit. Actually its a relatively new web-site and I didn't give you the home page. Here it is: HDF Group. Oh and of course, if I use it, it's Open-Source as well... :)

    ______________________________
    Support OpenSource, OpenAccess, and OpenStandards.

    -kt
     
    #16     Feb 17, 2008