Database Issue

Discussion in 'Data Sets and Feeds' started by sfunds, Apr 3, 2004.

  1. sfunds

    sfunds

    Hi,

    I am having Ms Sql Server.

    I have one Tables for ScriptDetails
    (columns Symbol,company_name,industry)

    For each company I have a separate Table with Columns
    (open,high,low,close,volume,date)

    each table has data from 1995.

    How you are storing the data and what is the best way to store.

    Thanks
     
  2. certainly storing each company in a separate table goes against common wisdom (unless you have a small fix set of companies you are interested in ...)
     
  3. For all time series data, you might use one table and each row stores a symbol that serializes the whole time series. This is mainly for real-time read-only purpose. If the interval is intraday, you might want to store serialized time series, say 5 min, and you can roll up to higher timeframe in memory, or you can choose to store the higher timeframe in separate series, depending on your usage pattern.

    It's still possible to update your serialized time series though less efficient than more normalized data structure.

    Here's an example to store market data in chunks such as weeks, months or years and different time frames.

    MarketDataTable
    Symbol varchar(50)
    Data image (serialized market data simliar to BLOB)
    Timeframe long (in minutes)
    DateFrom datetime
    DateTo datetime
     
  4. sfunds

    sfunds

    Hi
    Total Number of companies are around 1500

    Previously I had one table for all scripts but as i added for more years the queries took long time

    Anyhow how commercial software store the data in Database.

    Thanks
     
  5. sfunds

    sfunds

    Hi
    Trader.Net

    I have only EOD data
    Can you give some indepth way to do according to your Suggestion And how will I insert the data at the end of EveryDay.

    I download the data available in CSV format and update the database at 5 Pm by script

    Thanks
     
  6. Unless you have need to be able to apply relational algebra to the data, a simple and very fast approach is to simply store the data in a series of dynamically extending linked lists using some fairly rudimentary block oriented buffer management and linked list code.

    It's a commonly used general architecture for storing data that requires near realtime access (have used it dozens of times for everything from realtime message switching systems to sub-second response transaction systems to low latency data processing systems to applications needing dynamically retrieved/updated object archives).

    For your specific use, it allows you to store an effectively unlimited amount of price data for each symbol with the absolute minimum # of disk accesses needed to load any given dataset on the fly without the additional overhead that a traditional SQL database would incur. It can be tweaked for specific performance requirements by using variable cluster size, memory caching, hashing, etc.

    If near realtime/low latency data access isn't an issue (and since you're unlikely to have the kind of high volume/high speed requirements of the systems I noted above), you could probably just store the price data in an ordinary SQL table, i.e., symbol (or alternatively a binary symbol ID - referenced from the symbol table), date, time (for intraday data only), open, high, low, close, volume, etc. and then use a normal SQL Select statement to retrieve and process any given symbol's price dataset (primary key would be symbol (or binary ID) plus date (plus time if it's intraday data). Definitely no need (or technical desire) to have to use a separate table per symbol.

    As with any application, exactly what you intend to do with the data, how often you'll access it, and what your performance requirements are will be what dictates the target architecture - there's no generalized, one size fits all solution that solves every potential application.
     
  7. simsim

    simsim

    in your Scenario u need to keep two tables,

    1 Scrpt details (u have)
    2. End of day data. ( all the symbols go in this table)

    This way you will have only one script to load the data. It is far more easier.

    Now regarding the performance issue. keep in mind that SQL server is meant to store milliosn of records. all u need is to create index's and views.,. to take care the performance issue.