SQL db design recommendation

chs245 · Nov 14, 2007

Hi
I'm planning to host my own MSSQL stock data db. I want to store 1min OHLCV for 2000 tickers every day (ca 7.5hours x 60min x 2000 tickers = 900k rows per day)
I want to use the database to store the data locally and to easily be able to retrieve it into other applicaions.
What I'm not sure of is how to design the database: store the data in tables by ticker ? by date ?
Can anyone offer advice as to this ?

Best regards,
Oliver

gwb-trading · Nov 14, 2007

I would store the information by ticker symbol, and also include a field for a date/time stamp as well as the OHLCV information.

This will allow you to have a continuous database for a symbol across multiple days, and seach on time/date across an individual symbol or multiple symbols for correlation.

- Greg

Quote from chs245:

Hi
I'm planning to host my own MSSQL stock data db. I want to store 1min OHLCV for 2000 tickers every day (ca 7.5hours x 60min x 2000 tickers = 900k rows per day)
I want to use the database to store the data locally and to easily be able to retrieve it into other applicaions.
What I'm not sure of is how to design the database: store the data in tables by ticker ? by date ?
Can anyone offer advice as to this ?

Best regards,
Oliver
More...

andread · Nov 14, 2007

Quote from chs245:

Hi
I'm planning to host my own MSSQL stock data db. I want to store 1min OHLCV for 2000 tickers every day (ca 7.5hours x 60min x 2000 tickers = 900k rows per day)
I want to use the database to store the data locally and to easily be able to retrieve it into other applicaions.
What I'm not sure of is how to design the database: store the data in tables by ticker ? by date ?
Can anyone offer advice as to this ?

Best regards,
Oliver
More...

One table with tick id (a sequence), a symbol id (and a separate table for symbols), timestamp, open, close, high, low.

Pro_Trader720 · Nov 14, 2007

Quote from andread:

One table with tick id (a sequence), a symbol id (and a separate table for symbols), timestamp, open, close, high, low.
More...

agreed. You may store volume as well. Tradestation tracks upticks and downticks.

ramora · Nov 14, 2007

Quote from andread:

One table with tick id (a sequence), a symbol id (and a separate table for symbols), timestamp, open, close, high, low.
More...

Would the database be faster if each symbol had its own table?

Can an SQL DB do 900K inserts during the day without any load problem?

Simple questions, but thank you for the response...

fader · Nov 14, 2007

i am not an expert by any means, but i have looked into the subject - there is a lot of info here if you search the threads, as well as on some other forums, concerning optimal db architecture for financial series data - i ended up having a separate table for each security - this made data retreival a lot faster than having one huge table with combined data for all securities - also i used MS Access, not sql - in the end after looking at options, i just ended up running time tests on the various alternatives and it became pretty clear which was the fastest with my setup.

Vorpal · Nov 14, 2007

I would agree with a security table and a market data table for proper design.

"security" table
market_id autonumber int [primary key]
symbol varchar(5)
name varchar(50)

(IE: 1, 'AA', 'Alcoa Inc')

(possibly add fields for CUSIP or "delisted", because stocks can share the same symbol over time or you may only want to query active stocks).

"market_data" table
market_id int [primary key]
timestamp datetime [primary key]
openprice decimal(7,2)
highprice decimal(7,2)
lowprice decimal(7,2)
closeprice decimal(7,2)
volume int

(IE: 1, '11/14/2007 03:01:00 PM', 38, 39, 37, 37.5, 1000)

I chose decimal(7,2) for the price data so you could allow 5 digits to the left of the decimal place. You can't store large values like BRK/A in there, but you keep your field size to 5 bytes instead of 8 bytes. This can add up...

Performance and disk space will be your biggest issue with that much data. Somebody check my math, but based on the above design, you are looking at generating 36 bytes per record. That doesn't sound like much, but 36bytes x 900k rows per day = 31mb per day (1024 bytes per KB and 1024 KB per MB). And that doesn't count the size of the index on the primary key (which you need if you want to be able to query the data with any speed). It won't take long for this database to get huge if you plan on archiving the data indefinitely. The free versions of MS SQL only permit a max database size of a few GB, so you may run through that pretty fast unless you want to go with a commercial version that doesn't have database size restriction. Obviously if you settle for hourly, etc. bars, you will use far less space.

You >could< set up one database per symbol to get around size/performance, but this would be manual labor-intensive to manage. Commercial versions of SQL do this clustering for you in the background. There are some other databases out there that emphasize performance that you can consider... or free ones like mysql. But MS SQL is nice, and I use it for my stuff (daily bars only, so my DB is not that big). But MS SQL is a database is primarily built for transaction processing and not raw speed.

Businessman · Nov 14, 2007

Quote from ramora:

Would the database be faster if each symbol had its own table?

Can an SQL DB do 900K inserts during the day without any load problem?

Simple questions, but thank you for the response...
More...

If the OP used a one table design then we would need to do just one bulk insert statement every minute.

chs245 · Nov 15, 2007

Hi Vorpal

why would you choose your design over a one ticker per table design ? What advantage is there to hold the tickers in one table and market data in the other ?

Thanks for all the suggestions so far,

Oliver

Quote from Vorpal:

I would agree with a security table and a market data table for proper design.

"security" table
market_id autonumber int [primary key]
symbol varchar(5)
name varchar(50)

(IE: 1, 'AA', 'Alcoa Inc')

(possibly add fields for CUSIP or "delisted", because stocks can share the same symbol over time or you may only want to query active stocks).

"market_data" table
market_id int [primary key]
timestamp datetime [primary key]
openprice decimal(7,2)
highprice decimal(7,2)
lowprice decimal(7,2)
closeprice decimal(7,2)
volume int

(IE: 1, '11/14/2007 03:01:00 PM', 38, 39, 37, 37.5, 1000)

I chose decimal(7,2) for the price data so you could allow 5 digits to the left of the decimal place. You can't store large values like BRK/A in there, but you keep your field size to 5 bytes instead of 8 bytes. This can add up...

Performance and disk space will be your biggest issue with that much data. Somebody check my math, but based on the above design, you are looking at generating 36 bytes per record. That doesn't sound like much, but 36bytes x 900k rows per day = 31mb per day (1024 bytes per KB and 1024 KB per MB). And that doesn't count the size of the index on the primary key (which you need if you want to be able to query the data with any speed). It won't take long for this database to get huge if you plan on archiving the data indefinitely. The free versions of MS SQL only permit a max database size of a few GB, so you may run through that pretty fast unless you want to go with a commercial version that doesn't have database size restriction. Obviously if you settle for hourly, etc. bars, you will use far less space.

You >could< set up one database per symbol to get around size/performance, but this would be manual labor-intensive to manage. Commercial versions of SQL do this clustering for you in the background. There are some other databases out there that emphasize performance that you can consider... or free ones like mysql. But MS SQL is nice, and I use it for my stuff (daily bars only, so my DB is not that big). But MS SQL is a database is primarily built for transaction processing and not raw speed.
More...

trendmomentum · Nov 15, 2007

Oliver,

Any database design must take into consideration the purpose of the data and the intended use. There will always be trade-offs between the ease of use and the performance factors.

Obviously, the smaller the size of the rows (and table) the faster the access will be.

Relational database designs such as MSSQL consist of one or more tables.

Data is normally split into two types: static and dynamic. You will normally want to separate these two. Combining these two types of data into one table will make it easier for you to access the data with a simple query, but it will increase the size of the rows unnecessarily and it will make the data maintenance more cumbersome at least performance wise.

Hence, vorpal suggested the design above. The security table stores the static data and the market data only uses the data that changes on a minute basis, enabling you to keep the rows at the minimum possible size, without duplicating unnecessary data per row and making the inserts faster.

Having dealt with issues like this before, relational databases are not the best option for these kind of data, but if you must, I would use one table per symbol to hold only the dynamic data.

The aggregated queries can then be done outside the sql. Some more sophisticated databases such as Sybase and Oracle will let you create a single view of all the data.

Hope it helps.

Jose