SQL db design recommendation

Discussion in 'Strategy Building' started by chs245, Nov 14, 2007.

  1. rosy2

    Have a security table with fields [id, ticker, ...]. You need this for corporate actions.

    Have tick-data tables split by week or month, e.g. tick200711 for November ticks.

    If you put all your ticks into one table it will become unusable rather quickly.

    You might want a source table too, like [id, source], e.g. 1, reuters.
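    The layout above can be sketched in a few lines of SQL. This is a minimal illustration using SQLite via Python's stdlib; the table and column names (ts, price, size, etc.) are assumptions, not the poster's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # illustrative; use a file path in practice
cur = conn.cursor()

# Security master: one row per instrument, so corporate actions
# (splits, ticker changes) only need to touch this table.
cur.execute("""CREATE TABLE security (
    id INTEGER PRIMARY KEY, ticker TEXT NOT NULL, name TEXT)""")

# Optional source table: records where each tick came from.
cur.execute("""CREATE TABLE source (
    id INTEGER PRIMARY KEY, source TEXT NOT NULL)""")

# One tick table per month, e.g. tick200711 for November 2007.
cur.execute("""CREATE TABLE tick200711 (
    security_id INTEGER REFERENCES security(id),
    source_id INTEGER REFERENCES source(id),
    ts TEXT NOT NULL, price REAL, size INTEGER)""")

cur.execute("INSERT INTO source VALUES (1, 'reuters')")
cur.execute("INSERT INTO security VALUES (1, 'IBM', 'International Business Machines')")
cur.execute("INSERT INTO tick200711 VALUES (1, 1, '2007-11-16 09:30:00.125', 104.51, 300)")
conn.commit()
```

    The monthly split keeps each table (and its indices) small, while the security table gives every tick a stable id even if the ticker changes.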
     
    #11     Nov 16, 2007
  2. mr19

    What about storing in flat files? I have used the zlib library to compress tick data on the fly, and it works well without consuming much space. Just thinking about the cost of indexing all that data sounds like a pita to me. If you use fixed column sizes you can parse the data quickly when loading.

    I do keep my daily OHLC data in a relational db (Postgres), which works very well.
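    The fixed-column-plus-zlib idea can be sketched like this; the record layout (8-char time, 10-char price, 8-char size) is an assumption for illustration, not the poster's actual format.

```python
import zlib

# Fixed-width text records compress well and parse by slicing --
# no delimiter scanning needed when loading.
RECORD = "{:>8}{:>10.2f}{:>8}\n"

rows = [("09:30:00", 104.51, 300), ("09:30:01", 104.53, 100)]
raw = "".join(RECORD.format(*r) for r in rows).encode("ascii")

blob = zlib.compress(raw, 9)  # in practice, write blob to a flat file

# Loading: decompress, then slice the fixed columns back out.
lines = zlib.decompress(blob).decode("ascii").splitlines()
ticks = [(ln[:8].strip(), float(ln[8:18]), int(ln[18:26])) for ln in lines]
```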
     
    #12     Nov 16, 2007
  3. Custom data structures...
    Are always much faster than any off-the-shelf database.
     
    #13     Nov 17, 2007
  4. GTG

    I originally went with storing data in a MySQL database myself. I basically stored all the ticks in one table, with the records indexed to a security table (I'm simplifying things a bit). It eventually became extremely slow to do my bulk updates, probably because of all the work the database had to do updating the indices. If I were to do it again, I'd use the one-table-per-security approach, or perhaps even separate things out like rosy2 suggested, into one table per month. I think using a database like MSSQL or MySQL is not worth the hassle and the loss of speed. Simply storing my tick data in binary files has proven to be a much better solution for me.
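    A binary-file store like the one described might look as follows, using the stdlib struct module. The record layout (float64 timestamp, float64 price, int32 size) is an assumed example, not the poster's actual format.

```python
import io
import struct

# One binary file per security: fixed-size records make appends fast
# and allow seeking by record index -- no database index to maintain
# during bulk loads.
REC = struct.Struct("<ddi")  # epoch timestamp, price, size

buf = io.BytesIO()  # stands in for open("IBM.ticks", "ab")
for ts, px, sz in [(1195219800.0, 104.51, 300), (1195219801.0, 104.53, 100)]:
    buf.write(REC.pack(ts, px, sz))

# Random access: record i lives at byte offset i * REC.size.
buf.seek(1 * REC.size)
ts, px, sz = REC.unpack(buf.read(REC.size))
```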
     
    #14     Nov 17, 2007
  5. MGB

    SQL Server can do aggregated queries. SQL Server can do a single view of multiple tables from multiple databases (and multiple servers using linked servers).
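    The single-view-over-many-tables idea isn't SQL Server-specific; here is a sketch of the same pattern in SQLite (which omits linked servers, so this only shows the view over multiple tables). The table names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
for t in ("tick200710", "tick200711"):
    cur.execute("CREATE TABLE {} (ts TEXT, price REAL)".format(t))
cur.execute("INSERT INTO tick200710 VALUES ('2007-10-31 16:00', 101.0)")
cur.execute("INSERT INTO tick200711 VALUES ('2007-11-16 09:30', 104.5)")

# A view presents the monthly tables as one logical table, so
# aggregate queries span months transparently.
cur.execute("""CREATE VIEW ticks AS
    SELECT * FROM tick200710 UNION ALL SELECT * FROM tick200711""")
cur.execute("SELECT COUNT(*), MAX(price) FROM ticks")
```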
     
    #15     Nov 17, 2007
  6. andread

    You know, with all these posts from people concerned about the size of the table, I'm starting to have some concerns too, and I think I'll ask someone more knowledgeable than me. But I think that with a decent database (like Oracle) a table of a few gigs shouldn't be a problem. If it is, I'm not sure what I would do. The idea of one table with all the data is the only one that makes sense to me.
     
    #16     Nov 20, 2007
  7. I use MySQL for the time series. One table per period per instrument, i.e. 1-min bars in a different table from daily bars, timestamped with TIMESTAMP.

    For other data associated with instruments I use an XML document. (One document for all instruments).

    The document has instrument nodes, each node with attributes for things such as company name, the time zone the instrument trades in, currency, CUSIP etc., and I intend to keep fundamental data here as well. This type of data is fairly static, so concurrent access is not as much of an issue as it is with time series.

    There are a couple of advantages to using XML. You can go in and fix it with a text editor or text-processing tools (e.g. ed, grep, awk) if things get screwed up. You can also change the schema much more easily to add new stuff - it is much more flexible than the relational model. Processing time is not an issue with my current document of about 12K instruments.
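    A document along these lines can be built and extended with the stdlib ElementTree module. The node and attribute names below are guesses for illustration, not the poster's actual schema; the last lines show the "add a field without a schema migration" point.

```python
import xml.etree.ElementTree as ET

# One document for all instruments, one node per instrument.
doc = ET.Element("instruments")
ET.SubElement(doc, "instrument", {
    "symbol": "IBM",
    "company": "International Business Machines",
    "timezone": "America/New_York",
    "currency": "USD",
    "cusip": "459200101",
})

xml_text = ET.tostring(doc, encoding="unicode")  # editable with any text tool

# Adding a new field later needs no schema change:
# just set another attribute on the node.
root = ET.fromstring(xml_text)
node = root.find("instrument[@symbol='IBM']")
node.set("sector", "Technology")
```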

    No doubt rolling your own could perform a lot better than any RDB, but doing it well is a non-trivial exercise, particularly if you need to support real-time concurrent access.
     
    #17     Nov 21, 2007
  8. andread

    Okay, I asked. The guy suggested using partitioning (I'd never heard of it). He didn't spend much time talking about it, but he said that partitioning basically means splitting a table into something like multiple tables (for example, you can partition by symbol) while still keeping everything as a single logical table. He also mentioned partitioning across multiple drives on a Sun workstation, but I guess he was going too far :D. Apparently partitioning is the way to go for big tables.
    I looked it up a bit, and it turns out that partitioning was first implemented by Oracle (by far this guy's strong point), and later by SQL Server, DB2, and even MySQL.
    I honestly don't know that much, but on the internet partitioning seems to get a lot of attention. Besides, if this guy says it, I believe him. He has something like 15 years of experience contracting with Oracle. Before that, he worked at Oracle.
    Now you can see why for big applications the DB implementation is left to DB professionals.
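    SQLite has no native partitioning, so this sketch hand-rolls the idea described above: data split across monthly tables but queried through one helper that only touches the relevant partitions (what Oracle and MySQL call partition pruning, and do transparently with their PARTITION BY syntax). Table names and the routing scheme are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Manual "partitions": one table per month.
for month in ("200710", "200711"):
    cur.execute("CREATE TABLE tick{} (ts TEXT, price REAL)".format(month))
cur.execute("INSERT INTO tick200710 VALUES ('2007-10-05 10:00', 99.0)")
cur.execute("INSERT INTO tick200711 VALUES ('2007-11-21 10:00', 105.0)")

def query_ticks(months):
    """Query only the partitions covering the requested months ("pruning")."""
    sql = " UNION ALL ".join("SELECT * FROM tick{}".format(m) for m in months)
    return conn.execute(sql).fetchall()
```

    A query restricted to one month then scans only that month's table, which is the win the post is describing.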
     
    #18     Nov 21, 2007
  9. mr19

    That is one of the paths I went down with MySQL. While it IS better than stuffing everything in one table it will still be much slower than working with compressed flat files and require a lot more maintenance.

    While I'm sure I'm not the Oracle expert your friend is, I have worked quite extensively with the product, both as admin and developer. IMHO Oracle is no silver bullet for this one and will most likely provide no better results (maybe even worse) than MySQL for something like this. (I am currently using Postgres but do not store large data sets such as tick data.)
     
    #19     Nov 21, 2007
  10. andread

    No, Oracle is no silver bullet, it's just a good database. And I honestly find it difficult to believe that MySQL can be better, even if it's my favorite database. Maybe DB2 can be better. I was mainly focused on Oracle because I have someone working with me who's very skilled with it, but the discussion can be quite general.
    You might be right, but the question in this case is really what you are doing with the data. If you just do a linear search then a flat file is great, but if you try, for example, to do some reporting, you should consider a database. Besides, as far as I understand from this guy, there might be a better approach even within a database, depending on how you use the data.
     
    #20     Nov 21, 2007