SQL db design recommendation

Discussion in 'Strategy Building' started by chs245, Nov 14, 2007.

  1. rosy2

    Have a security table with fields [id, ticker, ...]. You need this for corporate actions.

    Have tick-data tables split by week or month, e.g. tick200711 for November ticks.

    If you put all your ticks into one table it will become unusable rather quickly.

    You might want a source table too, like [id, source], e.g. 1, reuters.
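    The layout above can be sketched in a few lines of SQL. This is a minimal illustration using SQLite via Python's stdlib; the table and column names (ts, price, size, etc.) are assumptions, not the poster's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # illustrative; use a file path in practice
cur = conn.cursor()

# Security master: one row per instrument, so corporate actions
# (splits, ticker changes) only need to touch this table.
cur.execute("""CREATE TABLE security (
    id INTEGER PRIMARY KEY, ticker TEXT NOT NULL, name TEXT)""")

# Optional source table: records where each tick came from.
cur.execute("""CREATE TABLE source (
    id INTEGER PRIMARY KEY, source TEXT NOT NULL)""")

# One tick table per month, e.g. tick200711 for November 2007.
cur.execute("""CREATE TABLE tick200711 (
    security_id INTEGER REFERENCES security(id),
    source_id INTEGER REFERENCES source(id),
    ts TEXT NOT NULL, price REAL, size INTEGER)""")

cur.execute("INSERT INTO source VALUES (1, 'reuters')")
cur.execute("INSERT INTO security VALUES (1, 'IBM', 'International Business Machines')")
cur.execute("INSERT INTO tick200711 VALUES (1, 1, '2007-11-16 09:30:00.125', 104.51, 300)")
conn.commit()
```

    The monthly split keeps each table (and its indices) small, while the security table gives every tick a stable id even if the ticker changes.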
     
    #11     Nov 16, 2007
  2. mr19

    What about storing in flat files? I have used the zlib library to compress tick data on the fly, and it works well without consuming much space. Just thinking about the cost of indexing all that data sounds like a pita to me. If you use fixed column sizes you can parse the data quickly when loading.

    I do keep my daily OHLC data in a relational db (Postgres), which works very well.
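    The fixed-column-plus-zlib idea can be sketched like this; the record layout (8-char time, 10-char price, 8-char size) is an assumption for illustration, not the poster's actual format.

```python
import zlib

# Fixed-width text records compress well and parse by slicing --
# no delimiter scanning needed when loading.
RECORD = "{:>8}{:>10.2f}{:>8}\n"

rows = [("09:30:00", 104.51, 300), ("09:30:01", 104.53, 100)]
raw = "".join(RECORD.format(*r) for r in rows).encode("ascii")

blob = zlib.compress(raw, 9)  # in practice, write blob to a flat file

# Loading: decompress, then slice the fixed columns back out.
lines = zlib.decompress(blob).decode("ascii").splitlines()
ticks = [(ln[:8].strip(), float(ln[8:18]), int(ln[18:26])) for ln in lines]
```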
     
    #12     Nov 16, 2007
  3. Custom data structures...
    Are always much faster than any off-the-shelf database.
     
    #13     Nov 17, 2007
  4. GTG

    I originally went with storing data in a MySQL database myself. I basically stored all the ticks in one table, with the records indexed to a security table (I'm simplifying things a bit). It eventually became extremely slow to do my bulk updates, probably because of all the work the database had to do updating the indices. If I were to do it again, I'd use the one-table-per-security approach, or perhaps even separate things out like rosy2 suggested, into one table per month. I think using a database like MSSQL or MySQL is not worth the hassle and the loss of speed. Simply storing my tick data in binary files has proven to be a much better solution for me.
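    A binary-file store like the one described might look as follows, using the stdlib struct module. The record layout (float64 timestamp, float64 price, int32 size) is an assumed example, not the poster's actual format.

```python
import io
import struct

# One binary file per security: fixed-size records make appends fast
# and allow seeking by record index -- no database index to maintain
# during bulk loads.
REC = struct.Struct("<ddi")  # epoch timestamp, price, size

buf = io.BytesIO()  # stands in for open("IBM.ticks", "ab")
for ts, px, sz in [(1195219800.0, 104.51, 300), (1195219801.0, 104.53, 100)]:
    buf.write(REC.pack(ts, px, sz))

# Random access: record i lives at byte offset i * REC.size.
buf.seek(1 * REC.size)
ts, px, sz = REC.unpack(buf.read(REC.size))
```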
     
    #14     Nov 17, 2007
  5. MGB

    SQL Server can do aggregated queries. SQL Server can do a single view of multiple tables from multiple databases (and multiple servers using linked servers).
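    The single-view-over-many-tables idea isn't SQL Server-specific; here is a sketch of the same pattern in SQLite (which omits linked servers, so this only shows the view over multiple tables). The table names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
for t in ("tick200710", "tick200711"):
    cur.execute("CREATE TABLE {} (ts TEXT, price REAL)".format(t))
cur.execute("INSERT INTO tick200710 VALUES ('2007-10-31 16:00', 101.0)")
cur.execute("INSERT INTO tick200711 VALUES ('2007-11-16 09:30', 104.5)")

# A view presents the monthly tables as one logical table, so
# aggregate queries span months transparently.
cur.execute("""CREATE VIEW ticks AS
    SELECT * FROM tick200710 UNION ALL SELECT * FROM tick200711""")
cur.execute("SELECT COUNT(*), MAX(price) FROM ticks")
```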
     
    #15     Nov 17, 2007
  6. andread

    You know, with all these posts from people concerned about the size of the table, I'm starting to have some concerns too, and I think I'll ask someone more knowledgeable than me. But I think that with a decent database (like Oracle) a table of a few gigs shouldn't be a problem. If it is, I'm not sure what I would do. The idea of one table with all the data is the only one that makes sense to me.
     
    #16     Nov 20, 2007
  7. I use MySQL for the time series. One table per period per instrument, i.e. 1-min bars in a different table from daily bars, timestamped with TIMESTAMP.

    For other data associated with instruments I use an XML document. (One document for all instruments).

    The document has instrument nodes, each node with attributes for things such as company name, the time zone the instrument trades in, currency, CUSIP etc., and I intend to keep fundamental data here as well. This type of data is fairly static, so concurrent access is not as much of an issue as it is with time series.

    There are a couple of advantages to using XML. You can go in and fix it with a text editor or text-processing tools (e.g. ed, grep, awk) if things get screwed up. You can also change the schema much more easily to add new stuff - it is much more flexible than the relational model. Processing time is not an issue with my current document of about 12K instruments.
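    A document along these lines can be built and extended with the stdlib ElementTree module. The node and attribute names below are guesses for illustration, not the poster's actual schema; the last lines show the "add a field without a schema migration" point.

```python
import xml.etree.ElementTree as ET

# One document for all instruments, one node per instrument.
doc = ET.Element("instruments")
ET.SubElement(doc, "instrument", {
    "symbol": "IBM",
    "company": "International Business Machines",
    "timezone": "America/New_York",
    "currency": "USD",
    "cusip": "459200101",
})

xml_text = ET.tostring(doc, encoding="unicode")  # editable with any text tool

# Adding a new field later needs no schema change:
# just set another attribute on the node.
root = ET.fromstring(xml_text)
node = root.find("instrument[@symbol='IBM']")
node.set("sector", "Technology")
```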

    No doubt rolling your own could perform a lot better than any RDB, but doing it well is a non-trivial exercise, particularly if you need to support real-time concurrent access.
     
    #17     Nov 21, 2007
  8. andread

    Okay, I asked. The guy suggested using partitioning (I'd never heard of it). He didn't spend much time talking about it, but he said that partitioning basically means splitting a table into something like multiple tables (for example, you can partition by symbol) while still keeping everything as a single logical table. He also mentioned partitioning across multiple drives on a Sun workstation, but I guess he was going too far :D. Apparently partitioning is the way to go for big tables.
    I looked it up a bit, and it turns out that partitioning was first implemented by Oracle (by far this guy's strong point), and later by SQL Server, DB2, and even MySQL.
    I honestly don't know that much, but on the internet partitioning seems to get a lot of attention. Besides, if this guy says it, I believe him. He has something like 15 years of experience contracting with Oracle. Before that, he worked at Oracle.
    Now you can see why for big applications the DB implementation is left to DB professionals.
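    SQLite has no native partitioning, so this sketch hand-rolls the idea described above: data split across monthly tables but queried through one helper that only touches the relevant partitions (what Oracle and MySQL call partition pruning, and do transparently with their PARTITION BY syntax). Table names and the routing scheme are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Manual "partitions": one table per month.
for month in ("200710", "200711"):
    cur.execute("CREATE TABLE tick{} (ts TEXT, price REAL)".format(month))
cur.execute("INSERT INTO tick200710 VALUES ('2007-10-05 10:00', 99.0)")
cur.execute("INSERT INTO tick200711 VALUES ('2007-11-21 10:00', 105.0)")

def query_ticks(months):
    """Query only the partitions covering the requested months ("pruning")."""
    sql = " UNION ALL ".join("SELECT * FROM tick{}".format(m) for m in months)
    return conn.execute(sql).fetchall()
```

    A query restricted to one month then scans only that month's table, which is the win the post is describing.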
     
    #18     Nov 21, 2007
  9. mr19

    That is one of the paths I went down with MySQL. While it IS better than stuffing everything in one table it will still be much slower than working with compressed flat files and require a lot more maintenance.

    While I'm sure I'm not the Oracle expert your friend is, I have worked quite extensively with the product, both as admin and developer. IMHO Oracle is no silver bullet for this one and will most likely provide no better results (maybe even worse) than MySQL for something like this. (I am currently using Postgres but do not store large data sets such as tick data.)
     
    #19     Nov 21, 2007
  10. andread

    No, Oracle is no silver bullet, it's just a good database. And I honestly find it difficult to believe that MySQL can be better, even if it's my favorite database. Maybe DB2 can be better. I was mainly focused on Oracle because I have someone working with me who's very skilled with it, but the discussion can be quite general.
    You might be right, but the question in this case is really what you are doing with the data. If you just do a linear search then a flat file is great, but if you try, for example, to do some reporting, you should consider a database. Besides, as far as I understand from this guy, there might be a better approach even within a database, depending on how you use the data.
     
    #20     Nov 21, 2007