SQL db design recommendation

Discussion in 'Strategy Building' started by chs245, Nov 14, 2007.

  1. mr19

    mr19

    I agree that the way you use the data will dictate whether a flat file is even an option. Oracle is an excellent db; I'm not really bashing it so much as saying it's too big a hammer for this task (in my experience).
     
    #21     Nov 21, 2007
  2. SQL Server 2000 (and 2005) can handle DBs greater than 60 GB with no problem provided you have decent hardware. We have several DBs of this size, and even 5-year-old servers can access the data quickly (the old servers are what we use for development/testing).

    If you do go with SQL Server and you are using the design posted above, make sure you set up your clustered index correctly to avoid page splits. If I remember correctly, a split occurs when you try to insert data between two existing rows and the index's B-tree page has no free space left, so the engine has to move a lot of data around to reorganize your pages. This takes a LOOOOOOONG time. You're better off putting the clustered index on something like ONLY the timestamp (because then you're never inserting between existing values, only appending new ones), and then just put a regular index on anything else you need.
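    The append-only clustering idea above can be sketched with Python's sqlite3 as a stand-in: a SQLite WITHOUT ROWID table stores rows physically ordered by its primary key, which is roughly analogous to a SQL Server clustered index. Table and column names here are hypothetical; note SQLite primary keys must be unique, so symbol is appended to the timestamp.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Rows are stored physically ordered by (ts, symbol). Leading with the
# timestamp means new bars always sort after everything already stored,
# so inserts are pure appends and never split existing pages.
con.execute("""
    CREATE TABLE quotes (
        ts         INTEGER NOT NULL,   -- e.g. unix epoch seconds
        symbol     TEXT    NOT NULL,
        closeprice REAL,
        PRIMARY KEY (ts, symbol)
    ) WITHOUT ROWID
""")
# a regular (non-clustered) secondary index for symbol lookups
con.execute("CREATE INDEX idx_quotes_symbol ON quotes(symbol)")

# new bars carry the latest timestamp, so they land at the end
con.executemany("INSERT INTO quotes VALUES (?, ?, ?)",
                [(1195600000, "IBM", 104.5),
                 (1195600060, "IBM", 104.7)])
print(con.execute("SELECT COUNT(*) FROM quotes").fetchone()[0])
```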

    SSB
     
    #22     Nov 21, 2007
  3. Bickz

    Bickz

    just a few thoughts of my own...

    just fyi... i personally like mySQL for various reasons, and would not hesitate using it for this application as well.

    it really comes down to your needs, and how you set up your system. that being, do you need access to nTimePeriod data from ALL of your securities at once? or are you going to handle each security individually? (are you going to gather your data from all of your securities and do one big query to insert/update your records? or are you going to gather your data on each security and update each one individually?)

    if you have a single large table with 2000 securities and 5 columns per security, you're looking at 10,000 columns per row, per query. that's A LOT of data to sift through and return if you're only looking to query/update 1 (or just a few) securities at a time -- even with proper indexing. in my mind, if you're not working with most (if not all) of your securities in each and every query, then there just isn't any need to go that route. you would be expending many more CPU cycles than you really need to.

    conversely, a previous poster mentioned using a database per security, but also mentioned the tedium in creating a database per security -- something i tend to agree with. but this would get you greater speed and better stability. plus, if your database became corrupt, you wouldn't lose EVERYTHING -- just the data for the one security.

    a happy medium -- use an individual table for each security within one database. i can't comment on some of the other databases out there, but mySQL (at least with the MyISAM storage engine) uses a separate file for each table within a database, making queries against that table faster for accessing data on a per-security/per-table basis. and the best news is, while you may not be able to create databases on the fly (you could with some DBs, actually =P), YOU CAN create (and drop) tables on the fly with ease (at least in mySQL). so it would be a simple matter to write a script that creates a table for each security within your database. personally, this is the way i would go.
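    The table-per-security approach can be sketched like this, again using sqlite3 as a stand-in for MySQL; the bars_* naming and the bar schema are made up for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")

def make_table(symbol: str) -> str:
    """Create a per-security bar table on the fly (hypothetical schema)."""
    # strip anything but letters/digits before splicing the name into DDL,
    # since table names can't be bound as query parameters
    name = "bars_" + "".join(c for c in symbol.upper() if c.isalnum())
    con.execute(f"""
        CREATE TABLE IF NOT EXISTS {name} (
            ts         INTEGER PRIMARY KEY,
            openprice  REAL, highprice REAL,
            lowprice   REAL, closeprice REAL,
            volume     INTEGER
        )""")
    return name

# one small table per security, created as new symbols appear
for sym in ("IBM", "MSFT", "GOOG"):
    make_table(sym)

tables = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['bars_GOOG', 'bars_IBM', 'bars_MSFT']
```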

    also, if you do not absolutely positively need access to ALL of your records ALL of the time within ONE database, then consider using an "offloading" database, which holds the bulk of the older data while your main database only holds what you need. this will speed up your database queries tremendously. all you have to do is determine how much data you want to keep in your main working database, then have a routine that runs manually or automatically to copy the older data to your storage database. you'll still have access to the older data, en masse, while getting better performance on your newest data. and i'm sure, if necessary, a relatively intelligent programmer could find a way to query both databases seamlessly when backtesting (or whatever) is needed.
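    A minimal sketch of that offloading routine, using sqlite3's ATTACH to reach a second database (the bars schema and the cutoff of 5 are arbitrary placeholders; in practice the archive would be a file path and the cutoff a date):

```python
import sqlite3

main = sqlite3.connect(":memory:")
main.execute("CREATE TABLE bars (ts INTEGER PRIMARY KEY, closeprice REAL)")
main.executemany("INSERT INTO bars VALUES (?, ?)",
                 [(t, 100.0 + t) for t in range(10)])

# attach the archive database (a file path in practice; :memory: here)
main.execute("ATTACH DATABASE ':memory:' AS archive")
main.execute("CREATE TABLE archive.bars (ts INTEGER PRIMARY KEY, closeprice REAL)")

CUTOFF = 5  # keep only ts >= 5 in the working database
main.execute("INSERT INTO archive.bars SELECT * FROM bars WHERE ts < ?", (CUTOFF,))
main.execute("DELETE FROM bars WHERE ts < ?", (CUTOFF,))
main.commit()

# one query can still see old and new data seamlessly
total = main.execute(
    "SELECT COUNT(*) FROM "
    "(SELECT ts FROM bars UNION ALL SELECT ts FROM archive.bars)"
).fetchone()[0]
print(total)  # 10
```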

    and one final note... i typically use PHP5 and mySQL for my needs. but i've noticed a small quirk when using "open, high, low and close" as column names: sometimes things won't work quite the way you expect. so i would suggest using different column names to avoid this -- such as "openprice, highprice, lowprice and closeprice".

    i hope this helps!
     
    #23     Nov 23, 2007