My next motherboard

Discussion in 'Hardware' started by nitro, Feb 21, 2004.

  1. nitro

    nononsense,

    I have always kept InfiniBand and Myrinet (and Quadrics) in my peripheral vision. If they weren't so expensive, I would use one of them in my production cluster.

    Thanks for the link.

    nitro
     
    #321     Feb 26, 2005
  2. nitro

    Ok,

    I finally have something working in MySQL, with a C++ client storing fake quotes to a database and table. I am having a bit of trouble with my table designs, though: I don't quite understand why I would choose one over the other.

    For example, I can create a table that looks like this:

    1) Trade:double, TS: timestamp

    and have the table name be the symbol name. That way each symbol is always updated in its own table. The downside is having a table for each instrument.

    Then there is this approach:

    2) Symbol: string, Trade:double, TS:timestamp

    This also embeds the symbol name in each row of the table. I am not sure what this buys me...

    Another thing is, even in case 2 I have two choices: I can either send trades for only one symbol to that table, or I can send all trades for all symbols to that table. Again, other than avoiding many tables, I am not sure what each buys me (perhaps faster lookup and aggregation times).

    My guess is the correct method is to add the symbol name to the table, and only send trades from one symbol to that table. That means I will have a table for every symbol I am interested in storing trade data for, with the name of the table being the symbol name, and the table will (redundantly) contain the symbol name as one of the columns. I think this helps if I ever want to do joins for data mining, but I am not sure...
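
A minimal sketch of the two layouts side by side. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for MySQL, and the table names, column names and prices are made up for illustration.

```python
import sqlite3

def per_symbol_layout(conn, symbol):
    # Layout 1: the table name carries the symbol; rows hold only price and time.
    # Table names cannot be bound as parameters, so the name is quoted by hand.
    table = '"' + symbol.replace('"', '""') + '"'
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (trade REAL, ts TEXT)")
    return table

def shared_layout(conn):
    # Layout 2: one table for every symbol; the symbol is an ordinary column.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trades (symbol TEXT, trade REAL, ts TEXT)"
    )

conn = sqlite3.connect(":memory:")

msft = per_symbol_layout(conn, "MSFT")
conn.execute(f"INSERT INTO {msft} VALUES (?, ?)", (25.10, "2005-02-25 09:30:00"))

shared_layout(conn)
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [("MSFT", 25.10, "2005-02-25 09:30:00"), ("IBM", 92.40, "2005-02-25 09:30:01")],
)

# With layout 2, a question that spans symbols is a single query.
counts = dict(conn.execute("SELECT symbol, COUNT(*) FROM trades GROUP BY symbol"))
```

With layout 1 the same cross-symbol count would need one query per table, which is the main practical difference between the two.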

    I am also looking into lighter and heavier database management systems, like dbm and ObjectStore.

    nitro
     
    #322     Feb 27, 2005
  3. nitro,

    I store all my collected tick data in one table per day. To retrieve the data by symbol efficiently, you should build an index on the symbol column. (The db will do this for you if you ask it to.) This slows you down a bit while filling the table, but you can write the data without the index to speed things up and build the index later when you close the day. When you are further along, you will want to retrieve data for one symbol over several days, i.e. several tables. This is no problem, as it is very easy to have the program open table after table and retrieve the data for the symbol from each one. This can be set up with straightforward SQL (see below).
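
The daily-table workflow described above could look roughly like this. It is only a sketch: SQLite (Python's sqlite3) stands in for MySQL, and the `ticks_YYYYMMDD` naming scheme, the helper names and the sample rows are assumptions, not part of the original setup.

```python
import sqlite3

def day_table(day):
    # Hypothetical naming scheme: ticks_YYYYMMDD, one table per trading day.
    # The day string ends up in SQL text, so it must come from trusted code.
    return "ticks_" + day.replace("-", "")

def load_day(conn, day, rows):
    table = day_table(day)
    conn.execute(f"CREATE TABLE {table} (symbol TEXT, trade REAL, ts TEXT)")
    # Insert with no index in place, so writes stay cheap during the day.
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
    # Build the symbol index once, when the day is closed.
    conn.execute(f"CREATE INDEX idx_{table}_symbol ON {table} (symbol)")

def history(conn, symbol, days):
    # Stitch the per-day tables together with UNION ALL in one statement.
    parts = [f"SELECT trade, ts FROM {day_table(d)} WHERE symbol = ?" for d in days]
    sql = " UNION ALL ".join(parts) + " ORDER BY ts"
    return conn.execute(sql, [symbol] * len(days)).fetchall()

conn = sqlite3.connect(":memory:")
load_day(conn, "2005-02-24", [
    ("MSFT", 25.0, "2005-02-24 09:30:00"),
    ("IBM", 92.0, "2005-02-24 09:30:01"),
])
load_day(conn, "2005-02-25", [("MSFT", 25.5, "2005-02-25 09:30:00")])

msft = history(conn, "MSFT", ["2005-02-24", "2005-02-25"])
```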

    One more thing, nitro. You probably know this, but since you said you are just starting with db's and you specifically mention writing in C/C++: NEVER PROGRAM ANY DB OPERATION EXPLICITLY if you can do it by writing an SQL query. db's like MySQL, Oracle, PostgreSQL etc. are all SQL based. You realize tremendous speedups if you learn how to accomplish things with SQL instead of hand-coded db operations. This holds for any application language. Of course, you embed these SQL statements as query strings in your application.
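
To illustrate the difference, here is a small sketch that computes the same per-symbol summary twice: once by looping over rows in the application, once by letting the database aggregate. SQLite (Python's sqlite3) stands in for MySQL, and the table, function names and prices are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (symbol TEXT, trade REAL, ts TEXT)")
conn.executemany("INSERT INTO trades VALUES (?, ?, ?)", [
    ("MSFT", 25.1, "2005-02-27 09:30:00"),
    ("IBM", 92.4, "2005-02-27 09:30:01"),
    ("MSFT", 25.0, "2005-02-27 09:30:02"),
    ("MSFT", 25.5, "2005-02-27 09:30:03"),
])

def summary_in_application(conn):
    # Pulls every row into the client and aggregates by hand.
    out = {}
    for symbol, trade in conn.execute("SELECT symbol, trade FROM trades"):
        low, high, n = out.get(symbol, (trade, trade, 0))
        out[symbol] = (min(low, trade), max(high, trade), n + 1)
    return out

def summary_in_sql(conn):
    # Same result, but the database aggregates and returns one row per symbol.
    query = (
        "SELECT symbol, MIN(trade), MAX(trade), COUNT(*) "
        "FROM trades GROUP BY symbol"
    )
    return {sym: (low, high, n) for sym, low, high, n in conn.execute(query)}
```

Both functions return the same dictionary; the SQL version just moves far fewer rows between the database and the client, which matters more as the table grows.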

    In fact, MySQL offers several SQL-based maintenance/query tools for download on its website.

    Starting out is rather hard, because you have to do things the way the db requires. When you programmed ad hoc without a db, it looked like you were less constrained. After a while you will discover the tremendous power of a good db. You have to give it time to grow on you. For me, it took more than a couple of days.

    Be good,
    nononsense
     
    #323     Feb 27, 2005
  4. nitro

    nononsense,

    Thanks for the tips. I am going to experiment with different designs and see what comes of it.

    nitro
     
    #324     Mar 3, 2005
  5. cmaxb

    nitro,

    the relational way:

    A table full of symbols and their id's.
    A table full of "ticks", e.g. price and timestamp. Each row is marked by the id of the corresponding symbol.
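
A rough sketch of that two-table design, assuming SQLite (Python's sqlite3) in place of MySQL. The column names and the helper functions `symbol_id`, `add_tick` and `ticks_for` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE symbols (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    CREATE TABLE ticks (
        symbol_id INTEGER NOT NULL REFERENCES symbols(id),
        price REAL NOT NULL,
        ts TEXT NOT NULL
    );
""")

def symbol_id(conn, name):
    # Register the symbol on first use, then look up its id.
    conn.execute("INSERT OR IGNORE INTO symbols (name) VALUES (?)", (name,))
    row = conn.execute("SELECT id FROM symbols WHERE name = ?", (name,)).fetchone()
    return row[0]

def add_tick(conn, name, price, ts):
    # Each tick row is marked with the id of its symbol, not the name itself.
    conn.execute(
        "INSERT INTO ticks VALUES (?, ?, ?)", (symbol_id(conn, name), price, ts)
    )

def ticks_for(conn, name):
    # The join turns the readable symbol name back into its id.
    return conn.execute(
        "SELECT t.price, t.ts FROM ticks t "
        "JOIN symbols s ON s.id = t.symbol_id "
        "WHERE s.name = ? ORDER BY t.ts",
        (name,),
    ).fetchall()

add_tick(conn, "MSFT", 25.10, "2005-03-03 09:30:00")
add_tick(conn, "IBM", 92.40, "2005-03-03 09:30:01")
add_tick(conn, "MSFT", 25.12, "2005-03-03 09:30:02")
msft = ticks_for(conn, "MSFT")
```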

    Personally, I would append a "tick" to the end of a file. One file per symbol. For storing a lot of data, txt files are fine. For storing relationships, a database is needed.
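
The file-per-symbol alternative is easy to sketch as well. The `<symbol>.txt` file name and the `timestamp,price` line format are assumptions for illustration only.

```python
import os
import tempfile

def append_tick(directory, symbol, price, ts):
    # One text file per symbol; each tick becomes one "timestamp,price" line.
    path = os.path.join(directory, symbol + ".txt")
    with open(path, "a", encoding="ascii") as f:
        f.write(f"{ts},{price}\n")
    return path

def read_ticks(directory, symbol):
    # Read the whole series back in the order it was appended.
    path = os.path.join(directory, symbol + ".txt")
    ticks = []
    with open(path, encoding="ascii") as f:
        for line in f:
            ts, price = line.rstrip("\n").split(",")
            ticks.append((ts, float(price)))
    return ticks

with tempfile.TemporaryDirectory() as d:
    append_tick(d, "MSFT", 25.10, "2005-03-03 09:30:00")
    append_tick(d, "MSFT", 25.12, "2005-03-03 09:30:02")
    msft = read_ticks(d, "MSFT")
```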

    My two cents.
     
    #325     Mar 3, 2005
  6. nitro

    cmaxb,

    Thanks for the tips. I will relate (no pun intended :D ) my experience as I delve deeper and gain experience with different implementations.

    One thing that seems problematic to me about having one table mixing lots of ticks for different symbols is that when you need to read the time series for a given symbol in realtime, it will take quite a bit longer than if there were one table per symbol and the TS was "sequential." Maybe the sequentialness is an illusion anyway, since the ticks will be scattered all over the disk?

    nitro
     
    #326     Mar 3, 2005
  7. cmaxb

    I would index the table according to symbol id, then timestamp. Makes insertions slower, but makes retrieval *much* faster. Also, I believe indexing affects how the data is stored to disk. Could be wrong, tho.
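
A small sketch of such a composite index, using SQLite (Python's sqlite3) as a stand-in for MySQL. The table layout, the index name and the sample ticks are assumptions. `EXPLAIN QUERY PLAN` is SQLite's own way of showing which access path a query takes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticks (symbol_id INTEGER, price REAL, ts TEXT)")
# Composite index: symbol id first, then timestamp.
conn.execute("CREATE INDEX idx_ticks_symbol_ts ON ticks (symbol_id, ts)")

# Ticks arrive interleaved across symbols (made-up data).
conn.executemany("INSERT INTO ticks VALUES (?, ?, ?)", [
    (1, 25.10, "2005-03-04 09:30:00"),
    (2, 92.40, "2005-03-04 09:30:01"),
    (1, 25.12, "2005-03-04 09:30:02"),
    (2, 92.38, "2005-03-04 09:30:03"),
])

query = "SELECT price, ts FROM ticks WHERE symbol_id = ? AND ts >= ? ORDER BY ts"
args = (1, "2005-03-04 09:30:00")

# The last column of each EXPLAIN QUERY PLAN row describes the access path.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, args)]
series = conn.execute(query, args).fetchall()
```

Printing `plan` shows whether the query is answered through `idx_ticks_symbol_ts`, which makes it a convenient check when benchmarking different designs.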
     
    #327     Mar 4, 2005
  8. nitro

    Ok, thanks - I will benchmark it in this form.

    nitro
     
    #328     Mar 4, 2005
  9. nitro

    MSFT announces the release version of 64-bit Windows:

    http://www.microsoft.com/windowsserver2003/64bit/x64/overview.mspx

    I downloaded a trial yesterday and I am starting the port of my software in earnest to the x64 platform. I will start on the DELL since it is my research machine, and assuming all goes well move to production on the Quad Opteron.

    I will try to post my experiences.

    nitro
     
    #329     Apr 28, 2005
  10. You can try a clustered index on the symbol id and timestamp (the rows are physically stored in the order of the indexed columns, e.g. 1, 2, 3, 4, ...), using a high fill factor such as 90% to optimize query performance at the cost of some hit on inserts.
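
As a rough illustration of the clustered-index idea, the sketch below uses SQLite (Python's sqlite3), where a `WITHOUT ROWID` table is stored in primary-key order. This is an assumption made for a runnable example: the post does not name a database, and the fill factor setting is not modeled here. The `seq` column is a hypothetical tie-breaker so that two ticks with the same timestamp do not collide on the key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ticks (
        symbol_id INTEGER NOT NULL,
        ts TEXT NOT NULL,
        seq INTEGER NOT NULL,   -- hypothetical tie-breaker for equal timestamps
        price REAL NOT NULL,
        PRIMARY KEY (symbol_id, ts, seq)
    ) WITHOUT ROWID
""")

# Rows are inserted out of order on purpose (made-up data).
conn.executemany("INSERT INTO ticks VALUES (?, ?, ?, ?)", [
    (2, "2005-05-02 09:30:01", 0, 92.40),
    (1, "2005-05-02 09:30:02", 0, 25.12),
    (1, "2005-05-02 09:30:00", 0, 25.10),
])

query = "SELECT ts, price FROM ticks WHERE symbol_id = ? ORDER BY ts, seq"
# Because the table is kept in key order, the plan should need no extra sort.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, (1,))]
series = conn.execute(query, (1,)).fetchall()
```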

    Storing one series per symbol in its own table will be slow compared to storing each symbol's series in an image column, one row per symbol.

    The fastest way to retrieve a series is to store it (an array of market data) in a serialized, blob-like image column.
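
A minimal sketch of the blob approach, assuming SQLite's BLOB type (through Python's sqlite3) in place of an image column, and a plain array of doubles as the serialized format. The table and function names are hypothetical.

```python
import sqlite3
from array import array

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE series (symbol TEXT PRIMARY KEY, prices BLOB NOT NULL)")

def store_series(conn, symbol, prices):
    # Pack the whole series as raw 8-byte doubles (native byte order).
    blob = array("d", prices).tobytes()
    conn.execute("INSERT OR REPLACE INTO series VALUES (?, ?)", (symbol, blob))

def load_series(conn, symbol):
    # One row lookup brings back the entire series in a single read.
    (blob,) = conn.execute(
        "SELECT prices FROM series WHERE symbol = ?", (symbol,)
    ).fetchone()
    out = array("d")
    out.frombytes(blob)
    return out.tolist()

store_series(conn, "MSFT", [25.10, 25.12, 25.08])
msft = load_series(conn, "MSFT")
```

The trade-off is that the database can no longer filter or aggregate individual ticks with SQL; the whole array has to be read and unpacked by the application.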

     
    #330     May 2, 2005