best way to hack up an rdbms to approximate something like kdb for poor/dumb people

Discussion in 'Automated Trading' started by garchbrooks, Mar 23, 2010.


  1. You don't have tick data. You have 1-minute data for 3000 stocks. Tick data is different from 1-minute data.

    Take each of the 3000 symbols and create one database or table per symbol. Call each database/table something like CSCO1Min, DELL1Min, etc.

    Then if you want to create 15-minute databases you can create CSCO15Min, DELL15Min, etc.

    Divide and conquer the problem space into smaller components instead of dealing with the whole crapload of data.
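
    Rough sketch of that layout (the OHLCV column names are made up here, use whatever your feed actually gives you):

    -- one MyISAM table per symbol and interval; repeat for each of the 3000 symbols
    CREATE TABLE CSCO1Min (
        dt     DATETIME      NOT NULL,   -- bar timestamp
        open   DECIMAL(10,4) NOT NULL,
        high   DECIMAL(10,4) NOT NULL,
        low    DECIMAL(10,4) NOT NULL,
        close  DECIMAL(10,4) NOT NULL,
        volume INT UNSIGNED  NOT NULL
    ) ENGINE=MyISAM;

    -- same layout for the coarser bars; DELL1Min, DELL15Min, etc. follow the same pattern
    CREATE TABLE CSCO15Min LIKE CSCO1Min;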
     
    #31     Mar 24, 2010
  2. Will do. Thanks Futs and Darth.

    Your help has been invaluable.
     
    #32     Mar 24, 2010
  3. Of course. Setup:
    A_m table (mindata of A)
    B_m table (mindata of B)
    getTuple("A", "B", from, to) <- stored proc

    getTuple can run the query and return the result set. It will scan only two tables, as opposed to the many hundreds of millions of rows you have in total.

    On the minute-data tables I have NO keys. They are MyISAM tables, and I make sure the rows are inserted in order. This is subject to experimentation and change.
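
    Roughly like this with dynamic SQL, since a MySQL stored proc can't take a table name as an ordinary parameter. The (dt, close) columns are just placeholders for whatever the _m tables really hold:

    DELIMITER //
    CREATE PROCEDURE getTuple(IN symA VARCHAR(16), IN symB VARCHAR(16),
                              IN t_from DATETIME, IN t_to DATETIME)
    BEGIN
        -- build "SELECT ... FROM A_m a JOIN B_m b ..." at runtime from the symbol names
        SET @sql = CONCAT(
            'SELECT a.dt, a.close AS close_a, b.close AS close_b ',
            'FROM ', symA, '_m a JOIN ', symB, '_m b ON a.dt = b.dt ',
            'WHERE a.dt BETWEEN ? AND ? ORDER BY a.dt');
        PREPARE stmt FROM @sql;
        SET @f = t_from, @t = t_to;
        EXECUTE stmt USING @f, @t;          -- returns the joined rows as the result set
        DEALLOCATE PREPARE stmt;
    END //
    DELIMITER ;

    -- example call
    CALL getTuple('A', 'B', '2010-03-01 09:30:00', '2010-03-01 16:00:00');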
     
    #33     Mar 24, 2010
  4. Sometimes it isn't about dazzling everyone with hardware bullshit and going hardcore like some people do.

    It's about really thinking long and hard about the problem at hand and THEN spending money on hardware, if you need it, to speed up your solution.

    You may just find that building a quote database and running queries on it shows you very little for the time you invest, or that a modest setup is good enough. No use making it worse by spending thousands of dollars on hardware just because you were so excited about the project.

    Just use 64-bit Ubuntu and MySQL Cluster 7: 64-bit Ubuntu so you can address large amounts of physical memory, and Cluster so you can keep the tables in memory, make full use of the 64-bit address space, and reduce I/O.
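
    For example, assuming a working Cluster install (column names made up), putting a table on the NDB engine is what keeps it resident in the data nodes' RAM:

    -- NDB tables live in the data nodes' memory, so range scans avoid most disk I/O
    CREATE TABLE CSCO1Min_mem (
        dt     DATETIME      NOT NULL,
        close  DECIMAL(10,4) NOT NULL,
        volume INT UNSIGNED  NOT NULL,
        PRIMARY KEY (dt)
    ) ENGINE=NDBCLUSTER;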

    Make sense?
     
    #34     Mar 24, 2010
  5. Absolutely. (Plus, I'm not comfortable spending money on something I don't fully understand. I figure an unnecessary hardware purchase is just as bad as losing money on the market.)
     
    #35     Mar 24, 2010
  6. A correction: the MyISAM tables (with minute data) do have a key on (date, time). Also, I just tested with InnoDB tables. For typical queries (from datetime to datetime), MyISAM is faster, so I will stick with it for now.
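
    Something along these lines, if you want to reproduce the comparison (the non-key columns are placeholders):

    CREATE TABLE A_m (
        d      DATE          NOT NULL,
        t      TIME          NOT NULL,
        close  DECIMAL(10,4) NOT NULL,
        volume INT UNSIGNED  NOT NULL,
        KEY k_date_time (d, t)              -- the (date, time) key mentioned above
    ) ENGINE=MyISAM;

    -- a typical "from datetime to datetime" range query to time against both engines
    SELECT *
    FROM A_m
    WHERE (d, t) >= ('2010-03-01', '09:30:00')
      AND (d, t) <= ('2010-03-05', '16:00:00')
    ORDER BY d, t;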
     
    #36     Mar 24, 2010
  7. A suggestion before you get too carried away with this...
    You have to know what you are looking for, and most of the time, it won't be there.
     
    #37     Mar 24, 2010
  8. garch, where are you getting minute data on stocks / what is a good place to get it from? Thanks
     
    #38     Mar 24, 2010
  9. Sent a PM, but the general answer is that I put a DLL in a retail broker platform with good historical data and funneled the data out.
     
    #39     Mar 24, 2010
  10. Garchbrooks,

    I have been reading your thread with some interest. I used to know everything about databases. I'm a retired Database Administrator (DB2 and Oracle), but now I trade full time so I can forget about restoring DBs at 2:00 A.M. every other day. I'm not a SQL Server guru; I'm a mainframe DBA. But I will help if you want it.

    I see that you are trying to do the 'impossible': using a PC (they still don't match the bigger hardware) to run a monster app against big databases. As you found out, you are going to be tied up with I/O trying to access all the data in raw form. Unless you have an absolute need for the raw data, do what they do in real apps and condense it into something usable.

    The first question I always get is why a PC can't do what the mainframe does, so I will answer it in advance. On a mainframe you would spread your data across multiple disk packs. Next, the data would be partitioned across the packs. When you ran your query, each pack would grab part of the data. Imagine 20 packs working in parallel on the I/O. That is a quick overview of what you can't do on a PC.
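
    The closest single-box analog MySQL gives you is range partitioning: it won't buy you 20 packs' worth of parallel I/O, but it does limit how much data a date-bounded query has to touch. A rough sketch with made-up columns:

    CREATE TABLE quotes_1min (
        d      DATE          NOT NULL,
        t      TIME          NOT NULL,
        symbol CHAR(8)       NOT NULL,
        close  DECIMAL(10,4) NOT NULL,
        KEY k_sym_dt (symbol, d, t)
    ) ENGINE=MyISAM
    PARTITION BY RANGE (TO_DAYS(d)) (
        PARTITION p2010q1 VALUES LESS THAN (TO_DAYS('2010-04-01')),
        PARTITION p2010q2 VALUES LESS THAN (TO_DAYS('2010-07-01')),
        PARTITION pmax    VALUES LESS THAN MAXVALUE
    );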

    I’m very familiar with processing this type of data.
     
    #40     Mar 24, 2010