best way to hack up an rdbms to approximate something like kdb for poor/dumb people

Discussion in 'Automated Trading' started by garchbrooks, Mar 23, 2010.

  1. I think the benefit actually comes from compression: the storage on disk after compression is substantially smaller than without it (roughly 16GB uncompressed versus far less compressed). Results are returned to me quite quickly, in under a second.

    I'm not sure this is a long-term solution, but for my purposes it's OK for now. I avoid sophisticated queries: the database only serves up rows between two dates, like in your example, and I do the joining with 'merge' in R post-query, once date+time are in the range I want. So I'm not really exercising the query optimizer at all.
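    If it helps anyone else, here's roughly what that looks like from R (just a sketch; the table/column names and connection details are invented, not my actual setup):

    library(DBI)
    library(RMySQL)

    con <- dbConnect(RMySQL::MySQL(), dbname = "ticks", host = "localhost",
                     user = "user", password = "pass")

    # Let the database do only the cheap part: pull everything in a date range.
    bars <- dbGetQuery(con, "
        SELECT dt, open, high, low, close, volume
        FROM bars
        WHERE dt BETWEEN '2010-03-01' AND '2010-03-26'")

    # Whatever else I computed outside the database.
    signals <- read.csv("signals.csv")

    # Do the join in R with merge(), matching on the datetime column,
    # instead of asking the database to do anything clever.
    combined <- merge(bars, signals, by = "dt")

    dbDisconnect(con)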
     
    #51     Mar 28, 2010
  2. OK, thanks. It will take me some time to load all the data and do some cleanup. I'll keep it standard MySQL for now, and after everything is up and running I'll do some testing with Infobright & InfiniDB.
     
    #52     Mar 28, 2010

  3. It boils down to the exercise of what a database is for - to find things. If you haven't come up with a plan for exactly what you are trying to find, all your energy is wasted on technological issues rather than the issue at hand.

    It's like people looking for the holy grail when it comes to building indicators. They just tinker around trying to stumble into something. Don't get caught in that trap. It's a trapdoor many fall through and get stuck in a world of "hope".

    You have time series data in your hands. The unique column is the datetime stamp, then OHLCV. Given that data, you can do SQL manipulations on it trying to find something of interest. You can create additional columns containing information derived from the OHLCV/datetime data and search on those. You index the stuff you are searching on to make it faster. That's all this shit is about.
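    Concretely, since you're already driving MySQL from R, it comes down to something like this (a sketch; every table and column name here is invented):

    library(DBI)
    library(RMySQL)

    con <- dbConnect(RMySQL::MySQL(), dbname = "ticks", host = "localhost",
                     user = "user", password = "pass")

    # Datetime stamp as the unique key, then OHLCV.
    dbExecute(con, "
        CREATE TABLE bars (
            dt     DATETIME NOT NULL,
            open   DOUBLE, high DOUBLE, low DOUBLE, close DOUBLE,
            volume BIGINT,
            PRIMARY KEY (dt)
        )")

    # An extra column derived from the OHLCV/datetime data that you plan to search on...
    dbExecute(con, "ALTER TABLE bars ADD COLUMN rng DOUBLE")
    dbExecute(con, "UPDATE bars SET rng = high - low")

    # ...and an index on it, so those searches are fast.
    dbExecute(con, "CREATE INDEX idx_bars_rng ON bars (rng)")

    dbDisconnect(con)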

    Do you know exactly what you are looking for?
     
    #53     Apr 3, 2010
  4. Yes, I know what I am looking for. I designed a model based on an observation from the market, and its performance was so impressive that I wanted to look for other candidates.

    I've actually got 12 or 13 cores mining the database now. It took a few iterations to get to the point where I could use the data, and then I filtered it a bit. I managed to get several new candidates for trading. Of course, I'm not ashamed to admit my implementation is a piece of shit and no real institution would ever do things so shoddily, but it is what it is and I'm fine with my limitations for now.

    So, yes, I am happy. Infobright was the right tool for the job in the short run, but in the long run I'm going to hit this problem again. Not much I can do, other than take what I have now, create some revenue, and use that revenue to finance more research and better operations.

    The next time I look into this problem, I will probably try InfiniDB because their community edition is not crippled.
     
    #54     Apr 3, 2010
  5. promagma

    I was playing with the new InfiniDB Windows release this weekend. It doesn't use indexes, so to find a range of data it does a full column scan over one or more "extents" of 8 million rows. InfiniDB does it all in parallel, so it was cool to see all 4 cores running full blast. But even with 4 cores it was still a bit slower than MySQL MyISAM with a ridiculously huge & gratuitous index (with more cores it may be great, though).

    My conclusion was that if you need to query/join huge data in all kinds of different ways (including some that MySQL would completely choke on), it is great. But if you want to optimize one type of range query, it's hard to beat MySQL doing a straight shot from the index.
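    For reference, the kind of "gratuitous index" setup I'm comparing against is basically this (a sketch; the table, columns, and index are placeholders, not my actual schema):

    library(DBI)
    library(RMySQL)

    con <- dbConnect(RMySQL::MySQL(), dbname = "ticks", host = "localhost",
                     user = "user", password = "pass")

    # A wide covering index on (symbol, dt) plus the columns the query reads,
    # so MyISAM can answer the whole range scan from the index alone.
    dbExecute(con, "
        CREATE INDEX idx_sym_dt_cover
        ON bars (symbol, dt, close, volume)")

    # The one range query I care about then goes straight down the index.
    res <- dbGetQuery(con, "
        SELECT dt, close, volume
        FROM bars
        WHERE symbol = 'ES'
          AND dt BETWEEN '2010-01-04 09:30:00' AND '2010-01-04 16:15:00'")

    dbDisconnect(con)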
     
    #55     May 23, 2010
  6. promagma

    Also played with LucidDB this weekend. For my application it is slower: InfiniDB was a tiny bit slower than my MySQL baseline, while LucidDB was 2 times slower.
     
    #56     May 23, 2010
  7. promagma

    Probably the best advice for me .... HDF5 or similar.
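    Something along these lines is what I have in mind, using the rhdf5 package in R (a sketch I haven't benchmarked; the file and dataset names are made up):

    library(rhdf5)

    close_px <- rnorm(1e6)   # stand-in for a real vector of closing prices

    # One chunked, compressed dataset per column, so reading a contiguous
    # slice only touches the chunks it needs.
    h5createFile("ticks.h5")
    h5createDataset("ticks.h5", "close", dims = length(close_px),
                    storage.mode = "double", chunk = 100000, level = 6)
    h5write(close_px, "ticks.h5", "close")

    # Read back just the first 100,000 values without scanning the rest.
    x <- h5read("ticks.h5", "close", index = list(1:100000))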
     
    #57     May 23, 2010
  8. I second the suggestions of using cloud computing. I use EC2, and it's great to be able to fire up an instance with my data already loaded; 15 minutes later I can run a bunch of stuff, get results, and shut it down until I'm ready to test again.

    You can start with a "Large" instance, which is the cheapest 64-bit instance type at $0.48/hour. When you need to do a lot, you can fire up the same image as a High-Memory Quadruple Extra Large instance for $2.88/hr with these specs:
    68.4 GB of memory
    26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute Units each)
    1690 GB of instance storage
    64-bit platform
    I/O Performance: High
    API name: m2.4xlarge

    You can develop your scripts/analysis algos on your cheaper machine at home. Then when you need to crunch numbers you can use a cloud instance.

    The cost to hold 100GB in an EBS volume is $10/mo ($0.10/GB/mo).

    If you're going to run stuff 24/7, probably better to buy a machine. But you can get a lot of hours on this hardware for a few hundred bucks.

    If you use EC2 (I haven't looked at all the alternatives), I'd recommend the Elasticfox plugin for Firefox to make things easier.
     
    #58     May 23, 2010