Database engine: any ideas?

Discussion in 'Data Sets and Feeds' started by NoWorries, Apr 5, 2006.

  1. I use statistical software (e.g. Stata, R) for modelling of returns and large-scale portfolio optimizations. I usually can hold all my (end-of-day) data in memory, but am thinking of using a database engine to facilitate data storage. I might want to start running models on intraday data and will likely run out of memory using my current approach. The idea would be to have all data (open, close, return) stored in a database (e.g. 10,000 stocks x 100,000 observations) and load subsets of the data in memory for subsequent modeling. I currently do not need real-time updating of the database. Any suggestions are welcome!
     
  2. DB option could be performance issue, did you consider memory mapped files to store your data which would be best bet, but you should be familiar with system level calls. c++ would be best choice to write this.
     
  3. ids

    ids

    MySQL is the best candidate. It is pretty fast and convenient.
     
  4. Would using MySQL imply storing the data for each stock in a separate table, and indexing them by date/time? Do you have any experience with the time it takes for a typical query in this case (e.g. 1000 stocks x 10,000 observations).
     
  5. If you're using R, you already have an HDF5 api module in it.

    I use HDF5 heavily via the C and Python (PyTables) interfaces . If you have any questions, feel free to ask me.

    kt
     
  6. slacker

    slacker

    If you want something a little better than a flat file and less than a complete database server. Look at http://www.sqlite.org/

    It is embedded into your app, about 40k bytes very small. Check the site for speed comparisons with MySQL. It is fast.

    There are very good database table browsers for this and interfaces for c++, python, java, etc., etc.

    Moving to MySql or a 'real' database server is not much of a problem the sql queries are basic but standard.

    Database purist will not like it but I find it good for prototyping and caching stuff.

    You asked for suggestions!!!

    Good luck
     
  7. ids

    ids

    Using MySQL does not imply any special requirements. Time is hard to predict but it works much faster then all known me SQL servers.
     
  8. kt,
    I thought about HDF5 but it seems it doesn't allow indexing through the R api--I understand you either have to load the entire database or nothing at all.
     
  9. Well I'm surprised if that's the case. Are you sure, 'cause I'm not familiar with the api particulars that R provides. Maybe check the R mail list/forum for anybody else's comments/usages.
     
  10. #10     May 26, 2006