Database engine: any ideas?

NoWorries · Apr 5, 2006

I use statistical software (e.g. Stata, R) for modelling returns and for large-scale portfolio optimization. I can usually hold all my (end-of-day) data in memory, but I am thinking of using a database engine to facilitate data storage. I might want to start running models on intraday data and will likely run out of memory using my current approach. The idea would be to have all data (open, close, return) stored in a database (e.g. 10,000 stocks x 100,000 observations) and load subsets of the data into memory for subsequent modelling. I currently do not need real-time updating of the database. Any suggestions are welcome!

21Centtrader · Apr 5, 2006

A database could be a performance issue. Have you considered storing your data in memory-mapped files? That would be your best bet, but you need to be familiar with system-level calls. C++ would be the best choice for writing this.
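A minimal sketch of the memory-mapped idea, written in Python with the standard-library `mmap` module rather than C++ to keep it short. It assumes a hypothetical layout: one flat binary file of native 64-bit floats stored stock-major (all observations for stock 0, then stock 1, and so on). Only the bytes for the requested range are read, so one stock's series can be pulled out without loading the whole file. The function and file names are made up for illustration.

```python
import mmap
import os
import struct
import tempfile

DOUBLE = struct.calcsize("d")  # size of one native float64 in bytes (8 on common platforms)


def write_matrix(path, rows):
    """Write equal-length rows of floats to one flat binary file, stock-major."""
    with open(path, "wb") as f:
        for row in rows:
            f.write(struct.pack(f"{len(row)}d", *row))


def read_stock(path, n_obs, stock, start, stop):
    """Return observations [start, stop) of one stock, reading only those bytes."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the OS pages in only what we touch.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            first = stock * n_obs  # index of this stock's first value
            begin, end = (first + start) * DOUBLE, (first + stop) * DOUBLE
            return list(struct.unpack(f"{stop - start}d", mm[begin:end]))


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "closes.bin")
    write_matrix(path, [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]])
    print(read_stock(path, n_obs=4, stock=1, start=1, stop=3))  # [20.0, 30.0]
```

The fixed-width, stock-major layout is what makes this work: the byte offset of any observation is simple arithmetic, so no index structure is needed. The trade-off is that appending new observations means rewriting or padding the file.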

ids · Apr 5, 2006

MySQL is the best candidate. It is pretty fast and convenient.

NoWorries · Apr 5, 2006

Would using MySQL imply storing the data for each stock in a separate table and indexing it by date/time? Do you have any experience with how long a typical query takes in this case (e.g. 1,000 stocks x 10,000 observations)?
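One common alternative to a table per stock is a single "long" table keyed on (symbol, timestamp), so that any subset of stocks and dates is one indexed query. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL; the table name, column names and prices are made up, and the DDL may need small changes for MySQL. It does not answer the timing question, which depends on hardware and indexing and is best measured on your own data.

```python
import sqlite3


def create_schema(conn):
    # One row per (stock, bar); the composite primary key doubles as the index
    # used for "these symbols, this date range" lookups.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS bars (
            symbol TEXT NOT NULL,
            ts     TEXT NOT NULL,   -- ISO-8601 text sorts chronologically
            open   REAL,
            close  REAL,
            ret    REAL,
            PRIMARY KEY (symbol, ts)
        )
        """
    )


def load_subset(conn, symbols, start, end):
    """Fetch rows for the given symbols with start <= ts < end, ordered for modelling."""
    placeholders = ",".join("?" * len(symbols))  # e.g. "?,?,?" for three symbols
    sql = (
        f"SELECT symbol, ts, close, ret FROM bars "
        f"WHERE symbol IN ({placeholders}) AND ts >= ? AND ts < ? "
        f"ORDER BY symbol, ts"
    )
    return conn.execute(sql, [*symbols, start, end]).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # pass a file path instead for persistent storage
    create_schema(conn)
    # Made-up rows; ret is the close-to-close return (0.0 on the first day).
    conn.executemany(
        "INSERT INTO bars VALUES (?, ?, ?, ?, ?)",
        [
            ("AAA", "2006-04-03", 9.8, 10.0, 0.0),
            ("AAA", "2006-04-04", 10.0, 11.0, 0.1),
            ("BBB", "2006-04-03", 20.0, 20.0, 0.0),
            ("BBB", "2006-04-04", 20.0, 18.0, -0.1),
        ],
    )
    for row in load_subset(conn, ["AAA", "BBB"], "2006-04-04", "2006-04-05"):
        print(row)
```

Because the primary key starts with `symbol`, a query for a handful of symbols over a date range walks only the matching part of the index rather than scanning the whole table.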

ktmexc20 · Apr 5, 2006

If you're using R, you already have an HDF5 API module available.

I use HDF5 heavily via the C and Python (PyTables) interfaces. If you have any questions, feel free to ask me.

kt

slacker · Apr 5, 2006

If you want something a little better than a flat file but less than a complete database server, look at http://www.sqlite.org/

It is embedded into your app and is very small (about 40 KB). Check the site for speed comparisons with MySQL; it is fast.

There are very good table browsers for it, and interfaces for C++, Python, Java, etc.

Moving to MySQL or a 'real' database server is not much of a problem; the SQL queries are basic but standard.
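A minimal sketch of keeping query code portable, assuming you reach the database through Python's DB-API (PEP 249). The SQL text can stay the same, but drivers differ in placeholder style: `sqlite3` uses `?` (paramstyle `qmark`), while MySQLdb uses `%s` (paramstyle `format`). The helper and the `bars` table below are hypothetical, and only those two styles are handled.

```python
import sqlite3

# Placeholder token for each PEP 249 paramstyle this sketch supports.
PLACEHOLDERS = {"qmark": "?", "format": "%s"}


def closes_query(paramstyle, n_symbols):
    """Build the same standard SQL for any supported driver, varying only the placeholder."""
    try:
        p = PLACEHOLDERS[paramstyle]
    except KeyError:
        raise ValueError(f"unsupported paramstyle: {paramstyle}") from None
    symbols = ", ".join([p] * n_symbols)
    return (
        f"SELECT symbol, ts, close FROM bars "
        f"WHERE symbol IN ({symbols}) AND ts >= {p} AND ts < {p} "
        f"ORDER BY symbol, ts"
    )


if __name__ == "__main__":
    print(closes_query(sqlite3.paramstyle, 2))  # '?' placeholders for SQLite
    print(closes_query("format", 2))  # '%s' placeholders, as MySQLdb expects
```

In practice you would pass the driver module's own `paramstyle` attribute, so switching from SQLite to MySQL changes the connection line but not the query-building code.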

Database purists will not like it, but I find it good for prototyping and caching.

You asked for suggestions!!!

Good luck

ids · Apr 5, 2006

Using MySQL does not impose any special requirements. Query time is hard to predict, but it is much faster than any other SQL server I know.

NoWorries · Apr 5, 2006

kt,
I thought about HDF5, but it seems it doesn't allow indexing through the R API; as I understand it, you have to load either the entire database or nothing at all.

ktmexc20 · Apr 5, 2006

Quote from NoWorries:

kt,
I thought about HDF5, but it seems it doesn't allow indexing through the R API; as I understand it, you have to load either the entire database or nothing at all.

Well, I'd be surprised if that's the case. Are you sure? I'm not familiar with the particulars of the API that R provides. Maybe check the R mailing list/forum for other people's comments and usage.

brokershopping · May 26, 2006

Does anyone have any experience with netCDF? I believe it is used for storing large amounts of climate data, etc.

http://www.unidata.ucar.edu/software/netcdf/