Historical Quotes with MySQL

prt_systems · Feb 9, 2006

Quote from Paccc:

I was considering storing historical quotes in a database such as MySQL and interfacing with it for backtesting purposes. Has anyone explored this idea before?

......

I currently maintain approximately 450 GBytes of data in MySQL databases. This data represents historical data for backtesting purposes and other storage data related to our trading systems.

Prior to this we had everything in SQL Server. We DRAMATICALLY cut our costs by making this migration.

Typical backtesting against historical data from this is no problem.

For high-volume OLTP-type applications and certain real-time calculations we use in-memory database solutions; variations of what we use are either available from the major for-fee vendors or are in the works.

So, to answer your question: yes, MySQL works fine for typical backtesting scenarios, and you can't beat the cost.
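
For readers wondering what "historical quotes in a database" might look like in practice, here is a minimal sketch. It is not prt_systems' actual schema: the table name `daily_bars`, the column names, and the helper `load_closes` are all hypothetical, and Python's built-in `sqlite3` stands in for a MySQL connection only so the example runs without a server.

```python
import sqlite3

# Stand-in for a MySQL connection (assumption: sqlite3 is used only so
# the sketch is runnable anywhere; the SQL itself is deliberately plain).
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per symbol per trading day.
conn.execute(
    """
    CREATE TABLE daily_bars (
        symbol     TEXT NOT NULL,
        trade_date TEXT NOT NULL,   -- ISO dates sort correctly as text
        open_px    REAL,
        high_px    REAL,
        low_px     REAL,
        close_px   REAL,
        volume     INTEGER,
        PRIMARY KEY (symbol, trade_date)
    )
    """
)

# Made-up sample rows, purely illustrative.
rows = [
    ("ABC", "2006-02-07", 10.0, 10.5, 9.8, 10.2, 1000),
    ("ABC", "2006-02-08", 10.2, 10.9, 10.1, 10.8, 1500),
    ("ABC", "2006-02-09", 10.8, 11.0, 10.4, 10.5, 1200),
    ("XYZ", "2006-02-08", 50.0, 51.0, 49.5, 50.5, 300),
]
conn.executemany("INSERT INTO daily_bars VALUES (?, ?, ?, ?, ?, ?, ?)", rows)


def load_closes(conn, symbol, start, end):
    """Return (trade_date, close_px) pairs for one symbol over a date range."""
    cur = conn.execute(
        "SELECT trade_date, close_px FROM daily_bars "
        "WHERE symbol = ? AND trade_date BETWEEN ? AND ? "
        "ORDER BY trade_date",
        (symbol, start, end),
    )
    return cur.fetchall()


closes = load_closes(conn, "ABC", "2006-02-08", "2006-02-09")
```

The composite primary key on (symbol, trade_date) is one plausible choice because a backtest typically reads one symbol over a date range; other layouts may suit other access patterns better.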

Sleepycat is nice, Postgres is nice, but they each have strengths and weaknesses, like all products. I prefer MySQL for general-purpose applications (primarily due to pricing).

ktmexc20 · Feb 9, 2006

Well I'm gonna take this opportunity to throw in my DB $0.02...

I am using HDF5, designed by the National Center for Supercomputing Applications (NCSA). This is a very well designed and very flexible hierarchical DB library.

The HDF5 serial and parallel I/O library is the result of the collaboration of NCSA with three DOE laboratories: Lawrence Livermore National Laboratory (LLNL), Sandia National Laboratory (SNL) and Los Alamos National Laboratory (LANL).

The next version (which I'm already using) is coming out with some interesting improvements. Packet Tables, for one, is designed for online data acquisition at very rapid paces... which works very well for my collection of tick DBs.

Written in C and very portable, there are various APIs: C; C++; Fortran; Java; Python (PyTables), and possibly some others.

kt

prt_systems · Feb 9, 2006

Quote from promagma:

...One thing is a pain in MySQL .... subqueries are so poorly optimized, they are basically unusable.

Not really. Like any database system, the optimizer is unique: it's not Oracle's, SQL Server's, DB's, etc. Subqueries can be dealt with in MySQL if you know the vagaries of the optimizer and plan ahead for those issues. MySQL currently reminds me of Sybase/SQL Server or earlier versions of Oracle: you need to play a few tricks on the optimizer to get it to do what you want, or in some cases restructure data. Bummer... but if you plan ahead, these types of operations can be reduced to near nil.

piphunter · Feb 9, 2006

This is a fascinating discussion. I guess it really comes down to each person's unique individual needs.

But don't overlook that databases can easily be swapped in and out. Design your programs with data connection classes and abstract out your database interaction from your trading/testing code.
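
A rough sketch of that idea, under stated assumptions: the interface `QuoteStore`, both implementations, and `total_return` are hypothetical names made up for illustration, and `sqlite3` again stands in for whatever SQL server is actually in use. The point is only that the testing code talks to the interface and never sees SQL.

```python
import sqlite3
from abc import ABC, abstractmethod


class QuoteStore(ABC):
    """Hypothetical interface: the only thing trading/testing code depends on."""

    @abstractmethod
    def add_close(self, symbol, trade_date, close_px):
        ...

    @abstractmethod
    def closes(self, symbol):
        """Return (trade_date, close_px) pairs sorted by date."""


class MemoryQuoteStore(QuoteStore):
    """Plain in-process storage, no database at all."""

    def __init__(self):
        self._data = {}

    def add_close(self, symbol, trade_date, close_px):
        self._data.setdefault(symbol, {})[trade_date] = close_px

    def closes(self, symbol):
        return sorted(self._data.get(symbol, {}).items())


class SqlQuoteStore(QuoteStore):
    """SQL-backed storage; sqlite3 is an assumption standing in for MySQL."""

    def __init__(self, conn):
        self._conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS closes ("
            "symbol TEXT, trade_date TEXT, close_px REAL, "
            "PRIMARY KEY (symbol, trade_date))"
        )

    def add_close(self, symbol, trade_date, close_px):
        self._conn.execute(
            "INSERT OR REPLACE INTO closes VALUES (?, ?, ?)",
            (symbol, trade_date, close_px),
        )

    def closes(self, symbol):
        cur = self._conn.execute(
            "SELECT trade_date, close_px FROM closes "
            "WHERE symbol = ? ORDER BY trade_date",
            (symbol,),
        )
        return cur.fetchall()


def total_return(store, symbol):
    """Testing code: knows nothing about which backend it is given."""
    series = store.closes(symbol)
    first, last = series[0][1], series[-1][1]
    return last / first - 1.0


# The same testing code runs unchanged against either backend.
stores = [MemoryQuoteStore(), SqlQuoteStore(sqlite3.connect(":memory:"))]
results = []
for store in stores:
    store.add_close("ABC", "2006-02-08", 10.0)
    store.add_close("ABC", "2006-02-09", 11.0)
    results.append(total_return(store, "ABC"))
```

Swapping in a different database would then mean writing one more `QuoteStore` subclass, with `total_return` and the rest of the testing code left alone.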

stephencrowley · Feb 9, 2006

Interesting. I used HDF years ago, but only for data visualization programs. I never knew it was a general-purpose, high-performance data library. I may look into this in the future. Thanks.

Quote from ktmexc20:

Well I'm gonna take this opportunity to throw in my DB $0.02...

I am using HDF5, designed by the National Center for Supercomputing Applications (NCSA). This is a very well designed and very flexible hierarchical DB library.

The next version (which I'm already using) is coming out with some interesting improvements. Packet Tables, for one, is designed for online data acquisition at very rapid paces... which works very well for my collection of tick DBs.

Written in C and very portable, there are various APIs: C; C++; Fortran; Java; Python (PyTables), and possibly some others.

kt

prt_systems · Feb 9, 2006

Quote from piphunter:

....
But don't overlook that databases can easily be swapped in and out. Design your programs with data connection classes and abstract out your database interaction from your trading/testing code.

Exactly .... and we are fully prepared to dump our current storage mechanism if things change.

Also, for in-memory databases/data systems we use the same approach. At design time our first priority is to avoid vendor lock-in at (nearly) all costs.

promagma · Feb 9, 2006

PRT, MySQL treats every subquery as dependent, so it reevaluates for every row of data. The bug is here:

http://bugs.mysql.com/bug.php?id=12106

I tried a lot of ways but subqueries just won't work right. Other than that MySQL has been pretty fast and stable.
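
For context, one rewrite commonly suggested when an engine re-runs an `IN (...)` subquery for every outer row is to express the same filter as a JOIN. The sketch below only shows that the two forms return the same rows; it says nothing about MySQL's timings, and it runs on Python's built-in `sqlite3` rather than MySQL. The tables `quotes` and `watchlist` and their contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE quotes (symbol TEXT, trade_date TEXT, close_px REAL);
    CREATE TABLE watchlist (symbol TEXT PRIMARY KEY);

    INSERT INTO quotes VALUES
        ('ABC', '2006-02-08', 10.8),
        ('ABC', '2006-02-09', 10.5),
        ('DEF', '2006-02-09', 41.0),
        ('XYZ', '2006-02-09', 50.5);

    INSERT INTO watchlist VALUES ('ABC'), ('DEF');
    """
)

# Subquery form: the shape that reportedly gets evaluated per outer row.
subquery_sql = """
    SELECT symbol, trade_date, close_px
    FROM quotes
    WHERE symbol IN (SELECT symbol FROM watchlist)
    ORDER BY symbol, trade_date
"""

# JOIN form: same rows, because watchlist.symbol is unique (primary key).
# If the inner table could hold duplicates, the JOIN would repeat rows
# and would need DISTINCT or a de-duplicated derived table.
join_sql = """
    SELECT q.symbol, q.trade_date, q.close_px
    FROM quotes AS q
    JOIN watchlist AS w ON w.symbol = q.symbol
    ORDER BY q.symbol, q.trade_date
"""

via_subquery = conn.execute(subquery_sql).fetchall()
via_join = conn.execute(join_sql).fetchall()
```

Whether the JOIN form actually helps on a given MySQL version is something to check with `EXPLAIN` on the real data.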

Sparohok · Feb 9, 2006

Here are some comments I made in another thread concerning HDF5 format and storage of time series data in SQL databases.

http://www.elitetrader.com/vb/showthread.php?s=&postid=870501#post870501

Martin

prt_systems · Feb 9, 2006

Quote from promagma:

PRT, MySQL treats every subquery as dependent, so it reevaluates for every row of data. The bug is here:

http://bugs.mysql.com/bug.php?id=12106

I tried a lot of ways but subqueries just won't work right. Other than that MySQL has been pretty fast and stable.

Yes... MySQL is still evolving. Using it is like working in the early days of SQL Server and other now-mature database systems. It's not for the faint of heart. Still, there are scenarios where it's quite useful and cheap even with its current limitations.

ktmexc20 · Feb 9, 2006

Quote from Sparohok:
Here are some comments I made in another thread concerning HDF5 format and storage of time series data in SQL databases.
http://www.elitetrader.com/vb/showthread.php?s=&postid=870501#post870501
Martin

Hi Martin,
I'm wondering what you mean exactly by:

Quote from Sparohok:
I tried pytables and was very disappointed. HDF5 has no locking and does not guarantee data consistency. If your program fails with an HDF file open the entire file may be destroyed.

"locking" ... Is this regarding threads? HDF5 optionally supports parallel I/O and is thread safe. Python, though, does of course have its threading limitations.

"guarantee data consistency"... ???

"file may be destroyed"... You may be referring to a very early version of PyTables, when an explicit call to file.close() was required. But this hasn't been necessary for quite a while now.

All in all, I respectfully suggest you try a new version of PyTables and see if you like its improved conveniences. And as far as HDF5 goes, take a spin through the pure C API. Whew... what a ride.