Historical Quotes with MySQL

Discussion in 'Automated Trading' started by Paccc, Feb 8, 2006.

  1. Paccc


    I was considering storing historical quotes in a database such as MySQL and interfacing with it for backtesting purposes. Has anyone explored this idea before?

    Some concerns I have with this:

    - The speed at which MySQL could retrieve the data. This would obviously be a local database, so bandwidth issues are not a problem. However, if the tables stored intraday data and you wanted to retrieve EOD, it could either compile the dataset from the intraday quotes on the fly, or keep a separate table updated with EOD data nightly. How are the execution speeds in this case? I know MySQL is fairly efficient (and much faster than a simple file-based database), but would storing this much data get out of hand?

    - The database structure required to support a large number of quotes. You could potentially have a huge number of tables in your database (assuming you used one table per stock or instrument you're tracking). Obviously a day's worth of intraday quotes will take up a good deal of space, especially considering the number of things you could possibly record (there are quite a few options quotes you could store per equity).

    - I plan to update the database with new quotes in realtime. Would this cause undue stress on the system by constantly writing new data to hundreds of tables at once? Also, while writing data, MySQL locks the table against reads from other connections; could this pose a problem?

    - The size of the data accumulated shouldn't be a problem, since disk space is cheap. The only concern I have is the amount of time it would take to search through 2 GB+ of data.
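For the first two concerns, one common alternative to one-table-per-stock is a single tick table keyed by (symbol, timestamp), with EOD bars either compiled on the fly or cached nightly. A minimal sketch using SQLite as a stand-in for MySQL (schema and data are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One tick table for every symbol, instead of one table per stock; an
# index on (symbol, ts) keeps per-symbol range scans fast and limits
# how many tables realtime inserts have to touch.
conn.execute("""
    CREATE TABLE ticks (
        symbol TEXT NOT NULL,
        ts     TEXT NOT NULL,   -- 'YYYY-MM-DD HH:MM:SS'
        price  REAL NOT NULL,
        volume INTEGER NOT NULL
    )""")
conn.execute("CREATE INDEX ix_ticks ON ticks (symbol, ts)")

conn.executemany("INSERT INTO ticks VALUES (?, ?, ?, ?)", [
    ("IBM", "2006-02-08 09:30:00", 81.10, 200),
    ("IBM", "2006-02-08 12:00:00", 81.55, 100),
    ("IBM", "2006-02-08 16:00:00", 80.90, 300),
])

# Compile an EOD bar from the intraday ticks on demand; the same
# result could instead be cached into a separate eod table nightly.
day = "2006-02-08"
hi, lo, vol = conn.execute(
    "SELECT MAX(price), MIN(price), SUM(volume) FROM ticks"
    " WHERE symbol = 'IBM' AND date(ts) = ?", (day,)).fetchone()
(op,), = conn.execute(
    "SELECT price FROM ticks WHERE symbol = 'IBM' AND date(ts) = ?"
    " ORDER BY ts LIMIT 1", (day,)).fetchall()
(cl,), = conn.execute(
    "SELECT price FROM ticks WHERE symbol = 'IBM' AND date(ts) = ?"
    " ORDER BY ts DESC LIMIT 1", (day,)).fetchall()
print(day, op, hi, lo, cl, vol)  # 2006-02-08 81.1 81.55 80.9 80.9 600
```

The nightly-cache variant trades freshness for query speed: the EOD table is tiny, so backtests that only need daily bars never scan the tick table at all.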

    Has anyone attempted a setup similar to this? If so, was MySQL your DB of choice? What's the most efficient or logical DB structure? Is there a much simpler/easier approach I am missing completely? Any thoughts/comments are much appreciated. Thanks!

    -- Paccc
  2. Yes, don't do it. MySQL is a piece of garbage. If you are going to go SQL, then go Postgres. However, I wouldn't suggest SQL at all.

    For a really amazing database, check out Sleepycat (aka Berkeley DB).

    Network I/O, SQL, etc.: you don't need any of that overhead. Sleepycat is great because you define your "queries" and database structure programmatically. In Java it is really nice because it makes your data storage look like standard Java collections.

    There is support for transactions, replication, foreign keys, indices, etc, etc.

    I currently use it to archive 400 million+ records per day while barely raising CPU usage above 1% on a dual P4 2 GHz, using the Java/JNI version.

  3. Paccc


    Thanks for the reply, stephen. Berkeley DB is an interesting alternative that I had not considered before. I read through a lot of the documentation and noticed it is very different from a typical SQL database: it is much more programming-oriented, rather than executing simple SQL statements. I assume this is advantageous since you have more control over the execution of your program. I noticed that many of the features are the same as in a traditional SQL-based database, so what specific advantages does it have over SQL (faster transaction speed/data access, lower memory/CPU usage, scalability)? I am already fairly familiar with SQL and how it works; would conquering the learning curve for BDB really be worth it? Also, if you wouldn't mind sharing, I'd be interested in how your whole system is set up. Thanks!

    -- Paccc
  4. I'm using MySQL to store forex quotes and I think I have about a gig of data in there right now. All your points are fairly valid but you don't really need anything faster in my opinion. I mean as long as you aren't trading off of these quotes, you're just backtesting!

    It's fine for my usage anyway, as I spend some time analyzing the data and developing backtesting programs. I let a program run for a while until it has the results of the test. I only used MySQL because I already have it on another server I am borrowing. If I had my own box here, I would probably load SQL Server on it, just because it would be easier for me to develop programs against.
  5. For me it is very important to be able to quickly and efficiently store data, run simulations, optimize, test models, etc. All of the reasons you mentioned for using Sleepycat are true. I initially started off using MySQL, then Postgres, then Oracle; none of them could hold a candle to Sleepycat/BDB.

    The main reason is that SQL is well suited to static data with dynamic queries, while BDB is well suited to dynamic data with static queries. This is not a hard requirement, though; you can still do dynamic queries with Sleepycat, it's just a bit more low-level than what you are probably used to.
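A rough Python sketch of that static-query, dynamic-data pattern (this is not the Berkeley DB API; a real BDB btree database would provide the same ordered-key cursor scans natively, and all names here are made up):

```python
import bisect

# Emulate a BDB btree: records keyed by (symbol, timestamp), kept in
# key order, so a "query" is just a cursor scan over a key range
# written directly in code rather than as an SQL statement.
store = {}   # key -> value
keys = []    # sorted keys, standing in for the btree

def put(symbol, ts, price):
    key = (symbol, ts)
    if key not in store:
        bisect.insort(keys, key)
    store[key] = price

def scan(symbol, ts_from, ts_to):
    """The one 'static query' this store supports: all prices for a
    symbol within a timestamp range, returned in key order."""
    lo = bisect.bisect_left(keys, (symbol, ts_from))
    hi = bisect.bisect_right(keys, (symbol, ts_to))
    return [(k[1], store[k]) for k in keys[lo:hi]]

put("IBM", "2006-02-08 09:30:00", 81.10)
put("IBM", "2006-02-08 12:00:00", 81.55)
put("MSFT", "2006-02-08 09:30:00", 26.70)

print(scan("IBM", "2006-02-08 00:00:00", "2006-02-08 23:59:59"))
# [('2006-02-08 09:30:00', 81.1), ('2006-02-08 12:00:00', 81.55)]
```

The trade-off is exactly the one described above: inserts and range scans are cheap and predictable, but any query shape you didn't design a key for means writing more low-level traversal code yourself.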

    Re: piphunter, I don't know about forex quotes, but I store 2 GB/day easily. The API, speed, transparent access, etc. are all huge reasons to use it when performance matters.

    My setup is tons of custom java stuff, job scheduling, limit order book management, quote filtering, smoothing, position sizing, risk management, etc. No gui. All console based with extensive math libraries and ability to export/import data to/from matlab.

  6. I switched from MS SQL Server to MySQL roughly 18 months ago.
    I keep about 80 GB of tick data in MySQL.

    Backtesting performance should not be a problem. Of course, one could easily imagine an improperly designed application causing trouble, but that is true with any database.

    I only use the standard MySQL engine. I have been looking at PostgreSQL as well but I liked MySQL much better for what I need. Excellent support available for any environment and platform.

  7. promagma


    I use MySQL and have separate tables for EOD and tick data. Having separate tables is always going to be much faster. I chose MySQL over MS SQL Server after trying to optimize my queries on both; MySQL was about 30% faster. My queries are nothing fancy, along the lines of testing gap %, close %, and high % for one day back, two days back, etc. One thing is a pain in MySQL: subqueries are so poorly optimized that they are basically unusable.
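The subquery complaint was a well-known limitation of MySQL in that era, and the usual workaround was to rewrite the subquery as a join. A sketch of that rewrite using SQLite as a stand-in (schema and numbers are invented; the performance gap itself is specific to old MySQL and is not reproduced here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eod (symbol TEXT, day TEXT, close REAL)")
conn.executemany("INSERT INTO eod VALUES (?, ?, ?)", [
    ("IBM", "2006-02-06", 80.00),
    ("IBM", "2006-02-07", 81.00),
    ("IBM", "2006-02-08", 80.50),
])

# Correlated-subquery form: days whose close beat the symbol's average
# close. This is the shape old MySQL tended to re-execute per row.
sub = conn.execute("""
    SELECT day FROM eod e
     WHERE close > (SELECT AVG(close) FROM eod a
                     WHERE a.symbol = e.symbol)""").fetchall()

# Equivalent derived-table join: the aggregate is computed once and
# joined against, sidestepping the per-row subquery entirely.
join = conn.execute("""
    SELECT e.day
      FROM eod e
      JOIN (SELECT symbol, AVG(close) AS avg_close
              FROM eod GROUP BY symbol) a
        ON a.symbol = e.symbol
     WHERE e.close > a.avg_close""").fetchall()

print(sub, join)  # both [('2006-02-07',)]
```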
  8. longsi


    Guys, what about using an object database like db4o?
    That way I can model prices as a linked list for easy traversal.

    I would be able to
    1. Implement behaviours for each price object (e.g. is it a clear candle, an engulfing pattern, etc.) without having such code elsewhere
    2. Do aggregation: ticks aggregated to 1-minute bars, 1-minute bars aggregated to 5-minute bars, etc.

    Modelling-wise it could be easier, but performance-wise I'm not sure how it compares to Sleepycat/Firebird/SQLite.
    I have not done anything as yet, just thinking about it. Appreciate any responses.
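The tick-to-1-minute-to-5-minute roll-up in point 2 is the same fold applied at two bar sizes. A minimal sketch, independent of whichever store (db4o, Sleepycat, SQL) persists the bars; the function name and tuple layout are made up for illustration:

```python
def aggregate(points, bar_sec):
    """Roll (ts_sec, open, high, low, close, volume) points, sorted by
    timestamp, into bars of bar_sec seconds. Raw ticks can be fed in
    as degenerate bars: (ts, price, price, price, price, volume)."""
    bars = {}                            # dicts preserve insert order
    for ts, o, h, l, c, v in points:
        key = ts - ts % bar_sec          # floor timestamp to bar start
        if key not in bars:
            bars[key] = [o, h, l, c, v]  # first point sets the open
        else:
            b = bars[key]
            b[1] = max(b[1], h)          # high
            b[2] = min(b[2], l)          # low
            b[3] = c                     # last close wins
            b[4] += v                    # accumulate volume
    return [(k, *b) for k, b in bars.items()]

ticks = [(t, p, p, p, p, v) for t, p, v in
         [(0, 10.0, 1), (30, 10.5, 1), (70, 10.2, 2), (290, 9.8, 1)]]
one_min = aggregate(ticks, 60)     # three 1-minute bars
five_min = aggregate(one_min, 300) # rolled up again into one 5-minute bar
print(five_min)  # [(0, 10.0, 10.5, 9.8, 9.8, 5)]
```

Because bars compose (5-minute bars are built from 1-minute bars, not from raw ticks), each aggregation level can be stored once and reused, which is the appeal of the linked-list/object model described above.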
  9. Bob111


  10. This could sometimes be the case, but in my experience it would usually be quite impractical.
    It depends on what kind of data you keep, and on what and how much you need to extract for the kind of backtesting you want to perform.

    I make some use of your approach for intermediate storage, but would never think of keeping my primary data stored as binary.
    #10     Feb 9, 2006