Historical Quotes with MySQL

Discussion in 'Automated Trading' started by Paccc, Feb 8, 2006.

  1. prt,

    I worked with SQL Server from the days it only ran on OS/2, not on NT yet.
    I can tell you that there is absolutely NO comparison between MySQL and 'early' SQL Server. MySQL can be easily interfaced with almost anything.

    With 'early' SQL Server you got only C, requiring lots of coding. Later they plastered on VB, going through many extremely buggy versions of the db interfacing. I haven't seen anything like this with MySQL.

    I picked MySQL because it did what I wanted and helped me tremendously in migrating from windoz to linux. In fact you can simply pick up the individual db table files and move them over to the new OS! Try that with SQL Server :D ! Today, I still use this possibility daily in linux to simplify archiving.
    Two years ago I also tried PostgreSQL, but I found it was not so easy to get going and it did not do anything better for my needs.
     
    #21     Feb 9, 2006
  2. I worked with Sybase SQL Server starting in 1994, if memory serves me ..... From 1994 to 1995 we used Sybase Open Client as our basis of communications between the various tiers. After that I used the very first beta of ASP and the drivers that appeared - things were modest then and could be accomplished on Windows - we also used Unix clients/tiers.

    Thinking back to the development in those early versions, we had many many problems and bugs and relatively poor ANSI SQL conformance (by today's standards). MySQL has problematic bugs as well. However, I do agree with you that many things are simpler and easier in MySQL than they were with early SQL Server versions ..... 12 years ago many solution archetypes were just being worked out ... today they are understood and commodity...
     
    #22     Feb 9, 2006
  3. No, processes. You can't use an HDF5 file from more than one process the way you can with a database server. There are no locking primitives.

    The "C" in "ACID" (consistency).

    I had to restore an obliterated HDF5 file more than once during my Pytables testing. I was using a recent version at the time. It may have been caused by signals rather than Python exceptions. You can't entirely avoid this sort of thing but HDF5 is inherently fragile compared with a database or filesystem.
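    Since HDF5 offers no locking primitives of its own, cooperating processes that must share one file sometimes fall back on OS-level advisory locks around each write. A minimal sketch (my addition, not part of HDF5; Unix-only, and the lock path and write callback are hypothetical):

    ```python
    import fcntl

    def locked_write(lock_path, write_fn):
        # Acquire an exclusive advisory lock on a sidecar lock file,
        # perform the write, then release. All cooperating processes
        # must follow the same protocol -- the OS enforces nothing on
        # the data file itself.
        with open(lock_path, "w") as lk:
            fcntl.flock(lk, fcntl.LOCK_EX)   # blocks until exclusive
            try:
                write_fn()
            finally:
                fcntl.flock(lk, fcntl.LOCK_UN)

    # usage sketch: guard an (imaginary) HDF5 append
    results = []
    locked_write("/tmp/quotes.h5.lock", lambda: results.append("wrote tick"))
    ```

    Every writer has to honor the protocol voluntarily, which is exactly the gap a database server closes for you.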

    I'm just reporting my own experiences based on my requirements, for other usage patterns I'm sure it's the bee's knees. :)

    Martin
     
    #23     Feb 10, 2006
  4. ... When you start looking at the reason you want an in-memory database or a high-performance IO solution, you start looking beyond typical relational systems - there is a penalty for all of the useful things they do - but in general it's still very useful. The trick is to partition your application in such a way that you get what you need from the relational database and still have the most demanding part of the system taken care of and scalable.

    Here is the note from the HDF5 team ....
    " It really depends on your application. HDF5 is tuned to do efficient I/O and storage for "big" data (hundreds of megabytes and more). It will not work well for small reads/writes. It doesn't have indexing capabilities, though we are working on some limited features. See the HDF5_Prototype_Indexing_Requirements for details. HDF5 was designed to complement DBs and not to compete with them. "
     
    #24     Feb 10, 2006
  5. Hi Martin,
    I'm teaching myself computer science and therefore completely respect the comments of all you guys.
    Any implementation of parallelism is, for now, beyond my needs and still a bit beyond my full comprehension. In this document on Thread Safe HDF-5 I verified your comment. But my understanding is that the qualifier "volatile" (regarding C programming) could be used to alleviate this restriction.
    In this comment...
    I found out that ACID means atomicity, consistency, isolation, and durability.
    So, I'm still not sure what you are referring to. Is it that the library is founded on portability and that there is an intermediate step of converting a server's file format to the client's native memory format if they are not congruent? I am not familiar with any inconsistencies.
    I can agree with this in the sense that lower-level interfaces call for prudence and caution, similar to how Python (high level) is a lot more forgiving than C. Lower-level languages give the programmer greater flexibility at the expense of more responsibility for things like data alignment, etc.
    Yes, I appreciate that. Thanks.
     
    #25     Feb 10, 2006
  6. Hi prt,

    First let me say that I have absolutely no experience with RDBs but I do basically understand their structure.

    What I also want to point out about hdf5 is that the file/multi-file format is very similar to a directory structure in the operating system, extended into the file itself. Flexible, extensible, and with customizable meta-data structures. Also, any dataset contained in the file does not have to be brought into memory in its entirety. Custom "hyperslab" and "point selection", inter-dataset references, variable length (ragged) arrays, and field/row table/array extensibility are all extremely flexible.

    This DB format is all I'm familiar with, first through PyTables and now through the C API. For now, I find it very exciting and well suited to a trading application environment. Not to mention that I'm a runtime efficiency hound.

    Thanks,
    kt
     
    #26     Feb 10, 2006
  7. Paccc

    Thanks for all the responses guys, I appreciate it. After considering some of the other database choices (some very interesting alternatives), I think I am going to stick with MySQL, mostly because I am already very familiar with it and ways to optimize it.

    Now that that is settled, I am ready to begin my trek to design the database. After scratching the surface on this task, I realized that I was not completely sure how I wanted to go about doing this. Since I don't want my design to be myopic in nature, I figured it would be best to ask those who are more knowledgeable than I. Here are some initial questions I have about this:

    1. Which storage engine would be more suitable for storing quotes? MyISAM or InnoDB? It turns out that MyISAM may be a bit faster, but InnoDB is transactional and supports full ACID.

    2. What types of data should I store? As of now I only plan to store tick data. For example: symbol, tradeTime, tradePrice, tradeSize. For those of you who have recorded data for a while, how long does simple data like this last you? Does storing something like market depth or bid/ask prices benefit you in the future? I don't have any strategies at the moment that would take advantage of this data, but I'm guessing that sometime later on it would be useful. Plus, disk space is cheap, so amassing a huge amount of data wouldn't be too bad. Aside from what I mentioned, what other types of information have you guys found useful to record?

    3. How precise should the trade times be? MySQL has a built-in UNIX_TIMESTAMP() function that returns the time in seconds. Would something more precise actually make a difference? Although it would be possible, it would be a pain to write a function that returned millisecond precision (I don't think MySQL has one built-in).

    4. Right now I'm planning on making one table per symbol, and then possibly running a script to update a symbol_EOD table nightly (not sure about this yet). This would easily amount to over 100 tables in my database, or twice that with EOD tables too. Have you guys found it useful to compile other tables from the tick data, like EOD? As of now, I have something like:
    Code:
    INSERT INTO symbol VALUES (ID, UNIX_TIMESTAMP(), price, size);
    5. I've had a bit of experience in the past with database clusters, and although this may be far in the future, how practical would setting up a cluster or parallel system actually be?

    6. One last, and fairly important question I have, is how reliable would this be for realtime trading? I'm sure it'd be fine for backtesting, but trading realtime with this data is a whole different thing. I was considering using memory-based tables to store possibly the current day's data, thereby speeding up the processing time for recent quotes. This would require a bit of work to get it done reliably, is it worth the extra effort, or would standard tables be sufficient?
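    A sketch tying questions 2-4 together, using Python's bundled sqlite3 purely as a stand-in for MySQL (in MySQL the CREATE TABLE would gain ENGINE=InnoDB and an AUTO_INCREMENT id; the table and column names here are illustrative, not from any real feed):

    ```python
    import sqlite3
    import time

    conn = sqlite3.connect(":memory:")  # stand-in for a MySQL connection

    # One table per symbol, as proposed; trade times stored as integer
    # milliseconds since the epoch so sub-second precision survives.
    conn.execute("""
        CREATE TABLE msft (
            id         INTEGER PRIMARY KEY,
            tradeTime  INTEGER NOT NULL,   -- epoch milliseconds
            tradePrice REAL    NOT NULL,
            tradeSize  INTEGER NOT NULL
        )
    """)

    now_ms = int(time.time() * 1000)
    ticks = [(now_ms, 27.12, 100), (now_ms + 5, 27.13, 300), (now_ms + 9, 27.11, 200)]
    conn.executemany(
        "INSERT INTO msft (tradeTime, tradePrice, tradeSize) VALUES (?, ?, ?)", ticks)

    # Nightly EOD-style rollup: the day's low, high, and total volume.
    low, high, volume = conn.execute(
        "SELECT MIN(tradePrice), MAX(tradePrice), SUM(tradeSize) FROM msft").fetchone()
    print(low, high, volume)   # prints: 27.11 27.13 600
    ```

    The same aggregate query, run by a nightly script per symbol, is one way to populate a symbol_EOD table without keeping it in sync tick by tick.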

    If anyone could give me some pointers on the above questions (or something I left out), I'd really appreciate it. Also, for those of you who already have something set up and running, I'm really interested to hear about what kind of setup you have, like what kind of data you store, how much data you have total or how much you accumulate daily (# records or MB), and mostly, how well it is working for you, and any pitfalls you've encountered along the way. Thanks a lot!

    -- Paccc
     
    #27     Feb 13, 2006
  8. In Java

    long System.currentTimeMillis ()

    In C, POSIX system call

    gettimeofday ()
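    And if the recording layer is scripted in Python (a stand-in example, not from the thread), the equivalent millisecond clock is one line:

    ```python
    import time

    def epoch_millis():
        # Integer milliseconds since the Unix epoch -- suitable for an
        # INTEGER column when whole-second resolution (as from MySQL's
        # UNIX_TIMESTAMP()) is too coarse.
        return int(time.time() * 1000)

    ms = epoch_millis()
    print(ms)
    ```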
     
    #28     Feb 13, 2006
  9. 1. Always lean in favor of ACID support unless your design requirements negate it ... and then you need to basically roll your own high-performance support for this or use an in-memory high-performance option.

    2. Store as much data as possible ..... but remember each piece of data stored has a cost associated with it ...... that cost is a good test of whether you really need it ..

    3. Use higher resolution timers only if needed ...... most systems ack back their transaction time of record .... consider using that as your time of record as well

    4. Each table has a long-term cost - think about it like that. Also, if you have a lot of users hitting your data, then the nature of that detail should drive your decision making as well ...

    5. Clustering is only necessary if you need it - decide based on the dollar loss per minute (or some such measure) if your system is unavailable. ...
    High-performance computing grids are necessary for almost nobody, although many people and firms will try to sell you otherwise ....
    6. Memory-based tables etc. just depend on your requirements ... run a few tests, build a few prototypes to help with your decision....
     
    #29     Feb 13, 2006
  10. The same can be done using an Excel pivot table..
    Uses SQL... but in the background, plus a visual SQL sort of deal
    don't waste your money dude
     
    #30     Feb 13, 2006