My next motherboard

nitro · Feb 26, 2005

Quote from nononsense:

nitro,

http://www.internetnews.com/dev-news/article.php/3485401

Be Good,
nononsense

nononsense,

I have always had InfiniBand and Myrinet (and Quadrics) on my radar. If they weren't so expensive, I would use one of them in my production cluster.

Thanks for the link.

nitro

nitro · Feb 27, 2005

Ok,

I finally got to the point where I have something working in MySQL, with a C++ client storing fake quotes to a database and table. I am having a little bit of an issue with my table designs, and I don't quite understand why I would choose one over the other.

For example, I can create a table that looks like this:

1) Trade:double, TS: timestamp

and have the table name be the symbol name. That way, updates for a given symbol always go to its own table. The downside is having a table for each instrument.

Then there is this approach:

2) Symbol: string, Trade:double, TS:timestamp

This also embeds in each row of the table the symbol name. I am not sure what this buys me...

Another thing is, even in case 2 I have two choices: I can either send trades for only one symbol to that table, or I can send all trades for all symbols to that table. Again, other than avoiding many tables, I am not sure what each buys me (perhaps faster lookup and aggregation times).

My guess is the correct method is to add the symbol name to the table, and only send trades from one symbol to that table. That means I will have a table for every symbol I am interested in storing trade data for, with the name of the table being the symbol name, and the table will (redundantly) contain the symbol name as one of the columns. I think this helps if I ever want to do joins for data mining, but I am not sure...
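To make the trade-off concrete, here is a minimal sketch of design 2 with every symbol going into one table. It is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for the MySQL server and C++ client described above, and the table name `trades`, the column names, and the sample quotes are all assumptions.

```python
import sqlite3

# Stand-in for the MySQL server in the thread: an in-memory SQLite database.
conn = sqlite3.connect(":memory:")

# Design 2 with all symbols in one table (table and column names are assumed).
conn.execute("CREATE TABLE trades (symbol TEXT, price REAL, ts TEXT)")

# Fake quotes, in arrival order, for two symbols.
fake_quotes = [
    ("MSFT", 25.10, "2005-02-27 09:30:00"),
    ("DELL", 40.25, "2005-02-27 09:30:01"),
    ("MSFT", 25.12, "2005-02-27 09:30:02"),
]
conn.executemany("INSERT INTO trades VALUES (?, ?, ?)", fake_quotes)


def series_for(symbol):
    """Return (ts, price) rows for one symbol, oldest first."""
    cur = conn.execute(
        "SELECT ts, price FROM trades WHERE symbol = ? ORDER BY ts", (symbol,)
    )
    return cur.fetchall()


msft = series_for("MSFT")

# Because the symbol is a column, questions that span symbols are one query.
symbols = [
    row[0]
    for row in conn.execute("SELECT DISTINCT symbol FROM trades ORDER BY symbol")
]
```

What the symbol column buys here is that one query can filter, group, or join across instruments. With one table per symbol, the client has to pick the table name itself.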

I am also looking into lighter and heavier database management systems, like dbm and ObjectStore.

nitro

nononsense · Feb 27, 2005

nitro,

I store all my collected tick data in one table per day. In order to retrieve the data by symbol efficiently, you should build an index on the symbol column. (The db will do this for you if you ask it to.) This slows you down a bit when generating the table, but you can write the data without the index to speed things up and generate the index later when you close the day. When you are further along, you will want to retrieve data for one symbol over several days, i.e. tables. This is no problem, as it is very easy to have the program open table after table and retrieve the data for the symbol in each table. This can be set up with straightforward SQL (see further).
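A rough sketch of that daily-table workflow, assuming Python's sqlite3 module in place of MySQL. The `ticks_<day>` naming scheme, the column names, and the helper functions are invented for illustration and are not taken from the setup described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def load_day(day, rows):
    """Create the table for one day, bulk-insert, then index at the 'close'."""
    table = f"ticks_{day}"  # e.g. ticks_20050225 (naming scheme is assumed)
    conn.execute(f"CREATE TABLE {table} (symbol TEXT, price REAL, ts TEXT)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
    # The index is built after the inserts, so the inserts pay no index upkeep.
    conn.execute(f"CREATE INDEX idx_{table}_symbol ON {table} (symbol)")
    return table


def symbol_over_days(symbol, tables):
    """Read one symbol from several daily tables with one UNION ALL query."""
    parts = [f"SELECT ts, price FROM {t} WHERE symbol = ?" for t in tables]
    sql = " UNION ALL ".join(parts) + " ORDER BY ts"
    return conn.execute(sql, [symbol] * len(tables)).fetchall()


tables = [
    load_day("20050224", [("MSFT", 25.0, "2005-02-24 09:30:00"),
                          ("DELL", 40.0, "2005-02-24 09:30:01")]),
    load_day("20050225", [("DELL", 40.5, "2005-02-25 09:30:00"),
                          ("MSFT", 25.5, "2005-02-25 09:30:00")]),
]
msft_two_days = symbol_over_days("MSFT", tables)
```

The table names are interpolated into the SQL text because placeholders cannot stand for identifiers, so they should only ever come from trusted code, never from user input.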

One more thing, nitro. You probably know this, but since you said you are starting out with db's and you specifically refer to writing in C/C++: NEVER PROGRAM ANY DB OPERATION EXPLICITLY if you can do it by writing an SQL query. db's like MySQL, Oracle, PostgreSQL etc. are all SQL based. You realize tremendous speedups if you learn how to accomplish things with SQL instead of using programmed db operations. This holds for any application language. Of course, you program these SQL statements as query strings in your application.
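As an illustration of the difference, the sketch below computes the high and low per symbol twice: once by pulling every row into the client and looping, and once with a single SQL query string. It uses Python's sqlite3 module rather than MySQL, and the `trades` table and sample rows are assumptions. It only shows that the two approaches agree and makes no timing claim.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (symbol TEXT, price REAL, ts TEXT)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [
        ("MSFT", 25.0, "2005-02-27 09:30:00"),
        ("MSFT", 25.5, "2005-02-27 09:30:01"),
        ("DELL", 40.0, "2005-02-27 09:30:02"),
        ("MSFT", 25.2, "2005-02-27 09:30:03"),
    ],
)


def high_low_in_client():
    """Programmed db operation: fetch every row and aggregate in a loop."""
    result = {}
    for symbol, price in conn.execute("SELECT symbol, price FROM trades"):
        lo, hi = result.get(symbol, (price, price))
        result[symbol] = (min(lo, price), max(hi, price))
    return result


def high_low_in_sql():
    """Same answer from one query string; the db does the aggregation."""
    cur = conn.execute(
        "SELECT symbol, MIN(price), MAX(price) FROM trades GROUP BY symbol"
    )
    return {symbol: (lo, hi) for symbol, lo, hi in cur}


by_loop = high_low_in_client()
by_sql = high_low_in_sql()
```

With the SQL version only one row per symbol crosses from the database to the application, which is where any saving would come from on a real client/server setup.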

In fact, MySQL offers several SQL-based maintenance/query tools for download on its website.

Starting out is rather hard, because it requires you to do things the way the db requires. If you programmed ad hoc without a db, it looked like you were less constrained. After a while you will discover the tremendous power of a good db. You have to give it time to grow on you. With me, it took more than a couple of days.

Be good,
nononsense

nitro · Mar 3, 2005

nononsense,

Thanks for the tips. I am going to experiment with different designs and see what comes of it.

nitro

cmaxb · Mar 3, 2005

nitro,

the relational way:

A table full of symbols and their id's.
A table full of "ticks", e.g. price and timestamp. Each row is marked by the id of the corresponding symbol.
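The two tables above might look something like this. It is a sketch using Python's sqlite3 module as a stand-in for MySQL; the table names `symbols` and `ticks`, the column names, and the helper functions are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE symbols (
        id   INTEGER PRIMARY KEY,
        name TEXT UNIQUE NOT NULL
    );
    CREATE TABLE ticks (
        symbol_id INTEGER NOT NULL REFERENCES symbols(id),
        price     REAL NOT NULL,
        ts        TEXT NOT NULL
    );
""")


def symbol_id(name):
    """Look up the id for a symbol, creating the symbol row on first use."""
    conn.execute("INSERT OR IGNORE INTO symbols (name) VALUES (?)", (name,))
    row = conn.execute("SELECT id FROM symbols WHERE name = ?", (name,)).fetchone()
    return row[0]


def add_tick(name, price, ts):
    """Store one tick, marked with the id of its symbol."""
    conn.execute("INSERT INTO ticks VALUES (?, ?, ?)", (symbol_id(name), price, ts))


add_tick("MSFT", 25.0, "2005-03-03 09:30:00")
add_tick("DELL", 40.0, "2005-03-03 09:30:01")
add_tick("MSFT", 25.1, "2005-03-03 09:30:02")

# Join the two tables to get ticks back by symbol name.
msft_ticks = conn.execute(
    """
    SELECT s.name, t.price, t.ts
    FROM ticks AS t
    JOIN symbols AS s ON s.id = t.symbol_id
    WHERE s.name = ?
    ORDER BY t.ts
    """,
    ("MSFT",),
).fetchall()
symbol_count = conn.execute("SELECT COUNT(*) FROM symbols").fetchone()[0]
```

Each tick row carries a small integer instead of repeating the symbol string, and the symbol name lives in exactly one place.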

Personally, I would append a "tick" to the end of a file. One file per symbol. For storing a lot of data, txt files are fine. For storing relationships, a database is needed.
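A minimal sketch of the file-per-symbol idea, in Python. The comma-separated line format, the `.txt` naming, and the throwaway temp directory are assumptions made for the example.

```python
import csv
import os
import tempfile

# Throwaway directory for the sketch; a real setup would use a fixed data dir.
data_dir = tempfile.mkdtemp()


def append_tick(symbol, price, ts):
    """Append one 'ts,price' line to the end of the symbol's own file."""
    path = os.path.join(data_dir, f"{symbol}.txt")
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([ts, price])


def read_ticks(symbol):
    """Read the whole series for one symbol back, in the order it was written."""
    path = os.path.join(data_dir, f"{symbol}.txt")
    with open(path, newline="") as f:
        return [(ts, float(price)) for ts, price in csv.reader(f)]


append_tick("MSFT", 25.0, "2005-03-03 09:30:00")
append_tick("DELL", 40.0, "2005-03-03 09:30:01")
append_tick("MSFT", 25.1, "2005-03-03 09:30:02")

msft_ticks = read_ticks("MSFT")
files = sorted(os.listdir(data_dir))
```

Reading one symbol is a single sequential pass over one file. Anything that relates symbols to each other has to be coded by hand, which is the point about needing a database for relationships.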

My two cents.

nitro · Mar 3, 2005

cmaxb,

Thanks for the tips. I will relate (no pun intended) my experience as I delve deeper and gain experience with different implementations.

One thing that seems problematic to me about having one table mixing lots of ticks for different symbols is that when you need to read the time series for a given symbol in realtime, it will take quite a bit longer than if there were one table per symbol and the TS was "sequential." Maybe the sequentialness is an illusion anyway, since the ticks will be scattered all over the disk?

nitro

cmaxb · Mar 4, 2005

I would index the table according to symbol id, then timestamp. Makes insertions slower, but makes retrieval *much* faster. Also, I believe indexing affects how the data is stored to disk. Could be wrong, tho.
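For what it's worth, a composite index like that can be sketched as follows, with Python's sqlite3 module as a stand-in for MySQL. The index name `idx_ticks_symbol_ts` and the schema are assumptions. The query plan is printed by SQLite itself and its wording varies between versions, so the sketch only checks that the index is mentioned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticks (symbol_id INTEGER, price REAL, ts TEXT)")

# Composite index: symbol id first, then timestamp.
conn.execute("CREATE INDEX idx_ticks_symbol_ts ON ticks (symbol_id, ts)")

conn.executemany(
    "INSERT INTO ticks VALUES (?, ?, ?)",
    [
        (1, 25.0, "2005-03-04 09:30:00"),
        (2, 40.0, "2005-03-04 09:30:01"),
        (1, 25.2, "2005-03-04 09:30:05"),
        (1, 24.9, "2005-03-03 15:59:59"),
    ],
)

# One symbol, from a given time onward, in time order.
query = "SELECT ts, price FROM ticks WHERE symbol_id = ? AND ts >= ? ORDER BY ts"
params = (1, "2005-03-04 09:30:00")
rows = conn.execute(query, params).fetchall()

# Ask the database how it intends to run the query.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, params).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)
```

Because the index is ordered by symbol id and then timestamp, the rows for one symbol sit next to each other in the index in time order. A plain secondary index like this does not by itself change where the table rows are stored on disk.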

nitro · Mar 4, 2005

Ok, thanks - I will benchmark it in this form.

nitro

nitro · Apr 28, 2005

MSFT announces the release version of 64-bit Windows:

http://www.microsoft.com/windowsserver2003/64bit/x64/overview.mspx

I downloaded a trial yesterday and I am starting the port of my software in earnest to the x64 platform. I will start on the DELL since it is my research machine, and assuming all goes well move to production on the Quad Opteron.

I will try to post my experiences.

nitro

Trader.NET · May 2, 2005

You can try a clustered index on the symbol id and timestamp (the rows are physically stored in the order of the indexed columns, e.g. 1, 2, 3, 4, ...), using a high fill factor such as 90% to optimize query performance at the cost of some hit on inserts.
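A rough analogue in SQLite, via Python's sqlite3 module: a `WITHOUT ROWID` table is stored as a B-tree keyed on its primary key, so the rows themselves are kept in (symbol id, timestamp) order. This is a stand-in only. SQLite has no fill factor setting, and the `seq` column is an assumption added so that two ticks with the same timestamp still have a unique key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The primary key doubles as the storage order of the rows (clustered layout).
conn.execute("""
    CREATE TABLE ticks (
        symbol_id INTEGER NOT NULL,
        ts        TEXT    NOT NULL,
        seq       INTEGER NOT NULL,   -- tie-breaker for equal timestamps (assumed)
        price     REAL    NOT NULL,
        PRIMARY KEY (symbol_id, ts, seq)
    ) WITHOUT ROWID
""")

# Insert deliberately out of order; the table keeps rows sorted by the key.
conn.executemany(
    "INSERT INTO ticks VALUES (?, ?, ?, ?)",
    [
        (2, "2005-05-02 09:30:01", 0, 40.0),
        (1, "2005-05-02 09:30:05", 0, 25.2),
        (1, "2005-05-02 09:30:00", 1, 25.1),
        (1, "2005-05-02 09:30:00", 0, 25.0),
    ],
)

# Reading one symbol's series is a range scan over adjacent rows.
series = conn.execute(
    "SELECT ts, seq, price FROM ticks WHERE symbol_id = ? ORDER BY ts, seq",
    (1,),
).fetchall()
```

The insert cost shows up when new rows land in the middle of the key order rather than at the end, which is the situation a fill factor is meant to leave room for on databases that support one.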

Storing a series per symbol per table will be slow compared to storing a series in an image column per row per symbol.

The fastest way to retrieve a series is to store it (an array of market data) as a serialized blob, e.g. in an image column.
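A minimal sketch of the blob approach, with Python's sqlite3 and array modules standing in for a database with an image column. The `series` table, the one-row-per-symbol-per-day layout, and the flat `[ts, price, ts, price, ...]` encoding of doubles are all assumptions.

```python
import sqlite3
from array import array

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE series (
        symbol TEXT,
        day    TEXT,
        data   BLOB,              -- the whole series, serialized
        PRIMARY KEY (symbol, day)
    )
""")


def save_series(symbol, day, ticks):
    """Pack (epoch_seconds, price) pairs into one blob and store it in one row."""
    flat = array("d")  # C doubles, native byte order
    for ts, price in ticks:
        flat.extend((ts, price))
    conn.execute(
        "INSERT OR REPLACE INTO series VALUES (?, ?, ?)",
        (symbol, day, flat.tobytes()),
    )


def load_series(symbol, day):
    """Fetch one row and unpack the blob back into (epoch_seconds, price) pairs."""
    row = conn.execute(
        "SELECT data FROM series WHERE symbol = ? AND day = ?", (symbol, day)
    ).fetchone()
    flat = array("d")
    flat.frombytes(row[0])
    return list(zip(flat[0::2], flat[1::2]))


ticks = [(1115040600.0, 25.0), (1115040601.0, 25.1), (1115040605.0, 25.2)]
save_series("MSFT", "2005-05-02", ticks)
restored = load_series("MSFT", "2005-05-02")
```

The catch is that the database can no longer see inside the series: filtering by price or time within a day, or appending a single tick, means reading and rewriting the whole blob in the application.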
