Record high-frequency order book depth, hardware

Discussion in 'Automated Trading' started by stephencrowley, May 9, 2005.

  1. Do any of you guys record depth quotes from all the ECNs for backtesting purposes?

    I've written some software to do this with two separate direct-access brokers via their respective APIs, and I was wondering how you guys handle the volume.

    I'm recording ECN + Nasdaq Level II for the top 10 Nasdaq-100 names and Nasdaq-only Level II for the remaining 90, and the volume is crushing my poor little T1.

    About 1.5 million records per hour are being recorded to my SQL database. The database is only capable of writing around 500 records per second, so the remainder is queued in memory and finally gets flushed out about 30 minutes after the market close.
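    Back-of-envelope on those numbers (a quick Python sanity check, using only the figures quoted above):

      # Rough sanity check -- all inputs are the figures quoted in this post.
      records_per_hour = 1_500_000
      writer_capacity = 500                      # rows/sec the database sustains
      flush_minutes = 30                         # post-close catch-up time

      avg_inflow = records_per_hour / 3600.0     # ~417 rows/sec on average
      backlog_at_close = flush_minutes * 60 * writer_capacity   # ~900,000 rows still queued

      print(f"average inflow:   {avg_inflow:.0f} rows/sec")
      print(f"backlog at close: {backlog_at_close:,} rows")

    So the average inflow is actually a bit under the 500/sec ceiling; the backlog presumably builds up during bursts and from per-row insert overhead rather than from the raw average rate.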

    My main question is, if anyone is doing this, what kind of hardware setup are you using?

    I'm thinking of building a dual Opteron 252 box with 8 GB of RAM, RAID, etc., but I'd like to see if anyone is willing to share their experiences in this area first.
     
  2. nitro

  3. Yow. You actually have this board now? I've been hearing great things about the dual-core Opterons lately.

    Back on to something more relevant to automated trading:

    Do you guys believe there is value in using the full aggregated order book compared to just price or Level II data?

     
  4. nitro

    Yes.

    I don't know, it is on my list of research projects.

    nitro
     
  5. Have you considered that the problem may be SQL, not the hardware? If you check among all the commercial trading software, I doubt you will find (m)any that use a SQL database to store high-frequency time series data. Most tend to write arrays of fixed-length records to flat files. These days most modern operating systems offer some flavor of memory-mapped file, so the I/O performance can be blindingly fast. c-tree used to be popular.
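    To make that concrete, here's a minimal sketch of the fixed-length-record / memory-mapped approach in Python (the field layout is invented for illustration; a real recorder would mirror whatever fields the feed actually hands back):

      import mmap, struct, time

      # Hypothetical fixed-length depth record: timestamp, ECN id, side, price, size.
      # 26 bytes per record, no per-row SQL overhead.
      REC = struct.Struct("<d i c x d i")

      def append_quote(f, ecn_id, side, price, size):
          # f is a file opened in "ab" mode; side is b"B" or b"S"
          f.write(REC.pack(time.time(), ecn_id, side, price, size))

      def scan(path):
          # Memory-map the file and unpack records in place: no parsing, no row-by-row I/O.
          with open(path, "rb") as f:
              with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                  return [REC.unpack_from(mm, off) for off in range(0, len(mm), REC.size)]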
     
  6. You're absolutely right that SQL isn't the best performer, but it makes it vastly easier to analyze, extract, backtest, report, etc. My main goal right now is gathering data and developing algorithms, so real-time recording is not all that important. This box is an old dual P3-800 anyway, so I imagine a dual Opteron system would have no problem with 2-3K records per second, and probably more after optimizing the ODBC calls and batching statements.
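    For reference, the batching I have in mind looks roughly like this. A Python/psycopg2 sketch since the box runs Postgres (the table and column names are made up, and the same idea applies through ODBC with prepared statements):

      import psycopg2
      from psycopg2.extras import execute_values

      def flush_batch(conn, rows):
          # rows: list of (ts, symbol, ecn, side, price, size) tuples pulled off the queue.
          # One multi-row INSERT and one commit per batch instead of one round trip per record.
          with conn.cursor() as cur:
              execute_values(
                  cur,
                  "INSERT INTO level2_ticks (ts, symbol, ecn, side, price, size) VALUES %s",
                  rows,
              )
          conn.commit()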
     
  7. You could try taking 10-second snapshots of the order book instead of recording every change. Or you could go with a hardcore database like INSQL.
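    In code, the sampling approach is just a timer loop over the aggregated book (a sketch: book and write_snapshot here are stand-ins for whatever feed handler and storage are already in place):

      import threading, time

      def snapshot_loop(book, write_snapshot, stop, interval=10.0):
          # Every `interval` seconds persist the current aggregated book rather than
          # every individual update.
          while not stop.is_set():
              write_snapshot(time.time(), dict(book))   # shallow copy of the current book
              stop.wait(interval)

      # usage:
      #   stop = threading.Event()
      #   threading.Thread(target=snapshot_loop, args=(book, writer, stop), daemon=True).start()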
     
  8. Are you sure your app is CPU limited? Have you measured the CPU load? It seems to me like a slam dunk that this is going to be massively IO bound. Just doing inserts of small rows into a SQL database, a Pentium 3 should be able to keep up with a large RAID array.

    I agree with originalskunk: a SQL database is not the right tool for that volume of data. If you roll your own storage backend, you can process on the order of a million records per minute, not per hour. It will also be far more space efficient. I use SQL a lot, but not for tick data.

    I'm curious, if just inserting the data into a SQL database is slower than realtime, how are you planning to do your backtest?

    Martin
     
  9. It is both CPU- and I/O-bound. I'm running my Win32 app inside a VMware instance, and it connects to the local Postgres database via ODBC, so all the context switching takes its toll as well.

    I am recording all the data through an in-memory queue: the tick-receiving thread appends to the queue while the SQL writer thread pulls data off. It does lag, but after the markets close it only takes about 45 minutes to finish dumping all the queued tick data from memory.
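    For anyone curious, the shape of it is just a queue with batching on the consumer side. A rough Python sketch (the real app is a Win32/ODBC program; flush_batch stands in for the actual insert code):

      import queue, threading

      tick_queue = queue.Queue()          # unbounded; this is the in-memory backlog

      def on_tick(record):
          # Called from the feed/API callback thread; never blocks on the database.
          tick_queue.put(record)

      def writer_loop(flush_batch, batch_size=500):
          # Dedicated writer thread: drain the queue in batches and hand each batch
          # to the database writer (flush_batch stands in for the ODBC/SQL insert).
          while True:
              batch = [tick_queue.get()]
              while len(batch) < batch_size and not tick_queue.empty():
                  batch.append(tick_queue.get_nowait())
              flush_batch(batch)

      # threading.Thread(target=writer_loop, args=(flush_batch,), daemon=True).start()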

     
  10. I'm doing that. Just using stupid text files and it works very well.
     
    #10     May 10, 2005