Tick Database Implementations

nitro · Jun 1, 2012

Quote from amazingIndustry:

Interesting link, thanks for that. What are you running on? I am currently looking for a tick database versatile enough to feed Excel, C#.Net 4.0 applications, WPF charts, and the like. Any recommendations? For my backtesting architecture I coded up my own binary data store through which I manage to iterate over ticks at a rate of 6.5 million ticks per second. But I look for something that may be able to feed different applications through APIs, ODBC, and/or other interfaces in order to code up a scanning app in .Net. It may be in combination with a CEP engine because I look to calculate customized indexes and indicators on historical data and store alongside the tick data and also calculate such data in real-time on part historical data, part real-time. Any suggestions regarding suitable databases? Please exclude HDF5 because I target .Net and this database is the worst when it comes to supporting .Net. Someone wrote up a .Net compliant Api but it severely lacks in functionality.

Thanks.

People are mixing two different uses of "Tick Database" in this thread. As I pointed out somewhere in this thread, databasing (persisting) the ticks to disk is trivial and is of no consequence (to me). The real value is in being able to extract intelligence from the data and, since this is a time domain, to do it in a way that makes sense in the markets. [Although I will add that it would be nice to be able to turn a time stream into a time-frequency domain stream and either do research on this stream or trade the time-frequency domain stream in realtime.]

So I couldn't really care about the actual database-to-disk strategy. For example, I am talking to the Xenomorph Timescape people (linked to above), where the actual database will be SQL Server, Oracle, HBase, or perhaps their proprietary format. Who cares? What I care about is that if I stop using Xenomorph Query Language software, I can still get at my data. That probably rules out their proprietary data store format. The real value these things add is the query language that sits on top of some database, which allows you to ask questions of your data very simply and then turn those questions into a trading system with little or no modification of your code, whether it is from Excel or C# or whatever. So the code to do research and the code to turn that into a trading system should be almost identical.
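One way to picture "research code and trading code should be almost identical" is an indicator written against a plain iterator of ticks, so it does not know whether the ticks come from storage or from a feed. The following is only a minimal sketch under assumptions: the (timestamp, price) tick layout and the names rolling_mean, historical_source and live_source are hypothetical, and it is not how Xenomorph or any other product mentioned here works.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple

# Hypothetical tick layout: (timestamp in microseconds, price).
Tick = Tuple[int, float]


def rolling_mean(ticks: Iterable[Tick], window: int) -> Iterator[Tuple[int, float]]:
    """Yield (timestamp, mean of the last `window` prices) once the window is full."""
    buf = deque(maxlen=window)
    total = 0.0
    for ts, price in ticks:
        if len(buf) == window:
            total -= buf[0]  # oldest price, evicted by the append below
        buf.append(price)
        total += price
        if len(buf) == window:
            yield ts, total / window


def historical_source() -> Iterator[Tick]:
    # Stands in for a read from whatever database holds the ticks.
    yield from [(1, 10.0), (2, 11.0), (3, 12.0), (4, 13.0)]


def live_source() -> Iterator[Tick]:
    # Stands in for a feed handler delivering ticks one at a time.
    for ts, price in enumerate([10.0, 11.0, 12.0, 13.0], start=1):
        yield ts, price


# The same indicator code runs unchanged on both sources.
research = list(rolling_mean(historical_source(), window=2))
live = list(rolling_mean(live_source(), window=2))
print(research == live)
```

The design choice is that only the source changes between research and production; the analytic itself never touches the storage layer.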

TradeStation is probably one of the most seamless ways to do research and turn that research into a realtime system, but it is not an institutional-grade product because you are stuck with EasyLanguage and all analytics have to be applied to a chart instead of a database. If TradeStation turned their EasyLanguage loose on a generic database [more accurately, a generic stream] without requiring charts, it would be far more interesting. But even then, EasyLanguage is not as semantically rich as these time-based query languages, which allow for far more complex mathematical analysis.

Vector or column databases are not commonplace because of the language needed to query them, which for market analytics is proprietary more often than not.

amazingIndustry · Jun 5, 2012

You mentioned things others misunderstand and database solutions you would not use, but would you mind sharing, for the benefit of all, which specific database/query language you yourself favor and would use?

I strongly disagree with your point on the separation between storage technology and query language. Almost always the two are inseparable for efficiency purposes (I am not saying separation does not exist, but the most efficient solutions cannot be split). You can always segregate them by running a binary data store and translating back and forth between binary reads/writes on one side and queries on the other, but this does not come even close to being the most efficient approach (in speed or memory). Kx's query language is very closely integrated, down to the finest detail, with its storage logic in memory and on the physical medium. I think my point is clear when considering that running SQL queries on a Kx database is orders of magnitude slower than using the built-in query logic.
Thus, I disagree with your point on this issue.
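To illustrate why storage layout and query efficiency are hard to separate, here is a small sketch comparing the same "maximum price" question asked of a row layout and of a column layout. It is an assumption-laden toy using Python's standard array module, with made-up data and hypothetical names; it says nothing about how Kx or any other product is actually implemented.

```python
from array import array

# Row layout: one (timestamp, price, size) tuple per tick. Made-up data.
rows = [(1, 100.0, 200), (2, 100.5, 50), (3, 101.0, 700), (4, 100.75, 10)]

# Column layout: one contiguous typed array per field.
ts_col = array("q", (r[0] for r in rows))     # signed 64-bit integers
price_col = array("d", (r[1] for r in rows))  # 64-bit floats
size_col = array("I", (r[2] for r in rows))   # unsigned integers


def max_price_rows(table):
    # Has to walk every record, although only one field is needed.
    return max(record[1] for record in table)


def max_price_columns(prices):
    # Touches only the contiguous price column.
    return max(prices)


# Bytes of price data a column scan has to read.
price_bytes = len(price_col) * price_col.itemsize
print(max_price_rows(rows), max_price_columns(price_col), price_bytes)
```

Both functions return the same answer; the point is only that a query engine aware of the column layout can skip the fields it does not need.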

Care to share what specific solution you propose to tackle the issues I described in my previous post, where I laid out typical use cases? I am well aware that logic involving queries on historical data can be very different from supplying data to charting libraries (via data bindings).

Quote from nitro:

People are mixing two different uses of "Tick Database" in this thread. As I pointed out somewhere in this thread, databasing (persisting) the ticks to disk is trivial and is of no consequence (to me). The real value is in being able to extract intelligence from the data and, since this is a time domain, to do it in a way that makes sense in the markets. [Although I will add that it would be nice to be able to turn a time stream into a time-frequency domain stream and either do research on this stream or trade the time-frequency domain stream in realtime.]

So I couldn't really care about the actual database-to-disk strategy. For example, I am talking to the Xenomorph Timescape people (linked to above), where the actual database will be SQL Server, Oracle, HBase, or perhaps their proprietary format. Who cares? What I care about is that if I stop using Xenomorph Query Language software, I can still get at my data. That probably rules out their proprietary data store format. The real value these things add is the query language that sits on top of some database, which allows you to ask questions of your data very simply and then turn those questions into a trading system with little or no modification of your code, whether it is from Excel or C# or whatever. So the code to do research and the code to turn that into a trading system should be almost identical.

TradeStation is probably one of the most seamless ways to do research and turn that research into a realtime system, but it is not an institutional-grade product because you are stuck with EasyLanguage and all analytics have to be applied to a chart instead of a database. If TradeStation turned their EasyLanguage loose on a generic database [more accurately, a generic stream] without requiring charts, it would be far more interesting. But even then, EasyLanguage is not as semantically rich as these time-based query languages, which allow for far more complex mathematical analysis.

Vector or column databases are not commonplace because of the language needed to query them, which for market analytics is proprietary more often than not.

propseeker · Jun 6, 2012

Quote from amazingIndustry:

But I look for something that may be able to feed different applications through APIs, ODBC, and/or other interfaces in order to code up a scanning app in .Net. It may be in combination with a CEP engine because I look to calculate customized indexes and indicators on historical data and store alongside the tick data and also calculate such data in real-time on part historical data, part real-time.

why wouldn't you just continue to use your binaries?

if you can push 6.5M ticks, i'd think you'd be competent enough to write yourself a loader/scanner ("cep engine").

my2c... hype and marketing slow down real work much more than they solve it. keep it simple. roll your own.
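For readers wondering what a "roll your own" binary store plus scanner could look like at its simplest, here is a sketch using fixed-width records. The record layout (64-bit timestamp in microseconds, 64-bit float price, 32-bit size) and the function names are assumptions made for illustration; this is not amazingIndustry's actual store, and plain Python will not reach the tick rates quoted in this thread.

```python
import io
import struct

# Hypothetical record layout: int64 timestamp (microseconds since epoch),
# float64 price, uint32 size. "<" means little-endian with no padding.
RECORD = struct.Struct("<qdI")


def write_ticks(stream, ticks):
    """Append each (timestamp, price, size) tuple as one fixed-width record."""
    for ts, price, size in ticks:
        stream.write(RECORD.pack(ts, price, size))


def read_ticks(stream):
    """Yield (timestamp, price, size) tuples until the stream is exhausted."""
    while True:
        chunk = stream.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break  # end of data (or a truncated trailing record)
        yield RECORD.unpack(chunk)


def scan(stream, predicate):
    """Return every tick for which predicate(tick) is true."""
    return [tick for tick in read_ticks(stream) if predicate(tick)]


# An in-memory buffer stands in for a file on disk.
buf = io.BytesIO()
write_ticks(buf, [
    (1338940800000000, 100.25, 300),
    (1338940800000500, 100.50, 1200),
    (1338940800001000, 100.75, 100),
])
buf.seek(0)

large_trades = scan(buf, lambda tick: tick[2] >= 1000)
print(large_trades)
```

Fixed-width records make seeking to the n-th tick a simple multiplication, which is part of why such stores are fast to iterate; the trade-off is that richer queries have to be hand-coded.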

amazingIndustry · Jun 6, 2012

With all due respect, I think you did not get my point. I have no problem implementing scan algorithms or other apps. I am looking to exchange ideas about efficient database structures for tick data and custom time series, for both read and write purposes, with APIs that expose rich query functionality. Rolling my own is a huge time waster, especially when something already exists. Even my own binary data store and reader uses open source components that I did not develop myself. I am wondering whether anyone has experience using Redis and RavenDB and what they have to say about their efficiency regarding time series data storage and retrieval.

Quote from propseeker:

why wouldn't you just continue to use your binaries?

if you can push 6.5M ticks, i'd think you'd be competent enough to write yourself a loader/scanner ("cep engine").

my2c... hype and marketing slow down real work much more than they solve it. keep it simple. roll your own.

nitro · Jun 7, 2012

When OpenTSDB is able to store microsecond resolution data, it will be VERY interesting to me:

http://opentsdb.net/faq.html#How_much_write_throughput_can_I_get_with_OpenTSDB

nitro · Jun 7, 2012

I am probably going to use

http://pandas.pydata.org/

on top of either Cassandra or HBase

Column Oriented databases:

http://en.wikipedia.org/wiki/Column-oriented_DBMS#Implementations

PocketChange · Jun 7, 2012

You may want to take a look at Cloudera's Hadoop/HBase implementation.
They have a ready-made VM image, and you can have a test environment up and running in 30 minutes.

Quote from nitro:

I am probably going to use

http://pandas.pydata.org/

on top of either Cassandra or HBase

Column Oriented databases:

http://en.wikipedia.org/wiki/Column-oriented_DBMS#Implementations

amazingIndustry · Jun 7, 2012

Nitro, what attracts you to that particular database? The throughput figures look awfully slow. And 4-byte timestamps, so no longs for dateTimeTicks (yet)....

Quote from nitro:

When OpenTSDB is able to store microsecond resolution data, it will be VERY interesting to me:

http://opentsdb.net/faq.html#How_much_write_throughput_can_I_get_with_OpenTSDB

nitro · Jun 8, 2012

Quote from amazingIndustry:

Nitro, what attracts you to that particular database? The throughput figures look awfully slow. And 4-byte timestamps, so no longs for dateTimeTicks (yet)....

It is free, performant, reliable, it is meant for Time Series, and it sits on top of HBase which has its own advantages.

The problem is its lack of resolution. Market time series need timestamps with at least microsecond resolution, and I would argue even nanosecond resolution.
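The resolution problem can be made concrete with a little arithmetic. The sketch below assumes a 4-byte field holding whole seconds (as the "4 byte timestamps" remark in this thread suggests) and compares it with an 8-byte count of nanoseconds since the epoch; the sample values and names are hypothetical, and nothing here reflects OpenTSDB's actual encoding.

```python
import struct

NS_PER_SECOND = 1_000_000_000


def to_epoch_ns(seconds: int, nanos: int) -> int:
    """Combine whole seconds and a nanosecond remainder into one integer."""
    return seconds * NS_PER_SECOND + nanos


# Two hypothetical ticks one nanosecond apart within the same second.
tick_a = to_epoch_ns(1339113600, 250_000)
tick_b = to_epoch_ns(1339113600, 250_001)

# A 4-byte unsigned field of whole seconds cannot tell them apart.
sec_a = struct.pack(">I", tick_a // NS_PER_SECOND)
sec_b = struct.pack(">I", tick_b // NS_PER_SECOND)

# An 8-byte signed nanosecond count keeps them distinct. Big-endian bytes of
# non-negative values also compare in chronological order.
ns_a = struct.pack(">q", tick_a)
ns_b = struct.pack(">q", tick_b)

# How many years a signed 64-bit nanosecond counter can span from the epoch.
years_of_range = (2**63 - 1) / NS_PER_SECOND / (365.25 * 86400)

print(sec_a == sec_b, ns_a < ns_b, round(years_of_range))
```

So doubling the field from 4 to 8 bytes buys nanosecond resolution with roughly 292 years of range, which is the usual argument for a 64-bit integer timestamp.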

PocketChange · Jun 8, 2012

Just store time stamp as string... What timer do you plan to use for ns precision?
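Storing the timestamp as a string does work, at a cost in space. A rough sketch, assuming a fixed-width "seconds.nanoseconds" text form (a made-up convention chosen so that text order matches time order) compared with an 8-byte integer; the sample value and function names are hypothetical.

```python
import struct

NS_PER_SECOND = 1_000_000_000

# Hypothetical timestamp: nanoseconds since the epoch.
epoch_ns = 1_339_113_600_000_250_001


def ns_to_text(ns: int) -> str:
    """Fixed-width decimal 'seconds.nanoseconds' so text order matches time order."""
    seconds, nanos = divmod(ns, NS_PER_SECOND)
    return f"{seconds:010d}.{nanos:09d}"


def text_to_ns(text: str) -> int:
    """Parse the text form back into integer nanoseconds."""
    seconds, nanos = text.split(".")
    return int(seconds) * NS_PER_SECOND + int(nanos)


as_text = ns_to_text(epoch_ns).encode("ascii")
as_int = struct.pack(">q", epoch_ns)

print(as_text, len(as_text), len(as_int))
```

On the timer question, Python 3.7 and later expose time.time_ns(), but the clock's real resolution depends on the operating system and hardware, so nanosecond-wide fields do not by themselves give nanosecond accuracy.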

Quote from nitro:

It is free, performant, reliable, it is meant for Time Series, and it sits on top of HBase which has its own advantages.

The problem is its lack of resolution. Market time series need timestamps with at least microsecond resolution, and I would argue even nanosecond resolution.