Tick Database Implementations

ktmexc20 · Jan 11, 2007

Quote from TraderMojo:
I understand, at least superficially, the huge potential and benefits of having streaming SQL abilities...

Am I wrong that SQL is nothing more than a "higher level" interface? I can implement the exact same functionality myself, with the same syntax if desired.

TraderMojo · Jan 11, 2007

Quote from ktmexc20:

Am I wrong that sql is nothing more than a "higher level" interface. I can implement the exact same functionality myself. No?

To be clear, nitro is not referring to standard SQL, which mainly concerns itself with the retrieval and manipulation of stored data in a relational database like SQL Server, MySQL etc. So yes, you could implement this language yourself.

Instead, streamSQL is a much newer concept: an extension to SQL that adds a temporal dimension and, simplistically, allows the analysis of real-time data/event streams as they happen, i.e. before the data gets stored.

Conceptually it works by looking at sliding time-based windows, e.g. the last x minutes of a data stream. The window is constantly updated so that it always covers the last x minutes. You perform streamSQL queries on this window of data. This is just a simplified example.
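
To make the sliding-window idea a bit more concrete, here is a rough sketch in plain Python. It is not streamSQL and not how any particular engine does it; the class name SlidingWindow, the 60-second span and the sample ticks are all made up for illustration.

```python
from collections import deque


class SlidingWindow:
    """Keep only the ticks whose timestamp falls in the last `span` seconds.

    Hypothetical helper for illustration; not part of any streamSQL product.
    """

    def __init__(self, span):
        self.span = span
        self.ticks = deque()  # (timestamp, price) pairs, oldest first

    def push(self, timestamp, price):
        self.ticks.append((timestamp, price))
        # Evict everything that has slid out of the window.
        while self.ticks and self.ticks[0][0] <= timestamp - self.span:
            self.ticks.popleft()

    def mean_price(self):
        # The "query" that is evaluated over the current window contents.
        if not self.ticks:
            return None
        return sum(price for _, price in self.ticks) / len(self.ticks)


window = SlidingWindow(span=60.0)
# Made-up ticks: (seconds since start, price).
for ts, price in [(0.0, 100.0), (30.0, 101.0), (61.0, 103.0)]:
    window.push(ts, price)
    print(ts, window.mean_price())
```

By the time the third tick arrives, the first one is more than 60 seconds old and has dropped out, so the average is taken over the last two ticks only.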

To be accurate you can also incorporate traditional stored data into the analysis.

A newbie attempt at illustrating the concept with one further example based on case studies from Esper: say you wanted to monitor the datafeed so that you can detect whether there is any irregular activity such as a fall off in ticks or a burst of activity:

If you were writing the ticks to a database, you'd have to periodically poll the database with queries, be it via standard SQL or some custom retrieval mechanism in order to analyze the tick-rate and come to a conclusion about regularity of the tick-stream.

With stream analysis, the streamSQL queries are essentially set up first, and the data streams are passed through it. The query can then produce results in real-time as required.
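
As a rough illustration of "set the query up first, then pass the stream through it", here is a small Python sketch of the tick-rate example. It is only an approximation of the idea: the function name make_tick_rate_query, the thresholds and the sample timestamps are my own assumptions, and it only evaluates when a tick arrives, so a real engine would also need a timer to notice a feed that has gone completely silent.

```python
from collections import deque


def make_tick_rate_query(span, max_count, max_gap, on_alert):
    """Build a standing query; ticks are pushed through it as they arrive.

    Hypothetical helper: reports a "burst" when more than `max_count` ticks
    fall inside the last `span` seconds, and a "gap" when the time since the
    previous tick exceeds `max_gap` seconds.
    """
    times = deque()  # timestamps currently inside the window

    def on_tick(timestamp):
        # Fall-off check: compare against the previous tick, if any.
        if times and timestamp - times[-1] > max_gap:
            on_alert("gap", timestamp)
        times.append(timestamp)
        # Slide the window forward.
        while times[0] <= timestamp - span:
            times.popleft()
        # Burst check: too many ticks inside the window.
        if len(times) > max_count:
            on_alert("burst", timestamp)

    return on_tick


alerts = []
query = make_tick_rate_query(
    span=1.0, max_count=3, max_gap=5.0,
    on_alert=lambda kind, ts: alerts.append((kind, ts)),
)
# Made-up tick arrival times in seconds.
for ts in [0.0, 0.2, 0.4, 0.6, 10.0]:
    query(ts)
print(alerts)
```

Nothing is written to a database and nothing is polled; the results come out of the query as the ticks go in.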

Admittedly this is a contrived example, as you wouldn't necessarily approach it the way described in a non-streamSQL environment.

I'm no expert on this topic by any stretch of the imagination so I perhaps haven't explained the concept very well.

For more information you may wish to review:

http://streamsql.org/pages/about.html

I actually have the vendor-of-ats-software-who-is-now-banned to thank for bringing ESP and CEP into focus in the context of trading for me! I can see some very exciting possibilities but it takes a little bit of a mind-shift to get to grips with.

ktmexc20 · Jan 11, 2007

Quote from TraderMojo:


tm, thanks for your comments.

Well, coming from a C, C++ point of view makes me believe that sql is pretty much a novelty; a convenient protocol of a language format, if you will.

I still don't see how sql or even a "streamSQL" is advantageous, considering (specifically to what you mentioned above) it's no big deal in the native language itself.

-kt

Dazuni · Jan 11, 2007

My background is in VB.net, vb6, vba and SQL (MS SQL Server, Oracle). I am also learning C++ (and I am trained in statistics).

I agree that what you can do with SQL can be done with C++, and not the other way around. SQL is quite straightforward when it comes to what can be done.

What I want to ask is: what does the data feed look like? Does it print onto a text file?

I have worked with Bloomberg (and Bloomberg on Excel) before, but it seems that Bloomberg on Excel works with a periodic refresh, like a heartbeat, which to me is not live.

edit / add:

Can someone recommend a good feed provider to me, as I am just getting started in this? Is there a free one? And which provider is the most timely one?

many thanks

ktmexc20 · Jan 11, 2007

Quote from Dazuni:
What I want to ask is: what does the data feed look like? Does it print onto a text file?


Well, that depends on what your provider provides and on what/how you decide to apply it.

dcraig · Jan 12, 2007

Has anybody actually run any benchmark or performance test of HDF5 as compared to, say, MySQL or anything else for that matter, for the type of data that is being talked about here?

ktmexc20 · Jan 12, 2007

Quote from dcraig:

Has anybody actually run any benchmark or performance test of HDF5 as compared to, say, MySQL or anything else for that matter, for the type of data that is being talked about here?

Hi dcraig,

I don't even know how to work with any sql, so i can't help with profiling.

But I would guess that the vast number of scientists, researchers, etc. using hdf5 by some means would answer that by implication. Besides hdf5 itself, NetCDF is very popular, and netCDF-4 uses hdf5 as its back-end. I'm not sure what FITS, another popular format, uses, if it is not self-implemented.

Actually, I don't remember any researchers using an sql based db in my own research of scientific algorithms and software. I'm sure that's not a totality, but just what I've personally come across.

WallstYouth · Jan 12, 2007

We have a similar system in house called TDMS. TDMS is the strategic trade entry and risk management platform for the Trading Division. It is a very high-throughput, in-memory database which accommodates securities and derivatives trading across FICC and Equities. It is being written in-house, using C++, and consists of several processes utilizing Sync Agent to access shared memory. It has been designed to accommodate throughput levels of several thousand trades per second. It also implements an SQL-like API.

nitro · Jan 13, 2007

Hi ktmex,

First, there is nothing wrong in doing any of what you are doing in an expressive language like C++. Alan Turing proved many years ago that, assuming some very basic operations in a language, one computer language is as good as another in terms of what it can compute.

That said, there is a maxim which I _try_ to live my programming life by: One doesn't program systems. You program mini-languages first, and then you build your systems in [a mix of] those languages. This is very powerful because it allows the greatest expression of each concept in its natural linguistic domain, and therefore makes for huge productivity gains, not just in getting the original system up, but in maintaining [the modern term is agile. See http://en.wikipedia.org/wiki/Agile_software_development] the system as it evolves. And to boot, the system usually has fewer bugs. An example of this is ASP.net, where the "code behind" is written in any .net language, and the front end is written in HTML.

Even when we program in C++, when we abstract patterns

http://en.wikipedia.org/wiki/Design_Patterns
http://en.wikipedia.org/wiki/Refactoring

into classes, we are building mini-languages first, and using a mix of C++ and those classes to build our programs. We just don't think of the classes as a mini language, but that is what we are doing: Building a language to express the problem domain naturally in a readable way for other human beings to understand when they look at our code (or ourselves six months later when we have to maintain the code.)

The distinction is very powerful. For example, we could all sit and write web pages in C++, but most of us do it in a design language that naturally maps to HTML for the presentation part, and use a mix of languages and techniques like PHP, AJAX, C++ and SQL for the code-behind.

A strong reason for this is that HTML is a declarative language. In a declarative language like HTML or SQL you describe what something is like, as opposed to an imperative language like C++ or Fortran, or a functional language like Haskell, Erlang or Mathematica. Instead of giving the steps (as in an imperative language) of how to achieve the results, you describe what the results you are interested in look like. For more on this stuff read this and follow the link:

http://en.wikipedia.org/wiki/Declarative_programming
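
To put the imperative versus declarative distinction side by side, here is a small sketch in Python using its built-in sqlite3 module. The ticks are invented sample data and the table name is my own choice; the point is only that the loop spells out the steps, while the SQL statement describes the result and leaves the steps to the engine.

```python
import sqlite3

# Invented sample data: (symbol, price).
ticks = [("IBM", 97.10), ("MSFT", 29.85), ("IBM", 97.25), ("MSFT", 29.80)]

# Imperative: spell out, step by step, how to find the highest price per symbol.
highest = {}
for symbol, price in ticks:
    if symbol not in highest or price > highest[symbol]:
        highest[symbol] = price

# Declarative: describe the result wanted and let the engine choose the steps.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticks (symbol TEXT, price REAL)")
conn.executemany("INSERT INTO ticks VALUES (?, ?)", ticks)
rows = conn.execute(
    "SELECT symbol, MAX(price) FROM ticks GROUP BY symbol ORDER BY symbol"
).fetchall()
conn.close()

print(highest)
print(rows)
```

Both give the same answer. The difference is in who decides how the work gets done.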

In conclusion, I suggest you browse the streambase website

http://streamsql.org/files/Documentation/streamprocessing_language.pdf

or http://www.streambase.com

and see the reasons that others give for doing things the way they are suggesting you do them. The advice is worth thinking about imo.

nitro

Now, it is unquestionable that SQL is the language best suited to express concepts about data stored in [relational] tables. It has been in use for years, and rarely is something that successful unless it offers a rich enough syntax coupled with good performance.

I am still not 100% sold that streamSQL is the richest language or even a good language to do stream or real-time time series analysis, but it is clearly an attempt at recognizing the flaws in doing this in SQL [let alone a language like C++], and is a good attempt at a solution.

I recommend you read this paper from the streambase people. While it sounds like a brochure, it does highlight the fact that immediately that

Quote from ktmexc20:

Hi nitro,

I have never used an sql (relational db) and have only briefly touched upon learning it. From my layman's point of view, I would rather do queries in the native programming language and not some query language tied to the db.

From what I understand, this type of db is unnecessary to my work. I could possibly understand it for enterprise use, I guess. I just don't see yet, the advantages in a relational db at the cost of run time efficiency.

Mind you that Hdf5 does internally support storage reference structures, iterators, partial I/O, point/hyperslab selection and parallel implementation.

So are you saying that, just because it's not tied to an sql, it's not suitable? I don't understand.

Thanks,
kt

ktmexc20 · Jan 22, 2007

Quote from nitro:


Nitro, thanks for your comments.

Just to clarify, I'm aware of most of what you spoke of. Design patterns and programming discipline (agile, extreme, etc.) are what I'm currently working on mastering. For the hell of it, let me also just mention that I like programming via the Qt C++ framework and doing any heavy lifting via C libraries (or "low level" C++ itself).

I haven't taken the time yet (but I will) to read the streamSQL comments-on-its-benefits you mentioned. But I don't see myself being persuaded at this time, considering my contentment with and investment in hdf5.

The question I still have though, is with your original assertion that hdf5's packet table interface is "no where near" a streaming db. How is that?

Boeing's Flight Test Instrumentations Group and the HDF5 development group at the University of Illinois have developed a library that is particularly suited for "packet" data, data that arrives in streams of packets from instruments at potentially very high speeds.

-kt