Tick Database, Now Want to Run SQL

Discussion in 'Data Sets and Feeds' started by bscully27, Jun 28, 2012.

  1. sle

    Actually, you're right - it times out every two hours and stops working after a while (that "feature" was not there when I downloaded my free version a little while ago).

    I can't imagine using an SQL DB for anything like tick data; it's going to be painfully slow (unless there are some hardware or optimization tricks that I don't know about). I myself am running into a similar problem with equity volatility data - unless you organize the data into small silos, the queries become unmanageable.
     
    #21     Jun 30, 2012
  2. januson

    Easy

    SELECT min(price), max(price), datepart(second, time) % 10
    FROM TickTable
    GROUP BY datepart(second, time) % 10


    But I doubt that is going to make you happy :)

    Do the division yourself.
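
    (For actual 10-second buckets, the modulo would become integer division, e.g. datepart(second, time) / 10, grouped together with the date, hour, and minute.) As a rough illustration, here is a minimal sketch of doing the same bucketing client-side in C#; the Tick record and its fields are assumptions for the example, not anything from this thread:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    public record Tick(DateTime Time, decimal Price);

    public static class Bars
    {
        // Integer-divide the timestamp by 10 seconds, so each group key
        // identifies one distinct 10-second window across the whole day
        // (unlike % 10, which lumps together the same last digit of the
        // second from every minute).
        public static IEnumerable<(DateTime BarStart, decimal Low, decimal High)>
            TenSecondBars(IEnumerable<Tick> ticks) =>
            ticks.GroupBy(t => t.Time.Ticks / (10 * TimeSpan.TicksPerSecond))
                 .OrderBy(g => g.Key)
                 .Select(g => (
                     new DateTime(g.Key * 10 * TimeSpan.TicksPerSecond),
                     g.Min(t => t.Price),
                     g.Max(t => t.Price)));
    }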
     
    #22     Jul 1, 2012
  3. i asked this question earlier in the thread... then why don't you?

    you mentioned APIs and open source libraries as a reason. what are you looking at doing where you'd be so willing to sacrifice that kind of performance to muck around with a 3rd party db?

    all of this fuss for a "performance db" just to get access to a query language seems a little overkill to me... and at the end of the day, as you mentioned, underkill. i must be missing something big here.
     
    #23     Jul 3, 2012
  4. To give a bit of background: I use my own binary data store purely for back-testing. For live trading I am looking to expand my architecture by adding a high-throughput, low-latency columnar database that loads historical quotes into the system for index and indicator computation. The gap between the historical data and the live feed is bridged through backfill requests to my quote vendor.

    That is the only purpose for which I would run a 3rd-party DB. I run a CEP engine for the real-time query and pattern-matching logic, so no DB is needed there; once the data is loaded into memory it is available to my own custom query logic. You could argue that I could just extend my existing binary datastore logic to load the data from there, but I have decided on a 3-tier data storage approach:

    (1) Cleaned, filtered, and adjusted data in a binary datastore (for start-of-week data loading into the trading/CEP engine, in combination with backfilled missing data).
    (2) Intraday data that has already been loaded or streamed but is no longer needed for real-time CEP computations (because it moved out of the observation window); it is pushed into a 3rd-party high-speed database for further cleaning and subsequently moved to (1). The main purpose here is to handle fail-overs.
    (3) In-memory data used for real-time computations and trading query logic: a combination of (1) at the start of the week, backfilled data, (2) when the system goes down and needs to reload past data points, and the real-time data streams.

    Hope this makes more sense now. What I need is a lightweight database that can persist at high speed (so not purely in-memory), without the bloat and unneeded bells and whistles.
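
    For what it's worth, the lightweight persistence described above can be as simple as an append-only binary log. The following is a minimal sketch under assumed names - TickRecord, its fields, and the file layout are illustrative, not the actual schema in use here:

    using System;
    using System.IO;

    // Illustrative fixed-width record; the real schema is an assumption.
    public readonly struct TickRecord
    {
        public readonly long TimestampTicks;  // DateTime.Ticks (UTC)
        public readonly double Price;
        public readonly int Size;

        public TickRecord(long ts, double price, int size)
        {
            TimestampTicks = ts; Price = price; Size = size;
        }
    }

    public sealed class AppendOnlyTickLog : IDisposable
    {
        private readonly BinaryWriter _writer;

        public AppendOnlyTickLog(string path) =>
            _writer = new BinaryWriter(new FileStream(
                path, FileMode.Append, FileAccess.Write, FileShare.Read));

        public void Append(in TickRecord t)
        {
            _writer.Write(t.TimestampTicks);
            _writer.Write(t.Price);
            _writer.Write(t.Size);
        }

        // Flush so a process crash loses at most the unflushed tail.
        public void Flush() => _writer.Flush();

        public void Dispose() => _writer.Dispose();
    }

    On recovery, the fixed 20-byte record width (8 + 8 + 4) makes it trivial to re-read the tail of the file back into memory.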


     
    #24     Jul 4, 2012
  5. a 3rd party db won't help you when you crash before you push the data to it. i'm still not seein the need here.

    if you want to do this correctly, you write a data server app that talks to your feeds, scrubs/filters/persists, and pub/subs realtime/historical out to your strats and guis.

    of course, if you want to tinker with tools instead of trade, then i'd recommend reviewing all the db software on the market and try and pick the best wrong tool for the job, and then put what db you decided on and how many mps it can do in your sig. coooool.
     
    #25     Jul 5, 2012
  6. I think you did not read my post carefully. I never said that I am looking to recover data that was never saved, and I never said that I am looking to push only market data through the database, so your suggestion is not applicable; you missed the point.

    I said I am looking to implement a solution that can move time-series data from memory to disk as close to real-time as possible, in order to recover that saved data in the event of a crash.

    Your proposed solution also does not reflect common practice regarding cleaning/adjusting real-time streamed data. Generally, real-time data is only "filtered" for bad ticks and nothing more, so as not to degrade latency and throughput.

    I have reached a point where my systematic framework is mature enough to allocate time and money to improving persistence and recoverability, via an in-memory database that can persist data to disk in the background. Your rather emotional outburst, and your judgement that any sort of database is useless - without really knowing my setup or how I approach trading - disqualifies your post from being taken seriously. I do not mind you being judgmental, but I do mind unqualified judgement without factual support, especially given that you totally misunderstood how I approach recoverability.


     
    #26     Jul 6, 2012
  7. +1 :p
     
    #27     Jul 6, 2012
  8. i do this for a living. trust me, i understand what you're trying to do.

    if you're just trying to dump from memory to disk and back again in real-time, then a db is not the best tool for the job.

    just sayin.

     
    #28     Jul 6, 2012
  9. To be honest, you do not understand. I remind you AGAIN that I never said I want to save quotes to disk and then read them again in real-time.

    I said quotes are only supposed to be persisted for fail-over purposes. Every institutional message bus (such as Tibco Rendezvous) is capable of persisting data in order to recover quickly from crashes. Now, whether a database is the right solution, or simply writing to binary files, is a discussion I welcome and am open to learning from. However, I am not in the mood to keep repeating what I think I made very clear, because you either do not want to understand or are not capable of understanding. Please ignore my posts if you have an issue with my (I believe) very simple explanations.

    Back to the point I seek advice on: I currently run an out-of-process data feed and execution module to which all strategies and dashboards connect in order to send instructions (requests for information, instructions to flatten open positions, and subscriptions/unsubscriptions to market data). The data feed engine connects to several data feeds and runs its own aggregation and consolidation engine.

    I am looking for a solution that allows me to persist data to disk within the data feed module for recovery purposes. Obviously such a routine would run within its own datablock (C# TPL Dataflow) or thread, along the lines of the sketch below.
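
    As a sketch of what such a datablock could look like - the tuple layout, capacity, and file format are assumptions for illustration, not my actual implementation - the feed thread posts ticks to a bounded ActionBlock, and a single background writer appends them to disk so the hot path never blocks on I/O:

    using System;
    using System.IO;
    using System.Threading.Tasks.Dataflow;

    public sealed class TickPersister : IDisposable
    {
        private readonly BinaryWriter _writer;
        public ActionBlock<(long TsTicks, double Price, int Size)> Block { get; }

        public TickPersister(string path)
        {
            _writer = new BinaryWriter(new FileStream(
                path, FileMode.Append, FileAccess.Write, FileShare.Read));
            Block = new ActionBlock<(long TsTicks, double Price, int Size)>(
                t =>
                {
                    // Runs on the dataflow task, off the feed thread.
                    _writer.Write(t.TsTicks);
                    _writer.Write(t.Price);
                    _writer.Write(t.Size);
                },
                new ExecutionDataflowBlockOptions
                {
                    MaxDegreeOfParallelism = 1, // one writer keeps ticks ordered on disk
                    BoundedCapacity = 65536     // back-pressure instead of unbounded memory
                });
        }

        public void Dispose()
        {
            Block.Complete();        // stop accepting new ticks
            Block.Completion.Wait(); // drain what is already buffered
            _writer.Dispose();
        }
    }

    // Hot path: persister.Block.Post((DateTime.UtcNow.Ticks, 101.25, 300));
    // Post returns false when the buffer is full; SendAsync awaits space instead.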


     
    #29     Jul 9, 2012
  10. the only one here with a reading comprehension problem is you. my comments stand. stop being such a douche.
     
    #30     Jul 10, 2012