Tick Database Implementations

amazingIndustry · Jul 1, 2012

I think your comparison is slightly unfair, its comparing apples with oranges to look at in-mem dbs on one side and persisted dbs on the other hand. KDB can be run purely persisted or in-memory or as a combination of both. HDF5 on the other hand to my knowledge is a purely persisted db. Obviously stuff queried in memory is a lot faster without going the disk I/O route, even if query logic is not well written.

Quote from Makis:

Yes, TimesTen is strictly an in memory db. You cannot have more data than your available memory. There is an agent running as a separate process which receives messages via TCP/IP and replicates into disk. This is done asynchronously (in other words, you may loose messages if your db crashes, but that is a 'feature' of all in-memory databases.).
I have worked with in memory databases as a user and not admin, so I am not very aware of the internal workings, but I guess TimesTen offers different variations of persistence via configuration, which may influence performance. I have seen environments where persistence/replication has been disabled all together to gain some speed, and at the end of the day a 'save' command writes everything to a disk file.
More...

DaManJ · Jul 2, 2012

Hey mate,
I use SQL 2008 for storing tick data, using the datetime2 datatype to store tick timestamps down to the microsecond.

*** Here is the schema for my tick table -

CREATE TABLE [dbo].[TICK](
[tickId] [bigint] IDENTITY(1,1) NOT NULL,
[tradeDate] [date] NOT NULL,
[securityId] [smallint] NOT NULL,
[tickTime] [datetime2](6) NOT NULL,
[tickType] [smallint] NOT NULL,
[tickQty] [int] NOT NULL,
[tickValue] [numeric](12, 6) NOT NULL,
CONSTRAINT [PK_TICK] PRIMARY KEY CLUSTERED
(
[tickId] ASC
)

*** Here is the sql for the index that you need to enable speedy queries on this table

CREATE NONCLUSTERED INDEX [IX_TICK] ON [dbo].[TICK]
(
[tradeDate] ASC,
[securityId] ASC,
[tickTime] ASC,
[tickType] ASC
)

*** Here is the schema for my tick-type table (i.e. best bid, best ask, trade etc)

CREATE TABLE [dbo].[TICK_TYPES](
[tickType] [tinyint] NOT NULL,
[tickTypeDesc] [varchar](max) NULL,
CONSTRAINT [PK_TICK_TYPES] PRIMARY KEY CLUSTERED
(
[tickType] ASC
)

*** Here is the schema for my securities table

CREATE TABLE [dbo].[SECURITY](
[securityId] [smallint] NOT NULL,
[symbol] [varchar](20) NOT NULL,
[secType] [varchar](10) NOT NULL,
[exchange] [varchar](10) NOT NULL,
[externalSecurityId] [varchar](30) NULL,
[underlyingSecurityId] [smallint] NULL,
[currency] [varchar](3) NULL,
[varchar](20) NULL,
[multiplier] [int] NULL,
[minTick] [decimal](6, 6) NULL,
[expiryDate] [date] NULL,
[expired] AS (case when CONVERT([date],getutcdate(),(0))>=[expiryDate] then CONVERT([bit],(1),0) else CONVERT([bit],(0),0) end),
[floorOpenTime] [time](7) NULL,
[floorCloseTime] [time](7) NULL,
[TimeZone] [varchar](10) NULL,
CONSTRAINT [PK_SECURITY] PRIMARY KEY CLUSTERED
(
[securityId] ASC
)

*** Here is an example query for querying the tick-table

select tickTime, tickType, tickQty, tickValue
from TICK
where tradeDate = @_tradeDate
and securityId = @_securityId
and tickTime >= @sessionstart
and tickTime <= @sessionend
and (tickType = @tickType or @tickType is null)
order by tickId

****

So as you can see from the above, I put tick-data for all securities into the same table, but use an index to speed up my queries.
These queries are fast for my backtesting.

NetTecture · Jul 2, 2012

You are possibly inefficient - try using:

* A DateTime2 for timestamp, no need to have separate fields for date and time.
* Possibly get rid of the bigint. Dependinw what you do you could index the tick in the timestamp using a lower granularity than the feed granularity for that.

The last one kills a bigint - and that can be a significant saving. Anything else - you could possibly save significant by storing th e price in ticks, encoded in a custom field type using 4 bytes only with full granularity.

amazingIndustry · Jul 2, 2012

mostly agree, though in the end it always comes down to a trade off between size and speed. The more you "compress" data (such as pricing data) the more you need to cast and convert later after reading the data. (either yourself or on the server side).

Quote from NetTecture:

You are possibly inefficient - try using:

* A DateTime2 for timestamp, no need to have separate fields for date and time.
* Possibly get rid of the bigint. Dependinw what you do you could index the tick in the timestamp using a lower granularity than the feed granularity for that.

The last one kills a bigint - and that can be a significant saving. Anything else - you could possibly save significant by storing th e price in ticks, encoded in a custom field type using 4 bytes only with full granularity.
More...

NetTecture · Jul 2, 2012

Quote from amazingIndustry:

mostly agree, though in the end it always comes down to a trade off between size and speed. The more you "compress" data (such as pricing data) the more you need to cast and convert later after reading the data. (either yourself or on the server side).
More...

Fully agree.

Just last time I was loading my tick data it was 50 gigabyte per trading day Given the HUGH amount of storage needed agains the ease of having more cores on the database server (which, usual in SQL, are mostly doing nothing unless you aggregate etc.) this is a VERY easy guess to make

DaManJ · Jul 2, 2012

Quote from NetTecture:

You are possibly inefficient - try using:

* A DateTime2 for timestamp, no need to have separate fields for date and time.
* Possibly get rid of the bigint. Dependinw what you do you could index the tick in the timestamp using a lower granularity than the feed granularity for that.

The last one kills a bigint - and that can be a significant saving. Anything else - you could possibly save significant by storing th e price in ticks, encoded in a custom field type using 4 bytes only with full granularity.
More...

Hey mate, thx for the suggestions.
I know the seperate date field is reduntant, however it speeds up the index when querying by date.
In terms of using the timestamp as the index, these timestamps are provided by my data-provider and sometimes updates from the exchange will come in with the same timestamp for a few ticks, thus making it non-unique.

amazingIndustry · Jul 2, 2012

fair point. There are couple savings to be had. On the fx side one deals with quotes anyway, no reported trades making things a lot easier. I store DateTime as long (8bit) as DateTimeTicks. bids can be stored as 4bit offset, offer as offset from bid. Anything else, imho, will degrade the speed-size trade off. Similar things can be done on other asset classes but it gets slightly more complex because one needs to differentiate between quote updates and trades. However, even a full order book can be stored and retrieved very conveniently using offsets.

Quote from NetTecture:

Fully agree.

Just last time I was loading my tick data it was 50 gigabyte per trading day Given the HUGH amount of storage needed agains the ease of having more cores on the database server (which, usual in SQL, are mostly doing nothing unless you aggregate etc.) this is a VERY easy guess to make
More...

DaManJ · Jul 2, 2012

Regarding the Op's initial point of running on-the-fly queries to the database whilst the strategy is running, I question the necessity/relevance of doing this.

For my own strategies, I build underlying tick-data into in memory time-series on the fly, with a maximum length after which data is dropped off.

This is adequate for building indicators on various time lengths.

For instance, I might create a bar-chain of 1 second bars, chained for 3 minutes.
Or I might create a bar-chain of 30 second bars, chained for 60 minutes.

1 second bars grow from tick data, 5 second bars grow from 1 second bars, 30 second bars grow from 5 second bars and so forth.

Bars have properties like Open, High, Low, Close, Volume, well in fact here is my bar structure:

public abstract class IBar
{
protected double _sumPrice;
protected Tick _prevTradeTick;
protected Tick _prevBidTick, _prevAskTick;

public DateTime StartTime, EndTime;
public double Open, High, Low, Close;
public double TotalPriceChange, PositivePriceChange, NegativePriceChange;
public short BarLength;
public long Volume, VolumeBid, VolumeAsk, TradeChangeFreq, TimeDown, TimeUp, TimeFlat, TimeOnBid, TimeOnAsk;
public long BearQtyChange, BullQtyChange;
public Security Sec;

public double TypicalPrice()
{
return (High + Low + Close) / 3.0;
}
public double WeightedClosePrice()
{
return (High + Low + Close + Close) / 4.0;
}
public double AveragePrice()
{
if (Volume == 0) return Volume;
return _sumPrice / (double)Volume;
}
public abstract int UniquePriceCount();

public IBar(Security sec, short barLength)
{
Sec = sec;
BarLength = barLength;
}

public abstract void AddTick(Tick tick);

}

Regards,
J

nitro · Mar 17, 2013

Has anyone tried extremedb?

http://financial.mcobject.com

nitro · May 16, 2016

It is amazing how much easier and inexpensive this has become since this thread was started.

The answer is Cassandra.