I think your comparison is slightly unfair, its comparing apples with oranges to look at in-mem dbs on one side and persisted dbs on the other hand. KDB can be run purely persisted or in-memory or as a combination of both. HDF5 on the other hand to my knowledge is a purely persisted db. Obviously stuff queried in memory is a lot faster without going the disk I/O route, even if query logic is not well written.
Hey mate, I use SQL 2008 for storing tick data, using the datetime2 datatype to store tick timestamps down to the microsecond. *** Here is the schema for my tick table - CREATE TABLE [dbo].[TICK]( [tickId] [bigint] IDENTITY(1,1) NOT NULL, [tradeDate] [date] NOT NULL, [securityId] [smallint] NOT NULL, [tickTime] [datetime2](6) NOT NULL, [tickType] [smallint] NOT NULL, [tickQty] [int] NOT NULL, [tickValue] [numeric](12, 6) NOT NULL, CONSTRAINT [PK_TICK] PRIMARY KEY CLUSTERED ( [tickId] ASC ) *** Here is the sql for the index that you need to enable speedy queries on this table CREATE NONCLUSTERED INDEX [IX_TICK] ON [dbo].[TICK] ( [tradeDate] ASC, [securityId] ASC, [tickTime] ASC, [tickType] ASC ) *** Here is the schema for my tick-type table (i.e. best bid, best ask, trade etc) CREATE TABLE [dbo].[TICK_TYPES]( [tickType] [tinyint] NOT NULL, [tickTypeDesc] [varchar](max) NULL, CONSTRAINT [PK_TICK_TYPES] PRIMARY KEY CLUSTERED ( [tickType] ASC ) *** Here is the schema for my securities table CREATE TABLE [dbo].[SECURITY]( [securityId] [smallint] NOT NULL, [symbol] [varchar](20) NOT NULL, [secType] [varchar](10) NOT NULL, [exchange] [varchar](10) NOT NULL, [externalSecurityId] [varchar](30) NULL, [underlyingSecurityId] [smallint] NULL, [currency] [varchar](3) NULL, [varchar](20) NULL, [multiplier] [int] NULL, [minTick] [decimal](6, 6) NULL, [expiryDate] [date] NULL, [expired] AS (case when CONVERT([date],getutcdate(),(0))>=[expiryDate] then CONVERT([bit],(1),0) else CONVERT([bit],(0),0) end), [floorOpenTime] [time](7) NULL, [floorCloseTime] [time](7) NULL, [TimeZone] [varchar](10) NULL, CONSTRAINT [PK_SECURITY] PRIMARY KEY CLUSTERED ( [securityId] ASC ) *** Here is an example query for querying the tick-table select tickTime, tickType, tickQty, tickValue from TICK where tradeDate = @_tradeDate and securityId = @_securityId and tickTime >= @sessionstart and tickTime <= @sessionend and (tickType = @tickType or @tickType is null) order by tickId **** So as you can see from the above, I put tick-data for all securities into the same table, but use an index to speed up my queries. These queries are fast for my backtesting.
You are possibly inefficient - try using: * A DateTime2 for timestamp, no need to have separate fields for date and time. * Possibly get rid of the bigint. Dependinw what you do you could index the tick in the timestamp using a lower granularity than the feed granularity for that. The last one kills a bigint - and that can be a significant saving. Anything else - you could possibly save significant by storing th e price in ticks, encoded in a custom field type using 4 bytes only with full granularity.
mostly agree, though in the end it always comes down to a trade off between size and speed. The more you "compress" data (such as pricing data) the more you need to cast and convert later after reading the data. (either yourself or on the server side).
Fully agree. Just last time I was loading my tick data it was 50 gigabyte per trading day Given the HUGH amount of storage needed agains the ease of having more cores on the database server (which, usual in SQL, are mostly doing nothing unless you aggregate etc.) this is a VERY easy guess to make
Hey mate, thx for the suggestions. I know the seperate date field is reduntant, however it speeds up the index when querying by date. In terms of using the timestamp as the index, these timestamps are provided by my data-provider and sometimes updates from the exchange will come in with the same timestamp for a few ticks, thus making it non-unique.
fair point. There are couple savings to be had. On the fx side one deals with quotes anyway, no reported trades making things a lot easier. I store DateTime as long (8bit) as DateTimeTicks. bids can be stored as 4bit offset, offer as offset from bid. Anything else, imho, will degrade the speed-size trade off. Similar things can be done on other asset classes but it gets slightly more complex because one needs to differentiate between quote updates and trades. However, even a full order book can be stored and retrieved very conveniently using offsets.
Regarding the Op's initial point of running on-the-fly queries to the database whilst the strategy is running, I question the necessity/relevance of doing this. For my own strategies, I build underlying tick-data into in memory time-series on the fly, with a maximum length after which data is dropped off. This is adequate for building indicators on various time lengths. For instance, I might create a bar-chain of 1 second bars, chained for 3 minutes. Or I might create a bar-chain of 30 second bars, chained for 60 minutes. 1 second bars grow from tick data, 5 second bars grow from 1 second bars, 30 second bars grow from 5 second bars and so forth. Bars have properties like Open, High, Low, Close, Volume, well in fact here is my bar structure: public abstract class IBar { protected double _sumPrice; protected Tick _prevTradeTick; protected Tick _prevBidTick, _prevAskTick; public DateTime StartTime, EndTime; public double Open, High, Low, Close; public double TotalPriceChange, PositivePriceChange, NegativePriceChange; public short BarLength; public long Volume, VolumeBid, VolumeAsk, TradeChangeFreq, TimeDown, TimeUp, TimeFlat, TimeOnBid, TimeOnAsk; public long BearQtyChange, BullQtyChange; public Security Sec; public double TypicalPrice() { return (High + Low + Close) / 3.0; } public double WeightedClosePrice() { return (High + Low + Close + Close) / 4.0; } public double AveragePrice() { if (Volume == 0) return Volume; return _sumPrice / (double)Volume; } public abstract int UniquePriceCount(); public IBar(Security sec, short barLength) { Sec = sec; BarLength = barLength; } public abstract void AddTick(Tick tick); } Regards, J
It is amazing how much easier and inexpensive this has become since this thread was started. The answer is Cassandra.