Rounding errors, loss of accuracy, and storing price and time data

abattia · Nov 11, 2011

How do you store price and time data in your database to avoid rounding errors and preserve accuracy?

For prices, I use centicents (so $123.4501 becomes 1234501).
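A minimal sketch of the centicent idea, in Python purely for illustration. The helper names `to_centicents` and `from_centicents` are hypothetical, and the sketch assumes prices arrive as decimal strings with at most four decimal places.

```python
from decimal import Decimal

CENTICENTS_PER_DOLLAR = 10_000  # 1 centicent = 1/100 of a cent


def to_centicents(price_text):
    """Convert a decimal price string to an exact integer count of centicents."""
    scaled = Decimal(price_text) * CENTICENTS_PER_DOLLAR
    # Reject prices that cannot be represented exactly in centicents.
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{price_text} has more than 4 decimal places")
    return int(scaled)


def from_centicents(units):
    """Convert a stored centicent count back to a Decimal dollar amount."""
    return Decimal(units) / CENTICENTS_PER_DOLLAR


stored = to_centicents("123.4501")
print(stored)                   # 1234501
print(from_centicents(stored))  # 123.4501
print(0.1 + 0.2 == 0.3)         # False: binary floats cannot hold most decimal fractions exactly
```

Parsing from text through `Decimal` keeps the value exact all the way into the integer column; going through a binary float first is where the rounding can creep in.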

For time, I currently don't keep greater accuracy than a second, so I just store 11/10/2011 10:01:31 as a string ("20111011") and an integer ("100131").

What do you do?

Is anyone storing sub-second time? Do you store the sub-second part together with the whole-second part, or keep a separate column for it?

Mr_RC · Nov 11, 2011

Well, I store whatever I have to store. If sub-second is given I store sub-second, if I have 10 digits I store 10 digits. KISS

abattia · Nov 11, 2011

Quote from Mr_RC:
...If sub-second is given I store sub-second....

Time, including any subsecond portion, all in one column?

Reason I ask is I've seen nanosecond subsecond data given its own column. I suppose it helps with quicker reading/access of tick data and analysis of events at subsecond level? And I assume helps avoid accidental rounding???

Quote from Mr_RC:
... KISS

Shucks! Thanks!
... but I barely know you

DontMissTheBus · Nov 11, 2011

Isn't there a native time type in your database? Why are you micromanaging the storage of time? Even if you are using HDF5, you can just store it as milliseconds past the Unix/Windows epoch, which is supported as a built-in function in most languages.
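A minimal sketch of the "milliseconds past the epoch" suggestion, in Python for illustration. It assumes the Unix epoch and UTC timestamps; `to_epoch_ms` and `from_epoch_ms` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(dt):
    """Whole milliseconds since the Unix epoch, using integer arithmetic only.

    dt must be timezone-aware; subtracting EPOCH from a naive datetime raises TypeError.
    """
    delta = dt - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000 + delta.microseconds // 1_000


def from_epoch_ms(ms):
    """Rebuild the UTC datetime from a stored millisecond count."""
    return EPOCH + timedelta(milliseconds=ms)


stamp = datetime(2011, 10, 11, 10, 1, 31, 250_000, tzinfo=timezone.utc)
ms = to_epoch_ms(stamp)
print(ms)
print(from_epoch_ms(ms) == stamp)  # True: nothing is lost at millisecond resolution
```

The count is a single integer, so it fits in one 64-bit column and sorts chronologically without any parsing.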

abattia · Nov 11, 2011

Thanks.

I'm in no way pretending to be any sort of expert (as my post doubtless reveals very clearly!). I'm a self-taught C# (MonoDevelop) and SQL Express R2 enthusiast ...

But, if I had very many events occurring in a single second (zillions, say ... like, say, all the order submits, executes, and cancels at all levels for a high-volume instrument on BATS or NASDAQ), and my database held all those events for each day (so many thousands of seconds x zillions of events), and also held years of data (thousands of days x ... etc ... etc) ...

- - ->

... if I then wanted to analyze events that happened each day, say, between 12h00m00 and 12h00m01 across all the days ...

- - ->

... wouldn't the data access process work faster if I had "12h00m00" sitting as an entry for all the data I needed to access? Then it would just be a question of selecting data where the super-second time column had "12h00m00" in each row, and ignoring the rest.

[If I've committed some basic schoolboy error in my understanding of databases, bear with me for long enough to set me straight! Thanks!]

DontMissTheBus · Nov 11, 2011

I think that's a fair question.

You can have two fields: a date field (smalldatetime in MS SQL Server) and a time field (datetime, but use it to store only the time; datetime handles milliseconds). That way, you can query any particular time slice without bothering to do anything with the date.

If you index both, your query efficiency is pretty good.
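A rough sketch of the two-column layout with an index on each column. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, so the types differ from smalldatetime/datetime. The table name `events`, the column names, and the choice to hold time of day as integer nanoseconds since midnight are all assumptions made for illustration.

```python
import sqlite3

NS_PER_SECOND = 1_000_000_000


def tod_ns(hour, minute, second, nanos=0):
    """Time of day as integer nanoseconds since midnight."""
    return (hour * 3600 + minute * 60 + second) * NS_PER_SECOND + nanos


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE events (
        trade_date       TEXT    NOT NULL,
        time_of_day_ns   INTEGER NOT NULL,
        price_centicents INTEGER NOT NULL
    )
    """
)
# One index per column, so date filters and time-slice filters can each use one.
conn.execute("CREATE INDEX idx_events_date ON events (trade_date)")
conn.execute("CREATE INDEX idx_events_tod ON events (time_of_day_ns)")

rows = [
    ("2011-10-10", tod_ns(11, 59, 59, 999_999_999), 1234400),
    ("2011-10-10", tod_ns(12, 0, 0, 5), 1234501),
    ("2011-10-11", tod_ns(12, 0, 0, 999_999_999), 1234600),
    ("2011-10-11", tod_ns(12, 0, 1), 1234700),
]
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)


def events_in_second(connection, hour, minute, second):
    """All events, on any date, that fall inside the given one-second slice."""
    start = tod_ns(hour, minute, second)
    end = start + NS_PER_SECOND  # half-open range [start, end)
    return connection.execute(
        "SELECT trade_date, time_of_day_ns, price_centicents FROM events"
        " WHERE time_of_day_ns >= ? AND time_of_day_ns < ?"
        " ORDER BY trade_date, time_of_day_ns",
        (start, end),
    ).fetchall()


noon = events_in_second(conn, 12, 0, 0)
for row in noon:
    print(row)
```

The range predicate on the time column picks out the 12h00m00 slice across every day without touching the date column, which is the access pattern asked about above. The values are passed as bound parameters rather than formatted into the SQL text.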

SqlDataReader and the .Parameters.AddWithValue method of a SqlCommand object will convert your C# data types into native database types.

DON'T do your own data encoding. It's not worth it and it will degrade performance.

abattia · Nov 16, 2011

Quote from DontMissTheBus:
...You can have two fields: a date field (smalldatetime in MS SQL Server) and a time field (datetime, but use it to store only the time; datetime handles milliseconds)...SqlDataReader and the .Parameters.AddWithValue of a SqlCommand object will convert your C# data types into native database types... DON'T do your own data encoding. It's not worth it and it will degrade performance.

Reading from MSDN (http://msdn.microsoft.com/en-us/library/system.datetime.aspx), in .NET Framework 4 "... Time values are measured in 100-nanosecond units called ticks".

Same for 4.5 and Silverlight ...

Therefore, unless I've misunderstood the foregoing, sub-100-nanosecond timing (i.e. sub-tick) isn't handled in the .NET Framework, and "own coding" is the only way forward. No?
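One way such "own coding" might look, sketched in Python for illustration: keep the whole 100-nanosecond ticks in one integer and the leftover nanoseconds (0 to 99) in another. The names `split_ns` and `join_ns` are hypothetical, and the timestamp is an arbitrary nanosecond count, not real market data.

```python
NS_PER_TICK = 100  # one tick is 100 nanoseconds


def split_ns(timestamp_ns):
    """Split a nanosecond count into (whole ticks, leftover nanoseconds)."""
    return divmod(timestamp_ns, NS_PER_TICK)


def join_ns(ticks, remainder_ns):
    """Recombine the two stored parts into the original nanosecond count."""
    return ticks * NS_PER_TICK + remainder_ns


timestamp_ns = 1_318_327_291_123_456_789
ticks, remainder_ns = split_ns(timestamp_ns)
print(ticks)         # 13183272911234567
print(remainder_ns)  # 89
print(join_ns(ticks, remainder_ns) == timestamp_ns)  # True
```

Storing only the tick count would silently drop the 89 ns remainder; keeping both parts, or a single 64-bit nanosecond integer, preserves it.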

DontMissTheBus · Nov 16, 2011

Are you saying that you want to be able to handle sub-tick time resolution?

In that case, I suppose you have to handle it yourself. But if that's the case, .NET's data types are the least of your problems: the latency in hardware/network/IO -> OS -> APIs -> your code can introduce non-determinism at sub-tick time scales.

That is - under what possible scenario will you ever care about sub-tick time intervals?

abattia · Nov 16, 2011

Quote from DontMissTheBus:
... That is - under what possible scenario will you ever care about sub-tick time intervals?...

I have seen trading events sequenced and timed with nanosecond precision. Order matching simulators need to handle this precision correctly to simulate order queues and order matching. The data behind such a simulator needs to reside in a database.
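To illustrate why the precision matters for queue order, here is a small Python sketch. The `Event` type, its fields, and the sample values are made up for the example; they are not taken from any exchange feed.

```python
from typing import NamedTuple


class Event(NamedTuple):
    ts_ns: int   # nanoseconds since midnight (hypothetical sample values)
    seq: int     # feed sequence number, used as a tie-breaker
    action: str


events = [
    Event(43_200_000_000_150, 2, "cancel"),
    Event(43_200_000_000_110, 1, "submit"),
    Event(43_200_000_000_190, 3, "execute"),
]

# At full nanosecond resolution the replay order is unambiguous.
ordered = sorted(events, key=lambda e: (e.ts_ns, e.seq))
print([e.action for e in ordered])  # ['submit', 'cancel', 'execute']

# Truncated to 100 ns ticks, all three events collapse onto the same timestamp.
tick_keys = {e.ts_ns // 100 for e in events}
print(len(tick_keys))  # 1
```

Once the timestamps collide, only a separately stored sequence number can recover the original order, which is one argument for keeping both.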

DontMissTheBus · Nov 16, 2011

Interesting - but kind of irrelevant, no? (assuming you are not doing ultra-high-frequency trading or writing matching algorithm simulations).

You are unlikely to really encounter either of these two scenarios in real-life trading (again, presuming you aren't doing either of the two above), since (1) your network/OS/IO layer will add so much latency to sub-tick-stamped data that the timestamps are more or less useless on their own, and (2) why would you bother doing anything with order matching? I doubt you are working on high-frequency execution algos.

Finally, if you really have such data (they are probably REALLY expensive), you'd have to parse the custom date-time format anyway. In that case, either just store them as some sort of double-valued nanoseconds since epoch time, or compress them to lower-resolution data that can actually be used.
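One caveat on the double-valued option is worth checking. An IEEE 754 double has a 53-bit significand, so it cannot hold every nanosecond count since 1970 exactly, whereas a signed 64-bit integer can. A quick Python check, using an arbitrary example timestamp (Python's `float` is a double):

```python
ts_ns = 1_318_327_291_123_456_789  # an example nanosecond count since the Unix epoch

as_double = float(ts_ns)

# The round trip through a double does not return the original value.
print(int(as_double) == ts_ns)           # False
print(ts_ns - int(as_double))            # 21 (nanoseconds lost for this value)

# Neighbouring nanoseconds become indistinguishable once stored as doubles.
print(float(ts_ns) == float(ts_ns + 1))  # True

# The same count fits comfortably in a signed 64-bit integer.
print(ts_ns < 2**63)                     # True
```

At this magnitude adjacent doubles are 256 ns apart, so an integer column looks like the safer choice when nanosecond ordering has to survive storage.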
