Rounding errors, loss of accuracy, and storing price and time data

Discussion in 'Data Sets and Feeds' started by abattia, Nov 11, 2011.

  1. How do you store price and time data in your database to avoid rounding errors and preserve accuracy:

    For prices, I use centicents (so $123.4501 becomes 1234501).

    For time, I currently don’t keep greater accuracy than second, so I just store 11/10/2011 10:01:31 as a string (“20111011”) and an integer (“100131”).

    What do you do?

    Anyone storing sub second time? Do you store the sub second bit with the super second bit? Or have a separate column for this?
     
  2. Mr_RC

    Mr_RC

    Well, I store whatever I have to store. If sub-second is given I store sub-second, if I have 10 digits I store 10 digits. KISS
     
  3. Time, including any subsecond portion, all in one column?

    Reason I ask is I've seen nanosecond subsecond data given its own column. I suppose it helps with quicker reading/access of tick data and analysis of events at subsecond level? And I assume helps avoid accidental rounding???

    Schucks! Thanks!
    ... but I barely know you
     
  4. Isn't a native time type in your database? Why are you micromanaging the storage of time? Even if you are using hdf5, you can just store it as miliseconds past unix/windows epoch - which is supposed as a builtin function in most languages.

     
  5. Thanks.

    I’m in no way pretending to be any sort of expert (as my post doubtless reveals very clearly!). I’m a self taught C# (monodevelop) and SQL Express R2 enthusiast …

    But, if I had very many events occurring in a single second (zillions, say … like say all the order submits, executes, cancels at all levels for a high volume instrument on BATS or NASDAQ), and my database held all those events for each day (so many thousands of seconds X zillions of events), and also held years of data (thousands of days X …. etc … etc)…

    - - ->

    … if I then wanted to analyze events that happened each day say between 12h00m00 and 12h00m01 across all the days …

    - - ->

    … wouldn’t the data access process work faster if I had “12h00m00” sitting as an entry for all the data I needed to access. Then it would just be a question of selecting data where the super second time column had “12h00m00” in each row, and ignoring the rest?

    [If I’ve committed some basic school boy error in my understanding of databases, bare with me for a few more moments for enough time to set me straight! Thanks!]
     
  6. I think that's a fair question.

    You can have two fields: a date field (smalldatetime in ms sql server), and a time field (datetime, but use store the time - datetime handles milliseconds). That way, you can query any particular time slice without bothering to do anything with the date.

    If you index both, your query efficiency is pretty good.

    SqlDataReader and the .Parameters.AddWithValue of a SqlCommand object will convert your c# data type into native database types.

    DON'T do your own data encoding. It's not worth it and it will degrade performance.

     
  7. Reading from MSDN (http://msdn.microsoft.com/en-us/library/system.datetime.aspx), in .NET Framework 4 “… Time values are measured in 100-nanosecond units called ticks”.

    Same for 4.5 and Silverlight …

    Therefore, unless I’ve misunderstood the foregoing, sub-100 nanosecond timing (i.e. sub-tick) isn’t handled in .Net Framework, and “own coding” is the only way forward. No?
     
  8. Are you saying that you want to be able to handle sub-tick time resolution?

    In that case, I suppose you have to handle it yourself - but if that's the case - .Net's datatypes is the least of your problems: the latency in hardware/network/IO->OS->APIs->your code can introduce non-determinism in sub-tick time.

    That is - under what possible scenario will you ever care about sub-tick time intervals?

     
  9. I have seen trading events sequenced and timed with nanosecond precision. Order matching simulators need to handle this precision correctly to simulate order queues and order matching. The data behind such a simulator needs to reside in a database.
     
  10. Interesting - but kind of irrelevant, no? (assuming you are not doing ultra-high-frequency trading or writing matching algorithm simulations).

    Since you are unlikely really encounter either of these two scenarios in real life trading (again, presuming you aren't doing any of the two above) since (1) your network/os/io layer will create such large latency in sub-tick stamped data such that they are more or less useless on their own, and (2) why would you bother doing anything with order matching - doubt you are working on high freq execution algos.

    Finally, if you really have such data (they are probably REALLY expensive), you'd either have to parse the custom date-time format anyway - in which case, just store them as some sort of double valued nano-seconds since epoch time, or compress them to lower resolution data that can actually be used.

     
    #10     Nov 16, 2011