Time series DB?

Discussion in 'Programming' started by sle, Dec 24, 2017.

  1. 2rosy

    The issues with HDF5 were file size, threading, and corrupted files. The team moved from HDF5 to kdb. As someone mentioned regarding columnar vs. row databases: find a free or cheap column DB and just load everything into it. Forget all these individual files.
     
    #41     Dec 27, 2017
  2. sle

    Kdb+ or a similar SunGard product are relatively easy choices, since the firm already uses them. Cost-wise, I’d be piggybacking on firm-wide licenses, and there are experts in-house. Of course, the flip side is that I have to learn a brand new product and programming language (I have touched kdb but never used it seriously).
     
    #42     Dec 27, 2017
    temnik likes this.
  3. temnik

    The firm uses kdb already? Get on board! Kdb's language, q, hurts a lot at first, like you are getting a lobotomy, but there's definitely a state of bliss afterwards.

    Is the "firm" a bank or a prop?
     
    #43     Dec 27, 2017
  4. sle

    That was my main concern, to be honest.

    The “firm” is a hedge fund.
     
    #44     Dec 27, 2017
  5. Would hate to threadjack @sle, so let's not get off on a huge tangent, but since so many knowledgeable members have commented, I wanted to ask if anyone would be willing to weigh in on the viability of the approach I suggested. Keep in mind this is only applicable to sequential replay of messages, and especially to scaling horizontally across many market data handlers (Kafka consumers) and feeding them the same data concurrently. A parameter sweep, for example.

    I realize it's not going to be ideal for querying like "SELECT * FROM trades WHERE size < 5." I believe that would be referred to as "random seek" in this context (please feel free to correct me).

    The idea also hinges, of course, on having a continuous delivery pipeline, so that you can deploy the Kafka consumers which implement your market data handler.

    How do you think this would perform vs using one really beefy host and a local timeseries database? Am I just stringing fancy technologies together hoping they amount to some gestalt?
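
    To make the replay idea concrete, here is a minimal sketch of a parameter sweep over one recorded stream. It is an in-memory stand-in, not Kafka: `ReplayLog`, `ThresholdHandler` and `sweep` are hypothetical names, and the handlers run one after another here, whereas real consumers in separate consumer groups would read the same topic concurrently.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass(frozen=True)
class Tick:
    ts: int        # event timestamp (any monotonically increasing unit)
    price: float
    size: int


class ReplayLog:
    """In-memory stand-in for a single-partition, append-only topic (assumption)."""

    def __init__(self) -> None:
        self._messages: List[Tick] = []

    def append(self, tick: Tick) -> None:
        self._messages.append(tick)

    def read_from(self, offset: int = 0) -> Iterator[Tick]:
        # Every call gets its own cursor, so each handler sees the full stream.
        return iter(self._messages[offset:])


class ThresholdHandler:
    """Hypothetical market data handler: counts ticks at or above a size threshold."""

    def __init__(self, min_size: int) -> None:
        self.min_size = min_size
        self.hits = 0

    def on_tick(self, tick: Tick) -> None:
        if tick.size >= self.min_size:
            self.hits += 1


def sweep(log: ReplayLog, thresholds: List[int]) -> Dict[int, int]:
    """Replay the same log from offset 0 once per parameter value."""
    results: Dict[int, int] = {}
    for threshold in thresholds:
        handler = ThresholdHandler(threshold)
        for tick in log.read_from(0):
            handler.on_tick(tick)
        results[threshold] = handler.hits
    return results


log = ReplayLog()
for i, size in enumerate([1, 3, 5, 8, 2, 13]):
    log.append(Tick(ts=i, price=100.0 + i, size=size))

sweep_results = sweep(log, [2, 5, 10])
print(sweep_results)
```

    The point of the design is that no handler ever seeks or filters on the storage side; each one just reads the log front to back with its own offset.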
     
    Last edited: Dec 28, 2017
    #45     Dec 28, 2017
  6. Simples

    It sounds pretty straightforward and doable, however:
    1) I personally would like to avoid Java if I can, so I would only consider Kafka for its intended usage: configurable high throughput and/or high parallelism, maxing out network bandwidth. If it's all on one box, Kafka is clearly overkill. The network won't be a bottleneck on a single machine.
    2) HBase, while neat and probably very fast for its intended use, could also be overkill for this job.
    3) It's one thing to try it out, and you can, just to learn from it. It's another thing to maintain it for years and adapt it as your own solution keeps developing. Both can require a lot of time, work and cost.

    Some of it depends on who will do what and what qualities the solution should have.
    Delaying choices and keeping options open, while not sexy-sounding, is often better than trying to figure it all out when you know the least: at the start.
    Though if you know you're going to need such scalability, it starts making sense to test its capabilities early.
     
    #46     Dec 28, 2017
    sle likes this.
  7. T0pH4t

    My 2cents:
    I was using InfluxDB for my market data a while back, and I switched off of it for performance reasons. It has an awesome query language for what I needed, but its ingest/read times were not meeting my requirements. I then switched over to RocksDB and it's been solid (I got an entire order of magnitude in performance improvement). RocksDB is not the solution for everyone though, since it's just a key/value store. You have to write the query code yourself.

    I guess it really comes down to what your SLA is on read/write latency, as well as how much you don't mind building yourself.
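
    As a rough illustration of what "writing the query code yourself" means on an ordered key/value store: the usual trick is to encode the timestamp big-endian in the key, so that byte order equals time order and a time-range query becomes a seek plus a forward scan. The sketch below is an assumption-laden stand-in, not RocksDB: `SortedKV` is a hypothetical in-memory class that mimics ordered iteration, and `make_key` / `query_range` are made-up helper names.

```python
import bisect
import struct
from typing import Iterator, List, Tuple


def make_key(symbol: str, ts_ns: int) -> bytes:
    # Big-endian unsigned 64-bit timestamp: lexicographic order == time order.
    return symbol.encode("ascii") + b"|" + struct.pack(">Q", ts_ns)


class SortedKV:
    """Hypothetical in-memory ordered key/value store (stand-in for a real one)."""

    def __init__(self) -> None:
        self._keys: List[bytes] = []
        self._values: List[bytes] = []

    def put(self, key: bytes, value: bytes) -> None:
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value          # overwrite existing key
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def scan(self, start: bytes, end: bytes) -> Iterator[Tuple[bytes, bytes]]:
        # Seek to the first key >= start, then walk forward until key >= end.
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < end:
            yield self._keys[i], self._values[i]
            i += 1


def query_range(db: SortedKV, symbol: str, start_ns: int, end_ns: int) -> List[Tuple[int, float]]:
    """Return (timestamp, price) pairs for symbol with start_ns <= ts < end_ns."""
    out: List[Tuple[int, float]] = []
    for key, value in db.scan(make_key(symbol, start_ns), make_key(symbol, end_ns)):
        ts_ns = struct.unpack(">Q", key[-8:])[0]
        price = struct.unpack(">d", value)[0]
        out.append((ts_ns, price))
    return out


db = SortedKV()
for ts, price in [(30, 101.5), (10, 100.0), (20, 100.5)]:
    db.put(make_key("ES", ts), struct.pack(">d", price))
db.put(make_key("NQ", 15), struct.pack(">d", 6400.0))

es_prices = query_range(db, "ES", 10, 30)
print(es_prices)
```

    Anything beyond range scans (filters on value, aggregations, joins) is more code you own, which is the trade-off against a DB with a real query language.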
     
    #47     Jan 14, 2018
    Simples and sle like this.
  8. T0pH4t

    T0pH4t

    #48     Jan 14, 2018
  9. i960

    Honestly, time series DBs aren't really all that special. They're either columnar or row-based, and they offer either a time-series-specific query language or something generic. At a more specific level, they may be oriented towards financial data. All may or may not support replication, transactions, clustering, etc.
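
    For anyone unsure what the columnar vs. row distinction amounts to, here is a toy sketch. The data and function names are made up for illustration; a real column store adds compression, on-disk layout, vectorized execution and so on. The same filter ("trades with size below some limit") touches every field of every record in the row layout, but only two contiguous arrays in the column layout.

```python
from array import array
from typing import Dict, List

# Row layout: one record per trade, all fields stored together.
rows: List[Dict[str, float]] = [
    {"ts": 1, "price": 100.0, "size": 3},
    {"ts": 2, "price": 100.5, "size": 10},
    {"ts": 3, "price": 100.25, "size": 4},
    {"ts": 4, "price": 101.0, "size": 7},
]

# Column layout: one contiguous typed array per field.
columns = {
    "ts": array("q", [1, 2, 3, 4]),
    "price": array("d", [100.0, 100.5, 100.25, 101.0]),
    "size": array("q", [3, 10, 4, 7]),
}


def small_trades_rows(data: List[Dict[str, float]], max_size: int) -> List[int]:
    # Has to walk whole records even though only two fields are needed.
    return [int(r["ts"]) for r in data if r["size"] < max_size]


def small_trades_columns(data: Dict[str, array], max_size: int) -> List[int]:
    # Reads only the "ts" and "size" arrays; "price" is never touched.
    return [ts for ts, size in zip(data["ts"], data["size"]) if size < max_size]


row_result = small_trades_rows(rows, 5)
col_result = small_trades_columns(columns, 5)
print(row_result, col_result)
```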

    We won't really get anything that truly covers everyone's bases in a robust fashion until we get a full open-source project with multiple contributors. Until then, it's going to be commercial entities trying to lock things down to their specific products.

    Also, most benchmarks are rigged or not general purpose as the problem domain really isn't anything new - yet developers keep acting as if they've rediscovered the wheel.
     
    #49     Jan 14, 2018
    T0pH4t likes this.
  10. Simples

    "The wheel" is uninteresting, really. Commercial / enterprise software looks good until you're outside its intended scope. You can program a "wheel" in a couple of days for simple usage, and often that can be much better for R&D efforts, or for tackling new kinds of problems, than shoehorning fresh ideas into old garments. About the only times "the wheel" becomes truly valuable on its own are when it has been a success for a long time and there's not much need for fresh starts and innovation, or when it truly fulfills such a role already (rare). So you can make any software, commercial or open source, but it can't truly bend the law of gravity: it is bound by its scope and by a limited set of requirements, features, qualities and complexity.

    It's possible to give up and just make what you need yourself, freeing yourself of others' ideas and implementations. It's a long road, but you get what you make yourself, and you at least have a chance to test out your dreams.

    I can understand adopting lots of DBs and scaling in every direction, paying to make annoyances and limitations go away, but they're never a necessary starting point for custom coding, and they introduce accidental complexity too early. It also depends on the use case: who will build it, use it, and maintain or evolve it further. So this is just to illustrate the natural way, not the only way.

    People never started great works by building cathedrals right off the bat.
    Someone's cathedral might be another's dinosaur.
     
    #50     Jan 15, 2018