Datastore design recommendations

Discussion in 'Automated Trading' started by Clark Bruno, Jan 24, 2022.

  1. M.W.


    Group-bys and joins favor databases that don't rely on segregated tables, especially single-file ones such as embedded databases. It wasn't really clear to me what DuckDB actually is when I last checked and ran some tests with it inside my host app. Is it based on RDBMS concepts? Kind of. OLAP? Yes, but also not quite. I got the impression it tries to be a bit of everything, with a focus on ease of use. It's certainly not optimized for time series. The tests described in the link look very biased to me, favoring small-footprint storage technology that is integrated directly into the host application. That is a completely different use case from the one applications like ClickHouse or TimeBase are targeting, and certainly not one optimized for financial data or time series.

    TimeBase is open source and so are all its client libraries. So is ClickHouse.

     
    Last edited: Jul 12, 2023
    #51     Jul 12, 2023
  2. d08


    Neither is ClickHouse, which also tries to be general purpose, apart from transactions.

    TimeBase CE seems to be open source. I cannot find any benchmarks for it, so thus far we're just theorising about its performance. There also seems to be a general lack of information on it, which makes getting stuff done more of a challenge.
    TimescaleDB is another solution intended for time series data, but it seems to be slower than ClickHouse and DuckDB, so specialisation isn't always the answer.
     
    #52     Jul 12, 2023
  3. M.W.


    I have used ClickHouse extensively for time series tasks. In some cases it is faster than even kdb. At the moment ClickHouse is by far the most performant data store for time-series-focused applications in the open source space. I have spent a lot of time profiling and trying tons of different databases. If DuckDB suits your particular use case then great. But I compared DuckDB and ClickHouse on appending, inserting, and prepending new data points into existing data, generating bars from ticks on the fly, loading time series extracts, building windowed slices, and picking data points at fixed strides. In every case ClickHouse blew DuckDB out of the water. No other open source columnar data store could compete either. (Obviously one must compare apples with apples: you can't compare the performance of an in-process-only DB with one that has to fetch data from SSD/NVMe. Though even here ClickHouse outperformed DuckDB on the above-mentioned queries and writes.)
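For readers unsure what those workloads look like, here is a minimal pure-Python sketch of three of them: bars from ticks, windowed slices, and fixed-stride picks. It is not the benchmark used above and says nothing about either database's speed. The function names, the `(timestamp, price, size)` tick layout, and the sample data are all assumptions made for illustration.

```python
from bisect import bisect_left


def ticks_to_bars(ticks, bar_seconds):
    """Aggregate time-sorted (epoch_seconds, price, size) ticks into
    (bar_start, open, high, low, close, volume) tuples."""
    bars = {}  # dicts keep insertion order, so bars stay time-sorted
    for ts, price, size in ticks:
        start = ts - ts % bar_seconds  # floor the timestamp to its bar
        bar = bars.get(start)
        if bar is None:
            # first tick of the bar sets open, high, low and close
            bars[start] = [price, price, price, price, size]
        else:
            bar[1] = max(bar[1], price)  # high
            bar[2] = min(bar[2], price)  # low
            bar[3] = price               # close is the latest tick
            bar[4] += size               # volume
    return [(start, *values) for start, values in bars.items()]


def window_slice(ticks, start, end):
    """Return the ticks with start <= timestamp < end (ticks time-sorted)."""
    stamps = [t[0] for t in ticks]
    return ticks[bisect_left(stamps, start):bisect_left(stamps, end)]


def fixed_stride(rows, stride, offset=0):
    """Pick every stride-th row, beginning at offset."""
    return rows[offset::stride]


# Hypothetical sample data: (epoch_seconds, price, size)
sample_ticks = [
    (0, 100.0, 1),
    (20, 101.5, 2),
    (59, 99.0, 1),
    (60, 99.5, 3),
    (130, 102.0, 1),
]

if __name__ == "__main__":
    print(ticks_to_bars(sample_ticks, 60))
    print(window_slice(sample_ticks, 20, 60))
    print(fixed_stride(sample_ticks, 2))
```

In a database these would be a time-bucketed GROUP BY, a range scan on the timestamp, and a sampled read, which is why their cost depends so heavily on how the store lays data out on disk.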

    I never made any performance claims re TimeBase, and you are right: a huge downside is the lack of a mature community. It is very hard to solve certain issues with only the API documentation and a few samples.



     
    Last edited: Jul 12, 2023
    #53     Jul 12, 2023
  4. d08


    DuckDB is in-process but can operate on persistent data, not just in-memory. But whatever works for you. I'm sure many others and I would like to see your comparison in more detail.
     
    #54     Jul 12, 2023
  5. M.W.


    I am sure a lot of people would love to see fairer performance benchmarks across the board. Joins and group-bys alone are not a comprehensive benchmark comparison, I bet you would agree.

     
    #55     Jul 13, 2023
  6. lanty


    It seems like Groupbys and joins thrive on databases that integrate well, like embedded ones. DuckDB appears to aim for versatility rather than specializing, which might not suit all use cases. TimeBase and ClickHouse offer open-source solutions tailored for specific needs.
     
    #56     Apr 21, 2024
  7. Considering your specific case, it might be advisable to create separate tables for each symbol within an asset class. This approach allows for easier management and more targeted queries, especially as your dataset grows. However, performance-wise, Clickhouse can handle the scale you mentioned without significant issues. For more insights and best practices, you might consider consulting with a UI/UX design firm. While they specialize in user experience, they often have expertise in data organization and architecture as well, providing valuable perspectives on structuring your database effectively.
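To make the table-per-symbol suggestion concrete, here is a rough sketch that puts it next to the usual alternative, a single table with a symbol column. It uses Python's built-in `sqlite3` purely as a stand-in. It is not ClickHouse, and it says nothing about how either layout performs there. The table names, column names, and sample rows are hypothetical.

```python
import sqlite3


def build_layouts(rows):
    """Load (symbol, ts, price) rows into both layouts in one in-memory DB."""
    conn = sqlite3.connect(":memory:")

    # Layout A: a single table, with symbol as a leading index column
    conn.execute("CREATE TABLE ticks (symbol TEXT, ts INTEGER, price REAL)")
    conn.execute("CREATE INDEX ticks_symbol_ts ON ticks (symbol, ts)")
    conn.executemany("INSERT INTO ticks VALUES (?, ?, ?)", rows)

    # Layout B: one table per symbol
    for symbol in sorted({r[0] for r in rows}):
        table = _table_for(symbol)
        conn.execute(f"CREATE TABLE {table} (ts INTEGER, price REAL)")
        conn.executemany(
            f"INSERT INTO {table} VALUES (?, ?)",
            [(ts, price) for s, ts, price in rows if s == symbol],
        )
    return conn


def _table_for(symbol):
    # Table names cannot be bound as query parameters, so validate them
    if not symbol.isalnum():
        raise ValueError(f"unsupported symbol: {symbol!r}")
    return f"ticks_{symbol}"


def prices_single_table(conn, symbol):
    """Layout A: the symbol is an ordinary bound parameter."""
    query = "SELECT ts, price FROM ticks WHERE symbol = ? ORDER BY ts"
    return conn.execute(query, (symbol,)).fetchall()


def prices_per_symbol_table(conn, symbol):
    """Layout B: the symbol selects which table to read."""
    query = f"SELECT ts, price FROM {_table_for(symbol)} ORDER BY ts"
    return conn.execute(query).fetchall()


def counts_across_symbols(conn):
    """Cross-symbol questions are one statement in layout A."""
    query = "SELECT symbol, COUNT(*) FROM ticks GROUP BY symbol ORDER BY symbol"
    return conn.execute(query).fetchall()


# Hypothetical sample rows: (symbol, ts, price)
sample_rows = [("AAPL", 1, 190.0), ("MSFT", 1, 410.0), ("AAPL", 2, 190.5)]

if __name__ == "__main__":
    conn = build_layouts(sample_rows)
    print(prices_single_table(conn, "AAPL"))
    print(prices_per_symbol_table(conn, "AAPL"))
    print(counts_across_symbols(conn))
```

Both layouts return the same rows for a single symbol. The difference shows up in maintenance and in cross-symbol queries, which in the per-symbol layout need one statement per table or a union over all of them.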
     
    Last edited by a moderator: Jun 28, 2024
    #57     Apr 24, 2024