Haha thanks, but it's the worst because there are so many possibilities for bugs. It's completely uncommented and extremely fragile.
And I assume you are using some kind of 'master' file to store symbol-specific information like file name / location, general symbol data like exchange, and other 'contract specs' in the case of futures data, for example?
Thanks for asking, good question. Files are indexed by symbol name (and by date for intraday data). All other symbol info besides quote data, including company profile and fundamentals, is also stored in data files. In my system, daily data are divided into per-symbol files, and minute data are divided into per-symbol, per-day files. Data files are stored in a high-performance file system (usually a hash table is used to locate a file in the directory structure). Active data are also cached in RAM. As an example, retrieving basic info for a list of 1000 symbols takes less than 0.1 seconds. I have tools to conveniently update the binary data files, adding new data points daily and/or in real time.
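For anyone wanting to try a layout like this, here is a rough Python sketch of the "hash to locate a file" idea. It is not the poster's actual code: the function name `data_file_path`, the two-level shard directories, the SHA-256 hash and the `.daily.bin` / `.minute.bin` file names are all assumptions made up for illustration.

```python
import hashlib
from pathlib import Path
from typing import Optional


def data_file_path(root: str, symbol: str, day: Optional[str] = None) -> Path:
    """Map a symbol (and optionally a day) to a file path.

    Hypothetical layout: the first four hex digits of a hash of the symbol
    pick two nested shard directories, so no single directory ends up
    holding thousands of files.
    """
    digest = hashlib.sha256(symbol.encode("utf-8")).hexdigest()
    shard = Path(root) / digest[:2] / digest[2:4]
    if day is None:
        # Daily data: one file per symbol.
        return shard / f"{symbol}.daily.bin"
    # Minute data: one file per symbol per day.
    return shard / symbol / f"{day}.minute.bin"


daily = data_file_path("/data", "AAPL")
minute = data_file_path("/data", "AAPL", "2018-01-02")
```

Because the path is computed from the symbol alone, a lookup needs no separate master index file; the trade-off is that listing all symbols means walking the shard directories.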
I use csv files. My data files are rather small because I only use daily OHLC price data for the last 2~3 years.
I'm thinking I want to do this as well, not to actually use it directly, but so I can transform it later. Why did you choose rocksdb vs CSV or pgsql or SQLite?
I chose RocksDB because it's a simple key/value store optimized for append operations, and I don't need to modify old data. For my access patterns, which are scans for backtesting, it's the best choice. Key/value and columnar stores are generally better suited to this type of work than relational databases (e.g. PgSQL, SQLite). RocksDB is very low level and is not for the novice. Higher-level stores that could be used include InfluxDB (which I have moved off of for performance reasons) or kdb+, for example. I have years of raw tick data, so this works best for me. If you are not storing data with a granularity finer than 1 minute, then any relational DB will be fine.
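To make the append-and-scan access pattern concrete, here is a small Python sketch. It does not use RocksDB at all: a sorted in-memory list stands in for an ordered key/value store, and the key layout (symbol, a NUL byte, then an 8-byte big-endian timestamp), the `TickStore` class and the value encoding are assumptions for illustration only. The point is that with byte-ordered keys, a time-range query for one symbol becomes a single contiguous scan.

```python
import bisect
import struct


def make_key(symbol: str, ts_ns: int) -> bytes:
    # Big-endian timestamp so that byte order matches time order.
    return symbol.encode("ascii") + b"\x00" + struct.pack(">Q", ts_ns)


class TickStore:
    """Toy ordered key/value store (stand-in for a real engine)."""

    def __init__(self):
        self._keys = []
        self._values = []

    def append(self, symbol, ts_ns, price, size):
        key = make_key(symbol, ts_ns)
        i = bisect.bisect_right(self._keys, key)
        self._keys.insert(i, key)
        self._values.insert(i, struct.pack(">dI", price, size))

    def scan(self, symbol, start_ns, end_ns):
        # Yield (ts, price, size) for start_ns <= ts < end_ns, in time order.
        lo = bisect.bisect_left(self._keys, make_key(symbol, start_ns))
        hi = bisect.bisect_left(self._keys, make_key(symbol, end_ns))
        for key, value in zip(self._keys[lo:hi], self._values[lo:hi]):
            ts = struct.unpack(">Q", key[-8:])[0]
            price, size = struct.unpack(">dI", value)
            yield ts, price, size


store = TickStore()
store.append("AAPL", 3, 101.5, 10)
store.append("AAPL", 1, 100.0, 5)
store.append("AA", 2, 50.0, 1)
store.append("AAPL", 2, 100.5, 7)
ticks = list(store.scan("AAPL", 1, 3))
```

In a real engine the same key design would be paired with its own iterator/seek calls; nothing here ever rewrites an existing record, which is the append-only property the post relies on.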
Maybe take a look at this thread. I posted a couple of times and there is a lot of good info in it. https://www.elitetrader.com/et/threads/time-series-db.316394/