Agreed. It's just easier to import one of those libraries, since we'll eventually need them for extra calculations. While it's true that we won't need them for basic statistics, those are just the first steps of a program, and the averages will later feed into more complex calculations, normally done in pandas or NumPy.
The example also doesn't really show how awesome pandas is. How about assigning each value in the list to a successive business day, then calculating the mean between 10/10/23 and 10/16/23? Then plotting the values over that same window? All just so easy to do and easy to understand.

```python
import pandas as pd

data = [12, 45, 67, 23, 41, 89, 34, 54, 21]
dates = pd.date_range(start="10/5/2023", periods=len(data), freq='B')  # 'B' = business days
df = pd.DataFrame(data, columns=['Price'], index=dates)

df.loc['2023-10-10':'2023-10-16', 'Price'].mean()
df.loc['2023-10-10':'2023-10-16', 'Price'].plot()
```
NumPy is widely used and a fantastic tool. Pandas is fast losing popularity, though: even after moving to Arrow, it's slow and a massive memory hog.
I remember you use duckdb, right? Mostly for data storage? I find polars very performant for quick analytics. Clickhouse for data storage and transformations/aggregations.
Yes, migrated to DuckDB for persistence and Polars for analysis, although the line isn't clear. Technically I should be able to stay with just DuckDB but I'm tired of rewriting stuff. Really like DuckDB being file based and not a server. Since I'm integrating it with a GUI app, making and restoring backups is just done with a copy, very convenient. Being the noob that I am, it's 2023 and I'm only now getting familiar with SQL.
File-based is definitely convenient. I can't count the hours I've spent setting up retention, backup, and migration scripts and policies for several databases. I wonder how good DuckDB's compression is on large datasets of tick and OHLC data.
A quick thanks to @M.W. and @d08. This year, I've been updating my dev skills, ... Things to install this weekend: Polars and DuckDB!! A thank you in the current ET environment? Yup.
"When I wrote this code, only God and I understood what it did. Now, only God knows." ~~ Anonymous
I haven't seen any comprehensive comparisons. If you decide to compare, it would be very interesting to see the results.
I actually ignored DuckDB for a few years because of SQL, a completely irrational move on my part. I am finding DuckDB to basically be the greatest thing since sliced bread. It would be surprising if DuckDB had any issue with market data; every test I have seen shows it is blazing fast. For my use case, the speed is a non-issue, since I am just not that high frequency. The way I would test it, though, would be to make a synthetic dataset that is 2x-10x larger than what I think I will actually need and see how the performance is for my use case.
sqlite3? SQLite is the most used database engine in the world.
https://www.geeksforgeeks.org/python-pandas-dataframe/
https://www.slingacademy.com/article/pandas-how-to-store-a-dataframe-in-a-sqlite-table/
https://fessorpro.com/blogs/c/pytho...ython-sqlite3-database-for-storing-stock-data
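For completeness, a sketch of the pandas-to-SQLite round trip those links describe (the table name and data are made up):

```python
import sqlite3
import pandas as pd

df = pd.DataFrame({
    "date": ["2023-10-10", "2023-10-11"],
    "close": [23.0, 41.0],
})

con = sqlite3.connect(":memory:")  # use a file path for real persistence

# to_sql writes the frame into a SQLite table; read_sql pulls it back out
df.to_sql("prices", con, index=False, if_exists="replace")
out = pd.read_sql("SELECT * FROM prices ORDER BY date", con)
```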