Data Analysis with Python: Calculating Mean and Median

Discussion in 'App Development' started by Grinda21, Oct 4, 2023.

  1. Agreed. It is just easier to import one of those libraries as we will eventually need them for extra calculations.

    Whilst it is true that we won't need them for basic statistics, those are just the first steps of a program, and the averages will later be used in more complex calculations, normally done in pandas or numpy.
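    Just as a minimal sketch of how little it takes with numpy (assuming the same sample list used in the example below):

    import numpy as np

    data = [12, 45, 67, 23, 41, 89, 34, 54, 21]

    print(np.mean(data))    # arithmetic mean, ~42.89
    print(np.median(data))  # middle value of the sorted list, 41.0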
     
    #11     Oct 5, 2023
    GlobalMacro90 likes this.
  2. The example also doesn't really show how awesome Pandas is. How about taking each value in the list and assigning it to a business day in the future starting today, then calculating the mean between 10-10-23 and 10-16-23? Then plotting the values between 10/10 and 10/16? All so easy to write and easy to understand.

    import matplotlib.pyplot as plt
    import pandas as pd

    data = [12, 45, 67, 23, 41, 89, 34, 54, 21]

    # Map each value onto consecutive business days starting 10/5/2023
    dates = pd.date_range(start="10/5/2023", periods=len(data), freq='B')
    df = pd.DataFrame(data, columns=['Price'], index=dates)

    # Mean of the prices between 10/10 and 10/16, then a quick plot of the same slice
    print(df.loc['2023-10-10':'2023-10-16', 'Price'].mean())
    df.loc['2023-10-10':'2023-10-16', 'Price'].plot()
    plt.show()

     
    #12     Oct 5, 2023
    MarkBrown and GlobalMacro90 like this.
  3. d08

    Numpy is widely used and a fantastic tool. Pandas is fast losing popularity though; even after moving to Arrow, it's slow and a massive memory hog.
     
    #13     Nov 9, 2023
    MarkBrown likes this.
  4. M.W.

    I remember you use DuckDB, right? Mostly for data storage? I find Polars very performant for quick analytics, and ClickHouse for data storage and transformations/aggregations.
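    As a rough illustration of what I mean by quick analytics, a minimal Polars sketch (same sample list as the pandas example above, purely illustrative):

    import polars as pl

    data = [12, 45, 67, 23, 41, 89, 34, 54, 21]
    df = pl.DataFrame({"price": data})

    # Mean and median via Polars expressions; aliases keep the output columns distinct
    print(df.select(
        pl.col("price").mean().alias("mean"),
        pl.col("price").median().alias("median"),
    ))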

     
    #14     Nov 9, 2023
  5. d08

    Yes, migrated to DuckDB for persistence and Polars for analysis, although the line between them isn't clear-cut. Technically I should be able to stay with just DuckDB, but I'm tired of rewriting stuff.
    I really like DuckDB being file-based rather than a server. Since I'm integrating it with a GUI app, making and restoring backups is just a file copy, which is very convenient.
    Being the noob that I am, it's 2023 and I'm only now getting familiar with SQL.
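    For anyone curious what the file-based part looks like, a minimal sketch (table and file names are just illustrative, not my actual setup):

    import duckdb

    # The whole database lives in a single file on disk, so a backup is just a copy of it
    con = duckdb.connect("prices.duckdb")
    con.execute("CREATE TABLE IF NOT EXISTS prices (price DOUBLE)")
    con.execute("INSERT INTO prices VALUES (12), (45), (67), (23), (41)")

    # Mean and median straight from SQL
    print(con.execute("SELECT avg(price), median(price) FROM prices").fetchone())
    con.close()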
     
    #15     Nov 9, 2023
    MarkBrown and M.W. like this.
  6. M.W.

    File-based is definitely convenient. I can't count the hours I've spent setting up retention, backup, and migration scripts and policies for several databases.

    I wonder how good DuckDB's compression is for large datasets of tick and OHLC data.

     
    #16     Nov 9, 2023
  7. tiddlywinks

    A quick thanks to @M.W. and @d08

    This year, I've been updating my dev skills, ...
    Things to install this weekend... Polars and DuckDB!!

    A thank you in the current ET environment?
    Yup.



    When I wrote this code, only God and I understood what it did. Now, only God knows. ~~ Anonymous
     
    #17     Nov 9, 2023
    MarkBrown and d08 like this.
  8. d08

    I haven't seen any comprehensive comparisons. If you decide to compare, it would be very interesting to see the results.
     
    #18     Nov 10, 2023
  9. I actually ignored DuckDB for a few years because of SQL. Just a completely irrational move on my part.

    I am finding DuckDB to basically be the greatest thing since sliced bread.

    It would be surprising if DuckDB had any issue with market data. Every test I have seen shows it is blazing fast. For my use case, speed is a non-issue; I am just not that high frequency.

    The way I would test it, though, would be to make a synthetic dataset that is 2x-10x larger than what I think I will actually need and see how the performance is for my use case.
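    Something like this, as a rough sketch (row count and column names are just placeholders, scale them to your own data):

    import time
    import duckdb

    con = duckdb.connect("bench.duckdb")

    # Synthetic random-walk prices; set the row count to 2x-10x your real dataset
    con.execute("""
        CREATE OR REPLACE TABLE ticks AS
        SELECT range AS ts, 100 + sum(random() - 0.5) OVER (ORDER BY range) AS price
        FROM range(10000000)
    """)

    start = time.time()
    print(con.execute("SELECT avg(price), median(price) FROM ticks").fetchone())
    print(f"query took {time.time() - start:.3f}s")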

     
    #19     Nov 8, 2024
    d08 and MarkBrown like this.
  10. MarkBrown