Data Analysis with Python: Calculating Mean and Median

Discussion in 'App Development' started by Grinda21, Oct 4, 2023.

  1. Agreed. It is just easier to import one of those libraries as we will eventually need them for extra calculations.

    Whilst it is true that we won't need them for basic statistics, those are just the first steps of a program, and the averages will later be used in more complex calculations, normally done in pandas or numpy.
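    Just as a minimal sketch of how little it takes with numpy (assuming the same sample list used in the example below):

    import numpy as np

    data = [12, 45, 67, 23, 41, 89, 34, 54, 21]

    print(np.mean(data))    # arithmetic mean, ~42.89
    print(np.median(data))  # middle value of the sorted list, 41.0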
     
    #11     Oct 5, 2023
    GlobalMacro90 likes this.
  2. The example also doesn't really show how awesome Pandas is. How about taking each value in the list and assigning it to a business day in the future starting today, then calculating the mean between 10-10-23 and 10-16-23? Then plotting the values between 10/10 and 10/16? All so easy to write and easy to understand.

    import matplotlib.pyplot as plt
    import pandas as pd

    data = [12, 45, 67, 23, 41, 89, 34, 54, 21]

    # Map each value onto consecutive business days starting 10/5/2023
    dates = pd.date_range(start="10/5/2023", periods=len(data), freq='B')
    df = pd.DataFrame(data, columns=['Price'], index=dates)

    # Mean of the prices between 10/10 and 10/16, then a quick plot of the same slice
    print(df.loc['2023-10-10':'2023-10-16', 'Price'].mean())
    df.loc['2023-10-10':'2023-10-16', 'Price'].plot()
    plt.show()

     
    #12     Oct 5, 2023
    MarkBrown and GlobalMacro90 like this.
  3. d08

    Numpy is widely used and a fantastic tool. Pandas is fast losing popularity though; even after moving to Arrow, it's slow and a massive memory hog.
     
    #13     Nov 9, 2023
    MarkBrown likes this.
  4. M.W.

    I remember you use DuckDB, right? Mostly for data storage? I find Polars very performant for quick analytics, and ClickHouse for data storage and transformations/aggregations.
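    As a rough illustration of what I mean by quick analytics, a minimal Polars sketch (same sample list as the pandas example above, purely illustrative):

    import polars as pl

    data = [12, 45, 67, 23, 41, 89, 34, 54, 21]
    df = pl.DataFrame({"price": data})

    # Mean and median via Polars expressions; aliases keep the output columns distinct
    print(df.select(
        pl.col("price").mean().alias("mean"),
        pl.col("price").median().alias("median"),
    ))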

     
    #14     Nov 9, 2023
  5. d08

    Yes, migrated to DuckDB for persistence and Polars for analysis, although the line between them isn't clear-cut. Technically I should be able to stay with just DuckDB, but I'm tired of rewriting stuff.
    I really like DuckDB being file-based rather than a server. Since I'm integrating it with a GUI app, making and restoring backups is just a file copy, which is very convenient.
    Being the noob that I am, it's 2023 and I'm only now getting familiar with SQL.
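    For anyone curious what the file-based part looks like, a minimal sketch (table and file names are just illustrative, not my actual setup):

    import duckdb

    # The whole database lives in a single file on disk, so a backup is just a copy of it
    con = duckdb.connect("prices.duckdb")
    con.execute("CREATE TABLE IF NOT EXISTS prices (price DOUBLE)")
    con.execute("INSERT INTO prices VALUES (12), (45), (67), (23), (41)")

    # Mean and median straight from SQL
    print(con.execute("SELECT avg(price), median(price) FROM prices").fetchone())
    con.close()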
     
    #15     Nov 9, 2023
    MarkBrown and M.W. like this.
  6. M.W.

    File-based is definitely convenient. I can't count the hours I've spent setting up retention, backup, and migration scripts and policies for several databases.

    I wonder how good DuckDB's compression is for large datasets of tick and OHLC data.

     
    #16     Nov 9, 2023
  7. tiddlywinks

    A quick thanks to @M.W. and @d08

    This year, I've been updating my dev skills, ...
    Things to install this weekend... Polars and DuckDB!!

    A thank you in the current ET environment?
    Yup.



    When I wrote this code, only God and I understood what it did. Now, only God knows. ~~ Anonymous
     
    #17     Nov 9, 2023
    MarkBrown and d08 like this.
  8. d08

    I haven't seen any comprehensive comparisons. If you decide to compare, it would be very interesting to see the results.
     
    #18     Nov 10, 2023
  9. I actually ignored DuckDB for a few years because of SQL. Just a completely irrational move on my part.

    I am finding DuckDB to basically be the greatest thing since sliced bread.

    It would be surprising if DuckDB had any issue with market data. Every test I have seen shows it is blazing fast. For my use case, speed is a non-issue; I am just not that high frequency.

    The way I would test it, though, would be to make a synthetic dataset that is 2x-10x larger than what I think I will actually need and see how the performance is for my use case.
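    Something like this, as a rough sketch (row count and column names are just placeholders, scale them to your own data):

    import time
    import duckdb

    con = duckdb.connect("bench.duckdb")

    # Synthetic random-walk prices; set the row count to 2x-10x your real dataset
    con.execute("""
        CREATE OR REPLACE TABLE ticks AS
        SELECT range AS ts, 100 + sum(random() - 0.5) OVER (ORDER BY range) AS price
        FROM range(10000000)
    """)

    start = time.time()
    print(con.execute("SELECT avg(price), median(price) FROM ticks").fetchone())
    print(f"query took {time.time() - start:.3f}s")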

     
    #19     Nov 8, 2024
    d08 and MarkBrown like this.
  10. MarkBrown