pandas 2.0 is here — and Databento now officially supports dataframes

Discussion in 'Announcements' started by Databento, Apr 6, 2023.

  1. Databento

    Databento Sponsor

    pandas 2.0 is out today, and we're excited to see substantial performance improvements brought by the new Apache Arrow backend.

    If you don't already know, Databento's Python client library has had native support for pandas dataframes for a while!

    For example, fetching all trades within a 10-minute window and converting the output to a dataframe looks like this:

    Code:
    import databento as db
    
    client = db.Historical('YOUR_API_KEY')
    data = client.timeseries.get_range(
        dataset='GLBX.MDP3',
        symbols=['ES.FUT'],
        schema='trades',
        stype_in='smart',
        start='2022-06-10T14:30',
        end='2022-06-10T14:40',
    )
    
    df = data.to_df(pretty_ts=True, pretty_px=True)  # to DataFrame, with pretty formatting
    What's cool is that this uses our proprietary binary protocol (DBN) under the hood, rather than common formats for pandas data persistence like CSV or Parquet. DBN achieves significant performance advantages over these common interchange formats.
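
    To give a rough feel for why a binary encoding can beat a text format here, the sketch below packs a few trade-like records into fixed-width binary rows and decodes them with numpy in one pass, then does the same round trip through CSV. The record layout, field names and values are hypothetical and chosen only for illustration. This is not the actual DBN layout, and it says nothing about Parquet.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical fixed-width record layout, for illustration only.
# This is NOT the real DBN record layout.
record_dtype = np.dtype([
    ("ts_event", "<u8"),  # nanoseconds since the UNIX epoch
    ("price", "<i8"),     # fixed-point price, assumed scaled by 1e9
    ("size", "<u4"),      # trade quantity
])

# Three made-up records starting at 2022-06-10T14:30:00 UTC.
records = np.zeros(3, dtype=record_dtype)
records["ts_event"] = [
    1654871400000000000,
    1654871400500000000,
    1654871401000000000,
]
records["price"] = [3900250000000, 3900500000000, 3900000000000]
records["size"] = [2, 1, 5]

# Binary path: every record is exactly 20 bytes, so the whole buffer can be
# reinterpreted in one call without parsing any text.
binary_payload = records.tobytes()
decoded = np.frombuffer(binary_payload, dtype=record_dtype)
df_binary = pd.DataFrame(decoded)
df_binary["ts_event"] = pd.to_datetime(
    df_binary["ts_event"].astype("int64"), unit="ns", utc=True
)
df_binary["price"] = df_binary["price"] / 1e9  # fixed-point to float

# Text path: the same records as CSV need digit-by-digit parsing on read.
csv_payload = pd.DataFrame(records).to_csv(index=False)
df_csv = pd.read_csv(io.StringIO(csv_payload))

print(len(binary_payload), "bytes as binary")
print(len(csv_payload), "bytes as CSV")
```

    On a toy payload like this only the size difference is visible. Any speed difference would need to be measured on realistic volumes.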

    Today's release of pandas 2.0 coincides with our recent public release of Databento Binary Encoding v0.3.0, which prepares DBN for real-time streaming while further optimizing its metadata structure for on-disk storage and exploratory use cases in dataframes. With this change, we've stabilized the pandas support in our Python client library and are ready to announce that we're officially supporting pandas dataframes going forward.
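
    Once trades are in a dataframe, the usual pandas tooling applies. As a minimal sketch, here is how 1-minute OHLCV bars could be built from trade data. It uses a small synthetic dataframe rather than a live API call, and it assumes a timestamp index plus columns named `price` and `size`. Check the columns of your own `to_df()` output before relying on those names.

```python
import pandas as pd

# Synthetic stand-in for the dataframe returned by data.to_df(...).
# The timestamp index and the `price` / `size` column names are assumptions.
index = pd.to_datetime(
    [
        "2022-06-10T14:30:05",
        "2022-06-10T14:30:40",
        "2022-06-10T14:31:10",
        "2022-06-10T14:31:50",
        "2022-06-10T14:32:20",
    ],
    utc=True,
)
df = pd.DataFrame(
    {
        "price": [3900.25, 3900.50, 3900.00, 3899.75, 3900.75],
        "size": [2, 1, 5, 3, 4],
    },
    index=index,
)

# Aggregate trades into 1-minute bars: open/high/low/close from price,
# volume as the sum of trade sizes in each bucket.
bars = df["price"].resample("1min").ohlc()
bars["volume"] = df["size"].resample("1min").sum()

print(bars)
```

    The same pattern works for other bar widths by changing the resample frequency string.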
     