Quick pandas question

Discussion in 'App Development' started by sle, Nov 9, 2017.

  1. sle

    sle

    Lets say I have a data frame with options data. I would like to be able to quickly find an option on a given date with a given type, expiration that has delta closest to x. What’s the best way to do that?
     
  2. tommcginnis

    tommcginnis

    Probably no help to you, but in the spreadsheet world, I have to
    -- set up a table of all spreads (by expiry, by side, by strike)
    -- then search on the deltas until I have one [abs[delta]] over my filter,
    -- then back up on row.

    The Pythonista process *might* be similar, with regard to forming a sub-table object for handy manipulation? Just guessing.
     
  3. 2rosy

    2rosy

    this will get the type and exp
    optionsDF[ (optionsDF.type == 'CALL') & (optionsDF.EXP == '20170907') ]

    closest delta you might need to use interp or take a look at http://xarray.pydata.org/en/stable/
    it has nearest functions and N-dimensional functionality that would probably work well with options
     
  4. sle

    sle

    If I want it to work fast, would I want to add these fields to the index? I tried but it behaves quirky and, oddly enough, we don't have a python expert in house (at least none that I can talk to).
     
    Last edited: Nov 9, 2017
  5. Expanding Rosy's code

    targetDelta = 25
    D = [ abs(x - targetDelta) for x in optionsDF[ (optionsDF.type == 'CALL') & (optionsDF.EXP == '20170907') ]]
    D.index(min(D)

    I have not actually tested this, but it is so simple it should work.
     
  6. sle

    sle

    Pandas has wonderful features for that - in fact, I can even do something like
    Code:
    cond = (opt['exp']=='20170907') & (opt['cp']=='C')
    x = opt[cond][abs(opt['delta'] - targetDelta) == abs(opt['delta'] - targetDelta).min()]
    
    However, my main issue is that when I am searching through a lot of options, it works pretty slow. So instead I did something like
    Code:
    opt = pd.read_csv('spy.csv')
    opt.set_index(['date', 'exp', 'cp', 'strike'], inplace=True)
    after that I can search for complete combinations very quickly:
    subx = opt.loc[('20170903', '20170907', 'C', 15)]
    
    but ranges of dates misbehave badly - it seem to do an outside join on everything instead on a single field, e.g. if i do:
    Code:
    subx = opt.loc[('20170903', '20170907', 'C', 15):('20170703', '20170907', 'C', 15)]
    
    it will return every option that has a strike of 15 etc
     
  7. The code I posted doesn't actually work, and it is too late to edit. This is why I should always test before posting.

    Something like this should work, but maybe not that quickly:

    targetDelta = 25
    D = optionsDF[ (optionsDF.type == 'CALL') & (optionsDF.EXP == '20170907') ]
    D2 = D.Delta
    D3 = [ abs(x - targetDelta) for x in D2]
    D[ D3.index(min(D3)) ]


    One way to get around python slowness/flakiness when working with very large dataframes/tables is just to keep your options in a relational database and access
    via SQL. I use cx_Oracle for python queries against an EOD options table (multi-source) of > 4mmm rows. Speed is quite good and complex self-joins, where-clause filters, etc. behave flawlessly.
     
  8. Don't be scared by databases. You don't need to set up a MySQL/MariaDB server. I use SQLite all the time and it's great for organizing data. Instead of being server based, SQLite is file based. You have access to all the normal SQL syntax, table structures, primary keys, etc.

    Best yet Python has very good bindings with SQLite.
     
  9. sle

    sle

    I doubt a relational db would be an improvement speed-wise - pretty sure my data frame fits into memory and pandas is pretty efficient. I just don’t know how do implement it properly for specific types of searches
     
  10. 2rosy

    2rosy

    Code:
    import xarray as xr
    import pandas as pd
    
    df=pd.read_csv('http://www.deltaneutral.com/files/Sample_SPX_20151001_to_20151030.csv')
    sdf=df[ (df.type=='call') & (df.expiration=='10/16/2015') & (df.quotedate == '10/01/2015') &(df.underlying=='SPX')]
    sxdf=xr.Dataset.from_dataframe(sdf)
    sxdf.set_index(strike='strike', inplace=True)
    sxdf.sel(strike=[621],method='nearest').to_dataframe()
    
     
    #10     Nov 10, 2017