Quick pandas question

sle · Nov 9, 2017

Lets say I have a data frame with options data. I would like to be able to quickly find an option on a given date with a given type, expiration that has delta closest to x. What’s the best way to do that?

tommcginnis · Nov 9, 2017

Probably no help to you, but in the spreadsheet world, I have to
-- set up a table of all spreads (by expiry, by side, by strike)
-- then search on the deltas until I have one [abs[delta]] over my filter,
-- then back up on row.

The Pythonista process *might* be similar, with regard to forming a sub-table object for handy manipulation? Just guessing.

2rosy · Nov 9, 2017

this will get the type and exp
optionsDF[ (optionsDF.type == 'CALL') & (optionsDF.EXP == '20170907') ]

closest delta you might need to use interp or take a look at http://xarray.pydata.org/en/stable/
it has nearest functions and N-dimensional functionality that would probably work well with options

sle · Nov 9, 2017

2rosy said:
this will get the type and exp
optionsDF[ (optionsDF.type == 'CALL') & (optionsDF.EXP == '20170907') ]
More...

If I want it to work fast, would I want to add these fields to the index? I tried but it behaves quirky and, oddly enough, we don't have a python expert in house (at least none that I can talk to).

Kevin Schmit · Nov 9, 2017

Expanding Rosy's code

targetDelta = 25
D = [ abs(x - targetDelta) for x in optionsDF[ (optionsDF.type == 'CALL') & (optionsDF.EXP == '20170907') ]]
D.index(min(D)

I have not actually tested this, but it is so simple it should work.

sle · Nov 9, 2017

Kevin Schmit said:
targetDelta = 25
D = [ abs(x - targetDelta) for x in optionsDF[ (optionsDF.type == 'CALL') & (optionsDF.EXP == '20170907') ]]
D.index(min(D)
More...

Pandas has wonderful features for that - in fact, I can even do something like
Code:
cond = (opt['exp']=='20170907') & (opt['cp']=='C')
x = opt[cond][abs(opt['delta'] - targetDelta) == abs(opt['delta'] - targetDelta).min()]
However, my main issue is that when I am searching through a lot of options, it works pretty slow. So instead I did something like
Code:
opt = pd.read_csv('spy.csv')
opt.set_index(['date', 'exp', 'cp', 'strike'], inplace=True)
after that I can search for complete combinations very quickly:
subx = opt.loc[('20170903', '20170907', 'C', 15)]
but ranges of dates misbehave badly - it seem to do an outside join on everything instead on a single field, e.g. if i do:
Code:
subx = opt.loc[('20170903', '20170907', 'C', 15):('20170703', '20170907', 'C', 15)]
it will return every option that has a strike of 15 etc

Kevin Schmit · Nov 9, 2017

sle said:
Pandas has wonderful features for that - in fact, I can even do something like
Code:
cond = (opt['exp']=='20170907') & (opt['cp']=='C')
x = opt[cond][abs(opt['delta'] - targetDelta) == abs(opt['delta'] - targetDelta).min()]
However, my main issue is that when I am searching through a lot of options, it works pretty slow. So instead I did something like
Code:
opt = pd.read_csv('spy.csv')
opt.set_index(['date', 'exp', 'cp', 'strike'], inplace=True)
after that I can search for complete combinations very quickly:
subx = opt.loc[('20170903', '20170907', 'C', 15)]
but ranges of dates misbehave badly - it seem to do an outside join on everything instead on a single field, e.g. if i do:
Code:
subx = opt.loc[('20170903', '20170907', 'C', 15):('20170703', '20170907', 'C', 15)]
it will return every option that has a strike of 15 etc
More...
The code I posted doesn't actually work, and it is too late to edit. This is why I should always test before posting.

Something like this should work, but maybe not that quickly:

targetDelta = 25
D = optionsDF[ (optionsDF.type == 'CALL') & (optionsDF.EXP == '20170907') ]
D2 = D.Delta
D3 = [ abs(x - targetDelta) for x in D2]
D[ D3.index(min(D3)) ]

One way to get around python slowness/flakiness when working with very large dataframes/tables is just to keep your options in a relational database and access
via SQL. I use cx_Oracle for python queries against an EOD options table (multi-source) of > 4mmm rows. Speed is quite good and complex self-joins, where-clause filters, etc. behave flawlessly.

stevegee58 · Nov 10, 2017

Don't be scared by databases. You don't need to set up a MySQL/MariaDB server. I use SQLite all the time and it's great for organizing data. Instead of being server based, SQLite is file based. You have access to all the normal SQL syntax, table structures, primary keys, etc.

Best yet Python has very good bindings with SQLite.

sle · Nov 10, 2017

I doubt a relational db would be an improvement speed-wise - pretty sure my data frame fits into memory and pandas is pretty efficient. I just don’t know how do implement it properly for specific types of searches

2rosy · Nov 10, 2017

Code:

import xarray as xr
import pandas as pd

df=pd.read_csv('http://www.deltaneutral.com/files/Sample_SPX_20151001_to_20151030.csv')
sdf=df[ (df.type=='call') & (df.expiration=='10/16/2015') & (df.quotedate == '10/01/2015') &(df.underlying=='SPX')]
sxdf=xr.Dataset.from_dataframe(sdf)
sxdf.set_index(strike='strike', inplace=True)
sxdf.sel(strike=[621],method='nearest').to_dataframe()