Let's say I have a data frame with options data. I'd like to be able to quickly find, for a given date, type, and expiration, the option whose delta is closest to some target x. What's the best way to do that?
Probably no help to you, but in the spreadsheet world I have to set up a table of all spreads (by expiry, by side, by strike), then search down the deltas until I have one abs(delta) over my filter, then back up one row. The Pythonista process *might* be similar, with regard to forming a sub-table object for handy manipulation? Just guessing.
This will get the type and exp:
Code:
optionsDF[(optionsDF.type == 'CALL') & (optionsDF.EXP == '20170907')]
For the closest delta you might need to use interp, or take a look at http://xarray.pydata.org/en/stable/ : it has nearest-neighbour selection and N-dimensional functionality that would probably work well with options.
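A concrete sketch of the "closest delta" step in plain pandas: filter first, then take `idxmin()` of the absolute delta gap. The column names (`type`, `EXP`, `Delta`) follow the snippet above; the toy numbers are made up for illustration.

```python
import pandas as pd

# Toy frame standing in for optionsDF; values are illustrative only
optionsDF = pd.DataFrame({
    'type':   ['CALL', 'CALL', 'PUT', 'CALL'],
    'EXP':    ['20170907', '20170907', '20170907', '20170914'],
    'strike': [240, 245, 245, 245],
    'Delta':  [0.55, 0.31, -0.45, 0.29],
})

targetDelta = 0.25
# Restrict to the date/type/expiration of interest first
sub = optionsDF[(optionsDF.type == 'CALL') & (optionsDF.EXP == '20170907')]
# idxmin() returns the row label with the smallest absolute delta gap
nearest = sub.loc[(sub.Delta - targetDelta).abs().idxmin()]
```

Here `nearest` is the single row whose delta is closest to the target; the whole lookup stays vectorized, so there is no Python-level loop over rows.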
If I want it to work fast, should I add these fields to the index? I tried, but it behaves quirkily, and oddly enough we don't have a Python expert in house (at least none that I can talk to).
Expanding Rosy's code:
Code:
targetDelta = 25
D = [abs(x - targetDelta) for x in optionsDF[(optionsDF.type == 'CALL') & (optionsDF.EXP == '20170907')]]
D.index(min(D))
I have not actually tested this, but it is so simple it should work.
Pandas has wonderful features for that; in fact, I can even do something like
Code:
cond = (opt['exp'] == '20170907') & (opt['cp'] == 'C')
x = opt[cond][abs(opt['delta'] - targetDelta) == abs(opt['delta'] - targetDelta).min()]
However, my main issue is that when I am searching through a lot of options, it works pretty slowly. So instead I did something like
Code:
opt = pd.read_csv('spy.csv')
opt.set_index(['date', 'exp', 'cp', 'strike'], inplace=True)
After that I can search for complete combinations very quickly:
Code:
subx = opt.loc[('20170903', '20170907', 'C', 15)]
But ranges of dates misbehave badly; it seems to do an outer join on everything instead of on a single field. E.g. if I do
Code:
subx = opt.loc[('20170903', '20170907', 'C', 15):('20170703', '20170907', 'C', 15)]
it will return every option that has a strike of 15, etc.
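For slicing just one level of a MultiIndex while pinning the others, `pd.IndexSlice` is the intended tool; the index must be lexsorted first (hence the `sort_index()`). A sketch with a toy frame that just mirrors the spy.csv layout described above (column names and values assumed):

```python
import pandas as pd

# Toy frame mimicking the spy.csv layout; values are illustrative only
opt = pd.DataFrame({
    'date':   ['20170703', '20170803', '20170903', '20170903'],
    'exp':    ['20170907'] * 4,
    'cp':     ['C', 'C', 'C', 'P'],
    'strike': [15, 15, 15, 15],
    'delta':  [0.60, 0.55, 0.50, -0.50],
})
# Slicing a MultiIndex requires it to be lexsorted
opt = opt.set_index(['date', 'exp', 'cp', 'strike']).sort_index()

# A slice on the date level only, with exp/cp/strike held fixed
idx = pd.IndexSlice
subx = opt.loc[idx['20170703':'20170903', '20170907', 'C', 15], :]
```

This returns only the 'C' rows with strike 15 in the date range, rather than the everything-matches behaviour of a plain tuple-to-tuple slice.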
The code I posted doesn't actually work, and it is too late to edit; this is why I should always test before posting. Something like this should work, but maybe not that quickly:
Code:
targetDelta = 25
D = optionsDF[(optionsDF.type == 'CALL') & (optionsDF.EXP == '20170907')]
D2 = D.Delta
D3 = [abs(x - targetDelta) for x in D2]
D.iloc[D3.index(min(D3))]
One way to get around Python slowness/flakiness when working with very large dataframes/tables is just to keep your options in a relational database and access them via SQL. I use cx_Oracle for Python queries against an EOD options table (multi-source) of > 4mmm rows. Speed is quite good, and complex self-joins, where-clause filters, etc. behave flawlessly.
Don't be scared by databases. You don't need to set up a MySQL/MariaDB server. I use SQLite all the time and it's great for organizing data. Instead of being server based, SQLite is file based. You have access to all the normal SQL syntax, table structures, primary keys, etc. Best of all, Python has very good bindings for SQLite.
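A minimal sketch with the stdlib sqlite3 module (the table layout and numbers are made up): the "closest delta" search becomes an ORDER BY on the absolute gap, taking the first row.

```python
import sqlite3

# In-memory database; table and column names are illustrative
con = sqlite3.connect(':memory:')
con.execute("""CREATE TABLE options
               (date TEXT, exp TEXT, cp TEXT, strike REAL, delta REAL)""")
rows = [('20170903', '20170907', 'C', 240, 0.55),
        ('20170903', '20170907', 'C', 245, 0.31),
        ('20170903', '20170907', 'P', 245, -0.45)]
con.executemany("INSERT INTO options VALUES (?, ?, ?, ?, ?)", rows)

# Nearest delta: order by the absolute gap and take the first row
target = 0.25
row = con.execute(
    """SELECT strike, delta FROM options
       WHERE date = ? AND exp = ? AND cp = ?
       ORDER BY ABS(delta - ?) LIMIT 1""",
    ('20170903', '20170907', 'C', target)).fetchone()
```

With an index on (date, exp, cp), the WHERE clause stays fast even on large tables, and the same pattern works unchanged against a server database later.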
I doubt a relational DB would be an improvement speed-wise; I'm pretty sure my data frame fits into memory, and pandas is pretty efficient. I just don't know how to implement it properly for these specific types of searches.
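If the workload is many nearest-delta lookups at once, pandas can batch them: `pd.merge_asof` with `direction='nearest'` and `by=` matches each target to the closest delta within its own (date, exp, cp) group in one vectorized pass. A sketch under assumed column names and toy values (both sides must be sorted on the `on` key):

```python
import pandas as pd

# Toy option rows; column names and values are illustrative only
opt = pd.DataFrame({
    'date':   ['20170903'] * 4,
    'exp':    ['20170907'] * 4,
    'cp':     ['C', 'C', 'P', 'P'],
    'delta':  [0.31, 0.55, -0.45, -0.20],
    'strike': [245, 240, 245, 250],
}).sort_values('delta')

# One target delta per (date, exp, cp) combination we want to look up
targets = pd.DataFrame({
    'date':  ['20170903', '20170903'],
    'exp':   ['20170907', '20170907'],
    'cp':    ['C', 'P'],
    'delta': [0.25, -0.30],
}).sort_values('delta')

# Each target row gets the option row with the nearest delta
# from the matching (date, exp, cp) group
hits = pd.merge_asof(targets, opt, on='delta', by=['date', 'exp', 'cp'],
                     direction='nearest')
```

This avoids a Python loop over combinations entirely, which is usually where the slowness comes from.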
Code:
import xarray as xr
import pandas as pd

df = pd.read_csv('http://www.deltaneutral.com/files/Sample_SPX_20151001_to_20151030.csv')
sdf = df[(df.type == 'call') & (df.expiration == '10/16/2015') &
         (df.quotedate == '10/01/2015') & (df.underlying == 'SPX')]
# Indexing by strike before conversion gives the Dataset a 'strike' dimension
sxdf = xr.Dataset.from_dataframe(sdf.set_index('strike'))
sxdf.sel(strike=[621], method='nearest').to_dataframe()