Python/Pandas Time Series problem

Discussion in 'App Development' started by sle, Oct 3, 2017.

  1. sle
    So I have a collection of time series from a few different global locations, all with time stamps based on the local time (US/Eastern, US/Central, Europe/London, etc.). I need to merge them all into a single data frame with consistent time stamps (my preference is US/Eastern since it's more intuitive to me). So I do
    dfCME = dfCME.tz_localize('US/Central').tz_convert('US/Eastern')
    dfEurex = dfEurex.tz_localize('Europe/London').tz_convert('US/Eastern')
    it seems to work, I get time stamps that look like
    '10:30:00-05:00'
    However, when I try to merge the two series it gets all fucked up. I tried the same process on a sample data set and it seems to work ok. What could I be doing wrong?
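    For reference, the localize-then-convert pattern itself is sound; a minimal sketch on made-up data (note the IANA name for London is 'Europe/London', not 'EU/London'):

    ```python
    import pandas as pd

    # Hypothetical one-day series stamped in naive local Chicago time
    idx = pd.date_range('2017-09-13 08:30', periods=3, freq='30min')
    dfCME = pd.DataFrame({'price': [100.0, 100.5, 101.0]}, index=idx)

    # Step 1: declare which zone the naive stamps are in;
    # Step 2: convert every stamp to US/Eastern
    dfCME = dfCME.tz_localize('US/Central').tz_convert('US/Eastern')
    print(dfCME.index[0])  # 2017-09-13 09:30:00-04:00
    ```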
     
  2. How are you merging them? Try an inner join if you want one entry per line (instrument1, instrument2, timestamp); that will capture a row only when both have data at that timestamp. Or use an outer join if you just want everything. I found joins to work better for this type of operation...
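     The inner-vs-outer distinction above can be sketched on two hypothetical instruments with partially overlapping timestamps:

     ```python
     import pandas as pd

     # Two instruments whose timestamps only partially overlap (made-up data)
     idx1 = pd.date_range('2017-09-13 09:30', periods=4, freq='30min', tz='US/Eastern')
     idx2 = pd.date_range('2017-09-13 10:00', periods=4, freq='30min', tz='US/Eastern')
     a = pd.DataFrame({'instrument1': [1, 2, 3, 4]}, index=idx1)
     b = pd.DataFrame({'instrument2': [5, 6, 7, 8]}, index=idx2)

     inner = a.join(b, how='inner')  # only timestamps where both have data: 3 rows
     outer = a.join(b, how='outer')  # every timestamp, NaN where a side is missing: 5 rows
     ```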
     
  3. sle

    I use concat with axis=1 so they get aligned. Ideally, I'd like to keep everything, but with a separate column per instrument. E.g. this works fine on two fake files, but seems to fail in real life:

    Code:
    import pandas as pd
    
    # read the raw files and index each by its timestamp column
    a = pd.read_csv('a.csv')
    b = pd.read_csv('b.csv')
    a = a.set_index(pd.DatetimeIndex(a['timestamp'])).sort_index()
    b = b.set_index(pd.DatetimeIndex(b['timestamp'])).sort_index()
    del a['timestamp']
    del b['timestamp']
    
    # a is stamped in Central time, b is already in Eastern
    a = a.tz_localize('US/Central').tz_convert('US/Eastern')
    b = b.tz_localize('US/Eastern')
    #a = a.between_time('08:30:00', '16:30:00')
    b = b.between_time('09:30:00', '16:30:00')
    
    # align the two side by side on the common (Eastern) index
    c = pd.concat([a, b], axis=1).between_time('08:30:00', '16:30:00')
    
    the output is as expected:
    Code:
    
    data data
    2017-09-13 08:30:00-04:00 64 NaN
    2017-09-13 09:00:00-04:00 65 NaN
    2017-09-13 09:30:00-04:00 66 68.0
    2017-09-13 10:00:00-04:00 67 69.0
    2017-09-13 10:30:00-04:00 68 70.0
    2017-09-13 11:00:00-04:00 69 71.0
    2017-09-13 11:30:00-04:00 70 72.0
    2017-09-13 12:00:00-04:00 71 73.0
    2017-09-13 12:30:00-04:00 72 74.0
    2017-09-13 13:00:00-04:00 73 75.0
    2017-09-13 13:30:00-04:00 74 76.0
    2017-09-13 14:00:00-04:00 75 77.0
    
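    For reference, the same concat alignment can be reproduced self-contained on synthetic frames (made-up values), showing that rows line up on the converted Eastern timestamps:

    ```python
    import pandas as pd

    # Two hypothetical series: one stamped in naive Chicago time, one in New York time
    idx_a = pd.date_range('2017-09-13 08:30', periods=3, freq='30min')
    idx_b = pd.date_range('2017-09-13 10:00', periods=3, freq='30min')
    a = pd.DataFrame({'a': [64, 65, 66]}, index=idx_a).tz_localize('US/Central').tz_convert('US/Eastern')
    b = pd.DataFrame({'b': [68, 69, 70]}, index=idx_b).tz_localize('US/Eastern')

    # alignment happens on the converted (Eastern) timestamps,
    # with NaN where one side has no data
    c = pd.concat([a, b], axis=1)
    print(c)  # 4 rows: 09:30-04:00 through 11:00-04:00
    ```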
     
  4. Try something like this: pd.merge(a, b, left_index=True, right_index=True, how='outer') (note: on='index' only works if you actually have a column named 'index')
     
  5. sle

    how is that different from a.join(b, how='outer')?

    PS. I have switched to python/pandas from R and seem to bump into these difficulties once in a blue moon
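    For what it's worth, a join on the index and an index-to-index merge line up the same way; a quick check on made-up frames:

    ```python
    import pandas as pd

    idx = pd.date_range('2017-09-13 09:30', periods=3, freq='30min', tz='US/Eastern')
    a = pd.DataFrame({'x': [1, 2, 3]}, index=idx)
    b = pd.DataFrame({'y': [4, 5]}, index=idx[1:])

    # join aligns on the index by default; merge needs it spelled out
    via_join = a.join(b, how='outer')
    via_merge = pd.merge(a, b, left_index=True, right_index=True, how='outer')
    print(via_join.equals(via_merge))
    ```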
     
  6. Well, this is probably not too different, other than the fact that you don't explicitly specify the index. A combination of joins usually works for me; you might have to play with it. Yeah, both R and Python have their little annoyances.
     
  7. ofthomas

    why not wrangle your data and switch it all to UTC, then work with those new files? It might be simpler and faster to just script the conversion and remove that logic from the analysis phase.
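    A one-time conversion pass along those lines might look like this (the filenames and the 'timestamp' column name are assumptions):

    ```python
    import pandas as pd

    def to_utc_csv(path_in, path_out, local_tz):
        """Rewrite a locally-stamped CSV with UTC timestamps."""
        df = pd.read_csv(path_in)
        df['timestamp'] = (pd.DatetimeIndex(df['timestamp'])
                           .tz_localize(local_tz)
                           .tz_convert('UTC'))
        df.to_csv(path_out, index=False)

    # e.g. to_utc_csv('cme.csv', 'cme_utc.csv', 'US/Central')
    ```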
     
  8. sle

    I find EST most intuitive for me - I know what opens when etc.
     
  9. This is how I store all of my data inside Postgres -- I just use epoch time, and then when I use the data, I convert into whatever timezone I am interested in. Epoch time is naturally UTC so it works wonderfully. I've fucked up too many times storing time in one specific timezone and then converting to another after the fact.
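     The epoch round-trip described above can be sketched in pandas (made-up epoch values):

     ```python
     import pandas as pd

     # Hypothetical epoch-second values pulled from the database
     epochs = [1505309400, 1505311200]
     ts = pd.to_datetime(epochs, unit='s', utc=True)  # epoch time is inherently UTC
     eastern = ts.tz_convert('US/Eastern')            # view it in whatever zone you like
     print(eastern[0])  # 2017-09-13 09:30:00-04:00
     ```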