Python/Pandas Time Series problem

Discussion in 'App Development' started by sle, Oct 3, 2017.

  1. sle
    So I have a collection of time series from a few different global locations, all with time stamps based on the local time (US/Eastern, US/Central, Europe/London, etc.). I need to merge them all into a single data frame with consistent time stamps (my preference is US/Eastern since it's more intuitive to me). So I do
    dfCME = dfCME.tz_localize('US/Central').tz_convert('US/Eastern')
    dfEurex = dfEurex.tz_localize('Europe/London').tz_convert('US/Eastern')
    it seems to work, I get time stamps that look like
    '10:30:00-05:00'
    However, when I try to merge the two series it gets all fucked up. I tried the same process on a sample data set and it seems to work ok. What could I be doing wrong?
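    For reference, the localize-then-convert pattern itself is sound; a minimal sketch on made-up data (note the IANA name for London is 'Europe/London', not 'EU/London'):

    ```python
    import pandas as pd

    # Hypothetical one-day series stamped in naive local Chicago time
    idx = pd.date_range('2017-09-13 08:30', periods=3, freq='30min')
    dfCME = pd.DataFrame({'price': [100.0, 100.5, 101.0]}, index=idx)

    # Step 1: declare which zone the naive stamps are in;
    # Step 2: convert every stamp to US/Eastern
    dfCME = dfCME.tz_localize('US/Central').tz_convert('US/Eastern')
    print(dfCME.index[0])  # 2017-09-13 09:30:00-04:00
    ```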
     
  2. How are you merging them? Try an inner join if you want one entry per line (instrument1, instrument2, timestamp); that will capture a row only when both have data at that timestamp. Or use an outer join if you just want everything. I found joins to work better for this type of operation...
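     The inner-vs-outer distinction above can be sketched on two hypothetical instruments with partially overlapping timestamps:

     ```python
     import pandas as pd

     # Two instruments whose timestamps only partially overlap (made-up data)
     idx1 = pd.date_range('2017-09-13 09:30', periods=4, freq='30min', tz='US/Eastern')
     idx2 = pd.date_range('2017-09-13 10:00', periods=4, freq='30min', tz='US/Eastern')
     a = pd.DataFrame({'instrument1': [1, 2, 3, 4]}, index=idx1)
     b = pd.DataFrame({'instrument2': [5, 6, 7, 8]}, index=idx2)

     inner = a.join(b, how='inner')  # only timestamps where both have data: 3 rows
     outer = a.join(b, how='outer')  # every timestamp, NaN where a side is missing: 5 rows
     ```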
     
  3. sle

    I use concat with axis=1 so they get aligned. Ideally, I'd like to keep everything, but with a separate column per instrument. E.g. this works fine on two fake files, but seems to fail in real life:

    Code:
    import pandas as pd
    
    # read the raw files and index each by its timestamp column
    a = pd.read_csv('a.csv')
    b = pd.read_csv('b.csv')
    a = a.set_index(pd.DatetimeIndex(a['timestamp'])).sort_index()
    b = b.set_index(pd.DatetimeIndex(b['timestamp'])).sort_index()
    del a['timestamp']
    del b['timestamp']
    
    # a is stamped in Central time, b is already in Eastern
    a = a.tz_localize('US/Central').tz_convert('US/Eastern')
    b = b.tz_localize('US/Eastern')
    #a = a.between_time('08:30:00', '16:30:00')
    b = b.between_time('09:30:00', '16:30:00')
    
    # align the two side by side on the common (Eastern) index
    c = pd.concat([a, b], axis=1).between_time('08:30:00', '16:30:00')
    
    the output is as expected:
    Code:
    
    data data
    2017-09-13 08:30:00-04:00 64 NaN
    2017-09-13 09:00:00-04:00 65 NaN
    2017-09-13 09:30:00-04:00 66 68.0
    2017-09-13 10:00:00-04:00 67 69.0
    2017-09-13 10:30:00-04:00 68 70.0
    2017-09-13 11:00:00-04:00 69 71.0
    2017-09-13 11:30:00-04:00 70 72.0
    2017-09-13 12:00:00-04:00 71 73.0
    2017-09-13 12:30:00-04:00 72 74.0
    2017-09-13 13:00:00-04:00 73 75.0
    2017-09-13 13:30:00-04:00 74 76.0
    2017-09-13 14:00:00-04:00 75 77.0
    
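    For reference, the same concat alignment can be reproduced self-contained on synthetic frames (made-up values), showing that rows line up on the converted Eastern timestamps:

    ```python
    import pandas as pd

    # Two hypothetical series: one stamped in naive Chicago time, one in New York time
    idx_a = pd.date_range('2017-09-13 08:30', periods=3, freq='30min')
    idx_b = pd.date_range('2017-09-13 10:00', periods=3, freq='30min')
    a = pd.DataFrame({'a': [64, 65, 66]}, index=idx_a).tz_localize('US/Central').tz_convert('US/Eastern')
    b = pd.DataFrame({'b': [68, 69, 70]}, index=idx_b).tz_localize('US/Eastern')

    # alignment happens on the converted (Eastern) timestamps,
    # with NaN where one side has no data
    c = pd.concat([a, b], axis=1)
    print(c)  # 4 rows: 09:30-04:00 through 11:00-04:00
    ```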
     
  4. Try something like this: pd.merge(a, b, left_index=True, right_index=True, how='outer') (note: on='index' only works if you actually have a column named 'index')
     
  5. sle

    how is that different from a.join(b, how='outer')?

    PS. I have switched to python/pandas from R and seem to bump into these difficulties once in a blue moon
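    For what it's worth, a join on the index and an index-to-index merge line up the same way; a quick check on made-up frames:

    ```python
    import pandas as pd

    idx = pd.date_range('2017-09-13 09:30', periods=3, freq='30min', tz='US/Eastern')
    a = pd.DataFrame({'x': [1, 2, 3]}, index=idx)
    b = pd.DataFrame({'y': [4, 5]}, index=idx[1:])

    # join aligns on the index by default; merge needs it spelled out
    via_join = a.join(b, how='outer')
    via_merge = pd.merge(a, b, left_index=True, right_index=True, how='outer')
    print(via_join.equals(via_merge))
    ```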
     
  6. Well, this is probably not too different, other than the fact that you don't explicitly specify the index. A combination of joins usually works for me; you might have to play with it. Yeah, both R and Python have their little annoyances.
     
  7. ofthomas

    why not wrangle your data and switch it all to UTC, then work with those new files? It might be simpler and faster to just script the conversion and remove that logic from the analysis phase.
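    A one-time conversion pass along those lines might look like this (the filenames and the 'timestamp' column name are assumptions):

    ```python
    import pandas as pd

    def to_utc_csv(path_in, path_out, local_tz):
        """Rewrite a locally-stamped CSV with UTC timestamps."""
        df = pd.read_csv(path_in)
        df['timestamp'] = (pd.DatetimeIndex(df['timestamp'])
                           .tz_localize(local_tz)
                           .tz_convert('UTC'))
        df.to_csv(path_out, index=False)

    # e.g. to_utc_csv('cme.csv', 'cme_utc.csv', 'US/Central')
    ```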
     
  8. sle

    I find EST most intuitive for me - I know what opens when etc.
     
  9. This is how I store all of my data inside Postgres -- I just use epoch time, and then when I use the data, I convert into whatever timezone I am interested in. Epoch time is naturally UTC so it works wonderfully. I've fucked up too many times storing time in one specific timezone and then converting to another after the fact.
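     The epoch round-trip described above can be sketched in pandas (made-up epoch values):

     ```python
     import pandas as pd

     # Hypothetical epoch-second values pulled from the database
     epochs = [1505309400, 1505311200]
     ts = pd.to_datetime(epochs, unit='s', utc=True)  # epoch time is inherently UTC
     eastern = ts.tz_convert('US/Eastern')            # view it in whatever zone you like
     print(eastern[0])  # 2017-09-13 09:30:00-04:00
     ```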