
pandas: combine two index columns

Tags:

python

pandas

I have the following pandas Series:

import datetime

import pandas as pd

data = {(pd.Timestamp('2016-01-01 00:00:00'), datetime.time(0, 0)): 6.885,
        (pd.Timestamp('2016-01-01 00:00:00'), datetime.time(0, 5)): 6.363,
        (pd.Timestamp('2016-01-01 00:00:00'), datetime.time(0, 10)): 6.093,
        (pd.Timestamp('2016-01-01 00:00:00'), datetime.time(0, 15)): 6.768,
        (pd.Timestamp('2016-01-01 00:00:00'), datetime.time(0, 20)): 7.11}
s = pd.Series(data)

2016-01-01  00:00:00    6.885
            00:05:00    6.363
            00:10:00    6.093
            00:15:00    6.768
            00:20:00    7.110
dtype: float64

How can I combine the two index columns to create a DatetimeIndex like so:

2016-01-01 00:00:00    6.885
2016-01-01 00:05:00    6.363
2016-01-01 00:10:00    6.093
2016-01-01 00:15:00    6.768
2016-01-01 00:20:00    7.110
dtype: float64
Johnny Metz asked Sep 11 '25 00:09



1 Answer

Intuitive Answer
Use pd.Index.map and pd.Timedelta

s.index = s.index.map(lambda t: t[0] + pd.Timedelta(str(t[1])))
s

2016-01-01 00:00:00    6.885
2016-01-01 00:05:00    6.363
2016-01-01 00:10:00    6.093
2016-01-01 00:15:00    6.768
2016-01-01 00:20:00    7.110
dtype: float64
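
As a minimal, self-contained sketch of the same idea, the mapping can be wrapped in a helper that leaves the original Series untouched. The name combine_index_levels is hypothetical, and the explicit pd.DatetimeIndex(...) wrapper is an added safeguard so the result type does not depend on index inference.

```python
import datetime

import pandas as pd


def combine_index_levels(series):
    """Return a new Series whose (date, time) MultiIndex is one DatetimeIndex.

    Hypothetical helper: it applies the mapping from the answer above
    without modifying the input Series.
    """
    # str(datetime.time(0, 5)) is '00:05:00', which pd.Timedelta can parse.
    combined = series.index.map(lambda t: t[0] + pd.Timedelta(str(t[1])))
    return pd.Series(
        series.to_numpy(),
        index=pd.DatetimeIndex(combined),
        name=series.name,
    )


data = {
    (pd.Timestamp('2016-01-01'), datetime.time(0, 0)): 6.885,
    (pd.Timestamp('2016-01-01'), datetime.time(0, 5)): 6.363,
    (pd.Timestamp('2016-01-01'), datetime.time(0, 10)): 6.093,
}
s = pd.Series(data)
result = combine_index_levels(s)
print(result)
```

Returning a new Series instead of assigning to s.index keeps the two-level original available, which helps when comparing several approaches on the same data.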

Fast Answer
If speed is what you're after, try this:

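import numpy as np

# Minutes since midnight for each time in the second index level.
# Seconds are ignored, which is fine for the 5-minute data above.
t = np.array(
    [t.hour * 60 + t.minute for t in s.index.get_level_values(1)],
    'timedelta64[m]'
)

s.index = s.index.get_level_values(0) + t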

2016-01-01 00:00:00    6.885
2016-01-01 00:05:00    6.363
2016-01-01 00:10:00    6.093
2016-01-01 00:15:00    6.768
2016-01-01 00:20:00    7.110
dtype: float64
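
The fast version counts whole minutes only, so any seconds in the time level are dropped. If the times can carry seconds, one possible variant (a sketch, not part of the original answer) builds the offsets in seconds instead. The function name combine_fast and the demo data are assumptions made for illustration.

```python
import datetime

import numpy as np
import pandas as pd


def combine_fast(series):
    """Build a DatetimeIndex from a (date, time) MultiIndex, keeping seconds.

    Hypothetical variant of the fast answer: offsets are expressed in
    seconds rather than minutes. Sub-second precision is still ignored.
    """
    dates = series.index.get_level_values(0)
    times = series.index.get_level_values(1)
    offsets = np.array(
        [t.hour * 3600 + t.minute * 60 + t.second for t in times],
        dtype='timedelta64[s]',
    )
    return dates + offsets


# Demo data with non-zero seconds (made up for this example).
demo = pd.Series({
    (pd.Timestamp('2016-01-01'), datetime.time(0, 0, 30)): 1.0,
    (pd.Timestamp('2016-01-01'), datetime.time(0, 5, 15)): 2.0,
})
demo.index = combine_fast(demo)
print(demo)
```

The list comprehension still loops in Python, but the addition itself is a single vectorized operation on the whole index, which is where the original fast answer gets its speed.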

Time Testing

This comparison matters only if you care about optimization. Otherwise, use whichever approach reads best to you.

from timeit import timeit

jez = lambda s: s.index.get_level_values(0) + pd.to_timedelta(s.index.get_level_values(1).astype(str))
pir1 = lambda s: s.index.map(lambda t: t[0] + pd.Timedelta(str(t[1])))
pir2 = lambda s: s.index.get_level_values(0) + np.array([t.hour * 60 + t.minute for t in s.index.get_level_values(1)], 'timedelta64[m]')

res = pd.DataFrame(
    np.nan, [10, 30, 100, 300, 1000, 3000, 10000, 30000],
    'jez pir1 pir2'.split()
)

for i in res.index:
    s_ = pd.concat([s] * i)
    for j in res.columns:
        stmt = f'{j}(s_)'
        setp = f'from __main__ import {j}, s_'
        res.at[i, j] = timeit(stmt, setp, number=100)

res.plot(loglog=True)

[Plot: log-log timings of jez, pir1, and pir2 against Series length]

# Divide each row by its fastest time, so 1.0 marks the fastest method
res.div(res.min(axis=1), axis=0)

             jez       pir1  pir2
10      2.400808   3.530032   1.0
30      4.045287   8.378484   1.0
100     6.337601  18.610263   1.0
300     8.664829  30.363422   1.0
1000   11.593935  44.210358   1.0
3000   11.899037  47.425953   1.0
10000  12.226166  49.546467   1.0
30000  12.543602  50.730653   1.0
piRSquared answered Sep 12 '25 16:09
