I have the following df:
Timestamp A B C ...
2014-11-09 00:00:00 NaN 1 NaN NaN
2014-11-09 00:00:00 2 NaN NaN NaN
2014-11-09 00:00:00 NaN NaN 3 NaN
2014-11-09 08:24:00 NaN NaN 1 NaN
2014-11-09 08:24:00 105 NaN NaN NaN
2014-11-09 09:19:00 NaN NaN 23 NaN
And I would like to make the following:
Timestamp A B C ...
2014-11-09 00:00:00 2 1 3 NaN
2014-11-09 00:01:00 NaN NaN NaN NaN
2014-11-09 00:02:00 NaN NaN NaN NaN
... NaN NaN NaN NaN
2014-11-09 08:23:00 NaN NaN NaN NaN
2014-11-09 08:24:00 105 NaN 1 NaN
2014-11-09 08:25:00 NaN NaN NaN NaN
2014-11-09 08:26:00 NaN NaN NaN NaN
2014-11-09 08:27:00 NaN NaN NaN NaN
... NaN NaN NaN NaN
2014-11-09 09:18:00 NaN NaN NaN NaN
2014-11-09 09:19:00 NaN NaN 23 NaN
That is, I would like to merge the rows that share the same Timestamp (I have 17 columns), resample at 1-minute granularity, and have NaN in any column that has no value for a given minute.
I started in the following ways:
df.groupby('Timestamp').sum()
and
df = df.resample('1Min', how='max')
but I obtained the following error:
ValueError: cannot reindex from a duplicate axis
How can I solve this problem? I'm just learning Python so I don't have experience at all.
Thank you!
Assuming you have Timestamp as the index to begin with, you need to do the resample first, then reset_index before doing the groupby. Here's a working sample:
import pandas as pd
df
A B C ...
Timestamp
2014-11-09 00:00:00 NaN 1 NaN NaN
2014-11-09 00:00:00 2 NaN NaN NaN
2014-11-09 00:00:00 NaN NaN 3 NaN
2014-11-09 08:24:00 NaN NaN 1 NaN
2014-11-09 08:24:00 105 NaN NaN NaN
2014-11-09 09:19:00 NaN NaN 23 NaN
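# Newer pandas versions no longer accept how=; call the aggregation as a method instead.
# min_count=1 keeps groups with no values as NaN rather than 0.
df.resample('1min').max().reset_index().groupby('Timestamp').sum(min_count=1)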
A B C ...
Timestamp
2014-11-09 00:00:00 2 1 3 NaN
2014-11-09 00:01:00 NaN NaN NaN NaN
2014-11-09 00:02:00 NaN NaN NaN NaN
2014-11-09 00:03:00 NaN NaN NaN NaN
2014-11-09 00:04:00 NaN NaN NaN NaN
...
2014-11-09 09:17:00 NaN NaN NaN NaN
2014-11-09 09:18:00 NaN NaN NaN NaN
2014-11-09 09:19:00 NaN NaN 23 NaN
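For readers who want something they can paste and run, here is a minimal sketch of the same idea on the sample rows from the question. It assumes a recent pandas version and only the three columns shown (A, B, C); the elided columns are left out. In my understanding, the resample aggregation already collapses rows that share a timestamp, so this sketch stops there rather than chaining the groupby:

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question; only columns A, B and C are shown there.
idx = pd.to_datetime([
    "2014-11-09 00:00:00",
    "2014-11-09 00:00:00",
    "2014-11-09 00:00:00",
    "2014-11-09 08:24:00",
    "2014-11-09 08:24:00",
    "2014-11-09 09:19:00",
])
df = pd.DataFrame(
    {
        "A": [np.nan, 2, np.nan, np.nan, 105, np.nan],
        "B": [1, np.nan, np.nan, np.nan, np.nan, np.nan],
        "C": [np.nan, np.nan, 3, 1, np.nan, 23],
    },
    index=pd.DatetimeIndex(idx, name="Timestamp"),
)

# resample() bins the rows per minute, so rows sharing a timestamp land in one bin.
# max() ignores NaN inside a bin and yields NaN for bins that hold no values.
result = df.resample("1min").max()

print(result.head())
```

Using max here follows the question's own attempt; if several non-NaN values can share a timestamp in the same column, pick whichever aggregation (max, sum, first, ...) matches what you mean by "merge".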
Hope this helps.
As mentioned in the comments, your 'Timestamp' isn't a datetime (it is probably a string), so you cannot resample on it as a DatetimeIndex. Just reset_index and convert it, something like this:
df = df.reset_index()
df['ts'] = pd.to_datetime(df['Timestamp'])
# 'ts' is now datetime of 'Timestamp', you just need to set it to index
df = df.set_index('ts')
...
Now just run the previous code again but replace 'Timestamp' with 'ts' and you should be OK.
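Put together, the conversion and the resample might look like the sketch below. The three-row frame is made up for illustration (it is not the asker's data), and the 'ts' name just follows the snippet above:

```python
import pandas as pd

# Hypothetical input: 'Timestamp' holds plain strings, not datetimes.
df = pd.DataFrame(
    {
        "Timestamp": [
            "2014-11-09 00:00:00",
            "2014-11-09 00:00:00",
            "2014-11-09 00:02:00",
        ],
        "A": [float("nan"), 2.0, 5.0],
        "B": [1.0, float("nan"), float("nan")],
    }
)

# Parse the strings into datetimes and use them as the index.
df["ts"] = pd.to_datetime(df["Timestamp"])
df = df.set_index("ts").drop(columns="Timestamp")

# With a DatetimeIndex in place, resampling per minute works as expected.
result = df.resample("1min").max()

print(result)
```

Dropping the original string column before resampling keeps the aggregation from being applied to text values.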