I have just discovered the power of Pandas and I love it, but I can't figure out this problem:
I have a DataFrame, and df.head() gives:
lon lat h filename time
0 19.961216 80.617627 -0.077165 60048 2002-05-15 12:59:31.717467
1 19.923916 80.614847 -0.018689 60048 2002-05-15 12:59:31.831467
2 19.849396 80.609257 -0.089205 60048 2002-05-15 12:59:32.059467
3 19.830776 80.607857 0.076485 60048 2002-05-15 12:59:32.116467
4 19.570708 80.588183 0.162943 60048 2002-05-15 12:59:32.888467
I would like to group my data into nine-day intervals:
gb = df.groupby(pd.TimeGrouper(key='time', freq='9D'))
The first group:
2002-05-15 12:59:31.717467 lon lat h filename time
0 19.961216 80.617627 -0.077165 60048 2002-05-15 12:59:31.717467
1 19.923916 80.614847 -0.018689 60048 2002-05-15 12:59:31.831467
2 19.849396 80.609257 -0.089205 60048 2002-05-15 12:59:32.059467
3 19.830776 80.607857 0.076485 60048 2002-05-15 12:59:32.116467
...
Next group:
2002-05-24 12:59:31.717467 lon lat height filename time
815 18.309498 80.457024 0.187387 60309 2002-05-24 16:35:39.553563
816 18.291458 80.458514 0.061446 60309 2002-05-24 16:35:39.610563
817 18.273408 80.460014 0.129255 60309 2002-05-24 16:35:39.667563
818 18.255358 80.461504 0.046761 60309 2002-05-24 16:35:39.724563
...
So the data are grouped into nine-day intervals counted from the first timestamp (12:59:31.717467), not from the beginning of that day as I would like.
When grouping by one day:
gb = df.groupby(pd.TimeGrouper(key='time', freq='D'))
gives me:
2002-05-15 00:00:00 lon lat h filename time
0 19.961216 80.617627 -0.077165 60048 2002-05-15 12:59:31.717467
1 19.923916 80.614847 -0.018689 60048 2002-05-15 12:59:31.831467
2 19.849396 80.609257 -0.089205 60048 2002-05-15 12:59:32.059467
3 19.830776 80.607857 0.076485 60048 2002-05-15 12:59:32.116467
...
I could just loop over the days until I have a nine-day interval, but I think it can be done more elegantly. I am looking for a Grouper freq option equivalent to YS (start of year) but for days, a way of setting the start time (maybe via the Grouper convention option: {'start', 'end', 'e', 's'}), or something else?
I am running Python 3.5.2 and Pandas version 0.19.0.
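(Side note for readers on newer pandas versions: pd.TimeGrouper was later deprecated in favour of pd.Grouper, and from pandas 1.1 onwards pd.Grouper and resample accept an origin argument, which is exactly this kind of "set the start time" option. A minimal sketch, assuming pandas >= 1.1 and a small stand-in DataFrame rather than the real data:)
import pandas as pd
# Small stand-in DataFrame (illustrative values only).
df_demo = pd.DataFrame({
    'time': pd.to_datetime([
        '2002-05-15 12:59:31.717467',
        '2002-05-16 01:00:00',
        '2002-05-24 16:35:39.553563',
    ]),
    'h': [-0.077165, -0.018689, 0.187387],
})
# origin='start_day' (pandas >= 1.1) anchors the bins at midnight of the first
# day, so the first 9-day bin starts at 2002-05-15 00:00:00.
for start, grp in df_demo.groupby(pd.Grouper(key='time', freq='9D', origin='start_day')):
    print(start, len(grp))
On pandas 0.19 this option is not available, so the bins have to be anchored manually, as in the answer below.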
Dropping the first row's time:
Your best bet would be to normalize the first entry of the datetime column, so that its time is reset to 00:00:00 (midnight), and then group on the 9D interval:
# Reset only the first timestamp to midnight; the 9-day bins are anchored there
df.loc[0, 'time'] = df['time'].iloc[0].normalize()

for _, grp in df.groupby(pd.TimeGrouper(key='time', freq='9D')):
    print(grp)
# lon lat h filename time
# 0 19.961216 80.617627 -0.077165 60048 2002-05-15 00:00:00.000000
# 1 19.923916 80.614847 -0.018689 60048 2002-05-15 12:59:31.831467
# 2 19.849396 80.609257 -0.089205 60048 2002-05-15 12:59:32.059467
# 3 19.830776 80.607857 0.076485 60048 2002-05-15 12:59:32.116467
# 4 19.570708 80.588183 0.162943 60048 2002-05-15 12:59:32.888467
# ......................................................................
Only the first row's time of day is overwritten; all other rows keep their original timestamps, so that information is not lost.
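To double-check that the bins are now anchored at midnight, you can print just the group keys (same df as above, after the normalize step); the keys should fall on midnight boundaries, nine days apart:
for key, _ in df.groupby(pd.TimeGrouper(key='time', freq='9D')):
    print(key)
# expected along the lines of:
# 2002-05-15 00:00:00
# 2002-05-24 00:00:00
# ...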
Keeping the first row's time:
If you want to keep the first row's time exactly as it is and only want the grouping to start from midnight, you could do:
df_t_shift = df.shift()  # shift everything down one row; row 0 becomes all NaN
df_t_shift.loc[0, 'time'] = df_t_shift['time'].iloc[1].normalize()  # dummy row at midnight

# Append the last row of df, which fell off the end during the shift
df_t_shift = df_t_shift.append(df.iloc[-1], ignore_index=True)

for _, grp in df_t_shift.groupby(pd.TimeGrouper(key='time', freq='9D')):
    print(grp)
# lon lat h filename time
# 0 NaN NaN NaN NaN 2002-05-15 00:00:00.000000
# 1 19.961216 80.617627 -0.077165 60048.0 2002-05-15 12:59:31.717467
# 2 19.923916 80.614847 -0.018689 60048.0 2002-05-15 12:59:31.831467
# 3 19.849396 80.609257 -0.089205 60048.0 2002-05-15 12:59:32.059467
# 4 19.830776 80.607857 0.076485 60048.0 2002-05-15 12:59:32.116467
# 5 19.570708 80.588183 0.162943 60048.0 2002-05-15 12:59:32.888467
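As a variation on the same idea, you could prepend a single dummy row at midnight with pd.concat instead of shifting the whole frame; this also avoids DataFrame.append, which was removed in pandas 2.0. A sketch, assuming the same df (dummy and df_padded are just illustrative names):
# One-row frame holding only the midnight timestamp; the other columns end up
# as NaN after the concat, just like the shifted row above.
dummy = pd.DataFrame({'time': [df['time'].iloc[0].normalize()]})
df_padded = pd.concat([dummy, df], ignore_index=True)

for _, grp in df_padded.groupby(pd.TimeGrouper(key='time', freq='9D')):
    print(grp)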