A Pandas DataFrame contains a column named "date" that holds non-unique datetime values. I can group the rows in this frame using:
data.groupby(data['date'])
However, this splits the data by the datetime values. I would like to group these data by the year stored in the "date" column. This page shows how to group by year in cases where the time stamp is used as an index, which is not true in my case.
How do I achieve this grouping?
The groupby() function is used to split the data into groups based on some criteria. pandas objects can be split on any of their axes. The abstract definition of grouping is to provide a mapping of labels to group names.
You can group DataFrame rows into a list by calling pandas.DataFrame.groupby() on the column of interest, selecting the column you want as a list from each group, and then using Series.apply(list) to get the list for every group.
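As a minimal sketch of the list-per-group pattern described above: the frame and the column names "key" and "value" are made up for illustration and do not come from the question.

```python
import pandas as pd

# Hypothetical sample frame; the column names are illustrative only.
df = pd.DataFrame({
    "key": ["a", "b", "a", "b", "a"],
    "value": [1, 2, 3, 4, 5],
})

# Group by "key", select "value", and collect each group's values into a list.
values_by_key = df.groupby("key")["value"].apply(list)
print(values_by_key)
```

The result is a Series indexed by the group label, with one Python list per group.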
I'm using pandas 0.16.2. This has better performance on my large dataset:
data.groupby(data.date.dt.year)
Using the dt accessor, playing around with weekofyear, dayofweek, etc. becomes far easier.
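Here is a small sketch of that approach on a frame where "date" is an ordinary column rather than the index. The sample values and the "value" column are assumptions for illustration, and the sketch assumes the column already has a datetime dtype (pd.to_datetime can convert it if not).

```python
import pandas as pd

# Hypothetical data: a non-index "date" column with repeated datetimes.
data = pd.DataFrame({
    "date": pd.to_datetime([
        "2013-01-15", "2013-01-15", "2013-07-04",
        "2014-03-09", "2014-03-09", "2015-12-31",
    ]),
    "value": [10, 20, 30, 40, 50, 60],
})

# Group by the year extracted through the .dt accessor and sum each group.
per_year = data.groupby(data["date"].dt.year)["value"].sum()

# The same pattern works for other datetime fields, e.g. day of week (Monday=0).
per_weekday = data.groupby(data["date"].dt.dayofweek)["value"].count()

print(per_year)
print(per_weekday)
```

Because the grouping key is a Series aligned with the frame, no new column has to be added to the DataFrame.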
ecatmur's solution will work fine. This will perform better on large datasets, though:
data.groupby(data['date'].map(lambda x: x.year))
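For comparison, a quick sketch (with made-up sample data) suggesting that the map-based key and the dt-based key produce the same groups. It does not measure timing, so relative performance should be checked on your own data.

```python
import pandas as pd

# Hypothetical sample data; "value" is an illustrative column.
data = pd.DataFrame({
    "date": pd.to_datetime(["2013-01-15", "2013-07-04", "2014-03-09"]),
    "value": [1, 2, 3],
})

# Element-wise: read .year from each Timestamp through a Python lambda.
by_map = data.groupby(data["date"].map(lambda x: x.year))["value"].sum()

# Vectorized: the .dt accessor extracts the year for the whole column at once.
by_dt = data.groupby(data["date"].dt.year)["value"].sum()

# Both keys yield the same group labels and totals.
same = by_map.to_dict() == by_dt.to_dict()
print(same)
```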