 

How to group pandas DataFrame entries by date in a non-unique column

Tags:

python

pandas

A pandas DataFrame contains a column named "date" that holds non-unique datetime values. I can group the rows of this frame using:

data.groupby(data['date']) 
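
To see why this is not a per-year grouping, here is a minimal sketch on a small frame. The contents of data (the timestamps and the "value" column) are hypothetical, not from the question. Grouping on the raw column produces one group per distinct timestamp.

```python
import pandas as pd

# Hypothetical sample data: a non-unique "date" column plus a numeric column.
data = pd.DataFrame({
    "date": pd.to_datetime([
        "2011-03-01 10:00", "2011-03-01 10:00", "2011-07-15 08:30",
        "2012-01-02 12:00", "2012-01-02 12:00",
    ]),
    "value": [1, 2, 3, 4, 5],
})

# Grouping on the column itself: every distinct timestamp becomes its own group,
# so the two 2011 timestamps end up in separate groups.
groups = data.groupby(data["date"])
print(groups.ngroups)
print(groups["value"].sum())
```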

However, this splits the data by the datetime values. I would like to group these data by the year stored in the "date" column. This page shows how to group by year in cases where the time stamp is used as an index, which is not true in my case.
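
For comparison, a sketch of the index-based case mentioned above, again on hypothetical data: moving "date" into the index with set_index gives a DatetimeIndex, whose .year attribute is an array of integers aligned with the rows that groupby accepts as keys.

```python
import pandas as pd

# Hypothetical sample data (not from the question).
data = pd.DataFrame({
    "date": pd.to_datetime([
        "2011-03-01 10:00", "2011-03-01 10:00", "2011-07-15 08:30",
        "2012-01-02 12:00", "2012-01-02 12:00",
    ]),
    "value": [1, 2, 3, 4, 5],
})

# Move the timestamps into a DatetimeIndex.
indexed = data.set_index("date")

# DatetimeIndex.year holds one integer year per row; use it as the group key.
by_year = indexed.groupby(indexed.index.year)["value"].sum()
print(by_year)
```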

How do I achieve this grouping?

asked Jul 09 '12 09:07 by Boris Gorelik



2 Answers

I'm using pandas 0.16.2. This has better performance on my large dataset:

data.groupby(data.date.dt.year) 
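
A runnable sketch of this answer on a small hypothetical frame. Series.dt.year returns an integer Series aligned with the rows of data, so groupby can use it as the key while "date" stays an ordinary column. The sum aggregation is only an illustration.

```python
import pandas as pd

# Hypothetical sample data (not from the question).
data = pd.DataFrame({
    "date": pd.to_datetime([
        "2011-03-01 10:00", "2011-03-01 10:00", "2011-07-15 08:30",
        "2012-01-02 12:00", "2012-01-02 12:00",
    ]),
    "value": [1, 2, 3, 4, 5],
})

# .dt.year extracts the year of each timestamp as an integer Series.
years = data["date"].dt.year

# Group the rows by that year; the "date" column itself is left unchanged.
totals = data.groupby(years)["value"].sum()
print(totals)
```

The resulting index is named "date" because the key Series carries that name; data["date"].dt.year.rename("year") gives it a clearer label.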

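The dt accessor also makes it far easier to group by other date components, such as weekofyear or dayofweek. (In pandas 1.1 and later, dt.weekofyear is deprecated in favor of dt.isocalendar().week.)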
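
The same pattern extends to other components. A sketch on hypothetical data: dt.dayofweek numbers weekdays from Monday=0 to Sunday=6, and the ISO week number comes from dt.isocalendar().week (available since pandas 1.1) rather than weekofyear.

```python
import pandas as pd

# Hypothetical sample data: two Mondays and two Tuesdays in early January 2012.
data = pd.DataFrame({
    "date": pd.to_datetime(["2012-01-02", "2012-01-03", "2012-01-09", "2012-01-10"]),
    "value": [1, 2, 3, 4],
})

# Group by day of week (Monday=0 ... Sunday=6).
by_weekday = data.groupby(data["date"].dt.dayofweek)["value"].sum()

# isocalendar() returns a DataFrame with year, week and day columns;
# its "week" column is aligned with data's rows and works as a group key.
by_week = data.groupby(data["date"].dt.isocalendar().week)["value"].sum()

print(by_weekday)
print(by_week)
```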

answered Sep 29 '22 14:09 by DACW


ecatmur's solution will work fine. This should perform better on large datasets, though:

data.groupby(data['date'].map(lambda x: x.year)) 
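
A sketch checking that this map-based key and the dt.year key from the other answer produce the same groups, on hypothetical data. Series.map calls the lambda once per element, and each element of a datetime64 Series is passed as a pd.Timestamp, which has a .year attribute. Relative speed of the two keys is not measured here and likely depends on the pandas version and data size.

```python
import pandas as pd

# Hypothetical sample data (not from the question).
data = pd.DataFrame({
    "date": pd.to_datetime([
        "2011-03-01 10:00", "2011-03-01 10:00", "2011-07-15 08:30",
        "2012-01-02 12:00", "2012-01-02 12:00",
    ]),
    "value": [1, 2, 3, 4, 5],
})

# map applies the lambda to each Timestamp individually (a Python-level loop).
years_map = data["date"].map(lambda x: x.year)

# .dt.year computes the same years through the vectorized accessor.
years_dt = data["date"].dt.year

by_map = data.groupby(years_map)["value"].sum()
by_dt = data.groupby(years_dt)["value"].sum()

# Compare values and group labels; the integer dtypes of the keys may differ.
same = by_map.tolist() == by_dt.tolist() and list(by_map.index) == list(by_dt.index)
print(same)
```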

answered Sep 29 '22 13:09 by Wes McKinney