
How to group pandas DataFrame entries by date in a non-unique column

Tags:

python

pandas

A Pandas DataFrame contains a column named "date" that holds non-unique datetime values. I can group the rows in this frame using:

data.groupby(data['date'])

However, this splits the data by the full datetime values. I would like to group the data by the year stored in the "date" column. This page shows how to group by year in cases where the timestamp is used as an index, which is not true in my case.

How do I achieve this grouping?

536

asked Jul 09 '12 09:07

Boris Gorelik

2 Answers

I'm using pandas 0.16.2. This has better performance on my large dataset:

data.groupby(data.date.dt.year)

Using the dt accessor makes it far easier to experiment with weekofyear, dayofweek, etc.
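As a minimal sketch of the above, assuming the "date" column already has a datetime dtype (if it holds strings, convert it first with pd.to_datetime). The sample frame and the "value" column are hypothetical and only there to make the example runnable:

```python
import pandas as pd

# Hypothetical sample data: a non-unique "date" column plus a "value" column.
data = pd.DataFrame({
    "date": pd.to_datetime([
        "2011-03-01", "2011-03-01", "2011-11-15",
        "2012-01-20", "2012-06-05",
    ]),
    "value": [1, 2, 3, 4, 5],
})

# Group by the year extracted through the .dt accessor and sum each group.
yearly_totals = data.groupby(data["date"].dt.year)["value"].sum()

# Other datetime components work the same way, e.g. day of week (Monday=0).
by_weekday = data.groupby(data["date"].dt.dayofweek)["value"].sum()

print(yearly_totals)
print(by_weekday)
```

Note that in recent pandas releases (2.0 and later, as far as I know) dt.weekofyear is no longer available; data["date"].dt.isocalendar().week appears to be the replacement.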

189

answered Sep 29 '22 14:09

DACW

ecatmur's solution will work fine. This will perform better on large datasets, though:

data.groupby(data['date'].map(lambda x: x.year))
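A small sketch of how this behaves, under the assumption that "date" is a datetime column; the sample frame is hypothetical. The map call hands each element to the lambda as a Timestamp, so x.year yields the same keys as the dt.year accessor:

```python
import pandas as pd

# Hypothetical sample data with repeated dates.
data = pd.DataFrame({
    "date": pd.to_datetime([
        "2011-03-01", "2011-03-01", "2011-11-15",
        "2012-01-20", "2012-06-05",
    ]),
    "value": [1, 2, 3, 4, 5],
})

# Build the grouping key element by element with map ...
years_via_map = data["date"].map(lambda x: x.year)

# ... and, for comparison, with the vectorized .dt accessor.
years_via_dt = data["date"].dt.year

# Both keys contain the same years, so either can be passed to groupby.
same_keys = list(years_via_map) == list(years_via_dt)

# Number of rows per year.
counts = data.groupby(years_via_map).size()
print(same_keys)
print(counts)
```

Which variant is faster likely depends on the pandas version and the data, so it is worth timing both on your own dataset.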

answered Sep 29 '22 13:09

Wes McKinney
