
python pandas extract year from datetime: df['year'] = df['date'].year is not working

Tags: python, datetime, pandas, dataframe, extract

I import a dataframe via read_csv, but for some reason I can't extract the year or month from the series df['date']. Trying that gives AttributeError: 'Series' object has no attribute 'year':

    date    Count
    6/30/2010   525
    7/30/2010   136
    8/31/2010   125
    9/30/2010   84
    10/29/2010  4469

    df = pd.read_csv('sample_data.csv', parse_dates=True)
    df['date'] = pd.to_datetime(df['date'])
    df['year'] = df['date'].year
    df['month'] = df['date'].month

UPDATE: When I try solutions that use df['date'].dt on my pandas version 0.14.1, I get "AttributeError: 'Series' object has no attribute 'dt'":

    df = pd.read_csv('sample_data.csv', parse_dates=True)
    df['date'] = pd.to_datetime(df['date'])
    df['year'] = df['date'].dt.year
    df['month'] = df['date'].dt.month

Sorry for this question that seems repetitive - I expect the answer will make me feel like a bonehead... but I have not had any luck using answers to the similar questions on SO.

FOLLOWUP: I can't seem to update my pandas 0.14.1 to a newer release in my Anaconda environment. Each of the attempts below generates an invalid syntax error. I'm using Python 3.4.1 64-bit.

    conda update pandas
    conda install pandas==0.15.2
    conda install -f pandas

Any ideas?


asked May 22 '15 20:05

MJS

2 Answers

If you're running pandas 0.15.0 or later, you can use the dt accessor on a datetime Series to access its datetime components:

    In [6]:
    df['date'] = pd.to_datetime(df['date'])
    df['year'], df['month'] = df['date'].dt.year, df['date'].dt.month
    df

    Out[6]:
            date  Count  year  month
    0 2010-06-30    525  2010      6
    1 2010-07-30    136  2010      7
    2 2010-08-31    125  2010      8
    3 2010-09-30     84  2010      9
    4 2010-10-29   4469  2010     10
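As a self-contained sketch of the same idea: the snippet below reads the question's sample rows from an inline string (an assumption standing in for sample_data.csv) and shows that a single element of the column is a Timestamp with its own .year, while the Series as a whole needs the .dt accessor. The explicit format string is also an assumption based on the sample rows.

```python
import io
import pandas as pd

# Inline copy of the question's sample data (stands in for sample_data.csv).
raw = """date Count
6/30/2010 525
7/30/2010 136
8/31/2010 125
9/30/2010 84
10/29/2010 4469"""

df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")

# A single element is a Timestamp, which has .year directly ...
first_year = df["date"].iloc[0].year

# ... but the Series as a whole exposes it through the .dt accessor.
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
print(df)
```

This is why df['date'].year raises AttributeError: year is an attribute of each timestamp, not of the Series that holds them.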

EDIT

It looks like you're running an older version of pandas in which case the following would work:

    In [18]:
    df['date'] = pd.to_datetime(df['date'])
    df['year'], df['month'] = df['date'].apply(lambda x: x.year), df['date'].apply(lambda x: x.month)
    df

    Out[18]:
            date  Count  year  month
    0 2010-06-30    525  2010      6
    1 2010-07-30    136  2010      7
    2 2010-08-31    125  2010      8
    3 2010-09-30     84  2010      9
    4 2010-10-29   4469  2010     10
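Another option that avoids both .dt and a per-row lambda is to wrap the column in a DatetimeIndex, which exposes year and month directly. This is only a sketch on a few hand-typed rows from the question. It runs on current pandas, and I believe DatetimeIndex behaved the same way on 0.14.x, but I have not verified it on that version.

```python
import pandas as pd

# A few rows from the question, typed inline for illustration.
df = pd.DataFrame({"date": ["6/30/2010", "7/30/2010", "8/31/2010"],
                   "Count": [525, 136, 125]})
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")

# DatetimeIndex exposes year/month directly, without the .dt accessor.
dates = pd.DatetimeIndex(df["date"])
df["year"] = dates.year
df["month"] = dates.month
print(df)
```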

As for why read_csv didn't parse this column into a datetime: per the docs, parse_dates=True only tries to parse the index, not the data columns. To parse a specific column, pass a list with its ordinal position (parse_dates=[0]) or its name (parse_dates=['date']):

    In [20]:
    import io

    t="""date   Count
    6/30/2010   525
    7/30/2010   136
    8/31/2010   125
    9/30/2010   84
    10/29/2010  4469"""
    # raw string so the regex separator is not treated as an escape sequence
    df = pd.read_csv(io.StringIO(t), sep=r'\s+', parse_dates=[0])
    df.info()

    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 5 entries, 0 to 4
    Data columns (total 2 columns):
    date     5 non-null datetime64[ns]
    Count    5 non-null int64
    dtypes: datetime64[ns](1), int64(1)
    memory usage: 120.0 bytes

So if you pass parse_dates=[0] to read_csv, there is no need to call to_datetime on the 'date' column after loading.
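To see the difference between the two parse_dates forms side by side, here is a minimal sketch. The comma-separated inline data is an assumption for illustration, not the question's actual file.

```python
import io
import pandas as pd

# Hypothetical comma-separated version of the sample data.
raw = """date,Count
6/30/2010,525
7/30/2010,136
8/31/2010,125"""

# parse_dates=True only targets the index, so 'date' is left as text.
as_text = pd.read_csv(io.StringIO(raw), parse_dates=True)

# Naming the column (or giving its position, [0]) parses it on load.
as_dates = pd.read_csv(io.StringIO(raw), parse_dates=["date"])

print(as_text["date"].dtype)
print(as_dates["date"].dtype)
```

Only the second frame ends up with a datetime column, so only there can the year and month be read off without a separate to_datetime call.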

answered Sep 20 '22 03:09

EdChum

This works (on pandas 0.15.0 or later, once the column holds datetimes):

df['date'].dt.year

Now:

    df['year'] = df['date'].dt.year
    df['month'] = df['date'].dt.month

gives this data frame:

            date  Count  year  month
    0 2010-06-30    525  2010      6
    1 2010-07-30    136  2010      7
    2 2010-08-31    125  2010      8
    3 2010-09-30     84  2010      9
    4 2010-10-29   4469  2010     10
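One caveat worth checking: even on a pandas version that has .dt, the accessor is only available when the Series actually holds datetimes. The sketch below uses a hand-typed Series of strings (an assumption for illustration) to show the failure and the fix.

```python
import pandas as pd

s = pd.Series(["6/30/2010", "7/30/2010"])  # still strings, not datetimes

try:
    s.dt.year
    failed = False
except AttributeError:
    # .dt is only available on datetime-like Series
    failed = True

# After conversion the accessor works as expected.
converted = pd.to_datetime(s, format="%m/%d/%Y")
years = converted.dt.year.tolist()
print(failed, years)
```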

answered Sep 21 '22 03:09

Mike Müller
