I import a dataframe via read_csv, but for some reason I can't extract the year or month from the series df['date']; trying that gives AttributeError: 'Series' object has no attribute 'year':
date        Count
6/30/2010   525
7/30/2010   136
8/31/2010   125
9/30/2010   84
10/29/2010  4469

df = pd.read_csv('sample_data.csv', parse_dates=True)
df['date'] = pd.to_datetime(df['date'])
df['year'] = df['date'].year
df['month'] = df['date'].month

UPDATE: When I try solutions that use df['date'].dt on my pandas version 0.14.1, I get "AttributeError: 'Series' object has no attribute 'dt'":
df = pd.read_csv('sample_data.csv', parse_dates=True)
df['date'] = pd.to_datetime(df['date'])
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month

Sorry for this question that seems repetitive. I expect the answer will make me feel like a bonehead, but I have not had any luck using answers to the similar questions on SO.
FOLLOWUP: I can't seem to update my pandas 0.14.1 to a newer release in my Anaconda environment; each of the attempts below generates an invalid syntax error. I'm using Python 3.4.1 64-bit.
conda update pandas
conda install pandas==0.15.2
conda install -f pandas

Any ideas?
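A quick way to confirm which pandas the interpreter actually imports, and whether the .dt accessor exists there, is sketched below; the two sample dates are only illustrative. Separately, "invalid syntax" is the error Python itself reports when a line such as conda update pandas is typed at the Python >>> prompt. conda commands are meant to be run from a system shell or the Anaconda Prompt, so that may be worth ruling out.

```python
import pandas as pd

# Which pandas does this interpreter import? (.dt needs 0.15.0 or later)
print(pd.__version__)

# A small datetime64 Series; on pandas >= 0.15.0 it exposes the .dt accessor
s = pd.Series(pd.to_datetime(["2010-06-30", "2010-07-30"]))
has_dt = hasattr(s, "dt")
print("dt accessor available:", has_dt)
```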
Using the datetime module, we call date.today() to get the current date and then read its year attribute to get only the year from that date.
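A minimal sketch of that with the standard-library datetime module; note that year is read as an attribute of the date object rather than called as a function:

```python
import datetime

today = datetime.date.today()  # current local date
year = today.year              # .year is an attribute, not a method
print(year)
```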
We can easily get the year, month, day, hour, or minute from the dates in a pandas DataFrame column using the attributes of the .dt accessor, which works on any Series (column) with a datetime dtype. For example, df['date'].dt.year extracts only the year from a column that holds full dates.
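A minimal sketch, assuming pandas 0.15.0 or later and a hypothetical when column that also carries a time of day, so the hour and minute components have something to show. Each attribute is read from one datetime64 column through .dt:

```python
import pandas as pd

# Hypothetical sample timestamps that include a time-of-day component
df = pd.DataFrame({"when": pd.to_datetime(["2010-06-30 08:15", "2010-10-29 17:45"])})

# .dt is used on one datetime64 column (Series) at a time
df["year"] = df["when"].dt.year
df["month"] = df["when"].dt.month
df["day"] = df["when"].dt.day
df["hour"] = df["when"].dt.hour
df["minute"] = df["when"].dt.minute
print(df)
```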
If you're running pandas 0.15.0 or later, you can use the .dt accessor to get at the datetime components:
In [6]:
df['date'] = pd.to_datetime(df['date'])
df['year'], df['month'] = df['date'].dt.year, df['date'].dt.month
df

Out[6]:
         date  Count  year  month
0  2010-06-30    525  2010      6
1  2010-07-30    136  2010      7
2  2010-08-31    125  2010      8
3  2010-09-30     84  2010      9
4  2010-10-29   4469  2010     10

EDIT
It looks like you're running an older version of pandas, in which case the following would work:
In [18]:
df['date'] = pd.to_datetime(df['date'])
df['year'], df['month'] = df['date'].apply(lambda x: x.year), df['date'].apply(lambda x: x.month)
df

Out[18]:
         date  Count  year  month
0  2010-06-30    525  2010      6
1  2010-07-30    136  2010      7
2  2010-08-31    125  2010      8
3  2010-09-30     84  2010      9
4  2010-10-29   4469  2010     10

As for why read_csv didn't parse the column into a datetime: with parse_dates=True it only tries to parse the index, not the other columns, so you need to pass the ordinal position of your column in a list ([0]); see the read_csv docs.
In [20]:
import io
import pandas as pd

t = """date Count
6/30/2010 525
7/30/2010 136
8/31/2010 125
9/30/2010 84
10/29/2010 4469"""
df = pd.read_csv(io.StringIO(t), sep=r'\s+', parse_dates=[0])
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 2 columns):
date     5 non-null datetime64[ns]
Count    5 non-null int64
dtypes: datetime64[ns](1), int64(1)
memory usage: 120.0 bytes

So if you pass parse_dates=[0] to read_csv, there is no need to call to_datetime on the 'date' column after loading.
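The same load can also be sketched with a column name instead of a position, since parse_dates accepts names as well as integer positions. This sketch reuses the question's sample data and then pulls out the year and month with .dt, so it assumes pandas 0.15.0 or later:

```python
import io

import pandas as pd

# The question's sample data, whitespace-separated
t = """date Count
6/30/2010 525
7/30/2010 136
8/31/2010 125
9/30/2010 84
10/29/2010 4469"""

# Parse the 'date' column by name while reading
df = pd.read_csv(io.StringIO(t), sep=r"\s+", parse_dates=["date"])
print(df.dtypes)

# No to_datetime call needed: the column is already datetime64
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
print(df)
```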
This works:
df['date'].dt.year

Now:

df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month

gives this data frame:

         date  Count  year  month
0  2010-06-30    525  2010      6
1  2010-07-30    136  2010      7
2  2010-08-31    125  2010      8
3  2010-09-30     84  2010      9
4  2010-10-29   4469  2010     10
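If you'd rather not modify df in place, DataFrame.assign (pandas 0.16.0 or later) can add both columns in one call and return a new frame. A minimal sketch that rebuilds the sample data by hand instead of reading the CSV:

```python
import pandas as pd

# The question's sample data, built directly instead of read from sample_data.csv
df = pd.DataFrame({
    "date": pd.to_datetime(["2010-06-30", "2010-07-30", "2010-08-31",
                            "2010-09-30", "2010-10-29"]),
    "Count": [525, 136, 125, 84, 4469],
})

# assign returns a new DataFrame; df itself keeps only its original columns
out = df.assign(year=df["date"].dt.year, month=df["date"].dt.month)
print(out)
```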