I import a dataframe via read_csv
, but for some reason can't extract the year or month from the series df['date']
, trying that gives AttributeError: 'Series' object has no attribute 'year'
:
date Count 6/30/2010 525 7/30/2010 136 8/31/2010 125 9/30/2010 84 10/29/2010 4469 df = pd.read_csv('sample_data.csv', parse_dates=True) df['date'] = pd.to_datetime(df['date']) df['year'] = df['date'].year df['month'] = df['date'].month
UPDATE: and when I try solutions with df['date'].dt
on my pandas version 0.14.1, I get "AttributeError: 'Series' object has no attribute 'dt' ":
df = pd.read_csv('sample_data.csv',parse_dates=True) df['date'] = pd.to_datetime(df['date']) df['year'] = df['date'].dt.year df['month'] = df['date'].dt.month
Sorry for this question that seems repetitive - I expect the answer will make me feel like a bonehead... but I have not had any luck using answers to the similar questions on SO.
FOLLOWUP: I can't seem to update my pandas 0.14.1 to a newer release in my Anaconda environment, each of the attempts below generates an invalid syntax error. I'm using Python 3.4.1 64bit.
conda update pandas conda install pandas==0.15.2 conda install -f pandas
Any ideas?
Using datetime object, we applied today() function to extract the current date and then year() to get only the year from the current date.
We can easily get year, month, day, hour, or minute from dates in a column of a pandas dataframe using dt attributes for all columns. For example, we can use df['date']. dt. year to extract only the year from a pandas column that includes the full date.
If you're running a recent-ish version of pandas then you can use the datetime attribute dt
to access the datetime components:
In [6]: df['date'] = pd.to_datetime(df['date']) df['year'], df['month'] = df['date'].dt.year, df['date'].dt.month df Out[6]: date Count year month 0 2010-06-30 525 2010 6 1 2010-07-30 136 2010 7 2 2010-08-31 125 2010 8 3 2010-09-30 84 2010 9 4 2010-10-29 4469 2010 10
EDIT
It looks like you're running an older version of pandas in which case the following would work:
In [18]: df['date'] = pd.to_datetime(df['date']) df['year'], df['month'] = df['date'].apply(lambda x: x.year), df['date'].apply(lambda x: x.month) df Out[18]: date Count year month 0 2010-06-30 525 2010 6 1 2010-07-30 136 2010 7 2 2010-08-31 125 2010 8 3 2010-09-30 84 2010 9 4 2010-10-29 4469 2010 10
Regarding why it didn't parse this into a datetime in read_csv
you need to pass the ordinal position of your column ([0]
) because when True
it tries to parse columns [1,2,3]
see the docs
In [20]: t="""date Count 6/30/2010 525 7/30/2010 136 8/31/2010 125 9/30/2010 84 10/29/2010 4469""" df = pd.read_csv(io.StringIO(t), sep='\s+', parse_dates=[0]) df.info() <class 'pandas.core.frame.DataFrame'> Int64Index: 5 entries, 0 to 4 Data columns (total 2 columns): date 5 non-null datetime64[ns] Count 5 non-null int64 dtypes: datetime64[ns](1), int64(1) memory usage: 120.0 bytes
So if you pass param parse_dates=[0]
to read_csv
there shouldn't be any need to call to_datetime
on the 'date' column after loading.
This works:
df['date'].dt.year
Now:
df['year'] = df['date'].dt.year df['month'] = df['date'].dt.month
gives this data frame:
date Count year month 0 2010-06-30 525 2010 6 1 2010-07-30 136 2010 7 2 2010-08-31 125 2010 8 3 2010-09-30 84 2010 9 4 2010-10-29 4469 2010 10
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With