
Which is the fastest way to extract day, month and year from a given date?

I read a CSV file containing 150,000 lines into a pandas dataframe. This dataframe has a field, Date, with the dates in yyyy-mm-dd format. I want to extract the month, day and year from it and copy them into the dataframe's columns Month, Day and Year respectively. For a few hundred records the two methods below work OK, but for 150,000 records both take a ridiculously long time to execute. Is there a faster way to do this for 100,000+ records?

First method:

    import pandas

    df = pandas.read_csv(filename)
    # Row-by-row loop: split each date string and keep the day part
    for i in xrange(len(df)):
        df.loc[i,'Day'] = int(df.loc[i,'Date'].split('-')[2])

Second method:

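    import pandas
    from datetime import datetime

    df = pandas.read_csv(filename)
    # Row-by-row loop: parse each date string and read its day attribute
    for i in xrange(len(df)):
        df.loc[i,'Day'] = datetime.strptime(df.loc[i,'Date'], '%Y-%m-%d').day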

Thank you.

ram asked Feb 22 '14 12:02



2 Answers

In 0.15.0 you will be able to use the new .dt accessor, which gives nicer syntax for this. Until then, wrapping the column in a DatetimeIndex does the job:

    In [36]: df = DataFrame(date_range('20000101',periods=150000,freq='H'),columns=['Date'])

    In [37]: df.head(5)
    Out[37]:
                     Date
    0 2000-01-01 00:00:00
    1 2000-01-01 01:00:00
    2 2000-01-01 02:00:00
    3 2000-01-01 03:00:00
    4 2000-01-01 04:00:00

    [5 rows x 1 columns]

    In [38]: %timeit f(df)
    10 loops, best of 3: 22 ms per loop

    In [39]: def f(df):
        df = df.copy()
        df['Year'] = DatetimeIndex(df['Date']).year
        df['Month'] = DatetimeIndex(df['Date']).month
        df['Day'] = DatetimeIndex(df['Date']).day
        return df
       ....:

    In [40]: f(df).head()
    Out[40]:
                     Date  Year  Month  Day
    0 2000-01-01 00:00:00  2000      1    1
    1 2000-01-01 01:00:00  2000      1    1
    2 2000-01-01 02:00:00  2000      1    1
    3 2000-01-01 03:00:00  2000      1    1
    4 2000-01-01 04:00:00  2000      1    1

    [5 rows x 4 columns]

From 0.15.0 on (release at the end of Sept 2014), the following is possible with the new .dt accessor:

    df['Year'] = df['Date'].dt.year
    df['Month'] = df['Date'].dt.month
    df['Day'] = df['Date'].dt.day
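
One thing worth spelling out: the .dt accessor only works on a datetime column, while in the question the Date column comes out of the CSV as plain strings. A minimal sketch of the full path from strings to the three columns is below. The inline CSV text and the Value column are made up for illustration and stand in for the real file; the sketch assumes a reasonably recent pandas.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file from the question.
csv_text = "Date,Value\n2014-02-22,1\n2013-12-05,2\n2000-01-01,3\n"
df = pd.read_csv(io.StringIO(csv_text))

# Convert the whole string column in one vectorized call.
# Passing the format explicitly avoids per-value format guessing.
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d")

# With a datetime column, .dt exposes the components as integer columns.
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month
df["Day"] = df["Date"].dt.day
```

Alternatively, passing parse_dates=['Date'] to read_csv should give a datetime column straight away, so the separate to_datetime step can be skipped.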
Jeff answered Nov 06 '22 20:11


I use the code below, which works very well for me:

    df['Year']=[d.split('-')[0] for d in df.Date]
    df['Month']=[d.split('-')[1] for d in df.Date]
    df['Day']=[d.split('-')[2] for d in df.Date]

    df.head(5)
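
Note that the list comprehensions above leave the parts as strings ('02' rather than 2) and walk the Date column three times. A possible variation, offered only as a sketch and not as part of the answer above, is to split once with the pandas string methods and cast to integers. The three sample dates are invented for illustration, and the Date column is assumed to still hold yyyy-mm-dd strings.

```python
import pandas as pd

# Hypothetical sample data; the real frame would come from read_csv.
df = pd.DataFrame({"Date": ["2014-02-22", "2013-12-05", "2000-01-01"]})

# Split every string once; expand=True returns one column per piece
# (labelled 0, 1, 2), and astype(int) turns '02' into 2.
parts = df["Date"].str.split("-", expand=True).astype(int)

df["Year"] = parts[0]
df["Month"] = parts[1]
df["Day"] = parts[2]
```

This keeps the string-splitting idea but produces integer columns, which matches what the question's int(...) call was after.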
Nim J answered Nov 06 '22 20:11