Pandas Get Day of Week from date type column

I'm using Python 3.6 and Pandas 0.20.3.

I have a column that I've converted from datetime to date type, since all I need is the date; I keep it as a derived column for ease of use. Now I want to do some further operations based on the day of the week. I can get the day of the week from a datetime column but not from the date column. It seems to me that this should be possible, but none of the variations I've tried has worked.

Here is an example:

import numpy as np
import pandas as pd
df = pd.DataFrame({'date':['2017-5-16','2017-5-17']})
df['trade_date']=pd.to_datetime(df['date'])

I can get the day of the week from the datetime column 'trade_date'.

df['dow']=df['trade_date'].dt.dayofweek
df
    date    trade_date  dow
0   2017-5-16   2017-05-16  1
1   2017-5-17   2017-05-17  2
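The integers returned by `.dt.dayofweek` follow the same convention as Python's `datetime.date.weekday()`: Monday is 0 and Sunday is 6, so the 1 and 2 above are Tuesday and Wednesday. A minimal sketch mapping them to names with the standard-library `calendar.day_name` (the `dow_name` column is just an illustrative name, and the names come out in English under the default locale):

```python
import calendar

import pandas as pd

df = pd.DataFrame({'date': ['2017-5-16', '2017-5-17']})
df['trade_date'] = pd.to_datetime(df['date'])
df['dow'] = df['trade_date'].dt.dayofweek

# calendar.day_name is indexed with Monday=0, the same convention as .dt.dayofweek
df['dow_name'] = df['dow'].map(lambda i: calendar.day_name[int(i)])
print(df[['trade_date', 'dow', 'dow_name']])
```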

But if I have a date rather than a datetime, no dice. For instance:

df['trade_date_2']=pd.to_datetime(df['date']).dt.date

And then:

df['dow_2']=df['trade_date_2'].dt.dayofweek

I get (at the end):

AttributeError: Can only use .dt accessor with datetimelike values

I've tried various combinations of dayofweek(), weekday and weekday(), which, I realize, highlight my ignorance of exactly how Pandas works. So... any suggestions besides adding another column that is the datetime version of column trade_date? I'd also welcome explanations of why this is not working.

asked Oct 01 '17 18:10 by kdragger




1 Answer

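The problem is the difference between pandas datetimes (Timestamps in a datetime64 column), for which the .dt methods are implemented, and Python datetime.date objects, for which they are not. `.dt.date` returns a column of object dtype holding plain Python dates, so the .dt accessor is no longer available on it.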
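One way to see the difference is to inspect the dtypes: `pd.to_datetime` produces a datetime64 column, while `.dt.date` produces an `object` column of `datetime.date` values, which the `.dt` accessor rejects. A small sketch using `pandas.api.types.is_datetime64_any_dtype`:

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

df = pd.DataFrame({'date': ['2017-5-16', '2017-5-17']})
df['trade_date'] = pd.to_datetime(df['date'])   # datetime64 column
df['trade_date_2'] = df['trade_date'].dt.date   # object column of datetime.date

print(df.dtypes)
print(is_datetime64_any_dtype(df['trade_date']))    # True
print(is_datetime64_any_dtype(df['trade_date_2']))  # False

# The .dt accessor is only available for datetimelike columns
try:
    df['trade_date_2'].dt.dayofweek
except AttributeError as exc:
    print('AttributeError:', exc)
```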

# returns Python date objects (object dtype)
df['trade_date_2']= pd.to_datetime(df['date']).dt.date

print (df['trade_date_2'].apply(type))
0    <class 'datetime.date'>
1    <class 'datetime.date'>
Name: trade_date_2, dtype: object

# fails with AttributeError: .dt does not work with Python date objects
df['dow_2']=df['trade_date_2'].dt.dayofweek

You need to convert back to pandas datetime first:

df['dow_2']= pd.to_datetime(df['trade_date_2']).dt.dayofweek

print (df)
        date trade_date_2  dow_2
0  2017-5-16   2017-05-16      1
1  2017-5-17   2017-05-17      2

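So the best approach is to convert the original column to datetime once and derive both the date and the day of the week from it: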

df['date'] = pd.to_datetime(df['date'])
print (df['date'].apply(type))
0    <class 'pandas._libs.tslib.Timestamp'>
1    <class 'pandas._libs.tslib.Timestamp'>
Name: date, dtype: object

df['trade_date_2']= df['date'].dt.date
df['dow_2']=df['date'].dt.dayofweek
print (df)
        date trade_date_2  dow_2
0 2017-05-16   2017-05-16      1
1 2017-05-17   2017-05-17      2
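If the goal is just to drop the time component while keeping `.dt` available, another option (not part of the original answer) is `Series.dt.normalize()`, which sets the time to midnight but keeps a datetime64 dtype. A sketch, assuming the input strings carry a time of day; `trade_day` is an illustrative column name:

```python
import pandas as pd

df = pd.DataFrame({'date': ['2017-05-16 09:30', '2017-05-17 15:45']})
dates = pd.to_datetime(df['date'])

# normalize() truncates each value to midnight but stays datetime64,
# so the .dt accessor keeps working on the result
df['trade_day'] = dates.dt.normalize()
df['dow'] = df['trade_day'].dt.dayofweek
print(df.dtypes)
print(df)
```

The trade-off is that the column still displays and compares as a timestamp rather than a bare date, but vectorised `.dt` operations remain available without a second conversion.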

EDIT:

Thanks to Bharath shetty for a solution that works with Python dates directly; however, it fails with NaT:

df = pd.DataFrame({'date':['2017-5-16',np.nan]})

df['trade_date_2']= pd.to_datetime(df['date']).dt.date
df['dow_2'] = df['trade_date_2'].apply(lambda x: x.weekday()) 

AttributeError: 'float' object has no attribute 'weekday'
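A sketch of a guard for the Python-date approach, assuming missing values should simply yield `NaN`: check `pd.notnull` before calling `weekday()`. The missing entry shows up as a float `NaN` in the pandas version above and as `NaT` in newer versions; `pd.notnull` is False for both. Converting back with `pd.to_datetime` also works, because NaT propagates through `.dt.dayofweek` as NaN (`dow_3` is an illustrative column name):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'date': ['2017-5-16', np.nan]})
df['trade_date_2'] = pd.to_datetime(df['date']).dt.date

# Skip missing entries instead of calling .weekday() on them
df['dow_2'] = df['trade_date_2'].apply(lambda x: x.weekday() if pd.notnull(x) else np.nan)

# Converting back to datetime64 lets NaT propagate as NaN without a lambda
df['dow_3'] = pd.to_datetime(df['trade_date_2']).dt.dayofweek
print(df)
```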

Comparing solutions:

df = pd.DataFrame({'date':['2017-5-16','2017-5-17']})
df = pd.concat([df]*10000).reset_index(drop=True)

def a(df):
    df['trade_date_2']= pd.to_datetime(df['date']).dt.date
    df['dow_2'] = df['trade_date_2'].apply(lambda x: x.weekday()) 
    return df

def b(df):
    df['date1'] = pd.to_datetime(df['date'])
    df['trade_date_21']= df['date1'].dt.date
    df['dow_21']=df['date1'].dt.dayofweek
    return df

def c(df):
    # don't write the datetimes to a column, use a helper Series instead
    dates = pd.to_datetime(df['date'])
    df['trade_date_22']= dates.dt.date
    df['dow_22']=        dates.dt.dayofweek
    return df

In [186]: %timeit (a(df))
10 loops, best of 3: 101 ms per loop

In [187]: %timeit (b(df))
10 loops, best of 3: 90.8 ms per loop

In [188]: %timeit (c(df))
10 loops, best of 3: 91.9 ms per loop
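`%timeit` is an IPython magic. Outside IPython, a rough equivalent is the standard-library `timeit` module. The sketch below re-declares `a` and `c` so it runs on its own, and passes a fresh copy of the frame on each run because the functions add columns to `df` in place (the copy is included in the timing). Absolute numbers will depend on the machine and pandas version:

```python
import timeit

import pandas as pd

df = pd.DataFrame({'date': ['2017-5-16', '2017-5-17']})
df = pd.concat([df] * 10000).reset_index(drop=True)

def a(df):
    # Python date objects, weekday() called once per row
    df['trade_date_2'] = pd.to_datetime(df['date']).dt.date
    df['dow_2'] = df['trade_date_2'].apply(lambda x: x.weekday())
    return df

def c(df):
    # helper datetime Series, vectorised .dt accessor for both columns
    dates = pd.to_datetime(df['date'])
    df['trade_date_22'] = dates.dt.date
    df['dow_22'] = dates.dt.dayofweek
    return df

NUMBER = 5
results = {}
for func in (a, c):
    # copy so every run starts from the same input columns
    times = timeit.repeat(lambda: func(df.copy()), number=NUMBER, repeat=3)
    results[func.__name__] = min(times) / NUMBER
    print(f"{func.__name__}: {results[func.__name__] * 1000:.1f} ms per loop")
```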
answered Nov 14 '22 11:11 by jezrael