Get week start date (Monday) from a date column in Python (pandas)?

I have seen a lot of posts about how to do this with a date string, but I am trying to do it for a DataFrame column and haven't had any luck so far. My current method is: get the weekday from 'myday' and then offset by that many days to get Monday.

df['myday'] is a column of dates.

mydays = pd.DatetimeIndex(df['myday']).weekday
df['week_start'] = pd.DatetimeIndex(df['myday']) - pd.DateOffset(days=mydays)

But I get TypeError: unsupported type for timedelta days component: numpy.ndarray

How can I get week start date from a df column?
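The error comes from passing the whole `weekday` array to `pd.DateOffset(days=...)`, which takes a single number of days and forwards it to a `timedelta`. A minimal sketch of the question's own weekday-offset approach, with the integer array converted element-wise into timedeltas first. The sample dates are made up, and the column is assumed to be parseable by `pd.to_datetime`:

```python
import pandas as pd

# Hypothetical sample data; any column parseable by pd.to_datetime works.
df = pd.DataFrame({"myday": ["2015-01-12", "2015-01-14", "2015-01-18"]})
df["myday"] = pd.to_datetime(df["myday"])

# Same idea as the question: weekday is 0 for Monday ... 6 for Sunday.
mydays = pd.DatetimeIndex(df["myday"]).weekday

# Build one timedelta per row instead of handing the whole array to
# pd.DateOffset, which only accepts a scalar day count.
offsets = pd.to_timedelta(mydays, unit="D")
df["week_start"] = pd.DatetimeIndex(df["myday"]) - offsets

print(df)
```

Here 2015-01-12 is a Monday, so all three rows should map to that date.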

asked Jan 16 '15 at 17:01 by dev28



2 Answers

Another alternative:

df['week_start'] = df['myday'].dt.to_period('W').apply(lambda r: r.start_time) 

This sets 'week_start' to the Monday on or before the date in 'myday' (at midnight, since a period's start_time is 00:00).
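A possible variation, assuming 'myday' already has a datetime dtype: pandas exposes `start_time` through the `.dt` accessor on period-typed Series, so the Python-level `lambda` can be dropped. The sample timestamps below are made up to show that a Monday maps to itself and that the time of day is discarded:

```python
import pandas as pd

# Hypothetical sample data, including a Monday and a time-of-day component.
df = pd.DataFrame({"myday": pd.to_datetime([
    "2015-01-12 08:30",  # Monday
    "2015-01-14 17:00",  # Wednesday
    "2015-01-18 23:59",  # Sunday
])})

# 'W' periods are weeks ending on Sunday, so each period starts on a Monday.
weeks = df["myday"].dt.to_period("W")

# Vectorised equivalent of .apply(lambda r: r.start_time).
df["week_start"] = weeks.dt.start_time

print(df)
```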

answered Sep 22 '22 at 09:09 by Paul


While both @knightofni's and @Paul's solutions work, I tend to stay away from using apply in Pandas because it is usually much slower than vectorized, array-based methods. To avoid it, after converting the column to datetime (via pd.to_datetime), we can adapt the weekday-based method and turn the day of the week into a NumPy timedelta64[D], either by casting it directly:

df['week_start'] = df['myday'] - df['myday'].dt.weekday.astype('timedelta64[D]') 

or by using to_timedelta as @ribitskiyb suggested:

df['week_start'] = df['myday'] - pd.to_timedelta(df['myday'].dt.weekday, unit='D')
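One behavioural difference from the to_period approach is worth noting: subtracting whole days keeps the original time of day, so a Wednesday 17:00 becomes Monday 17:00. If midnight is wanted, `.dt.normalize()` can be applied afterwards. A small sketch with made-up timestamps:

```python
import pandas as pd

# Hypothetical sample data with non-midnight times (Wednesday and Sunday).
df = pd.DataFrame({"myday": pd.to_datetime(["2015-01-14 17:00", "2015-01-18 23:59"])})

# Vectorised: subtract weekday days (Monday == 0) from each timestamp.
shift = pd.to_timedelta(df["myday"].dt.weekday, unit="D")
df["week_start"] = df["myday"] - shift  # keeps the time of day

# Truncate to midnight to match what to_period(...).start_time returns.
df["week_start_midnight"] = df["week_start"].dt.normalize()

print(df)
```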

Using test data with 60,000 datetimes and the newly released Pandas 1.0.1, I got the following timings for the suggested answers:

%timeit df.apply(lambda x: x['myday'] - datetime.timedelta(days=x['myday'].weekday()), axis=1)
1.33 s ± 28.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit df['myday'].dt.to_period('W').apply(lambda r: r.start_time)
5.59 ms ± 138 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df['myday'] - df['myday'].dt.weekday.astype('timedelta64[D]')
3.44 ms ± 106 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df['myday'] - pd.to_timedelta(df['myday'].dt.weekday, unit='D')
3.47 ms ± 170 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
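The `%timeit` lines are IPython magics. A rough plain-Python sketch of a similar comparison using the standard-library `timeit` module follows. The original test data isn't shown, so 60,000 consecutive midnight timestamps are assumed here; the sketch also checks that the `to_period` and `to_timedelta` approaches agree on such data. Timings depend on machine and pandas version, so none are quoted:

```python
import timeit

import pandas as pd

# Assumed test data: 60,000 consecutive days, all at midnight.
df = pd.DataFrame({"myday": pd.date_range("2000-01-01", periods=60_000, freq="D")})


def via_period(s: pd.Series) -> pd.Series:
    # Paul's approach: period start via a Python-level lambda.
    return s.dt.to_period("W").apply(lambda r: r.start_time)


def via_timedelta(s: pd.Series) -> pd.Series:
    # Vectorised approach: subtract weekday (Monday == 0) as whole days.
    return s - pd.to_timedelta(s.dt.weekday, unit="D")


# With midnight-only timestamps both approaches should give identical results.
same = (via_period(df["myday"]) == via_timedelta(df["myday"])).all()
print("results agree:", same)

for name, func in [("to_period + apply", via_period), ("to_timedelta", via_timedelta)]:
    # Best of 3 repeats, each timing 5 calls; report the per-call average.
    best = min(timeit.repeat(lambda: func(df["myday"]), number=5, repeat=3)) / 5
    print(f"{name}: {best * 1e3:.2f} ms per call")
```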

These results show that Pandas 1.0.1 has dramatically sped up the to_period + apply method (compared with Pandas <= 0.25), but converting directly to a timedelta (either by casting with .astype('timedelta64[D]') or with pd.to_timedelta) is still faster. Based on these results, I would suggest using pd.to_timedelta going forward.

answered Sep 18 '22 at 09:09 by n8yoder