Get week start date (Monday) from a date column in Python (pandas)?

Tags:

python

date

pandas

numpy

I have seen a lot of posts about how to do this with a date string, but I am trying to do it for a DataFrame column and haven't had any luck so far. My current method is to get the weekday from 'myday' and then offset by it to get Monday.

df['myday'] is a column of dates.

mydays = pd.DatetimeIndex(df['myday']).weekday
df['week_start'] = pd.DatetimeIndex(df['myday']) - pd.DateOffset(days=mydays)

But I get TypeError: unsupported type for timedelta days component: numpy.ndarray

How can I get week start date from a df column?

478

asked Jan 16 '15 17:01

dev28

2 Answers

Another alternative:

df['week_start'] = df['myday'].dt.to_period('W').apply(lambda r: r.start_time)

This will set 'week_start' to the Monday on or before the date in 'myday', at midnight. The 'W' frequency describes weeks that end on Sunday, so each period starts on a Monday.
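A minimal sketch of this approach on a few sample dates, to make the behaviour concrete. The sample dates and the 'week_start_vec' column name are made up for illustration. The second form uses the .dt.start_time accessor on the period column, which should give the same result without a Python-level lambda.

```python
import pandas as pd

# Hypothetical sample data: a Wednesday, a Monday and a Sunday of the same week.
df = pd.DataFrame(
    {"myday": pd.to_datetime(["2015-01-14", "2015-01-12", "2015-01-18"])}
)

# Weekly periods with the 'W' frequency end on Sunday, so they start on Monday.
weeks = df["myday"].dt.to_period("W")

# apply-based form, as in the answer above.
df["week_start"] = weeks.apply(lambda r: r.start_time)

# Accessor form: same values, computed without calling a lambda per row.
df["week_start_vec"] = weeks.dt.start_time

print(df)
```

All three rows should map to Monday 2015-01-12, including the row that is already a Monday.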

110

answered Sep 22 '22 09:09

Paul

While both @knightofni's and @Paul's solutions work, I tend to stay away from apply in pandas because it is usually quite slow compared to array-based methods. To avoid it, after converting the column to datetime (via pd.to_datetime), we can modify the weekday-based method and convert the day of the week to a numpy timedelta64[D], either by casting it directly:

df['week_start'] = df['myday'] - df['myday'].dt.weekday.astype('timedelta64[D]')

or by using to_timedelta as @ribitskiyb suggested:

df['week_start'] = df['myday'] - pd.to_timedelta(df['myday'].dt.weekday, unit='D')
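Here is a small end-to-end sketch of the pd.to_timedelta variant, including the pd.to_datetime conversion step. The sample values are made up. Note one difference from the to_period approach: plain subtraction keeps the time of day, so the sketch adds .dt.normalize() to reset it to midnight. That call is my addition and is not part of the answer; drop it if you want to keep the time component.

```python
import pandas as pd

# Hypothetical sample data stored as strings (Wednesday, Monday, Sunday).
df = pd.DataFrame(
    {"myday": ["2015-01-14 09:30", "2015-01-12 00:00", "2015-01-18 23:59"]}
)

# Convert to a real datetime column first; the .dt accessor needs it.
df["myday"] = pd.to_datetime(df["myday"])

# dt.weekday is 0 for Monday through 6 for Sunday, so it is exactly the
# number of days to step back to reach Monday.
offset = pd.to_timedelta(df["myday"].dt.weekday, unit="D")

# Subtract the offset, then normalize to midnight (assumption: only the
# date of the week start is wanted, not the original time of day).
df["week_start"] = (df["myday"] - offset).dt.normalize()

print(df)
```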

Using test data with 60,000 datetimes, I got the following timings for the suggested answers on the newly released Pandas 1.0.1.

%timeit df.apply(lambda x: x['myday'] - datetime.timedelta(days=x['myday'].weekday()), axis=1)
>>> 1.33 s ± 28.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit df['myday'].dt.to_period('W').apply(lambda r: r.start_time)
>>> 5.59 ms ± 138 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df['myday'] - df['myday'].dt.weekday.astype('timedelta64[D]')
>>> 3.44 ms ± 106 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df['myday'] - pd.to_timedelta(df['myday'].dt.weekday, unit='D')
>>> 3.47 ms ± 170 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

These results show that Pandas 1.0.1 has dramatically improved the speed of the to_period apply-based method (vs Pandas <= 0.25), but converting the weekday directly to a timedelta (either by casting with .astype('timedelta64[D]') or by using pd.to_timedelta) is still faster. Based on these results I would suggest using pd.to_timedelta going forward.
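If you want to rerun a comparison like this outside IPython, something along these lines should work with the standard timeit module. This is only a sketch: the original test data is not shown, so the 60,000 random midnight dates, the seed, and the names candidates, results and timings are my assumptions. It covers just two of the methods and checks that they agree before timing them. Numbers will vary with the pandas version and the machine.

```python
import timeit

import numpy as np
import pandas as pd

# Assumption: 60,000 random dates at midnight stand in for the test data.
rng = np.random.default_rng(0)
days = rng.integers(0, 3650, size=60_000)
df = pd.DataFrame(
    {"myday": pd.Timestamp("2010-01-01") + pd.to_timedelta(days, unit="D")}
)

# Two of the methods discussed above, wrapped as zero-argument callables.
candidates = {
    "to_period + apply": lambda: df["myday"]
    .dt.to_period("W")
    .apply(lambda r: r.start_time),
    "to_timedelta": lambda: df["myday"]
    - pd.to_timedelta(df["myday"].dt.weekday, unit="D"),
}

# Sanity check: both methods must produce the same week starts.
results = {name: fn() for name, fn in candidates.items()}
assert (results["to_period + apply"] == results["to_timedelta"]).all()

# Best of 3 repeats, 3 calls each, reported as milliseconds per call.
timings = {}
for name, fn in candidates.items():
    timings[name] = min(timeit.repeat(fn, number=3, repeat=3)) / 3
    print(f"{name}: {timings[name] * 1e3:.2f} ms per call")
```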

answered Sep 18 '22 09:09

n8yoder
