Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Check if a date lies in first half of month or second in pandas

Tags:

python

pandas

I have a dataframe which consists of a date column but the date column is in string. How can I check if the date lies in first half of the month or the second half and add add another column with the billing date

For e.g.

if the date is 08-10-2020(format is dd-mm-yyyy) then the billing date column will consist of 16th of the same month and if the date lies in between 17-31 it then the billing date will consist of 1st day of next month

Data:

print(df['dispatch_date'].head())

0    01-10-2020
1    07-10-2020
2    17-10-2020
3    16-10-2020
4    09-10-2020
Name: dispatch_date, dtype: object

example output:

                 billing date
0    01-10-2020  16-10-2020
1    07-10-2020  16-10-2020
2    17-10-2020  01-11-2020
3    16-10-2020  01-11-2020
4    09-10-2020  16-10-2020
like image 656
Rahul Sharma Avatar asked Oct 15 '22 00:10

Rahul Sharma


2 Answers

You could do it using apply as follows-

import pandas as pd
import datetime as dt

dates = ['01-10-2020', '07-10-2020', '17-10-2020', '15-12-2020', '19-12-2020']
df = pd.DataFrame(data=dates, columns=['dates'])

# if the billing data can still be string going ahead
print(df.dates.apply(lambda x: '16'+x[2:] if int(x[:2]) < 16 else '01-'+str(int(x[3:5])+1)+x[5:] if int(x[3:5]) != 12 else '01-'+'01-'+str(int(x[6:])+1)))
df['billing_date'] = df.dates.apply(lambda x: '16'+x[2:] if int(x[:2]) < 16 else '01-'+str(int(x[3:5])+1)+x[5:] if int(x[3:5]) != 12 else '01-'+'01-'+str(int(x[6:])+1))

# if billing date series is needed as a datetime object
print(df.dates.apply(lambda x: dt.date(int(x[-4:]), int(x[3:5]), 16) if int(x[:2]) < 16 else dt.date(int(x[-4:]), int(x[3:5])+1, 1) if int(x[3:5]) != 12 else dt.date(int(x[-4:])+1, 1, 1)))
df['billing_date'] = df.dates.apply(lambda x: dt.date(int(x[-4:]), int(x[3:5]), 16) if int(x[:2]) < 16 else dt.date(int(x[-4:]), int(x[3:5])+1, 1) if int(x[3:5]) != 12 else dt.date(int(x[-4:])+1, 1, 1))
Output
0    16-10-2020
1    16-10-2020
2    01-11-2020
3    16-12-2020
4    01-01-2021
Name: dates, dtype: object

0    2020-10-16
1    2020-10-16
2    2020-11-01
3    2020-12-16
4    2021-01-01
Name: dates, dtype: object

Edit: code handles the edge case scenario possible in december

like image 137
sai Avatar answered Oct 19 '22 01:10

sai


Pure pandas, no string or apply modifications needed:

  • locate all dates with days before the 16th and set them to the 1st of monthst of that month plus semi month end (wich is the 15th)
  • locate all other dates dats on or after 16th and move it to the begin of next month

We need two different df.loc's because

df['billday'] = df['date'].dt.floor('d') + pd.offsets.SemiMonthEnd()

would move the dates to 15th and the (28th to 31th depending on month and leap year) of that month unfortunately.


import pandas as pd
from datetime import date 

# create demo data    
dt = [date(2020,10,i) for i in range(1,32,3)]    
df = pd.DataFrame({"date": dt})
df["date"] = pd.to_datetime(df["date"])
print(df)


# create billday colums     
df.loc[df.date.dt.day < 16, "billday"] = df['date'].dt.floor('d') +\
                                             pd.offsets.SemiMonthEnd() # to 15th
df.loc[df.date.dt.day > 15, "billday"] = df['date'].dt.floor('d') +\
                                             pd.offsets.MonthBegin(1)
 
print(df) 

Output:

         date    billday
0  2020-10-01 2020-10-15
1  2020-10-04 2020-10-15
2  2020-10-07 2020-10-15
3  2020-10-10 2020-10-15
4  2020-10-13 2020-10-15
5  2020-10-16 2020-11-01
6  2020-10-19 2020-11-01
7  2020-10-22 2020-11-01
8  2020-10-25 2020-11-01
9  2020-10-28 2020-11-01
10 2020-10-31 2020-11-01

See:

  • pandas.Series.dt.floor.html
  • pandas.offsets.SemiMonthEnd
  • Conditional Replace Pandas
  • Extracting the first day of month of a datetime type column in pandas
like image 27
Patrick Artner Avatar answered Oct 19 '22 01:10

Patrick Artner