I have a dataframe which consists of a date column but the date column is in string. How can I check if the date lies in first half of the month or the second half and add add another column with the billing date
For e.g.
if the date is 08-10-2020
(format is dd-mm-yyyy) then the billing date
column will consist of 16th of the same month and if the date lies in between 17-31 it then the billing date will consist of 1st day of next month
Data:
print(df['dispatch_date'].head())
0 01-10-2020
1 07-10-2020
2 17-10-2020
3 16-10-2020
4 09-10-2020
Name: dispatch_date, dtype: object
example output:
billing date
0 01-10-2020 16-10-2020
1 07-10-2020 16-10-2020
2 17-10-2020 01-11-2020
3 16-10-2020 01-11-2020
4 09-10-2020 16-10-2020
You could do it using apply
as follows-
import pandas as pd
import datetime as dt
dates = ['01-10-2020', '07-10-2020', '17-10-2020', '15-12-2020', '19-12-2020']
df = pd.DataFrame(data=dates, columns=['dates'])
# if the billing data can still be string going ahead
print(df.dates.apply(lambda x: '16'+x[2:] if int(x[:2]) < 16 else '01-'+str(int(x[3:5])+1)+x[5:] if int(x[3:5]) != 12 else '01-'+'01-'+str(int(x[6:])+1)))
df['billing_date'] = df.dates.apply(lambda x: '16'+x[2:] if int(x[:2]) < 16 else '01-'+str(int(x[3:5])+1)+x[5:] if int(x[3:5]) != 12 else '01-'+'01-'+str(int(x[6:])+1))
# if billing date series is needed as a datetime object
print(df.dates.apply(lambda x: dt.date(int(x[-4:]), int(x[3:5]), 16) if int(x[:2]) < 16 else dt.date(int(x[-4:]), int(x[3:5])+1, 1) if int(x[3:5]) != 12 else dt.date(int(x[-4:])+1, 1, 1)))
df['billing_date'] = df.dates.apply(lambda x: dt.date(int(x[-4:]), int(x[3:5]), 16) if int(x[:2]) < 16 else dt.date(int(x[-4:]), int(x[3:5])+1, 1) if int(x[3:5]) != 12 else dt.date(int(x[-4:])+1, 1, 1))
Output
0 16-10-2020
1 16-10-2020
2 01-11-2020
3 16-12-2020
4 01-01-2021
Name: dates, dtype: object
0 2020-10-16
1 2020-10-16
2 2020-11-01
3 2020-12-16
4 2021-01-01
Name: dates, dtype: object
Edit: code handles the edge case scenario possible in december
Pure pandas, no string or apply modifications needed:
We need two different df.loc
's because
df['billday'] = df['date'].dt.floor('d') + pd.offsets.SemiMonthEnd()
would move the dates to 15th and the (28th to 31th depending on month and leap year) of that month unfortunately.
import pandas as pd
from datetime import date
# create demo data
dt = [date(2020,10,i) for i in range(1,32,3)]
df = pd.DataFrame({"date": dt})
df["date"] = pd.to_datetime(df["date"])
print(df)
# create billday colums
df.loc[df.date.dt.day < 16, "billday"] = df['date'].dt.floor('d') +\
pd.offsets.SemiMonthEnd() # to 15th
df.loc[df.date.dt.day > 15, "billday"] = df['date'].dt.floor('d') +\
pd.offsets.MonthBegin(1)
print(df)
Output:
date billday
0 2020-10-01 2020-10-15
1 2020-10-04 2020-10-15
2 2020-10-07 2020-10-15
3 2020-10-10 2020-10-15
4 2020-10-13 2020-10-15
5 2020-10-16 2020-11-01
6 2020-10-19 2020-11-01
7 2020-10-22 2020-11-01
8 2020-10-25 2020-11-01
9 2020-10-28 2020-11-01
10 2020-10-31 2020-11-01
See:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With