
How to display the correct date century in Pandas?

I have the following data in one of my columns:

df['DOB']

0    01-01-84
1    31-07-85
2    24-08-85
3    30-12-93
4    09-12-77
5    08-09-90
6    01-06-88
7    04-10-89
8    15-11-91
9    01-06-68
Name: DOB, dtype: object

I want to convert this to a datetime column. I tried the following:

print(pd.to_datetime(df['DOB']))
0   1984-01-01
1   1985-07-31
2   1985-08-24
3   1993-12-30
4   1977-09-12
5   1990-08-09
6   1988-01-06
7   1989-04-10
8   1991-11-15
9   2068-01-06
Name: DOB, dtype: datetime64[ns]

How can I get the date as 1968-01-06 instead of 2068-01-06?

Madhanlal asked Apr 18 '19 05:04



2 Answers

In this specific case, I would insert the century by hand and parse with an explicit day-first format:

pd.to_datetime(df['DOB'].str[:-2] + '19' + df['DOB'].str[-2:], format='%d-%m-%Y')

Note that this will break if you have DOBs after 1999!

Output:

0   1984-01-01
1   1985-07-31
2   1985-08-24
3   1993-12-30
4   1977-12-09
5   1990-09-08
6   1988-06-01
7   1989-10-04
8   1991-11-15
9   1968-06-01
Name: DOB, dtype: datetime64[ns]
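
For background on why a plain two-digit-year conversion lands in 2068: Python's `%y` directive follows the POSIX convention, mapping 69-99 to 1969-1999 and 00-68 to 2000-2068. The snippet below is only a small illustration using the standard library; the variable names are made up for the example.

```python
from datetime import datetime

# %y follows the POSIX pivot: 00-68 -> 2000-2068, 69-99 -> 1969-1999.
y68 = datetime.strptime("01-06-68", "%d-%m-%y")
y69 = datetime.strptime("01-06-69", "%d-%m-%y")

print(y68.date())  # 2068-06-01
print(y69.date())  # 1969-06-01
```

So a year written as 68 sits just on the "20xx" side of the pivot, which is why only that row comes out a century too late.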
gmds answered Oct 25 '22 21:10


You can first convert to datetimes and then, for rows whose year is greater than or equal to 2020, subtract 100 years using a DateOffset:

df['DOB'] = pd.to_datetime(df['DOB'], format='%d-%m-%y')
df.loc[df['DOB'].dt.year >= 2020, 'DOB'] -= pd.DateOffset(years=100)
# equivalent to:
# mask = df['DOB'].dt.year >= 2020
# df.loc[mask, 'DOB'] = df.loc[mask, 'DOB'] - pd.DateOffset(years=100)
print(df)
         DOB
0 1984-01-01
1 1985-07-31
2 1985-08-24
3 1993-12-30
4 1977-12-09
5 1990-09-08
6 1988-06-01
7 1989-10-04
8 1991-11-15
9 1968-06-01
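
As a variation on the fixed 2020 cutoff above, the cutoff could be derived from a reference date, on the assumption that no date of birth lies in the future. This is only a sketch: the helper name `parse_dob` and its `today` parameter are hypothetical, and the sample values are illustrative rather than taken from the question.

```python
import pandas as pd

def parse_dob(series, today=None):
    """Parse dd-mm-yy strings and move dates later than `today` back 100 years."""
    today = pd.Timestamp.today().normalize() if today is None else pd.Timestamp(today)
    parsed = pd.to_datetime(series, format="%d-%m-%y")
    # Assumption: a date of birth cannot lie in the future.
    in_future = parsed > today
    shifted = parsed - pd.DateOffset(years=100)
    # Keep the parsed value where it is plausible, otherwise use the shifted one.
    return parsed.where(~in_future, shifted)

dob = pd.Series(["01-06-68", "09-12-77", "15-03-05"])
result = parse_dob(dob, today="2019-04-18")
print(result)
```

With the reference date fixed at 2019-04-18, the first value is moved back to 1968 while 1977 and 2005 are left as parsed. Passing `today` explicitly keeps the result reproducible.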

Or you can prepend 19 or 20 to the two-digit years with Series.str.replace and choose between the two results with numpy.where based on a condition.

Note: this solution also works for years from 00 (2000) up to 20 (2020).

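# regex=True is needed because the pattern uses a capture group
s1 = df['DOB'].str.replace(r'-(\d+)$', r'-19\1', regex=True)
s2 = df['DOB'].str.replace(r'-(\d+)$', r'-20\1', regex=True)
mask = df['DOB'].str[-2:].astype(int) <= 20
# explicit day-first format so that 09-12-77 is read as 9 December
df['DOB'] = pd.to_datetime(np.where(mask, s2, s1), format='%d-%m-%Y')

print(df)
         DOB
0 1984-01-01
1 1985-07-31
2 1985-08-24
3 1993-12-30
4 1977-12-09
5 1990-09-08
6 1988-06-01
7 1989-10-04
8 1991-11-15
9 1968-06-01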

If all years are below 2000:

s1 = df['DOB'].str.replace(r'-(\d+)$', r'-19\1', regex=True)
df['DOB'] = pd.to_datetime(s1, format='%d-%m-%Y')
print(df)
         DOB
0 1984-01-01
1 1985-07-31
2 1985-08-24
3 1993-12-30
4 1977-12-09
5 1990-09-08
6 1988-06-01
7 1989-10-04
8 1991-11-15
9 1968-06-01
jezrael answered Oct 25 '22 21:10