
Parse multiple date formats in pandas

I'm stuck with a column that mixes the following date formats:

0   2001-12-25  
1   2002-9-27   
2   2001-2-24   
3   2001-5-3    
4   200510
5   20078

What I need is the date in the format %Y-%m.

What I tried was:

 def parse(date):
     if len(date)<=5:
         return "{}-{}".format(date[:4], date[4:5], date[5:])
     else:
         pass

  df['Date']= parse(df['Date'])

However, I only succeeded in parsing 20078 to 2007-8; values like 2001-12-25 came back as None. How can I do this? Thank you!

asked Mar 30 '26 22:03 by almo


2 Answers

We can use pd.to_datetime with errors='coerce' to parse the dates in steps, one format at a time.

Assuming your column is called date:

s = pd.to_datetime(df['date'],errors='coerce',format='%Y-%m-%d')

s = s.fillna(pd.to_datetime(df['date'],format='%Y%m',errors='coerce'))

df['date_fixed'] = s

print(df)

         date date_fixed
0  2001-12-25 2001-12-25
1   2002-9-27 2002-09-27
2   2001-2-24 2001-02-24
3    2001-5-3 2001-05-03
4      200510 2005-10-01
5       20078 2007-08-01

In steps:

First, we parse the regular (year-month-day) dates into a new Series called s:

s = pd.to_datetime(df['date'],errors='coerce',format='%Y-%m-%d')

print(s)

0   2001-12-25
1   2002-09-27
2   2001-02-24
3   2001-05-03
4          NaT
5          NaT
Name: date, dtype: datetime64[ns]

As you can see, the series now has two NaT (null datetime) values. These correspond to the entries in the compact %Y%m form, which have no day and so do not match the first format.

We then call pd.to_datetime again with the other format, %Y%m, and use the result to fill the missing values of s:

s = s.fillna(pd.to_datetime(df['date'],format='%Y%m',errors='coerce'))

print(s)


0   2001-12-25
1   2002-09-27
2   2001-02-24
3   2001-05-03
4   2005-10-01
5   2007-08-01

Then we assign the result back to your dataframe.
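The question asks for the %Y-%m form, while the steps above produce full datetimes. A minimal sketch of one way to finish the job, assuming the column holds strings and using a hypothetical output column named year_month (not part of the original answer), is to format the filled series with dt.strftime:

```python
import pandas as pd

# Sample data copied from the question; the column name "date" follows the answer.
df = pd.DataFrame(
    {"date": ["2001-12-25", "2002-9-27", "2001-2-24", "2001-5-3", "200510", "20078"]}
)

# Parse each format separately; values that do not match become NaT.
full = pd.to_datetime(df["date"], format="%Y-%m-%d", errors="coerce")
compact = pd.to_datetime(df["date"], format="%Y%m", errors="coerce")

# Fill the gaps left by the first pass, then render as year-month strings.
# "year_month" is an illustrative column name.
df["year_month"] = full.fillna(compact).dt.strftime("%Y-%m")

print(df)
```

The result is a column of plain strings, so it is suitable for display or grouping but no longer supports datetime arithmetic. Keep the datetime series as well if you need that.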

answered Apr 02 '26 11:04 by Umar.H


You could use a regex to pull out the year and month, and convert to datetime:

df = pd.read_clipboard(sep=r"\s{2,}", header=None, names=["Dates"])

pattern = r"(?P<Year>\d{4})[-]*(?P<Month>\d{1,2})"

df['Dates'] = pd.to_datetime([f"{year}-{month}" for year, month in df.Dates.str.extract(pattern).to_numpy()])

print(df)

        Dates
0   2001-12-01
1   2002-09-01
2   2001-02-01
3   2001-05-01
4   2005-10-01
5   2007-08-01

Note that pandas automatically sets the day to 1, since only the year and month were supplied.
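If the goal is the %Y-%m text itself rather than a datetime, the same pattern can build it directly. The following is only a sketch in plain Python: to_year_month is a hypothetical helper name, and it assumes every value starts with a four-digit year followed by the month.

```python
import re

# Same pattern as above: four-digit year, optional hyphen(s), one- or two-digit month.
PATTERN = re.compile(r"(?P<Year>\d{4})[-]*(?P<Month>\d{1,2})")


def to_year_month(value):
    """Return 'YYYY-MM' for a raw date string, or None if it does not match."""
    match = PATTERN.match(value.strip())
    if match is None:
        return None
    # Zero-pad the month so that 8 becomes 08.
    return f"{match['Year']}-{int(match['Month']):02d}"


raw = ["2001-12-25", "2002-9-27", "2001-2-24", "2001-5-3", "200510", "20078"]
year_months = [to_year_month(value) for value in raw]
print(year_months)
```

On a dataframe, the helper could be applied per element with df['Dates'].map(to_year_month). Unlike the attempt in the question, this works on one string at a time instead of on the whole column.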

answered Apr 02 '26 12:04 by sammywemmy


