Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Translating a messy date string in a DataFrame to `datetime` in python and pandas

I have a dataframe that lists the date in several formats. For example:

       date
201    July 17,2019
555    06-06-2019
295    28/3/2019

Is there a way to translate all of those dates to a datetime?

like image 526
rachelvsamuel Avatar asked Oct 12 '25 12:10

rachelvsamuel


1 Answers

Check Edit below for better answer.

Normally for a task like this you would use pandas.Series.dt.strftime() to convert the entire column to a specific DateTime format, but since it seems you have different formats in different rows you should probably instead look intodatetime.strptime().

Traverse through that column of Date an through if (or try) statement's changes the dates (pseudocode for demonstration):

for date in date_column:
    if "," in date:
        # strptime()
    elif "-" in date:
        # strptime()
    elif "/" in date:
        dt = datetime.strptime(date, "%d/%m/%y")

    column_row = dt

Here is the strftime() format codes, which you will need to set the second param of the function.


Edit

You may also do this by utilizing dateutil.parser.parse (make sure to install it first via pip install python-dateutil):

from dateutil.parser import parse

df['date'] = df.date.apply(parse)

So for example:

from dateutil.parser import parse
import pandas as pd
import numpy as np


def parse_date(date):
    if date is np.nan:
        return np.nan
    else:
        return parse(date)


df = pd.DataFrame(
    {
        "date": [
            "July 17,2019",
            "06-06-2019",
            "28/3/2019",
            "12/3/19",
            "31-12-19",
            np.nan,
        ],
        "text": ["t1", "t2", "t3", "t4", "t5", "t6"],
    }
)

df["date"] = df.date.apply(parse_date)

Which converts:

           date text
0  July 17,2019   t1
1    06-06-2019   t2
2     28/3/2019   t3
3       12/3/19   t4
4      31-12-19   t5
5           NaN   t6

to

        date text
0 2019-07-17   t1
1 2019-06-06   t2
2 2019-03-28   t3
3 2019-12-03   t4
4 2019-12-31   t5
5        NaT   t6

You can also accomplish the same as the function by using a lambda expression as following:

df["date"] = df.date.apply(lambda date: np.nan if date is np.nan else parse(date))
like image 159
Felipe Avatar answered Oct 14 '25 10:10

Felipe



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!