I have a pandas data frame which has datetimes with 2 different formats e.g.:
3/14/2019 5:15:32 AM
2019-08-03 05:15:35
2019-01-03 05:15:33
2019-01-03 05:15:33
2/28/2019 5:15:31 AM
2/27/2019 11:18:39 AM
...
I have tried various formats but get errors like ValueError: unconverted data remains: AM.
I would like to get the dates in the format 2019-02-28, with the time removed.
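You can use pd.to_datetime().dt.strftime() to efficiently convert the entire column to datetime objects and then to strings, with pandas guessing the date format of each value. (On pandas 2.0 and later, a column with mixed formats may need format='mixed' passed to pd.to_datetime for this to work.)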
import pandas as pd
df = pd.Series('''3/14/2019 5:15:32 AM
2019-08-03 05:15:35
2019-01-03 05:15:33
2019-01-03 05:15:33
2/28/2019 5:15:31 AM
2/27/2019 11:18:39 AM'''.split('\n'), name='date', dtype=str).to_frame()
print(pd.to_datetime(df.date).dt.strftime('%Y-%m-%d'))
0    2019-03-14
1    2019-08-03
2    2019-01-03
3    2019-01-03
4    2019-02-28
5    2019-02-27
Name: date, dtype: object
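Newer pandas versions infer one format from the first value and may raise an error on the rest. The following is a minimal sketch, assuming pandas 2.0 or later, where format='mixed' asks for per-value inference. The fallback branch for older versions and the variable names are assumptions for illustration.

```python
import pandas as pd

dates = pd.Series([
    '3/14/2019 5:15:32 AM',
    '2019-08-03 05:15:35',
    '2/28/2019 5:15:31 AM',
], name='date')

try:
    # pandas >= 2.0: infer the format of each value separately
    parsed = pd.to_datetime(dates, format='mixed')
except (ValueError, TypeError):
    # Assumed fallback for older pandas, which guesses per value by default
    parsed = pd.to_datetime(dates)

# Drop the time part by formatting as year-month-day strings
formatted = parsed.dt.strftime('%Y-%m-%d')
print(formatted)
```

Per-value inference is convenient but can misread ambiguous dates such as 01/02/2019, so explicit formats are safer when the set of formats is known.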
If that doesn't give you what you want, you will need to identify the different kinds of formats and apply different settings when you convert them to datetime objects:
# Classify date column by format type
df['format'] = 1
df.loc[df.date.str.contains('/'), 'format'] = 2
# Start with an empty object column to hold the formatted date strings
df['new_date'] = None
# Convert to datetime with two different format settings
# Note: format 1 is read here as year-day-month; use '%Y-%m-%d %H:%M:%S' if the month comes first
df.loc[df.format == 1, 'new_date'] = pd.to_datetime(df.loc[df.format == 1, 'date'], format='%Y-%d-%m %H:%M:%S').dt.strftime('%Y-%m-%d')
# %I (12-hour clock) is needed for %p (AM/PM) to take effect
df.loc[df.format == 2, 'new_date'] = pd.to_datetime(df.loc[df.format == 2, 'date'], format='%m/%d/%Y %I:%M:%S %p').dt.strftime('%Y-%m-%d')
print(df)
                    date  format    new_date
0   3/14/2019 5:15:32 AM       2  2019-03-14
1    2019-08-03 05:15:35       1  2019-03-08
2    2019-01-03 05:15:33       1  2019-03-01
3    2019-01-03 05:15:33       1  2019-03-01
4   2/28/2019 5:15:31 AM       2  2019-02-28
5  2/27/2019 11:18:39 AM       2  2019-02-27
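The same two-format idea can be written without a helper column: parse the whole column once per format with errors='coerce', then fill the gaps of one result with the other. This is only a sketch, assuming the dash-separated values are year-month-day; the function name normalize_dates is hypothetical.

```python
import pandas as pd

def normalize_dates(s):
    """Parse a string Series holding two known formats into 'YYYY-MM-DD' strings."""
    # Values that do not match the given format become NaT instead of raising
    iso = pd.to_datetime(s, format='%Y-%m-%d %H:%M:%S', errors='coerce')
    us = pd.to_datetime(s, format='%m/%d/%Y %I:%M:%S %p', errors='coerce')
    # Take the ISO-style result where it parsed, otherwise the US-style result
    return iso.fillna(us).dt.strftime('%Y-%m-%d')

dates = pd.Series([
    '3/14/2019 5:15:32 AM',
    '2019-08-03 05:15:35',
    '2019-01-03 05:15:33',
    '2/27/2019 11:18:39 AM',
])
result = normalize_dates(dates)
print(result)
```

Any value that matches neither format ends up as a missing value, which makes unexpected inputs easy to spot.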
Assume that the column name in your DataFrame is DatStr.
The key to success is a proper conversion function, to be applied to each date string:
def datCnv(src):
    return pd.to_datetime(src)
Then all you should do to create a true date column is to call:
df['Dat'] = df.DatStr.apply(datCnv)
When you print the DataFrame, the result is:
                  DatStr                 Dat
0   3/14/2019 5:15:32 AM 2019-03-14 05:15:32
1    2019-08-03 05:15:35 2019-08-03 05:15:35
2    2019-01-03 05:15:33 2019-01-03 05:15:33
3    2019-01-03 05:15:33 2019-01-03 05:15:33
4   2/28/2019 5:15:31 AM 2019-02-28 05:15:31
5  2/27/2019 11:18:39 AM 2019-02-27 11:18:39
Note that the to_datetime function, called on one string at a time, recognizes the actual date format used in each case.
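The result above still carries the time, while the question asks for the date alone. A short sketch of one way to finish the job, assuming the DatStr column from this answer; the DatOnly column name is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'DatStr': [
    '3/14/2019 5:15:32 AM',
    '2019-08-03 05:15:35',
    '2/27/2019 11:18:39 AM',
]})

# Parse each string on its own, so the format is guessed per value
df['Dat'] = df['DatStr'].apply(pd.to_datetime)

# Keep only the date part, as 'YYYY-MM-DD' strings
df['DatOnly'] = df['Dat'].dt.strftime('%Y-%m-%d')
print(df)
```

If real date values are preferred over strings, df['Dat'].dt.normalize() keeps the datetime dtype and sets the time to midnight.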