I have the following dataframe:
import pandas as pd
df = pd.DataFrame({
'DATE1': ['NaT', 'NaT', '2010-04-15 19:09:08+00:00', '2011-01-25 15:29:37+00:00', '2010-04-10 12:29:02+00:00', 'NaT'],
'DATE2': ['NaT', 'NaT', 'NaT', 'NaT', '2014-04-10 12:29:02+00:00', 'NaT']})
df.DATE1 = pd.to_datetime(df.DATE1)
df.DATE2 = pd.to_datetime(df.DATE2)
and I would like to create a new column with the minimum value across the two columns (ignoring the NaTs). I tried the following, but it returns NaN for every row:
df.min(axis=1)
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
dtype: float64
If I remove the timezone information (the +00:00) from every single cell, the desired output is produced:
0 NaT
1 NaT
2 2010-04-15 19:09:08
3 2011-01-25 15:29:37
4 2010-04-10 12:29:02
5 NaT
dtype: datetime64[ns]
Why does adding the timezone information break the function? My dataset has timezones so I would need to know how to remove them as a workaround.
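This is a good question; it looks like a bug in how the row-wise reduction handles timezone-aware columns. As a workaround, apply the reduction row by row (shown here with np.max; use np.min to get the minimum instead):
df.apply(lambda x : np.max(x),1)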
0 NaT
1 NaT
2 2010-04-15 19:09:08+00:00
3 2011-01-25 15:29:37+00:00
4 2014-04-10 12:29:02+00:00
5 NaT
dtype: datetime64[ns, UTC]
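For the minimum that the question actually asks for, the same idea can be sketched as below. This is only a minimal sketch: it assumes both columns are parsed as UTC (the `utc=True` argument is an addition, not part of the original post), and the names `row_min`, `naive`, and `naive_min` are hypothetical. It shows two options: a row-wise `min` that keeps the timezone, and stripping the timezone with `dt.tz_localize(None)` before calling `min(axis=1)`, which is the workaround the question asks about.

```python
import pandas as pd

df = pd.DataFrame({
    'DATE1': ['NaT', 'NaT', '2010-04-15 19:09:08+00:00',
              '2011-01-25 15:29:37+00:00', '2010-04-10 12:29:02+00:00', 'NaT'],
    'DATE2': ['NaT', 'NaT', 'NaT', 'NaT', '2014-04-10 12:29:02+00:00', 'NaT']})

# Assumption: all timestamps are UTC, so parse them as timezone-aware UTC.
df['DATE1'] = pd.to_datetime(df['DATE1'], utc=True)
df['DATE2'] = pd.to_datetime(df['DATE2'], utc=True)

# Option 1: row-wise minimum that keeps the timezone.
# Series.min() skips NaT by default and returns NaT for all-NaT rows.
row_min = df[['DATE1', 'DATE2']].apply(lambda row: row.min(), axis=1)

# Option 2: drop the timezone first, then reduce across columns.
# tz_localize(None) removes the timezone and keeps the wall-clock time.
naive = df[['DATE1', 'DATE2']].apply(lambda col: col.dt.tz_localize(None))
naive_min = naive.min(axis=1)

print(row_min)
print(naive_min)
```

Option 1 is slower on large frames because it runs Python code per row; Option 2 is vectorized but discards the timezone, so it is only safe when every column shares the same timezone.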