
Getting min value across multiple datetime columns in Pandas

I have the following dataframe:

import pandas as pd

df = pd.DataFrame({
    'DATE1': ['NaT', 'NaT', '2010-04-15 19:09:08+00:00', '2011-01-25 15:29:37+00:00', '2010-04-10 12:29:02+00:00', 'NaT'],
    'DATE2': ['NaT', 'NaT', 'NaT', 'NaT', '2014-04-10 12:29:02+00:00', 'NaT']})
df.DATE1 = pd.to_datetime(df.DATE1)
df.DATE2 = pd.to_datetime(df.DATE2)

I would like to create a new column with the minimum value across the two columns (ignoring the NaTs), but calling min along the rows returns only NaN:

df.min(axis=1)
0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
5   NaN
dtype: float64

If I remove the timezone information (the +00:00) from every cell, the desired output is produced:

0                   NaT
1                   NaT
2   2010-04-15 19:09:08
3   2011-01-25 15:29:37
4   2010-04-10 12:29:02
5                   NaT
dtype: datetime64[ns]

Why does adding the timezone information break the function? My dataset has timezones, so as a workaround I would need to know how to remove them.

Ilya Voytov asked Oct 16 '22 04:10


1 Answer

This is a good question; it looks like a bug in how pandas handles timezone-aware columns. As a workaround, apply the reduction to each row:

import numpy as np

# The question asks for the minimum, so reduce each row with np.min
df.apply(lambda x: np.min(x), axis=1)
0                         NaT
1                         NaT
2   2010-04-15 19:09:08+00:00
3   2011-01-25 15:29:37+00:00
4   2010-04-10 12:29:02+00:00
5                         NaT
dtype: datetime64[ns, UTC]
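
To address the second part of the question (removing the timezone as a workaround), here is a minimal sketch. It assumes every column is already in the same timezone (UTC here), so dropping the offset does not change the ordering. The names naive, min_naive, and MIN_DATE are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "DATE1": ["NaT", "NaT", "2010-04-15 19:09:08+00:00",
              "2011-01-25 15:29:37+00:00", "2010-04-10 12:29:02+00:00", "NaT"],
    "DATE2": ["NaT", "NaT", "NaT", "NaT", "2014-04-10 12:29:02+00:00", "NaT"],
})

# Parse both columns as timezone-aware UTC datetimes
for col in ["DATE1", "DATE2"]:
    df[col] = pd.to_datetime(df[col], utc=True)

# Drop the timezone from each column (assumes all columns share one timezone)
naive = df[["DATE1", "DATE2"]].apply(lambda col: col.dt.tz_localize(None))

# Row-wise minimum on the timezone-naive values; NaT entries are skipped
min_naive = naive.min(axis=1)

# Optionally attach the timezone again to the result
df["MIN_DATE"] = min_naive.dt.tz_localize("UTC")
print(df["MIN_DATE"])
```

If the columns can be in different timezones, converting them all to UTC first (for example with dt.tz_convert("UTC")) before dropping the timezone should keep the comparison meaningful.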
BENY answered Oct 19 '22 01:10