Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding the min date in a Pandas DF row and create new Column

Tags:

python

pandas

I have a table with a number of dates (some dates will be NaN) and I need to find the oldest date so a row may have DATE_MODIFIED, WITHDRAWN_DATE, SOLD_DATE, STATUS_DATE etc..

So for each row there will be a date in one or more of the fields I want to find the oldest of those and make a new column in the dataframe.

Something like this, if I just do one , eg DATE MODIFIED I get a result but when I add the second as below

table['END_DATE']=min([table['DATE_MODIFIED']],[table['SOLD_DATE']])

I get:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

For that matter will this construct work to find the min date, assuming I create correct date columns initially?

like image 585
dartdog Avatar asked Dec 06 '22 06:12

dartdog


1 Answers

Just apply the min function along the axis=1.

In [1]: import pandas as pd 
In [2]: df = pd.read_csv('test.cvs', parse_dates=['d1', 'd2', 'd3'])
In [3]: df.ix[2, 'd1'] = None
In [4]: df.ix[1, 'd2'] = None
In [5]: df.ix[4, 'd3'] = None
In [6]: df
Out[6]:
                   d1                  d2                  d3
0 2013-02-07 00:00:00 2013-03-08 00:00:00 2013-05-21 00:00:00
1 2013-02-07 00:00:00                 NaT 2013-05-21 00:00:00
2                 NaT 2013-03-02 00:00:00 2013-05-21 00:00:00
3 2013-02-04 00:00:00 2013-03-08 00:00:00 2013-01-04 00:00:00
4 2013-02-01 00:00:00 2013-03-06 00:00:00                 NaT
In [7]: df.min(axis=1)
Out[7]:
0   2013-02-07 00:00:00
1   2013-02-07 00:00:00
2   2013-03-02 00:00:00
3   2013-01-04 00:00:00
4   2013-02-01 00:00:00
dtype: datetime64[ns]
like image 178
Viktor Kerkez Avatar answered Jan 13 '23 12:01

Viktor Kerkez