 

Max / Min of date column in Pandas, columns include nan values

I'm trying to create a new column in a pandas DataFrame holding the maximum (or minimum) date from two other date columns. But when there is a NaN anywhere in either of those columns, the whole min/max column becomes NaN. What gives? With numeric columns this works fine, but with dates the new column is all NaN. Here's some sample code to illustrate the problem:

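import numpy as np
import pandas as pd
from datetime import date

df = pd.DataFrame(data=[[np.nan, date(2000, 11, 1)],
                        [date(2000, 12, 1), date(2000, 9, 1)],
                        [date(2000, 4, 1), np.nan],
                        [date(2000, 12, 2), np.nan]], columns=['col1', 'col2'])

# Row-wise max of the two date columns (col3 comes out all NaN here)
df['col3'] = df[['col1', 'col2']].max(axis=1)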

I know it can be done with loc and a combination of <, >, isnull and so on, but how do I make it work with the regular max/min functions?
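For reference, here is a minimal sketch of the mask-based workaround the question mentions, rebuilt on the same sample data. The helper names (col1_missing, both, col2_later, use_col2) are illustrative only; the point of the structure is that it only ever compares two actual dates, never a NaN with a date.

```python
from datetime import date

import numpy as np
import pandas as pd

df = pd.DataFrame(data=[[np.nan, date(2000, 11, 1)],
                        [date(2000, 12, 1), date(2000, 9, 1)],
                        [date(2000, 4, 1), np.nan],
                        [date(2000, 12, 2), np.nan]], columns=['col1', 'col2'])

# Rows where col1 is missing: take col2 (which may itself be missing).
col1_missing = df['col1'].isnull()

# Rows where both dates exist: compare only those, so no NaN meets a date.
both = df['col1'].notnull() & df['col2'].notnull()
col2_later = pd.Series(False, index=df.index)
col2_later[both] = df.loc[both, 'col2'] > df.loc[both, 'col1']

# Start from col1 and overwrite with col2 wherever col2 should win.
use_col2 = col1_missing | col2_later
df['col3'] = df['col1']
df.loc[use_col2, 'col3'] = df.loc[use_col2, 'col2']
print(df)
```

This works, but it is verbose and has to be rewritten (with < instead of >) for the minimum, which is why converting the columns is the nicer route.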

asked Jun 01 '17 09:06 by Nieumysl


1 Answer

You're storing datetime.date objects in your columns. If you convert them to datetime64 with pd.to_datetime, it works as expected:

In[10]:
df['col1'] = pd.to_datetime(df['col1'])
df['col2'] = pd.to_datetime(df['col2'])
df

Out[10]: 
        col1       col2  col3
0        NaT 2000-11-01   NaN
1 2000-12-01 2000-09-01   NaN
2 2000-04-01        NaT   NaN
3 2000-12-02        NaT   NaN

In[11]:
df['col3'] = df[['col1','col2']].max(axis=1)
df

Out[11]: 
        col1       col2       col3
0        NaT 2000-11-01 2000-11-01
1 2000-12-01 2000-09-01 2000-12-01
2 2000-04-01        NaT 2000-04-01
3 2000-12-02        NaT 2000-12-02
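Since the question asks about the minimum too, here is a short sketch that converts both columns in a loop and then takes the row-wise max and min. The output column names col_max and col_min are illustrative, not from the original.

```python
from datetime import date

import numpy as np
import pandas as pd

df = pd.DataFrame(data=[[np.nan, date(2000, 11, 1)],
                        [date(2000, 12, 1), date(2000, 9, 1)],
                        [date(2000, 4, 1), np.nan],
                        [date(2000, 12, 2), np.nan]], columns=['col1', 'col2'])

# Convert the date objects to datetime64; missing values become NaT.
for col in ['col1', 'col2']:
    df[col] = pd.to_datetime(df[col])

# Row-wise reductions skip NaT by default (skipna=True).
df['col_max'] = df[['col1', 'col2']].max(axis=1)
df['col_min'] = df[['col1', 'col2']].min(axis=1)
print(df)
```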

If you simply did this on the original, unconverted column:

df['col3'] = df['col1'].max()

it raises TypeError: '>=' not supported between instances of 'float' and 'datetime.date'
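To see that the missing values are what break the single-column call, here is a small sketch of two ways around it: dropping the NaN first (an alternative not mentioned in the answer) and converting with pd.to_datetime as the answer suggests.

```python
from datetime import date

import numpy as np
import pandas as pd

col1 = pd.Series([np.nan, date(2000, 12, 1), date(2000, 4, 1), date(2000, 12, 2)])

# Option 1: drop the NaN so only date objects get compared.
max_without_nan = col1.dropna().max()

# Option 2: convert to datetime64 so NaN becomes NaT, which max() skips.
max_converted = pd.to_datetime(col1).max()

print(max_without_nan)  # a datetime.date
print(max_converted)    # a pandas Timestamp
```

Dropping NaN only helps for a single column; for the row-wise case across two columns, the conversion is the approach that works.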

Because the columns mix float NaN values with datetime.date objects, their dtype is object, and a float can't be compared with a date, so the max can't produce a date and you get NaN (or an error) instead. If you had no missing values it would work as expected; if you do have missing values, convert the columns to datetime64 with pd.to_datetime so that the missing values become NaT, which max and min know how to skip.
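A quick way to check which situation you're in is to inspect the dtypes before and after conversion. This sketch uses pd.api.types.is_datetime64_any_dtype rather than comparing against a string like 'datetime64[ns]', on the assumption that the exact resolution pandas picks may differ between versions.

```python
from datetime import date

import numpy as np
import pandas as pd

df = pd.DataFrame(data=[[np.nan, date(2000, 11, 1)],
                        [date(2000, 12, 1), date(2000, 9, 1)]], columns=['col1', 'col2'])

# Mixed NaN / datetime.date values are stored with object dtype.
print(df.dtypes)
before = [str(dtype) for dtype in df.dtypes]

# Convert every column; pd.to_datetime turns NaN into NaT.
df = df.apply(pd.to_datetime)

# After conversion both columns are datetime64 and the NaN is now NaT.
print(df.dtypes)
after_is_datetime = [pd.api.types.is_datetime64_any_dtype(dtype) for dtype in df.dtypes]
print(df['col1'].iloc[0])
```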

answered Oct 23 '22 17:10 by EdChum