Element-wise Maximum of Two DataFrames Ignoring NaNs

I have two DataFrames (df1 and df2) that have the same rows and columns. I would like to take the maximum of these two DataFrames, element by element. In addition, the element-wise maximum of a number and NaN should be the number. The approach I have implemented so far seems inefficient:

import pandas as pd

def element_max(df1, df2):
    cond = df1 >= df2  # False wherever either operand is NaN
    res = pd.DataFrame(index=df1.index, columns=df1.columns)
    # Both values present: take the larger one.
    res[(df1==df1)&(df2==df2)&(cond)]  = df1[(df1==df1)&(df2==df2)&(cond)]
    res[(df1==df1)&(df2==df2)&(~cond)] = df2[(df1==df1)&(df2==df2)&(~cond)]
    # Exactly one value is NaN (x != x holds only for NaN): take the other one.
    res[(df1==df1)&(df2!=df2)&(~cond)] = df1[(df1==df1)&(df2!=df2)]
    res[(df1!=df1)&(df2==df2)&(~cond)] = df2[(df1!=df1)&(df2==df2)]
    return res

Any other ideas? Thank you for your time.

DrTRD asked Oct 08 '15 13:10

1 Answer

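A more readable way to do this is concat-and-max. In pandas 2.0 and later, max() no longer accepts a level argument, so group on the index level first:

import numpy as np
import pandas as pd

A = pd.DataFrame([[1., 2., 3.]])
B = pd.DataFrame([[3., np.nan, 1.]])

# Stack both frames, then take the max within each index label (NaN is skipped by default).
pd.concat([A, B]).groupby(level=0).max()
#
#           0    1    2
#      0  3.0  2.0  3.0
#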
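
One caveat with the concat-based approach is that it groups rows by index label, so it would also merge rows if a frame had duplicate labels. As a possible alternative, here is a minimal sketch using NumPy's fmax, which returns the non-NaN operand when only one side is NaN. It assumes both frames are numeric and have identical shape with rows and columns in the same order, since the comparison is positional. The function name element_max is borrowed from the question; this is only one way it could be written.

```python
import numpy as np
import pandas as pd

def element_max(df1, df2):
    """Element-wise maximum of two identically laid out DataFrames, ignoring NaN."""
    # Assumption: df1 and df2 share the same shape and row/column order.
    # np.fmax yields NaN only where both inputs are NaN.
    values = np.fmax(df1.to_numpy(), df2.to_numpy())
    return pd.DataFrame(values, index=df1.index, columns=df1.columns)

A = pd.DataFrame([[1., 2., 3.]])
B = pd.DataFrame([[3., np.nan, 1.]])

result = element_max(A, B)
print(result)  # row 0 holds 3.0, 2.0, 3.0
```

If the two frames might be ordered differently, reindexing one to match the other before the call (for example with df2.reindex_like(df1)) would be needed first.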
Andy Jones answered Sep 28 '22 05:09