Find the max of two or more columns with pandas

You can get the maximum like this:

>>> import pandas as pd
>>> df = pd.DataFrame({"A": [1,2,3], "B": [-2, 8, 1]})
>>> df
   A  B
0  1 -2
1  2  8
2  3  1
>>> df[["A", "B"]]
   A  B
0  1 -2
1  2  8
2  3  1
>>> df[["A", "B"]].max(axis=1)
0    1
1    8
2    3

and so:

>>> df["C"] = df[["A", "B"]].max(axis=1)
>>> df
   A  B  C
0  1 -2  1
1  2  8  8
2  3  1  3

If you know that "A" and "B" are the only columns, you could even get away with

>>> df["C"] = df.max(axis=1)

And you could use .apply(max, axis=1) too, I guess.
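As a rough sketch of how the two spellings compare, the snippet below runs both on the same toy frame. The variable names `vectorized` and `applied` are made up for illustration. The `.apply` version calls Python's built-in `max` once per row, so it would be expected to be slower on large frames, though that isn't measured here.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [-2, 8, 1]})

# Vectorized row-wise maximum over the selected columns.
vectorized = df[["A", "B"]].max(axis=1)

# Same result via apply: the built-in max is called once per row.
applied = df[["A", "B"]].apply(max, axis=1)

print(vectorized.tolist())  # [1, 8, 3]
print(applied.tolist())     # [1, 8, 3]
```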

@DSM's answer is perfectly fine in almost any normal scenario. But if you're the type of programmer who wants to go a little deeper than the surface level, you might be interested to know that it is a little faster to call numpy functions on the underlying array, obtained with .to_numpy() (or .values on pandas versions before 0.24), instead of directly calling the (cythonized) functions defined on the DataFrame/Series objects.

For example, you can use ndarray.max() along axis 1, i.e. across the columns of each row.

# Data borrowed from @DSM's post.
df = pd.DataFrame({"A": [1,2,3], "B": [-2, 8, 1]})
df
   A  B
0  1 -2
1  2  8
2  3  1

df['C'] = df[['A', 'B']].values.max(1)
# Or, assuming "A" and "B" are the only columns, 
# df['C'] = df.values.max(1) 
df

   A  B  C
0  1 -2  1
1  2  8  8
2  3  1  3

If your data has NaNs, you will need numpy.nanmax:

import numpy as np

df['C'] = np.nanmax(df.values, axis=1)
df

   A  B  C
0  1 -2  1
1  2  8  8
2  3  1  3
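The frame above has no missing values, so the difference doesn't show up there. Here is a minimal sketch with a made-up frame, `df_nan`, that does contain NaNs (the data is hypothetical, not from the original post). It compares `ndarray.max`, `np.nanmax`, and `DataFrame.max`, which skips NaNs by default.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values, for illustration only.
df_nan = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [-2.0, 8.0, np.nan]})

arr = df_nan.to_numpy()

plain = arr.max(axis=1)             # NaN propagates: [1., nan, nan]
skipped = np.nanmax(arr, axis=1)    # NaNs ignored:   [1., 8., 3.]
pandas_max = df_nan.max(axis=1)     # skipna=True by default: 1.0, 8.0, 3.0

print(plain)
print(skipped)
print(pandas_max.tolist())
```

Note that `np.nanmax` emits a RuntimeWarning and returns NaN for a row that is entirely NaN, so you may want to handle that case separately.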

You can also use numpy.maximum.reduce. numpy.maximum is a ufunc (Universal Function), and every ufunc has a reduce:

df['C'] = np.maximum.reduce(df[['A', 'B']].values, axis=1)
# df['C'] = np.maximum.reduce(df[['A', 'B']], axis=1)
# df['C'] = np.maximum.reduce(df, axis=1)
df

   A  B  C
0  1 -2  1
1  2  8  8
2  3  1  3
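To make the relationship between the binary ufunc and its `reduce` concrete, here is a small sketch. The third column, "D", is invented for illustration so that the reduction has more than two inputs. It shows that `np.maximum.reduce` along axis 1 gives the same result as chaining `np.maximum` pairwise.

```python
import numpy as np
import pandas as pd

# "D" is a hypothetical extra column added for this example.
df = pd.DataFrame({"A": [1, 2, 3], "B": [-2, 8, 1], "D": [0, 9, 2]})

# Binary form: element-wise maximum of exactly two columns.
pairwise = np.maximum(df["A"], df["B"])

# reduce applies the binary ufunc repeatedly along the given axis.
reduced = np.maximum.reduce(df[["A", "B", "D"]].to_numpy(), axis=1)

# Equivalent chained calls, written out by hand.
chained = np.maximum(np.maximum(df["A"], df["B"]), df["D"]).to_numpy()

print(pairwise.tolist())  # [1, 8, 3]
print(reduced.tolist())   # [1, 9, 3]
print(chained.tolist())   # [1, 9, 3]
```

For just two columns, the plain `np.maximum(df["A"], df["B"])` call is probably the simplest spelling.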

[Figure: log-log plot of runtime against N for df.max, np.max, np.maximum.reduce, and np.nanmax]

np.maximum.reduce and np.max appear to be more or less the same for most normal-sized DataFrames, and both happen to be a shade faster than DataFrame.max. I imagine this difference remains roughly constant, and is due to internal overhead (index alignment, handling NaNs, etc.).

The graph was generated using perfplot. Benchmarking code, for reference:

import numpy as np
import pandas as pd
import perfplot

np.random.seed(0)
df_ = pd.DataFrame(np.random.randn(5, 1000))

perfplot.show(
    setup=lambda n: pd.concat([df_] * n, ignore_index=True),
    kernels=[
        lambda df: df.assign(new=df.max(axis=1)),
        lambda df: df.assign(new=df.values.max(1)),
        lambda df: df.assign(new=np.nanmax(df.values, axis=1)),
        lambda df: df.assign(new=np.maximum.reduce(df.values, axis=1)),
    ],
    labels=['df.max', 'np.max', 'np.nanmax', 'np.maximum.reduce'],
    n_range=[2**k for k in range(0, 15)],
    xlabel='N (* len(df))',
    logx=True,
    logy=True)
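If perfplot isn't installed, a rough comparison can be sketched with the standard library's `timeit`. This is not the benchmark that produced the graph. The frame size, repeat count, and the `candidates` and `timings` names are arbitrary choices for illustration, and the numbers will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10_000, 5)))  # arbitrary size

candidates = {
    "df.max": lambda: df.max(axis=1),
    "np.max": lambda: df.to_numpy().max(axis=1),
    "np.nanmax": lambda: np.nanmax(df.to_numpy(), axis=1),
    "np.maximum.reduce": lambda: np.maximum.reduce(df.to_numpy(), axis=1),
}

reference = df.max(axis=1).to_numpy()
repeats = 20
timings = {}
for label, func in candidates.items():
    # Check that every candidate agrees before timing it.
    assert np.array_equal(np.asarray(func()), reference)
    timings[label] = timeit.timeit(func, number=repeats) / repeats
    print(f"{label:>18}: {timings[label] * 1e3:.3f} ms per call")
```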
