
Pandas element-wise min max against a series along one axis

I have a DataFrame:

df = 
             A    B    C    D
DATA_DATE
20170103   5.0  3.0  NaN  NaN
20170104   NaN  NaN  NaN  1.0
20170105   1.0  NaN  2.0  3.0

And I have a series

s = 
DATA_DATE
20170103    4.0
20170104    0.0
20170105    2.2

I'd like to run an element-wise max() function and align s along the columns of df. In other words, I want to get

result = 
             A    B    C    D
DATA_DATE
20170103   5.0  4.0  NaN  NaN
20170104   NaN  NaN  NaN  1.0
20170105   2.2  NaN  2.2  3.0

What is the best way to do this? I've checked single column comparison and series to series comparison but haven't found an efficient way to run dataframe against a series.

Bonus: Not sure if the answer will be self-evident from the above, but how would I do it if I wanted to align s along the rows of df (assume dimensions match)?

Zhang18 asked May 16 '17 22:05




2 Answers

Data:

In [135]: df
Out[135]:
             A    B    C    D
DATA_DATE
20170103   5.0  3.0  NaN  NaN
20170104   NaN  NaN  NaN  1.0
20170105   1.0  NaN  2.0  3.0

In [136]: s
Out[136]:
20170103    4.0
20170104    0.0
20170105    2.2
Name: DATA_DATE, dtype: float64

Solution:

In [66]: df.clip_lower(s, axis=0)
C:\Users\Max\Anaconda4\lib\site-packages\pandas\core\ops.py:1247: RuntimeWarning: invalid value encountered in greater_equal
  result = op(x, y)
Out[66]:
             A    B    C    D
DATA_DATE
20170103   5.0  4.0  NaN  NaN
20170104   NaN  NaN  NaN  1.0
20170105   2.2  NaN  2.2  3.0

We can use the following hack to get rid of the RuntimeWarning:

In [134]: df.fillna(np.inf).clip_lower(s, axis=0).replace(np.inf, np.nan)
Out[134]:
             A    B    C    D
DATA_DATE
20170103   5.0  4.0  NaN  NaN
20170104   NaN  NaN  NaN  1.0
20170105   2.2  NaN  2.2  3.0
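
In pandas 1.0 and later, clip_lower is no longer available; clip(lower=...) covers the same case. The following is a minimal sketch of that equivalent, assuming the dates are stored as an integer index (the question does not say how the index was built):

```python
import numpy as np
import pandas as pd

# Rebuild the question's data; integer dates as the index are an assumption.
idx = pd.Index([20170103, 20170104, 20170105], name="DATA_DATE")
df = pd.DataFrame(
    {
        "A": [5.0, np.nan, 1.0],
        "B": [3.0, np.nan, np.nan],
        "C": [np.nan, np.nan, 2.0],
        "D": [np.nan, 1.0, 3.0],
    },
    index=idx,
)
s = pd.Series([4.0, 0.0, 2.2], index=idx)

# axis=0 aligns s with the row index, so every row gets its own lower bound.
# NaN entries in df are left as NaN.
result = df.clip(lower=s, axis=0)
print(result)
```

Because s is matched to df by index label here, the two objects need to share the same index values.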
MaxU - stop WAR against UA answered Oct 06 '22 13:10


This is called broadcasting and can be done as follows:

import numpy as np
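np.maximum(df, s.values[:, None])
Out: 
             A    B    C    D
DATA_DATE                    
20170103   5.0  4.0  NaN  NaN
20170104   NaN  NaN  NaN  1.0
20170105   2.2  NaN  2.2  3.0

Here, s.values[:, None] adds a new axis to the underlying NumPy array of s. The same can be achieved with s.values[:, np.newaxis]. (Indexing the Series itself with s[:, None] worked in older pandas versions but is no longer supported in recent ones.) The two operands can then be broadcast together because their shapes, (3, 4) and (3, 1), are compatible: the first dimensions are equal, and the size-1 second dimension is stretched across the four columns.

Note the difference between s.values and s.values[:, None]:

s.values
Out: array([ 4. ,  0. ,  2.2])

s.values[:, None]
Out: 
array([[ 4. ],
       [ 0. ],
       [ 2.2]])

s.values.shape
Out: (3,)

s.values[:, None].shape
Out: (3, 1)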
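
To see why the extra axis is needed, here is a small NumPy-only sketch using the same numbers as the question. The variable names values and row_floor are illustrative, not from the original answer. A (3,) array cannot be broadcast against a (3, 4) array, because NumPy compares shapes from the trailing dimension (4 versus 3), whereas a (3, 1) array can:

```python
import numpy as np

# Same numbers as df and s in the question, as plain arrays.
values = np.array([
    [5.0, 3.0, np.nan, np.nan],
    [np.nan, np.nan, np.nan, 1.0],
    [1.0, np.nan, 2.0, 3.0],
])                                      # shape (3, 4)
row_floor = np.array([4.0, 0.0, 2.2])   # shape (3,)

# Without the new axis the trailing dimensions (4 and 3) do not match.
try:
    np.maximum(values, row_floor)
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# With the new axis, (3, 4) and (3, 1) are compatible:
# each row's floor is stretched across the four columns.
per_row = np.maximum(values, row_floor[:, None])
print(broadcast_failed, per_row.shape)
```

np.maximum propagates NaN, which is why the missing entries stay missing in the result.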

An alternative would be:

df.mask(df.le(s, axis=0), s, axis=0)

Out: 
             A    B    C    D
DATA_DATE                    
20170103   5.0  4.0  NaN  NaN
20170104   NaN  NaN  NaN  1.0
20170105   2.2  NaN  2.2  3.0

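This reads: compare df and s row by row. Where df is less than or equal to s, take the value from s; everywhere else, keep df. Comparisons with NaN evaluate to False, so NaN entries in df are left unchanged.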
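
For the bonus part of the question (aligning a series along the rows of df, so that it is matched against the column labels), one possible sketch is to switch to axis=1. It is shown here with clip(lower=...), the replacement for clip_lower in current pandas. The series col_floor and its values are made up for illustration and do not come from the question:

```python
import numpy as np
import pandas as pd

idx = pd.Index([20170103, 20170104, 20170105], name="DATA_DATE")
df = pd.DataFrame(
    {
        "A": [5.0, np.nan, 1.0],
        "B": [3.0, np.nan, np.nan],
        "C": [np.nan, np.nan, 2.0],
        "D": [np.nan, 1.0, 3.0],
    },
    index=idx,
)

# Hypothetical per-column lower bounds, indexed by df's column labels.
col_floor = pd.Series([2.0, 0.0, 2.5, 1.5], index=["A", "B", "C", "D"])

# axis=1 aligns col_floor with the columns, so every column gets its own bound.
by_column = df.clip(lower=col_floor, axis=1)
print(by_column)
```

With these made-up bounds, only the values 1.0 (column A), 2.0 (column C) and 1.0 (column D) are raised; the NaN entries stay NaN.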

ayhan answered Oct 06 '22 11:10