I have a Dataframe:
df =
A B C D
DATA_DATE
20170103 5.0 3.0 NaN NaN
20170104 NaN NaN NaN 1.0
20170105 1.0 NaN 2.0 3.0
And I have a series
s =
DATA_DATE
20170103 4.0
20170104 0.0
20170105 2.2
I'd like to run an element-wise max() function and align s along the columns of df. In other words, I want to get
result =
A B C D
DATA_DATE
20170103 5.0 4.0 NaN NaN
20170104 NaN NaN NaN 1.0
20170105 2.2 NaN 2.2 3.0
What is the best way to do this? I've checked single column comparison and series to series comparison but haven't found an efficient way to run dataframe against a series.
Bonus: Not sure if the answer will be self-evident from above, but how do I do it if I want to align s along the rows of df (assume dimensions match)?
Data:
In [135]: df
Out[135]:
A B C D
DATA_DATE
20170103 5.0 3.0 NaN NaN
20170104 NaN NaN NaN 1.0
20170105 1.0 NaN 2.0 3.0
In [136]: s
Out[136]:
20170103 4.0
20170104 0.0
20170105 2.2
Name: DATA_DATE, dtype: float64
Solution (written for an older pandas; clip_lower has since been removed in favour of clip(lower=...)):
In [66]: df.clip_lower(s, axis=0)
C:\Users\Max\Anaconda4\lib\site-packages\pandas\core\ops.py:1247: RuntimeWarning: invalid value encountered in greater_equal
result = op(x, y)
Out[66]:
A B C D
DATA_DATE
20170103 5.0 4.0 NaN NaN
20170104 NaN NaN NaN 1.0
20170105 2.2 NaN 2.2 3.0
We can use the following hack to get rid of the RuntimeWarning:
In [134]: df.fillna(np.inf).clip_lower(s, axis=0).replace(np.inf, np.nan)
Out[134]:
A B C D
DATA_DATE
20170103 5.0 4.0 NaN NaN
20170104 NaN NaN NaN 1.0
20170105 2.2 NaN 2.2 3.0
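A minimal sketch of the same clipping idea with DataFrame.clip, assuming a recent pandas release. The frame and series are rebuilt from the question's data; the variable names idx and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the example data from the question.
idx = pd.Index([20170103, 20170104, 20170105], name="DATA_DATE")
df = pd.DataFrame(
    {
        "A": [5.0, np.nan, 1.0],
        "B": [3.0, np.nan, np.nan],
        "C": [np.nan, np.nan, 2.0],
        "D": [np.nan, 1.0, 3.0],
    },
    index=idx,
)
s = pd.Series([4.0, 0.0, 2.2], index=idx)

# axis=0 aligns s with the row index, so every row gets its own lower bound.
# Cells that are NaN in df stay NaN.
result = df.clip(lower=s, axis=0)
print(result)
```

Because the NaN cells are left alone, the fillna(np.inf) / replace(np.inf, np.nan) round trip should not be needed with this form.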
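This relies on NumPy broadcasting and can be done as follows:
import numpy as np
np.maximum(df, s.values[:, None])
Out:
A B C D
DATA_DATE
20170103 5.0 4.0 NaN NaN
20170104 NaN NaN NaN 1.0
20170105 2.2 NaN 2.2 3.0
Here, s.values[:, None] adds a new axis to the underlying array of s (recent pandas versions no longer allow multi-dimensional indexing on the Series itself, hence .values). The same can be achieved with s.values[:, np.newaxis]. The two operands can then be broadcast together because their shapes, (3, 4) and (3, 1), agree in the first dimension, and the second dimension of the (3, 1) array has length 1, so it is stretched across the four columns.
Note the difference between s.values and s.values[:, None]:
s.values
Out: array([ 4. , 0. , 2.2])
s.values[:, None]
Out:
array([[ 4. ],
[ 0. ],
[ 2.2]])
s.values.shape
Out: (3,)
s.values[:, None].shape
Out: (3, 1)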
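One detail worth spelling out is how NaN behaves under broadcasting. The sketch below works on plain NumPy arrays and compares np.maximum, which propagates NaN and therefore matches the result asked for in the question, with np.fmax, which prefers the non-NaN operand. The names col, keep_nan and fill_nan are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the example data from the question.
idx = pd.Index([20170103, 20170104, 20170105], name="DATA_DATE")
df = pd.DataFrame(
    {
        "A": [5.0, np.nan, 1.0],
        "B": [3.0, np.nan, np.nan],
        "C": [np.nan, np.nan, 2.0],
        "D": [np.nan, 1.0, 3.0],
    },
    index=idx,
)
s = pd.Series([4.0, 0.0, 2.2], index=idx)

col = s.to_numpy()[:, None]  # shape (3, 1), broadcasts against (3, 4)

# np.maximum propagates NaN: a missing cell in df stays missing.
keep_nan = pd.DataFrame(
    np.maximum(df.to_numpy(), col), index=df.index, columns=df.columns
)

# np.fmax ignores NaN when the other operand is a number:
# missing cells in df are filled with the value from s.
fill_nan = pd.DataFrame(
    np.fmax(df.to_numpy(), col), index=df.index, columns=df.columns
)

print(keep_nan)
print(fill_nan)
```

Which of the two is appropriate depends on whether missing values in df should survive the comparison; the question's expected output keeps them.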
An alternative would be:
df.mask(df.le(s, axis=0), s, axis=0)
Out:
A B C D
DATA_DATE
20170103 5.0 4.0 NaN NaN
20170104 NaN NaN NaN 1.0
20170105 2.2 NaN 2.2 3.0
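This reads: compare df and s row by row. Where df is less than or equal to s, use s; otherwise keep df. Comparisons against NaN evaluate to False, so missing values in df are left untouched.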
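For the bonus part of the question (aligning s along the rows of df), a sketch under the assumption that the series holds one value per column. The series s_cols and its values are hypothetical, made up for illustration; they do not appear in the question.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
idx = pd.Index([20170103, 20170104, 20170105], name="DATA_DATE")
df = pd.DataFrame(
    {
        "A": [5.0, np.nan, 1.0],
        "B": [3.0, np.nan, np.nan],
        "C": [np.nan, np.nan, 2.0],
        "D": [np.nan, 1.0, 3.0],
    },
    index=idx,
)

# Hypothetical series with one lower bound per column of df.
s_cols = pd.Series([2.0, 3.5, 1.0, 0.5], index=["A", "B", "C", "D"])

# axis=1 aligns s_cols with the column labels instead of the row index.
by_clip = df.clip(lower=s_cols, axis=1)

# NumPy equivalent: a (4,) array broadcasts against (3, 4) without a new axis.
by_numpy = pd.DataFrame(
    np.maximum(df.to_numpy(), s_cols.to_numpy()),
    index=df.index,
    columns=df.columns,
)

print(by_clip)
```

In this direction no extra axis is needed for the NumPy version, because broadcasting matches trailing dimensions first.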