Pandas element-wise min max against a series along one axis

Tags:

pandas

dataframe

max

min

elementwise-operations

I have a Dataframe:

df = 
             A    B    C    D
DATA_DATE
20170103   5.0  3.0  NaN  NaN
20170104   NaN  NaN  NaN  1.0
20170105   1.0  NaN  2.0  3.0

And I have a series

s = 
DATA_DATE
20170103    4.0
20170104    0.0
20170105    2.2

I'd like to run an element-wise max() function and align s along the columns of df. In other words, I want to get

result = 
             A    B    C    D
DATA_DATE
20170103   5.0  4.0  NaN  NaN
20170104   NaN  NaN  NaN  1.0
20170105   2.2  NaN  2.2  3.0

What is the best way to do this? I've checked single column comparison and series to series comparison but haven't found an efficient way to run dataframe against a series.

Bonus: I'm not sure whether the answer will be self-evident from the above, but how would I do it if I wanted to align s along the rows of df (assume the dimensions match)?

asked May 16 '17 22:05

Zhang18

2 Answers

Data:

In [135]: df
Out[135]:
             A    B    C    D
DATA_DATE
20170103   5.0  3.0  NaN  NaN
20170104   NaN  NaN  NaN  1.0
20170105   1.0  NaN  2.0  3.0

In [136]: s
Out[136]:
20170103    4.0
20170104    0.0
20170105    2.2
Name: DATA_DATE, dtype: float64

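Solution (clip_lower was deprecated in pandas 0.24 and removed in 1.0; on newer versions the equivalent call is df.clip(lower=s, axis=0)):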

In [66]: df.clip_lower(s, axis=0)
C:\Users\Max\Anaconda4\lib\site-packages\pandas\core\ops.py:1247: RuntimeWarning: invalid value encountered in greater_equal
  result = op(x, y)
Out[66]:
             A    B    C    D
DATA_DATE
20170103   5.0  4.0  NaN  NaN
20170104   NaN  NaN  NaN  1.0
20170105   2.2  NaN  2.2  3.0

We can use the following hack to get rid of the RuntimeWarning:

In [134]: df.fillna(np.inf).clip_lower(s, axis=0).replace(np.inf, np.nan)
Out[134]:
             A    B    C    D
DATA_DATE
20170103   5.0  4.0  NaN  NaN
20170104   NaN  NaN  NaN  1.0
20170105   2.2  NaN  2.2  3.0
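For readers on a recent pandas release, here is a minimal sketch of the same idea using DataFrame.clip with the lower keyword, which replaced clip_lower. The frame and series are rebuilt from the question's data; nothing else is assumed.

```python
import numpy as np
import pandas as pd

# Rebuild the question's data.
idx = pd.Index([20170103, 20170104, 20170105], name="DATA_DATE")
df = pd.DataFrame(
    {
        "A": [5.0, np.nan, 1.0],
        "B": [3.0, np.nan, np.nan],
        "C": [np.nan, np.nan, 2.0],
        "D": [np.nan, 1.0, 3.0],
    },
    index=idx,
)
s = pd.Series([4.0, 0.0, 2.2], index=idx)

# axis=0 aligns s with the row index, so every row gets its own lower bound.
# NaN cells in df stay NaN.
result = df.clip(lower=s, axis=0)
print(result)
```

Because clip leaves missing values untouched, the fillna/replace round trip should not be needed for correctness here.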

answered Oct 06 '22 13:10

MaxU - stop WAR against UA

This is called broadcasting and can be done as follows:

import numpy as np
np.maximum(df, s.values[:, None])
Out: 
             A    B    C    D
DATA_DATE                    
20170103   5.0  4.0  NaN  NaN
20170104   NaN  NaN  NaN  1.0
20170105   2.2  NaN  2.2  3.0

Here, s.values[:, None] adds a new axis to the underlying NumPy array; s.values[:, np.newaxis] is equivalent. Older pandas versions also accepted s[:, None] directly on the Series, but that form was deprecated and later removed in pandas 2.0. The shapes (3, 4) and (3, 1) can be broadcast together because their first dimensions match and the second dimension of size 1 is stretched across the 4 columns.

Note the difference between s.values and s.values[:, None]:

s.values
Out: array([ 4. ,  0. ,  2.2])

s.values[:, None]
Out: 
array([[ 4. ],
       [ 0. ],
       [ 2.2]])

s.shape
Out: (3,)

s.values[:, None].shape
Out: (3, 1)
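The broadcasting step can be sketched on plain NumPy arrays, which sidesteps any version-specific behaviour of ufuncs applied to a DataFrame. This is an illustration only: the data is rebuilt from the question, and wrapping the array back into a DataFrame is a choice made here, not part of the answer above.

```python
import numpy as np
import pandas as pd

idx = pd.Index([20170103, 20170104, 20170105], name="DATA_DATE")
df = pd.DataFrame(
    {
        "A": [5.0, np.nan, 1.0],
        "B": [3.0, np.nan, np.nan],
        "C": [np.nan, np.nan, 2.0],
        "D": [np.nan, 1.0, 3.0],
    },
    index=idx,
)
s = pd.Series([4.0, 0.0, 2.2], index=idx)

# Turn the (3,) vector into a (3, 1) column so it broadcasts against (3, 4).
col = s.to_numpy()[:, None]

# np.maximum propagates NaN, so missing cells in df stay missing.
values = np.maximum(df.to_numpy(), col)

# Re-attach the original labels.
result = pd.DataFrame(values, index=df.index, columns=df.columns)
print(result)
```

Note that this relies on s and df having their rows in the same order; unlike the label-based pandas methods, NumPy broadcasting does no index alignment.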

An alternative would be:

df.mask(df.le(s, axis=0), s, axis=0)

Out: 
             A    B    C    D
DATA_DATE                    
20170103   5.0  4.0  NaN  NaN
20170104   NaN  NaN  NaN  1.0
20170105   2.2  NaN  2.2  3.0

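This reads: compare df and s row by row. Where df is less than or equal to s, take the value from s; everywhere else keep df. A comparison against NaN is never true, so missing cells in df stay NaN.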
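For the bonus case (aligning a series along the rows of df, so that each column gets its own bound), a possible sketch is to switch to axis=1. The series s_cols and its values are hypothetical, invented for this illustration; it must be indexed by df's column labels.

```python
import numpy as np
import pandas as pd

idx = pd.Index([20170103, 20170104, 20170105], name="DATA_DATE")
df = pd.DataFrame(
    {
        "A": [5.0, np.nan, 1.0],
        "B": [3.0, np.nan, np.nan],
        "C": [np.nan, np.nan, 2.0],
        "D": [np.nan, 1.0, 3.0],
    },
    index=idx,
)

# Hypothetical per-column bounds, indexed by the column labels of df.
s_cols = pd.Series({"A": 2.0, "B": 3.5, "C": 1.0, "D": 2.0})

# axis=1 aligns s_cols with the columns, so each column gets its own bound.
by_clip = df.clip(lower=s_cols, axis=1)

# NumPy equivalent: a (4,) vector broadcasts across the rows of a (3, 4) array.
by_numpy = np.maximum(df.to_numpy(), s_cols.to_numpy())

print(by_clip)
```

For an element-wise minimum instead of a maximum, the analogous calls would be df.clip(upper=...) and np.minimum.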

answered Oct 06 '22 11:10

ayhan
