Row-wise average for a subset of columns with missing values

Tags: python, pandas, dataframe

I've got a DataFrame with occasional missing values that looks something like this:

              Monday    Tuesday    Wednesday
    ========================================
    Mike        42        NaN         12
    Jenna       NaN       NaN         15
    Jon         21         4           1

I'd like to add a new column to my data frame where I'd calculate the average across all columns for every row.

Meaning, for Mike, I'd need (df['Monday'] + df['Wednesday'])/2, but for Jenna, I'd simply use df['Wednesday']/1.

Does anyone know the best way to account for this variation that results from missing values and calculate the average?

asked Jan 12 '16 03:01

scrollex

2 Answers

You can simply:

    df['avg'] = df.mean(axis=1)

           Monday  Tuesday  Wednesday        avg
    Mike       42      NaN         12  27.000000
    Jenna     NaN      NaN         15  15.000000
    Jon        21        4          1   8.666667

This works because .mean() ignores missing values by default (skipna=True): see docs.
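To make the default behaviour concrete, here is a minimal sketch that rebuilds the example frame from the question and compares the default row-wise mean with `skipna=False`. The variable names `avg_default` and `avg_strict` are only illustrative and do not come from the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame(
    {
        "Monday": [42, np.nan, 21],
        "Tuesday": [np.nan, np.nan, 4],
        "Wednesday": [12, 15, 1],
    },
    index=["Mike", "Jenna", "Jon"],
)

# Default (skipna=True): NaN cells are left out of both the sum and the count,
# so Mike's mean is (42 + 12) / 2 and Jenna's is 15 / 1.
avg_default = df.mean(axis=1)

# skipna=False: any row containing a NaN yields NaN instead.
avg_strict = df.mean(axis=1, skipna=False)

print(avg_default)
print(avg_strict)
```

With `skipna=False`, only Jon's row (which has no missing values) gets a numeric result.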

To select a subset, you can:

    df['avg'] = df[['Monday', 'Tuesday']].mean(axis=1)

           Monday  Tuesday  Wednesday   avg
    Mike       42      NaN         12  42.0
    Jenna     NaN      NaN         15   NaN
    Jon        21        4          1  12.5
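Jenna's average is NaN in the subset case because every selected column is missing for her row, so there is nothing left to average. A rough sketch of how one might check this, using `count(axis=1)` to see how many values each mean was built from; the list `cols` and the column name `n_used` are hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Monday": [42, np.nan, 21],
        "Tuesday": [np.nan, np.nan, 4],
        "Wednesday": [12, 15, 1],
    },
    index=["Mike", "Jenna", "Jon"],
)

# Hypothetical subset of columns to average over.
cols = ["Monday", "Tuesday"]

# Row-wise mean over the subset only; NaN cells are skipped.
df["avg"] = df[cols].mean(axis=1)

# Number of non-missing values that went into each row's mean.
# A count of 0 explains a NaN average.
df["n_used"] = df[cols].count(axis=1)

print(df)
```

Naming the columns explicitly also keeps helper columns such as `avg` out of the calculation if the mean is recomputed later.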

answered Sep 19 '22 23:09

Stefan

Alternatively, select the columns by position with iloc (loc also works here):

    df['avg'] = df.iloc[:, 0:2].mean(axis=1)
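As a sketch of the `loc` variant mentioned above: positional slices with `iloc` exclude the end position, while label slices with `loc` include the end label, so `0:2` and `"Monday":"Tuesday"` pick the same two columns here. The names `by_position` and `by_label` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Monday": [42, np.nan, 21],
        "Tuesday": [np.nan, np.nan, 4],
        "Wednesday": [12, 15, 1],
    },
    index=["Mike", "Jenna", "Jon"],
)

# iloc: positions 0 and 1 (the end of the slice, 2, is excluded).
by_position = df.iloc[:, 0:2].mean(axis=1)

# loc: labels "Monday" through "Tuesday" (the end label is included).
by_label = df.loc[:, "Monday":"Tuesday"].mean(axis=1)

# Both selections cover the same columns, so the results match,
# including the NaN in Jenna's row.
print(by_position.equals(by_label))
```

The positional form depends on column order, so the label form may be the safer choice if columns are likely to be reordered or added.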

answered Sep 18 '22 23:09

Amir F
