
Row-wise average for a subset of columns with missing values

I've got a DataFrame that has occasional missing values and looks something like this:

          Monday    Tuesday    Wednesday
=========================================
Mike        42        NaN         12
Jenna       NaN       NaN         15
Jon         21         4           1

I'd like to add a new column to my data frame where I'd calculate the average across all columns for every row.

Meaning, for Mike, I'd need (df['Monday'] + df['Wednesday'])/2, but for Jenna, I'd simply use df['Wednesday']/1.

Does anyone know the best way to account for this variation that results from missing values and calculate the average?

scrollex asked Jan 12 '16 03:01




2 Answers

You can simply take the mean along axis=1:

df['avg'] = df.mean(axis=1)

       Monday  Tuesday  Wednesday        avg
Mike       42      NaN         12  27.000000
Jenna     NaN      NaN         15  15.000000
Jon        21        4          1   8.666667

This works because .mean() ignores missing values by default (skipna=True), so each row is averaged over only the values it actually has: see docs.
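As a minimal sketch of that default, the snippet below rebuilds the question's example frame (the construction code is an assumption, since the question only shows the printed table) and compares the default row mean with skipna=False and with a by-hand sum divided by the count of non-missing values. The variable names avg_skip, avg_strict, and avg_manual are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (NaN marks a missing value).
df = pd.DataFrame(
    {
        "Monday": [42, np.nan, 21],
        "Tuesday": [np.nan, np.nan, 4],
        "Wednesday": [12, 15, 1],
    },
    index=["Mike", "Jenna", "Jon"],
)

# Default behaviour (skipna=True): average each row over its non-missing values.
avg_skip = df.mean(axis=1)

# For comparison: skipna=False yields NaN for any row containing a missing value.
avg_strict = df.mean(axis=1, skipna=False)

# The same default result by hand: row sum divided by the number of non-missing values.
avg_manual = df.sum(axis=1) / df.count(axis=1)

print(avg_skip)
```

Here Mike is averaged over two values, Jenna over one, and Jon over all three, which is the per-row variation the question asks about.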

To average over only a subset of columns, select them first:

df['avg'] = df[['Monday', 'Tuesday']].mean(axis=1)

       Monday  Tuesday  Wednesday   avg
Mike       42      NaN         12  42.0
Jenna     NaN      NaN         15   NaN
Jon        21        4          1  12.5

Stefan answered Sep 19 '22 23:09


Alternative: use iloc to select the columns by position (loc also works here, selecting by label):

df['avg'] = df.iloc[:,0:2].mean(axis=1) 
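A short sketch of the two selection styles side by side, assuming the same example frame as in the question (rebuilt here so the snippet runs on its own; the names by_position and by_label are illustrative). Note that an iloc slice excludes its end position, while a loc label slice includes its end label.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (NaN marks a missing value).
df = pd.DataFrame(
    {
        "Monday": [42, np.nan, 21],
        "Tuesday": [np.nan, np.nan, 4],
        "Wednesday": [12, 15, 1],
    },
    index=["Mike", "Jenna", "Jon"],
)

# Positional slice: columns 0 and 1 (the end of an iloc slice is exclusive).
by_position = df.iloc[:, 0:2].mean(axis=1)

# Label slice: 'Monday' through 'Tuesday' (the end of a loc slice is inclusive).
by_label = df.loc[:, "Monday":"Tuesday"].mean(axis=1)

print(by_position)
```

Both select Monday and Tuesday, so Jenna's row, which has no values in either column, comes out as NaN.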
Amir F answered Sep 18 '22 23:09