I've got a `DataFrame` that has occasional missing values and looks something like this:
            Monday  Tuesday  Wednesday
    ================================================
    Mike        42      NaN         12
    Jenna      NaN      NaN         15
    Jon         21        4          1
I'd like to add a new column to my DataFrame that holds the average across all columns for every row.
Meaning, for Mike, I'd need (df['Monday'] + df['Wednesday'])/2, but for Jenna, I'd simply use df['Wednesday']/1.
Does anyone know the best way to account for this variation that results from missing values and calculate the average?
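The rule described in the question (add up whatever values a row has, then divide by how many values it has) can be written directly in pandas. This is a minimal sketch, assuming the table is rebuilt with `numpy.nan` for the missing cells; the names `row_sums`, `row_counts` and `manual_avg` are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the example table from the question; NaN marks a missing value.
df = pd.DataFrame(
    {
        "Monday": [42, np.nan, 21],
        "Tuesday": [np.nan, np.nan, 4],
        "Wednesday": [12, 15, 1],
    },
    index=["Mike", "Jenna", "Jon"],
)

# Sum of the values present in each row (sum() skips NaN by default) ...
row_sums = df.sum(axis=1)
# ... divided by how many values are present in that row (count() ignores NaN).
row_counts = df.count(axis=1)
manual_avg = row_sums / row_counts

print(manual_avg)
```

For Mike this divides by 2 and for Jenna by 1, which is exactly the per-row variation the question asks about.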
If you want to extract the rows that have missing values in a specific column, use the boolean result of isnull() on that column as a mask. The same idea works for extracting the columns that have missing values in a specific row. Use loc[] to select by name (label) and iloc[] to select by position; iloc[] needs a plain boolean array rather than a boolean Series as the mask.
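A short sketch of both directions on the question's table, assuming it is rebuilt with `numpy.nan` for the blanks. The variable names are hypothetical; the calls are standard pandas indexing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Monday": [42, np.nan, 21],
        "Tuesday": [np.nan, np.nan, 4],
        "Wednesday": [12, 15, 1],
    },
    index=["Mike", "Jenna", "Jon"],
)

# Rows whose "Monday" value is missing: isnull() yields one boolean per row.
missing_monday = df[df["Monday"].isnull()]

# Columns whose value is missing in the "Mike" row, selected by label with loc[].
# df.loc["Mike"] is a Series indexed by column name, so the mask lines up with df.columns.
mike_missing = df.loc[:, df.loc["Mike"].isnull()]

# The same selection by position with iloc[]: iloc[] rejects a boolean Series,
# so the mask is converted to a plain NumPy array first.
mike_missing_pos = df.iloc[:, df.iloc[0].isnull().to_numpy()]

print(missing_monday)
print(mike_missing)
```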
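To calculate the mean of whole columns, select them and call mean(): a single column such as df['Monday'] is a Series, so this uses Series.mean(), while a list of columns is a DataFrame, so this uses DataFrame.mean(). You can also get the mean of all numeric columns at once with DataFrame.mean(numeric_only=True).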
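A sketch of column means on the question's table. The `Team` column is a hypothetical non-numeric column added only to show what `numeric_only=True` filters out; it is not part of the original data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Monday": [42, np.nan, 21],
        "Tuesday": [np.nan, np.nan, 4],
        "Wednesday": [12, 15, 1],
    },
    index=["Mike", "Jenna", "Jon"],
)
df["Team"] = ["A", "B", "A"]  # hypothetical text column, for illustration only

# Mean of one column (a Series); the NaN in "Monday" is skipped.
monday_mean = df["Monday"].mean()

# Means of a chosen list of columns (a DataFrame): one value per column.
chosen_means = df[["Monday", "Wednesday"]].mean()

# Means of every numeric column; numeric_only=True leaves out "Team".
all_means = df.mean(numeric_only=True)

print(monday_mean)
print(chosen_means)
print(all_means)
```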
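The row average can be found using DataFrame.mean(axis=1). The mean() function returns the mean of the values over the requested axis: with axis=0 (the default) it is computed down each column, giving one value per column, and with axis=1 it is computed across each row, giving one value per row.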
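The two axis settings side by side, as a sketch on the question's table (rebuilt with `numpy.nan` for the blanks). pandas also accepts the names `"index"` and `"columns"` in place of 0 and 1.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Monday": [42, np.nan, 21],
        "Tuesday": [np.nan, np.nan, 4],
        "Wednesday": [12, 15, 1],
    },
    index=["Mike", "Jenna", "Jon"],
)

# axis=0: aggregate down each column, result is indexed by column name.
col_means = df.mean(axis=0)

# axis=1: aggregate across each row, result is indexed by row label.
row_means = df.mean(axis=1)

print(col_means)
print(row_means)
```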
You can use iloc to create a subset of a DataFrame. iloc is an indexer (used with square brackets, not called like a function) that selects specific rows and columns by their integer positions.
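A small sketch of positional selection with `iloc`, assuming the question's table is rebuilt as below. Positions are 0-based and the end of a slice is excluded.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Monday": [42, np.nan, 21],
        "Tuesday": [np.nan, np.nan, 4],
        "Wednesday": [12, 15, 1],
    },
    index=["Mike", "Jenna", "Jon"],
)

# Rows at positions 0 and 1 (the slice end, 2, is excluded) and
# the columns at positions 0 and 2.
subset = df.iloc[0:2, [0, 2]]

# A single cell by position: third row, second column.
jon_tuesday = df.iloc[2, 1]

print(subset)
print(jon_tuesday)
```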
You can simply:
    df['avg'] = df.mean(axis=1)

            Monday  Tuesday  Wednesday        avg
    Mike        42      NaN         12  27.000000
    Jenna      NaN      NaN         15  15.000000
    Jon         21        4          1   8.666667
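The one-liner as a runnable script. This is a sketch assuming the question's table is rebuilt with `numpy.nan` for the blanks; note that pandas prints the partly missing columns as floats (for example `42.0`).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Monday": [42, np.nan, 21],
        "Tuesday": [np.nan, np.nan, 4],
        "Wednesday": [12, 15, 1],
    },
    index=["Mike", "Jenna", "Jon"],
)

# The row mean is computed from the existing columns first,
# then stored as the new "avg" column.
df["avg"] = df.mean(axis=1)

print(df)
```

Because the right-hand side is evaluated before the assignment, the new `avg` column is not part of its own calculation.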
This works because .mean() ignores missing values by default (its skipna parameter defaults to True); see the pandas documentation for DataFrame.mean().
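To see what the default buys you, compare it with `skipna=False`, which makes any row containing a NaN average to NaN. A sketch on the question's table, rebuilt with `numpy.nan` for the blanks:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Monday": [42, np.nan, 21],
        "Tuesday": [np.nan, np.nan, 4],
        "Wednesday": [12, 15, 1],
    },
    index=["Mike", "Jenna", "Jon"],
)

# Default behaviour: NaN cells are skipped, so every row gets a value.
default_avg = df.mean(axis=1)

# skipna=False: a single NaN in a row makes that row's mean NaN.
strict_avg = df.mean(axis=1, skipna=False)

print(default_avg)
print(strict_avg)
```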
To average only a subset of the columns, you can:
    df['avg'] = df[['Monday', 'Tuesday']].mean(axis=1)

            Monday  Tuesday  Wednesday   avg
    Mike        42      NaN         12  42.0
    Jenna      NaN      NaN         15   NaN
    Jon         21        4          1  12.5
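Jenna's average is NaN here because she has no values at all in the selected columns, so there is nothing to average. A sketch that makes this visible by counting the non-missing values per row, assuming the table is rebuilt as below (`present` is an illustrative name):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Monday": [42, np.nan, 21],
        "Tuesday": [np.nan, np.nan, 4],
        "Wednesday": [12, 15, 1],
    },
    index=["Mike", "Jenna", "Jon"],
)

days = ["Monday", "Tuesday"]

# Row mean over the selected columns only.
subset_avg = df[days].mean(axis=1)

# How many non-missing values each row has in those columns.
present = df[days].count(axis=1)

print(pd.DataFrame({"avg": subset_avg, "present": present}))
```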
Alternatively, using iloc (loc also works here):

    df['avg'] = df.iloc[:, 0:2].mean(axis=1)
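The `iloc` slice and its `loc` counterpart select the same two columns, but the slice ends differ: a positional slice excludes its end, while a label slice includes it. A sketch, assuming the question's table is rebuilt as below:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Monday": [42, np.nan, 21],
        "Tuesday": [np.nan, np.nan, 4],
        "Wednesday": [12, 15, 1],
    },
    index=["Mike", "Jenna", "Jon"],
)

# Positions 0 and 1; position 2 ("Wednesday") is excluded.
by_position = df.iloc[:, 0:2].mean(axis=1)

# Labels "Monday" through "Tuesday"; the end label is included.
by_label = df.loc[:, "Monday":"Tuesday"].mean(axis=1)

print(by_position)
print(by_label)
```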