
Calculate new column as the mean of other columns pandas [duplicate]

I have this data frame:

and I would like to calculate a new column as the mean of salary_1, salary_2 and salary_3.

df = pd.DataFrame({'salary_1':[230,345,222],'salary_2':[235,375,292],'salary_3':[210,385,260]})

   salary_1  salary_2  salary_3
0       230       235       210
1       345       375       385
2       222       292       260

How can I do it in pandas in the most efficient way? Actually I have many more columns and I don't want to write this one by one.

Something like this:

   salary_1  salary_2  salary_3      salary_mean
0       230       235       210  (230+235+210)/3
1       345       375       385              ...
2       222       292       260              ...

Thank you!

Carmen Pérez Carrillo asked Jan 21 '18 11:01



2 Answers

Use .mean(). The axis argument controls the direction: axis=1 averages across the columns of each row, while axis=0 (the default) averages down each column.

df['average'] = df.mean(axis=1)
df

returns

   salary_1  salary_2  salary_3     average
0       230       235       210  225.000000
1       345       375       385  368.333333
2       222       292       260  258.000000

If you only want the mean of a few columns, select just those columns first, e.g.

df['average_1_3'] = df[['salary_1', 'salary_3']].mean(axis=1)
df

returns

   salary_1  salary_2  salary_3  average_1_3
0       230       235       210        220.0
1       345       375       385        365.0
2       222       292       260        241.0
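
When there are many columns, listing them by hand gets tedious. A minimal sketch of one way around that, assuming every column of interest is named `salary_<number>` (a naming pattern inferred from the question's sample, not something the question states): `DataFrame.filter` with a regular expression picks the matching columns, and the anchored pattern keeps a previously added `salary_mean` column out of the average.

```python
import pandas as pd

df = pd.DataFrame({
    'salary_1': [230, 345, 222],
    'salary_2': [235, 375, 292],
    'salary_3': [210, 385, 260],
})

# Select columns named salary_<digits>; the pattern is an assumption
# about how the real columns are named.
salary_cols = df.filter(regex=r'^salary_\d+$').columns

# Row-wise mean over just the selected columns.
df['salary_mean'] = df[salary_cols].mean(axis=1)
print(df)
```

Because the pattern requires digits after the underscore, running the last two statements again does not fold `salary_mean` back into the calculation.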
Alex answered Sep 17 '22 01:09


An easy way to solve this problem is to select the columns with a label slice:

col = df.loc[: , "salary_1":"salary_3"] 

where "salary_1" is the first column name and "salary_3" is the last one. Unlike positional slicing, a label slice with .loc includes both endpoints.

df['salary_mean'] = col.mean(axis=1)
df

This adds a new column to the dataframe that holds the row-wise mean of the selected columns. The approach is helpful when you have a large number of columns, and also when you need the mean of only a contiguous subset of them rather than all.
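
A self-contained sketch of the label-slice approach, with a hypothetical `employee` column added (it is not in the question's data) to illustrate that columns outside the slice are left out of the mean:

```python
import pandas as pd

df = pd.DataFrame({
    'employee': ['a', 'b', 'c'],  # hypothetical non-numeric column
    'salary_1': [230, 345, 222],
    'salary_2': [235, 375, 292],
    'salary_3': [210, 385, 260],
})

# Label slices with .loc include both the start and the end column.
col = df.loc[:, 'salary_1':'salary_3']
print(list(col.columns))

# Only the sliced columns contribute to the mean.
df['salary_mean'] = col.mean(axis=1)
print(df)
```

The slice depends on column order: it takes every column positioned between the two labels, so it only works as intended when the wanted columns sit next to each other in the frame.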

Mr. Stark answered Sep 17 '22 01:09