Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Calculate mean for selected rows for selected columns in pandas data frame

Tags:

I have pandas df with say, 100 rows, 10 columns, (actual data is huge). I also have row_index list which contains, which rows to be considered to take mean. I want to calculate mean on say columns 2,5,6,7 and 8. Can we do it with some function for dataframe object?

What I know is do a for loop, get value of row for each element in row_index and keep doing mean. Do we have some direct function where we can pass row_list, and column_list and axis, for ex df.meanAdvance(row_list,column_list,axis=0) ?

I have seen DataFrame.mean() but it didn't help I guess.

  a b c d q  0 1 2 3 0 5 1 1 2 3 4 5 2 1 1 1 6 1 3 1 0 0 0 0 

I want mean of 0, 2, 3 rows for each a, b, d columns

  a b d 0 1 1 2 
like image 804
impossible Avatar asked Apr 06 '16 14:04

impossible


People also ask

How do you find the mean of multiple rows in pandas?

To calculate the mean of whole columns in the DataFrame, use pandas. Series. mean() with a list of DataFrame columns. You can also get the mean for all numeric columns using DataFrame.

How do you calculate mean of data in pandas?

To find mean of DataFrame, use Pandas DataFrame. mean() function. The DataFrame. mean() function returns the mean of the values for the requested axis.


1 Answers

To select the rows of your dataframe you can use iloc, you can then select the columns you want using square brackets.

For example:

 df = pd.DataFrame(data=[[1,2,3]]*5, index=range(3, 8), columns = ['a','b','c']) 

gives the following dataframe:

   a  b  c 3  1  2  3 4  1  2  3 5  1  2  3 6  1  2  3 7  1  2  3 

to select only the 3d and fifth row you can do:

df.iloc[[2,4]] 

which returns:

   a  b  c 5  1  2  3 7  1  2  3 

if you then want to select only columns b and c you use the following command:

df[['b', 'c']].iloc[[2,4]] 

which yields:

   b  c 5  2  3 7  2  3 

To then get the mean of this subset of your dataframe you can use the df.mean function. If you want the means of the columns you can specify axis=0, if you want the means of the rows you can specify axis=1

thus:

df[['b', 'c']].iloc[[2,4]].mean(axis=0) 

returns:

b    2 c    3 

As we should expect from the input dataframe.

For your code you can then do:

 df[column_list].iloc[row_index_list].mean(axis=0) 

EDIT after comment: New question in comment: I have to store these means in another df/matrix. I have L1, L2, L3, L4...LX lists which tells me the index whose mean I need for columns C[1, 2, 3]. For ex: L1 = [0, 2, 3] , means I need mean of rows 0,2,3 and store it in 1st row of a new df/matrix. Then L2 = [1,4] for which again I will calculate mean and store it in 2nd row of the new df/matrix. Similarly till LX, I want the new df to have X rows and len(C) columns. Columns for L1..LX will remain same. Could you help me with this?

Answer:

If i understand correctly, the following code should do the trick (Same df as above, as columns I took 'a' and 'b':

first you loop over all the lists of rows, collection all the means as pd.series, then you concatenate the resulting list of series over axis=1, followed by taking the transpose to get it in the right format.

dfs = list() for l in L:     dfs.append(df[['a', 'b']].iloc[l].mean(axis=0))  mean_matrix = pd.concat(dfs, axis=1).T 
like image 157
PdevG Avatar answered Oct 23 '22 20:10

PdevG