Calculate average of every x rows in a table and create new table

I have a long table of data (~200 rows by 50 columns) and I need to write code that calculates the mean of every two rows, for each column, with the final output being a new table of the mean values. This is obviously crazy to do in Excel! I use Python 3 and I am aware of some similar questions: here, here and here. But none of these helps, as I need elegant code that works with multiple columns and produces an organised data table. By the way, my original data table was imported using pandas and is stored as a DataFrame, but I could not find an easy way to do this in pandas. Help is much appreciated.

An example of the table (short version) is:

    a   b   c   d
    2   50  25  26
    4   11  38  44
    6   33  16  25
    8   37  27  25
    10  28  48  32
    12  47  35  45
    14  8   16  7
    16  12  16  30
    18  22  39  29
    20  9   15  47

Expected mean table:

    a    b     c     d
    3    30.5  31.5  35
    7    35    21.5  25
    11   37.5  41.5  38.5
    15   10    16    18.5
    19   15.5  27    38
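To experiment with the answers below, the short example table can be rebuilt as a pandas DataFrame. This is a minimal sketch; the variable name `df` is an assumption chosen to match the answers, and in practice the data would come from whatever pandas reader was used to import the original file.

```python
import pandas as pd

# Sample data from the question: 10 rows, 4 columns (a, b, c, d)
df = pd.DataFrame({
    "a": [2, 4, 6, 8, 10, 12, 14, 16, 18, 20],
    "b": [50, 11, 33, 37, 28, 47, 8, 12, 22, 9],
    "c": [25, 38, 16, 27, 48, 35, 16, 16, 39, 15],
    "d": [26, 44, 25, 25, 32, 45, 7, 30, 29, 47],
})
print(df.shape)  # (10, 4)
```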
Gnu asked Apr 23 '16 12:04


2 Answers

You can create an artificial grouping key with df.index // 2 (or, as @DSM pointed out, with np.arange(len(df)) // 2, which works for any index, not just the default 0 to n-1 integer index) and then use groupby:

    import numpy as np

    df.groupby(np.arange(len(df)) // 2).mean()

    Out[13]:
          a     b     c     d
    0   3.0  30.5  31.5  35.0
    1   7.0  35.0  21.5  25.0
    2  11.0  37.5  41.5  38.5
    3  15.0  10.0  16.0  18.5
    4  19.0  15.5  27.0  38.0
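The question title asks about every x rows, not only pairs. The same positional grouping extends by replacing 2 with any block size n. Below is a minimal sketch; the helper name `mean_every_n` is hypothetical. When the row count is not a multiple of n, the last group simply averages whatever rows are left over.

```python
import numpy as np
import pandas as pd


def mean_every_n(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Average each consecutive block of n rows (hypothetical helper).

    Rows are grouped by position, so the existing index does not matter.
    A final block with fewer than n rows is averaged on its own.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    groups = np.arange(len(df)) // n  # block number for each row position
    return df.groupby(groups).mean()


# Sample data from the question
df = pd.DataFrame({
    "a": [2, 4, 6, 8, 10, 12, 14, 16, 18, 20],
    "b": [50, 11, 33, 37, 28, 47, 8, 12, 22, 9],
    "c": [25, 38, 16, 27, 48, 35, 16, 16, 39, 15],
    "d": [26, 44, 25, 25, 32, 45, 7, 30, 29, 47],
})

print(mean_every_n(df, 2))  # pairs of rows, as in the answer above
print(mean_every_n(df, 3))  # blocks of 3; the last block holds a single row
```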
ayhan answered Sep 23 '22 05:09


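You can approach this problem using df.rolling() to compute a rolling average and then take every second row using iloc:

    df = df.rolling(2).mean()  # row i now holds the mean of rows i-1 and i
    df = df.iloc[1::2, :]      # keep rows 1, 3, 5 and so on (non-overlapping pairs)

Note that the first row of the rolling result is NaN, because its window starts before the top of the table; the first complete pair ends at position 1, which is why the slice starts there. The result keeps the index label of the second row in each pair, and if the row count is odd the final unpaired row is dropped. Also make sure your data is sorted the way you need it.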
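The rolling idea extends to windows of any size n. Below is a minimal sketch, assuming the rows are already in the intended order; the helper name `rolling_mean_every_n` is hypothetical. A window of n rows is first complete at position n - 1 (position 1 when n = 2), so the slice starts there and steps by n to pick non-overlapping blocks. Unlike the groupby answer, leftover rows that do not fill a whole window are dropped.

```python
import pandas as pd


def rolling_mean_every_n(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Average each complete block of n consecutive rows (hypothetical helper)."""
    # Each window covers the current row and the n - 1 rows before it;
    # the first n - 1 rows are NaN because their windows are incomplete.
    rolled = df.rolling(n).mean()
    # Windows ending at positions n-1, 2n-1, 3n-1, ... do not overlap.
    return rolled.iloc[n - 1::n]


# Sample data from the question
df = pd.DataFrame({
    "a": [2, 4, 6, 8, 10, 12, 14, 16, 18, 20],
    "b": [50, 11, 33, 37, 28, 47, 8, 12, 22, 9],
    "c": [25, 38, 16, 27, 48, 35, 16, 16, 39, 15],
    "d": [26, 44, 25, 25, 32, 45, 7, 30, 29, 47],
})

print(rolling_mean_every_n(df, 2))
```

For the sample data, `rolling_mean_every_n(df, 2)` gives the same values as the groupby answer, but indexed 1, 3, 5, 7, 9 rather than 0 to 4. Calling `.reset_index(drop=True)` on the result renumbers it if that matters.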

seeiespi answered Sep 23 '22 05:09