I have 2 dataframes that are 50 columns in the following format.
ID v10 v9 GENE
NM_198317 0.0 9.666823e-01 KLHL17
NM_198317 -1.0 0.000000e+00 KLHL17
I wanted to average the values where GENE matched so I used
common_mouse = common_mouse.groupby(['GENE'], as_index=False).agg([np.average])
This has resulted in a table with two headers for each column, but I only want to preserve the older header.
logger.info("Column headers: %s" % list(common_mouse.columns.values))
Shows
[('v10', 'average'), ('v11', 'average'), ('v12', 'average')]
rather than
['v10', 'v11', 'v12']
I tried to use zip(*common_mouse.columns.values) in order to reassign the columns but I can't seem to get it to work. Is there a way to either prevent or remove the new 'average' field from the column header?
You are using a list of functions as an argument for agg. When you do this, you are telling Pandas groupby that there are several aggregating functions that should be calculated for each column. It lets you know by creating a MultiIndex columns object.
So don't give it a list of functions to aggregate with. Give it just one (not in a list).
common_mouse.groupby(['GENE'], as_index=False).agg(np.average)
GENE v10 v9
0 KLHL17 -0.5 0.483341
However, Pandas has pre-coded strings that it has optimized algorithms for. Averaging is one of them. Use the mean string argument instead.
common_mouse.groupby(['GENE'], as_index=False).agg('mean')
GENE v10 v9
0 KLHL17 -0.5 0.483341
Even further, Pandas has a dedicated groupby method for this.
common_mouse.groupby(['GENE'], as_index=False).mean()
GENE v10 v9
0 KLHL17 -0.5 0.483341
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With