Pandas dataframe has additional header after groupby agg

Question

I have 2 dataframes that are 50 columns in the following format.

ID             v10    v9              GENE
NM_198317      0.0    9.666823e-01    KLHL17
NM_198317      -1.0   0.000000e+00    KLHL17

I wanted to average the values where GENE matched so I used

common_mouse = common_mouse.groupby(['GENE'], as_index=False).agg([np.average])

This has resulted in a table with two headers for each column, but I only want to preserve the older header.

logger.info("Column headers: %s" % list(common_mouse.columns.values))

Shows

[('v10', 'average'), ('v11', 'average'), ('v12', 'average')]

rather than

['v10', 'v11', 'v12']

I tried to use zip(*common_mouse.columns.values) in order to reassign the columns but I can't seem to get it to work. Is there a way to either prevent or remove the new 'average' field from the column header?

piRSquared · Accepted Answer

You are using a list of functions as an argument for agg. When you do this, you are telling Pandas groupby that there are several aggregating functions that should be calculated for each column. It lets you know by creating a MultiIndex columns object.

So don't give it a list of functions to aggregate with. Give it just one (not in a list).

Use a scalar

common_mouse.groupby(['GENE'], as_index=False).agg(np.average)

     GENE  v10        v9
0  KLHL17 -0.5  0.483341

String operator

However, Pandas has pre-coded strings that it has optimized algorithms for. Averaging is one of them. Use the mean string argument instead.

common_mouse.groupby(['GENE'], as_index=False).agg('mean')

     GENE  v10        v9
0  KLHL17 -0.5  0.483341

groupby method

Even further, Pandas has a dedicated groupby method for this.

common_mouse.groupby(['GENE'], as_index=False).mean()

     GENE  v10        v9
0  KLHL17 -0.5  0.483341

Pandas dataframe has additional header after groupby agg

Tags:

python

pandas

pandas-groupby

Stephopolis

1 Answers

Use a scalar

String operator

groupby method

piRSquared

Recent Activity

Donate For Us

Pandas dataframe has additional header after groupby agg

Tags:

python

pandas

pandas-groupby

Stephopolis

1 Answers

Use a scalar

String operator

groupby method

piRSquared

Related questions

Recent Activity

Donate For Us