If I have a dataframe that has columns that include the same name, is there a way to combine the columns that have the same name with some sort of function (i.e. sum)?
For instance with:
In [186]: df["NY-WEB01"].head() Out[186]: NY-WEB01 NY-WEB01 DateTime 2012-10-18 16:00:00 5.6 2.8 2012-10-18 17:00:00 18.6 12.0 2012-10-18 18:00:00 18.4 12.0 2012-10-18 19:00:00 18.2 12.0 2012-10-18 20:00:00 19.2 12.0
How might I collapse the NY-WEB01 columns (there are a bunch of duplicate columns, not just NY-WEB01) by summing each row where the column name is the same?
I believe this does what you are after:
df.groupby(lambda x:x, axis=1).sum()
Alternatively, between 3% and 15% faster depending on the length of the df:
df.groupby(df.columns, axis=1).sum()
EDIT: To extend this beyond sums, use .agg()
(short for .aggregate()
):
df.groupby(df.columns, axis=1).agg(numpy.max)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With