
How to sum rows with the same keys?

In my code, df is defined like this:

df = pd.read_excel(io=file_name, sheet_name=sheet)

df is a DataFrame with 86 rows and 1 column. print(df) shows:

          0
Male    511
Female  461
Male    273
Female  217
Male    394
Female  337
Female  337
Male    337
...

I want to write code that merges the Male and Female entries into one row each, like this:

          0   1   2   3 ...
Male    511 273 394 337 ...
Female  461 217 337 337 ...

The final task is to .sum() the Male row and then the Female row to get the total for each sex. I am new to Python and pandas and haven't made much progress so far. Any help, tutorial, or documentation would be great! Thank you!

Edit: By keys I mean the index labels. I hope the Male and Female labels can be used to 'club' these rows together, but I don't know how.

Edit: I have accomplished my last task directly via

print(df.loc['Female'].sum())
print(df.loc['Male'].sum())

But I have yet to achieve my first task. Any ideas?

Vibhu asked Jun 08 '18 09:06



1 Answer

Create a MultiIndex by adding GroupBy.cumcount as a second index level. After reshaping with unstack, those counter values become the new column names:

df.index = [df.index, df.groupby(level=0).cumcount()]

print (df)
            0
Male   0  511
Female 0  461
Male   1  273
Female 1  217
Male   2  394
Female 2  337
       3  337
Male   3  337
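To see what cumcount contributes on its own, here is a minimal sketch. It assumes a hand-built frame holding only the eight rows shown in the question (a stand-in for the real Excel data, which is not available here), and the names labels, values and position are illustrative:

```python
import pandas as pd

# Assumption: a hand-built stand-in for the Excel data,
# containing only the eight rows shown in the question.
labels = ["Male", "Female", "Male", "Female", "Male", "Female", "Female", "Male"]
values = [511, 461, 273, 217, 394, 337, 337, 337]
df = pd.DataFrame({0: values}, index=labels)

# cumcount numbers each row within its index label, starting at 0,
# so the n-th "Male" row and the n-th "Female" row share the same counter.
position = df.groupby(level=0).cumcount()
print(position.tolist())  # [0, 0, 1, 1, 2, 2, 3, 3]
```

The counter does not depend on the rows strictly alternating: the two consecutive Female rows get 2 and 3, and the Male row after them gets 3.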

df = df[0].unstack()
print (df)
          0    1    2    3
Female  461  217  337  337
Male    511  273  394  337

Then sum each row with axis=1:

print (df.sum(axis=1))

Female    1352
Male      1515
dtype: int64
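For anyone who wants to run the whole thing end to end, the steps above can be sketched as follows. The frame is again a hand-built stand-in for the eight sample rows (an assumption, not the asker's actual file), and wide, totals and direct are illustrative names. The last line is a possible shortcut, not part of the steps above, for when only the totals are needed and the wide table is not:

```python
import pandas as pd

# Assumption: hand-built stand-in for the eight sample rows in the question.
labels = ["Male", "Female", "Male", "Female", "Male", "Female", "Female", "Male"]
values = [511, 461, 273, 217, 394, 337, 337, 337]
df = pd.DataFrame({0: values}, index=labels)

# Step 1: pair each label with its running position to form a MultiIndex.
df.index = [df.index, df.groupby(level=0).cumcount()]

# Step 2: pivot the position level into columns (one row per label).
wide = df[0].unstack()

# Step 3: sum across the columns of each row.
totals = wide.sum(axis=1)
print(totals)

# Possible shortcut when the wide table itself is not needed:
# group on the label level and sum directly.
direct = df[0].groupby(level=0).sum()
print(direct)
```

With the sample rows, both totals and direct give 1352 for Female and 1515 for Male.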
jezrael answered Nov 07 '22 09:11