If I have a dataframe similar to this one
Apples Bananas Grapes Kiwis
2 3 nan 1
1 3 7 nan
nan nan 2 3
I would like to add a column like this
Apples Bananas Grapes Kiwis Fruit Total
2 3 nan 1 6
1 3 7 nan 11
nan nan 2 3 5
I guess you could use df['Apples'] + df['Bananas']
and so on, but my actual dataframe is much larger than this. I was hoping a formula like df['Fruit Total']=df[-4:-1].sum
could do the trick in one line of code. That didn't work however. Is there any way to do it without explicitly summing up all columns?
You can use DataFrame. apply() for concatenate multiple column values into a single column, with slightly less typing and more scalable when you want to join multiple columns .
To select all columns except one column in Pandas DataFrame, we can use df. loc[:, df. columns != <column name>].
Sum all columns in a Pandas DataFrame into new column If we want to summarize all the columns, then we can simply use the DataFrame sum() method.
To apply aggregations to multiple columns, just add additional key:value pairs to the dictionary. Applying multiple aggregation functions to a single column will result in a multiindex. Working with multi-indexed columns is a pain and I'd recommend flattening this after aggregating by renaming the new columns.
You can first select by iloc
and then sum
:
df['Fruit Total']= df.iloc[:, -4:-1].sum(axis=1)
print (df)
Apples Bananas Grapes Kiwis Fruit Total
0 2.0 3.0 NaN 1.0 5.0
1 1.0 3.0 7.0 NaN 11.0
2 NaN NaN 2.0 3.0 2.0
For sum all columns use:
df['Fruit Total']= df.sum(axis=1)
This may be helpful for beginners, so for the sake of completeness, if you know the column names (e.g. they are in a list), you can use:
column_names = ['Apples', 'Bananas', 'Grapes', 'Kiwis']
df['Fruit Total']= df[column_names].sum(axis=1)
This gives you flexibility about which columns you use as you simply have to manipulate the list column_names
and you can do things like pick only columns with the letter 'a' in their name. Another benefit of this is that it's easier for humans to understand what they are doing through column names. Combine this with list(df.columns)
to get the column names in a list format. Thus, if you want to drop the last column, all you have to do is:
column_names = list(df.columns)
df['Fruit Total']= df[column_names[:-1]].sum(axis=1)
It is possible to do it without knowing the number of columns and even without iloc:
print(df)
Apples Bananas Grapes Kiwis
0 2.0 3.0 NaN 1.0
1 1.0 3.0 7.0 NaN
2 NaN NaN 2.0 3.0
cols_to_sum = df.columns[ : df.shape[1]-1]
df['Fruit Total'] = df[cols_to_sum].sum(axis=1)
print(df)
Apples Bananas Grapes Kiwis Fruit Total
0 2.0 3.0 NaN 1.0 5.0
1 1.0 3.0 7.0 NaN 11.0
2 NaN NaN 2.0 3.0 5.0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With