I have a list as follows.
[['Andrew', '1', '9'], ['Peter', '1', '10'], ['Andrew', '1', '8'], ['Peter', '1', '11'], ['Sam', '4', '9'], ['Andrew', '2', '2']]
I would like to sum up the last column, grouped by the other columns. The result should look like this:
[['Andrew', '1', '17'], ['Peter', '1', '21'], ['Sam', '4', '9'], ['Andrew', '2', '2']]
which is still a list.
In real practice, I would always like to sum up the last column grouped by many other columns. Is there a way I can do this in Python? Much appreciated.
To sum a given list of columns, create a list with the columns you want, slice the DataFrame with that list, and call the sum() function. Use df['Sum'] = df[col_list].sum(axis=1) to get the row-wise total of those columns.
If we want to sum every column, we can simply use the DataFrame sum() method.
We can find the sum of each column of a given nested list by using Python's zip function inside a list comprehension. Another approach is to use map(). In both cases the sum function is applied to each column in turn.
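As a minimal sketch of the zip and map approaches just described, assuming a small nested list of integers (the name `rows` and its values are illustrative and not taken from the question):

```python
# Illustrative data: each inner list is a row, each position is a column.
rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# zip(*rows) transposes rows into columns; sum each column in a comprehension.
col_sums_zip = [sum(col) for col in zip(*rows)]

# Equivalent with map(): apply sum to each transposed column.
col_sums_map = list(map(sum, zip(*rows)))

print(col_sums_zip)  # [12, 15, 18]
print(col_sums_map)  # [12, 15, 18]
```

This sums whole columns and does no grouping, so it answers a different question from the grouped sum asked about above.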
Dynamically grouping by all columns except the last one:
In [24]: df = pd.DataFrame(data)
In [25]: df.groupby(df.columns[:-1].tolist(), as_index=False).agg(lambda x: x.astype(int).sum()).values.tolist()
Out[25]: [['Andrew', '1', 17], ['Andrew', '2', 2], ['Peter', '1', 21], ['Sam', '4', 9]]
This is an O(n) grouping solution via collections.defaultdict, adaptable to any number of keys. (The final sorted() call adds O(k log k) for k distinct groups.)
If your desired output is a list, then this may be preferable to a solution via Pandas, which requires conversion to and from a non-standard type.
from collections import defaultdict
lst = [['Andrew', '1', '9'], ['Peter', '1', '10'], ['Andrew', '1', '8'],
       ['Peter', '1', '11'], ['Sam', '4', '9'], ['Andrew', '2', '2']]
d = defaultdict(int)
for *keys, val in lst:
    d[tuple(keys)] += int(val)
res = [[*k, v] for k, v in sorted(d.items())]
Result
[['Andrew', '1', 17], ['Andrew', '2', 2], ['Peter', '1', 21], ['Sam', '4', 9]]
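To illustrate the claim that the same loop adapts to any number of keys, here is a sketch with three key columns. The rows in `lst3` are made up for illustration and do not come from the question.

```python
from collections import defaultdict

# Hypothetical rows: three key columns followed by the value to sum.
lst3 = [['Andrew', '1', 'a', '9'], ['Andrew', '1', 'a', '8'],
        ['Andrew', '1', 'b', '5'], ['Peter', '2', 'a', '10']]

totals = defaultdict(int)
for *keys, val in lst3:
    # Everything except the last item forms the grouping key.
    totals[tuple(keys)] += int(val)

res3 = [[*k, v] for k, v in sorted(totals.items())]
print(res3)
# [['Andrew', '1', 'a', 17], ['Andrew', '1', 'b', 5], ['Peter', '2', 'a', 10]]
```

The loop body is unchanged from the two-key case because the starred target `*keys` absorbs however many leading columns each row has.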
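Explanation
Each row is unpacked into its leading key columns and its last value. A defaultdict(int), keyed by the tuple of key columns, accumulates the integer totals, and the list comprehension converts the sorted items back into a list of lists.
Op1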
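You can set the key columns as the index, pass the index levels to sum, and add tolist to convert back to a list. Note that sum(level=...) was deprecated in pandas 1.3 and removed in 2.0; on newer versions, groupby(level=[0, 1], sort=False).sum() is the equivalent call.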
pd.DataFrame(L).\
set_index([0,1])[2].astype(int).sum(level=[0,1]).\
reset_index().values.tolist()
Out[78]: [['Andrew', '1', 17], ['Peter', '1', 21], ['Sam', '4', 9], ['Andrew', '2', 2]]
Op2
For a list of lists, you can use groupby from itertools:
from itertools import groupby
[k+[sum(int(v) for _,_, v in g)] for k, g in groupby(sorted(l), key = lambda x: [x[0],x[1]])]
Out[98]: [['Andrew', '1', 17], ['Andrew', '2', 2], ['Peter', '1', 21], ['Sam', '4', 9]]
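Note that itertools.groupby only merges adjacent rows with equal keys, which is why the input is sorted first. The sketch below generalises the one-liner to any number of key columns, assuming the last column is always the value to sum. The helper name `sum_last_column` is hypothetical.

```python
from itertools import groupby

def sum_last_column(rows):
    """Group rows by every column except the last and sum the last one."""
    def key(row):
        return row[:-1]  # all leading columns form the grouping key

    # groupby only groups consecutive equal keys, so sort by the key first.
    return [k + [sum(int(row[-1]) for row in g)]
            for k, g in groupby(sorted(rows, key=key), key=key)]

l = [['Andrew', '1', '9'], ['Peter', '1', '10'], ['Andrew', '1', '8'],
     ['Peter', '1', '11'], ['Sam', '4', '9'], ['Andrew', '2', '2']]
print(sum_last_column(l))
# [['Andrew', '1', 17], ['Andrew', '2', 2], ['Peter', '1', 21], ['Sam', '4', 9]]
```

Because of the sort, this runs in O(n log n) and returns the groups in sorted order rather than in order of first appearance.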
Create a DataFrame, aggregate the third column (converted to integers) by the first and second columns, and finally convert back to a list of lists:
df = pd.DataFrame(L)
L = df[2].astype(int).groupby([df[0], df[1]]).sum().reset_index().values.tolist()
print (L)
[['Andrew', '1', 17], ['Andrew', '2', 2], ['Peter', '1', 21], ['Sam', '4', 9]]
And a solution with defaultdict (Python 3.x only):
from collections import defaultdict
d = defaultdict(int)
#https://stackoverflow.com/a/10532492
for *head, tail in L:
    d[tuple(head)] += int(tail)
d = [[*i, j] for i, j in sorted(d.items())]
print (d)
[['Andrew', '1', 17], ['Andrew', '2', 2], ['Peter', '1', 21], ['Sam', '4', 9]]
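The outputs above are sorted and hold integer totals, whereas the desired result in the question keeps groups in order of first appearance and stores the totals as strings. A sketch of that variant, assuming Python 3.7+ (where plain dicts preserve insertion order); the function name `group_sum` is hypothetical:

```python
def group_sum(rows):
    """Sum the last column grouped by all other columns, in first-seen order."""
    totals = {}
    for *keys, val in rows:
        k = tuple(keys)
        totals[k] = totals.get(k, 0) + int(val)
    # Convert totals back to strings to match the input format.
    return [[*k, str(v)] for k, v in totals.items()]

L = [['Andrew', '1', '9'], ['Peter', '1', '10'], ['Andrew', '1', '8'],
     ['Peter', '1', '11'], ['Sam', '4', '9'], ['Andrew', '2', '2']]
print(group_sum(L))
# [['Andrew', '1', '17'], ['Peter', '1', '21'], ['Sam', '4', '9'], ['Andrew', '2', '2']]
```

Skipping the sort also keeps the whole operation at O(n).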