
How to sum a column grouped by other columns in a list?

I have a list as follows.

[['Andrew', '1', '9'], ['Peter', '1', '10'], ['Andrew', '1', '8'], ['Peter', '1', '11'], ['Sam', '4', '9'], ['Andrew', '2', '2']]

I would like to sum the last column grouped by the other columns. The result should look like this:

[['Andrew', '1', '17'], ['Peter', '1', '21'], ['Sam', '4', '9'], ['Andrew', '2', '2']]

which is still a list.

In practice, I always want to sum the last column grouped by many other columns. Is there a way to do this in Python? Much appreciated.

Deepleeqe asked Mar 28 '18 13:03


People also ask

How do I sum a list of columns in pandas?

To sum a given column or list of columns, put the columns you want in a list, select them from the DataFrame with that list, and call sum(). Use df['Sum'] = df[col_list].sum(axis=1) to get the row-wise total.
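A minimal sketch of that row-wise sum, using a small hypothetical DataFrame (the column names `a`, `b`, `c` are placeholders, not from the original):

```python
import pandas as pd

# Hypothetical example data; column names are placeholders
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30], "c": [100, 200, 300]})

col_list = ["a", "b"]                 # columns to add together
df["Sum"] = df[col_list].sum(axis=1)  # axis=1 sums across each row

print(df["Sum"].tolist())  # [11, 22, 33]
```

Column `c` is left out of `col_list`, so it does not contribute to `Sum`.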

How do you sum multiple columns in Python?

To sum every column, simply call the DataFrame sum() method.
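A short sketch on hypothetical data: with the default `axis=0`, `sum()` returns one total per column, and summing that result again gives the grand total:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

col_totals = df.sum()         # default axis=0: one total per column
print(col_totals.tolist())    # [6, 60]

grand_total = df.sum().sum()  # total of every cell
print(grand_total)            # 66
```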

How do I sum a list of columns in Python?

You can find the sum of each column of a nested list by transposing it with zip(*lst) inside a list comprehension. Another approach is map(): apply sum to each column tuple that zip produces.
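A sketch of both approaches on a hypothetical numeric nested list. Note that this gives plain per-column totals, not totals grouped by key columns as in the question:

```python
rows = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]

# zip(*rows) transposes the nested list, yielding one tuple per column
col_sums = [sum(col) for col in zip(*rows)]
print(col_sums)  # [12, 15, 18]

# Same result with map(): apply sum to each column tuple
col_sums_map = list(map(sum, zip(*rows)))
print(col_sums_map)  # [12, 15, 18]
```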


4 Answers

With pandas, dynamically grouping by all columns except the last one:

import pandas as pd

df = pd.DataFrame(data)  # data is the list from the question
df.groupby(df.columns[:-1].tolist(), as_index=False).agg(lambda x: x.astype(int).sum()).values.tolist()
# [['Andrew', '1', 17], ['Andrew', '2', 2], ['Peter', '1', 21], ['Sam', '4', 9]]
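A variant sketch (not part of the original answer) that converts the value column to int once before grouping, instead of inside a per-group lambda, and passes `sort=False` so the groups come out in first-appearance order, matching the question's expected output. It assumes the same `data` list:

```python
import pandas as pd

data = [['Andrew', '1', '9'], ['Peter', '1', '10'], ['Andrew', '1', '8'],
        ['Peter', '1', '11'], ['Sam', '4', '9'], ['Andrew', '2', '2']]

df = pd.DataFrame(data)
value_col = df.columns[-1]                 # last column holds the values
key_cols = df.columns[:-1].tolist()        # every other column is a key

df[value_col] = df[value_col].astype(int)  # convert once, before grouping
res = (df.groupby(key_cols, as_index=False, sort=False)[value_col]
         .sum()
         .values.tolist())
print(res)
# [['Andrew', '1', 17], ['Peter', '1', 21], ['Sam', '4', 9], ['Andrew', '2', 2]]
```

Dropping `sort=False` gives the sorted order shown in the answer above.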
MaxU - stop WAR against UA answered Oct 08 '22 04:10


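This is a linear-time solution via collections.defaultdict, adaptable to any number of key columns: accumulating the totals is O(n), and the final sort adds O(k log k) for the k distinct groups.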

If your desired output is a list, this may be preferable to a pandas solution, which requires converting to and from a non-built-in type (a DataFrame).

from collections import defaultdict

lst = [['Andrew', '1', '9'], ['Peter', '1', '10'], ['Andrew', '1', '8'],
       ['Peter', '1', '11'], ['Sam', '4', '9'], ['Andrew', '2', '2']]

d = defaultdict(int)

for *keys, val in lst:
    d[tuple(keys)] += int(val)

res = [[*k, v] for k, v in sorted(d.items())]

Result

[['Andrew', '1', 17], ['Andrew', '2', 2], ['Peter', '1', 21], ['Sam', '4', 9]]

Explanation

  • Iterate over your list of lists, unpack each row into keys and a value, and add the integer value to your defaultdict(int), keyed by the tuple of keys.
  • Use a list comprehension to convert the dictionary into the desired output.
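The same defaultdict idea generalizes to arbitrary key and value positions. A minimal sketch, assuming rows are lists and the positions are passed as indices; the `group_sum` name and its parameters are hypothetical. Since Python 3.7, dicts (and therefore defaultdict) preserve insertion order, so skipping `sorted()` keeps the groups in first-appearance order:

```python
from collections import defaultdict


def group_sum(rows, key_cols, value_col):
    """Sum row[value_col] (as int) grouped by the columns listed in key_cols.

    Hypothetical helper. Groups are returned in order of first appearance
    (dicts keep insertion order in Python 3.7+).
    """
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in key_cols)  # build a hashable group key
        totals[key] += int(row[value_col])
    return [[*key, total] for key, total in totals.items()]


lst = [['Andrew', '1', '9'], ['Peter', '1', '10'], ['Andrew', '1', '8'],
       ['Peter', '1', '11'], ['Sam', '4', '9'], ['Andrew', '2', '2']]

print(group_sum(lst, key_cols=[0, 1], value_col=2))
# [['Andrew', '1', 17], ['Peter', '1', 21], ['Sam', '4', 9], ['Andrew', '2', 2]]
print(group_sum(lst, key_cols=[0], value_col=2))
# [['Andrew', 19], ['Peter', 21], ['Sam', 9]]
```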
jpp answered Oct 08 '22 03:10


Op1

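Set the first two columns as the index, sum the integer-converted third column per index level, then reset the index and use tolist() to convert back to a list: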

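import pandas as pd
# Series.sum(level=...) was removed in pandas 2.0; groupby(level=...) replaces it
(pd.DataFrame(L).set_index([0, 1])[2].astype(int)
    .groupby(level=[0, 1], sort=False).sum()
    .reset_index().values.tolist())
# [['Andrew', '1', 17], ['Peter', '1', 21], ['Sam', '4', 9], ['Andrew', '2', 2]]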

Op2

For a plain list of lists, you can use groupby from itertools on the sorted list:

from itertools import groupby
[k + [sum(int(v) for _, _, v in g)] for k, g in groupby(sorted(L), key=lambda x: [x[0], x[1]])]
# [['Andrew', '1', 17], ['Andrew', '2', 2], ['Peter', '1', 21], ['Sam', '4', 9]]
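The unpacking `_, _, v` above only works for exactly three columns. A sketch that instead treats every column except the last as the key (the `key_of` helper is hypothetical); sorting by the same key first matters because groupby only merges adjacent items with equal keys:

```python
from itertools import groupby

L = [['Andrew', '1', '9'], ['Peter', '1', '10'], ['Andrew', '1', '8'],
     ['Peter', '1', '11'], ['Sam', '4', '9'], ['Andrew', '2', '2']]


def key_of(row):
    # Every column except the last forms the group key
    return row[:-1]


rows = sorted(L, key=key_of)  # groupby only merges *adjacent* equal keys
res = [k + [sum(int(r[-1]) for r in g)] for k, g in groupby(rows, key=key_of)]
print(res)
# [['Andrew', '1', 17], ['Andrew', '2', 2], ['Peter', '1', 21], ['Sam', '4', 9]]
```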
BENY answered Oct 08 '22 04:10


Create a DataFrame, convert the third column to integers, aggregate it grouped by the first and second columns, and finally convert back to lists:

import pandas as pd

df = pd.DataFrame(L)
L = df[2].astype(int).groupby([df[0], df[1]]).sum().reset_index().values.tolist()
print(L)
[['Andrew', '1', 17], ['Andrew', '2', 2], ['Peter', '1', 21], ['Sam', '4', 9]]

And a solution with defaultdict (Python 3.x only):

from collections import defaultdict

d = defaultdict(int)
#https://stackoverflow.com/a/10532492
for *head, tail in L:
    d[tuple(head)] += int(tail)

d = [[*i, j] for i, j in sorted(d.items())]
print (d)
[['Andrew', '1', 17], ['Andrew', '2', 2], ['Peter', '1', 21], ['Sam', '4', 9]]
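The "Python 3.x only" restriction comes from the starred assignment `for *head, tail in L` (extended iterable unpacking, PEP 3132), and `[*i, j]` additionally needs Python 3.5+ (PEP 448). A sketch of what the starred split produces, followed by an equivalent loop that splits each row by slicing instead; the loop body avoids both syntaxes, though this sketch itself is written for Python 3:

```python
from collections import defaultdict

L = [['Andrew', '1', '9'], ['Peter', '1', '10'], ['Andrew', '1', '8'],
     ['Peter', '1', '11'], ['Sam', '4', '9'], ['Andrew', '2', '2']]

# Python 3 extended unpacking: `head` gets every item except the last
*head, tail = L[0]
print(head, tail)  # ['Andrew', '1'] 9

# Slicing gives the same split without starred assignment
d = defaultdict(int)
for row in L:
    d[tuple(row[:-1])] += int(row[-1])

res = [list(k) + [v] for k, v in sorted(d.items())]
print(res)
# [['Andrew', '1', 17], ['Andrew', '2', 2], ['Peter', '1', 21], ['Sam', '4', 9]]
```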
jezrael answered Oct 08 '22 02:10