Python pandas: How to group by and count unique values based on multiple columns?

Question

I have a DataFrame df:

id name number
1 sam   76
2 sam    8
2 peter  8 
4 jack   2

I would like to group by the 'id' column and count the number of unique (name, number) pairs in each group, like this:

id count(name-number)
1    1
2    2
4    1

I have tried this, but it does not work:

df.groupby('id')[('number','name')].nunique().reset_index()

stedes · Accepted Answer

You can just combine two groupbys to get the desired result.

import pandas
df = pandas.DataFrame({"id": [1, 2, 2, 4], "name": ["sam", "sam", "peter", "jack"], "number": [8, 8, 8, 2]})
group = df.groupby(['id','name','number']).size().groupby(level=0).size()

The first groupby collapses the rows to one entry per distinct (id, name, number) combination, which makes the pairs you want to count unique. The second groupby then counts how many of those entries fall under each id, using the fact that the first groupby moved id into level 0 of the index.
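To make the two steps concrete, here is a minimal sketch that keeps the intermediate result in its own variable. It uses the data from the question (so id 1 has number 76); the names combos and pairs_per_id are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 2, 4],
    "name": ["sam", "sam", "peter", "jack"],
    "number": [76, 8, 8, 2],
})

# Step 1: one entry per distinct (id, name, number) combination.
# The three grouping columns become the levels of a MultiIndex.
combos = df.groupby(["id", "name", "number"]).size()

# Step 2: group on index level 0 (id) and count the entries in each group,
# i.e. the number of distinct (name, number) pairs per id.
pairs_per_id = combos.groupby(level=0).size()

print(pairs_per_id.to_dict())  # {1: 1, 2: 2, 4: 1}
```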

The result will be a Series. If you want a DataFrame with the column name shown in your desired result, name the counts when resetting the index. (The dict-renaming form of agg on a grouped Series, agg({'count(name-number)': 'size'}), is rejected by pandas 1.0 and later.)

group = df.groupby(['id','name','number']).size().groupby(level=0).size().reset_index(name='count(name-number)')
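As an alternative to chaining two groupbys (not the approach used in this answer), the same count can probably be obtained by dropping duplicate rows first. The sketch below assumes the question's data plus one extra, hypothetical repeated row (2, "sam", 8), added only to show that repeats are not counted twice.

```python
import pandas as pd

# Question's data plus a hypothetical repeated (2, "sam", 8) row.
df = pd.DataFrame({
    "id": [1, 2, 2, 2, 4],
    "name": ["sam", "sam", "peter", "sam", "jack"],
    "number": [76, 8, 8, 8, 2],
})

result = (
    df.drop_duplicates(subset=["id", "name", "number"])  # keep each combination once
      .groupby("id")
      .size()                                            # remaining rows per id
      .reset_index(name="count(name-number)")            # Series -> DataFrame
)
print(result)
```

Here id 2 still gets a count of 2, because the repeated row is removed before counting.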

mvd · Answer

You can do:

import pandas
df = pandas.DataFrame({"id": [1, 2, 3, 4], "name": ["sam", "sam", "peter", "jack"], "number": [8, 8, 8, 2]})
g = df.groupby(["name", "number"])
print(g.groups)

which gives:

{('jack', 2): [3], ('peter', 8): [2], ('sam', 8): [0, 1]}

To get the number of rows for each (name, number) pair you can do:

for p in g.groups:
    print(p, " has ", len(g.groups[p]), " entries")

which gives:

('peter', 8)  has  1  entries
('jack', 2)  has  1  entries
('sam', 8)  has  2  entries

Update:

The OP asked for the result as a DataFrame. One way to get this is to use aggregate with the len function, which returns a DataFrame with the number of rows for each (name, number) pair:

d = g.aggregate(len)
print(d.reset_index().rename(columns={"id": "num_entries"}))

gives:

    name  number  num_entries
0   jack       2           1
1  peter       8           1
2    sam       8           2
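For comparison, a shorter route to what should be the same table is size() followed by reset_index(name=...), which avoids the rename step. This is a sketch on the same data as this answer, not a replacement for the code above; the variable name counts is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["sam", "sam", "peter", "jack"],
    "number": [8, 8, 8, 2],
})

# size() counts the rows in each (name, number) group;
# reset_index(name=...) turns the Series into a DataFrame and names the count column.
counts = df.groupby(["name", "number"]).size().reset_index(name="num_entries")
print(counts)
```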

Tags: python, pandas, unique, group-by

UserYmY
