I have a dataframe df:
id name number
1 sam 76
2 sam 8
2 peter 8
4 jack 2
I would like to group by the 'id' column and count the number of unique (name, number) pairs per id:
id count(name-number)
1 1
2 2
4 1
I have tried this, but it does not work:
df.groupby('id')[('number','name')].nunique().reset_index()
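To see why the attempt above does not count pairs, here is a small sketch (not from the original thread). Selecting several columns with a list and calling nunique() counts distinct values in each column independently, so id 2 reports 2 names but only 1 number. The workaround shown builds one string key per row; the "-" separator is an assumption and could merge distinct pairs if a name itself contained "-".

```python
import pandas as pd

# Data from the question
df = pd.DataFrame({"id": [1, 2, 2, 4],
                   "name": ["sam", "sam", "peter", "jack"],
                   "number": [76, 8, 8, 2]})

# nunique() on several columns counts distinct values per column,
# independently, so it never looks at the (name, number) pairs.
per_column = df.groupby("id")[["name", "number"]].nunique()
print(per_column)

# Hypothetical workaround: combine the pair into one key column,
# then count distinct keys per id.
pair_key = df["name"] + "-" + df["number"].astype(str)
pair_counts = pair_key.groupby(df["id"]).nunique()
print(pair_counts)
```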
You can just combine two groupby calls to get the desired result.
import pandas
df = pandas.DataFrame({"id": [1, 2, 2, 4], "name": ["sam", "sam", "peter", "jack"], "number": [8, 8, 8, 2]})
# count rows per (id, name, number) combination, then count combinations per id
group = df.groupby(['id','name','number']).size().groupby(level=0).size()
print(group)
The first groupby produces one entry per distinct (id, name, number) combination, which makes the pairs you want to count unique. The second groupby then counts those entries per id, relying on the fact that the first groupby put id in the first level of the index (hence level=0).
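To make the two steps visible, the sketch below prints the intermediate Series. It adds one hypothetical duplicated row (a second id 2 / sam / 8) that is not in the original data, to show that the first groupby collapses repeated combinations before the second one counts them.

```python
import pandas

# Answer's data plus one hypothetical duplicated row (2, "sam", 8)
df = pandas.DataFrame({"id": [1, 2, 2, 2, 4],
                       "name": ["sam", "sam", "sam", "peter", "jack"],
                       "number": [8, 8, 8, 8, 2]})

# Step 1: one entry per distinct (id, name, number) combination.
# The grouping columns move into a MultiIndex; the values are row counts.
step1 = df.groupby(["id", "name", "number"]).size()
print(step1)

# Step 2: group that Series by the first index level ("id") and count
# how many distinct combinations each id has.
step2 = step1.groupby(level=0).size()
print(step2)
```

The duplicated row only raises the count stored for (2, sam, 8) in step 1; id 2 still has 2 distinct pairs in step 2.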
The result will be a Series. If you want a DataFrame with the column name shown in your desired result, you can use the aggregate function:
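# named aggregation: the keyword (passed via ** because it is not a valid identifier) becomes the column name
group = df.groupby(['id','name','number']).size().groupby(level=0).agg(**{'count(name-number)': 'size'}).reset_index()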
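A possible alternative (swapped in here, not part of the answer above) is to drop duplicate rows first and then count rows per id; Series.reset_index(name=...) both converts the result to a DataFrame and names the count column. This is a sketch assuming the answer's sample data.

```python
import pandas

df = pandas.DataFrame({"id": [1, 2, 2, 4],
                       "name": ["sam", "sam", "peter", "jack"],
                       "number": [8, 8, 8, 2]})

# Keep one row per (id, name, number), count remaining rows per id,
# and turn the resulting Series into a DataFrame with a named column.
result = (df.drop_duplicates(["id", "name", "number"])
            .groupby("id")
            .size()
            .reset_index(name="count(name-number)"))
print(result)
```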
You can do:
import pandas
df = pandas.DataFrame({"id": [1, 2, 3, 4], "name": ["sam", "sam", "peter", "jack"], "number": [8, 8, 8, 2]})
g = df.groupby(["name", "number"])
print(g.groups)
which gives:
{('jack', 2): [3], ('peter', 8): [2], ('sam', 8): [0, 1]}
To get the number of entries for each pair, you can do:
for p in g.groups:
    print(p, "has", len(g.groups[p]), "entries")
which gives:
('peter', 8) has 1 entries
('jack', 2) has 1 entries
('sam', 8) has 2 entries
Update: the OP asked for the result as a DataFrame. One way to get this is to use aggregate with the len function, which returns a DataFrame with the number of entries per pair:
# len of the remaining "id" column = number of rows in each (name, number) group
d = g.aggregate(len)
print(d.reset_index().rename(columns={"id": "num_entries"}))
gives:
name number num_entries
0 jack 2 1
1 peter 8 1
2 sam 8 2
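As an alternative to both the loop and aggregate(len) (suggested here, not by the answerer), GroupBy.size() returns the row count of each (name, number) group directly as a Series indexed by the pair; reset_index(name=...) gives the same DataFrame shape as above. This sketch assumes the answer's sample data.

```python
import pandas

df = pandas.DataFrame({"id": [1, 2, 3, 4],
                       "name": ["sam", "sam", "peter", "jack"],
                       "number": [8, 8, 8, 2]})
g = df.groupby(["name", "number"])

# size() counts the rows in each (name, number) group, no loop needed
counts = g.size()
print(counts)

# Same counts as a DataFrame with a named count column
counts_df = counts.reset_index(name="num_entries")
print(counts_df)
```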