I think this is a fairly basic question, but I can't seem to find the solution.
I have a pandas dataframe similar to the following:
import pandas as pd df = pd.DataFrame({'A' : ['x','x','y','z','z'], 'B' : ['p','p','q','r','r']}) df
which creates a table like this:
A B 0 x p 1 x p 2 y q 3 z r 4 z r
I'm trying to create a table that represents the number of distinct values in that dataframe. So my goal is something like this:
A B c 0 x p 2 1 y q 1 2 z r 2
I can't find the correct functions to achieve this, though. I've tried:
df.groupby(['A','B']).agg('count')
This produces a table with 3 rows (as expected) but without a 'count' column. I don't know how to add in that count column. Could someone point me in the right direction?
Use count() by Column Name Use pandas DataFrame. groupby() to group the rows by column and use count() method to get the count for each group by ignoring None and Nan values.
Use DataFrame. groupby(). sum() to group rows based on one or multiple columns and calculate sum agg function. groupby() function returns a DataFrameGroupBy object which contains an aggregate function sum() to calculate a sum of a given column for each group.
Using the size() or count() method with pandas. DataFrame. groupby() will generate the count of a number of occurrences of data present in a particular column of the dataframe.
count() should be used when you want to find the frequency of valid values present in columns with respect to specified col . . value_counts() should be used to find the frequencies of a series.
You can using size
df.groupby(['A','B']).size() Out[590]: A B x p 2 y q 1 z r 2 dtype: int64
For your solution adding one of the columns
df.groupby(['A','B']).B.agg('count') Out[591]: A B x p 2 y q 1 z r 2 Name: B, dtype: int64
Update :
df.groupby(['A','B']).B.agg('count').to_frame('c').reset_index() #df.groupby(['A','B']).size().to_frame('c').reset_index() Out[593]: A B c 0 x p 2 1 y q 1 2 z r 2
DataFrame.value_counts
This is an identical replacement for df.groupby(['A', 'B']).size()
.
df.value_counts(['A', 'B']) A B z r 2 x p 2 y q 1 dtype: int64
df.value_counts(['A', 'B']).reset_index(name='c') A B c 0 z r 2 1 x p 2 2 y q 1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With