
Counting unique values in a column in pandas dataframe like in Qlik?
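The asker's exact data isn't shown on this page, so as a working assumption here is a small DataFrame, constructed so that it reproduces the counts quoted in the answers below (column names match the answers; the values are illustrative):

import pandas as pd

# Illustrative data only -- chosen so the outputs below line up with the answers.
df = pd.DataFrame({
    'dID': [10, 11, 12, 10, 11, 10, 11, 10],
    'hID': [101, 102, 103, 104, 105, 101, 102, 101],
    'mID': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'C'],
    'uID': ['u1', 'u2', 'u3', 'u4', 'u5', 'u1', 'u2', 'u1'],
})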


To count distinct values, use nunique:

df['hID'].nunique()
5

To count only non-null values, use count:

df['hID'].count()
8

To count all values including nulls, use the size attribute:

df['hID'].size
8

Edit: to add a condition

Use boolean indexing:

df.loc[df['mID']=='A','hID'].agg(['nunique','count','size'])

Or using query:

df.query('mID == "A"')['hID'].agg(['nunique','count','size'])

Output:

nunique    5
count      5
size       5
Name: hID, dtype: int64

Assuming data is the name of your dataframe, you can do:

data['race'].value_counts()

This will show you the distinct elements and their number of occurrences.
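The data frame and its race column come from the answerer's own dataset, which isn't shown here; a minimal sketch with made-up values, just to illustrate the shape of the output:

import pandas as pd

# 'data' and the 'race' column are assumptions -- the original dataset isn't shown.
data = pd.DataFrame({'race': ['White', 'White', 'White', 'Black', 'Black', 'Asian']})

data['race'].value_counts()
# White    3
# Black    2
# Asian    1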


Or get the number of unique values for each column:

df.nunique()

dID    3
hID    5
mID    3
uID    5
dtype: int64

New in pandas 0.20.0: pd.DataFrame.agg

df.agg(['count', 'size', 'nunique'])

         dID  hID  mID  uID
count      8    8    8    8
size       8    8    8    8
nunique    3    5    3    5

You've always been able to do an agg within a groupby. I used stack at the end because I like the presentation better.

df.groupby('mID').agg(['count', 'size', 'nunique']).stack()


             dID  hID  uID
mID                       
A   count      5    5    5
    size       5    5    5
    nunique    3    5    5
B   count      2    2    2
    size       2    2    2
    nunique    2    2    2
C   count      1    1    1
    size       1    1    1
    nunique    1    1    1
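For comparison, the same call without .stack() keeps the three statistics as a column MultiIndex rather than as extra index rows, printing something like:

df.groupby('mID').agg(['count', 'size', 'nunique'])

#       dID                hID                uID
#     count size nunique count size nunique count size nunique
# mID
# A       5    5       3     5    5       5     5    5       5
# B       2    2       2     2    2       2     2    2       2
# C       1    1       1     1    1       1     1    1       1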

You can use nunique in pandas:

df.hID.nunique()
# 5

To count unique values in a column, say hID of dataframe df, use:

len(df.hID.unique())
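One thing to be aware of with this approach: Series.unique() keeps NaN while nunique() drops it by default, so the two counts can differ on columns with missing values:

import pandas as pd

s = pd.Series([101, 102, None])

s.nunique()              # 2 -- NaN is excluded by default
len(s.unique())          # 3 -- unique() keeps NaN
s.nunique(dropna=False)  # 3 -- matches len(s.unique())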

I was looking for something similar and found another approach that may help you:

  • If you want to count the number of null values, you could use this function:
def count_nulls(s):
    return s.size - s.count()
  • If you want to include NaN values in your unique counts, you need to pass dropna=False to the nunique function.
def unique_nan(s):
    return s.nunique(dropna=False)
  • Here is a summary of all the values together using the titanic dataset:
# Combine built-in and custom aggregations for one column, grouped by deck
agg_func_custom_count = {
    'embark_town': ['count', 'nunique', 'size', unique_nan, count_nulls, set]
}
df.groupby(['deck']).agg(agg_func_custom_count)
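For the block above to run, df needs to hold the titanic data; the answer doesn't say where it comes from, but one common way (an assumption here) is seaborn's bundled copy:

import seaborn as sns

# Assumption: seaborn's bundled titanic dataset; the answer doesn't specify a source.
df = sns.load_dataset('titanic')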

You can find more info here.