Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas 'count(distinct)' equivalent

I am using Pandas as a database substitute as I have multiple databases (Oracle, SQL Server, etc.), and I am unable to make a sequence of commands to a SQL equivalent.

I have a table loaded in a DataFrame with some columns:

YEARMONTH, CLIENTCODE, SIZE, etc., etc. 

In SQL, to count the amount of different clients per year would be:

SELECT count(distinct CLIENTCODE) FROM table GROUP BY YEARMONTH; 

And the result would be

201301    5000 201302    13245 

How can I do that in Pandas?

like image 659
Adriano Almeida Avatar asked Mar 14 '13 13:03

Adriano Almeida


People also ask

How do I count unique values in pandas?

You can use the nunique() function to count the number of unique values in a pandas DataFrame.

What is Nunique () in pandas?

Pandas DataFrame nunique() Method The nunique() method returns the number of unique values for each column. By specifying the column axis ( axis='columns' ), the nunique() method searches column-wise and returns the number of unique values for each row.

What does .values do in pandas?

The values property is used to get a Numpy representation of the DataFrame. Only the values in the DataFrame will be returned, the axes labels will be removed. The values of the DataFrame. A DataFrame where all columns are the same type (e.g., int64) results in an array of the same type.


Video Answer


1 Answers

I believe this is what you want:

table.groupby('YEARMONTH').CLIENTCODE.nunique() 

Example:

In [2]: table Out[2]:     CLIENTCODE  YEARMONTH 0           1     201301 1           1     201301 2           2     201301 3           1     201302 4           2     201302 5           2     201302 6           3     201302  In [3]: table.groupby('YEARMONTH').CLIENTCODE.nunique() Out[3]:  YEARMONTH 201301       2 201302       3 
like image 128
Dan Allan Avatar answered Sep 21 '22 11:09

Dan Allan