I am using Pandas as a database substitute as I have multiple databases (Oracle, SQL Server, etc.), and I am unable to make a sequence of commands to a SQL equivalent.
I have a table loaded in a DataFrame with some columns:
YEARMONTH, CLIENTCODE, SIZE, etc., etc.  In SQL, to count the amount of different clients per year would be:
SELECT count(distinct CLIENTCODE) FROM table GROUP BY YEARMONTH;  And the result would be
201301    5000 201302    13245  How can I do that in Pandas?
You can use the nunique() function to count the number of unique values in a pandas DataFrame.
Pandas DataFrame nunique() Method The nunique() method returns the number of unique values for each column. By specifying the column axis ( axis='columns' ), the nunique() method searches column-wise and returns the number of unique values for each row.
The values property is used to get a Numpy representation of the DataFrame. Only the values in the DataFrame will be returned, the axes labels will be removed. The values of the DataFrame. A DataFrame where all columns are the same type (e.g., int64) results in an array of the same type.
I believe this is what you want:
table.groupby('YEARMONTH').CLIENTCODE.nunique()   Example:
In [2]: table Out[2]:     CLIENTCODE  YEARMONTH 0           1     201301 1           1     201301 2           2     201301 3           1     201302 4           2     201302 5           2     201302 6           3     201302  In [3]: table.groupby('YEARMONTH').CLIENTCODE.nunique() Out[3]:  YEARMONTH 201301       2 201302       3 
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With