I have a big DataFrame with 10 million rows, and I need to find the number of unique values in each column.
I wrote the function below (it needs to return a Series):
def count_unique_values(df):
    return pd.Series(df.nunique())
and I get this output:
Area 210
Item 436
Element 4
Year 53
Unit 2
Value 313640
dtype: int64
The expected result for Value is 313641.
When I just run
df['Value'].unique()
I do get that count. I couldn't figure out why nunique() returns one fewer.
The number of unique values is returned. In that example, the length of the array returned by unique() is compared with the integer returned by nunique(). The two results differ because the dropna parameter defaults to True, so NULL (NaN) values are excluded when nunique() counts unique values.
Pandas DataFrame nunique() Method
The nunique() method returns the number of unique values for each column. By specifying the column axis (axis='columns'), the nunique() method searches column-wise and returns the number of unique values for each row.
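As a small sketch of the axis behaviour described above (the frame and its column names are made up for illustration, they are not the asker's data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame used only to illustrate the axis parameter
df = pd.DataFrame({
    'x': [1, 1, 2],
    'y': [1, 2, 2],
    'z': [1, np.nan, 2],
})

# Default axis=0: one count per column (NaN is ignored because dropna=True)
per_column = df.nunique()

# axis='columns': one count per row, computed across that row's values
per_row = df.nunique(axis='columns')

print(per_column)
print(per_row)
```

Here every column holds the two distinct values 1 and 2, while only the middle row holds two distinct non-missing values.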
You can use the nunique() function to count the number of unique values in a pandas DataFrame.
As I've already mentioned, DataFrame columns are essentially pandas Series objects. If you want to use the unique() method on a DataFrame column, you can do so as follows: type the name of the DataFrame, then use “dot syntax” and type the name of the column. Then use dot syntax again to call the unique() method.
This happens because DataFrame.nunique omits missing values by default (its parameter dropna defaults to True), whereas Series.unique does not.
Sample:
df = pd.DataFrame({
'A':list('abcdef'),
'D':[np.nan,3,5,5,3,5],
})
print (df)
A D
0 a NaN
1 b 3.0
2 c 5.0
3 d 5.0
4 e 3.0
5 f 5.0
def count_unique_values(df):
    return df.nunique()
print (count_unique_values(df))
A 6
D 2
dtype: int64
print (df['D'].unique())
[nan 3. 5.]
print (df['D'].nunique())
2
print (df['D'].unique())
[nan 3. 5.]
The solution is to add the parameter dropna=False:
print (df['D'].nunique(dropna=False))
3
print (len(df['D'].unique()))
3
So in your function:
def count_unique_values(df):
    return df.nunique(dropna=False)
print (count_unique_values(df))
A 6
D 3
dtype: int64
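To confirm that missing values account for the gap in a particular column, a check along these lines may help. This is only a sketch on the sample frame from this answer, and the helper name nan_gap is hypothetical, not part of pandas:

```python
import numpy as np
import pandas as pd

def nan_gap(df):
    # Hypothetical helper: per-column difference between counting unique
    # values with and without missing values. A nonzero entry means the
    # column contains missing values that nunique() skips by default.
    return df.nunique(dropna=False) - df.nunique()

# Same sample data as above
df = pd.DataFrame({
    'A': list('abcdef'),
    'D': [np.nan, 3, 5, 5, 3, 5],
})

gap = nan_gap(df)
has_nan = df.isna().any()  # True for columns holding at least one NaN

print(gap)
print(has_nan)
```

On the sample frame only column D shows a gap, which matches the column flagged by isna().any(). On the original data, a gap of 1 in Value would be consistent with the 313640 versus 313641 difference.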