You can get the unique values in one or more columns of a pandas DataFrame using Series.unique(). Series.unique() returns the distinct values of a single column; for multiple columns, you can combine the columns into a single Series first and call unique() on the result.
A pandas Series (i.e. a column) has a unique() method that keeps only the distinct values of that column, for example the distinct first names in a FirstName column. We can extend this to multiple columns with pandas concat(): concatenate all the desired columns into one Series and then take the unique values of that result.
If instead you want the number of unique values rather than the values themselves, you can use the nunique() function, which counts the distinct values in a Series or in each column of a DataFrame.
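As a minimal sketch of those three approaches (the DataFrame and column values here are made up for illustration, not taken from the original post):

import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({
    'FirstName': ['Anna', 'Ben', 'Anna', 'Cara'],
    'LastName':  ['Lee', 'Lee', 'Kim', 'Lee'],
})

# Unique values of a single column.
print(df['FirstName'].unique())                               # ['Anna' 'Ben' 'Cara']

# Unique values across several columns: stack them into one Series first.
print(pd.concat([df['FirstName'], df['LastName']]).unique())  # ['Anna' 'Ben' 'Cara' 'Lee' 'Kim']

# Number of unique values per column.
print(df.nunique())                                           # FirstName 3, LastName 2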
You can groupby on cols 'A' and 'B' and call size, and then reset_index and rename the generated column:
In [26]:
df1.groupby(['A','B']).size().reset_index().rename(columns={0:'count'})
Out[26]:
A B count
0 no no 1
1 no yes 2
2 yes no 4
3 yes yes 3
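(The df1 used above isn't shown in this excerpt; any data with these combination counts reproduces the output, for example this assumed reconstruction:)

import pandas as pd

# Assumed sample data, not the original: 1 no/no, 2 no/yes, 4 yes/no, 3 yes/yes rows.
df1 = pd.DataFrame({
    'A': ['no', 'no', 'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes'],
    'B': ['no', 'yes', 'yes', 'no', 'no', 'no', 'no', 'yes', 'yes', 'yes'],
})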
Update
A little explanation: by grouping on the 2 columns, this groups rows where the A and B values are the same; we then call size, which returns the number of rows in each unique group:
In[202]:
df1.groupby(['A','B']).size()
Out[202]:
A B
no no 1
yes 2
yes no 4
yes 3
dtype: int64
So now to restore the grouped columns, we call reset_index:
In[203]:
df1.groupby(['A','B']).size().reset_index()
Out[203]:
A B 0
0 no no 1
1 no yes 2
2 yes no 4
3 yes yes 3
This turns the grouped index levels back into columns, but the size aggregation ends up in a generated column named 0, so we have to rename it:
In[204]:
df1.groupby(['A','B']).size().reset_index().rename(columns={0:'count'})
Out[204]:
A B count
0 no no 1
1 no yes 2
2 yes no 4
3 yes yes 3
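(A slightly shorter equivalent folds the rename into reset_index, which on a Series accepts a name argument for the values column; the value_counts example further down uses the same trick:)

df1.groupby(['A', 'B']).size().reset_index(name='count')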
groupby does accept the arg as_index, which we could have set to False so it doesn't make the grouped columns the index, but this generates a series and you'd still have to restore the index and so on...:
In[205]:
df1.groupby(['A','B'], as_index=False).size()
Out[205]:
A B
no no 1
yes 2
yes no 4
yes 3
dtype: int64
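(This reflects older pandas behaviour; in newer releases, roughly 1.1 and later, as_index=False combined with size() returns a DataFrame with a size column directly, so it's worth checking against your version:)

# On recent pandas this gives columns A, B and size with no renaming needed.
df1.groupby(['A', 'B'], as_index=False).size()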
In Pandas 1.1.0 you can use the method value_counts with DataFrames:
df.value_counts() # or df[['A', 'B']].value_counts()
Result:
A B
yes no 4
yes 3
no yes 2
no 1
dtype: int64
Convert index to columns and sort by value counts:
df.value_counts(ascending=True).reset_index(name='count')
Result:
A B count
0 no no 1
1 no yes 2
2 yes yes 3
3 yes no 4
Slightly related, I was looking for the unique combinations and I came up with this method:
import pandas as pd

def unique_columns(df, columns):
    # For each row, flag whether its combination of values in `columns`
    # occurs exactly once in the DataFrame.
    result = pd.Series(index=df.index, dtype='object')
    groups = df.groupby(by=columns)
    for name, group in groups:
        is_unique = len(group) == 1
        result.loc[group.index] = is_unique
    assert not result.isnull().any()
    return result
And if you only want to assert that all combinations are unique:
df1.set_index(['A','B']).index.is_unique
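(A shorter way to get the same per-row flag as unique_columns, using DataFrame.duplicated instead of looping over groups; a sketch assuming the same df1 and columns:)

# True for rows whose ('A', 'B') combination occurs exactly once in df1.
is_unique = ~df1.duplicated(subset=['A', 'B'], keep=False)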