You can get the unique values in one or more columns of a pandas DataFrame using Series.unique(). Series.unique() returns the distinct values of a single column; for multiple columns, you can combine the columns into a single Series first and call unique() on the result.
A pandas Series (i.e. a column) has a unique() method that keeps only the distinct values of that column, for example the distinct first names in a FirstName column. We can extend this to multiple columns with pandas concat(): concatenate all the desired columns into one Series and then take the unique values of that result.
If instead you want the number of unique values rather than the values themselves, you can use the nunique() function, which counts the distinct values in a Series or in each column of a DataFrame.
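As a minimal sketch of those three approaches (the DataFrame and column values here are made up for illustration, not taken from the original post):

import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({
    'FirstName': ['Anna', 'Ben', 'Anna', 'Cara'],
    'LastName':  ['Lee', 'Lee', 'Kim', 'Lee'],
})

# Unique values of a single column.
print(df['FirstName'].unique())                               # ['Anna' 'Ben' 'Cara']

# Unique values across several columns: stack them into one Series first.
print(pd.concat([df['FirstName'], df['LastName']]).unique())  # ['Anna' 'Ben' 'Cara' 'Lee' 'Kim']

# Number of unique values per column.
print(df.nunique())                                           # FirstName 3, LastName 2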
You can groupby on cols 'A' and 'B' and call size, and then reset_index and rename the generated column:
In [26]:
df1.groupby(['A','B']).size().reset_index().rename(columns={0:'count'})
Out[26]:
A B count
0 no no 1
1 no yes 2
2 yes no 4
3 yes yes 3
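(The df1 used above isn't shown in this excerpt; any data with these combination counts reproduces the output, for example this assumed reconstruction:)

import pandas as pd

# Assumed sample data, not the original: 1 no/no, 2 no/yes, 4 yes/no, 3 yes/yes rows.
df1 = pd.DataFrame({
    'A': ['no', 'no', 'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes'],
    'B': ['no', 'yes', 'yes', 'no', 'no', 'no', 'no', 'yes', 'yes', 'yes'],
})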
Update
A little explanation: by grouping on the 2 columns, this groups rows where the A and B values are the same; we then call size, which returns the number of rows in each unique group:
In[202]:
df1.groupby(['A','B']).size()
Out[202]:
A B
no no 1
yes 2
yes no 4
yes 3
dtype: int64
So now to restore the grouped columns, we call reset_index:
In[203]:
df1.groupby(['A','B']).size().reset_index()
Out[203]:
A B 0
0 no no 1
1 no yes 2
2 yes no 4
3 yes yes 3
This turns the grouped index levels back into columns, but the size aggregation ends up in a generated column named 0, so we have to rename it:
In[204]:
df1.groupby(['A','B']).size().reset_index().rename(columns={0:'count'})
Out[204]:
A B count
0 no no 1
1 no yes 2
2 yes no 4
3 yes yes 3
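(A slightly shorter equivalent folds the rename into reset_index, which on a Series accepts a name argument for the values column; the value_counts example further down uses the same trick:)

df1.groupby(['A', 'B']).size().reset_index(name='count')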
groupby does accept the arg as_index, which we could have set to False so it doesn't make the grouped columns the index, but this generates a series and you'd still have to restore the index and so on...:
In[205]:
df1.groupby(['A','B'], as_index=False).size()
Out[205]:
A B
no no 1
yes 2
yes no 4
yes 3
dtype: int64
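(This reflects older pandas behaviour; in newer releases, roughly 1.1 and later, as_index=False combined with size() returns a DataFrame with a size column directly, so it's worth checking against your version:)

# On recent pandas this gives columns A, B and size with no renaming needed.
df1.groupby(['A', 'B'], as_index=False).size()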
In Pandas 1.1.0 you can use the method value_counts with DataFrames:
df.value_counts() # or df[['A', 'B']].value_counts()
Result:
A B
yes no 4
yes 3
no yes 2
no 1
dtype: int64
Convert index to columns and sort by value counts:
df.value_counts(ascending=True).reset_index(name='count')
Result:
A B count
0 no no 1
1 no yes 2
2 yes yes 3
3 yes no 4
Slightly related, I was looking for the unique combinations and I came up with this method:
import pandas as pd

def unique_columns(df, columns):
    # For each row, flag whether its combination of values in `columns`
    # occurs exactly once in the DataFrame.
    result = pd.Series(index=df.index, dtype='object')
    groups = df.groupby(by=columns)
    for name, group in groups:
        is_unique = len(group) == 1
        result.loc[group.index] = is_unique
    assert not result.isnull().any()
    return result
And if you only want to assert that all combinations are unique:
df1.set_index(['A','B']).index.is_unique
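(A shorter way to get the same per-row flag as unique_columns, using DataFrame.duplicated instead of looping over groups; a sketch assuming the same df1 and columns:)

# True for rows whose ('A', 'B') combination occurs exactly once in df1.
is_unique = ~df1.duplicated(subset=['A', 'B'], keep=False)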