
Find unique values in a Pandas dataframe, irrespective of row or column location

I have a Pandas dataframe and I want to find all the unique values in it, irrespective of row or column. If I have a 10 x 10 dataframe containing, say, 84 unique values, I need the values themselves, not just the count.

I can create a set and add the values of each row to it by iterating over the rows of the dataframe. But I suspect that is inefficient (though I cannot justify that). Is there a more efficient way to do it? Is there a predefined function?

user1717931 asked Nov 19 '13 23:11



1 Answer

    In [1]: df = DataFrame(np.random.randint(0,10,size=100).reshape(10,10))

    In [2]: df
    Out[2]:
       0  1  2  3  4  5  6  7  8  9
    0  2  2  3  2  6  1  9  9  3  3
    1  1  2  5  8  5  2  5  0  6  3
    2  0  7  0  7  5  5  9  1  0  3
    3  5  3  2  3  7  6  8  3  8  4
    4  8  0  2  2  3  9  7  1  2  7
    5  3  2  8  5  6  4  3  7  0  8
    6  4  2  6  5  3  3  4  5  3  2
    7  7  6  0  6  6  7  1  7  5  1
    8  7  4  3  1  0  6  9  7  7  3
    9  5  3  4  5  2  0  8  6  4  7

    In [13]: Series(df.values.ravel()).unique()
    Out[13]: array([9, 1, 4, 6, 0, 7, 5, 8, 3, 2])

NumPy's unique sorts its result, whereas Series.unique() returns values in order of appearance, so it's faster to do it this way (and then sort afterwards if you need to):

    In [14]: df = DataFrame(np.random.randint(0,10,size=10000).reshape(100,100))

    In [15]: %timeit Series(df.values.ravel()).unique()
    10000 loops, best of 3: 137 µs per loop

    In [16]: %timeit np.unique(df.values.ravel())
    1000 loops, best of 3: 270 µs per loop
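
For reference, a self-contained version of the same idea might look like the sketch below. It assumes a reasonably recent pandas (where `df.to_numpy()` and the top-level `pd.unique` are available) and uses a small made-up frame instead of random data so the output is reproducible. The variable names are illustrative only, and relative timings may differ across pandas and NumPy versions.

```python
import numpy as np
import pandas as pd

# Small deterministic frame (illustrative data, not from the session above).
df = pd.DataFrame([[3, 1, 2], [2, 3, 5], [1, 5, 0]])

# Flatten every cell into one 1-D array, regardless of row or column.
flat = df.to_numpy().ravel()

# pandas' unique keeps values in order of first appearance (no sorting).
unique_in_order = pd.unique(flat)

# Sort afterwards only if sorted output is actually needed.
unique_sorted = np.sort(unique_in_order)

# np.unique yields the same values, already sorted.
assert np.array_equal(unique_sorted, np.unique(flat))

print(unique_in_order)
print(unique_sorted)
```

If the frame mixes types (for example numbers and strings), sorting may fail or behave unexpectedly, so the unsorted result is the safer default in that case.
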
Jeff answered Sep 29 '22 13:09