I have a pandas DataFrame and I want to find all the unique values in it, irrespective of row or column. If I have a 10 x 10 DataFrame that contains, say, 84 unique values, I need the values themselves, not the count. I could create a set and add each row's values to it by iterating over the rows, but I suspect that is inefficient (though I cannot justify that). Is there an efficient way to do this? Is there a built-in function?


Find unique values in a Pandas dataframe, irrespective of row or column location

961

asked Nov 19 '13 23:11

user1717931

1 Answer

    # Assumes: import numpy as np; from pandas import DataFrame, Series
    In [1]: df = DataFrame(np.random.randint(0,10,size=100).reshape(10,10))

    In [2]: df
    Out[2]:
       0  1  2  3  4  5  6  7  8  9
    0  2  2  3  2  6  1  9  9  3  3
    1  1  2  5  8  5  2  5  0  6  3
    2  0  7  0  7  5  5  9  1  0  3
    3  5  3  2  3  7  6  8  3  8  4
    4  8  0  2  2  3  9  7  1  2  7
    5  3  2  8  5  6  4  3  7  0  8
    6  4  2  6  5  3  3  4  5  3  2
    7  7  6  0  6  6  7  1  7  5  1
    8  7  4  3  1  0  6  9  7  7  3
    9  5  3  4  5  2  0  8  6  4  7

    In [13]: Series(df.values.ravel()).unique()
    Out[13]: array([9, 1, 4, 6, 0, 7, 5, 8, 3, 2])
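For anyone who wants the same idea as a standalone script, here is a minimal sketch. It assumes a reasonably recent pandas (0.24 or later), where `to_numpy()` is available as the newer spelling of `.values`, and it uses `pd.unique` directly instead of wrapping the array in a Series. The small frame is made up for illustration so the result is reproducible.

```python
import numpy as np
import pandas as pd

# Small fixed frame (illustrative values, not the random data from the session).
df = pd.DataFrame([[2, 2, 3],
                   [1, 2, 5],
                   [0, 7, 0]])

# Flatten every cell into a single 1-D array, row by row.
flat = df.to_numpy().ravel()

# pd.unique is hash-based: it does not sort, and it returns values
# in order of first appearance.
uniques = pd.unique(flat)
print(uniques)
```

On a frame with mixed column types the flattened array will have object dtype; `pd.unique` should still work there, but that case is not covered by this sketch.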

NumPy's unique sorts its result, so the Series.unique() approach above is faster; sort the result afterwards if you need it ordered:

    In [14]: df = DataFrame(np.random.randint(0,10,size=10000).reshape(100,100))

    In [15]: %timeit Series(df.values.ravel()).unique()
    10000 loops, best of 3: 137 µs per loop

    In [16]: %timeit np.unique(df.values.ravel())
    1000 loops, best of 3: 270 µs per loop
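If you want to check the ordering difference and rerun the comparison outside IPython, something like the following should work. It is only a sketch: the seed, the number of repetitions, and the variable names are my own choices, and the relative timings will depend on your pandas and NumPy versions, so no numbers are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

# Seeded generator so the frame is the same on every run (seed is arbitrary).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 10, size=(100, 100)))
flat = df.to_numpy().ravel()

hashed = pd.unique(flat)     # order of first appearance, not sorted
sorted_np = np.unique(flat)  # sorted ascending by default

# Sorting the hash-based result gives the same array as np.unique.
same = np.array_equal(np.sort(hashed), sorted_np)
print("same values after sorting:", same)

# Rough timing of both approaches; results vary by machine and version.
t_pd = timeit.timeit(lambda: pd.unique(flat), number=200)
t_np = timeit.timeit(lambda: np.unique(flat), number=200)
print(f"pd.unique: {t_pd:.4f}s  np.unique: {t_np:.4f}s  (200 runs each)")
```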

184

answered Sep 29 '22 13:09

Jeff

Tags: python, pandas, dataframe
