Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Set.pop() isn't random?

Tags:

python

random

From the python docs, "set.pop() remove and return an arbitrary element from s". While generating some random data to test a program, I noticed strange behavior of this pop() function. Here is my code (python 2.7.3):

testCases = 10
numberRange = 500

poppedValues = []
greaterPercentages = []

for i in range (testCases):
    s = Set()

    """ inserting 100 random values in the set, in the range [0, numberRange) """
    for j in range (100):
        s.add(random.randrange(numberRange)) 

    poppedValue = s.pop()
    greaterCount = 0

    """ counting how many numbers in the set are smaller then the popped value """
    for number in s:
        if poppedValue > number:
            greaterCount += 1

    poppedValues.append(poppedValue)
    greaterPercentages.append(float(greaterCount) / len(s) * 100)

for poppedValue in poppedValues:
    print poppedValue, '\t',

print

for percentage in greaterPercentages:
    print "{:2.2f}".format(percentage), '\t',

What I'm doing here is,

  1. Inserting some random values in the set s where each element is in the range [0, numberRange)
  2. Pop an element from the set (according to the docs, it should be a random one)
  3. Counting how many elements in the set are smaller then the popped value

I expected that the popped value should be a random one and about 50% of the numbers in the set will be greater then the popped value. But seems that pop() almost always returns the lowest number in the set. Here are the result for numberRange = 500. First row denotes the values of the popped element. Second row is the percentage of elements which are smaller then the popped value.

9   0   3   1   409     0   1   2   4   0   
0 % 0 % 0 % 0 % 87 %    0 % 0 % 0 % 0 % 0 %

I've conducted this test with different values of numberRange. It seems that for lower values of the set elements, pop() almost always returns the lowest element. But for higher values it returns a random element. For numberRange = 1000, the result is:

518     3586    3594    4103    2560    3087    4095    3079    3076    1622    
7 %     72 %    73 %    84 %    54 %    51 %    79 %    63 %    67 %    32 %

which I think is pretty random. Why this strange behavior? Am I doing something wrong?

EDIT: Thanks for everyone's answer and comment, seems that by "arbitrarily", it isn't guaranteed that it will be "random".

like image 469
Rafi Kamal Avatar asked Jan 09 '14 10:01

Rafi Kamal


People also ask

Is set Pop () random?

Python Set pop() The pop() method randomly removes an item from a set and returns the removed item.

Is set order random?

No, it is not random. It is "arbitrarily ordered", which means that you cannot depend on it being either ordered or random.

Why pop is not working in Python?

This is because you append 6 to the end or top of the stack / list / array . A pop command follows the last in first out rule which means when you pop an item from the stack / list /array you will be taking out the last item.

How do you remove random items from a set?

Python set pop() method removes a random element from the set when it is called. The pop() method removes any item anywhere from the set.


2 Answers

It's an implementation detail - set is implemented as a HashMap (similar to dict but without a slot for a value), set.pop removes the first entry in the HashMap, and an ints hash value is the same int.

Combined, this means that your set, which is ordered by the hash values, is actually ordered by the entries modulo hashtable size as well; this should be close to natural ordering in your case as you are only inserting numbers from a small range - if you take random numbers from randrange(10**10) instead of randrange(500) you should see a different behaviour. Also, depending on your insertion order, you can get some values out of their original hashing order due to hash collisions.

like image 111
l4mpi Avatar answered Sep 30 '22 18:09

l4mpi


When the doc says:

remove and return an arbitrary element from s; raises KeyError if empty

that means the behaviour isn't defined and the implementation can do whatever it's possible. In this case, it seems that the implemented behaviour is to remove the smallest value. That's all.
In fact, set.pop() is based on a HashMap and remove the first element of this (which is the smaller hashcode). In the case of set of ints, it's the smallest int.

On other implementation of Python could return a random value or the first pushed... You can't know.

like image 34
Maxime Lorant Avatar answered Sep 30 '22 16:09

Maxime Lorant