Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficiency of len() and pop() in Python

Why is this Significantly faster with comments? Shouldn't a pop, a comparison, and a length check be O(1)? Would that significantly affect the speed?

#! /usr/bin/python                                                                
import math                                                                       

pmarbs = []                                                                       
pows = 49                                                                         
pmarbs.append("W")                                                                
inc = 1                                                                           
for i in range(pows):                                                             
    count = 0                                                                     
    j = 0                                                                         
    ran = int(pow(2, i))                                                          
    marker = len(pmarbs) - inc                                                    
    while (j < ran):                                                              
        #potential marble choice                                                  
        pot = pmarbs[marker - j]                                                  
        pot1 = pot + "W"                                                          
        pot2 = pot + "B"                                                          

        if (pot2.count('W') < pot2.count('B')) and (len(pot2) > (i+1)):           
            count += 1                                                            
        else:                                                                     
            pmarbs.append(pot2)                                                   

        pmarbs.append(pot1)                                                       
#        if(len(pmarbs[0]) < i):                                                  
#           pmarbs.pop(0)                                                         
#           marker -= 1                                                           
        j += 1                                                                    


    if (count != 0):                                                              
        print(count)                                                              
        print("length of pmarbs = %s" % len(pmarbs)) 

UPDATE: I'm making the question shorter, because the code being significantly slower was my question. I cared less about the code getting killed at runtime.

like image 395
Derek Halden Avatar asked Oct 18 '13 04:10

Derek Halden


2 Answers

list.pop(index) is an O(n) operation, because after you remove the value from the list, you have to shift the memory location of every other value in the list over one. Calling pop repeatedly on large lists is great way to waste computing cycles. If you absolutely must remove from the front of a large list over and over use collections.deque, which will give you much faster insertions and deletions to thr front.

len() is O(1) because deletions are O(n), since if you make sure all the values in a list are allocated in memory right next to each other, the total length of a list is just the tail's memory location - the head's memory location. If you don't care about the performance of len() and similar operations, then you can use a linked list to do constant time insertions and deletions - that just makes len() be O(n) and pop() be O(1) (and you get some other funky stuff like O(n) lookups).

Everything I said about pop() goes for insert() also - except for append(), which usually takes O(1).

I recently worked on a problem that required deleting lots of elements from a very large list (around 10,000,000 integers) and my initial dumb implementation just used pop() every time I needed to delete something - that turned out to not work at all, because it took O(n) to do even one cycle of the algorithm, which itself needed to n times.

My solution was to create a set() called ignore in which I kept the indices of all "deleted" elements. I had little helper functions to help me not have to think about skipping these, so my algorithm didn't get too ugly. What eventually did it was doing a single O(n) pass every 10,000 iterations to delete all the elements in ignore and make ignore empty again, that way I got the increased performance from a shrinking list while only having to do one 10,000th of the work for my deletions.

Also, ya, you should get a memory error because you are trying to allocate a list that is definitely much larger than your hard drive - much less your memory.

like image 129
reem Avatar answered Sep 25 '22 01:09

reem


Just to answer part of the question: popping from the end (the right end) of a list takes constant time in CPython, but popping from the left end (.pop(0)) takes time proportional to the length of the list: all the elements in the_list[1:] are physically moved one position to the left.

If you need to delete index position 0 frequently, much better to use an instance of collections.deque. Deques support efficient pushing and popping from both ends.

BTW, when I run the program, I get a clean exception:

...
length of pmarbs = 8306108
Traceback (most recent call last):
  File "xxx.py", line 22, in <module>
    pmarbs.append(pot2)
MemoryError

That happened to be on a 32-bit Windows box. And it doesn't surprise me ;-)

like image 31
Tim Peters Avatar answered Sep 26 '22 01:09

Tim Peters