Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

'in' operator functionality in python

I needed to remove the characters in string1 which are present in string2. Here string1 and string2 have only the lower case characters a-z with given condition that the length of string1 will be greater every time.

I was using the in operator:

def removeChars (string1, string2):
    for char in string2:
        if char in string1:
            string1 = string1.replace(char, '')
    return string1

But I read one answer on Stack Overflow which says:

For container types such as list, tuple, set, frozenset, dict, or collections.deque, the expression x in y is equivalent to any(x is e or x == e for e in y).

Which means the in operator is using a for loop behind the scenes.

So my question is, in the for loop in my code, should I consider that nested for loops are being used, as the in operator is using a for loop in the background? If yes, what would be the time complexity of this program?

like image 647
Ram Shankar Kumar Avatar asked Jan 25 '23 06:01

Ram Shankar Kumar


2 Answers

in does not necessarily use loops behind the scenes. For example:

r = range(100000000000)
print(333 in r)  # prints True immediately without looping

If you were to loop r it will take quite a long time, so clearly that doesn't happen.

in basically calls (behind the scenes) the object's __contains__ method. For some iterators it will in fact "loop" through everything but that's not always the case.

This example is basically the same as calling:

r.__contains__(333)

As pointed out in comments - the str objects specifically have a smarter algorithm than just plain loops, as you can see here

Also see example answer here

And see the documentation here

Because the real world scenario would probably mean that string1 can be arbitrarily long, but the characters to be removed would be a finite and small set, it will probably be much more efficient to just add up all the characters that aren't in string2. Something like this:

def removeChars (string1, string2):
    result = ''
    for char in string1:
        if char not in string2:
            result += char
    return result

This will involve looping over string1 just once, but multiple checks against string2 using in. This can be further simplified (to avoid += loop over the result):

def removeChars (string1, string2):
    return ''.join(char for char in string1 if char not in string2)
like image 191
Ofer Sadan Avatar answered Feb 04 '23 09:02

Ofer Sadan


When in is equivalent to a loop, it has to 'walk' the iterator and compare each item in turn.

This means each loop is O(n), so for 2 levels this is O(n²)

https://wiki.python.org/moin/TimeComplexity

Note that you actually have 3 loops here - since your replace will also walk through the string.

Since replace doens't raise any errors if char isn't found, it is simpler to just do this without first testing char in string1.

like image 31
match Avatar answered Feb 04 '23 07:02

match