Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to find the first index of any of a set of characters in a string

I'd like to find the index of the first occurrence of any “special” character in a string, like so:

>>> "Hello world!".index([' ', '!'])
5

…except that's not valid Python syntax. Of course, I can write a function that emulates this behavior:

def first_index(s, characters):
    i = []
    for c in characters:
        try:
            i.append(s.index(c))
        except ValueError:
            pass
    if not i:
        raise ValueError
    return min(i)

I could also use regular expressions, but both solutions seem to be a bit overkill. Is there any “sane” way to do this in Python?

like image 844
user3426575 Avatar asked May 03 '15 22:05

user3426575


4 Answers

You can use enumerate and next with a generator expression, getting the first match or returning None if no character appears in s:

s = "Hello world!"

st = {"!"," "}
ind = next((i for i, ch  in enumerate(s) if ch in st),None)
print(ind)

You can pass any value you want to next as a default return value if there is no match.

If you want to use a function and raise a ValueError:

def first_index(s, characters):
    st = set(characters)
    ind = next((i for i, ch in enumerate(s) if ch in st), None)
    if ind is not None:
        return ind
    raise ValueError

For smaller inputs using a set won't make much if any difference but for large strings it will be a more efficient.

Some timings:

In the string, last character of character set:

In [40]: s = "Hello world!" * 100    
In [41]: string = s    
In [42]: %%timeit
st = {"x","y","!"}
next((i for i, ch in enumerate(s) if ch in st), None)
   ....: 
1000000 loops, best of 3: 1.71 µs per loop    
In [43]: %%timeit
specials = ['x', 'y', '!']
min(map(lambda x: (string.index(x) if (x in string) else len(string)), specials))
   ....: 
100000 loops, best of 3: 2.64 µs per loop

Not in the string, larger character set:

In [44]: %%timeit
st = {"u","v","w","x","y","z"}
next((i for i, ch in enumerate(s) if ch in st), None)
   ....: 
1000000 loops, best of 3: 1.49 µs per loop

In [45]: %%timeit
specials = ["u","v","w","x","y","z"]
min(map(lambda x: (string.index(x) if (x in string) else len(string)), specials))
   ....: 
100000 loops, best of 3: 5.48 µs per loop

In string an very first character of character set:

In [47]: %%timeit
specials = ['H', 'y', '!']
min(map(lambda x: (string.index(x) if (x in string) else len(string)), specials))
   ....: 
100000 loops, best of 3: 2.02 µs per loop

In [48]: %%timeit
st = {"H","y","!"}
next((i for i, ch in enumerate(s) if ch in st), None)
   ....: 
1000000 loops, best of 3: 903 ns per loop
like image 200
Padraic Cunningham Avatar answered Oct 10 '22 05:10

Padraic Cunningham


I would favour the re module as it's built in and already tested. It's also optimised for exactly this kind of thing.

>>> import re
>>> re.search(r'[ !]', 'Hello World!').start()
5

You probably want to check that a match was found though or catch the exception when it's not.

There are reasons for not using re, but I would want to see a good comment justifying the rational. Thinking that you can "do it better" is generally unnecessary, makes it harder for others to read the code and is less maintainable.

like image 45
Peter Gibson Avatar answered Oct 10 '22 04:10

Peter Gibson


Use a gen-exp and the find method.

>>> a = [' ', '!']
>>> s = "Hello World!"
>>> min(s.find(i) for i in a)
5

To remove -1s if they occur, you can have a filter within the list comp

>>> a = [' ', '!','$']
>>> s = "Hello World!"
>>> min(s.find(i) for i in a if i in s)
5

or you can substitute None

>>> min(s.find(i) if i in s else None for i in a)
5

Adding the timeit results

$ python -m timeit "a = [' ', '\!'];s = 'Hello World\!';min(s.find(i) for i in a if i in s)"
1000000 loops, best of 3: 0.902 usec per loop
$ python -m timeit "a = [' ', '\!'];s = 'Hello World\!';next((i for i, ch  in enumerate(s) if ch in a),None)"
1000000 loops, best of 3: 1.25 usec per loop
$ python -m timeit "a = [' ', '\!'];s = 'Hello World\!';min(map(lambda x: (s.index(x) if (x in s) else len(s)), a))"
1000000 loops, best of 3: 1.12 usec per loop

In your Example case, Padraic's beautiful solution is a little slow. However in large test cases, It is definitely a winner. (It's a little surprising that alfasin's "Not as optimized" is also faster here)

Added the implementation details

>>> def take1(s,a):
...     min(s.find(i) for i in a if i in s)
... 
>>> import dis
>>> dis.dis(take1)
  2           0 LOAD_GLOBAL              0 (min)
              3 LOAD_CLOSURE             0 (s)
              6 BUILD_TUPLE              1
              9 LOAD_CONST               1 (<code object <genexpr> at 0x7fa622e961b0, file "<stdin>", line 2>)
             12 MAKE_CLOSURE             0
             15 LOAD_FAST                1 (a)
             18 GET_ITER            
             19 CALL_FUNCTION            1
             22 CALL_FUNCTION            1
             25 POP_TOP             
             26 LOAD_CONST               0 (None)
             29 RETURN_VALUE        
>>> def take2(s,a):
...     next((i for i, ch  in enumerate(s) if ch in a),None)
... 
>>> dis.dis(take2)
  2           0 LOAD_GLOBAL              0 (next)
              3 LOAD_CLOSURE             0 (a)
              6 BUILD_TUPLE              1
              9 LOAD_CONST               1 (<code object <genexpr> at 0x7fa622e96e30, file "<stdin>", line 2>)
             12 MAKE_CLOSURE             0
             15 LOAD_GLOBAL              1 (enumerate)
             18 LOAD_FAST                0 (s)
             21 CALL_FUNCTION            1
             24 GET_ITER            
             25 CALL_FUNCTION            1
             28 LOAD_CONST               0 (None)
             31 CALL_FUNCTION            2
             34 POP_TOP             
             35 LOAD_CONST               0 (None)
             38 RETURN_VALUE        
>>> def take3(s,a):
...     min(map(lambda x: (s.index(x) if (x in s) else len(s)), a))
... 
>>> dis.dis(take3)
  2           0 LOAD_GLOBAL              0 (min)
              3 LOAD_GLOBAL              1 (map)
              6 LOAD_CLOSURE             0 (s)
              9 BUILD_TUPLE              1
             12 LOAD_CONST               1 (<code object <lambda> at 0x7fa622e44eb0, file "<stdin>", line 2>)
             15 MAKE_CLOSURE             0
             18 LOAD_FAST                1 (a)
             21 CALL_FUNCTION            2
             24 CALL_FUNCTION            1
             27 POP_TOP             
             28 LOAD_CONST               0 (None)
             31 RETURN_VALUE        

As you can clearly see in Padraic's case the loading of the global functions next and enumerate are the ones which are killing the time along with None at the end. In alfasin's solution the primary slowing down is the lambda function.

like image 34
Bhargav Rao Avatar answered Oct 10 '22 05:10

Bhargav Rao


Not as optimized as Padraic Cunningham's solution, but still a one liner:

string = "Hello world!"
specials = [' ', '!', 'x']
min(map(lambda x: (string.index(x) if (x in string) else len(string)), specials))
like image 31
Nir Alfasi Avatar answered Oct 10 '22 03:10

Nir Alfasi