I'd like to find the index of the first occurrence of any “special” character in a string, like so:
>>> "Hello world!".index([' ', '!'])
5
…except that's not valid Python syntax. Of course, I can write a function that emulates this behavior:
def first_index(s, characters):
i = []
for c in characters:
try:
i.append(s.index(c))
except ValueError:
pass
if not i:
raise ValueError
return min(i)
I could also use regular expressions, but both solutions seem to be a bit overkill. Is there any “sane” way to do this in Python?
You can use enumerate and next with a generator expression, getting the first match or returning None if no character appears in s:
s = "Hello world!"
st = {"!"," "}
ind = next((i for i, ch in enumerate(s) if ch in st),None)
print(ind)
You can pass any value you want to next as a default return value if there is no match.
If you want to use a function and raise a ValueError:
def first_index(s, characters):
st = set(characters)
ind = next((i for i, ch in enumerate(s) if ch in st), None)
if ind is not None:
return ind
raise ValueError
For smaller inputs using a set won't make much if any difference but for large strings it will be a more efficient.
Some timings:
In the string, last character of character set:
In [40]: s = "Hello world!" * 100
In [41]: string = s
In [42]: %%timeit
st = {"x","y","!"}
next((i for i, ch in enumerate(s) if ch in st), None)
....:
1000000 loops, best of 3: 1.71 µs per loop
In [43]: %%timeit
specials = ['x', 'y', '!']
min(map(lambda x: (string.index(x) if (x in string) else len(string)), specials))
....:
100000 loops, best of 3: 2.64 µs per loop
Not in the string, larger character set:
In [44]: %%timeit
st = {"u","v","w","x","y","z"}
next((i for i, ch in enumerate(s) if ch in st), None)
....:
1000000 loops, best of 3: 1.49 µs per loop
In [45]: %%timeit
specials = ["u","v","w","x","y","z"]
min(map(lambda x: (string.index(x) if (x in string) else len(string)), specials))
....:
100000 loops, best of 3: 5.48 µs per loop
In string an very first character of character set:
In [47]: %%timeit
specials = ['H', 'y', '!']
min(map(lambda x: (string.index(x) if (x in string) else len(string)), specials))
....:
100000 loops, best of 3: 2.02 µs per loop
In [48]: %%timeit
st = {"H","y","!"}
next((i for i, ch in enumerate(s) if ch in st), None)
....:
1000000 loops, best of 3: 903 ns per loop
I would favour the re
module as it's built in and already tested. It's also optimised for exactly this kind of thing.
>>> import re
>>> re.search(r'[ !]', 'Hello World!').start()
5
You probably want to check that a match was found though or catch the exception when it's not.
There are reasons for not using re
, but I would want to see a good comment justifying the rational. Thinking that you can "do it better" is generally unnecessary, makes it harder for others to read the code and is less maintainable.
Use a gen-exp and the find
method.
>>> a = [' ', '!']
>>> s = "Hello World!"
>>> min(s.find(i) for i in a)
5
To remove -1
s if they occur, you can have a filter within the list comp
>>> a = [' ', '!','$']
>>> s = "Hello World!"
>>> min(s.find(i) for i in a if i in s)
5
or you can substitute None
>>> min(s.find(i) if i in s else None for i in a)
5
Adding the timeit
results
$ python -m timeit "a = [' ', '\!'];s = 'Hello World\!';min(s.find(i) for i in a if i in s)"
1000000 loops, best of 3: 0.902 usec per loop
$ python -m timeit "a = [' ', '\!'];s = 'Hello World\!';next((i for i, ch in enumerate(s) if ch in a),None)"
1000000 loops, best of 3: 1.25 usec per loop
$ python -m timeit "a = [' ', '\!'];s = 'Hello World\!';min(map(lambda x: (s.index(x) if (x in s) else len(s)), a))"
1000000 loops, best of 3: 1.12 usec per loop
In your Example case, Padraic's beautiful solution is a little slow. However in large test cases, It is definitely a winner. (It's a little surprising that alfasin's "Not as optimized" is also faster here)
Added the implementation details
>>> def take1(s,a):
... min(s.find(i) for i in a if i in s)
...
>>> import dis
>>> dis.dis(take1)
2 0 LOAD_GLOBAL 0 (min)
3 LOAD_CLOSURE 0 (s)
6 BUILD_TUPLE 1
9 LOAD_CONST 1 (<code object <genexpr> at 0x7fa622e961b0, file "<stdin>", line 2>)
12 MAKE_CLOSURE 0
15 LOAD_FAST 1 (a)
18 GET_ITER
19 CALL_FUNCTION 1
22 CALL_FUNCTION 1
25 POP_TOP
26 LOAD_CONST 0 (None)
29 RETURN_VALUE
>>> def take2(s,a):
... next((i for i, ch in enumerate(s) if ch in a),None)
...
>>> dis.dis(take2)
2 0 LOAD_GLOBAL 0 (next)
3 LOAD_CLOSURE 0 (a)
6 BUILD_TUPLE 1
9 LOAD_CONST 1 (<code object <genexpr> at 0x7fa622e96e30, file "<stdin>", line 2>)
12 MAKE_CLOSURE 0
15 LOAD_GLOBAL 1 (enumerate)
18 LOAD_FAST 0 (s)
21 CALL_FUNCTION 1
24 GET_ITER
25 CALL_FUNCTION 1
28 LOAD_CONST 0 (None)
31 CALL_FUNCTION 2
34 POP_TOP
35 LOAD_CONST 0 (None)
38 RETURN_VALUE
>>> def take3(s,a):
... min(map(lambda x: (s.index(x) if (x in s) else len(s)), a))
...
>>> dis.dis(take3)
2 0 LOAD_GLOBAL 0 (min)
3 LOAD_GLOBAL 1 (map)
6 LOAD_CLOSURE 0 (s)
9 BUILD_TUPLE 1
12 LOAD_CONST 1 (<code object <lambda> at 0x7fa622e44eb0, file "<stdin>", line 2>)
15 MAKE_CLOSURE 0
18 LOAD_FAST 1 (a)
21 CALL_FUNCTION 2
24 CALL_FUNCTION 1
27 POP_TOP
28 LOAD_CONST 0 (None)
31 RETURN_VALUE
As you can clearly see in Padraic's case the loading of the global functions next
and enumerate
are the ones which are killing the time along with None
at the end. In alfasin's solution the primary slowing down is the lambda
function.
Not as optimized as Padraic Cunningham's solution, but still a one liner:
string = "Hello world!"
specials = [' ', '!', 'x']
min(map(lambda x: (string.index(x) if (x in string) else len(string)), specials))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With