I am trying to count the names in names_list that have the letter 'i' as their second letter, using a nested loop.
def print_count(names_list):
    for name in names_list:
        count = 0
        for i in range(len(name)):
            if name[i] == 'i' and i == 1:
                count = count + 1
    print(count)
names = ["Cody", "Baldassar", "Delilah", "Vinnie", "Leila", "Zac", "Aiden", "Zaynab"]
print_count(names)
My expected output is 2, but I got 0 instead.
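A likely cause, assuming `count = 0` sits inside the outer loop and `print(count)` comes after it (the layout that prints 0): the counter is reset for every name, so after the loop it only holds the result for the last name, "Zaynab", whose second letter is 'a'. A minimal corrected sketch initialises the counter once, before the outer loop. The function name `count_second_i` is hypothetical, and it returns the count instead of printing it so the result can be reused:

```python
def count_second_i(names_list):
    # Initialise the counter once, *before* looping over the names,
    # so it accumulates across all names instead of being reset.
    count = 0
    for name in names_list:
        # Inner loop kept only to mirror the question's nested-loop approach.
        for i in range(len(name)):
            if i == 1 and name[i] == 'i':
                count += 1
    return count


names = ["Cody", "Baldassar", "Delilah", "Vinnie", "Leila", "Zac", "Aiden", "Zaynab"]
print(count_second_i(names))  # 2
```

The inner loop is not really needed: checking `len(name) > 1 and name[1] == 'i'` directly for each name does the same job.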
Update
The fastest solution so far is @KellyBundy's idea of using a slice:
>>> len([s for s in names if s[1:2] == 'i'])
2
Original answer (about twice as slow!)
You can express that simply and efficiently:
>>> len([s for s in names if s[1:].startswith('i')])
2
Why?
The argument of len is a list comprehension. It builds a new list from the original one, keeping only the names that satisfy the condition "must have a second letter, and that second letter must be 'i'":
>>> [s for s in names if s[1:2] == 'i']
['Vinnie', 'Aiden']
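For readers less used to comprehensions, the filtered list above is roughly equivalent to this explicit loop (a sketch; the variable name `matches` is only illustrative):

```python
names = ["Cody", "Baldassar", "Delilah", "Vinnie", "Leila", "Zac", "Aiden", "Zaynab"]

# Same filtering as [s for s in names if s[1:2] == 'i'], written out longhand.
matches = []
for s in names:
    if s[1:2] == 'i':  # the second character, or '' if s is too short
        matches.append(s)

print(matches)       # ['Vinnie', 'Aiden']
print(len(matches))  # 2
```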
But is it safe?
Q: "What if a word is empty or has only one letter? For sure s[1] would raise IndexError, right?"
A: It is safe. Yes, s[1] would raise IndexError if s is empty or has a single character, but s[1:2] simply returns an empty string in those cases:
>>> 'foo'[1:2]
'o'
>>> 'f'[1:2]
''
>>> ''[1:2]
''
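A quick check with a few deliberately short strings (the list `tricky` is made up for illustration) shows the difference: indexing raises, slicing does not, and the `len(s) > 1 and s[1] == 'i'` guard used in some of the variations below avoids the error because `and` short-circuits. Summing the resulting booleans works because `bool` is a subclass of `int`, so each `True` counts as 1:

```python
tricky = ["", "i", "Vinnie", "Aiden", "Bo"]

# Slicing never raises: too-short strings just yield ''.
slice_count = len([s for s in tricky if s[1:2] == 'i'])

# Plain indexing raises IndexError on "" and "i".
try:
    index_count = len([s for s in tricky if s[1] == 'i'])
except IndexError as exc:
    index_count = None
    print("IndexError:", exc)

# `and` short-circuits, so s[1] is only evaluated when len(s) > 1.
guarded_count = sum(len(s) > 1 and s[1] == 'i' for s in tricky)

print(slice_count, index_count, guarded_count)  # 2 None 2
```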
Variations and timings
# setup: generate a large list of random names
import numpy as np
n = 1_000_000
names = list(map(''.join, np.random.choice(list('abcdefghijkl'), (n, 10))))
# 1. ***current winner*** @KellyBundy's idea to use a slice
%timeit len([s for s in names if s[1:2] == 'i'])
# 71.8 ms ± 32.2 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# 2. @Blckknght's suggestion, len of list comprehension version
%timeit len([s for s in names if len(s) > 1 and s[1] == 'i'])
# 98.2 ms ± 82.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# 3. @Blckknght's suggestion, generator version
%timeit sum(len(s) > 1 and s[1] == 'i' for s in names)
# 105 ms ± 24.7 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# 4. @AndrejKesely's solution
%timeit sum(n[1] == "i" for n in names if len(n) > 1)
# 106 ms ± 77.7 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# 5. original answer
%timeit len([s for s in names if s[1:].startswith('i')])
# 140 ms ± 821 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# 6. generator
%timeit sum(1 for s in names if s[1:].startswith('i'))
# 141 ms ± 27.4 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# 7. sum of booleans, as list comprehension
%timeit sum([s[1:].startswith('i') for s in names])
# 154 ms ± 145 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# 8. sum of booleans, as generator
%timeit sum(s[1:].startswith('i') for s in names)
# 163 ms ± 544 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Notice how operating on a generator instead of a list comprehension sometimes takes longer (# 8. vs # 7.; # 3. vs # 2. shows a similar gap, although those two also differ in using sum instead of len). That surprised me when I first heard about it a few years ago.
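`%timeit` is an IPython magic. Outside IPython, a comparable sketch can use the standard-library `timeit` module. Here `random.choices` stands in for the NumPy setup (an assumption: same 12-letter alphabet and name length, but not the same data), only a subset of the variants is included, a smaller list keeps the run short, and the best of several repeats is reported rather than mean ± std. dev. Absolute numbers will depend on the machine and Python version:

```python
import random
import timeit

# Random 10-letter names drawn from 'a'..'l', mimicking the NumPy setup above
# (same alphabet and length, but not the same data).
random.seed(0)
n = 100_000
names = [''.join(random.choices('abcdefghijkl', k=10)) for _ in range(n)]

# A subset of the variants, numbered as in the list above.
variants = {
    "1. slice": lambda: len([s for s in names if s[1:2] == 'i']),
    "2. len guard, list": lambda: len([s for s in names if len(s) > 1 and s[1] == 'i']),
    "5. startswith, list": lambda: len([s for s in names if s[1:].startswith('i')]),
    "7. sum of bools, list": lambda: sum([s[1:].startswith('i') for s in names]),
    "8. sum of bools, generator": lambda: sum(s[1:].startswith('i') for s in names),
}

# Every variant should report the same count before the timings mean anything.
results = {label: fn() for label, fn in variants.items()}
assert len(set(results.values())) == 1, results

for label, fn in variants.items():
    # Best of 3 repeats, each running the expression 10 times.
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{label:<28} {best * 1e3:7.2f} ms per loop")
```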
Update (including @Blckknght's and @AndrejKesely's ideas)
Both of these solutions are roughly 1.3 to 1.4 times faster than my original answer (# 5.), kudos!
Update 2 (including @KellyBundy's slice idea)
That idea (new # 1.) cuts roughly another 27% off the time of the previous winner (# 2., @Blckknght's suggestion combined with len of a list comprehension): 71.8 ms vs 98.2 ms. It is the new overall winner. In my tests, using a constant slice (slc = slice(1, 2), then s[slc]) performed indistinguishably from the expression in # 1.
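The constant-slice variant mentioned above could be written as follows (a sketch; `slc` follows the text, while `count_with_slice` is a hypothetical helper). `slice(1, 2)` is equivalent to the `1:2` syntax, so both forms select the same characters, including on strings too short to have a second letter:

```python
names = ["Cody", "Baldassar", "Delilah", "Vinnie", "Leila", "Zac", "Aiden", "Zaynab"]

# Build the slice object once; s[slc] selects the same characters as s[1:2].
slc = slice(1, 2)

def count_with_slice(names_list):
    # Hypothetical helper: count strings whose second character is 'i'.
    return len([s for s in names_list if s[slc] == 'i'])

# Sanity check, including strings too short to have a second character.
assert all(s[slc] == s[1:2] for s in names + ["", "x"])
print(count_with_slice(names))  # 2
```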