
index of second repeated character in a string

Tags: python, string

I am writing a hangman game in Python. To match a character of a word, I am using the index function to get the location of the character. For example:

word = 'COMPUTER'

user_input = raw_input('Enter a character :')  # say 'T' is given here

if user_input in word:
    print "\nThe Character %c is present in the word \n" % user_input
    word_dict[word.index(user_input)] = user_input

# so the output will look like

{0: '_', 1: '_', 2: '_', 3: '_', 4: '_', 5: 'T', 6: '_', 7: '_'} 

Now, my problem arises when the word contains a repeated character.

# Another example 
>>> 'CARTOON'.index('O')
4

How do I get the index of the second 'O'? Since I have already used this 'index' logic, I would like to continue in the same way.

rajpython asked Jan 21 '26 09:01


2 Answers

As per the str.index docs, the signature looks like this:

str.index(sub[, start[, end]])

The second parameter is the starting index to search from. So you can pass the index which you got for the first item + 1, to get the next index.

i = 'CARTOON'.index('O')
print 'CARTOON'.index('O', i + 1)

Output

5

The above code can be written like this

data = 'CARTOON'
print data.index('O', data.index('O') + 1)

You can even have this as a utility function, like this

def get_second_index(input_string, sub_string):
    return input_string.index(sub_string, input_string.index(sub_string) + 1)

print get_second_index("CARTOON", "O")

Note: If the substring is not found at least twice, this will raise a ValueError.
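
If raising is not what you want, one possible way to handle that case is to catch the ValueError and return None instead. This is only a sketch: the name second_index_or_none is hypothetical and not part of the answer above, and it is written so that it runs on both Python 2 and Python 3 (print with a single argument works in both).

```python
def second_index_or_none(input_string, sub_string):
    """Return the index of the second occurrence of sub_string,
    or None if it occurs fewer than two times.

    Hypothetical helper for illustration only.
    """
    try:
        first = input_string.index(sub_string)
        # Start the second search one position after the first match.
        return input_string.index(sub_string, first + 1)
    except ValueError:
        # str.index raises ValueError when nothing is found.
        return None


print(second_index_or_none("CARTOON", "O"))   # 5
print(second_index_or_none("COMPUTER", "O"))  # None (only one 'O')
print(second_index_or_none("COMPUTER", "Z"))  # None (no 'Z' at all)
```

Returning None keeps the caller free of try/except, at the cost of having to check the result before using it as an index.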

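A more generalized way:

def get_index(input_string, sub_string, ordinal):
    # ordinal is 1-based: 1 means the first occurrence, 2 the second, ...
    if ordinal < 1:
        raise ValueError("ordinal {} - is invalid".format(ordinal))
    current = -1
    for i in range(ordinal):
        # str.index raises ValueError if there are fewer matches than ordinal
        current = input_string.index(sub_string, current + 1)
    return current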

print get_index("AAABBBCCCC", "C", 4)

thefourtheye answered Jan 23 '26 22:01


A perhaps more pythonic method would be to use a generator, thus avoiding an intermediate list of found indices:

def find_indices_of(char, in_string):
    index = -1
    while True:
        index = in_string.find(char, index + 1)
        if index == -1:
            break
        yield index

for i in find_indices_of('x', 'axccxx'):
    print i

1
4
5
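
If only one particular occurrence is needed (the second one, as in the question), the generator can be combined with itertools.islice so that the search stops as soon as that occurrence is found. The following is a sketch under that assumption; nth_index is a hypothetical name, and the generator is repeated here only so the snippet is self-contained.

```python
from itertools import islice


def find_indices_of(char, in_string):
    # Same generator as above: yields each index where char occurs.
    index = -1
    while True:
        index = in_string.find(char, index + 1)
        if index == -1:
            break
        yield index


def nth_index(char, in_string, n):
    """Hypothetical helper: return the index of the n-th occurrence
    (n is 1-based and must be >= 1), or None if there are fewer than n.
    """
    # islice skips the first n - 1 results; next() takes the following one.
    return next(islice(find_indices_of(char, in_string), n - 1, None), None)


print(nth_index('O', 'CARTOON', 2))  # 5
print(nth_index('x', 'axccxx', 3))   # 5
print(nth_index('x', 'axccxx', 4))   # None
```

Because the generator is lazy, nothing past the requested occurrence is searched.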

An alternative would be the enumerate built-in

def find_indices_of_via_enumerate(char, in_string):
    return (index for index, c in enumerate(in_string) if char == c)

This also uses a generator.
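
Applied to the hangman case from the question, the same enumerate idea could fill in every position of a guessed letter at once. This is a minimal sketch: word_dict comes from the question, while the reveal function and the way word_dict is initialised here are assumptions made for illustration.

```python
def reveal(word, guess, word_dict):
    """Write guess into word_dict at every position where it occurs in word.

    Returns the number of positions that matched. Hypothetical helper.
    """
    positions = [index for index, c in enumerate(word) if c == guess]
    for index in positions:
        word_dict[index] = guess
    return len(positions)


word = 'CARTOON'
# Assumed initial state: one '_' placeholder per letter, keyed by position.
word_dict = dict((i, '_') for i in range(len(word)))

print(reveal(word, 'O', word_dict))  # 2
print(word_dict[4])                  # O
print(word_dict[5])                  # O
```

A return value of 0 tells the caller that the guess was wrong, so no separate "in" check is needed.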

I then got curious about the performance differences. I'm a year into using Python, so I'm only beginning to feel truly knowledgeable. Here's a quick test with various types of data:

import timeit

test_cases = [
    ('x', ''),
    ('x', 'axxxxxxxxxxxx'),
    ('x', 'abcdefghijklmnopqrstuvw_yz'),
    ('x', 'abcdefghijklmnopqrstuvw_yzabcdefghijklmnopqrstuvw_yzabcdefghijklmnopqrstuvw_yzabcdefghijklmnopqrstuvwxyz'),
]

for test_case in test_cases:
    print "('{}', '{}')".format(*test_case)

    print "string.find:", timeit.repeat(
        "[i for i in find_indices_of('{}', '{}')]".format(*test_case),
        "from __main__ import find_indices_of",
    )
    print "enumerate  :", timeit.repeat(
        "[i for i in find_indices_of_via_enumerate('{}', '{}')]".format(*test_case),
        "from __main__ import find_indices_of_via_enumerate",
    )
    print

On my machine, this results in these timings:

('x', '')
string.find: [0.6248660087585449, 0.6235580444335938, 0.6264920234680176]
enumerate  : [0.9158611297607422, 0.9153609275817871, 0.9118690490722656]

('x', 'axxxxxxxxxxxx')
string.find: [6.01502799987793, 6.077538013458252, 5.997750997543335]
enumerate  : [3.595151901245117, 3.5859270095825195, 3.597352981567383]

('x', 'abcdefghijklmnopqrstuvw_yz')
string.find: [0.6462750434875488, 0.6512351036071777, 0.6495819091796875]
enumerate  : [2.6581480503082275, 2.6216518878936768, 2.6187551021575928]

('x', 'abcdefghijklmnopqrstuvw_yzabcdefghijklmnopqrstuvw_yzabcdefghijklmnopqrstuvw_yzabcdefghijklmnopqrstuvwxyz')
string.find: [1.2539417743682861, 1.2511990070343018, 1.2702908515930176]
enumerate  : [7.837890863418579, 7.791800022125244, 7.9181809425354]

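The enumerate() approach is more expressive and pythonic. In these timings, string.find is faster in every case except the string that consists almost entirely of matches. Whether or not the perf differences matter depends on the actual use case.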
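
Alongside the timings, it can be worth checking that the two functions actually agree. The sketch below repeats both definitions so it runs on its own; the shortened list of inputs is an assumption, not the exact benchmark data. It also shows one behavioural difference: the enumerate version compares one character at a time, so it only works for single-character searches.

```python
def find_indices_of(char, in_string):
    index = -1
    while True:
        index = in_string.find(char, index + 1)
        if index == -1:
            break
        yield index


def find_indices_of_via_enumerate(char, in_string):
    return (index for index, c in enumerate(in_string) if char == c)


# Single-character searches: both approaches yield the same indices.
for char, text in [('x', ''), ('x', 'axxxxxxxxxxxx'), ('x', 'axccxx')]:
    assert (list(find_indices_of(char, text))
            == list(find_indices_of_via_enumerate(char, text)))

# Multi-character search: only the str.find version finds anything.
print(list(find_indices_of('ab', 'abab')))                # [0, 2]
print(list(find_indices_of_via_enumerate('ab', 'abab')))  # []
```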

Potrebic answered Jan 23 '26 23:01


