Python: Fastest way to find a string in an enumeration

Question

I parsed the IANA subtag registry (see "Cascaded string split, pythonic way") and built a list of 8600 tags:

tags = ['aa',
        'ab',
        'ae',
        'af',
        'ak',
        'am',
        'an',
        'ar',
        # ...
        ]

I want to check whether a given tag, for example mytag = "ro", is in the list. What is the fastest way to do that?

First solution:

if mytag in tags:
    print "found"

Second solution:

if mytag in Set(tags):
    print "found"

Third solution: transform the list into one big string such as '|aa|ab|ae|af|ak|am|an|ar|...' and then test whether the tag occurs in that string:

tags = '|aa|ab|ae|af|ak|am|an|ar|...'
if mytag in tags:
    print "found"

Is there another way? Which is the fastest? Has this already been measured? If not, how can I benchmark it myself: should I pick a random element from the list, or take the last one and test with that? Can someone provide Python code for a 'chronometer'?

Pierre GM · Accepted Answer

As I don't have access to the original string, any test I ran would be biased. However, you asked for a chronometer: check the timeit module, which is designed for timing small code snippets.

Note that if you use IPython, %timeit is a magic function that makes it a breeze to time the execution of a function, as illustrated below.

Some comments:

You should replace Set with set, the built-in type.
Construct your set and long string before running any test, so that building them is not included in the timing.
Taking a random element from your tags list is indeed the way to go.
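
One caveat about the third solution: in on a string is a substring test, so the joined string can report tags that are not in the list unless the search term is wrapped in the delimiter as well. A small sketch with a made-up three-tag list (the names joined and delimited are only illustrative):

```python
tags = ['aa', 'ab', 'ro']

joined = '|'.join(tags)          # 'aa|ab|ro'
delimited = '|' + joined + '|'   # '|aa|ab|ro|'

# Substring matching gives false positives on the bare string.
print('a' in joined)        # True, although 'a' is not a tag
print('a|a' in joined)      # True, the match spans the separator

# Wrapping the search term in the delimiter restricts matches to whole tags.
print('|a|' in delimited)   # False
print('|ro|' in delimited)  # True
```

The list and set versions compare whole elements, so they do not need this precaution.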

As an example of use of %timeit in IPython:

tags = ['aa','ab','ae','af','ak','an','ar']
tags_set = set(tags)
tags_str = "|".join(tags)

%timeit 'ro' in tags
1000000 loops, best of 3: 223 ns per loop
%timeit 'ro' in tags_set
1000000 loops, best of 3: 73.5 ns per loop
%timeit 'ro' in tags_str
1000000 loops, best of 3: 98.1 ns per loop
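
Outside IPython, roughly the same comparison can be run with the timeit module directly. The following is a minimal sketch rather than a definitive benchmark: the tag list is synthetic (every two-letter lowercase combination, 676 entries, standing in for the real 8600 subtags), and the helper name best_time is made up for this illustration. It builds the set and the string once before timing, and looks up a randomly chosen tag, as suggested in the comments above.

```python
import random
import timeit

# Synthetic stand-in for the real tag list: all two-letter combinations.
letters = 'abcdefghijklmnopqrstuvwxyz'
tags = [a + b for a in letters for b in letters]

# Build the alternative containers once, before timing anything.
tags_set = set(tags)
tags_str = '|' + '|'.join(tags) + '|'

# Pick a random existing tag so that list position does not skew the result.
random.seed(0)  # fixed seed only to make runs repeatable
mytag = random.choice(tags)
needle = '|' + mytag + '|'  # delimiter-wrapped form for the string search


def best_time(func, number=20000, repeat=3):
    """Return the best-of-`repeat` time for one call of func, in seconds."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


results = {
    'list': best_time(lambda: mytag in tags),
    'set': best_time(lambda: mytag in tags_set),
    'str': best_time(lambda: needle in tags_str),
}

# Report the fastest approach first.
for name in sorted(results, key=results.get):
    print('%-4s %.1f ns per lookup' % (name, results[name] * 1e9))
```

The lambda wrapper adds the same call overhead to all three measurements, so the absolute numbers will be higher than what %timeit reports for a bare expression; only the relative ordering is meaningful. Repeating the run with several different random tags gives a fairer picture for the list and string versions, whose cost depends on where the tag sits.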

Jon Clements · Answer

This is not related to timings or performance, but you may be able to avoid worrying about this kind of lookup altogether by structuring the data differently earlier on.

Looking at your previous post, the answer you accepted contained a function iana_parse that yielded a dict. So, if you know what you're looking for pre-parse time, then you could do:

looking_for = {'ro', 'xx', 'yy', 'zz'}
for res in iana_parse(data): # from previous post
    if res['Subtag'] in looking_for:
        print res['Subtag'], 'was found'

Otherwise (or in combination with that), you could build a dict from that function and use it:

subtag_lookup = {rec['Subtag']:rec for rec in iana_parse(data)}

ro = subtag_lookup['ro']
print ro['Description']

If at some point you just want a list of subtags, use:

subtags = list(subtag_lookup)
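
For readers without the earlier post at hand, the lookup pattern can be tried on its own. This is a sketch under assumptions: sample_records is a hypothetical stand-in for what iana_parse(data) yields, reduced to the two keys used above, and is not the real parser output.

```python
# Hypothetical stand-in for the records yielded by iana_parse(data).
sample_records = [
    {'Subtag': 'aa', 'Description': 'Afar'},
    {'Subtag': 'ab', 'Description': 'Abkhazian'},
    {'Subtag': 'ro', 'Description': 'Romanian'},
]

# Index the records by subtag; membership tests on a dict are hash lookups.
subtag_lookup = {rec['Subtag']: rec for rec in sample_records}

print('ro' in subtag_lookup)                # True
print(subtag_lookup['ro']['Description'])   # Romanian

# .get() returns None instead of raising KeyError for a missing subtag.
print(subtag_lookup.get('xx'))              # None

# sorted() is used here instead of list() only to get a stable order.
subtags = sorted(subtag_lookup)
```

Because a dict is hashed on its keys, the membership test costs about the same as with a set, and the full record is available without a second search.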

Tags: python, string, enumeration, search, python-2.7

Eduard Florinescu

2 Answers

Pierre GM

Jon Clements
