Python: Fastest way to find a string in an enumeration

I parsed the IANA subtag registry (see Cascaded string split, pythonic way) and made a list of 8600 tags:

tags = ['aa',
        'ab',
        'ae',
        'af',
        'ak',
        'am',
        'an',
        'ar',
        # ...
       ]

I want to check whether, for example, mytag = "ro" is in the list. What is the fastest way to do that?

First solution:

if mytag in tags:
    print("found")

Second solution:

if mytag in Set(tags):
    print("found")

Third solution: transform the list into one big string like '|aa|ab|ae|af|ak|am|an|ar|...' and then check whether the tag is a substring of it:

tags = '|aa|ab|ae|af|ak|am|an|ar|...'
if mytag in tags:
    print("found")
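One caveat with the third solution: a plain substring test can match across tag boundaries (for example "a", or "b|a" spanning "ab|ae"), so it reports tags that are not in the list. A minimal sketch of the usual fix, assuming a small hypothetical subset of the tags: put a delimiter at both ends of the big string and wrap the query in the same delimiter so only whole tags match.

```python
# Hypothetical small subset of the tag list, for illustration only.
tags = ["aa", "ab", "ae", "af", "ak", "am", "an", "ar"]

naive = "|" + "|".join(tags)            # '|aa|ab|...|ar' (no trailing bar)
delimited = "|" + "|".join(tags) + "|"  # '|aa|ab|...|ar|'


def in_naive(tag):
    # Plain substring test: can match part of a tag or span two tags.
    return tag in naive


def in_delimited(tag):
    # Wrapping the query in delimiters forces a whole-tag match.
    return "|" + tag + "|" in delimited


print(in_naive("a"), in_delimited("a"))      # True False
print(in_naive("b|a"), in_delimited("b|a"))  # True False
print(in_naive("ar"), in_delimited("ar"))    # True True
```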

Is there another way? Which is the fastest? Has this already been measured, and if not, how can I benchmark it myself? Should I take a random element from the list, or take the last one and test that? Can someone provide Python code for a 'chronometer'?
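A 'chronometer' can be as simple as reading time.perf_counter() before and after a block. The sketch below wraps that in a context manager; the helper name chronometer is hypothetical, and the tag list is a synthetic stand-in of the same size (8600 three-letter strings), since the parsed IANA data is not reproduced here. Each lookup is repeated many times because a single membership test is too fast to time reliably.

```python
import itertools
import string
import time
from contextlib import contextmanager


@contextmanager
def chronometer(label, results):
    """Hypothetical helper: store the elapsed wall-clock time of the with-block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


# Hypothetical stand-in for the ~8600 parsed subtags:
# the first 8600 three-letter lowercase strings ('aaa', 'aab', ...).
tags = ["".join(p) for p in itertools.product(string.ascii_lowercase, repeat=3)][:8600]
tags_set = set(tags)
mytag = tags[-1]  # worst case for the list: the scan has to reach the end
REPEAT = 1000
results = {}

with chronometer("list", results):
    for _ in range(REPEAT):
        found_list = mytag in tags

with chronometer("set", results):
    for _ in range(REPEAT):
        found_set = mytag in tags_set

for label, seconds in results.items():
    # The per-lookup figure includes the for-loop overhead.
    print(f"{label}: {seconds / REPEAT * 1e9:.0f} ns per lookup")
```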

Eduard Florinescu asked Dec 15 '22 19:12


2 Answers

As I don't have access to your original data, any test I run would be biased. However, you asked for a chronometer? Check the timeit module, which is designed for timing small code snippets.

Note that if you use IPython, %timeit is a magic function that makes it a breeze to time the execution of a statement, as illustrated below.
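Outside IPython, the standard-library timeit module can produce figures in the same "loops, best of 3" format used in the session below: timeit.repeat runs the statement `number` times per run, repeats that three times, and the timeit documentation suggests looking at the minimum as the least noisy figure. This is a sketch; the loop count is lowered from the 1000000 shown below to keep it quick.

```python
import timeit

# Everything in setup runs once per repeat and is excluded from the timing.
setup = """
tags = ['aa', 'ab', 'ae', 'af', 'ak', 'an', 'ar']
tags_set = set(tags)
tags_str = '|'.join(tags)
"""

number = 100_000
runs = timeit.repeat("'ro' in tags_set", setup=setup, repeat=3, number=number)
best_ns = min(runs) / number * 1e9
print(f"{number} loops, best of 3: {best_ns:.1f} ns per loop")
```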

Some comments

  • You should replace Set with the built-in set (the sets module is deprecated, and it was removed in Python 3).
  • Construct your set and long string before running any test.
  • Taking a random element from your tags list is indeed the way to go.
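Putting those three comments together, a benchmark might look like the sketch below. Assumptions: the tag list is a synthetic stand-in of the same size, the sample size and loop counts are arbitrary, and the string variant uses delimiters on both sides so that only whole tags match. All structures are built once up front, and the queries are a random sample so no single list position dominates the result.

```python
import itertools
import random
import string
import timeit

# Hypothetical stand-in for the ~8600 parsed subtags.
tags = ["".join(p) for p in itertools.product(string.ascii_lowercase, repeat=3)][:8600]

# Build every structure once, before timing anything.
tags_set = set(tags)
tags_str = "|" + "|".join(tags) + "|"

# Random sample of tags to look up (seeded so reruns use the same queries).
random.seed(0)
queries = random.sample(tags, 100)


def lookup_list():
    return all(q in tags for q in queries)


def lookup_set():
    return all(q in tags_set for q in queries)


def lookup_str():
    return all("|" + q + "|" in tags_str for q in queries)


NUMBER = 20
timings = {}
for name, func in [("list", lookup_list), ("set", lookup_set), ("str", lookup_str)]:
    assert func()  # sanity check: every sampled tag must be found
    timings[name] = min(timeit.repeat(func, repeat=3, number=NUMBER))
    print(f"{name}: {timings[name] / (NUMBER * len(queries)) * 1e9:.0f} ns per lookup")
```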

As an example of using %timeit in IPython:

tags = ['aa','ab','ae','af','ak','an','ar']
tags_set = set(tags)
tags_str = "|".join(tags)

%timeit 'ro' in tags
1000000 loops, best of 3: 223 ns per loop
%timeit 'ro' in tags_set
1000000 loops, best of 3: 73.5 ns per loop
%timeit 'ro' in tags_str
1000000 loops, best of 3: 98.1 ns per loop
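Why the random element matters: `in` on a list is a linear scan that compares elements one by one until it finds a match, so its cost depends on where the tag sits (a missing tag is the worst case), while `in` on a set is a hash lookup whose cost barely depends on the tag. The sketch below, again using a hypothetical 8600-element stand-in list, times the first element, the last element and a missing tag for both; exact numbers will differ from machine to machine.

```python
import itertools
import string
import timeit

# Hypothetical stand-in for the ~8600 parsed subtags.
tags = ["".join(p) for p in itertools.product(string.ascii_lowercase, repeat=3)][:8600]
tags_set = set(tags)

NUMBER = 1000
cases = {"first": tags[0], "last": tags[-1], "missing": "zzzz"}  # 'zzzz' is not in tags
results = {}

for label, tag in cases.items():
    # The lambdas are timed immediately, so capturing the loop variable is safe here.
    t_list = min(timeit.repeat(lambda: tag in tags, repeat=3, number=NUMBER)) / NUMBER
    t_set = min(timeit.repeat(lambda: tag in tags_set, repeat=3, number=NUMBER)) / NUMBER
    results[label] = (t_list, t_set)
    print(f"{label:>7}: list {t_list * 1e9:8.0f} ns, set {t_set * 1e9:6.0f} ns")
```

Testing only the last element therefore measures the list's worst case for a hit, not its typical cost.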
Pierre GM answered Mar 27 '23 20:03


Not related to timings or performance, but you may be able to avoid worrying about this kind of thing altogether by structuring the data differently earlier on.

Looking at your previous post, the answer you accepted contained a function, iana_parse, that yields a dict for each record. So, if you know what you're looking for before parsing, you could do:

looking_for = {'ro', 'xx', 'yy', 'zz'}   # a set, so each membership test is a hash lookup
for res in iana_parse(data):              # iana_parse and data come from the previous post
    if res['Subtag'] in looking_for:
        print(res['Subtag'], 'was found')
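To try the loop above without the earlier post's code, here is a runnable sketch with a hypothetical, simplified stand-in for iana_parse: it splits the text on the registry's '%%' record separators and yields one dict per record. It does not handle continuation lines, and a repeated field overwrites the earlier value, so it is not a replacement for the original parser. The two records are a hypothetical excerpt.

```python
def iana_parse(data):
    """Hypothetical, simplified stand-in for the iana_parse generator from the earlier post."""
    for block in data.split("%%"):
        record = {}
        for line in block.strip().splitlines():
            key, sep, value = line.partition(":")
            if sep:
                record[key.strip()] = value.strip()
        if "Subtag" in record:  # skip blocks that are not subtag records
            yield record


# Hypothetical excerpt in the registry's record format.
data = """%%
Type: language
Subtag: aa
Description: Afar
%%
Type: language
Subtag: ro
Description: Romanian
"""

looking_for = {"ro", "xx", "yy", "zz"}
found = []
for res in iana_parse(data):
    if res["Subtag"] in looking_for:
        found.append(res["Subtag"])
        print(res["Subtag"], "was found")  # prints: ro was found
```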

Otherwise (or in combination), you could build a dict from that function's output and use that:

subtag_lookup = {rec['Subtag']:rec for rec in iana_parse(data)}

ro = subtag_lookup['ro']
print(ro['Description'])
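A dict keyed on the subtag also answers the original question directly: `in` on a dict checks its keys with a hash lookup, just like a set. One difference from the snippet above is that indexing a missing subtag raises KeyError, so `.get()` is the safer accessor when the tag may be absent. A minimal sketch, with hypothetical records standing in for the output of iana_parse(data):

```python
# Hypothetical records standing in for the output of iana_parse(data).
records = [
    {"Subtag": "aa", "Description": "Afar"},
    {"Subtag": "ro", "Description": "Romanian"},
]
subtag_lookup = {rec["Subtag"]: rec for rec in records}

print("ro" in subtag_lookup)               # True: key membership is a hash lookup
print(subtag_lookup["ro"]["Description"])  # Romanian
print(subtag_lookup.get("xx"))             # None instead of raising KeyError
```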

At some point, if you just want a list of subtags, use:

subtags = list(subtag_lookup)
Jon Clements answered Mar 27 '23 20:03