Fastest way to perform string lookups?

Assuming we have a certain number of possible strings:

possible_strings_list = ['foo', 'bar', 'baz', 'qux', 'spam', 'ham', 'eggs'] 

and we receive new strings that are known to be one of them. We want to assign an integer to each new string, for example:

if new_string == 'foo':
    return 0
elif new_string == 'bar':
    return 1
...

What is the fastest way to do that in Python 3.6? I tried several ways, and using a dictionary is the fastest so far:

list_index 2.7494255019701086
dictionary 0.9412809460191056
if_elif_else 2.10705983400112
lambda_function 2.6321219780365936
tupple_index 2.751029207953252
ternary 1.931659944995772
np_where 15.610908019007184

However, I am more or less a Python newbie, and I'm interested in whether there are other, faster solutions. Do you have any suggestions?

My complete testing code:

import timeit
import random
import numpy as np

def list_index(i):
    return possible_strings_list.index(i)

def dictionary(i):
    return possible_strings_dict[i]

def tupple_index(i):
    return possible_strings_tup.index(i)


def if_elif_else(i):
    if i == 'foo':
        return 0
    elif i == 'bar':
        return 1
    elif i == 'baz':
        return 2
    elif i == 'qux':
        return 3
    elif i == 'spam':
        return 4
    elif i == 'ham':
        return 5
    elif i == 'eggs':
        return 6

def ternary(i):
    return 0 if i == 'foo' else 1 if i == 'bar' else 2 if i == 'baz' else 3 if i == 'qux' else 4 if i == 'spam' else 5 if i == 'ham' else 6

n = lambda i: 0 if i == 'foo' else 1 if i == 'bar' else 2 if i == 'baz' else 3 if i == 'qux' else 4 if i == 'spam' else 5 if i == 'ham' else 6
def lambda_function(i):
    return n(i)

def np_where(i):
    return np.where(possible_strings_array == i)[0][0]

##
def check(function):
    for i in testlist:
        function(i)

possible_strings_list = ['foo', 'bar', 'baz', 'qux', 'spam', 'ham', 'eggs']
testlist = [random.choice(possible_strings_list) for i in range(1000)]
possible_strings_dict = {'foo':0, 'bar':1, 'baz':2, 'qux':3, 'spam':4, 'ham':5, 'eggs':6}
possible_strings_tup = ('foo', 'bar', 'baz', 'qux', 'spam', 'ham', 'eggs')
possible_strings_array = np.array(possible_strings_list)  # used by np_where
allfunctions = [list_index, dictionary, if_elif_else, lambda_function, tupple_index, ternary, np_where]

for function in allfunctions:
    t = timeit.Timer(lambda: check(function))
    print(function.__name__, t.timeit(number=10000))


1 Answer

The dictionary lookup is the fastest way to perform this search. In an analysis like this, you would normally compare the time complexity of each approach.

For the dictionary lookup, the time complexity is constant, O(1). Constant time means the number of steps does not grow with the size of the set; for a hash-based dict lookup it is effectively a single hash-and-lookup operation, no matter how many strings are stored.
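As a side note, the lookup dictionary doesn't have to be written out by hand; a small sketch reusing the question's possible_strings_list:

# Build the lookup table from the list so the string -> integer mapping
# always matches the order of possible_strings_list.
possible_strings_dict = {s: n for n, s in enumerate(possible_strings_list)}

possible_strings_dict['qux']   # -> 3, a single hash lookup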

The other methods all require iteration (the if/elif chain is essentially the same thing: it works through the comparisons one by one). In the worst case they examine every value, O(n); keeping the data sorted and binary-searching it would still only get that down to O(log n).
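For illustration, an O(log n) variant would look roughly like the sketch below. This is a hypothetical addition, not one of the benchmarked functions, and it reuses possible_strings_list from the question:

import bisect

# Keep the candidate strings sorted and binary-search them.
sorted_strings = sorted(possible_strings_list)

def bisect_index(i):
    # bisect_left performs O(log n) comparisons; note the result is the
    # position in the sorted list, not in the original ordering.
    return bisect.bisect_left(sorted_strings, i)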

Here n is the size of the set being examined; as the set gets larger, the gap between the approaches grows with it, with the dictionary consistently outperforming the other options shown.
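To see that divergence concretely, the same style of timing can be rerun with a much larger candidate set; a rough sketch along the lines of the question's own benchmark (big_list, big_dict and samples are just illustrative names):

import timeit
import random

# Hypothetical larger set: 1000 distinct strings instead of the original 7.
big_list = ['string%d' % k for k in range(1000)]
big_dict = {s: n for n, s in enumerate(big_list)}
samples = [random.choice(big_list) for _ in range(1000)]

# The O(n) scan slows down with the set size; the O(1) dict lookup does not.
print('list.index', timeit.timeit(lambda: [big_list.index(s) for s in samples], number=100))
print('dict lookup', timeit.timeit(lambda: [big_dict[s] for s in samples], number=100))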

There is no way to do better than O(1). The only downside to the approach you have shown is that it may require more memory as the set grows; this is referred to as the space complexity of the algorithm. In this instance, though, since we only need one value per item in the set, the space complexity is O(n), which is negligible.
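If you want a rough feel for that memory cost, sys.getsizeof gives a ballpark figure (it measures only the dict object itself, not the strings it references):

import sys

# The dict stores one entry per possible string, so its footprint grows
# roughly linearly with the size of the set.
print(sys.getsizeof(possible_strings_dict))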

When thinking about optimizations in general, it is important to consider how much complexity is present in the current solution and how much sense it makes to improve on it. Worthwhile improvements move the solution to a different tier of performance, for example from O(n) to O(log n), or from O(log n) to O(1).

[Time complexity comparison chart. Image courtesy: http://bigocheatsheet.com/]

Micro-optimizations, by contrast, are changes that stay within the same complexity tier, and on their own they are often not constructive.
