Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient method of checking if a string starts with one of a set of strings

Tags:

python

I have a need to match a string to see if it starts with one of about 40 strings, and this method gets called a heck of a lot.

Currently it does

for pref, newval in list_of_prefixes:
    if oldval.startswith(pref):
         return newval
return oldval

However, given it's called a huge number of times, it makes sense for this to be efficient as possible. I could ensure list_of_prefixes is sorted, and then drop out of the loop as soon as pref > oldval, but that doesn't seem to gain a great deal.

Currently the largest number of input values fall somewhere between two of the prefixes, so I could explicitly test for that, or search in reverse order, but although this is efficient for the data set now, it might be less efficient should the data set change.

Originally, there was only 1 possible prefix, so performance was perhaps less of an issue.

I looked at string.startswith(tuple()) but that seems only to make it easier to write, and it doesn't tell me which tuple matched, so where there is a match I have to do the check twice.

like image 332
Tom Tanner Avatar asked Feb 10 '16 12:02

Tom Tanner


People also ask

How do you check if a string starts with another string?

The startsWith() method returns true if a string starts with a specified string. Otherwise it returns false . The startsWith() method is case sensitive. See also the endsWith() method.

How do you check if a string starts with a specific character?

startsWith() The startsWith() method determines whether a string begins with the characters of a specified string, returning true or false as appropriate.

How do you check if a string starts with another string in Python?

The startswith() method returns True if the string starts with the specified value, otherwise False.

How do you check if a string is a prefix of another string Java?

Java String startsWith() The Java String class startsWith() method checks if this string starts with the given prefix. It returns true if this string starts with the given prefix; else returns false.


1 Answers

With a compiled regular expression, I would expect the overhead of compiling a regex to pay itself back when you have more than just a few strings. Basically, the compiled regex is an automaton which runs out of traversable paths pretty quickly if the prefix is not one which the automaton recognizes. Especially if all the matches are anchored to the beginning of the string, it should fail very quickly when there is no match.

import re

prefixes = ['foo', 'bar', 'baz']
rx = re.compile(''.join(['^(?:', '|'.join(prefixes), ')']))
for line in input:
    match = rx.match(line)
    if match:
        matched = match.group(0)

If you need a more complex regular expression (say, one with trailing context after the closing parenthesis), you will want to use regular grouping parentheses ( instead of non-grouping (?:, and fetch group(1) instead.

Here is the same with a dictionary mapping prefixes to replacements:

prefixes = {'foo': 'nu', 'bar': 'beer', 'baz': 'base'}
rx = re.compile(''.join(['^(?:', '|'.join(prefixes.keys()), ')']))
for line in input:
    match = rx.match(line)
    if match:
        newval = prefixes[match.group(0)]

Actually, as pointed out in comments, the ^ is not strictly necessary with re.match().

like image 92
tripleee Avatar answered Sep 24 '22 15:09

tripleee