
How to retrieve partial matches from a list of strings

For approaches to retrieving partial matches in a numeric list, go to:

  • How to return a subset of a list that matches a condition?

  • Python: Find in list


But if you're looking for how to retrieve partial matches for a list of strings, you'll find the best approaches concisely explained in the answer below.

SO: Python list lookup with partial match shows how to return a bool if a list contains an element that partially matches (e.g. begins with, ends with, or contains) a certain string. But how can you return the element itself, instead of True or False?

Example:

l = ['ones', 'twos', 'threes']
wanted = 'three'

Here, the approach in the linked question will return True using:

any(s.startswith(wanted) for s in l)

So how can you return the element 'threes' instead?

vestland asked Sep 29 '20 20:09



5 Answers

  • startswith and in both return a Boolean.
  • The in operator is a test of membership.
  • This can be performed with a list-comprehension or filter.
  • Using a list-comprehension, with in, is the fastest implementation tested.
  • If case is not an issue, consider mapping all the words to lowercase.
    • l = list(map(str.lower, l)).
  • Tested with Python 3.10.0
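The lowercase suggestion above overwrites the list, so the original casing is lost. A minimal sketch that compares case-insensitively but returns the original elements; the function name partial_matches_ci and the sample list are hypothetical, and str.casefold() is used here in place of str.lower() as a more aggressive normalization:

```python
def partial_matches_ci(strings, wanted):
    """Return the elements that contain `wanted`, ignoring case."""
    needle = wanted.casefold()
    # Compare on the casefolded copy, but keep the original string in the result.
    return [s for s in strings if needle in s.casefold()]


mixed = ['Ones', 'TWOS', 'Threes']  # hypothetical sample data
print(partial_matches_ci(mixed, 'three'))  # ['Threes']
```

Swapping `needle in s.casefold()` for `s.casefold().startswith(needle)` should give the prefix-only variant.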

filter:

  • Using filter creates a filter object, so list() is used to show all the matching values in a list.
l = ['ones', 'twos', 'threes']
wanted = 'three'

# using startswith
result = list(filter(lambda x: x.startswith(wanted), l))

# using in
result = list(filter(lambda x: wanted in x, l))

print(result)
[out]:
['threes']

list-comprehension:

l = ['ones', 'twos', 'threes']
wanted = 'three'

# using startswith
result = [v for v in l if v.startswith(wanted)]

# using in
result = [v for v in l if wanted in v]

print(result)
[out]:
['threes']

Which implementation is faster?

  • Tested in Jupyter Lab using the words corpus from nltk v3.6.5, which has 236736 words
  • Words with 'three'
    • ['three', 'threefold', 'threefolded', 'threefoldedness', 'threefoldly', 'threefoldness', 'threeling', 'threeness', 'threepence', 'threepenny', 'threepennyworth', 'threescore', 'threesome']
from nltk.corpus import words

wanted = 'three'

%timeit list(filter(lambda x: x.startswith(wanted), words.words()))
[out]:
64.8 ms ± 856 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit list(filter(lambda x: wanted in x, words.words()))
[out]:
54.8 ms ± 528 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit [v for v in words.words() if v.startswith(wanted)]
[out]:
57.5 ms ± 634 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit [v for v in words.words() if wanted in v]
[out]:
50.2 ms ± 791 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
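The %timeit calls above only work in IPython or Jupyter, and they call words.words() inside the timed statement. A rough sketch of the same comparison with the standard-library timeit module, assuming a synthetic word list (a hypothetical stand-in for the nltk corpus) that is built once outside the timed code. Timings depend on the machine and Python version, so none are shown:

```python
import timeit

# Hypothetical stand-in for the nltk corpus, built once outside the timed code.
word_list = [f'word{i}' for i in range(50_000)] + ['three', 'threefold', 'threesome']
wanted = 'three'

candidates = {
    'filter + startswith': lambda: list(filter(lambda x: x.startswith(wanted), word_list)),
    'filter + in': lambda: list(filter(lambda x: wanted in x, word_list)),
    'comprehension + startswith': lambda: [v for v in word_list if v.startswith(wanted)],
    'comprehension + in': lambda: [v for v in word_list if wanted in v],
}

timings = {}
for name, func in candidates.items():
    # Best of 3 repeats, 5 calls per repeat.
    timings[name] = min(timeit.repeat(func, number=5, repeat=3))
    print(f'{name}: {timings[name]:.4f} s per 5 calls')
```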
Trenton McKinney answered Oct 29 '22 07:10


Instead of returning the result of the any() function, you can use a for-loop to look for the string and return it:

def find_match(string_list, wanted):
    for string in string_list:
        if string.startswith(wanted):
            return string
    return None

>>> find_match(['ones', 'twos', 'threes'], "three")
'threes'
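The question also mentions matches that end with or contain the string. One possible way to extend the loop above to those cases is sketched below; the name find_match_by and the mode argument are hypothetical, not part of the original answer:

```python
def find_match_by(string_list, wanted, mode='startswith'):
    """Return the first element that matches `wanted`, or None if nothing matches."""
    tests = {
        'startswith': lambda s: s.startswith(wanted),
        'endswith': lambda s: s.endswith(wanted),
        'contains': lambda s: wanted in s,
    }
    matches = tests[mode]  # raises KeyError for an unknown mode
    for string in string_list:
        if matches(string):
            return string
    return None


numbers = ['ones', 'twos', 'threes']
print(find_match_by(numbers, 'three'))            # threes
print(find_match_by(numbers, 'wos', 'endswith'))  # twos
print(find_match_by(numbers, 'hre', 'contains'))  # threes
```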
damon answered Oct 29 '22 07:10


A simple, direct answer:

test_list = ['one', 'two', 'threefour']
r = [s for s in test_list if s.startswith('three')]
print(r[0] if r else 'nomatch')

Result:

threefour

I'm not sure what you want to do in the non-matching case. r[0] is exactly what you asked for if there is a match, but it raises an IndexError if r is empty. The conditional expression in the print call handles this, but you may want to handle it differently.

CryptoFool answered Oct 29 '22 05:10


I'd say the most closely related solution would be to use next instead of any:

>>> next((s for s in l if s.startswith(wanted)), 'mydefault')
'threes'
>>> next((s for s in l if s.startswith('blarg')), 'mydefault')
'mydefault'

Just like any, it stops the search as soon as it finds a match and takes only O(1) space, unlike the list comprehension solutions, which always process the whole list and take O(n) space.
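To see the early exit, here is a small sketch that records which elements actually get tested. The helper starts_with_logged and the reordered sample list are made up for illustration:

```python
checked = []


def starts_with_logged(s, prefix):
    """Record each element that gets tested, then test it."""
    checked.append(s)
    return s.startswith(prefix)


items = ['threes', 'ones', 'twos']  # the match is deliberately placed first

# next() pulls from the generator only until the first match.
first = next((s for s in items if starts_with_logged(s, 'three')), None)
checked_by_next = list(checked)

checked.clear()

# The list comprehension tests every element, even after a match.
all_matches = [s for s in items if starts_with_logged(s, 'three')]
checked_by_comprehension = list(checked)

print(first, checked_by_next)                 # threes ['threes']
print(all_matches, checked_by_comprehension)  # ['threes'] ['threes', 'ones', 'twos']
```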

Ooh, alternatively just use any as is but remember the last checked element:

>>> if any((match := s).startswith(wanted) for s in l):
        print(match)

threes
>>> if any((match := s).startswith('blarg') for s in l):
        print(match)

>>>

Another variation that only assigns the matching element:

>>> if any(s.startswith(wanted) and (match := s) for s in l):
        print(match)

threes

(You might want to add something like or True, since the empty string is falsy and a matching s that is empty would otherwise be skipped.)
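The empty-string caveat can be made concrete with a short sketch. The function names and the sample list are hypothetical, and the placement of the parentheses around or True is one possible reading of the suggestion; assignment expressions require Python 3.8 or later:

```python
def first_prefix_match_naive(strings, wanted):
    # An empty match is falsy, so any() skips it and keeps searching.
    if any(s.startswith(wanted) and (found := s) for s in strings):
        return found
    return None


def first_prefix_match(strings, wanted):
    # The parentheses matter: `(found := s) or True` is always truthy,
    # so any() stops at the first element that passes startswith().
    if any(s.startswith(wanted) and ((found := s) or True) for s in strings):
        return found
    return None


samples = ['', 'abc']  # every string starts with the empty prefix ''
print(repr(first_prefix_match_naive(samples, '')))  # 'abc'
print(repr(first_prefix_match(samples, '')))        # ''
```

Without the inner parentheses, `a and (found := s) or True` would group as `(a and (found := s)) or True`, which is always truthy and would stop at the first element regardless of whether it matches.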

superb rain answered Oct 29 '22 05:10


This seems simple to me, so I might have misread the question, but you could just run it through a for loop with an if statement:

l = ['ones', 'twos', 'threes']
wanted = 'three'

def run():
    for s in l:
        if s.startswith(wanted):
            return s

print(run())

output: threes

Ironkey answered Oct 29 '22 06:10