Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

efficient way to search for string in list of string?

I have a list of strings and need to find which strings match a given input value. what is the most efficient way (memory vs execution speed) for me to store this list of strings and be able to search through it? The start-up and loading of the list of strings isnt important, but the response time for searching is.

should i be using a List or HashSet or just a basic string[] or something else?

like image 387
MakkyNZ Avatar asked Dec 28 '11 15:12

MakkyNZ


People also ask

How to search string in list in Python?

Python Find String in List using count() We can also use count() function to get the number of occurrences of a string in the list. If its output is 0, then it means that string is not present in the list. l1 = ['A', 'B', 'C', 'D', 'A', 'A', 'C'] s = 'A' count = l1.

How do you check if a string is a substring of items in a list of strings Python?

The easiest way to check if a Python string contains a substring is to use the in operator. The in operator is used to check data structures for membership in Python. It returns a Boolean (either True or False ).

How do you check if a string is a substring of items in a list of strings?

Use any() function to check if a list contains a substring in Python. The any(iterable) with iterable as a for-loop that checks if any element in the list contains the substring and returns the Boolean value.

Can list store string?

Method#1: Using split() methodThe split method is used to split the strings and store them in the list. The built-in method returns a list of the words in the string, using the “delimiter” as the delimiter string.


3 Answers

It depends very much on the nature of the strings and the size of the collection. Depending on characteristics of the collection, and the expected search strings, there are ways to organize things very cleverly so that searching is very fast. You haven't given us that information.

But here's what I'd do. I'd set a reasonable performance requirement. Then I'd try a n-gram index (why? because you said in a comment you need to account for partial matches; a HashSet<string> won't help you here) and I'd profile reasonable inputs that I expect against this solution and see if it meets my performance requirements or not. If it does, I'd accept the solution and move on. If it doesn't, I'd think very carefully about whether or not my performance requirements are reasonable. If they are, I'd start thinking about whether or not there is something special about my inputs and collection that might enable me to use some more clever solutions.

like image 125
jason Avatar answered Nov 11 '22 01:11

jason


It seems the best way is to build a suffix tree of your input in O(input_len) time then do queries of your patterns in O(pattern_length) time. So if your text is really big compared to your patterns, this will work well.

See Ukkonen's algorithm for building a suffix tree.

If you want inexact matching...see the work of Gonzalo Navarro.

like image 32
Cris Stringfellow Avatar answered Nov 11 '22 02:11

Cris Stringfellow


Use a Dictionary<string>() or an HashSet<string> is probably good for you.

  • Look here for Dictionary
  • and here for HashSet
like image 36
Felice Pollano Avatar answered Nov 11 '22 03:11

Felice Pollano