I have a list of strings and need to find which strings match a given input value. what is the most efficient way (memory vs execution speed) for me to store this list of strings and be able to search through it? The start-up and loading of the list of strings isnt important, but the response time for searching is.
should i be using a List or HashSet or just a basic string[] or something else?
Python Find String in List using count() We can also use count() function to get the number of occurrences of a string in the list. If its output is 0, then it means that string is not present in the list. l1 = ['A', 'B', 'C', 'D', 'A', 'A', 'C'] s = 'A' count = l1.
The easiest way to check if a Python string contains a substring is to use the in operator. The in operator is used to check data structures for membership in Python. It returns a Boolean (either True or False ).
Use any() function to check if a list contains a substring in Python. The any(iterable) with iterable as a for-loop that checks if any element in the list contains the substring and returns the Boolean value.
Method#1: Using split() methodThe split method is used to split the strings and store them in the list. The built-in method returns a list of the words in the string, using the “delimiter” as the delimiter string.
It depends very much on the nature of the strings and the size of the collection. Depending on characteristics of the collection, and the expected search strings, there are ways to organize things very cleverly so that searching is very fast. You haven't given us that information.
But here's what I'd do. I'd set a reasonable performance requirement. Then I'd try a n-gram index (why? because you said in a comment you need to account for partial matches; a HashSet<string>
won't help you here) and I'd profile reasonable inputs that I expect against this solution and see if it meets my performance requirements or not. If it does, I'd accept the solution and move on. If it doesn't, I'd think very carefully about whether or not my performance requirements are reasonable. If they are, I'd start thinking about whether or not there is something special about my inputs and collection that might enable me to use some more clever solutions.
It seems the best way is to build a suffix tree of your input in O(input_len) time then do queries of your patterns in O(pattern_length) time. So if your text is really big compared to your patterns, this will work well.
See Ukkonen's algorithm for building a suffix tree.
If you want inexact matching...see the work of Gonzalo Navarro.
Use a Dictionary<string>()
or an HashSet<string>
is probably good for you.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With