I have a data structure which consists of pairs of values, the first of which is an integer and the second of which is an alphanumeric string (which may begin with digits):
+--------+-----------------+
| Number | Name |
+--------+-----------------+
| 15 | APPLES |
| 16 | APPLE COMPUTER |
| 17 | ORANGE |
| 21 | TWENTY-1 |
| 291 | 156TH ELEMENT |
+--------+-----------------+
A table of these would comprise up to 100,000 rows.
I'd like to provide a lookup function in which the user can look up either the number (as if it were a string), or pieces of the string. Ideally the lookup will be "live" as the user types; after each keystroke (or maybe after a brief delay ~250-500 ms) a new search will be done to find the most likely candidates. So, for example searching on
1
will return 15 APPLES
, 16 APPLE COMPUTER
, 17 ORANGE
, and
291 156TH ELEMENT
15
will narrow the search to 15 APPLES
, 291 156TH ELEMENT
AP
will return 15 APPLES
and 16 APPLE COMPUTER
ELEM
will return 291 156TH ELEMENT
.I was thinking about using two Dictionary<string, string>
s since ultimately the int
s are being compared as string
s -- one will index by the integer part and the other by the string part.
But really searching by substring shouldn't use a hash function, and it seems wasteful to use twice the memory that I feel like I should need.
Ultimately the question is, is there any well-performing way to text search two large lists simultaneously for substrings?
Failing that, how about a SortedDictionary
? Might increase performance but still wouldn't solve the hash problem.
Thought about creating a regex on the fly, but I would think that would perform terribly.
I'm new to C# (having come from the Java world) so I haven't looked into LINQ yet; is that the answer?
EDIT 18:21 EST: None of the strings in the "Name" field will be longer than 12-15 characters, if that affects your potential solution.
Yes, it's a bit odd that you can get a loud consensus that K&R is a great C book, and also a loud consensus that C99 is the correct/current/best version of C.
C is one of the earliest and most widely used programming languages. C is the fourth most popular programming language in the world as of January 2022. Modern languages such as Go, Swift, Scala, and Python are not as popular as C. Where is C used today?
C programming language is a machine-independent programming language that is mainly used to create many types of applications and operating systems such as Windows, and other complicated programs such as the Oracle database, Git, Python interpreter, and games and is considered a programming foundation in the process of ...
If possible, I would avoid loading all 100,000 entries into memory. I would use either a database or Lucene.Net to index the values. Then use the appropriate query syntax to efficiently search for the results.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With