C# Searching Through a Large Word List

Tags:

c#

.net

memory

I have a file containing about 170,000 words. What would be the best way of dealing with this in .NET?

Does it make sense to load it into a List, keep it in memory, and search the list? Would a list of this size be an issue to keep in memory? Any suggestions regarding loading and searching this type of list would be appreciated.

Thanks,

Gerardo Calderon asked Dec 06 '25 02:12


1 Answer

Does it make sense to load it into a List, keep it in memory, and search the list? Would a list of this size be an issue to keep in memory?

Unless your words are very long, memory will not be an issue here.

If you are dealing with English words over the standard Latin alphabet, then memory will not be an issue: 170,000 strings of typical English length take up on the order of megabytes, not gigabytes.

But you have to be specific about your word length. If you are dealing with, say, words over the alphabet {A, C, G, T}, and those words happen to be DNA sequences, then, yes, memory will be an issue, since a single sequence can run to millions of characters.
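If you want to check this for your own data rather than take it on faith, you can measure it. The sketch below is an illustration, not a benchmark: `WordListMemory` and `MeasureListBytes` are hypothetical names, and the synthetic words it generates are an assumption standing in for the real file. It compares `GC.GetTotalMemory` before and after filling a `List<string>`. A .NET string stores its characters as UTF-16 (2 bytes each) plus a small fixed per-object header, so short English words cost tens of bytes apiece.

```csharp
using System;
using System.Collections.Generic;

// Rough measurement sketch: how much managed heap does a List<string> of N words use?
// The generated words are hypothetical stand-ins for the real word file.
public static class WordListMemory
{
    public static long MeasureListBytes(int wordCount, int wordLength)
    {
        // Force a full collection so the baseline is as clean as possible.
        long before = GC.GetTotalMemory(forceFullCollection: true);

        var words = new List<string>(wordCount);
        for (int i = 0; i < wordCount; i++)
        {
            // Each word is a distinct string, padded to at least wordLength characters.
            words.Add(i.ToString().PadLeft(wordLength, 'a'));
        }

        long after = GC.GetTotalMemory(forceFullCollection: true);
        GC.KeepAlive(words); // keep the list reachable until after the second measurement
        return after - before;
    }
}
```

Calling `WordListMemory.MeasureListBytes(170_000, 10)` should report a figure measured in megabytes; the exact number depends on the runtime version and on whether the process is 32- or 64-bit.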

Any suggestions regarding loading and searching this type of list would be appreciated.

What type of search are you doing? Are you searching for existence, or are you searching for a nearest match (say, the nearest alphabetical match)? If you only need existence checks, use a HashSet<string>. If you need the nearest alphabetical match, I would start with a sorted List<string> and do a binary search. But if your words are very long, I might consider something like a prefix tree (trie).
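The two simpler options can live side by side over the same data. This is a minimal sketch under a few assumptions: the file holds one word per line, `WordIndex`, `FromFile`, and `NearestAtOrAfter` are hypothetical names, and "alphabetical" is approximated with ordinal (character-code) comparison, which is case-sensitive. It relies on the documented behavior of `List<T>.BinarySearch`, which returns the bitwise complement of the insertion index when the item is not found.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;

// Sketch: an existence set plus a sorted list for nearest-match lookups.
// All names here are illustrative, not part of any library.
public sealed class WordIndex
{
    private readonly HashSet<string> _set;  // average O(1) existence checks
    private readonly List<string> _sorted;  // sorted copy for binary search

    public WordIndex(IEnumerable<string> words)
    {
        _set = new HashSet<string>(words, StringComparer.Ordinal);
        _sorted = new List<string>(_set);      // duplicates already removed by the set
        _sorted.Sort(StringComparer.Ordinal);  // must sort with the same comparer used to search
    }

    // Hypothetical loader: one word per line, surrounding whitespace trimmed, blank lines skipped.
    public static WordIndex FromFile(string path)
    {
        var words = new List<string>();
        foreach (string line in File.ReadLines(path))
        {
            string w = line.Trim();
            if (w.Length > 0) words.Add(w);
        }
        return new WordIndex(words);
    }

    public bool Contains(string word) => _set.Contains(word);

    // Returns the first word that sorts at or after `query`, or null if none does.
    public string? NearestAtOrAfter(string query)
    {
        int i = _sorted.BinarySearch(query, StringComparer.Ordinal);
        if (i >= 0) return _sorted[i];  // exact match
        int insertAt = ~i;              // complement = index of the next larger element
        return insertAt < _sorted.Count ? _sorted[insertAt] : null;
    }
}
```

With a file such as `words.txt` (a placeholder name), `WordIndex.FromFile("words.txt")` builds both structures once, and every lookup after that is cheap. If you want linguistic ordering instead of ordinal, swap in a culture-aware `StringComparer` consistently for both `Sort` and `BinarySearch`.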

The answer to this last question depends heavily on exactly what it is you are doing.
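For the prefix-tree option, here is a minimal sketch of a trie, assuming plain exact-match and prefix queries; the `Trie` class and its members are hypothetical names. Each node maps a character to a child, so shared prefixes are stored once and a lookup walks one node per character of the query, regardless of how many words are stored. Be aware that a naive trie like this one, with a `Dictionary` per node, can use noticeably more memory than the plain list, so it is worth measuring before committing to it.

```csharp
#nullable enable
using System.Collections.Generic;

// Minimal prefix tree (trie) sketch: each node maps a character to a child node.
public sealed class Trie
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public bool IsWord; // true if a word ends exactly at this node
    }

    private readonly Node _root = new Node();

    public void Add(string word)
    {
        Node node = _root;
        foreach (char c in word)
        {
            // Create the child for this character if it does not exist yet.
            if (!node.Children.TryGetValue(c, out Node? next))
            {
                next = new Node();
                node.Children[c] = next;
            }
            node = next;
        }
        node.IsWord = true;
    }

    // True only if `word` was added as a complete word.
    public bool Contains(string word) => Find(word) is { IsWord: true };

    // True if any added word starts with `prefix`.
    public bool HasPrefix(string prefix) => Find(prefix) != null;

    // Walks the trie along `key`; returns the node reached, or null if the path breaks off.
    private Node? Find(string key)
    {
        Node? node = _root;
        foreach (char c in key)
        {
            if (!node.Children.TryGetValue(c, out node)) return null;
        }
        return node;
    }
}
```

A trie also makes prefix queries ("every word starting with `car`") natural, which neither the `HashSet<string>` nor a single binary search gives you directly.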

jason answered Dec 08 '25 16:12