Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the most efficient collection class in C# for string search

string[] words = System.IO.File.ReadAllLines("word.txt");
var query = from word in words
            where word.Length > "abe".Length && word.StartsWith("abe")
            select word;
foreach (var w in query.AsParallel())
{
    Console.WriteLine(w);
}

Basically the word.txt contains 170000 English words. Is there a collection class in C# that is faster than array of string for the above query? There will be no insert or delete, just search if a string starts with "abe" or "abdi".

Each word in the file is unique.

EDIT 1 This search will be performed potentially millions of times in my application. Also I want to stick with LINQ for collection query because I might need to use aggregate function.

EDIT 2 The words from the file are sorted already, the file will not change

like image 913
nobody Avatar asked May 01 '11 05:05

nobody


People also ask

Which collection is fast in C#?

A HashSet, similar to a Dictionary, is a hash-based collection, so look ups are very fast with O(1). But unlike a dictionary, it doesn't store key/value pairs; it only stores values.

Which collection type is faster if you have to find an item?

If you need fast add and removal of elements, use LinkedList (but it has a very poor seeking performance).

Are there collections in C?

There are no standard collections in C outside of the array.

What are the collection types can be used in C#?

Collections in C# are classified into two types - Generic collections and non-generic collections.


Video Answer


2 Answers

myself I'd create a Dictionary<char, List<string>>, where I'd group words by their first letter. This will reduce substantially the lookup of needed word.

like image 61
Eugen Avatar answered Oct 12 '22 17:10

Eugen


If you need to do search once there is nothing better than linear search - array is perfectly fine for it.

If you need to perform repeated searches you can consider soring the array (n Log n) and search by any prefix will be fast (long n). Depending on type of search using dictionary of string lists indexed by prefix may be another good option.

like image 20
Alexei Levenkov Avatar answered Oct 12 '22 16:10

Alexei Levenkov