Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C#: Efficiently search a large string for occurences of other strings

Tags:

string

c#

I'm using C# to continuously search for multiple string "keywords" within large strings, which are >= 4kb. This code is constantly looping, and sleeps aren't cutting down CPU usage enough while maintaining a reasonable speed. The bog-down is the keyword matching method.

I've found a few possibilities, and all of them give similar efficiency.

1) http://tomasp.net/articles/ahocorasick.aspx -I do not have enough keywords for this to be the most efficient algorithm.

2) Regex. Using an instance level, compiled regex. -Provides more functionality than I require, and not quite enough efficiency.

3) String.IndexOf. -I would need to do a "smart" version of this for it provide enough efficiency. Looping through each keyword and calling IndexOf doesn't cut it.

Does anyone know of any algorithms or methods that I can use to attain my goal?

like image 202
Jon Willis Avatar asked Jun 23 '09 08:06

Jon Willis


People also ask

What C is used for?

C programming language is a machine-independent programming language that is mainly used to create many types of applications and operating systems such as Windows, and other complicated programs such as the Oracle database, Git, Python interpreter, and games and is considered a programming foundation in the process of ...

What is the full name of C?

In the real sense it has no meaning or full form. It was developed by Dennis Ritchie and Ken Thompson at AT&T bell Lab. First, they used to call it as B language then later they made some improvement into it and renamed it as C and its superscript as C++ which was invented by Dr. Stroustroupe.

Is C language easy?

C is a general-purpose language that most programmers learn before moving on to more complex languages. From Unix and Windows to Tic Tac Toe and Photoshop, several of the most commonly used applications today have been built on C. It is easy to learn because: A simple syntax with only 32 keywords.

Is C programming hard?

C is more difficult to learn than JavaScript, but it's a valuable skill to have because most programming languages are actually implemented in C. This is because C is a “machine-level” language. So learning it will teach you how a computer works and will actually make learning new languages in the future easier.


2 Answers

Are you always looking for the same keywords? Try Boyer-Moore. It requires some pre-processing for the keywords, but gains speed afterwards.

like image 57
Erich Kitzmueller Avatar answered Sep 26 '22 23:09

Erich Kitzmueller


I haven't tried it, but have you looked at Rabin-Karp? Apparently it has a bad worst-case complexity, but is usually quite good.

What do your keywords look like? In particular, are they always delimited by spaces (or something similar)? If so, you could basically look through the string once looking for "words" and then either create a map from a word to the list of indexes of that word, or perhaps only do so for keywords you're interested in.

If you could give more details of the exact situation (such as the keywords, delimiters and what you need the result of your search to be) that would help.

like image 39
Jon Skeet Avatar answered Sep 24 '22 23:09

Jon Skeet