I'm using C# to continuously search for multiple string "keywords" within large strings (>= 4 KB each). This code loops constantly, and adding sleeps doesn't cut CPU usage enough while maintaining a reasonable speed. The bottleneck is the keyword-matching method.
I've found a few possibilities, and all of them give similar efficiency.
1) Aho-Corasick (http://tomasp.net/articles/ahocorasick.aspx) - I do not have enough keywords for this to be the most efficient algorithm.
2) Regex, using an instance-level, compiled regex - Provides more functionality than I require, and not quite enough efficiency.
3) String.IndexOf - I would need to do a "smart" version of this for it to provide enough efficiency. Looping through each keyword and calling IndexOf doesn't cut it.
Does anyone know of any algorithms or methods that I can use to attain my goal?
Are you always looking for the same keywords? Try Boyer-Moore. It requires some pre-processing for the keywords, but gains speed afterwards.
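To illustrate the "pre-process once, search many times" idea, here is a minimal sketch of Boyer-Moore-Horspool, a simplified variant of Boyer-Moore that uses only a bad-character shift table. The class name `HorspoolMatcher` and the method `IndexIn` are hypothetical names chosen for this example, and the comparison is assumed to be ordinal and case-sensitive.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: build one instance per keyword and reuse it for every search.
public sealed class HorspoolMatcher
{
    private readonly string _pattern;
    private readonly Dictionary<char, int> _shift = new Dictionary<char, int>();

    public HorspoolMatcher(string pattern)
    {
        if (string.IsNullOrEmpty(pattern))
            throw new ArgumentException("Pattern must be non-empty.", nameof(pattern));
        _pattern = pattern;

        // Pre-processing: for every character except the last one, record how far
        // the window may slide when that character sits under the window's last slot.
        for (int i = 0; i < pattern.Length - 1; i++)
            _shift[pattern[i]] = pattern.Length - 1 - i;
    }

    // Returns the index of the first ordinal match at or after startIndex, or -1.
    public int IndexIn(string text, int startIndex = 0)
    {
        int m = _pattern.Length;
        int pos = startIndex;
        while (pos <= text.Length - m)
        {
            // Compare the window against the pattern from right to left.
            int j = m - 1;
            while (j >= 0 && text[pos + j] == _pattern[j]) j--;
            if (j < 0) return pos;

            // Slide by the shift for the character under the window's last slot;
            // characters that never occur in the pattern allow a full-length jump.
            char last = text[pos + m - 1];
            pos += _shift.TryGetValue(last, out int s) ? s : m;
        }
        return -1;
    }
}
```

With a fixed keyword set, the matchers would be constructed once at startup and then each one run over every incoming string. Whether this beats `String.IndexOf` with `StringComparison.Ordinal` depends on keyword length and the data, so it is worth measuring.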
I haven't tried it, but have you looked at Rabin-Karp? Apparently it has a bad worst-case complexity, but is usually quite good.
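For reference, a minimal single-keyword Rabin-Karp sketch follows. It keeps a rolling hash of the current window and only does a character comparison when the hashes agree. The class name `RabinKarp`, the multiplier 257, and the use of wrap-around 64-bit arithmetic as the hash modulus are assumptions made for this example, not requirements of the algorithm.

```csharp
using System;

// Hypothetical helper illustrating a rolling-hash search.
public static class RabinKarp
{
    private const ulong Base = 257; // assumed multiplier; arithmetic wraps modulo 2^64

    // Returns the index of the first ordinal match of pattern in text, or -1.
    public static int IndexOf(string text, string pattern)
    {
        int n = text.Length, m = pattern.Length;
        if (m == 0) return 0;
        if (m > n) return -1;

        unchecked
        {
            // high = Base^(m-1), the weight of the character leaving the window.
            ulong high = 1;
            for (int i = 1; i < m; i++) high *= Base;

            // Hash the pattern and the first window of the text.
            ulong patHash = 0, winHash = 0;
            for (int i = 0; i < m; i++)
            {
                patHash = patHash * Base + pattern[i];
                winHash = winHash * Base + text[i];
            }

            for (int pos = 0; ; pos++)
            {
                // Equal hashes can be a collision, so confirm with a real comparison.
                if (winHash == patHash &&
                    string.CompareOrdinal(text, pos, pattern, 0, m) == 0)
                    return pos;

                if (pos + m >= n) return -1;

                // Roll the window: drop text[pos], append text[pos + m].
                winHash = (winHash - text[pos] * high) * Base + text[pos + m];
            }
        }
    }
}
```

The bad worst case arises when many windows hash to the same value as the keyword and each one has to be verified character by character. For several keywords of the same length, the single `patHash` comparison could be replaced by a lookup in a set of keyword hashes, so one pass over the text covers all of them.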
What do your keywords look like? In particular, are they always delimited by spaces (or something similar)? If so, you could basically look through the string once looking for "words" and then either create a map from a word to the list of indexes of that word, or perhaps only do so for keywords you're interested in.
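A rough sketch of that single-pass idea, assuming keywords are whole words separated by whitespace and matched case-sensitively. The names `KeywordScanner` and `FindKeywords` are made up for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: one pass over the text, one set lookup per word.
public static class KeywordScanner
{
    // Maps each keyword found in text to the start indexes of its occurrences.
    public static Dictionary<string, List<int>> FindKeywords(string text, ISet<string> keywords)
    {
        var hits = new Dictionary<string, List<int>>();
        int i = 0;
        while (i < text.Length)
        {
            // Skip the delimiters (assumed here to be whitespace).
            while (i < text.Length && char.IsWhiteSpace(text[i])) i++;
            int start = i;

            // Consume one word.
            while (i < text.Length && !char.IsWhiteSpace(text[i])) i++;
            if (i == start) break; // only trailing whitespace was left

            string word = text.Substring(start, i - start);
            if (keywords.Contains(word))
            {
                if (!hits.TryGetValue(word, out List<int> positions))
                    hits[word] = positions = new List<int>();
                positions.Add(start);
            }
        }
        return hits;
    }
}
```

Usage would look like `KeywordScanner.FindKeywords(input, new HashSet<string> { "foo", "baz" })`. The cost is one pass over the text regardless of how many keywords there are, although this version allocates a substring per word, which may matter in a tight loop.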
If you could give more details of the exact situation (such as the keywords, delimiters and what you need the result of your search to be) that would help.