
Delphi: efficient and fast Unicode text search

Is there a fast and efficient way to search within a Unicode text/string in Delphi? I need to match parts of words too, not only whole words.

Would SearchBuf be suitable?

Thanks!

maxfax asked Feb 21 '26 09:02


1 Answer

As has already been pointed out, the fastest way of doing this depends on a number of things, most importantly whether you need to search the same text repeatedly. The second question is how much you really need the "fastest" approach rather than a reasonably fast one, and how much time you are willing to invest in optimisation.

Repeated searches

If you need to search repeatedly, the most efficient string-searching approach I know of uses suffix arrays (often combined with the Burrows-Wheeler transform). This approach is used extensively in bioinformatics, where one often has to run a huge number of string searches over really large data sets (e.g. here). A very good suffix array (and BWT) library is the libdivsufsort C library, but unfortunately I know of no Delphi port of it. (I believe this library is capable of handling Unicode strings.)
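To make the repeated-search idea concrete, here is a minimal sketch of a suffix array in Python. It is not libdivsufsort's API and not Delphi code; the names build_suffix_array and find_all are made up for illustration, and the construction is deliberately naive. Note also the assumption that the text is indexed by code point (as Python's str is), whereas a Delphi UnicodeString is indexed by UTF-16 code units.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the start offsets of all suffixes of text, sorted lexicographically.

    Naive construction (sorts full suffix slices), good enough for a demo.
    A dedicated library builds the same array far faster.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_all(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return the sorted start offsets of every occurrence of pattern in text."""
    if not pattern:
        return []
    m = len(pattern)

    # Lower bound: first suffix whose first m characters are >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose first m characters are > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[first:lo] starts with pattern.
    return sorted(sa[first:lo])


if __name__ == "__main__":
    text = "Größe, große Größen"
    sa = build_suffix_array(text)      # built once
    print(find_all(text, sa, "röß"))   # partial-word match
    print(find_all(text, sa, "ße"))    # each later query is a binary search
```

The array is built once and every subsequent query costs two binary searches, which is why the approach pays off only when the same text is searched many times. Because any substring is a prefix of some suffix, partial words are found just as easily as whole ones.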

Single searches

If you don't need to search repeatedly, a brute-force string search can be efficient enough, for instance the assembly-optimised FastCode versions of Pos and friends. These were, however, written before Delphi switched to Unicode strings, and I know of no similarly optimised Unicode implementations. If I were to write one today and wanted to optimise it for a modern processor (one supporting the SSE4.2 instruction set), I would take a serious look at the PCMPESTRI assembly instruction (reference pdf here; see also e.g. here, but I have no idea whether that code is working), which can handle the 2-byte characters you'd need for Unicode string searching.

PhiS answered Feb 24 '26 15:02


