 

PHP – Slow String Manipulation

I have some very large data files and for business reasons I have to do extensive string manipulation (replacing characters and strings). This is unavoidable. The number of replacements runs into hundreds of thousands.

It's taking longer than I would like. PHP is generally very quick but I'm doing so many of these string manipulations that it's slowing down and script execution is running into minutes. This is a pain because the script is run frequently.

I've done some testing and found that str_replace is fastest, followed by strtr, followed by preg_replace. I've tried both individual str_replace calls and single calls with arrays of patterns and replacements.
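The two array-based strategies differ in more than speed. str_replace() with arrays applies each pattern in turn, so later patterns can match text inserted by earlier ones (the PHP manual warns about this). strtr() with an array scans the subject once, tries the longest keys first and never rescans replaced text. The C++ sketch below shows that single-scan approach, assuming plain literal patterns; MultiReplacer is a hypothetical name, not a library class.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: replaces many literal patterns in ONE left-to-right scan.
// At each position it takes the longest matching key, and emitted replacement
// text is never scanned again (the behaviour PHP documents for strtr() with an
// array). Keys live in a byte-level trie, so cost is O(input * longest key).
class MultiReplacer {
public:
    explicit MultiReplacer(const std::vector<std::pair<std::string, std::string>>& rules) {
        nodes_.emplace_back();  // node 0 is the root
        for (const auto& [from, to] : rules) {
            assert(!from.empty() && "empty search keys are not supported");
            std::size_t cur = 0;
            for (unsigned char c : from) {
                auto it = nodes_[cur].next.find(c);
                if (it != nodes_[cur].next.end()) {
                    cur = it->second;
                } else {
                    nodes_.emplace_back();  // may reallocate, so re-index below
                    std::size_t child = nodes_.size() - 1;
                    nodes_[cur].next[c] = child;
                    cur = child;
                }
            }
            // A later rule with the same key overrides an earlier one.
            nodes_[cur].rule = static_cast<int>(to_.size());
            to_.push_back(to);
        }
    }

    std::string apply(const std::string& input) const {
        std::string out;
        out.reserve(input.size());
        std::size_t i = 0;
        while (i < input.size()) {
            // Walk the trie from i, remembering the longest complete key seen.
            std::size_t cur = 0, j = i, best_len = 0;
            int best_rule = -1;
            while (j < input.size()) {
                auto it = nodes_[cur].next.find(static_cast<unsigned char>(input[j]));
                if (it == nodes_[cur].next.end()) break;
                cur = it->second;
                ++j;
                if (nodes_[cur].rule != -1) {
                    best_rule = nodes_[cur].rule;
                    best_len = j - i;
                }
            }
            if (best_rule != -1) {
                out += to_[static_cast<std::size_t>(best_rule)];  // emit replacement
                i += best_len;                                    // skip matched key
            } else {
                out += input[i];  // no key starts here: copy one byte
                ++i;
            }
        }
        return out;
    }

private:
    struct Node {
        std::map<unsigned char, std::size_t> next;  // child node per byte
        int rule = -1;                               // index into to_, or -1
    };
    std::vector<Node> nodes_;
    std::vector<std::string> to_;
};
```

With the rules "a" to "b" and "b" to "a", apply("ab") returns "ba". Applying the same rules one after another, as str_replace() does, turns "ab" into "aa" instead.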

I'm toying with the idea of isolating the string manipulation operations and rewriting them in a different language, but I don't want to invest time in that option only to find that the improvements are negligible. Plus, I only know Perl, PHP and COBOL, so for any other language I would have to learn it first.

How have other people approached similar problems?

I have searched and I don't believe that this duplicates any existing questions.

asked Dec 04 '12 at 10:12 by Simon Roberts



2 Answers

Since in PHP some string operations are already faster than array operations, and you are still not satisfied with the speed, you could write an external program as you suggested, preferably in a lower-level language. I would recommend C or C++.
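If part of the work is swapping single characters, an external program can use a 256-entry lookup table, which costs one array read per byte. Below is a minimal C++ sketch of the core of such a filter, assuming one-byte-to-one-byte mappings; make_byte_map, translate_stream and translate_string are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

using ByteMap = std::array<unsigned char, 256>;

// Hypothetical helper: every byte maps to itself unless it appears in `from`,
// in which case it maps to the byte at the same position in `to` (the same idea
// as PHP's three-argument strtr(), but this sketch requires equal lengths).
ByteMap make_byte_map(const std::string& from, const std::string& to) {
    assert(from.size() == to.size() && "from/to must pair up byte for byte");
    ByteMap table{};
    for (std::size_t b = 0; b < table.size(); ++b)
        table[b] = static_cast<unsigned char>(b);
    for (std::size_t k = 0; k < from.size(); ++k)
        table[static_cast<unsigned char>(from[k])] = static_cast<unsigned char>(to[k]);
    return table;
}

// Streams `in` to `out` in 64 KiB chunks, translating each byte through `table`.
// Memory use stays constant however large the input is.
void translate_stream(std::istream& in, std::ostream& out, const ByteMap& table) {
    std::vector<char> buf(64 * 1024);
    char* p = buf.data();
    while (in) {
        in.read(p, static_cast<std::streamsize>(buf.size()));
        const std::streamsize n = in.gcount();  // may be short on the last chunk
        if (n <= 0) break;
        for (std::streamsize k = 0; k < n; ++k)
            p[k] = static_cast<char>(table[static_cast<unsigned char>(p[k])]);
        out.write(p, n);
    }
}

// Convenience wrapper for in-memory strings (handy for testing).
std::string translate_string(const std::string& s, const ByteMap& table) {
    std::istringstream in(s);
    std::ostringstream out;
    translate_stream(in, out, table);
    return out.str();
}
```

A standalone filter would call translate_stream(std::cin, std::cout, table) from main, and the PHP script could pipe each file through it, for example with proc_open(). Whether this beats PHP's own strtr() for the same job is worth measuring on the real data before committing to it.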

answered Oct 20 '22 at 06:10 by Dusan Kasan


There are two ways of handling this, IMO:

  • [easy] Precompute some generic replacements in a background process and store them in a DB or file (this trick comes from game development, where all the sines and cosines are precomputed once and then kept in RAM). You can easily run into the curse of dimensionality here, though: the number of input combinations to precompute can explode.
  • [not so easy] Implement the replacement tool in C++ or another fast, compiled language and use that instead. Sphinx is a good example of a fast C++ tool for working with large text data sets.
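The precomputation idea only pays off when the same inputs recur in the data. A minimal in-memory C++ sketch, assuming the files contain many repeated values (for example, the same field contents on many rows); MemoizedTransform is a hypothetical name. A persistent version would write the table to a DB or file instead.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical memoizing wrapper: runs an expensive transform once per distinct
// input and serves repeats from a hash table, trading RAM for CPU in the same
// way as a precomputed sine table.
class MemoizedTransform {
public:
    using Fn = std::function<std::string(const std::string&)>;

    explicit MemoizedTransform(Fn fn) : fn_(std::move(fn)) {
        assert(fn_ && "transform must be callable");
    }

    // Returns the cached result, computing it the first time `input` is seen.
    // The reference stays valid: unordered_map never relocates elements on rehash.
    const std::string& operator()(const std::string& input) {
        auto it = cache_.find(input);
        if (it == cache_.end())
            it = cache_.emplace(input, fn_(input)).first;
        return it->second;
    }

    // Number of distinct inputs computed so far.
    std::size_t size() const { return cache_.size(); }

private:
    Fn fn_;
    std::unordered_map<std::string, std::string> cache_;
};
```

Memory grows with the number of distinct inputs, so on data with few repeats this only adds hashing overhead.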

answered Oct 20 '22 at 06:10 by Evgeniy Chekan