I'm writing PHP code to parse a string. It needs to be as fast as possible, so are regular expressions the way to go? I have a hunch that PHP string functions are more expensive, but it's just a guess. What's the truth?
Here's specifically what I need to do with the string:
Grab the first half (based on the third location of a substring "000000") and compare its hash to the next 20 bytes, throwing away anything left.
Parse the 9th byte through the next "000000" as one piece of data. Then grab the next 19 bytes after that, and split that into 8 (toss 1) and 8. Then I do some other stuff that converts those two 8 byte strings into dates.
So that's the kind of thing I need to do.
Regex is intrinsically a process of pattern matching and should be used when the strings you want to match are variable or only conform to a particular pattern. For cases where a simple string search would suffice, I would always recommend using the built-in string functions instead.
For a fixed substring search, plain string operations are generally faster than the equivalent regular expression, since there is no pattern to compile or interpret.
In PHP, regular expressions are strings composed of delimiters, a pattern and optional modifiers. $exp = "/w3schools/i"; In the example above, / is the delimiter, w3schools is the pattern that is being searched for, and i is a modifier that makes the search case-insensitive.
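As a rough illustration of the pattern above, here is a minimal sketch that runs it with preg_match() and then does the same case-insensitive check with stripos(). The sample text in $text is made up for the example.

```php
<?php
// Pattern from the paragraph above: "/" is the delimiter,
// "w3schools" is the pattern, "i" makes it case-insensitive.
$exp = "/w3schools/i";
$text = "Visit W3Schools for tutorials"; // hypothetical sample input

// preg_match() returns 1 on a match, 0 on no match, false on error.
$regexFound = preg_match($exp, $text) === 1;

// The same fixed-substring, case-insensitive check without a regex.
// stripos() returns an offset (possibly 0) or false, so compare with !==.
$plainFound = stripos($text, "w3schools") !== false;

var_dump($regexFound, $plainFound);
```

For a literal substring like this one, both checks answer the same question, which is why the plain string function is usually the simpler choice.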
It depends on your case: if you're trying to do something fairly basic (e.g. search for a string, or replace a substring with something else), then the regular string functions are the way to go. If you want to do something more complicated (e.g. search for IP addresses), then the regex functions are definitely a better choice.
I haven't profiled regexes so I can't say that they'll be faster at runtime, but I can tell you that the extra time spent hacking together the equivalent using the basic functions wouldn't be worth it.
Edit with the new information in the OP:
It sounds as though you actually need to do a number of small string operations here. Since each one individually is quite basic, and I doubt you'd be able to do all those steps (or even a couple of those steps) at one time using a regex, I'd go with the basic functions:
Grab the first half (based on the third location of a substring "000000") and compare its hash to the next 20 bytes, throwing away anything left.
Use: strpos() and substr()
Or: /^(.*?0{6}.*?0{6}.*?)0{6}/
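The strpos()/substr() route could look roughly like the sketch below. The helper names nthPos() and splitAndVerify() are made up for illustration. It also rests on assumptions the question doesn't confirm: that the 20-byte hash is a raw SHA-1 digest, that it starts right after the third "000000", and that occurrences are counted without overlap (as the regex does).

```php
<?php
// Hypothetical helper: byte offset of the $n-th non-overlapping
// occurrence of $needle in $haystack, or false if there are fewer than $n.
function nthPos(string $haystack, string $needle, int $n)
{
    $offset = 0;
    $pos = false;
    for ($i = 0; $i < $n; $i++) {
        $pos = strpos($haystack, $needle, $offset);
        if ($pos === false) {
            return false;
        }
        // Continue searching after this occurrence.
        $offset = $pos + strlen($needle);
    }
    return $pos;
}

// Hypothetical helper: returns the part before the third marker if its
// hash matches the 20 bytes after the marker, otherwise null.
// ASSUMPTION: the hash is a raw (binary) SHA-1 digest.
function splitAndVerify(string $data, string $marker = "000000"): ?string
{
    $pos = nthPos($data, $marker, 3);
    if ($pos === false) {
        return null;
    }
    $head   = substr($data, 0, $pos);
    $digest = substr($data, $pos + strlen($marker), 20);
    // Anything after the 20 digest bytes is simply ignored.
    return hash_equals(sha1($head, true), $digest) ? $head : null;
}

// Usage with made-up data:
$head = "AAAAAAAApayload000000bbb000000ccc";
$data = $head . "000000" . sha1($head, true) . "ignored tail";
var_dump(splitAndVerify($data) === $head);
```

If the real format uses a different hash, or the digest covers the marker as well, only the body of splitAndVerify() needs to change.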
Then grab the next 19 bytes after that, and split that into 8 (toss 1) and 8.
Use: substr()
- (I assume you mean 17 bytes here -- 8 + 1 + 8)
$part1 = substr($myStr, $currPos, 8);
$part2 = substr($myStr, $currPos + 9, 8);
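Wrapped into functions, the two substr() calls above might be used like this. The helper names are hypothetical, and the date format is purely an assumption: the question doesn't say how the 8-byte strings encode dates, so the sketch guesses YYYYMMDD.

```php
<?php
// Hypothetical helper: split the 17-byte field starting at $currPos into
// two 8-byte strings, tossing the single byte between them.
// Returns null if fewer than 17 bytes are available.
function splitDatePair(string $myStr, int $currPos): ?array
{
    $field = (string) substr($myStr, $currPos, 17);
    if (strlen($field) !== 17) {
        return null;
    }
    return [substr($field, 0, 8), substr($field, 9, 8)];
}

// ASSUMPTION: each 8-byte string is a date written as YYYYMMDD.
// The leading "!" resets the time fields so the result is midnight.
function toDate(string $eight): ?DateTimeImmutable
{
    $date = DateTimeImmutable::createFromFormat('!Ymd', $eight);
    return $date === false ? null : $date;
}

// Usage with made-up data:
$pair = splitDatePair("xx20240115-20240220yy", 2);
if ($pair !== null) {
    echo toDate($pair[0])->format('Y-m-d'), "\n";
    echo toDate($pair[1])->format('Y-m-d'), "\n";
}
```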
If you want the highest performance, avoid regex. It minimizes coding effort but won't give the best performance, because you can almost always tailor code built on string routines to the specific problem and gain a big performance boost from it. For simple parsing routines that can't be optimized much, you can still use regex, as it won't make a big difference there.
EDIT: For the specific problem you posted I'd favor string operations, but only because I wouldn't know how to do it in regex. It seems pretty straightforward, except for the hash, so I think the choice between regex and string functions won't make a big difference.
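Since none of the answers above actually measured anything, a quick micro-benchmark is the way to settle it for your own data. The following is only a sketch: timeIt() is a made-up helper, the subject string is invented, and hrtime() requires PHP 7.3 or later. Results will vary with PHP version, PCRE JIT settings, and input.

```php
<?php
// Hypothetical helper: run $fn $iterations times and
// return the elapsed time in nanoseconds.
function timeIt(callable $fn, int $iterations): int
{
    $start = hrtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return hrtime(true) - $start;
}

// Made-up test data: filler followed by the marker.
$subject = str_repeat("abcdefgh", 1000) . "000000" . "tail";
$iterations = 10000;

$plain = timeIt(function () use ($subject) {
    return strpos($subject, "000000");
}, $iterations);

$regex = timeIt(function () use ($subject) {
    return preg_match('/0{6}/', $subject);
}, $iterations);

printf("strpos:     %.2f ms\n", $plain / 1e6);
printf("preg_match: %.2f ms\n", $regex / 1e6);
```

Run it several times against input that resembles your real strings before drawing conclusions; a single run on toy data says little.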