
PHP : non-preg_match version of: preg_match("/[^a-z0-9]/i", $a, $match)?

Tags:

php

preg-match

Suppose the string is:

$a = "abc-def";


if (preg_match("/[^a-z0-9]/i", $a, $m)) {
    // $m[0] holds the first character that is not a letter or digit
    $i = "i stopped scanning '$a' because I found a violation in it while "
       . "scanning it from left to right. The violation was: $m[0]";
    echo $i;
}

The example above should report "-" as the violation.

I would like to know if there is a non-preg_match way of doing this.

If there is, I will likely benchmark both approaches, perhaps over 1,000 or 1 million runs, to see which is faster and more efficient.

In the benchmarks, $a will be much longer, so I can confirm that the function does not scan all of $a and stops as soon as it detects a violation.

From what I have read, preg_match stops when the first match is found.
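One way to observe this from PHP is to ask preg_match for the offset of its match. This is a minimal sketch, assuming a made-up input with two violations: the PREG_OFFSET_CAPTURE flag turns $m[0] into a pair of matched text and byte offset, and only the first violation is reported. It shows what is returned, not how far the engine scanned internally.

```php
<?php
// Hypothetical input with two violations: "-" at offset 3 and "!" at offset 7.
$a = "abc-def!ghi";

// preg_match() returns 1 if the pattern matched, 0 if not, false on error.
$found = preg_match("/[^a-z0-9]/i", $a, $m, PREG_OFFSET_CAPTURE);

if ($found === 1) {
    // With PREG_OFFSET_CAPTURE, $m[0] is [matched text, byte offset].
    [$char, $offset] = $m[0];
    echo "first violation '$char' at offset $offset\n";
}
```

The later "!" never appears in $m, since preg_match returns at most one match.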

UPDATE:

This is based on the answer given by "bishop", which I will likely accept shortly.

I modified it a little because I only want it to report the violating character. I also commented that line out so the benchmark measures only the scan.

Let's do 1 million runs based on that answer.

$start_time = microtime(TRUE);

$count = 0;
while ($count < 1000000) {
    $allowed = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
    $input   = 'abc-def';

    $validLen = strspn($input, $allowed);
    if ($validLen < strlen($input)) {
        #echo "violation at: ". substr($input, $validLen,1);
    }

    $count = $count + 1;
}

$end_time = microtime(TRUE);
$dif = $end_time - $start_time;

echo $dif;

The result is 0.606614112854 (about 0.61 seconds).

Let's do it with the preg_match method.

I hope the two tests are equivalent and fair. (I say this because of the ^ in the pattern: inside the brackets it negates the character class, so the regex looks for the first character that is not a letter or digit.)

$start_time = microtime(TRUE);

$count = 0;
while ($count < 1000000) {
    $input   = 'abc-def';
    preg_match("/[^a-z0-9]/i", $input, $m);
    #echo "violation at:". $m[0];

    $count = $count + 1;
}

$end_time = microtime(TRUE);
$dif = $end_time - $start_time;

echo $dif;

I use "dif" as shorthand for "difference".

The "dif" was 1.1145210266113 seconds.

(That is about 1.84 times the strspn result of 0.606614112854 seconds. A result of about 1.2 would have meant preg_match was 2x slower than the strspn way.)
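The two loops above use a 7-character input, so they mostly measure call overhead. A sketch of a variant with a long input follows. It assumes PHP 7.4 or later (for hrtime and arrow functions), and the helper names violation_strspn, violation_preg and time_runs are made up for illustration. Timings will differ by machine and PHP version, so no expected figures are given.

```php
<?php
// Hypothetical helpers; the names are illustrative, not from the original post.

// Return the first character of $input not contained in $allowed, or null.
function violation_strspn(string $input, string $allowed): ?string
{
    $validLen = strspn($input, $allowed);
    return $validLen < strlen($input) ? $input[$validLen] : null;
}

// Same contract, implemented with the regex from the question.
function violation_preg(string $input): ?string
{
    return preg_match('/[^a-z0-9]/i', $input, $m) === 1 ? $m[0] : null;
}

// Call $fn $runs times and return the elapsed wall time in seconds.
function time_runs(callable $fn, int $runs): float
{
    $start = hrtime(true);                 // monotonic clock, nanoseconds
    for ($n = 0; $n < $runs; $n++) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e9;
}

$allowed = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';

// Assumed test input: an early violation followed by a long valid tail.
// If either function scanned the whole string, the tail would dominate the time.
$input = 'abc-' . str_repeat('def', 10000);

$runs    = 100000;
$tStrspn = time_runs(fn() => violation_strspn($input, $allowed), $runs);
$tPreg   = time_runs(fn() => violation_preg($input), $runs);

printf("strspn: %.4f s, preg_match: %.4f s\n", $tStrspn, $tPreg);
```

Hoisting $allowed and $input out of the timed loop keeps the assignments from being counted, and both helpers return the same value so the comparison stays like for like.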

Steady State asked Oct 19 '22 14:10


1 Answer

You want to find the location of the first character not in the given range, without using regular expressions? You might want strspn or its complement strcspn:

$allowed = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
$input   = 'abc-def';

$validLen = strspn($input, $allowed);
if (strlen($input) !== $validLen) {
    printf('Input invalid, starting at %s', substr($input, $validLen)); 
} else {
    echo 'Input is valid';
}

This outputs "Input invalid, starting at -def".

The C functions strspn and strcspn are very old and very well specified (ISO C, and POSIX too). PHP's functions of the same names follow the same semantics and run as a simple scan in compiled code, without going through the regular-expression engine, so they should be fast, too.
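For comparison, here is a small sketch of the complement. strcspn takes the characters to stop at rather than the characters to accept, so it suits a short deny-list better than the "anything except letters and digits" rule from the question. The $forbidden list below is a made-up example.

```php
<?php
$input = 'abc-def';

// strspn: length of the leading run made up ONLY of characters in the mask.
echo strspn($input, 'abcdef'), "\n";      // prints 3

// strcspn: length of the leading run containing NONE of the characters in the mask.
$forbidden = "-_ ";                       // hypothetical deny-list, for illustration
$pos = strcspn($input, $forbidden);
echo $pos, "\n";                          // prints 3

if ($pos < strlen($input)) {
    // $pos is the index of the first forbidden character.
    echo "first forbidden character: ", $input[$pos], "\n";
}
```

Both calls stop at index 3 here because the "-" is the first character outside the allow-list and also the first character inside the deny-list.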

bishop answered Nov 03 '22 02:11