
How can I detect Hebrew characters, in both ISO-8859-8 and UTF-8, in a string using PHP?

Tags: regex, php, hebrew

I want to detect, using regular expressions in PHP, whether a string contains Hebrew characters, in either UTF-8 or ISO-8859-8 encoding. Thanks!

ufk asked Nov 07 '09 20:11


3 Answers

Here's a map of the ISO-8859-8 character set. The Hebrew letters occupy the byte range E0 to FA, so you can check for them with a character class:

[\xE0-\xFA]

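In Unicode, the Hebrew characters run from U+0591 to U+05F4. PHP's regex engine (PCRE) does not accept \u escapes, so write the code points as \x{...} and add the u modifier so that the pattern and the subject are treated as UTF-8:

[\x{0591}-\x{05F4}]

Here's an example of a regex match in PHP:

echo preg_match('/[\x{0591}-\x{05F4}]/u', $string);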
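
The two character classes above can be combined into one check. What follows is a minimal sketch, not a definitive implementation: the function name `containsHebrew` is hypothetical, and it assumes that any string which is not valid UTF-8 is ISO-8859-8. The UTF-8 validity test comes first because bytes E0 to FA also occur as lead bytes of multi-byte UTF-8 sequences, so the single-byte class alone would give false positives on UTF-8 text.

```php
<?php
// Hypothetical helper. Assumption: input that is not valid UTF-8 is ISO-8859-8.
function containsHebrew(string $s): bool
{
    // An empty pattern with the u modifier matches only if $s is valid UTF-8;
    // on invalid UTF-8, preg_match() returns false instead of 1.
    if (preg_match('//u', $s) === 1) {
        // Valid UTF-8: test by code point, U+0591 to U+05F4.
        return preg_match('/[\x{0591}-\x{05F4}]/u', $s) === 1;
    }

    // Not valid UTF-8: treat as single-byte text and test bytes 0xE0 to 0xFA.
    return preg_match('/[\xE0-\xFA]/', $s) === 1;
}

var_dump(containsHebrew("\xD7\x90\xD7\x91")); // UTF-8 alef, bet
var_dump(containsHebrew("\xE0\xE1"));         // ISO-8859-8 alef, bet
var_dump(containsHebrew("hello"));            // plain ASCII
```

Because this is a heuristic, a string in some other single-byte encoding (ISO-8859-1, for example) that uses bytes in the E0 to FA range would also be reported as Hebrew.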
Andomar answered Sep 23 '22 16:09


If your PHP file is saved as UTF-8, as it should be when it contains Hebrew text, you can use the Unicode script property with the u modifier:

$string="אבהג";
echo preg_match("/\p{Hebrew}/u", $string);
// output: 1
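
`\p{Hebrew}` with the `u` modifier only works on UTF-8 input. If the string is known to be ISO-8859-8, one option is to convert it first and then reuse the same pattern. A minimal sketch, assuming the `iconv` extension is available; the function name `isoContainsHebrew` is hypothetical:

```php
<?php
// Hypothetical helper. Assumption: $iso is ISO-8859-8 encoded and iconv is available.
function isoContainsHebrew(string $iso): bool
{
    // Convert the single-byte string to UTF-8; iconv() returns false on failure.
    $utf8 = iconv('ISO-8859-8', 'UTF-8', $iso);
    if ($utf8 === false) {
        return false;
    }

    // The converted string can now be matched with the Unicode script property.
    return preg_match('/\p{Hebrew}/u', $utf8) === 1;
}

// "\xE0\xE1\xE2" is alef, bet, gimel in ISO-8859-8.
var_dump(isoContainsHebrew("\xE0\xE1\xE2")); // bool(true)
var_dump(isoContainsHebrew("abc"));          // bool(false)
```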
talsibony answered Sep 25 '22 16:09


Here's a small function that checks whether the first character of a UTF-8 string is a Hebrew letter:

function IsStringStartsWithHebrew($string)
{
    return (strlen($string) > 1 && // a Hebrew letter takes two bytes in UTF-8
        ord($string[0]) == 215 && // first byte is 0xD7 (binary 110-10111)
        ord($string[1]) >= 144 && ord($string[1]) <= 170 // second byte 0x90-0xAA, i.e. U+05D0 (alef) to U+05EA (tav)
        );
}
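
The byte values in the function follow from how UTF-8 encodes two-byte characters: the first byte carries the top five bits of the code point and the second byte the low six. A small sketch of that arithmetic (the helper name `decodeTwoByteUtf8` is hypothetical, and it assumes the input really starts with a two-byte sequence):

```php
<?php
// Hypothetical helper. Assumption: $bytes starts with a two-byte UTF-8 sequence
// of the form 110xxxxx 10xxxxxx; no validation is done here.
function decodeTwoByteUtf8(string $bytes): int
{
    $b0 = ord($bytes[0]);
    $b1 = ord($bytes[1]);

    // Drop the 110 prefix (keep 5 bits) and the 10 prefix (keep 6 bits),
    // then join the two parts into one code point.
    return (($b0 & 0x1F) << 6) | ($b1 & 0x3F);
}

printf("U+%04X\n", decodeTwoByteUtf8("\xD7\x90")); // U+05D0 (alef): bytes 215, 144
printf("U+%04X\n", decodeTwoByteUtf8("\xD7\xAA")); // U+05EA (tav):  bytes 215, 170
```

Note that the function above covers only the letters U+05D0 to U+05EA. Hebrew points and punctuation elsewhere in the block (some of which start with byte 0xD6) are not matched.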

good luck :)

Roey answered Sep 22 '22 16:09