
How do I preg_match for words in Hebrew?

Tags:

php

preg-match

I need a function in PHP that matches whole words in Hebrew.

Please help.

asked Dec 16 '09 18:12 by Haim Bender



2 Answers

Try this regular expression, which uses the Unicode character property for the Hebrew script:

/\p{Hebrew}+/u
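A minimal sketch of how that pattern might be used, assuming a UTF-8 subject string. The function name hebrewWords is hypothetical, not part of PHP. preg_match_all() returns false on failure (for example, when the subject is not valid UTF-8 under the u modifier), so the sketch checks for that before reading the matches.

```php
<?php
// Hypothetical helper: returns every run of Hebrew-script characters in $text.
// \p{Hebrew} is the Unicode "Hebrew" script property; the /u modifier makes
// PCRE treat pattern and subject as UTF-8, so each multi-byte Hebrew letter
// is matched as one character rather than as separate bytes.
function hebrewWords(string $text): array
{
    if (preg_match_all('/\p{Hebrew}+/u', $text, $matches) === false) {
        // preg_match_all() returns false on error, e.g. invalid UTF-8 input.
        throw new InvalidArgumentException('preg_match_all failed (is the input valid UTF-8?)');
    }
    return $matches[0]; // full-pattern matches only
}

print_r(hebrewWords('Hello שלום world, עולם!'));
```

With the default PREG_PATTERN_ORDER flag, $matches[0] holds the full matches, so a string with no Hebrew in it simply yields an empty array.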
answered Nov 07 '22 05:11 by Gumbo



Assuming your source data is UTF-8 encoded:

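<?php
$input = "ט״סת תעסתינג O״ת סOמע העברעו תעחת";

// Match runs of characters in the Hebrew Unicode block (U+0590 to U+05FF).
// The /u modifier makes PCRE treat the pattern and subject as UTF-8.
preg_match_all( "/[\\x{0590}-\\x{05FF}]+/u", $input, $matches );

echo '<pre>';
print_r( $matches );
echo '</pre>';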

This yields:

Array
(
    [0] => Array
        (
            [0] => ט״סת
            [1] => תעסתינג
            [2] => ״ת
            [3] => ס
            [4] => מע
            [5] => העברעו
            [6] => תעחת
        )

)

I based the range U+0590 through U+05FF on this Unicode chart (edit: I found more good Hebrew/Unicode info here). I used this to generate my sample input. Since I don't know Hebrew, I can't actually verify that the matched output is valid.
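One difference between the two answers is worth knowing, shown here as a sketch: the range U+0590 to U+05FF covers only the main Hebrew block, whereas \p{Hebrew} follows the Unicode script property, which also includes Hebrew presentation forms such as U+FB2A (HEBREW LETTER SHIN WITH SHIN DOT) in the Alphabetic Presentation Forms block. The comparison assumes PHP 7 or later for the "\u{...}" string escape; the variable names are illustrative only.

```php
<?php
// A letter inside the main Hebrew block: both patterns should match it.
$alef = "\u{05D0}";            // HEBREW LETTER ALEF
// A Hebrew presentation form outside U+0590..U+05FF.
$shinWithShinDot = "\u{FB2A}"; // HEBREW LETTER SHIN WITH SHIN DOT

// Block-range pattern (as in the answer above) vs. script-property pattern.
$rangePattern  = '/^[\x{0590}-\x{05FF}]+$/u';
$scriptPattern = '/^\p{Hebrew}+$/u';

$alefByRange   = preg_match($rangePattern, $alef);             // 1
$alefByScript  = preg_match($scriptPattern, $alef);            // 1
$shinByRange   = preg_match($rangePattern, $shinWithShinDot);  // 0: outside the block
$shinByScript  = preg_match($scriptPattern, $shinWithShinDot); // 1: Hebrew script

var_dump($alefByRange, $alefByScript, $shinByRange, $shinByScript);
```

Which behavior is preferable depends on the data; most everyday Hebrew text stays inside the main block, so the two patterns usually agree.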

You may need to tweak it, but hopefully this gets you headed in the right direction.
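One possible tweak, since the question asks about full words: to test whether a particular Hebrew word occurs on its own rather than inside a longer word, the word can be wrapped in lookarounds that forbid an adjacent Hebrew-script character. This is a sketch under that assumption; containsHebrewWord is a hypothetical name, and preg_quote() escapes any regex metacharacters in the word.

```php
<?php
// Hypothetical helper: true if $word occurs in $text with no Hebrew-script
// character immediately before or after it (i.e. as a whole Hebrew word).
// Lookarounds are used instead of \b so the check does not depend on how
// \b classifies non-ASCII letters.
function containsHebrewWord(string $text, string $word): bool
{
    $pattern = '/(?<!\p{Hebrew})' . preg_quote($word, '/') . '(?!\p{Hebrew})/u';
    $result = preg_match($pattern, $text);
    if ($result === false) {
        // preg_match() returns false on error, e.g. invalid UTF-8 input.
        throw new InvalidArgumentException('preg_match failed (are text and word valid UTF-8?)');
    }
    return $result === 1;
}

var_dump(containsHebrewWord('שלום עולם', 'שלום'));   // bool(true)
var_dump(containsHebrewWord('שלומות עולם', 'שלום')); // bool(false): only a prefix
```

Because the vowel points (niqqud) are themselves Hebrew-script characters, text written with niqqud would only match a $word that carries the same marks.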

answered Nov 07 '22 05:11 by Peter Bailey
