 

Regular expression - preg_match Latin and Greek characters [duplicate]

I am trying to create a regular expression for any given string.

Goal: remove ALL characters that are not "latin", "lowercase greek" or "numbers".

What I have done so far: [^a-z0-9]
This works perfectly for latin characters.

When I try [^a-z0-9α-ω], I have no luck: it seems to work, but it also leaves out other symbols like !!#$%@%#$@,`

My knowledge is limited when it comes to regexp. Any help would be much appreciated!

EDIT:
Below is the function that removes the characters matched by the pattern and creates a slug from the string, using a dash as the separator:

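        // Wrapper name and signature are assumed; the original post shows only the body.
        function make_slug($str, $separator = '-', $lowercase = FALSE)
        {
            $q_separator = preg_quote($separator, '#');
            $trans = array(
                '&.+?;'                 => '',          // drop HTML entities such as &amp;
                '[^a-z0-9 -]'           => '',          // keep only latin letters (any case, due to the i flag), digits, spaces, dashes
                '\s+'                   => $separator,  // turn runs of whitespace into the separator
                '('.$q_separator.')+'   => $separator   // collapse repeated separators into one
            );

            $str = strip_tags($str);

            foreach ($trans as $key => $val) {
                $str = preg_replace('#'.$key.'#i', $val, $str);
            }

            if ($lowercase === TRUE) {
                $str = strtolower($str);
            }

            return trim($str, $separator);
        }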

So if the string is: OnCE upon a tIME !#% @$$ in MEXIco
the function (with lowercasing enabled) outputs: once-upon-a-time-in-mexico

This works fine, but I want the preg_replace pattern to also exclude Greek characters.

mallix asked Dec 22 '25 10:12



1 Answers

OK, can this replace your function?

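    $subject = 'OnCEΨΩ é-+@àupon</span> aαθ tIME !#%@$ in MEXIco in the year 1874 <or 1875';

    // Strip tags, lowercase, then replace every run of excluded characters with the separator.
    function format($str, $excludeRE = '/[^a-z0-9]+/u', $separator = '-') {
        $str = strip_tags($str);                           // remove HTML tags
        $str = strtolower($str);                           // lowercase the latin letters
        $str = preg_replace($excludeRE, $separator, $str); // each run of other characters becomes one separator
        $str = trim($str, $separator);                     // drop leading/trailing separators
        return $str;
    }
    echo format($subject); // once-upon-a-time-in-mexico-in-the-year-1874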

Note that, because of strip_tags, you will lose every character from a < up to the next > (or to the end of the string if there is no >).
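A short sketch of that strip_tags() caveat, using two fragments taken from the example $subject above (the variable names are only for illustration):

```php
<?php
// A closed tag is removed on its own; an unclosed "<" swallows
// everything up to the end of the string.

$closed   = strip_tags('upon</span> a time');
$unclosed = strip_tags('in the year 1874 <or 1875');

var_dump($closed);   // string(11) "upon a time"
var_dump($unclosed); // string(17) "in the year 1874 "
```

This is why "1875" never reaches the slug in the example above.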


// Old answer, from when I thought you wanted to preserve Greek characters

It's possible to build a character range such as α-ω, or with any other characters you want! Your pattern doesn't work because you don't tell the regex engine it is dealing with a Unicode (UTF-8) string, so it reads α and ω as separate bytes rather than as single characters. To fix that, add the u modifier at the end of the pattern, like this:

/[^a-z0-9α-ω]+/u
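A minimal sketch of what the u modifier changes, assuming the PHP source file is saved as UTF-8 (so α, ω and π are stored as two-byte sequences). The input string and variable names are illustrative only:

```php
<?php
// Assumes this file is UTF-8: α = CE B1, ω = CF 89, π = CF 80.
$input = 'π!';

// Without /u, PCRE reads the class byte by byte: \xCE, the range \xB1-\xCF, \xE89.
// π loses its second byte (\x80), leaving an invalid UTF-8 fragment.
$bytes = preg_replace('/[^a-z0-9α-ω]/', '', $input);

// With /u, α-ω is the code point range U+03B1..U+03C9, so π is kept.
$unicode = preg_replace('/[^a-z0-9α-ω]/u', '', $input);

var_dump(bin2hex($bytes)); // string(2) "cf"
var_dump($unicode);        // string(2) "π"
```

In both cases the "!" is removed; only the /u version keeps π intact.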

You can also use the characters' hexadecimal code points:

/[^a-z0-9\x{3B1}-\x{3C9}]+/u 
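A small check, under the same UTF-8 assumption, that the hexadecimal form selects the same range as α-ω. One consequence worth knowing: accented lowercase letters such as ά (U+03AC) and ώ (U+03CE) lie outside U+03B1..U+03C9, so both forms remove them. The sample string is illustrative:

```php
<?php
// \x{3B1}-\x{3C9} is the same range as α-ω (U+03B1..U+03C9).
$input = 'αβγ άώ ω9!';

$literal = preg_replace('/[^a-z0-9α-ω]+/u', '', $input);
$hex     = preg_replace('/[^a-z0-9\x{3B1}-\x{3C9}]+/u', '', $input);

var_dump($literal === $hex); // bool(true)
echo $hex, "\n";             // αβγω9 (ά and ώ were dropped)
```

If accented letters must survive, they have to be added to the class explicitly.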

Note that if you are sure your string contains no uppercase Greek characters, or you want to preserve them as well, you can use the Unicode script property \p{Greek} like this:

/[^a-z0-9\p{Greek}]+/u

(It's a little longer than α-ω, but more explicit.)
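\p{Greek} matches every character of the Greek script, so besides α-ω it also accepts uppercase letters and accented ones, which is why the note about uppercase characters matters. A sketch comparing the two patterns on an illustrative string (assumes UTF-8 source):

```php
<?php
// Σ (U+03A3), Ω (U+03A9) and ά (U+03AC) are Greek script but outside α-ω.
$input = 'ΣΩ άα abc!';

$range  = preg_replace('/[^a-z0-9α-ω]+/u', '', $input);
$script = preg_replace('/[^a-z0-9\p{Greek}]+/u', '', $input);

echo $range, "\n";  // αabc
echo $script, "\n"; // ΣΩάαabc
```

So α-ω is the stricter choice when only unaccented lowercase Greek should survive.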

Casimir et Hippolyte answered Dec 23 '25 22:12



