
UTF-8 in PHP regular expressions [duplicate]

Tags:

regex

php

utf-8

I need help with regular expressions. My string contains Unicode characters and the code below doesn't work.

The first four characters must be digits, followed by a comma and then any alphabetic characters or whitespace. I have already read that adding /u to the end of the regular expression should help, but it didn't work for me.

My code works with non-Unicode characters:

$post = '9999,škofja loka';
echo preg_match('/^[0-9]{4},[\s]*[a-zA-Z]+/', $post);

Thanks for your answers!

Gasper asked Jun 20 '11 07:06


1 Answer

Updated answer:
This is now tested and working

$post = '9999, škofja loka';
echo preg_match('/^\\d{4},[\\s\\p{L}]+$/u', $post);

\w will not work here: it does not cover all Unicode letters, and in addition to letters it also matches digits and the underscore ([0-9_]).
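
As a small illustration of the second half of that point, the sketch below checks a made-up sample value against \w and against \p{L}. The variable name $sample and its content are assumptions for the example, not part of the original question.

```php
<?php
// Hypothetical sample: ASCII letters, an underscore and digits.
$sample = 'abc_123';

// \w accepts letters, digits and the underscore, so the whole string matches.
var_dump(preg_match('/^\w+$/', $sample));      // int(1)

// \p{L} accepts letters only, so the underscore and digits are rejected.
var_dump(preg_match('/^\p{L}+$/u', $sample));  // int(0)
```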

The u modifier is also important: it activates Unicode mode, so the pattern and the subject are treated as UTF-8.
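
A minimal sketch of what the u modifier changes, assuming the script file itself is saved as UTF-8. It also shows why adding u on its own did not help in the question: the class [a-zA-Z] still contains only ASCII letters.

```php
<?php
// 'š' is a single character but two bytes in UTF-8.
$letter = 'š';

// Without u, "." matches exactly one byte, so a two-byte character fails.
var_dump(preg_match('/^.$/', $letter));   // int(0)

// With u, "." matches one whole code point.
var_dump(preg_match('/^.$/u', $letter));  // int(1)

// The u modifier does not widen [a-zA-Z]; 'š' is still outside the class.
var_dump(preg_match('/^[a-zA-Z]+$/u', 'škofja')); // int(0)

// \p{L} together with u matches any Unicode letter.
var_dump(preg_match('/^\p{L}+$/u', 'škofja'));    // int(1)
```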

If letters and whitespace can appear in any order after the comma, put both into the same character class. Your regex allows zero or more whitespace characters after the comma, followed by letters only, so a space between two words can never match.
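
The difference between the two layouts can be sketched as follows. The names $sequential and $combined are illustrative only, and both patterns are anchored with $ so that the whole string has to match.

```php
<?php
$post = '9999, škofja loka';

// Whitespace first, then letters only: the space between the two words
// cannot be matched, so the anchored pattern fails.
$sequential = '/^\d{4},\s*\p{L}+$/u';

// Whitespace and letters in one class: they may appear in any order.
$combined = '/^\d{4},[\s\p{L}]+$/u';

var_dump(preg_match($sequential, $post)); // int(0)
var_dump(preg_match($combined, $post));   // int(1)
```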

See http://www.regular-expressions.info/php.html for details on PHP regular expressions.

\p{L} is the Unicode property that matches any letter.

The end-of-string anchor $ is also important: it ensures that the complete string is verified. Without it, the pattern can succeed after matching only the first whitespace character and ignore the rest of the string.
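
A short sketch of the effect of the anchor, using a made-up invalid input ($bad is an assumption for the example). Without $, the match stops after the space and still reports success.

```php
<?php
// Hypothetical invalid input: digits where letters are expected.
$bad = '9999, 1234';

$unanchored = '/^\d{4},[\s\p{L}]+/u';
$anchored   = '/^\d{4},[\s\p{L}]+$/u';

// Succeeds, but only "9999, " was matched; the trailing digits are ignored.
var_dump(preg_match($unanchored, $bad, $m)); // int(1)
var_dump($m[0]);                             // string(6) "9999, "

// With $, the trailing digits make the whole match fail.
var_dump(preg_match($anchored, $bad));       // int(0)
```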

stema answered Oct 20 '22 14:10