I have the following regex that I have been using successfully:
preg_match_all('/(\d+)\n(\w.*)\n(\d{3}\.\d{3}\.\d{2})\n(\d.*)\n(\d.*)/', $text, $matches)
However, I have just found that if the text that the (\w.*) part matches starts with a foreign character such as Ä, then it doesn't match anything.
Can anyone help me with what the correct pattern should be instead of (\w.*) to match a string that starts with any character?
Many thanks
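If you do want to match umlauts, then add the /u modifier to the regex, or use \pL in place of \w (ideally together with /u, since the PHP manual documents \p as a UTF-8 mode feature). That will allow the regex to match letters outside of the ASCII range. Note that \pL matches letters only, so unlike \w it will not match a leading digit or underscore.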
Reference: http://www.regular-expressions.info/unicode.html
and http://php.net/manual/en/regexp.reference.unicode.php
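As a rough sketch of the suggestion above, here is the pattern from the question with the /u modifier added, plus a \pL variant. The sample $text is made up for illustration (the asker's real input is not shown), and "\u{00C4}" is PHP 7+ string syntax for Ä, used so the example does not depend on the file's encoding.

```php
<?php
// Hypothetical sample input; the real $text comes from the asker's data.
// "\u{00C4}" is the UTF-8 encoding of Ä (PHP 7+ escape syntax).
$text = "12\n\u{00C4}pfel und Birnen\n123.456.78\n10\n20";

// Pattern from the question with /u added: pattern and subject are
// treated as UTF-8, so \w can match letters outside the ASCII range.
$withU = '/(\d+)\n(\w.*)\n(\d{3}\.\d{3}\.\d{2})\n(\d.*)\n(\d.*)/u';

// Variant using \pL (any Unicode letter) in place of \w, also with /u.
// Unlike \w, this requires the line to start with a letter.
$withPL = '/(\d+)\n(\pL.*)\n(\d{3}\.\d{3}\.\d{2})\n(\d.*)\n(\d.*)/u';

$countU  = preg_match_all($withU, $text, $matchesU);
$countPL = preg_match_all($withPL, $text, $matchesPL);

// Group 2 holds the line that starts with Ä.
echo $countU, ' ', $matchesU[2][0], "\n";
echo $countPL, ' ', $matchesPL[2][0], "\n";
```

If the second line may also start with a digit or underscore, as \w allowed in the original pattern, the /u version of \w is probably the closer replacement.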
Ä is a German umlaut, if I am not mistaken. \w matches (in most flavors) [a-zA-Z0-9_].
You will need to match the Unicode range of characters that you want. \x{00C4} (PHP) equals the character you want. You will probably need to create a character class to support your Unicode characters.
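A minimal sketch of the character-class approach might look like the following. The set of code points (the German umlauts and ß) is an assumption chosen for illustration and should be extended to whatever characters the input can contain; the sample $text is likewise hypothetical. The /u modifier is included so that \x{00C4} is compared against the UTF-8 character rather than a single byte.

```php
<?php
// Hypothetical sample input; "\u{00C4}" is the UTF-8 encoding of Ä (PHP 7+).
$text = "12\n\u{00C4}pfel\n123.456.78\n10\n20";

// ASCII word characters plus an assumed set of German letters:
// Ä Ö Ü ä ö ü ß. Extend this class for other languages as needed.
$first = '[a-zA-Z0-9_\x{00C4}\x{00D6}\x{00DC}\x{00E4}\x{00F6}\x{00FC}\x{00DF}]';

// Same structure as the pattern in the question, with the explicit class
// replacing \w. The /u modifier makes \x{...} refer to UTF-8 characters.
$pattern = '/(\d+)\n(' . $first . '.*)\n(\d{3}\.\d{3}\.\d{2})\n(\d.*)\n(\d.*)/u';

$count = preg_match_all($pattern, $text, $matches);

// Group 2 holds the line that starts with Ä.
echo $count, ' ', $matches[2][0], "\n";
```

An explicit class like this gives tight control over which leading characters are accepted, at the cost of having to list every character by hand.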