I'm trying to strip all characters from a string except:
alphanumeric characters
$
_
Unicode characters between U+0080 and U+FFFF
I've got the first three conditions by doing this:
preg_replace('/[^a-zA-Z\d$_]+/', '', $foo);
How do I go about matching the fourth condition? I looked at using \X
but there has to be a better way than listing out 65000+ characters.
This will make your regular expressions work with all Unicode regex engines. In addition to the standard notation, \p{L}, Java, Perl, PCRE, the JGsoft engine, and XRegExp 3 allow you to use the shorthand \pL. The shorthand only works with single-letter Unicode properties.
To specify a range of characters, use square brackets and separate the starting character from the ending character with a hyphen. For example, [0-9] matches any digit. Several ranges can be put inside one pair of square brackets. For example, [A-CX-Z] matches 'A' or 'B' or 'C' or 'X' or 'Y' or 'Z'.
\u000d — Carriage return — \r. \u2028 — Line separator. \u2029 — Paragraph separator.
\p{L} matches a single code point in the category "letter". \p{N} matches any kind of numeric character in any script.
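As a rough illustration of the \p{L} and \p{N} properties described above, here is a minimal PHP sketch that keeps only letters and numbers from any script. The function name keep_letters_and_numbers is hypothetical, and the sketch assumes PHP 7 or later (for the "\u{...}" string escapes) and UTF-8 input.

```php
<?php
// Hypothetical helper: keeps only Unicode letters (\p{L}) and numbers (\p{N}).
function keep_letters_and_numbers(string $s): string
{
    // The /u modifier makes PCRE treat the pattern and subject as UTF-8.
    $out = preg_replace('/[^\p{L}\p{N}]+/u', '', $s);
    if ($out === null) {
        // preg_replace returns null on failure, for example on invalid UTF-8.
        throw new InvalidArgumentException('preg_replace failed');
    }
    return $out;
}

// U+00E9 is a Latin letter, U+0416 a Cyrillic letter, U+0663 an Arabic-Indic digit.
echo keep_letters_and_numbers("abc-123 \u{00E9}\u{0416}\u{0663}!"), "\n";
```

The hyphen, space, and exclamation mark are removed, while the non-ASCII letters and the digit are kept. Note that this is broader than what the question asks for, since it does not keep $ or _ and is not limited to U+FFFF.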
You can use:
$foo = preg_replace('/[^\w$\x{0080}-\x{FFFF}]+/u', '', $foo);
\w - is equivalent to [a-zA-Z0-9_]
\x{0080}-\x{FFFF} - matches characters between code points U+0080 and U+FFFF
/u - enables Unicode support in the regex
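To check the pattern from this answer against a sample string, a small sketch along these lines may help. The wrapper name strip_disallowed and the sample input are assumptions made for illustration; the sketch assumes PHP 7 or later and UTF-8 input.

```php
<?php
// Hypothetical wrapper around the pattern from the answer.
function strip_disallowed(string $s): string
{
    // Keep word characters, "$", and code points U+0080..U+FFFF; remove the rest.
    $out = preg_replace('/[^\w$\x{0080}-\x{FFFF}]+/u', '', $s);
    if ($out === null) {
        // preg_replace returns null on failure, for example on invalid UTF-8.
        throw new InvalidArgumentException('preg_replace failed');
    }
    return $out;
}

// U+00E9 lies inside the allowed range; U+1F600 lies above U+FFFF and is removed.
$input = 'a-b $c_d!' . "\u{00E9}\u{1F600}";
echo strip_disallowed($input), "\n";
```

The pattern is written in single quotes so that PHP does not try to interpolate the $ or interpret the \x escapes before PCRE sees them.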