PHP Regular expression - Remove all non-alphanumeric characters

Tags: regex, replace, php, utf-8

I use PHP.

My string can look like this

This is a string-test width åäö and some über+strange characters: _like this?

Question

Is there a way to remove non-alphanumeric characters and replace them with a space? Here are some non-alphanumeric characters: - + : _ ?

I've read many threads about this, but the solutions don't support other languages, like this one:

preg_replace("/[^A-Za-z0-9 ]/", '', $string);

Requirements

My list of non-letter characters might not be complete.
My content contains characters from different languages, like åäöü. There could be many more.
The non-alphanumeric characters should be replaced with a space; otherwise the words would be glued to each other.

asked May 07 '13 19:05

Jens Törnell

1 Answer

You can try this:

preg_replace('~[^\p{L}\p{N}]++~u', ' ', $string);

\p{L} stands for all alphabetic characters (whatever the alphabet).

\p{N} stands for numbers.

With the u modifier, the pattern and the subject string are treated as UTF-8, so multibyte characters are matched as whole characters rather than as separate bytes.
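As a minimal sketch of how that pattern behaves on the string from the question: the helper name `replace_non_alnum` is hypothetical (it is not part of the original answer), and the example assumes the source file is saved as UTF-8.

```php
<?php
// Hypothetical helper: replaces every run of characters that are neither
// Unicode letters (\p{L}) nor Unicode numbers (\p{N}) with a single space.
function replace_non_alnum(string $text): string
{
    // With the u modifier, preg_replace() returns null when $text is not valid UTF-8.
    $result = preg_replace('~[^\p{L}\p{N}]++~u', ' ', $text);
    if ($result === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }
    return $result;
}

$input = 'This is a string-test width åäö and some über+strange characters: _like this?';
var_dump(replace_non_alnum($input));
// string: "This is a string test width åäö and some über strange characters like this "
```

Note that the final "?" becomes a trailing space, and that ": _" collapses into one space because the quantifier matches the whole run. Wrapping the call in trim() would remove the trailing space if it is unwanted.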

Or this:

preg_replace('~\P{Xan}++~u', ' ', $string);

\p{Xan} matches Unicode letters and digits.

\P{Xan} matches everything that is not a Unicode letter or digit. (Be careful: this includes whitespace, so runs of whitespace are replaced too. To leave whitespace untouched, use ~[^\p{Xan}\s]++~u instead.)
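The whitespace difference between the two patterns can be seen in a small sketch like this one; the sample text and variable names are made up for illustration.

```php
<?php
// Sample text with a double space, a newline and a hyphen.
$text = "first  line\nsecond-line";

// \P{Xan} also matches whitespace, so the double space and the newline
// are each replaced by a single space.
$collapsed = preg_replace('~\P{Xan}++~u', ' ', $text);

// Adding \s to the negated class leaves existing whitespace untouched;
// only the hyphen is replaced.
$preserved = preg_replace('~[^\p{Xan}\s]++~u', ' ', $text);

var_dump($collapsed); // "first line second line"
var_dump($preserved); // "first  line\nsecond line" (double space and newline kept)
```

Which variant is preferable depends on whether line breaks and repeated spaces in the input carry meaning.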

If you want a more specific set of allowed letters, replace \p{L} with explicit character ranges from the Unicode table.

Example:

preg_replace('~[^a-zÀ-ÖØ-öÿŸ\d]++~ui', ' ', $string);
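To illustrate the effect of restricting the allowed letters, here is a rough sketch comparing the range-based pattern with the \p{L} version. The function name `keep_latin_letters` and the sample string are hypothetical, and the file is assumed to be saved as UTF-8 so that the literal characters in the pattern are valid.

```php
<?php
// Hypothetical helper: keeps only ASCII letters, the accented Latin letters
// listed in the ranges, and digits; every other run becomes one space.
function keep_latin_letters(string $text): string
{
    // Fall back to an empty string if preg_replace() fails (for example on invalid UTF-8).
    return preg_replace('~[^a-zÀ-ÖØ-öÿŸ\d]++~ui', ' ', $text) ?? '';
}

$sample = 'Привет åäö-test 42';

// The Cyrillic word is outside the listed ranges, so it is replaced.
var_dump(keep_latin_letters($sample));                       // " åäö test 42"

// With \p{L}, letters from any alphabet are kept.
var_dump(preg_replace('~[^\p{L}\p{N}]++~u', ' ', $sample));  // "Привет åäö test 42"
```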

Why use a possessive quantifier (++) here?

~\P{Xan}+~u gives the same result as ~\P{Xan}++~u. The difference is that with the first pattern the engine records each backtracking position (which we don't need), whereas with the second it doesn't (just as in an atomic group). The result is a small performance gain.
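A short sketch can make this concrete. The first part checks that both quantifiers produce the same replacement here; the second part uses a deliberately contrived pattern (not from the question) where giving up backtracking positions changes the outcome.

```php
<?php
$text = 'über+strange: _chars?';

// For this replacement the greedy and possessive forms behave identically.
$greedy     = preg_replace('~\P{Xan}+~u', ' ', $text);
$possessive = preg_replace('~\P{Xan}++~u', ' ', $text);
var_dump($greedy === $possessive); // bool(true)

// Contrived contrast: a+ can give back one "a" so that the following "ab" matches.
$greedyMatch = preg_match('~a+ab~', 'aaab');      // 1

// a++ keeps every "a" it consumed, so the following "a" can never match.
$possessiveMatch = preg_match('~a++ab~', 'aaab'); // 0

var_dump($greedyMatch, $possessiveMatch);
```

In the replacement pattern nothing follows the quantifier, so there is never a reason to backtrack, which is why the possessive form is safe there.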

I think it's good practice to use possessive quantifiers and atomic groups whenever possible.

However, the PCRE engine automatically makes a quantifier possessive in obvious situations (example: a+b => a++b), unless this optimization has been disabled with the PCRE_NO_AUTO_POSSESS option. (http://www.pcre.org/pcre.txt)

More information about possessive quantifiers and atomic groups can be found in the PCRE documentation linked above.

answered Oct 30 '22 10:10

Casimir et Hippolyte
