Assuming I have a string which is <code>"a s d d"</code> and <code>htmlentities</code> turns it into <code>"a&nbsp;s&nbsp;d&nbsp;d"</code>. How can I replace the non-breaking spaces (using <code>preg_replace</code>) without first encoding the string to entities? I tried <code>preg_replace('/[\xa0]/', '', $string);</code>, but it's not working. I'm trying to remove those special characters from my string as I don't need them. What are the possibilities beyond regexp? Edit: The string I want to parse: http://pastebin.com/raw/7eNT9sZr with the function <code>preg_replace('/[\r\n]+/', "[##]", $text)</code> for a later <code>implode("", explode("[##]", $text))</code>. My question is not exactly "how" to do this (since I could encode entities, remove the entities I don't need, and decode entities), but how to remove those characters with just <code>str_replace</code> or <code>preg_replace</code>.


How to replace decoded Non-breakable space (nbsp)

2 Answers

Problem Explanation

The reason why it's not working is that you are specifying the non-breaking space incorrectly.

The correct UTF-8 encoding of the non-breaking space (U+00A0) is 0xC2A0. It consists of two bytes, 0xC2 (194) and 0xA0 (160), so technically, your pattern specifies only the second half of the character's encoding.
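To make the two-byte point concrete, here is a minimal sketch that dumps the bytes of a short string before and after the pattern from the question. The variable names and the sample string are assumptions chosen for illustration.

```php
<?php
// The non-breaking space (U+00A0) written as its two UTF-8 bytes.
$nbsp = "\xC2\xA0";
$string = "a" . $nbsp . "s";

// Hex dump of the original bytes: 61 (a), c2 a0 (NBSP), 73 (s).
$original_hex = bin2hex($string);
echo $original_hex, "\n"; // 61c2a073

// Without the /u modifier the pattern works on single bytes, so it
// removes only 0xA0 and leaves a stray 0xC2 behind (invalid UTF-8).
$broken = preg_replace('/[\xa0]/', '', $string);
$broken_hex = bin2hex($broken);
echo $broken_hex, "\n"; // 61c273
```

The leftover 0xC2 byte is why the result looks wrong rather than simply unchanged.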

A Bit of Theory

Legacy character encodings used a constant number of bits to encode every character in their set. For example, the original ASCII encoding uses 7 bits per character, and extended ASCII uses 8 bits.

UTF-8 is a so-called variable-width character encoding, which means that the number of bits used to represent individual characters varies. In UTF-8, each character is encoded as one to four 8-bit bytes (octets). The length depends on the character's code point: ASCII characters take a single byte, while characters with higher code points take two, three, or four. This is loosely similar to Huffman coding, in that the characters most common in English text get the shortest codes, which helps reduce the size of predominantly ASCII text.
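The variable width is easy to observe with strlen, which counts bytes rather than characters. The following is a small sketch; the sample characters are arbitrary picks, and the "\u{...}" escape assumes PHP 7 or later.

```php
<?php
// One sample character for each UTF-8 encoded length.
$samples = [
    'U+0041 (A)'        => "\u{0041}",
    'U+00A0 (NBSP)'     => "\u{00A0}",
    'U+20AC (euro)'     => "\u{20AC}",
    'U+1F600 (emoji)'   => "\u{1F600}",
];

$lengths = [];
foreach ($samples as $label => $char) {
    // strlen() returns the number of bytes, not the number of characters.
    $lengths[] = strlen($char);
    echo $label, ': ', strlen($char), " byte(s), hex ", bin2hex($char), "\n";
}
```

The non-breaking space lands in the two-byte range, which is why a single-byte pattern cannot match it.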

Solution

You can replace all occurrences of the UTF-8 non-breaking space in text using a simple (and fast) str_replace or using a more flexible regular expression, depending on your needs:

// faster solution
$regular_spaces = str_replace("\xc2\xa0", ' ', $original_string);

// more flexible solution
$regular_spaces = preg_replace('/\xc2\xa0/', ' ', $original_string);
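As a possible variation on the regular expression, the /u modifier makes PCRE treat both pattern and subject as UTF-8, so the character can be matched by code point instead of by raw bytes. This is a sketch under the assumption that the input is valid UTF-8 (with /u, preg_replace returns null on invalid input); the sample string is made up.

```php
<?php
// Sample input: "a s d d" with non-breaking spaces between the letters.
$original_string = "a\xC2\xA0s\xC2\xA0d\xC2\xA0d";

// Byte-sequence match, as in the solution above.
$by_bytes = str_replace("\xC2\xA0", ' ', $original_string);

// Code-point match: \x{00A0} refers to U+00A0 when /u is set.
$by_code_point = preg_replace('/\x{00A0}/u', ' ', $original_string);

var_dump($by_bytes === 'a s d d');        // bool(true)
var_dump($by_bytes === $by_code_point);   // bool(true)
```

The code-point form can be combined with other Unicode-aware constructs in the same pattern, which is where the regular expression earns its flexibility.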

Notes

Note that in the case of str_replace, you have to use double quotes (") to enclose the search string, because str_replace doesn't understand the textual representation of character codes and needs those codes to be converted into actual characters first. PHP does that automatically: strings enclosed in double quotes are processed, and special sequences (e.g. the newline character \n, textual representations of character codes, etc.) are replaced by the actual characters (e.g. 0x0A for \n in UTF-8) before the string value is used.

In contrast, the preg_replace function (more precisely, the regular expression engine behind it) understands the textual representation of character codes itself, so you don't need PHP to convert them into actual characters, and you can use apostrophes (single quotes, ') to enclose the search string in this case.
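The quoting difference described in these notes can be checked directly. This is a minimal sketch with an assumed sample string and illustrative variable names.

```php
<?php
$original_string = "a\xC2\xA0s";

$double = "\xc2\xa0"; // PHP expands the escapes: two bytes
$single = '\xc2\xa0'; // no expansion: eight literal characters

echo strlen($double), "\n"; // 2
echo strlen($single), "\n"; // 8

// str_replace searches for the literal text \xc2\xa0 and finds nothing.
$unchanged = str_replace($single, ' ', $original_string);

// The regex engine interprets \xc2\xa0 itself, so single quotes work.
$via_regex = preg_replace('/\xc2\xa0/', ' ', $original_string);

var_dump($unchanged === $original_string); // bool(true)
var_dump($via_regex === 'a s');            // bool(true)
```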

answered Oct 11 '22 17:10

David Ferenczy Rogožan

Sanitize every type of whitespace.

preg_replace("/\s+/u", " ", $str);

https://stackoverflow.com/a/40264711/635364

FYI, PHP's filter_var() sanitization filters include no filter for these whitespace characters.
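One thing to keep in mind with \s is that it also matches line breaks, which matters if the text is later split on [\r\n]+ as in the question. The sketch below compares \s with \h (horizontal whitespace only). The sample string is an assumption, and the \s behavior relies on the /u modifier enabling Unicode character properties, which is the case in current PHP builds.

```php
<?php
// "a", two NBSPs, "s", a tab and a space, "d", CRLF, "d".
$str = "a\u{00A0}\u{00A0}s\t d\r\nd";

// \s with /u: collapses every whitespace run, line breaks included.
$all = preg_replace('/\s+/u', ' ', $str);

// \h with /u: collapses horizontal whitespace only, keeping \r\n.
$horizontal = preg_replace('/\h+/u', ' ', $str);

var_dump($all === 'a s d d');             // bool(true)
var_dump($horizontal === "a s d\r\nd");   // bool(true)
```

If line breaks need to survive for a later step, the \h variant is likely the safer choice.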

answered Oct 11 '22 17:10

Jehong Ahn


Tags:

php

special-characters

htmlspecialchars

Grzegorz
