Remove Unicode Zero Width Space PHP

Tags:

php

unicode

str-replace

I have some Burmese text encoded as UTF-8, and I am processing it with PHP. At some point along the way, some zero-width spaces (ZWSPs) crept in, and I would like to remove them. I have tried two different ways of removing the characters, and neither seems to work.

First I tried:

  $newBody = str_replace("&#8203;", "", $newBody);

to search for the HTML entity and remove it, since that is how the character appears in Web Inspector. The spaces are not removed. I also tried it without the trailing semicolon:

  $newBody = str_replace("&#8203", "", $newBody);

with the same result: nothing is removed.

The second method came from the question "Remove ZERO WIDTH NON-JOINER character from a string in PHP", which looked like this:

  $newBody = str_replace("\xE2\x80\x8C", "", $newBody);

but that had no effect either. The ZWSP was not removed.

An example word in the text ($newBody) looks like this: ယူ&#8203;က&#8203;ရိန်
And I want to make it look like this: ယူကရိန်း

Any ideas? Would preg_replace work better somehow?

So I did try

  $newBody = preg_replace("/\xE2\x80\x8B/", "", $newBody);

and it appears to work, but now there is another issue.

<a class="defined" title="Ukraine">ယူ&#8203;က&#8203;ရိန်း</a>

gets transformed into

<a class="defined _tt_t_" title="Ukraine" style="font-family: 'Masterpiece Uni Sans', TharLon, Myanmar3, Yunghkio, Padauk, Parabaik, 'WinUni Innwa', 'Win Uni Innwa', 'MyMyanmar Unicode', Panglong, 'Myanmar Sangam MN', 'Myanmar MN';">ယူကရိန်း</a>

I don't want all that extra markup added. Any ideas why this is happening? Apart from coming up with some way to target only the text between the tags, is there another way to prevent the preg_replace from adding all this extra stuff? By the way, I am using Google Chrome on a Mac. It seems to behave a bit differently in Firefox...


asked Mar 24 '14 02:03

Jimmy Long

1 Answer

This:

$newBody = str_replace("&#8203;", "", $newBody);

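presumes the text is HTML entity encoded, that is, that the string literally contains the characters &#8203; rather than the zero-width space itself. This: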

$newBody = str_replace("\xE2\x80\x8C", "", $newBody);

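would work if the offending characters were stored as raw UTF-8 rather than as entities, but it matches the wrong character: the bytes E2 80 8C encode U+200C (ZERO WIDTH NON-JOINER). &#8203; is U+200B (ZERO WIDTH SPACE), whose UTF-8 encoding is E2 80 8B, so to match it you need: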

$newBody = str_replace("\xE2\x80\x8B", "", $newBody);
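If it is not clear which form the input uses (Web Inspector may display an invisible raw character as an entity), both can be stripped in one pass. The following is only a sketch: the helper name stripZwsp is made up for illustration, and it assumes the input is UTF-8 and that only U+200B should be removed.

```php
<?php
// Hypothetical helper (not part of PHP or of the original answer): removes
// U+200B ZERO WIDTH SPACE whether it is present as raw UTF-8 bytes or as an
// HTML numeric character reference.
function stripZwsp(string $text): string
{
    // Raw UTF-8 encoding of U+200B.
    $text = str_replace("\xE2\x80\x8B", "", $text);

    // Decimal (&#8203;) and hexadecimal (&#x200B;) entity forms.
    // The pattern is pure ASCII, so no /u modifier is needed.
    return preg_replace('/&#(?:8203|x200B);/i', '', $text);
}

// Build the test inputs from the same parts so they differ only in the ZWSP.
$parts   = ["ယူ", "က", "ရိန်း"];
$clean   = implode("", $parts);
$raw     = implode("\xE2\x80\x8B", $parts); // ZWSP as raw bytes
$encoded = implode("&#8203;", $parts);      // ZWSP as entities

var_dump(stripZwsp($raw) === $clean);     // bool(true)
var_dump(stripZwsp($encoded) === $clean); // bool(true)

// The two characters differ only in the last byte:
echo bin2hex("\xE2\x80\x8B"), "\n"; // e2808b (U+200B, zero width space)
echo bin2hex("\xE2\x80\x8C"), "\n"; // e2808c (U+200C, zero width non-joiner)
```

Running bin2hex() on a short slice of the real $newBody shows which of the two forms is actually there, which is usually quicker than guessing from what the browser displays.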

answered Sep 24 '22 09:09

Jef
