
How to replace ideographic space in PHP string?

Tags:

php

The ideographic space (U+3000, http://www.charbase.com/3000-unicode-ideographic-space) is a CJK punctuation character. It looks like a normal space, but it takes up two positions on screen instead of one, as a Chinese character would.

I tried using str_replace(" ","",$mystring) to get rid of them, but of course it doesn't work, because the space I typed there is an ASCII space. I also tried entering the ideographic space manually with a Chinese input method, but that way it seems to strip part of the bytes of other characters as well, and the result is gibberish.

So how can I get rid of these spaces?

asked Oct 19 '22 20:10 by shenkwen


2 Answers

I was able to replace the character just fine by copying the symbol from the informational page you linked to. You might want to define a constant for the ideographic space to make the find/replace code clearer.

// contains ideographic space between words
$start = 'before after';                    

// contains ideographic space in needle parameter
$test1 = str_replace(' ', '_', $start);     

// contains ideographic space
define('ID_SPACE', ' ');                    
$test2 = str_replace(ID_SPACE, '&', $start);

// contains normal space in needle parameter
$test3 = str_replace(' ','_',$start);       

// make sure we are using utf8 for this test
header('Content-Type: text/html; charset=utf-8');

echo $start.'<br/>';
echo $test1.'<br/>';
echo $test2.'<br/>';
echo $test3;

output:

before after
before_after
before&after
before after

Edit in response to question

While you cannot see it, the character is displayed in the box shown. Click and drag to select it as you would any other text, and then paste it where needed. You can also copy the code from my answer, which contains the space. If you see garbled characters instead, you need to set your charset to UTF-8.
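A minimal sketch of why the ASCII space never matches, assuming the string is UTF-8 encoded and PHP 7.0 or later (for the "\u{...}" escape). The variable names are illustrative only. str_replace() compares raw bytes, so the needle has to be the same byte sequence as the character in the haystack. If the source file is saved in another encoding, a pasted needle becomes a different byte sequence.

```php
<?php
// U+3000 written as an escape (PHP 7.0+), so this file holds no literal ideographic space.
$idSpace = "\u{3000}";
$start = 'before' . $idSpace . 'after';

// In UTF-8 the character is three bytes: e3 80 80.
echo bin2hex($idSpace), "\n";

// An ASCII space is the single byte 0x20, which does not occur in $start,
// so this call leaves the string unchanged.
$unchanged = str_replace(' ', '_', $start);

// Matching the full three-byte sequence replaces the character.
$replaced = str_replace($idSpace, '_', $start);
echo $replaced, "\n";
```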


answered Nov 01 '22 14:11 by WebChemist


You can convert things from their escaped numeric values directly. I've had the following function sitting around for years. I didn't write it, and I'm afraid I don't recall where I found it. It's a bit of a hack, but a damn useful one I think.

<?php

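// Encode a Unicode code point as a UTF-8 byte string.
function code2utf($num) {
  // 1 byte: code points below U+0080 (ASCII)
  if($num<128)return chr($num);
  // 2 bytes: U+0080 to U+07FF
  if($num<2048)return chr(($num>>6)+192).chr(($num&63)+128);
  // 3 bytes: U+0800 to U+FFFF (surrogates are not rejected here)
  if($num<65536)return chr(($num>>12)+224).chr((($num>>6)&63)+128).chr(($num&63)+128);
  // 4 bytes: U+10000 up to 0x1FFFFF (valid Unicode ends at U+10FFFF)
  if($num<2097152)return chr(($num>>18)+240).chr((($num>>12)&63)+128).chr((($num>>6)&63)+128).chr(($num&63)+128);
  // Anything larger cannot be encoded
  return '';
}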

print "a" . code2utf(0x3000) . "b" . code2utf(0x1f44d) . "\n";

And when I run this, I see:

$ php -f utftest
a b👍

Note that what looks like two spaces is a single double-width character.
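For comparison, newer PHP versions can produce the same bytes without a helper function. A small sketch, assuming PHP 7.0+ for the escape syntax and PHP 7.2+ with the mbstring extension loaded for mb_chr(). The function_exists() guard is there because mbstring is not always installed.

```php
<?php
// Escape syntax in double-quoted strings, available since PHP 7.0; needs no extension.
$viaEscape = "\u{3000}";

// mb_chr() exists since PHP 7.2 when mbstring is loaded; null here means it is unavailable.
$viaMbChr = function_exists('mb_chr') ? mb_chr(0x3000, 'UTF-8') : null;

// The raw UTF-8 bytes, which work on any PHP version.
$viaBytes = "\xE3\x80\x80";

// Both spellings yield the same three-byte string.
var_dump($viaEscape === $viaBytes);
```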

Perhaps you can use the above function to construct your search string, like this (note that str_replace() returns the result rather than modifying its argument):

$mystring = str_replace(code2utf(0x3000),"",$mystring);

The obvious advantage of a solution like this over WebChemist's copy-and-paste solution is that it's entirely programmatic, and does not depend on any special capability of the programmer's editor or input method. You will not accidentally overwrite the ID_SPACE character when reformatting your code, and the function is reusable for other UTF-8 characters you might need to represent, without the need to actually have those characters within your code.
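In the same programmatic spirit, a regular expression with the u modifier can name the character by code point. This is a sketch rather than part of the original answer. It assumes the input is valid UTF-8 and that PCRE was built with Unicode property support, which is the usual case for PHP. The sample string is made up for illustration.

```php
<?php
// Sample input: an ideographic space, then an ASCII space.
$mystring = "a\u{3000}b c";

// Remove only U+3000; the u modifier makes PCRE treat pattern and subject as UTF-8.
$withoutIdeographic = preg_replace('/\x{3000}/u', '', $mystring);

// Alternatively, collapse every run of Unicode space separators (category Zs,
// which includes U+0020, U+00A0 and U+3000) into one ASCII space.
$normalized = preg_replace('/\p{Zs}+/u', ' ', "a\u{3000}\u{3000}b c");

echo $withoutIdeographic, "\n";
echo $normalized, "\n";
```

preg_replace() returns null if the subject is not valid UTF-8 under the u modifier, so it is worth checking the return value on untrusted input.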


Of course, another way you could do this is with the built-in PHP function html_entity_decode(). The following produces results identical to my function, using HTML numeric character references as input:

$ php -r 'print html_entity_decode("a&#x3000;b&#x1f44d;") . "\n";'
a b👍
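A hedged sketch of the same idea used as a str_replace() needle. The default flags and default charset of html_entity_decode() have changed between PHP versions, so this passes them explicitly; the choice of ENT_QUOTES | ENT_HTML5 is just one reasonable option, and the variable names are illustrative.

```php
<?php
// Decode the numeric character reference into UTF-8 bytes, with flags and charset spelled out.
$idSpace = html_entity_decode('&#x3000;', ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Build a test string from raw bytes so this file contains no literal ideographic space.
$mystring = "before\xE3\x80\x80after";

// str_replace() returns the new string; assign it back.
$mystring = str_replace($idSpace, '', $mystring);

echo $mystring, "\n";
```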
answered Nov 01 '22 12:11 by ghoti