I'm looking for a way to encode a URL that contains Unicode characters, specifically Khmer Unicode. I've tried PHP's built-in urlencode() function, and the result looks like this: http://www.example.com/?kwd=Mac+Book+Pro+នៅប្រទេសយើង
When I test the same query with Google search, the resulting URL is: https://www.google.com.kh/#hl=en&sclient=psy-ab&q=Mac+Book+Pro+%E1%9E%93%E1%9F%85%E1%9E%94%E1%9F%92%E1%9E%9A%E1%9E%91%E1%9F%81%E1%9E%9F%E1%9E%99%E1%9E%BE%E1%9E%84&oq=Mac+Book+Pro+%E1%9E%93%E1%9F%85%E1%9E%94%E1%9F%92%E1%9E%9A%E1%9E%91%E1%9F%81%E1%9E%9F%E1%9E%99%E1%9E%BE%E1%9E%84
How can I produce the same percent-encoded form? I hope someone here can help. Thanks in advance!
Unicode contains many characters that look similar to other characters. Allowing the full range of Unicode in a URL means that characters that look similar, or even identical, to other characters could be used to spoof users.
The urlencode() function is a built-in PHP function used to encode a URL. It returns a string in which all non-alphanumeric characters except -_. are replaced with a percent (%) sign followed by two hex digits, and spaces are encoded as plus (+) signs.
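As a minimal sketch of the behavior described above, the snippet below applies urlencode() to the keyword from the question. It assumes the string is stored as UTF-8 (for example, the source file is saved as UTF-8); the variable names are only illustrative.

```php
<?php
// Keyword from the question, assumed to be a UTF-8 string.
$keyword = "Mac Book Pro នៅប្រទេសយើង";

// urlencode() works byte by byte, so every UTF-8 byte of a Khmer
// character becomes one %XX triplet and each space becomes "+".
$encoded = urlencode($keyword);

// Build the full URL by appending the encoded value to the query string.
$url = "http://www.example.com/?kwd=" . $encoded;
echo $url, PHP_EOL;

// urldecode() reverses the process and restores the UTF-8 string.
$decoded = urldecode($encoded);
```

If the output still shows Khmer letters in the browser's address bar, that is usually the browser rendering the percent-encoded bytes for display; the value sent over the wire is the encoded form.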
Within a query string, you can encode a space as a plus sign instead of “%20”. For example, the URL “http://www.example.com/?kwd=Mac%20Book%20Pro” can also be written as http://www.example.com/?kwd=Mac+Book+Pro. This convention comes from form encoding and applies to the query string only; in the path part of a URL a plus sign is a literal plus, so use “%20” there.
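The two space conventions map onto two PHP functions. The sketch below compares urlencode() with rawurlencode(), and shows http_build_query() with the PHP_QUERY_RFC3986 flag as one way to get “%20” in a query string; the variable names are illustrative.

```php
<?php
$phrase = "Mac Book Pro";

// Form-style encoding: spaces become "+".
$formStyle = urlencode($phrase);      // Mac+Book+Pro

// RFC 3986 style encoding: spaces become "%20".
$rfcStyle = rawurlencode($phrase);    // Mac%20Book%20Pro

// http_build_query() uses "+" by default; PHP_QUERY_RFC3986 switches to "%20".
$queryPlus = http_build_query(['kwd' => $phrase]);
$queryRfc  = http_build_query(['kwd' => $phrase], '', '&', PHP_QUERY_RFC3986);

echo $formStyle, PHP_EOL, $rfcStyle, PHP_EOL, $queryPlus, PHP_EOL, $queryRfc, PHP_EOL;
```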
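If the string is already UTF-8 (as Khmer text normally is), urlencode($string) alone percent-encodes its bytes. utf8_encode() only converts ISO-8859-1 to UTF-8, and it is deprecated as of PHP 8.2, so use the following only when the input is ISO-8859-1:
urlencode(utf8_encode($string)); // encode ISO-8859-1 input as UTF-8, then percent-encode
utf8_decode(urldecode($string)); // decode back to ISO-8859-1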
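When the source encoding is not known to be UTF-8, one option is to convert explicitly before encoding. The following is a sketch only: encodeAsUtf8Url() is a hypothetical helper name, it assumes the mbstring extension is available, and it uses mb_convert_encoding() rather than utf8_encode().

```php
<?php
// Hypothetical helper: convert $value to UTF-8 if needed, then percent-encode it.
function encodeAsUtf8Url(string $value, string $sourceEncoding = 'UTF-8'): string
{
    if ($sourceEncoding !== 'UTF-8') {
        // Convert from the declared source encoding (e.g. ISO-8859-1) to UTF-8.
        $value = mb_convert_encoding($value, 'UTF-8', $sourceEncoding);
    }
    if (!mb_check_encoding($value, 'UTF-8')) {
        // Refuse to encode bytes that are not valid UTF-8.
        throw new InvalidArgumentException('Value is not valid UTF-8');
    }
    return urlencode($value);
}

// UTF-8 input is encoded as is; "\u{1793}" is the Khmer letter NO.
echo encodeAsUtf8Url("\u{1793}"), PHP_EOL;
// ISO-8859-1 input ("\xE9" is "é") is converted to UTF-8 first.
echo encodeAsUtf8Url("\xE9", 'ISO-8859-1'), PHP_EOL;
```

Validating before encoding avoids silently producing double-encoded or invalid sequences when the declared encoding is wrong.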
For the non-standard %uXXXX (UTF-16 style) syntax you can use this function (adapted from the user notes for urlencode at http://php.net/urlencode):
function utf16_urlencode ( $str ) {
    # convert characters >= 0xFF into decimal numeric entities (&#NNNN;)
    $convmap = array( 0xFF, 0x2FFFF, 0, 0xFFFF );
    $str = mb_encode_numericentity( $str, $convmap, "UTF-8");
    # mark the entities, so they are not urlencoded
    $str = preg_replace( '/&#([0-9]+);/', 'mark\\1mark', $str );
    $str = urlencode($str);
    # convert the marked decimal code points into %uXXXX (hexadecimal) syntax
    $str = preg_replace_callback(
        '/mark([0-9]+)mark/',
        function ( $m ) { return sprintf( '%%u%04X', (int) $m[1] ); },
        $str
    );
    return $str;
}
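function cleanUrl($url) {
    // $url is assumed to be UTF-8 already; utf8_encode() would double-encode it
    $res = urlencode($url);
    // restore the ":" and "/" separators that urlencode() escaped
    $res = str_replace("%3A", ":", $res);
    $res = str_replace("%2F", "/", $res);
    return $res;
}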
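cleanUrl() still escapes other delimiters such as "?", "=" and "&". An alternative sketch, under the assumption that the input is a complete UTF-8 URL whose ASCII delimiters should stay untouched, is to percent-encode only spaces and bytes outside printable ASCII. escapeNonAscii() is a hypothetical name, not a PHP built-in.

```php
<?php
// Hypothetical helper: percent-encode every byte outside printable ASCII
// (0x21-0x7E), leaving ":", "/", "?", "=", "&" and existing %XX untouched.
function escapeNonAscii(string $url): string
{
    return preg_replace_callback(
        '/[^\x21-\x7E]/',                 // no /u flag: matches single bytes
        static function (array $m): string {
            return rawurlencode($m[0]);   // one byte in, one %XX out
        },
        $url
    );
}

echo escapeNonAscii("http://www.example.com/?kwd=Mac Book Pro នៅប្រទេសយើង"), PHP_EOL;
```

Because "%" itself is left alone, a URL that is already percent-encoded passes through unchanged instead of being encoded twice.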