
How to Encode URL Contains Unicode Characters with PHP

Tags:

php

urlencode

I'm looking for a way to encode a URL that contains Unicode characters, specifically Khmer Unicode. I've tried PHP's built-in urlencode() function, and it gives a result like this: http://www.example.com/?kwd=Mac+Book+Pro+នៅប្រទេសយើង

When I tested the same query with Google search, the result was: https://www.google.com.kh/#hl=en&sclient=psy-ab&q=Mac+Book+Pro+%E1%9E%93%E1%9F%85%E1%9E%94%E1%9F%92%E1%9E%9A%E1%9E%91%E1%9F%81%E1%9E%9F%E1%9E%99%E1%9E%BE%E1%9E%84&oq=Mac+Book+Pro+%E1%9E%93%E1%9F%85%E1%9E%94%E1%9F%92%E1%9E%9A%E1%9E%91%E1%9F%81%E1%9E%9F%E1%9E%99%E1%9E%BE%E1%9E%84

How can I do that? I hope someone here can help me. Thanks in advance!

asked Mar 28 '12 at 04:03 by Thavarith
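A minimal sketch of what the question asks for, assuming the Khmer text reaches PHP as UTF-8 (for example, a literal in a source file saved as UTF-8). urlencode() works on bytes, so each UTF-8 byte becomes %XX and the query value comes out identical to the q= value in the Google URL above. If a browser's address bar shows the Khmer characters instead, it may simply be displaying the decoded form.

```php
<?php
// Assumes this file is saved as UTF-8, so $kwd holds UTF-8 bytes.
$kwd = "Mac Book Pro នៅប្រទេសយើង";

// urlencode() percent-encodes every byte except letters, digits and -_.
// and turns spaces into "+", which suits a query-string value.
$url = "http://www.example.com/?kwd=" . urlencode($kwd);

echo $url, "\n";
// http://www.example.com/?kwd=Mac+Book+Pro+%E1%9E%93%E1%9F%85%E1%9E%94%E1%9F%92%E1%9E%9A%E1%9E%91%E1%9F%81%E1%9E%9F%E1%9E%99%E1%9E%BE%E1%9E%84
```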


People also ask

Can a URL contain Unicode?

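Not in raw form. A URI may only contain a limited set of ASCII characters, so non-ASCII characters in the path or query are transmitted as percent-encoded UTF-8 bytes (and non-ASCII domain names are converted to Punycode), even though browsers often display the decoded characters. There is also a security concern: Unicode contains many characters that look similar, or even identical, to other characters, so allowing the full range of Unicode into a URL means such look-alikes could be used to spoof users.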

How do you encode a URL in PHP?

urlencode() is a built-in PHP function for encoding a string for use in a URL. It returns the string with every non-alphanumeric character except -, _ and . replaced by a percent sign (%) followed by two hexadecimal digits, and with spaces encoded as plus (+) signs.
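For comparison, PHP also has rawurlencode(), which follows RFC 3986: it encodes spaces as %20 and leaves ~ unescaped, whereas urlencode() produces + and %7E. Both encode multibyte UTF-8 characters the same way, one %XX per byte. A small sketch:

```php
<?php
// "\u{1793}\u{17C5}" is the Khmer syllable "នៅ" as UTF-8 bytes (PHP 7+ escape syntax)
$s = "Mac Book~Pro \u{1793}\u{17C5}";

echo urlencode($s), "\n";    // Mac+Book%7EPro+%E1%9E%93%E1%9F%85
echo rawurlencode($s), "\n"; // Mac%20Book~Pro%20%E1%9E%93%E1%9F%85
```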

How do you add %20 to a URL?

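A space is encoded as %20. In a query string, which uses the application/x-www-form-urlencoded format of HTML forms, a space may also be written as a plus sign, so ?q=products%20and%20services and ?q=products+and+services are equivalent. In the path, however, + is a literal plus sign: http://www.example.com/products%20and%20services.html cannot be rewritten as http://www.example.com/products+and+services.html. In PHP, rawurlencode() produces %20 and urlencode() produces +.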
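A short sketch of the path-versus-query distinction in PHP; the URL pieces are only illustrative. http_build_query() encodes spaces as + with its default PHP_QUERY_RFC1738 mode and as %20 with PHP_QUERY_RFC3986.

```php
<?php
// Path segment: rawurlencode() writes spaces as %20 (and a literal "+" as %2B).
$path = "/" . rawurlencode("products and services") . ".html";

// Query string: http_build_query() encodes spaces as "+" by default,
// or as %20 when PHP_QUERY_RFC3986 is passed.
$query1738 = http_build_query(array("q" => "products and services"));
$query3986 = http_build_query(array("q" => "products and services"), "", "&", PHP_QUERY_RFC3986);

echo "http://www.example.com" . $path, "\n"; // http://www.example.com/products%20and%20services.html
echo "?" . $query1738, "\n";                 // ?q=products+and+services
echo "?" . $query3986, "\n";                 // ?q=products%20and%20services

// Decoding: urldecode() treats "+" as a space, rawurldecode() does not.
var_dump(urldecode("a+b"));    // string(3) "a b"
var_dump(rawurldecode("a+b")); // string(3) "a+b"
```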


2 Answers

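If the string is already UTF-8, as Khmer text in PHP normally is (Khmer cannot be represented in ISO-8859-1 at all), urlencode() on its own percent-encodes the UTF-8 bytes and gives the same result as Google:

urlencode($string); //for encoding

urldecode($string); //for decoding

utf8_encode() converts ISO-8859-1 (Latin-1) to UTF-8, so it is only needed when the input is Latin-1; applying it to text that is already UTF-8 double-encodes it. It is also deprecated as of PHP 8.2:

urlencode(utf8_encode($string)); //for encoding Latin-1 input

utf8_decode(urldecode($string)); //for decoding back to Latin-1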
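When you cannot be sure whether input arrives as UTF-8 or Latin-1, one option is to convert only when the bytes are not valid UTF-8. The helper name urlencode_as_utf8() below is hypothetical, it assumes the mbstring extension, and the validity check is a heuristic: some Latin-1 byte sequences also happen to be valid UTF-8.

```php
<?php
// Hypothetical helper: percent-encodes $s as UTF-8 whether it arrives as
// UTF-8 or as ISO-8859-1 (Latin-1). Requires the mbstring extension.
function urlencode_as_utf8($s) {
    // Convert only when the bytes are not already valid UTF-8.
    if (!mb_check_encoding($s, "UTF-8")) {
        // Non-deprecated equivalent of utf8_encode()
        $s = mb_convert_encoding($s, "UTF-8", "ISO-8859-1");
    }
    return urlencode($s);
}

echo urlencode_as_utf8("caf\xE9"), "\n";   // Latin-1 input: caf%C3%A9
echo urlencode_as_utf8("caf\u{E9}"), "\n"; // UTF-8 input:   caf%C3%A9

// Round trip for UTF-8 Khmer text ("\u{1793}\u{17C5}" is "នៅ")
$khmer = "\u{1793}\u{17C5}";
echo urldecode(urlencode_as_utf8($khmer)) === $khmer ? "round trip ok\n" : "mismatch\n";
```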

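For the %uXXXX syntax (UTF-16 code units, the format produced by JavaScript's deprecated escape() function), you can use this function, based on the user notes for urlencode at http://php.net/urlencode. Be aware that %uXXXX is non-standard: it is not what Google produces, and PHP's urldecode() does not decode it.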

function utf16_urlencode($str) {
    # convert characters > 255 into hexadecimal HTML entities (&#xHHHH;);
    # limited to the BMP because %uXXXX holds exactly 4 hex digits
    $convmap = array(0x100, 0xFFFF, 0, 0xFFFF);
    $str = mb_encode_numericentity($str, $convmap, "UTF-8", true);

    # escape HTML entities, so they are not urlencoded
    $str = preg_replace('/&#x([0-9a-f]{1,4});/i', 'mark\\1mark', $str);
    $str = urlencode($str);

    # now convert escaped entities into %uXXXX syntax, zero-padded to 4 digits
    $str = preg_replace_callback('/mark([0-9a-f]{1,4})mark/i', function ($m) {
        return '%u' . str_pad(strtoupper($m[1]), 4, '0', STR_PAD_LEFT);
    }, $str);
    return $str;
}
answered Oct 11 '22 at 17:10 by Andy
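Because urldecode() leaves %uXXXX sequences untouched, a script receiving them would need its own decoder. The sketch below is hypothetical (utf16_urldecode() is not a PHP function); it assumes PHP 7.2+ for mb_chr() and the mbstring extension, and decodes everything in a single pass so that an escaped % is never decoded twice.

```php
<?php
// Hypothetical decoder for %uXXXX plus ordinary %XX and "+" escapes.
function utf16_urldecode($str) {
    return preg_replace_callback(
        '/%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})|\+/',
        function ($m) {
            if (isset($m[1]) && $m[1] !== '') {
                // %uXXXX: one BMP code point, re-encoded as UTF-8;
                // if mb_chr() fails (e.g. lone surrogate), keep the text as-is
                $ch = mb_chr(hexdec($m[1]), 'UTF-8');
                return $ch === false ? $m[0] : $ch;
            }
            if (isset($m[2]) && $m[2] !== '') {
                // %XX: a single raw byte, as urldecode() would produce
                return chr(hexdec($m[2]));
            }
            return ' '; // "+" means space, as in urldecode()
        },
        $str
    );
}

echo utf16_urldecode("Mac+Book+Pro+%u1793%u17C5"), "\n"; // Mac Book Pro នៅ
```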


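function cleanUrl($url) {
    // $url is assumed to be UTF-8 already (utf8_encode() would double-encode it);
    // rawurlencode() writes spaces as %20, which is valid in both path and query
    $res = rawurlencode($url);
    // restore delimiters; a literal "?", "=" or "&" inside a value can no longer
    // be distinguished from a delimiter
    $res = str_replace(
        array("%3A", "%2F", "%3F", "%3D", "%26"),
        array(":", "/", "?", "=", "&"),
        $res
    );
    return $res;
}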
answered Oct 11 '22 at 16:10 by Homayoon Ahmadi
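A variant of the same idea that skips the encode-then-restore step: leave the URL's structure alone and percent-encode only the bytes that may never appear literally in a URL (spaces, control characters and non-ASCII bytes). The function name is hypothetical, and it assumes the rest of the URL is already well formed; already-encoded %XX sequences and delimiters such as #, @ or ; are left intact. Non-ASCII host names would need Punycode instead (e.g. idn_to_ascii() from the intl extension).

```php
<?php
// Hypothetical helper: encode only spaces, control and non-ASCII bytes.
function encode_non_ascii_url($url) {
    // Without the /u modifier PCRE matches single bytes, so each byte of a
    // multibyte UTF-8 character is encoded separately (%E1%9E%93 etc.).
    return preg_replace_callback('/[^\x21-\x7E]/', function ($m) {
        return sprintf('%%%02X', ord($m[0]));
    }, $url);
}

// "\u{1793}\u{17C5}" is the Khmer syllable "នៅ"
echo encode_non_ascii_url("http://www.example.com/?kwd=Mac Book Pro \u{1793}\u{17C5}"), "\n";
// http://www.example.com/?kwd=Mac%20Book%20Pro%20%E1%9E%93%E1%9F%85
```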