I have a URL that looks like this (note the “„ symbols):
http://tinklarastis.omnitel.lt/kokius-aptarnavimo-kanalus-klientui-siulo-„omnitel“-1494
I receive it from the SimplePie parser, if that matters. Now, if you open this specific URL in your browser and copy it from the address bar, you get a URL in which the non-ASCII symbols are percent-encoded:
http://tinklarastis.omnitel.lt/kokius-aptarnavimo-kanalus-klientui-siulo-%E2%80%9Eomnitel%E2%80%9C-1494
I am trying to understand how I can mimic the same conversion in PHP. I cannot simply use urlencode() or rawurlencode(), as they encode both non-ASCII symbols and reserved symbols, while in my case the reserved symbols (/, ?, &, etc.) should stay as they are.
So far I have only seen solutions that involve splitting the URL into pieces between reserved symbols and then using urlencode(), but that feels hackish to me and I hope there's a more elegant solution. I have tried several variations of iconv() and mb_convert_encoding(), with no success so far.
I have a simple one-liner that I use to do in-place encoding only on non-ASCII characters using preg_replace_callback():
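// \x20-\x7e is the printable ASCII range; every other byte gets percent-encoded.
$url = preg_replace_callback('/[^\x20-\x7e]/', function($match) {
    // No /u modifier, so each byte of a multi-byte character is encoded separately.
    return urlencode($match[0]);
}, $url);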
Note that anonymous functions are only supported in PHP 5.3+.
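To check the approach against the URL from the question, here is a minimal sketch that wraps the same regex in a function. The name encode_non_ascii() is made up for this example, and I use rawurlencode() here, which gives the same result as urlencode() for bytes outside the ASCII range.

```php
<?php
// Hypothetical helper name, used only for illustration.
function encode_non_ascii($url) {
    // The pattern has no /u modifier, so it matches single bytes. Each byte
    // outside printable ASCII (\x20-\x7e) is percent-encoded on its own;
    // reserved characters such as / ? & are inside that range and stay as-is.
    return preg_replace_callback('/[^\x20-\x7e]/', function ($match) {
        return rawurlencode($match[0]);
    }, $url);
}

$url = 'http://tinklarastis.omnitel.lt/kokius-aptarnavimo-kanalus-klientui-siulo-„omnitel“-1494';
echo encode_non_ascii($url), "\n";
```

Assuming the input string is UTF-8, this should print the same percent-encoded URL that the browser produces in the question. Spaces are left untouched by this pattern, so they would still need separate handling.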
After researching a bit, I came to the conclusion that there's no way to do this nicely in PHP (other languages such as Python and Perl do seem to have functions for exactly this use case). This is the function I came up with (it ensures the path component of the URL is encoded):
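function url_path_encode($url) {
    $path = parse_url($url, PHP_URL_PATH);
    // Nothing to do if the URL has no path or could not be parsed.
    if (!is_string($path) || $path === '') return $url;
    // Avoid double encoding: a '%' suggests the path is already encoded.
    if (strpos($path, '%') !== false) return $url;
    // rawurlencode() encodes a space as %20; urlencode() would emit '+',
    // which is a literal plus sign inside a path.
    $encoded_path = array_map('rawurlencode', explode('/', $path));
    return str_replace($path, implode('/', $encoded_path), $url);
}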
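The function above patches the URL with str_replace(), which could also touch a copy of the path that happens to appear in the query string, and it skips any path that already contains a '%'. As an alternative, here is a rough sketch that rebuilds the URL from its parse_url() components and decodes each segment before encoding it. The name url_path_encode_rebuilt() is hypothetical, and I only have ordinary http(s) URLs in mind; something like file:/// would need extra care.

```php
<?php
// Hypothetical variant of url_path_encode(); name and structure are
// illustrative only.
function url_path_encode_rebuilt($url) {
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['path'])) {
        return $url; // unparsable URL or no path: nothing to encode
    }

    // Decode each segment before encoding it, so segments that are already
    // percent-encoded are not encoded a second time.
    $segments = array_map(function ($segment) {
        return rawurlencode(rawurldecode($segment));
    }, explode('/', $parts['path']));

    // Re-assemble the URL, leaving everything except the path untouched.
    $result = '';
    if (isset($parts['scheme'])) {
        $result .= $parts['scheme'] . ':';
    }
    if (isset($parts['host'])) {
        $result .= '//';
        if (isset($parts['user'])) {
            $result .= $parts['user'];
            if (isset($parts['pass'])) {
                $result .= ':' . $parts['pass'];
            }
            $result .= '@';
        }
        $result .= $parts['host'];
        if (isset($parts['port'])) {
            $result .= ':' . $parts['port'];
        }
    }
    $result .= implode('/', $segments);
    if (isset($parts['query'])) {
        $result .= '?' . $parts['query'];
    }
    if (isset($parts['fragment'])) {
        $result .= '#' . $parts['fragment'];
    }
    return $result;
}

echo url_path_encode_rebuilt('http://example.com/a b/c?x=1&y=2'), "\n";
```

Like the original function, this encodes every character in a segment that rawurlencode() does not consider unreserved, so characters such as ':' or '=' inside the path are encoded too. The query string and fragment are passed through unchanged.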