I have Chinese users of my PHP web application who enter products into our system. The information they're entering is, for example, a product title and price.
We would like to use the product title to generate a nice URL slug for those products. It seems we cannot just use Chinese characters in HREF attributes.
Does anyone know how to handle a title like “婴儿服饰” so that we can generate a clean URL like http://www.site.com/婴儿服饰?
Everything works fine for languages that fit in ASCII, but languages whose characters need multi-byte UTF-8 sequences give us problems.
Also, when generating the clean URL, we want to keep SEO in mind, but I have no experience with Chinese in that matter.
If your string is already UTF-8, just use rawurlencode to encode the string properly:
$path = '婴儿服饰';
$url = 'http://example.com/'.rawurlencode($path);
UTF-8 is the preferred character encoding for non-ASCII characters in URIs. Only ASCII characters are allowed in a URI itself, which is why the UTF-8 bytes have to be percent-encoded. The result is the same as in tchrist’s example:
http://example.com/%E5%A9%B4%E5%84%BF%E6%9C%8D%E9%A5%B0
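To show how this fits into generating and reading back product links, here is a minimal sketch. The helper name product_url and the variable names are made up for illustration and are not part of the original answer; it assumes the title is already valid UTF-8.

```php
<?php
// Hypothetical helper: build a product URL from a UTF-8 title.
function product_url(string $base, string $title): string
{
    // rawurlencode() percent-encodes every byte outside the unreserved
    // ASCII set, so each UTF-8 byte of the title becomes %XX.
    return rtrim($base, '/') . '/' . rawurlencode($title);
}

$url = product_url('http://example.com/', '婴儿服饰');

// Escape the URL for use inside an HTML attribute.
$link = '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">婴儿服饰</a>';

// On the receiving side, decoding the last path segment
// recovers the original title.
$segment = basename(parse_url($url, PHP_URL_PATH));
$title   = rawurldecode($segment);

echo $url, "\n", $title, "\n";
```

Browsers generally display such a URL with the Chinese characters in the address bar, even though the percent-encoded form is what travels over the wire.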
This code, which uses the CPAN module URI::Escape:
#!/usr/bin/env perl
use v5.10;
use utf8;
use URI::Escape qw(uri_escape_utf8);
my $url = "http://www.site.com/";
my $path = "婴儿服饰";
say $url, uri_escape_utf8($path);
when run, prints:
http://www.site.com/%E5%A9%B4%E5%84%BF%E6%9C%8D%E9%A5%B0
Is that what you're looking for?
BTW, those four characters are:
CJK UNIFIED IDEOGRAPH-5A74
CJK UNIFIED IDEOGRAPH-513F
CJK UNIFIED IDEOGRAPH-670D
CJK UNIFIED IDEOGRAPH-9970
According to the Unicode::Unihan database, these read yīng ér fú shì in Mandarin, or just ying er fú shi per Lingua::ZH::Romanize::Pinyin. The Cantonese readings from Unicode::Unihan are jing¹ jan⁴ fuk⁶ sik¹, or jing˥ jan˨˩ fuk˨ sik˥ with tone letters.
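If a romanized slug is preferred over percent-encoded characters, something similar might be done on the PHP side. The sketch below assumes the intl extension (ICU) is installed; pinyin_slug is a hypothetical helper name, and ICU picks a single reading per character, so the result can be wrong for characters with several pronunciations.

```php
<?php
// Hypothetical helper: romanize a Chinese title into an ASCII slug.
// Assumes the intl extension; returns null when it is unavailable.
function pinyin_slug(string $title): ?string
{
    if (!function_exists('transliterator_transliterate')) {
        return null; // intl extension not loaded
    }
    // Han-Latin produces pinyin with tone marks; Latin-ASCII strips the marks.
    $latin = transliterator_transliterate('Han-Latin; Latin-ASCII', $title);
    if ($latin === false) {
        return null; // transliteration failed
    }
    // Collapse every run of non-alphanumeric characters into one hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', strtolower($latin));
    return trim($slug, '-');
}

$slug = pinyin_slug('婴儿服饰');
echo $slug ?? '(intl not available)', "\n";
```

Whether a pinyin slug or the percent-encoded original is better for SEO is not settled by this thread, so treat this as one option to test rather than a recommendation.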