I am having trouble concatenating one UTF-8 string to another after one of them has been URI-escaped and then unescaped.
#!/usr/bin/perl
use strict;
use utf8;
use URI::Escape;
# binmode(STDOUT, ":utf8");
my $v = "ضثصثضصثشس";
my $v2 = uri_unescape(uri_escape_utf8($v));
print "Works: $v, ", "$v2\n";
print "Fails: $v, $v2\n";
print "Works: " . "$v2\n";
Here's the output:
Works: ضثصثضصثشس ,ضثصثضصثشس
Wide character in print at ./testUTF8.pl line 14.
Fails: ضثصثضصثشس, ضثصثضصثشس
Works: ضثصثضصثشس
If I enable binmode(STDOUT, ":utf8"), as Perl's docs suggest, the warning message disappears but all three fail:
Fails: ضثصثضصثشس, ضثصثضصثشس
Fails: ضثصثضصثشس, ضثصثضصثشس
Fails: ضثصثضصثشس
What's going on? How can I fix this?
P.S. I need it URL-escaped. Is there any way I can escape/unescape in Perl the way JavaScript does? For example, Perl gives me: %D8%B6%D8%AB%D8%B5%D8%AB%D8%B6%D8%B5%D8%AB%D8%B4%D8%B3
This unescapes to: ضثصثضصثشس
When I escape the same text with JavaScript, I get: %u0636%u062B%u0635%u062B%u0636%u0635%u062B%u0634%u0633
From the documentation of URI::Escape:
uri_unescape($string,...)
Returns a string with each %XX sequence replaced with the actual byte (octet).
It does not interpret the resulting bytes as UTF-8 and will not decode them; you have to do that yourself:
use Encode qw/decode_utf8/;
# untested
my $v2 = decode_utf8 uri_unescape uri_escape_utf8 $v;
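A fuller sketch of the same idea, in case it helps. It assumes the script file is saved as UTF-8 and that the terminal expects UTF-8 output; the variable names $escaped and $bytes are only illustrative. The point is that uri_unescape hands back octets, so when they are interpolated next to the character string $v, Perl treats each octet as a Latin-1 character and the text comes out garbled. Decoding first and encoding once on output avoids that:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                                   # string literals in this file are UTF-8
use URI::Escape qw(uri_escape_utf8 uri_unescape);
use Encode qw(decode_utf8);

# Encode character strings as UTF-8 when printing to STDOUT.
binmode(STDOUT, ':encoding(UTF-8)');

my $v       = "ضثصثضصثشس";
my $escaped = uri_escape_utf8($v);          # percent-encoded UTF-8 octets
my $bytes   = uri_unescape($escaped);       # raw octets, not yet characters
my $v2      = decode_utf8($bytes);          # octets decoded into a character string

# Both strings are now character strings, so they can be mixed safely.
print "Same text: ", ($v eq $v2 ? "yes" : "no"), "\n";
print "Joined: $v, $v2\n";
```

With the decode step in place, the binmode line from the question can stay enabled, since every string being printed is then a character string.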
...