
Perl UTF8 Concatenation Problems

Tags: utf-8, perl

I am having trouble concatenating a UTF-8 string to another string after it has been URI-escaped and then unescaped.

#!/usr/bin/perl
use strict;
use utf8;
use URI::Escape;

# binmode(STDOUT, ":utf8");

my $v = "ضثصثضصثشس";
my $v2 = uri_unescape(uri_escape_utf8($v));

print "Works: $v, ", "$v2\n";
print "Fails: $v, $v2\n";
print "Works: " . "$v2\n";

Here's the output:

Works: ضثصثضصثشس ,ضثصثضصثشس
Wide character in print at ./testUTF8.pl line 14.
Fails: ضثصثضصثشس, ضثصثضصثشس
Works: ضثصثضصثشس

If I use binmode(STDOUT, ":utf8"), as Perl's docs suggest, the warning message disappears, but all three lines fail:

Fails: ضثصثضصثشس, ضثصثضصثشس
Fails: ضثصثضصثشس, ضثصثضصثشس
Fails: ضثصثضصثشس

What's going on? How can I fix this?

P.S. I need it URL-escaped. Is there any way I can escape/unescape in Perl the way JavaScript does? For example, Perl gives me: %D8%B6%D8%AB%D8%B5%D8%AB%D8%B6%D8%B5%D8%AB%D8%B4%D8%B3

This unescapes to: ضثصثضصثشس

When I escape the same text with JavaScript, I get: %u0636%u062B%u0635%u062B%u0636%u0635%u062B%u0634%u0633

DemiImp asked Apr 01 '14 21:04


1 Answer

From the documentation of URI::Escape:

uri_unescape($string,...)
Returns a string with each %XX sequence replaced with the actual byte (octet).

It does not interpret the resulting bytes as UTF-8 and will not decode them. You will have to do this manually:

use Encode qw/decode_utf8/;

# untested
my $v2 = decode_utf8 uri_unescape uri_escape_utf8 $v;
...
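To spell the fix out, here is a minimal, self-contained sketch of the round trip. It assumes the script file itself is saved as UTF-8 and that the terminal expects UTF-8; the variable name $bytes is introduced here for illustration only. uri_unescape returns 18 octets, while $v holds 9 characters. Joining the two in one string makes Perl upgrade the octets as if they were Latin-1 characters, so they come out double-encoded on output. Decoding first avoids that.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                                   # string literals in this file are UTF-8
use URI::Escape qw(uri_escape_utf8 uri_unescape);
use Encode qw(decode_utf8);

# Encode characters as UTF-8 when writing to STDOUT.
binmode(STDOUT, ':encoding(UTF-8)');

my $v     = "ضثصثضصثشس";
my $bytes = uri_unescape(uri_escape_utf8($v));   # raw UTF-8 octets, not characters
my $v2    = decode_utf8($bytes);                 # octets -> character string

print "octets:     ", length($bytes), "\n";      # 18
print "characters: ", length($v2),    "\n";      # 9
print "round trip ok\n" if $v2 eq $v;

# Both operands are character strings now, so interpolation is safe.
print "Joined: $v, $v2\n";
```

With the output layer set and every string decoded, all three print statements from the question should behave the same way, whether the pieces are passed as a list or interpolated into one string.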
amon answered Oct 12 '22 11:10
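On the P.S. in the question: the %uXXXX form is what JavaScript's legacy escape() function produces, whereas encodeURIComponent gives the same %D8%B6... form as uri_escape_utf8 for this text. If the %uXXXX form is really needed, something like the sketch below may do. The helpers js_escape and js_unescape are hypothetical names, not part of URI::Escape, and the sketch only handles characters up to U+FFFF (escape() emits surrogate pairs above that, which is not reproduced here).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                                   # string literals in this file are UTF-8

# Hypothetical helper: mimic JavaScript's legacy escape() for characters
# up to U+FFFF. Code points below 0x100 become %XX, the rest %uXXXX.
sub js_escape {
    my ($str) = @_;
    $str =~ s{([^A-Za-z0-9\@*_+\-./])}{
        my $cp = ord $1;
        $cp < 0x100 ? sprintf('%%%02X', $cp) : sprintf('%%u%04X', $cp);
    }ge;
    return $str;
}

# Hypothetical helper: reverse of js_escape, done in a single pass so that
# a decoded '%' is never interpreted a second time.
sub js_unescape {
    my ($str) = @_;
    $str =~ s/%(?:u([0-9A-Fa-f]{4})|([0-9A-Fa-f]{2}))/chr hex(defined $1 ? $1 : $2)/ge;
    return $str;
}

binmode(STDOUT, ':encoding(UTF-8)');

my $v       = "ضثصثضصثشس";
my $escaped = js_escape($v);

print "$escaped\n";    # %u0636%u062B%u0635%u062B%u0636%u0635%u062B%u0634%u0633
print js_unescape($escaped), "\n";
```

Because js_unescape returns a character string directly, no decode_utf8 step is needed afterwards, unlike with uri_unescape.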