I'm trying to decode Unicode escape sequences of the form \uXXXX. So I simply tried the hexadecimal escape sequence \x{}
inside a regex substitution with the /e modifier:
use LWP::Simple;
my $k = get("url");
my ($kv) =map{/js_call\(\\"(.+?)\\"\)/} $k;
#now $kv data is https://someurl/call.pl?id=15967737\u0026locale=en-GB\u0026mkhun=ccce
$kv=~s/\\u(.{4})/"\x{$1}"/eg;
I'm trying to substitute all of the Unicode escapes with the characters they represent.
My expected output is:
https://someurl/call.pl?id=15967737&locale=en-GB&mkhun=ccce
The print
statement below gives the expected output. However, the regex doesn't seem to work properly.
print "\x{0026}";
The problem with s/\\u(.{4})/"\x{$1}"/e
is that the backslash escape \x{$1}
is evaluated at compile time, before $1 has a value. The characters $1 are not valid hexadecimal digits, so the escape yields a NUL character:
$ perl -E 'printf "%vX\n", "\x{$1}"'
0
If we escape the backslash in front of x
( s/\\u(.{4})/"\\x{$1}"/ge
), we get a string with literal escape sequences, but still not the desired Unicode character:
use feature qw(say);
$kv = '\u0026';
$kv =~ s/\\u(.{4})/"\\x{$1}"/ge;
say $kv;
The output is now:
\x{0026}
With a small modification, you can produce "\x{0026}"
instead, which is Perl code you can compile and execute to produce the desired value. To do this, you need to involve eval(EXPR):
$kv =~ s/\\u(.{4})/ my $s = eval(qq{"\\x{$1}"}); die $@ if $@; $s /ge;
This can be shortened to
$kv =~ s/\\u(.{4})/ qq{"\\x{$1}"} /gee;
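To check that the explicit eval and the /ee form behave the same, here is a minimal sketch that runs both on a shortened sample string. The variable names ($input, $with_eval, $with_ee) are made up for illustration, and the capture is narrowed to [0-9A-Fa-f]{4} (an assumption, not part of the original pattern) so that only hexadecimal digits ever reach the string eval.

```perl
use strict;
use warnings;

# Shortened sample input; single quotes keep \u0026 literal.
my $input = 'id=15967737\u0026locale=en-GB';

# Explicit string eval: build the Perl source "\x{0026}" and compile it.
(my $with_eval = $input) =~ s/\\u([0-9A-Fa-f]{4})/ my $s = eval(qq{"\\x{$1}"}); die $@ if $@; $s /ge;

# Double /e: the first evaluation builds the source, the second runs it.
(my $with_ee = $input) =~ s/\\u([0-9A-Fa-f]{4})/ qq{"\\x{$1}"} /gee;

print "$with_eval\n";
print "$with_ee\n";
```

Both print statements should show the same decoded string. One difference worth keeping in mind: the /ee form does not check $@, so a failed eval silently substitutes an undefined value.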
However, a far better solution is to use the following:
$kv =~ s/\\u(.{4})/chr hex $1/ge;
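As a self-contained sketch of the chr hex approach applied to the URL from the question: decode_u_escapes is a hypothetical helper name, and the capture is tightened to [0-9A-Fa-f]{4} (an assumption beyond the original .{4}) so that only real hexadecimal escapes are replaced. It does not combine UTF-16 surrogate pairs, so it only covers code points that fit in a single \uXXXX escape.

```perl
use strict;
use warnings;

# Hypothetical helper: replace each \uXXXX escape with the character
# whose code point is the hexadecimal number XXXX.
sub decode_u_escapes {
    my ($text) = @_;
    # hex converts the captured digits to a number; chr turns it into a character.
    $text =~ s/\\u([0-9A-Fa-f]{4})/chr hex $1/ge;
    return $text;
}

# Sample data from the question; single quotes keep \u0026 literal.
my $kv = 'https://someurl/call.pl?id=15967737\u0026locale=en-GB\u0026mkhun=ccce';
my $decoded = decode_u_escapes($kv);

print "$decoded\n";
# prints: https://someurl/call.pl?id=15967737&locale=en-GB&mkhun=ccce
```

No string eval is involved here, so nothing from the input is ever compiled as Perl code.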
If you enable use warnings,
you'll see that the $1
inside \x{...} is taken literally, before the backreference gets interpolated.
$kv =~ s/\\u(.{4})/ sprintf("\"\\x{%s}\"", $1) /eeg;
sort of works, but it is hideously ugly. I've been trying to simplify it, but the various ideas I tried always got me back to "Illegal hexadecimal digit '$' ignored" warnings.