How to encode cyrillic characters for URL and then decode them?

Tags: character-encoding, utf-8, perl, utf8-decode

I have a form on one page:

<form method="POST" accept-charset="UTF-8" action="index.cgi" name="TestForm">

One of the input fields, "search_string", may contain Cyrillic characters. When it does, the URL string looks like this:

search_string=%41F%2F%424+%41F%41E%414%416%410%420%41A%410+%418%417+%421%412%418%41D

How do I decode this back to the original string on the page I post to?


asked Mar 22 '12 08:03

goe

3 Answers

Correct solution, including spaces:

use strict;
use warnings;
use open ':std', ':encoding(UTF-8)';   # write STDOUT as UTF-8

my $escaped = '%41F%2F%424+%41F%41E%414%416%410%420%41A%410+%418%417+%421%412%418%41D';
(my $unescaped = $escaped) =~ s/\+/ /g;            # a '+' stands for a space
$unescaped =~ s/%([[:xdigit:]]+)/chr hex $1/eg;    # %41F becomes the character U+041F
print $unescaped;
# П/Ф ПОДЖАРКА ИЗ СВИН

Credit goes to Renaud Bompuis, who was the first to recognise that these are hexadecimal Unicode code points prefixed with %.

I would add that the encoding scheme in the question is very unusual; I haven't seen it before. Normally one would expect the character string П/Ф ПОДЖАРКА ИЗ СВИН to be encoded as %D0%9F%2F%D0%A4+%D0%9F%D0%9E%D0%94%D0%96%D0%90%D0%A0%D0%9A%D0%90+%D0%98%D0%97+%D0%A1%D0%92%D0%98%D0%9D: first the characters are encoded into UTF-8, then the resulting octets are percent-escaped. The answer from Dr.Kameleon works with this standard scheme.
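To make the "encode into UTF-8, then percent-escape the octets" order concrete, here is a minimal sketch using only core modules. The helper names form_encode and form_decode are hypothetical, and the set of characters left unescaped is an assumption chosen for this example; a real application would more likely rely on an established module.

```perl
use strict;
use warnings;
use utf8;                         # this source file contains UTF-8 literals
use Encode qw(encode decode);

# form_encode and form_decode are hypothetical helper names for this sketch.
sub form_encode {
    my ($text) = @_;
    my $octets = encode('UTF-8', $text);       # characters -> UTF-8 octets
    # percent-escape every octet except unreserved ASCII and spaces
    $octets =~ s/([^A-Za-z0-9\-_.~ ])/sprintf '%%%02X', ord $1/ge;
    $octets =~ tr/ /+/;                        # spaces become '+'
    return $octets;
}

sub form_decode {
    my ($escaped) = @_;
    (my $octets = $escaped) =~ tr/+/ /;        # '+' back to spaces
    $octets =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;   # %XX -> one octet
    return decode('UTF-8', $octets);           # UTF-8 octets -> characters
}

my $original = 'П/Ф ПОДЖАРКА ИЗ СВИН';
my $encoded  = form_encode($original);
print "$encoded\n";
print "round trip ok\n" if form_decode($encoded) eq $original;
```

Note that the decoder reads exactly two hex digits per escape. That is what distinguishes the standard scheme from the one in the question, where an escape carries a whole code point such as %41F.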


answered Oct 22 '22 19:10

daxim

A solution that preserves the + and any other character in the original string:

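use open ':std', ':encoding(UTF-8)';   # print the result as UTF-8

my $s = '%41F%2F%424+%41F%41E%414%416%410%420%41A%410+%418%417+%421%412%418%41D';
$s =~ s/%([[:xdigit:]]+)/chr(hex($1))/eg;   # each %HEX run becomes the character with that code point
print $s;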

Result:

П/Ф+ПОДЖАРКА+ИЗ+СВИН

answered Oct 22 '22 19:10

Renaud Bompuis

Try this in your script (index.cgi):

use Encode;

Then...

$search_string = decode_utf8( $search_string );
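As a small illustration of what decode_utf8 changes, the sketch below uses a hard-coded stand-in for the parameter value. It assumes the script receives raw UTF-8 octets, which is what you get once standard percent-escapes such as %D0%9F%D0%9E have been unescaped.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Stand-in for a form value: the UTF-8 octets of "ПО" (U+041F U+041E),
# i.e. what %D0%9F%D0%9E looks like after percent-decoding.
my $search_string = "\xD0\x9F\xD0\x9E";
my $octet_count   = length $search_string;   # counts octets

$search_string = decode_utf8($search_string);
my $char_count = length $search_string;      # counts characters

printf "%d octets became %d characters\n", $octet_count, $char_count;
# 4 octets became 2 characters
```

Until the value is decoded, functions such as length, uc, and regex matches operate on octets rather than on Cyrillic characters.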

Another idea (if you want to build a UTF-8-friendly hash of your CGI input):

require Encode;
require CGI;
my $query = CGI->new;
my $form_input = {};
foreach my $name ( $query->param ) {
  # fetch all values for this parameter (newer CGI.pm versions offer multi_param for this)
  my @val = $query->param( $name );
  foreach ( @val ) {
    $_ = Encode::decode_utf8( $_ );   # UTF-8 octets -> characters
  }
  $name = Encode::decode_utf8( $name );   # parameter names may be non-ASCII too
  if ( scalar @val == 1 ) {
    $form_input->{$name} = $val[0];
  } else {
    $form_input->{$name} = \@val;  # save multiple values as an array ref
  }
}

Taken from: http://ahinea.com/en/tech/perl-unicode-struggle.html

answered Oct 22 '22 20:10

Dr.Kameleon

