Using Perl, how do I decode or create those %-encodings on the web?

I need to handle URI (i.e. percent) encoding and decoding in my Perl script. How do I do that?

This is a question from the official perlfaq. We're importing the perlfaq to Stack Overflow.

This is the official FAQ answer minus subsequent edits.

Those % encodings handle reserved characters in URIs, as described in RFC 2396, Section 2. This encoding replaces the reserved character with the hexadecimal representation of the character's number from the US-ASCII table. For instance, a colon, :, becomes %3A.
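As a small illustration of that mapping, the sketch below prints the percent-encoding of a few ASCII characters. The helper name percent_code is hypothetical (it is not part of any module) and it only makes sense for single characters with a code below 256.

```perl
use strict;
use warnings;

# Hypothetical helper: '%' followed by the two-digit uppercase hex code
# of a single character.
sub percent_code {
    my ($char) = @_;
    return sprintf '%%%02X', ord $char;
}

for my $char ( ':', '#', '%', ' ' ) {
    printf "'%s' becomes %s\n", $char, percent_code($char);
}
```

The colon comes out as %3A, matching the example above.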

In CGI scripts, you don't have to worry about decoding URIs if you are using CGI.pm. You shouldn't have to process the URI yourself, either on the way in or the way out.

If you have to encode a string yourself, remember that you should never try to encode an already-composed URI. You need to escape the components separately then put them together. To encode a string, you can use the URI::Escape module. The uri_escape function returns the escaped string:

use URI::Escape;

my $original = "Colon : Hash # Percent %";

my $escaped = uri_escape( $original );

print "$escaped\n"; # 'Colon%20%3A%20Hash%20%23%20Percent%20%25'

To decode the string, use the uri_unescape function:

my $unescaped = uri_unescape( $escaped );

print $unescaped; # back to original
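To make the "escape the components separately, then put them together" advice concrete, here is a minimal sketch that builds a query string with uri_escape. The base URL and the parameter names are made up for illustration, and the URI::Escape module must be installed from CPAN.

```perl
use strict;
use warnings;
use URI::Escape qw( uri_escape );

# Hypothetical components; only the keys and values are escaped,
# never the already-assembled URL.
my $base   = 'http://example.com/search';
my %params = ( q => 'perl & regex?', lang => 'en' );

my $query = join '&',
    map { uri_escape($_) . '=' . uri_escape( $params{$_} ) }
    sort keys %params;

my $url = "$base?$query";
print "$url\n"; # http://example.com/search?lang=en&q=perl%20%26%20regex%3F
```

Escaping the whole of $url at the end instead would also encode the ':' and '/' of the base and the '&' and '=' separators, which is why the pieces are escaped before joining.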

If you wanted to do it yourself, you simply need to replace the reserved characters with their encodings. A global substitution is one way to do it:

# encode
$string =~ s/([^^A-Za-z0-9\-_.!~*'()])/ sprintf "%%%0x", ord $1 /eg;

#decode
$string =~ s/%([A-Fa-f\d]{2})/chr hex $1/eg;

DIY encode (improving above version):

$string =~ s/([^^A-Za-z0-9\-_.!~*'()])/ sprintf "%%%02x", ord $1 /eg;

(note the '%02x' rather than only '%0x')

DIY decode (adding '+' -> ' '):

$string =~ s/\+/ /g; $string =~ s/%([A-Fa-f\d]{2})/chr hex $1/eg;
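A short sketch of why the zero padding matters: a character with a code below 16, such as a tab, is encoded as a single hex digit by '%0x', and the decoder then consumes the following character as the second digit. The sub names are hypothetical, and the character class uses a single caret (an assumption that differs from the regex above) so that a literal '^' is encoded too.

```perl
use strict;
use warnings;

# Assumption: single caret, so '^' itself is also treated as unsafe.
my $unsafe = qr/[^A-Za-z0-9\-_.!~*'()]/;

# Hypothetical helper using the unpadded '%0x' format.
sub encode_unpadded {
    my ($s) = @_;
    $s =~ s/($unsafe)/sprintf "%%%0x", ord $1/eg;
    return $s;
}

# Hypothetical helper using the padded '%02x' format.
sub encode_padded {
    my ($s) = @_;
    $s =~ s/($unsafe)/sprintf "%%%02x", ord $1/eg;
    return $s;
}

# Decode: '+' becomes a space first, then the %XX sequences are expanded.
sub decode {
    my ($s) = @_;
    $s =~ s/\+/ /g;
    $s =~ s/%([A-Fa-f\d]{2})/chr hex $1/eg;
    return $s;
}

my $input = "a\tb + c";

print encode_unpadded($input), "\n";    # a%9b%20%2b%20c
print encode_padded($input),   "\n";    # a%09b%20%2b%20c
print decode( encode_padded($input) ) eq $input
    ? "padded round trip ok\n"
    : "padded round trip failed\n";
```

With the unpadded form, "%9b" is read back as the single character 0x9B, so the tab and the following "b" are lost. Neither helper handles characters with a code above 255; those would need to be encoded to bytes (for example UTF-8) first.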

Maybe this will help in deciding which method to choose.

Benchmarks on Perl 5.32. Every function returns the same result for the given $input.

Code:

#!/usr/bin/env perl

my $input = "ala ma 0,5 litra 40%'owej vodki :)";

use Net::Curl::Easy;
my $easy = Net::Curl::Easy->new();
use URI::Encode qw( uri_encode );
use URI::Escape qw( uri_escape );
use Benchmark qw( cmpthese );

cmpthese(-3, {
    'a' => sub {
        my $string = $input;
        $string =~ s/([^^A-Za-z0-9\-_.!~*'()])/ sprintf "%%%0x", ord $1 /eg;
    },
    'b' => sub {
        my $string = $input;
        $string = $easy->escape( $string );
    },
    'c' => sub {
        my $string = $input;
        $string = uri_encode( $string, {encode_reserved => 1} ); 
    },
    'd' => sub {
        my $string = $input;
        $string = uri_escape( $string );
    },
});

And results:

       Rate      c      d      a      b
c    5618/s     --   -98%   -99%  -100%
d  270517/s  4716%     --   -31%   -80%
a  393480/s  6905%    45%     --   -71%
b 1354747/s 24016%   401%   244%     --

Not surprising. A specialized C solution is the fastest. An in-place regex with no sub calls is quite fast, followed closely by a copying regex with a sub call. I didn't look into why uri_encode was so much slower than uri_escape.

Tags:

url-encoding

perl

percent-encoding

perlfaq

Joseph Martin

alan
