I am writing a web application and learning how to URL-encode HTML links. All the urlencode questions here (see tag below) are "How to...?" questions. My question is not "How?" but "Why?". Even the Wikipedia article only addresses the mechanics (http://en.wikipedia.org/wiki/Urlencode), not why I should use urlencode in my application at all.

What are the security implications of using (or rather not using) urlencode? How can a failure to use urlencode be exploited? What kinds of bugs or failures can crop up with unencoded URLs?

I'm asking because even without urlencode, a link to my application's dev website like the following works as expected:

http://myapp/my%20test/ée/ràé

Why should I use urlencode? Or, to put it another way: when should I use urlencode, and in what kinds of situations?


Why should I use urlencode?

Q: What is the difference between Htmlencode and Urlencode?

HTML encoding turns the less-than character "<" into "&lt;", which is its encoded representation in HTML. URL encoding does the same job for URLs, but the set of special characters there is different, although there is some overlap.
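To make the difference concrete, here is a small sketch that runs the same made-up string through Python's standard `html.escape` and `urllib.parse.quote`. Python is used only for illustration; the function names belong to Python's standard library, not to any framework mentioned in the thread.

```python
import html
from urllib.parse import quote

text = "a < b & c"  # made-up input containing characters special in both contexts

# HTML encoding: makes the text safe to place inside HTML markup.
html_encoded = html.escape(text)
# URL encoding of a single URL component (safe="" also escapes "/").
url_encoded = quote(text, safe="")

print(html_encoded)  # a &lt; b &amp; c
print(url_encoded)   # a%20%3C%20b%20%26%20c
```

Neither output is safe in the other context: the HTML-encoded string still contains spaces, "&" and ";", which mean something in a URL. A URL placed inside an HTML attribute therefore goes through both steps: URL-encode each component first, then HTML-escape the finished URL.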

2 Answers

Update: There is an even better explanation (IMO) further up in the same RFC:

A URI is represented as a sequence of characters, not as a sequence of octets. That is because URI might be "transported" by means that are not through a computer network, e.g., printed on paper, read over the radio, etc.

and

For original character sequences that contain non-ASCII characters, however, the situation is more difficult. Internet protocols that transmit octet sequences intended to represent character sequences are expected to provide some way of identifying the charset used, if there might be more than one [RFC2277]. However, there is currently no provision within the generic URI syntax to accomplish this identification. An individual URI scheme may require a single charset, define a default charset, or provide a way to indicate the charset used.
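The charset problem described in that quote can be seen with a non-ASCII segment like the "ée" in the question's URL. This sketch assumes Python's `urllib.parse`, whose `quote` defaults to UTF-8; Latin-1 is chosen arbitrarily as a second charset to compare against.

```python
from urllib.parse import quote, unquote

segment = "ée"  # non-ASCII text, as in the question's example path

as_utf8 = quote(segment)                        # UTF-8 is quote()'s default
as_latin1 = quote(segment, encoding="latin-1")  # same text, different charset

print(as_utf8)    # %C3%A9e
print(as_latin1)  # %E9e

# The escaped URI does not say which charset produced it, so the receiver
# must already know; guessing wrong corrupts the text.
print(unquote(as_latin1, encoding="latin-1"))  # ée
print(unquote(as_latin1))  # UTF-8 guess fails: U+FFFD replaces the byte
```

The same two characters produce different escape sequences, and nothing in the URI itself tells the server which one it received.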

Because it is stated in the RFC (RFC 2396, the generic URI syntax):

2.4. Escape Sequences

Data must be escaped if it does not have a representation using an unreserved character; this includes data that does not correspond to a printable character of the US-ASCII coded character set, or that corresponds to any US-ASCII character that is disallowed, as explained below.
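A quick way to see which characters need escaping is to pass a few sample strings through Python's `urllib.parse.quote` with `safe=""`. This assumes Python 3.7 or later, where `quote` uses the RFC 3986 unreserved set (letters, digits, "-", ".", "_", "~"). That set is narrower than RFC 2396's: the "mark" characters ! * ' ( ) are escaped here even though RFC 2396 lists them as unreserved.

```python
from urllib.parse import quote

# Made-up samples: unreserved-only text, a space, a control character,
# a literal percent sign, and RFC 2396 "mark" characters.
samples = ["report-v1.2_final~", "a b", "line\nbreak", "50%", "!*'()"]

escaped = {s: quote(s, safe="") for s in samples}
for original, result in escaped.items():
    print(repr(original), "->", result)
```

Only the first sample survives unchanged; the space, the newline, the "%" itself and the mark characters all come out as %XX sequences.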

and

2.4.2. When to Escape and Unescape

A URI is always in an "escaped" form, since escaping or unescaping a completed URI might change its semantics. Normally, the only time escape encodings can safely be made is when the URI is being created from its component parts; each component may have its own set of characters that are reserved, so only the mechanism responsible for generating or interpreting that component can determine whether or not escaping a character will change its semantics. Likewise, a URI must be separated into its components before the escaped characters within those components can be safely decoded.
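The "escape per component, split before decoding" rule can be sketched with Python's `urllib.parse`. The host `myapp` comes from the question; the folder name and query values are hypothetical, chosen so that the data contains "/" and "&".

```python
from urllib.parse import parse_qs, quote, unquote, urlencode, urlsplit

# Hypothetical component values: the "/" and "&" are data, not delimiters.
folder = "AC/DC"
query = {"q": "rock & roll", "page": "2"}

# Build: escape each component with the rules for that component.
path = "/music/" + quote(folder, safe="")  # "/" inside the segment -> %2F
url = "http://myapp" + path + "?" + urlencode(query)
print(url)  # http://myapp/music/AC%2FDC?q=rock+%26+roll&page=2

# Parse: split into components first, then decode each one.
parts = urlsplit(url)
segments = [unquote(s) for s in parts.path.split("/")[1:]]
params = parse_qs(parts.query)
print(segments)  # ['music', 'AC/DC']
print(params)    # {'q': ['rock & roll'], 'page': ['2']}

# Decoding the whole URL first changes its structure: one segment becomes two.
broken = urlsplit(unquote(url)).path.split("/")[1:]
print(broken)    # ['music', 'AC', 'DC']
```

The last lines show why the RFC insists on splitting first: once %2F is decoded, nothing distinguishes the "/" that was data from the "/" that separates path segments.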

In some cases, data that could be represented by an unreserved character may appear escaped; for example, some of the unreserved "mark" characters are automatically escaped by some systems. If the given URI scheme defines a canonicalization algorithm, then unreserved characters may be unescaped according to that algorithm. For example, "%7e" is sometimes used instead of "~" in an http URL path, but the two are equivalent for an http URL.
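The canonicalization mentioned here can be sketched as a function that decodes only those escapes that stand for unreserved characters. This follows the percent-encoding normalization in RFC 3986 (which replaced RFC 2396, the RFC quoted in this answer) and therefore uses RFC 3986's unreserved set; `normalize_unreserved` is a hypothetical name, not a standard library function.

```python
import re
import string

# RFC 3986 unreserved set; RFC 2396 also counted ! * ' ( ) as unreserved "marks".
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def normalize_unreserved(uri_part):
    """Decode %XX escapes that stand for unreserved characters; keep the rest.

    Reserved characters such as %2F ("/") stay escaped, because decoding them
    could change how the URI splits into components. Remaining escapes are
    upper-cased so that equivalent spellings compare equal.
    """
    def repl(match):
        hex_digits = match.group(1)
        char = chr(int(hex_digits, 16))
        return char if char in UNRESERVED else "%" + hex_digits.upper()

    return re.sub(r"%([0-9A-Fa-f]{2})", repl, uri_part)


# "%7e" and "~" normalize to the same path; the escaped "/" is preserved.
print(normalize_unreserved("/%7euser/a%2fb"))  # /~user/a%2Fb
print(normalize_unreserved("/~user/a%2Fb"))    # /~user/a%2Fb
```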

Because the percent "%" character always has the reserved purpose of being the escape indicator, it must be escaped as "%25" in order to be used as data within a URI. Implementers should be careful not to escape or unescape the same string more than once, since unescaping an already unescaped string might lead to misinterpreting a percent data character as another escaped character, or vice versa in the case of escaping an already escaped string.
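Both failure modes in that last paragraph (escaping twice and unescaping twice) are easy to reproduce with Python's `urllib.parse.quote` and `unquote`. The sample values are made up for illustration.

```python
from urllib.parse import quote, unquote

value = "100% sure"
once = quote(value)    # '100%25%20sure'
twice = quote(once)    # '100%2525%2520sure': an escaped string escaped again

print(unquote(once))   # 100% sure
print(unquote(twice))  # 100%25%20sure  (one decode is no longer enough)

# Unescaping too often misreads a literal "%" as the start of an escape.
literal = "50%41"          # data that happens to contain "%41"
encoded = quote(literal)   # '50%2541'
print(unquote(encoded))           # 50%41
print(unquote(unquote(encoded)))  # 50A  (the data has changed)
```

This is why the "%" itself has to be escaped as %25, and why the encode and decode steps should each happen exactly once, at the component level.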


answered Sep 28 '22 07:09

Felix Kling

The main reason is that it escapes characters that would otherwise have a special meaning in the URL of your webpage, so they can be included as plain data.

Suppose a user enters "&joe" into a form field, and we want to redirect to a page that includes that name in the URL. With URL encoding, the ampersand is escaped as %26, so the URL would be, for example:

localhost/index.php?name=%26joe

If you didn't use URL encoding, you would end up with:

localhost/index.php?name=&joe

and that ampersand would be read as the separator between two query parameters: name would arrive empty, joe would show up as a separate parameter with no value, and the name the user actually typed would be lost.
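The example URL points at a PHP script; the sketch below uses Python's standard `urllib.parse` instead, purely to show how a typical query-string parser reads the two versions of the URL. `keep_blank_values=True` is set so that empty values are reported rather than silently dropped.

```python
from urllib.parse import parse_qsl, urlencode

name = "&joe"  # the value typed into the form

unencoded = "name=" + name           # 'name=&joe'
encoded = urlencode({"name": name})  # 'name=%26joe'

print(parse_qsl(unencoded, keep_blank_values=True))  # [('name', ''), ('joe', '')]
print(parse_qsl(encoded, keep_blank_values=True))    # [('name', '&joe')]
```

Only the encoded query string gets the original value back to the server intact.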

answered Sep 28 '22 08:09

Dean P

Related questions

How to know if a URL is decoded/encoded?
HttpUtility.UrlEncode in console application
Url encoding quotes and spaces
.net UrlEncode - lowercase problem
Is there a package to marshal in and out of x-www-form-urlencoding in golang
How to URL encode periods?
Why is the comma URL encoded?
Best way to get query string from a URL in python?
how to insert %20 in place of space in android
Using mod_rewrite to convert paths with hash characters into query strings
Encoding of XHTML and & (ampersand)
Pass a percent (%) sign in a url and get exact value of it using php
urllib.quote() throws KeyError
URL Encoding—Ampersand Problem
cannot urllib.urlencode a URL in python
urllib.urlencode doesn't like unicode values: how about this workaround?
urlencoding in Dart
facebook error 'Error validating verification code'
How do I encode URI parameter values?
URL-encoded slash in URL


Tags:

urlencode

augustin
