When should an asterisk be encoded in an HTTP URL?

According to RFC1738, an asterisk (*) "may be used unencoded within a URL":

Thus, only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL.
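
The RFC1738 sentence above amounts to a character-class check. The following is a minimal Python sketch that transcribes those classes: alphanumerics, the "special" characters `$-_.+!*'(),`, and RFC1738's reserved characters `;/?:@=&`, which are allowed unencoded only when used for their reserved purpose. The function name `rfc1738_may_appear_unencoded` and its `as_delimiter` flag are hypothetical, added for illustration only.

```python
import string

# Character classes transcribed from RFC1738, section 2.2 (illustrative sketch).
ALPHANUMERICS = set(string.ascii_letters + string.digits)
SPECIAL = set("$-_.+!*'(),")  # may always appear unencoded
RESERVED = set(";/?:@=&")     # unencoded only when used for their reserved purpose


def rfc1738_may_appear_unencoded(ch: str, as_delimiter: bool = False) -> bool:
    """Return True if `ch` may appear unencoded in a URL under RFC1738.

    `as_delimiter` (hypothetical flag) says whether a reserved character is
    being used for its reserved purpose, e.g. "/" separating path segments.
    """
    if ch in ALPHANUMERICS or ch in SPECIAL:
        return True
    return ch in RESERVED and as_delimiter


for ch in "*&~":
    print(repr(ch), rfc1738_may_appear_unencoded(ch))
```

Under this reading the asterisk is simply a "special" character, while `~` (which RFC1738 lists as unsafe) and a data `&` would have to be encoded.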

However, w3.org's Naming and Addressing material says that the asterisk is "reserved for use as having special signifiance within specific schemes" and implies that it should be encoded.

Also, according to RFC3986, a URL is a URI:

The term "Uniform Resource Locator" (URL) refers to the subset of URIs that, in addition to identifying a resource, provide a means of locating the resource by describing its primary access mechanism (e.g., its network "location").

It also specifies that the asterisk is a "sub-delim", which is part of the "reserved set" and:

URI producing applications should percent-encode data octets that correspond to characters in the reserved set unless these characters are specifically allowed by the URI scheme to represent data in that component.
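
For comparison with the RFC1738 reading, the following Python sketch transcribes RFC3986's character classes (sections 2.2 and 2.3) and classifies a single ASCII character. It shows that the asterisk falls in `sub-delims`, which is part of the reserved set. The function name `rfc3986_class` is hypothetical.

```python
import string

# Character classes transcribed from RFC3986, sections 2.2 and 2.3.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")
RESERVED = GEN_DELIMS | SUB_DELIMS


def rfc3986_class(ch: str) -> str:
    """Classify one ASCII character according to RFC3986's generic syntax."""
    if ch in UNRESERVED:
        return "unreserved"
    if ch in GEN_DELIMS:
        return "gen-delim"
    if ch in SUB_DELIMS:
        return "sub-delim"
    return "other"  # never allowed literally; always percent-encoded


print(rfc3986_class("*"))  # sub-delim
print(rfc3986_class("~"))  # unreserved
```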

It also explicitly specifies that it updates RFC1738.

I read all of this as requiring that asterisks be encoded in a URL unless they are used for a special purpose defined by the URI scheme.

Is RFC1738 the canonical reference for the HTTP URI scheme? Does it somehow exempt the asterisk from encoding, or is it obsolete in that regard due to RFC3986?

Wikipedia says that "[t]he character does not need to be percent-encoded when it has no reserved purpose." Does RFC1738 remove the reserved purpose of the asterisk?

Various resources and tools seem split on this question.

PHP's urlencode and rawurlencode (the latter of which purports to follow RFC3986) do encode the asterisk.

However, JavaScript's escape and encodeURIComponent do not encode the asterisk.

And Java's URLEncoder does not encode the asterisk:

The special characters ".", "-", "*", and "_" remain the same.
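
The split between these libraries comes down to which characters each one leaves unencoded. The following Python sketch uses a single hypothetical `percent_encode` helper with three "keep" sets that approximate the documented behavior: PHP's rawurlencode keeps only RFC3986 unreserved characters, JavaScript's encodeURIComponent also keeps `!*'()`, and Java's URLEncoder (with UTF-8) keeps alphanumerics plus `.-*_` and turns spaces into `+`. These sets are approximations taken from each library's documentation. They are not the real implementations.

```python
import string


def percent_encode(text: str, keep: str, space_as_plus: bool = False) -> str:
    """Percent-encode the UTF-8 bytes of `text`, leaving ASCII chars in `keep` as-is."""
    keep_bytes = set(keep.encode("ascii"))
    out = []
    for b in text.encode("utf-8"):
        if b in keep_bytes:
            out.append(chr(b))
        elif space_as_plus and b == 0x20:
            out.append("+")
        else:
            out.append(f"%{b:02X}")  # uppercase hex, as all three libraries emit
    return "".join(out)


ALNUM = string.ascii_letters + string.digits

# Approximations of each library's documented "leave unencoded" set.
LIKE_PHP_RAWURLENCODE = ALNUM + "-_.~"               # RFC3986 unreserved
LIKE_JS_ENCODE_URI_COMPONENT = ALNUM + "-_.!~*'()"
LIKE_JAVA_URLENCODER = ALNUM + ".-*_"                # and space -> "+"

sample = "a*b c"
print(percent_encode(sample, LIKE_PHP_RAWURLENCODE))                     # a%2Ab%20c
print(percent_encode(sample, LIKE_JS_ENCODE_URI_COMPONENT))              # a*b%20c
print(percent_encode(sample, LIKE_JAVA_URLENCODER, space_as_plus=True))  # a*b+c
```

Only the RFC3986-unreserved variant encodes the asterisk. The other two treat it as safe, which matches the behavior described above.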

Popular online tools (top two results for a Google search for "online url encoder") also do not encode the asterisk. The URL Encode and Decode Tool specifically states that "[t]he reserved characters have to be encoded only under certain circumstances." It goes on to list the asterisk and ampersand as reserved characters. It encodes the ampersand but not the asterisk.

Other similar questions in the Stack Exchange community seem to have stale, incomplete, or unconvincing answers:

- "urlencode() the 'asterisk' (star?) character": This question highlights the differences between Java's and PHP's treatment of the asterisk and asks which is "right". The accepted answer references only RFC1738, without mentioning the more recent RFC3986 or resolving the conflict. Another answer acknowledges the discrepancy and suggests that asterisks are treated differently in URLs specifically, as opposed to other URIs, but it doesn't cite a specific authority for that conclusion.
- "Can an URL have an asterisk?": One answer cites only the older RFC1738, and the accepted answer implies that an asterisk is acceptable when used as a delimiter, which presumably is its "reserved purpose".
- "Can I use asterisks in URLs?": The accepted answer seems to discourage use of the asterisk without clarifying the rules governing its use. Another answer says you can use the asterisk "because it's a reserved character". But isn't that only true if you're using it for its reserved purpose?
- "escaping special character in a url": One answer points out that "there is some ambiguity on whether an asterisk must be encoded in a URL". I'm trying to resolve that ambiguity with this question.
- "Spring UriUtils and RFC3986": This question notes that UriUtils' encodeQueryParam purports to follow RFC3986, but it doesn't encode the asterisk. There are no answers to that question as of 2014-08-01 12:50 PM CDT.
- "How to encode a URL in JavaScript?": This seems to be the canonical JavaScript URL encoding question on Stack Overflow, and although the answers note that the various methods leave asterisks unencoded, they don't address whether they should.

With all this in mind, when should an asterisk be encoded in an HTTP URL?

asked Aug 01 '14 17:08

Riley Major

## Short answer

The current definition of URL syntax indicates that you never need to percent-encode the asterisk character in the path, query, or fragment components of a URL.

## HTTP 1.1

As @Riley Major pointed out, the RFC that HTTP 1.1 references for URL syntax (RFC2396) has been obsoleted by RFC3986, which isn't as black and white about the use of asterisks as the originally referenced RFC was.

### RFC2396 (URL spec before January 2005; original answer)

Under RFC2396, an asterisk never needs to be encoded in HTTP 1.1 URLs, because * is listed as an "unreserved character" in that RFC, which HTTP 1.1 uses to define URI syntax. Unreserved characters are allowed in the path component of a URL.

2.3. Unreserved Characters

Data characters that are allowed in a URI but do not have a reserved purpose are called unreserved. These include upper and lower case letters, decimal digits, and a limited set of punctuation marks and symbols.
    unreserved  = alphanum | mark
    mark        = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
Unreserved characters can be escaped without changing the semantics of the URI, but this should not be done unless the URI is being used in a context that does not allow the unescaped character to appear.
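
For comparison with the current syntax, the following Python sketch transcribes RFC2396's `unreserved` set and RFC3986's `unreserved` and `sub-delims` sets. It computes exactly which characters stopped being unreserved. The variable names are illustrative only.

```python
import string

ALNUM = set(string.ascii_letters + string.digits)

# RFC2396, section 2.3: unreserved = alphanum | mark
RFC2396_UNRESERVED = ALNUM | set("-_.!~*'()")

# RFC3986, section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
RFC3986_UNRESERVED = ALNUM | set("-._~")

# RFC3986, section 2.2: sub-delims (part of the reserved set)
RFC3986_SUB_DELIMS = set("!$&'()*+,;=")

moved = sorted(RFC2396_UNRESERVED - RFC3986_UNRESERVED)
print(moved)  # ['!', "'", '(', ')', '*']

# Every character that left the unreserved set is now a sub-delim.
assert set(moved) <= RFC3986_SUB_DELIMS
```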

### RFC3986 (current URL syntax for HTTP)

RFC3986 modifies RFC2396 to make the asterisk a reserved character, with the reason that it is "typically unsafe to decode". My understanding of this RFC is that the unencoded asterisk character is allowed in the path, query, and fragment components of a URL, because these components do not specify the asterisk as a delimiter (section 2.2, Reserved Characters):

These characters are called "reserved" because they may (or may not) be defined as delimiters by the generic syntax... If data for a URI component would conflict with a reserved character's purpose as a delimiter, then the conflicting data must be percent-encoded before the URI is formed.
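
In practice, "conflicting data" depends on which delimiters a component actually uses. The following Python sketch assumes the common (but not RFC3986-mandated) query convention in which `&` separates pairs and `=` separates keys from values. It encodes those characters (and `+`, which form decoders often read as a space) inside keys and values, and it leaves the asterisk alone. The helper names `encode_query_part` and `build_query` are hypothetical, and `urllib.parse.quote` is Python's standard percent-encoder.

```python
from urllib.parse import quote

# Sub-delims minus "&", "=", "+" (used or misread as delimiters under the
# assumed convention), plus ":", "@", "/", "?" which RFC3986 allows in a query.
QUERY_DATA_SAFE = "!$'()*,;:@/?"


def encode_query_part(text: str) -> str:
    """Percent-encode a query key or value; '*' is left unencoded."""
    return quote(text, safe=QUERY_DATA_SAFE)


def build_query(pairs: list[tuple[str, str]]) -> str:
    """Join (key, value) pairs with the assumed '&' / '=' delimiters."""
    return "&".join(f"{encode_query_part(k)}={encode_query_part(v)}" for k, v in pairs)


print(build_query([("q", "a*b"), ("op", "x&y=z")]))  # q=a*b&op=x%26y%3Dz
```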

Additionally, section 3.3 (Path) confirms that a subset of reserved characters (sub-delims) can be used unencoded in path segments (the parts of the path component separated by /):

Aside from dot-segments ("." and "..") in hierarchical paths, a path segment is considered opaque by the generic syntax. URI producing applications often use the reserved characters allowed in a segment. ... For example, the semicolon (";") and equals ("=") reserved characters are often used to delimit parameters and parameter values applicable to that segment. The comma (",") reserved character is often used for similar purposes. For example, one URI producer might use a segment such as "name;v=1.1" to indicate a reference to version 1.1 of "name", whereas another might use a segment such as "name,1.1" to indicate the same.
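
The `segment` rule from section 3.3 (`segment = *pchar`, where `pchar = unreserved / pct-encoded / sub-delims / ":" / "@"`) can be written as a regular expression. The following Python sketch uses it to check the example segments quoted above, together with a few others. The name `is_valid_segment` is hypothetical.

```python
import re

# RFC3986 section 3.3: segment = *pchar
# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"
SEGMENT_RE = re.compile(rf"{PCHAR}*")


def is_valid_segment(segment: str) -> bool:
    """Return True if `segment` matches the RFC3986 `segment` rule."""
    return SEGMENT_RE.fullmatch(segment) is not None


for seg in ["name;v=1.1", "name,1.1", "a*b", "a b", "a/b"]:
    print(seg, is_valid_segment(seg))
```

Both of the RFC's examples and the unencoded asterisk pass. A literal space and a `/` (which would start a new segment) do not.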

## HTTP 1.0

HTTP 1.0 references RFC1738 to define URL syntax. Through a series of updates and obsoletions, this means it effectively uses the same RFC as HTTP 1.1 for URL syntax.

As far as backwards compatibility goes, RFC1738 does not reserve the asterisk: its reserved characters are limited to ";", "/", "?", ":", "@", "=" and "&", and the asterisk is one of the "special" characters it allows unencoded. HTTP 1.0 also doesn't define any special meaning for an unencoded asterisk in the path component of a URL, so using one shouldn't break anything. This should mean you're still safe putting asterisks in URLs that point to even the oldest systems.

As a side note, the asterisk character does have a special meaning in a Request-URI in both HTTP specs, but it's not possible to represent it with an HTTP URL:

The asterisk "*" means that the request does not apply to a particular resource, but to the server itself, and is only allowed when the method used does not necessarily apply to a resource. One example would be
   OPTIONS * HTTP/1.1 
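
A quick way to see why the bare `*` cannot come from an HTTP URL is to consider how an HTTP URL maps to a request target. In an `http` URL with an authority, the path is either empty or starts with `/`. The following Python sketch uses the standard `urllib.parse.urlsplit`. It assumes a client that turns an empty path into `/` and appends any query, so the derived target always starts with `/` and never equals `*`. The helper name `origin_form_target` is hypothetical.

```python
from urllib.parse import urlsplit


def origin_form_target(url: str) -> str:
    """Derive the request-target a typical client would send for `url`.

    Sketch only: an empty path becomes "/", and the query (if any) is kept.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


print(origin_form_target("http://example.com"))        # /
print(origin_form_target("http://example.com/*"))      # /*
print(origin_form_target("http://example.com/a?b=*"))  # /a?b=*
```

An asterisk inside the URL ends up as ordinary path or query data (`/*`), not as the server-wide `*` form, which a client has to send explicitly.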

Disclaimer: I'm just reading and interpreting these RFCs myself, so I may be wrong.

answered Sep 25 '22 15:09

Stecman
