PHP: is urlencode() a safe way to allow valid UTF-8 strings in the URL?

Tags:

url

php

utf-8

I have user-submitted tags that can be any type of (valid) UTF-8 string. I want to know if it is safe to include them in the URL merely by running them through urlencode().

In other words, is urlencode() safe to use for valid UTF-8 strings? (By valid I mean I'd have already force-encoded them to UTF-8.)

Xeoncross asked Jan 07 '10 23:01


1 Answer

urlencode does not depend on a specific character encoding. It works on the raw bytes of the string: every byte that is not an ASCII letter, a digit, or one of -_. gets escaped. That covers all non-ASCII bytes (0x80–0xFF) as well as the ASCII characters that may not appear unescaped in a URL.
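To make the byte-wise behaviour concrete, here is a minimal sketch. The sample strings are illustrative only, and the character "é" is written as explicit byte escapes so the result does not depend on how the source file is saved.

```php
<?php
// The same character "é" as raw bytes in two different encodings.
$utf8   = "\xC3\xA9"; // UTF-8: two bytes
$latin1 = "\xE9";     // ISO-8859-1: one byte

// urlencode() escapes each byte on its own; it never decodes characters.
$encodedUtf8   = urlencode($utf8);   // "%C3%A9"
$encodedLatin1 = urlencode($latin1); // "%E9"

// ASCII letters, digits, and -_. pass through unchanged.
$encodedAscii = urlencode("tag-name_1.0"); // "tag-name_1.0"

echo $encodedUtf8, "\n", $encodedLatin1, "\n", $encodedAscii, "\n";
```

Because the output is just the escaped bytes, whoever decodes the URL still has to know which character encoding those bytes are in.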

Now to your question: yes, urlencode makes any string, in any character encoding, safe to use in a URL, but only in the URL query. urlencode formats its input according to application/x-www-form-urlencoded, which differs from "normal" percent-encoding in how the space is encoded: application/x-www-form-urlencoded replaces spaces with +, while "normal" percent-encoding replaces them with %20.

If you want "normal" percent-encoding, use rawurlencode instead.
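A short sketch of the difference, assuming a hypothetical tag value; example.com and the /search and /tags/ routes are placeholders, not anything from the question.

```php
<?php
// Hypothetical user-submitted tag, valid UTF-8 ("café au lait"),
// written with byte escapes so the file encoding does not matter.
$tag = "caf\xC3\xA9 au lait";

// urlencode(): application/x-www-form-urlencoded, space becomes "+".
$query = urlencode($tag);   // "caf%C3%A9+au+lait"

// rawurlencode(): plain percent-encoding, space becomes "%20".
$path = rawurlencode($tag); // "caf%C3%A9%20au%20lait"

// Placeholder URLs: the query form for a parameter, the raw form for a path segment.
$queryUrl = "https://example.com/search?tag=" . $query;
$pathUrl  = "https://example.com/tags/" . $path;

echo $queryUrl, "\n", $pathUrl, "\n";
```

Both forms round-trip: urldecode() reverses the first and rawurldecode() the second.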

Gumbo answered Oct 24 '22 06:10
