I am sending a file to a server as an octet-stream, and I need to specify the filename in the header:
String filename = "«úü¡»¿.doc";
URL url = new URL("http://www.myurl.com");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("POST");
conn.addRequestProperty("Accept", "application/json; charset=UTF-8");
conn.addRequestProperty("Content-Type", "application/octet-stream; charset=UTF-8");
conn.addRequestProperty("Filename", filename);
// do more stuff here
The problem is, some of the files I need to send have filenames that contain non-ASCII characters. I have read that you cannot send non-ASCII text in an HTTP header.
My questions are:
A URL can't contain raw non-ASCII characters or even a space; such characters have to be percent-encoded. Problems usually arise when a URL is built from input that was never encoded.
The name of the HTTP request header you want to set or remove can contain only alphanumeric characters (a-z, A-Z, 0-9) and the special characters - and _.
JSON allows both escaped and non-escaped non-ASCII characters. It'd be useful for this document to include guidance on which style is preferred, or if there is no preference.
Non-ASCII characters are those that fall outside the ASCII character set, such as accented letters and other characters that need an encoding like UTF-8. ASCII is limited to 128 characters and was initially developed for the English language.
You cannot use non-ASCII characters in HTTP headers; see RFC 2616. URIs are themselves standardized by RFC 2396 and don't permit non-ASCII characters either. The RFC says:
The URI syntax was designed with global transcribability as one of its main concerns. A URI is a sequence of characters from a very limited set, i.e. the letters of the basic Latin alphabet, digits, and a few special characters.
In order to use non-ASCII characters in a URI, you need to escape them using the %hexcode syntax (see section 2 of RFC 2396).
In Java you can do this using the java.net.URLEncoder class.
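As a rough illustration of that approach applied to the filename from the question, here is a minimal sketch. It rests on a few assumptions: the class name FilenameHeaderCodec and its methods are made up for this example, and the server is assumed to percent-decode the custom Filename header on its side. URLEncoder targets HTML form encoding and writes a space as "+", so the sketch rewrites that to "%20".

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper, not part of any library: percent-encodes a filename
// so that the resulting header value contains only ASCII characters.
final class FilenameHeaderCodec {

    // Encode the filename as UTF-8 bytes written in %XX form.
    static String encode(String filename) {
        try {
            // URLEncoder emits "+" for a space; a literal "+" becomes "%2B",
            // so replacing "+" afterwards only affects encoded spaces.
            return URLEncoder.encode(filename, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    // What a receiver would do to recover the original name
    // (assumption: the server is under your control and decodes the value).
    static String decode(String headerValue) {
        try {
            return URLDecoder.decode(headerValue, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Same filename as in the question, written with Unicode escapes
        // so the source file itself stays ASCII.
        String filename = "\u00AB\u00FA\u00FC\u00A1\u00BB\u00BF.doc";
        System.out.println(encode(filename));
        // prints %C2%AB%C3%BA%C3%BC%C2%A1%C2%BB%C2%BF.doc
    }
}
```

In the code from the question, the header line would then become something like conn.addRequestProperty("Filename", FilenameHeaderCodec.encode(filename));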
2020 edit: RFC 2616 has been superseded, and the relevant section on header syntax is now at https://www.rfc-editor.org/rfc/rfc7230#section-3.2
header-field = field-name ":" OWS field-value OWS
field-name = token
field-value = *( field-content / obs-fold )
field-content = field-vchar [ 1*( SP / HTAB ) field-vchar ]
field-vchar = VCHAR / obs-text
obs-fold = CRLF 1*( SP / HTAB )
; obsolete line folding
; see Section 3.2.4
Here VCHAR is defined in https://www.rfc-editor.org/rfc/rfc7230#section-1.2 as "any visible [USASCII] character". The [USASCII] reference is:
[USASCII] American National Standards Institute, "Coded Character
Set -- 7-bit American Standard Code for Information
Interchange", ANSI X3.4, 1986.
The standards are still very clear: HTTP headers are still US-ASCII only.
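If you would rather catch an offending value before it goes on the wire, a check along these lines could help. This is only a sketch: the class and method names are hypothetical, and it deliberately takes the strict reading of the grammar above (visible US-ASCII plus space and horizontal tab), rejecting the obs-text bytes that the RFC keeps only for legacy reasons.

```java
// Hypothetical helper, not part of any library: tests whether a string is
// safe to use as an HTTP header value under a strict US-ASCII reading.
final class AsciiHeaderCheck {

    // Returns true if every character is a visible US-ASCII character
    // (0x21 to 0x7E), a space, or a horizontal tab.
    static boolean isSafeHeaderValue(String value) {
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            boolean visibleAscii = c >= 0x21 && c <= 0x7E;
            boolean whitespace = c == ' ' || c == '\t';
            if (!visibleAscii && !whitespace) {
                // Covers non-ASCII characters as well as CR, LF and
                // other control characters.
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isSafeHeaderValue("report.doc"));      // prints true
        System.out.println(isSafeHeaderValue("\u00FA\u00FC.doc")); // prints false
    }
}
```

A value that fails this check would need to be percent-encoded, or otherwise transformed, before being passed to addRequestProperty.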