Sending non-ASCII text in HTTP POST header

Tags: java
I am sending a file to a server as an octet-stream, and I need to specify the filename in the header:

String filename = "«úü¡»¿.doc";
URL url = new URL("http://www.myurl.com");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("POST");
conn.addRequestProperty("Accept", "application/json; charset=UTF-8");
conn.addRequestProperty("Content-Type", "application/octet-stream; charset=UTF-8");
conn.addRequestProperty("Filename", filename);
// do more stuff here

The problem is, some of the files I need to send have filenames that contain non-ASCII characters. I have read that you cannot send non-ASCII text in an HTTP header.

My questions are:

  1. Can you send non-ASCII text in an HTTP header?
  2. If you can, how do you do this? The code above does not work when filename contains non-ASCII text. The server responds with "400 Bad Request".
  3. If you cannot, what is the typical way to get around this limitation?
asked Mar 09 '11 20:03 by guest99



1 Answer

You cannot use non-ASCII characters in HTTP headers; see RFC 2616. URIs are themselves standardized by RFC 2396 and do not permit non-ASCII characters either. The RFC says:

The URI syntax was designed with global transcribability as one of its main concerns. A URI is a sequence of characters from a very limited set, i.e. the letters of the basic Latin alphabet, digits, and a few special characters.

To use non-ASCII characters in a URI, you need to escape them using the %hexcode syntax (see section 2 of RFC 2396).

In Java you can do this using the java.net.URLEncoder class.
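As a minimal sketch of that approach (not a definitive implementation), the filename can be percent-encoded as UTF-8 before it goes into the header. The class name HeaderFilenameEncoder and its encode/decode helpers are hypothetical, and the sketch assumes Java 10 or later (for the Charset overloads) and a server that percent-decodes the "Filename" header on its side.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original question's code.
class HeaderFilenameEncoder {

    // Percent-encode the filename as UTF-8 so the header value is pure ASCII.
    static String encode(String filename) {
        // URLEncoder produces form encoding, which writes a space as "+".
        // A literal "+" is already escaped as "%2B", so every remaining "+"
        // is a space and can safely be rewritten as "%20".
        return URLEncoder.encode(filename, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // What the receiving side would do to recover the original name
    // (assumption: the server agrees to this convention).
    static String decode(String headerValue) {
        return URLDecoder.decode(headerValue, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Same filename as in the question, written with Unicode escapes so
        // the example does not depend on the source file's encoding.
        String filename = "\u00AB\u00FA\u00FC\u00A1\u00BB\u00BF.doc";
        String encoded = encode(filename);
        System.out.println(encoded); // %C2%AB%C3%BA%C3%BC%C2%A1%C2%BB%C2%BF.doc
        System.out.println(decode(encoded).equals(filename)); // true
    }
}
```

With the code from the question, this would be used as conn.addRequestProperty("Filename", HeaderFilenameEncoder.encode(filename)). The client and the server have to agree on this convention; nothing in HTTP itself tells the server that a custom header is percent-encoded.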

2020 edit: RFC 2616 has been superseded by RFC 7230, and the relevant section on header syntax is now at https://www.rfc-editor.org/rfc/rfc7230#section-3.2

 header-field   = field-name ":" OWS field-value OWS

 field-name     = token
 field-value    = *( field-content / obs-fold )
 field-content  = field-vchar [ 1*( SP / HTAB ) field-vchar ]
 field-vchar    = VCHAR / obs-text

 obs-fold       = CRLF 1*( SP / HTAB )
                ; obsolete line folding
                ; see Section 3.2.4

VCHAR is defined in https://www.rfc-editor.org/rfc/rfc7230#section-1.2 as "any visible [USASCII] character", where the [USASCII] reference is:

[USASCII]     American National Standards Institute, "Coded Character
              Set -- 7-bit American Standard Code for Information
              Interchange", ANSI X3.4, 1986.

The standards are still very clear: HTTP headers are still US-ASCII only.
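To make the rule concrete, here is a rough sketch of a check a client could run before setting a header. The class HeaderValueCheck and the method isVisibleAscii are hypothetical names, and the check is a simplification of the grammar above: it accepts only VCHAR (0x21 to 0x7E) plus space and horizontal tab, and deliberately rejects obs-text and line folding.

```java
// Hypothetical helper illustrating the US-ASCII restriction on header values.
class HeaderValueCheck {

    // Returns true if every character is a visible US-ASCII character
    // (VCHAR, 0x21..0x7E), a space, or a horizontal tab.
    static boolean isVisibleAscii(String value) {
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            boolean vchar = c >= 0x21 && c <= 0x7E;
            boolean blank = c == ' ' || c == '\t';
            if (!vchar && !blank) {
                return false; // control character, CR/LF, or non-ASCII
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isVisibleAscii("report-2011.doc"));   // true
        System.out.println(isVisibleAscii("\u00AB\u00FA.doc")); // false
    }
}
```

A value that fails this check needs to be escaped (for example by percent-encoding) or sent some other way, such as in the request body.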

answered Sep 29 '22 04:09 by Bruno Rohée
