
Matching charset of HTTP Header Content-Type

In JavaScript, I want to get the "charset" parameter of the HTTP 'Content-Type' header field.

The Regex I've seen thus far has been something like:

var charset = (/^charset=(.+)/im).exec(ContentType)[1];

Here ContentType holds the value of the Content-Type HTTP header.

But in my testing, the match result is null.

Edit: following the answer from @andris leduskrasts, I now do this:

var ctype = 'text/html; charset=utf-8';
var charset = new RegExp('charset=.*?(?=$|\\s|;|")').exec(ctype); // \\s: a single backslash in a string literal would turn \s into a plain "s"
system.stdout.writeLine(charset);

I get 'charset=utf-8'. Any idea how to get only 'utf-8'?

LeMoussel asked Dec 20 '22 03:12



2 Answers

I just experienced the same problem.

If you need to extract just the charset value from an arbitrary Content-Type header (which, per RFC 1341, may carry further characters after the charset assignment), you can use the following JS regexp:

var re = /charset=([^()<>@,;:\\"/[\]?.=\s]*)/i; // \\ excludes the backslash itself; \" alone only matched the quote

This works because the capture group starts right after = and stops at the first character that can end the charset specification according to RFC 1341, namely ()<>@,;:\"/[]?.=, whitespace, or (implicitly) end of string.

Since the charset is optional, you can set an appropriate value with something like:

var charset = re.test(ctype) ? re.exec(ctype)[1] : 'utf8';

or some other default.
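
The fallback can be wrapped in a small helper so the regexp only runs once per header. This is a minimal sketch, not part of the answer above: the name getCharset is hypothetical, and treating an empty capture (for example a quoted value such as charset="utf-8") as a missing charset is an assumption.

```javascript
// Same idea as the regexp above: capture everything after "charset=" up to
// the first RFC 1341 special character, whitespace, or end of string.
const CHARSET_RE = /charset=([^()<>@,;:\\"/[\]?.=\s]*)/i;

// Hypothetical helper: returns the charset parameter or a fallback value.
function getCharset(contentType, fallback = 'utf8') {
  const match = CHARSET_RE.exec(String(contentType));
  // match[1] is '' when the value starts with an excluded character,
  // e.g. a quoted value; that case is treated like a missing charset
  // (an assumption of this sketch).
  return match && match[1] ? match[1] : fallback;
}

console.log(getCharset('text/html; charset=utf-8'));
console.log(getCharset('text/html;charset=ISO-8859-1; format=flowed'));
console.log(getCharset('application/json'));
```

Calling exec once and checking its result avoids running the regexp twice, as the test/exec pair does.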

Colin answered Dec 24 '22 02:12



If you're fine with the "charset=" part being a part of your result, this will do:

charset=.*?(?=\s|\;|\"|$)

<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"> results in charset=ISO-8859-1.
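
As a quick check of that pattern, here is a small sketch. It assumes the intended lookahead alternatives are whitespace, a semicolon, a double quote, or end of string; the variable names are made up for illustration. Since the match still carries the "charset=" prefix, the last step slices it off.

```javascript
// Lazy match after "charset=", stopped by a lookahead for whitespace,
// a semicolon, a double quote, or end of string.
const charsetWithPrefix = /charset=.*?(?=\s|;|"|$)/i;

const tag = '<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">';
const hit = charsetWithPrefix.exec(tag);

// hit[0] is the whole match, including the "charset=" prefix.
const withPrefix = hit ? hit[0] : null;
// Drop the fixed-length prefix to keep only the value.
const valueOnly = withPrefix ? withPrefix.slice('charset='.length) : null;

console.log(withPrefix);
console.log(valueOnly);
```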

If you want to get rid of the "charset=" part already in the regex, it's a bit more tricky, as lookbehinds only arrived in JavaScript with ES2018 and older engines don't support them.
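
Two ways to drop the prefix inside the regex itself, as a hedged sketch: a capture group, which works in any engine, and a lookbehind, which assumes an engine that implements ES2018. The character class [^\s;"]+ and the variable names are illustrative choices, not taken from the answer.

```javascript
const header = 'text/html; charset=utf-8';

// Option 1: capture group. The prefix is matched but only group 1 is kept.
const grouped = /charset=([^\s;"]+)/i.exec(header);
const charsetFromGroup = grouped ? grouped[1] : null;

// Option 2: lookbehind (ES2018 and later). The prefix is asserted, not
// consumed, so the whole match is already the bare value.
const behind = /(?<=charset=)[^\s;"]+/i.exec(header);
const charsetFromLookbehind = behind ? behind[0] : null;

console.log(charsetFromGroup);
console.log(charsetFromLookbehind);
```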

EDIT:

If you want only the UTF-8 part, it's easily doable IF your variable is always the content type and therefore ends with the actual charset. In that case use [^\s\;\=]*?(?=$), which really just selects the last word of your string, that is, whatever follows the last space, semicolon, or =. This is by no means a good solution for finding the charset in an arbitrary string, but it might do the trick for your particular case.
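
The caveat is easy to see in a short sketch (the helper name lastToken is hypothetical): the pattern returns the charset only when nothing follows it, and picks up the wrong value as soon as another parameter comes last.

```javascript
// Matches the final run of characters that are not whitespace, ";" or "=".
const LAST_TOKEN_RE = /[^\s;=]*?(?=$)/;

// Hypothetical helper returning the last token of the string.
function lastToken(value) {
  const match = LAST_TOKEN_RE.exec(value);
  return match ? match[0] : null;
}

// Works when the charset is the last thing in the header value.
console.log(lastToken('text/html; charset=utf-8'));

// Picks up the wrong parameter when something follows the charset.
console.log(lastToken('text/plain; charset=utf-8; format=flowed'));
```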

Andris Leduskrasts answered Dec 24 '22 02:12
