
Regex to match and validate internet media type?

I want to validate internet media types submitted to my API.

Can you help me write a regex to match them?

Example types below from http://en.wikipedia.org/wiki/Internet_media_type

application/atom+xml
application/EDI-X12
application/xml-dtd
application/zip
application/vnd.openxmlformats-officedocument.presentationml.presentation
video/quicktime

Each value must follow the standard form:

type / media type name [+suffix]
Pete Thorne asked Dec 15 '22 20:12


1 Answer

I recently needed to validate media types a bit more strictly than the existing answers do. Here's what I came up with, based on the intersection of the grammars in RFC 2045 Section 5.1 and RFC 7231 Section 3.1.1.1 (the latter disallows {} in tokens and allows whitespace only between parameters). For a C-like language with (?:) non-capturing groups:

// Optional whitespace: spaces and tabs only.
ows = "[ \t]*";
// Token characters permitted by both RFCs.
token = "[0-9A-Za-z!#$%&'*+.^_`|~-]+";
// Quoted string; a backslash escapes the character that follows it.
quotedString = "\"(?:[^\"\\\\]|\\\\.)*\"";
type = "(application|audio|font|example|image|message|model|multipart|text|video|x-(?:" + token + "))";
parameter = ";" + ows + token + "=" + "(?:" + token + "|" + quotedString + ")";
mediaType = type + "/" + "(" + token + ")((?:" + ows + parameter + ")*)";

This ends up with a rather monstrous

"(application|audio|font|example|image|message|model|multipart|text|video|x-(?:[0-9A-Za-z!#$%&'*+.^_`|~-]+))/([0-9A-Za-z!#$%&'*+.^_`|~-]+)((?:[ \t]*;[ \t]*[0-9A-Za-z!#$%&'*+.^_`|~-]+=(?:[0-9A-Za-z!#$%&'*+.^_`|~-]+|\"(?:[^\"\\\\]|\\\\.)*\"))*)"

which captures type, subtype, and parameters, or just

"(application|audio|font|example|image|message|model|multipart|text|video|x-(?:[0-9A-Za-z!#$%&'*+.^_`|~-]+))/([0-9A-Za-z!#$%&'*+.^_`|~-]+)"

omitting parameters. Note that these could be made more forward-compatible (and less strict) by allowing any token for type (as RFC 7231 does) rather than limiting to "application", "audio", etc.
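As a rough illustration, the same pieces could be assembled in Python along these lines. This is only a sketch: the names `build_media_type_regex`, `parse_media_type`, and the `strict_type` flag are made up for this example, and raw strings replace the doubled backslashes needed in a C-like string literal.

```python
import re

# Building blocks, mirroring the pieces above as Python raw strings.
OWS = r"[ \t]*"
TOKEN = r"[0-9A-Za-z!#$%&'*+.^_`|~-]+"
QUOTED_STRING = r'"(?:[^"\\]|\\.)*"'  # a backslash escapes the next character
TOP_LEVEL = "application|audio|font|example|image|message|model|multipart|text|video"


def build_media_type_regex(strict_type: bool = True) -> "re.Pattern[str]":
    """Compile the pattern; strict_type=False accepts any token as the type."""
    type_ = f"(?:{TOP_LEVEL}|x-{TOKEN})" if strict_type else TOKEN
    parameter = f";{OWS}{TOKEN}=(?:{TOKEN}|{QUOTED_STRING})"
    # Groups: 1 = type, 2 = subtype, 3 = raw parameter text (possibly empty).
    return re.compile(f"({type_})/({TOKEN})((?:{OWS}{parameter})*)")


def parse_media_type(value: str, strict_type: bool = True):
    """Return (type, subtype, raw_parameters), or None if value is invalid."""
    # fullmatch anchors the pattern at both ends, so trailing junk is rejected.
    match = build_media_type_regex(strict_type).fullmatch(value)
    return match.groups() if match else None
```

Note that the patterns themselves are unanchored, so for validation they need something like `fullmatch` (or explicit `^` and `$`). Passing `strict_type=False` corresponds to the more forward-compatible variant that allows any token as the type.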

In practice, you may also want to restrict input to IANA Registered Media Types, to the types listed in mailcap, or to the specific types your application actually supports, depending on intended use.
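One way to combine the syntax check with such a restriction is sketched below in Python. The allowlist contents and the name `is_accepted_media_type` are hypothetical, and the sketch assumes parameters should be tolerated but ignored when deciding whether a type is accepted.

```python
import re

TOKEN = r"[0-9A-Za-z!#$%&'*+.^_`|~-]+"
QUOTED = r'"(?:[^"\\]|\\.)*"'
# Any token is accepted as the type here; the allowlist does the narrowing.
MEDIA_TYPE_RE = re.compile(
    rf"({TOKEN})/({TOKEN})((?:[ \t]*;[ \t]*{TOKEN}=(?:{TOKEN}|{QUOTED}))*)"
)

# Hypothetical allowlist; replace with the types your API really accepts.
ALLOWED_MEDIA_TYPES = frozenset({
    "application/atom+xml",
    "application/zip",
    "video/quicktime",
})


def is_accepted_media_type(value: str) -> bool:
    """True if value is syntactically valid and its type/subtype is allowed."""
    match = MEDIA_TYPE_RE.fullmatch(value)
    if match is None:
        return False
    type_, subtype, _params = match.groups()
    # Type and subtype names are case-insensitive, so compare in lowercase.
    return f"{type_}/{subtype}".lower() in ALLOWED_MEDIA_TYPES
```

For example, `is_accepted_media_type("video/quicktime; codecs=avc1")` passes the syntax check and is then looked up as `video/quicktime`, while `application/xml-dtd` is well-formed but rejected because it is not on this example list.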

Kevinoid answered Jan 08 '23 01:01