
Regular Expression for FileTypes

Tags:

python

regex

I'm using an application which has the following line in it:

ACCEPT_FILE_TYPES = re.compile('image/(gif|p?jpeg|(x-)?png)')

Obviously, it limits uploads to images of the specified extensions. But I plan to use it for uploading the following formats (and perhaps more):

  • Microsoft Office files (.doc, .docx, .xls, .xlsx, etc.)
  • Adobe Reader (.pdf)
  • And probably archives (.rar, .zip, .7z)

I guess it needs to be rewritten to the following form:

ACCEPT_FILE_TYPES = re.compile('/(docx?|xlsx?|pdf|rar|zip|7z)')

Any help would be appreciated.

user197171 asked Oct 06 '22 11:10



1 Answer

Those aren't file extensions that you're trying to match but MIME types.
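To make the distinction concrete, here is a minimal sketch that runs the pattern from the question against a few strings. The helper name is_accepted and the sample values are made up for illustration; only the pattern itself comes from the question.

```python
import re

# Pattern copied from the question; it describes MIME types, not extensions.
ACCEPT_FILE_TYPES = re.compile('image/(gif|p?jpeg|(x-)?png)')


def is_accepted(value):
    """Return True if the pattern matches at the start of value (hypothetical helper)."""
    return ACCEPT_FILE_TYPES.match(value) is not None


print(is_accepted('image/png'))        # True: a MIME type
print(is_accepted('image/pjpeg'))      # True: a variant MIME type for JPEG
print(is_accepted('photo.png'))        # False: a file name, not a MIME type
print(is_accepted('application/pdf'))  # False: a MIME type outside image/
```

The pattern only ever sees strings of the form type/subtype, so adding extensions such as docx or pdf to it would not match anything a client actually sends.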

The MIME types for common image formats happen to be quite straightforward, for example:

image/png
image/jpeg
image/gif

Most other formats are not that simple, though. They use MIME types like these:

.pdf    application/pdf

.doc    application/msword
.xls    application/vnd.ms-excel

.rar    application/x-rar-compressed
.7z     application/x-7z-compressed
.zip    application/zip

.xlsx   application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
.xltx   application/vnd.openxmlformats-officedocument.spreadsheetml.template
.potx   application/vnd.openxmlformats-officedocument.presentationml.template
.ppsx   application/vnd.openxmlformats-officedocument.presentationml.slideshow
.pptx   application/vnd.openxmlformats-officedocument.presentationml.presentation
.sldx   application/vnd.openxmlformats-officedocument.presentationml.slide
.docx   application/vnd.openxmlformats-officedocument.wordprocessingml.document
.dotx   application/vnd.openxmlformats-officedocument.wordprocessingml.template

Note: These are only the most commonly used MIME types for the respective file formats. The IANA is the official authority for registering MIME types, but in the wild you'll encounter many variations, depending on the programs that produce them (mail clients, browsers, web servers, ...).

So you shouldn't match them with regular expressions. Instead, maintain a registry of allowed MIME types (a simple Python list will do, or a dictionary if you want to be thorough and account for variants).
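A minimal sketch of such a registry, assuming a dictionary keyed by extension with a set of accepted MIME types per entry. The names ALLOWED_MIME_TYPES and is_allowed are hypothetical, and the two variant entries (application/x-zip-compressed and application/vnd.rar) are illustrative additions that are not part of the table above; adjust them to whatever your clients actually send.

```python
# Hypothetical registry: extension -> set of MIME types accepted for it.
ALLOWED_MIME_TYPES = {
    '.pdf': {'application/pdf'},
    '.doc': {'application/msword'},
    '.xls': {'application/vnd.ms-excel'},
    '.docx': {'application/vnd.openxmlformats-officedocument'
              '.wordprocessingml.document'},
    '.xlsx': {'application/vnd.openxmlformats-officedocument'
              '.spreadsheetml.sheet'},
    # The second entry in each of the next two sets is an assumed variant.
    '.zip': {'application/zip', 'application/x-zip-compressed'},
    '.rar': {'application/x-rar-compressed', 'application/vnd.rar'},
    '.7z': {'application/x-7z-compressed'},
}

# Flatten the registry once so that lookups are a plain set membership test.
ALL_ALLOWED = set().union(*ALLOWED_MIME_TYPES.values())


def is_allowed(content_type):
    """Check a Content-Type value against the registry.

    Parameters such as "; charset=binary" are dropped and the type is
    lower-cased before the lookup.
    """
    mime = content_type.split(';', 1)[0].strip().lower()
    return mime in ALL_ALLOWED


print(is_allowed('application/pdf'))                  # True
print(is_allowed('Application/ZIP; charset=binary'))  # True
print(is_allowed('text/html'))                        # False
```

Compared with a regular expression, an explicit set rejects anything that is merely similar to an allowed type, and adding a new format is a one-line change.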

Read up on MIME types, check the IANA MIME Media Types list as an authoritative source for registered MIME types, and use the Python mimetypes module to look up MIME types by file extension or vice versa.
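As a rough illustration of the mimetypes module, the sketch below wraps its two lookup functions. The wrapper names guess_mime and guess_ext are hypothetical; mimetypes.guess_type and mimetypes.guess_extension are the standard library calls. Results for less common extensions can depend on the MIME tables installed on the system, so treat the output as a hint and not as proof of what a file contains.

```python
import mimetypes


def guess_mime(filename):
    """Return the MIME type guessed from the file name, or None if unknown."""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime


def guess_ext(mime):
    """Return an extension (with leading dot) for a MIME type, or None if unknown."""
    return mimetypes.guess_extension(mime)


print(guess_mime('report.pdf'))      # application/pdf
print(guess_mime('no_extension'))    # None
print(guess_ext('application/pdf'))
```

Both lookups work purely on names and tables. They never open the file, so an upload handler should still compare the result against its own list of allowed types.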

Lukas Graf answered Oct 09 '22 01:10
