I have a script that lets users upload text files (PDF or DOC) to the server; the plan is then to convert them to raw text. Until a file is converted, though, it sits on the server in its original format, which makes me worried about viruses and all kinds of nasty things.
Any ideas on what I need to do to minimize the risk from these unknown files? How can I check that a file is clean, that it really is the format it claims to be, and that it won't crash the server?
I originally posted this as a comment to Aerik, but it's really the answer to the question.
If you have PHP >= 5.3, use finfo_file(). If you have an older version of PHP, you can use mime_content_type() (less reliable) or load the Fileinfo extension from PECL.
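Both of these functions return the MIME type of the file (by looking at the data inside it, not at the file name). For a PDF it should be application/pdf. For a Word document it could be a few things. Generally it should be application/msword for a legacy .doc file; a .docx file is typically reported as application/vnd.openxmlformats-officedocument.wordprocessingml.document.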
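As a rough sketch of that check (the function name detectAllowedType() and the allow-list are my own illustration, not anything built into PHP), you could compare the detected type against the types you are willing to accept:

```php
<?php
// Sketch only: accept a file when its detected MIME type is on an allow-list.
// detectAllowedType() and $allowed are hypothetical names for this example.

function detectAllowedType(string $path, array $allowed): ?string
{
    // FILEINFO_MIME_TYPE asks for the bare type, e.g. "application/pdf".
    $finfo = finfo_open(FILEINFO_MIME_TYPE);
    if ($finfo === false) {
        return null; // the magic database could not be loaded
    }

    // The type is derived from the file's contents, not from its name.
    $type = finfo_file($finfo, $path);
    if ($type === false) {
        return null; // the file could not be read or classified
    }

    // Strict comparison: only exact matches on the allow-list pass.
    return in_array($type, $allowed, true) ? $type : null;
}
```

You would call it as detectAllowedType($_FILES['doc']['tmp_name'], ['application/pdf', 'application/msword']), where 'doc' is an assumed form field name, and reject the upload when it returns null. A matching MIME type only tells you what the file looks like; it does not prove the file is harmless.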
If your server is running *nix, make sure the files you're saving aren't executable. Even better: save them to a folder that the web server doesn't serve (for example, outside the document root). Your code can still read the files, but someone requesting a URL won't be able to reach them directly.
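Something along these lines would cover both points. It is a minimal sketch: randomStorageName() and storeUpload() are made-up helper names, and the storage directory is whatever private path you choose.

```php
<?php
// Sketch only: move an upload into a private directory under a random name
// and make sure no execute bits are set. Helper names are hypothetical.

function randomStorageName(string $extension): string
{
    // Ignore the client-supplied file name; keep only an extension you have vetted.
    return bin2hex(random_bytes(16)) . '.' . $extension;
}

function storeUpload(string $tmpPath, string $storageDir, string $extension): string
{
    // Refuse anything that did not arrive through an HTTP POST upload.
    if (!is_uploaded_file($tmpPath)) {
        throw new RuntimeException('Not an uploaded file');
    }

    $target = rtrim($storageDir, '/') . '/' . randomStorageName($extension);
    if (!move_uploaded_file($tmpPath, $target)) {
        throw new RuntimeException('Could not store the upload');
    }

    chmod($target, 0640); // owner read/write, group read, no execute bits
    return $target;
}
```

For example, storeUpload($_FILES['doc']['tmp_name'], '/var/uploads', 'pdf'), assuming a form field called 'doc' and a /var/uploads directory that lives outside the document root.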
If you've ever opened or executed any user-uploaded file on the server, you should expect that your server is now compromised.
Even a JPG can contain executable PHP. If you include or require the file in any way in your script, that can also compromise your server. An image you stumble upon on the web served like so...
header('Content-type: image/jpeg');
header('Content-Disposition: inline; filename="test.jpg"');
echo file_get_contents('/some_image.jpg');
echo '<?php phpinfo(); ?>';
... which you save and re-host on your own server like so...
$q = $_GET['q']; // pretend this is sanitized for the moment
header('Content-type: '.mime_content_type($q));
header('Content-Disposition: inline; filename="'.$_GET['q'].'"');
include $q; // include parses the file as PHP, so any embedded <?php ... ?> block runs
...will execute phpinfo() on your server. Your site users can then simply save the image to their desktop and open it with Notepad to see your server settings. Simply converting the file to another format will discard that script, and should not trigger any actual virus attached to the file.
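If you do need to send a stored upload back to a browser, streaming the bytes avoids the problem, because nothing gets parsed as PHP. A minimal sketch, with resolveStoredFile() and sendStoredFile() as hypothetical helpers:

```php
<?php
// Sketch only: stream a stored upload as raw bytes instead of include-ing it.
// resolveStoredFile() and sendStoredFile() are hypothetical helper names.

function resolveStoredFile(string $storageDir, string $requested): ?string
{
    // basename() drops any directory part, so "../../etc/passwd" becomes "passwd".
    $path = realpath($storageDir . '/' . basename($requested));
    $root = realpath($storageDir);
    if ($path === false || $root === false || !is_file($path)) {
        return null;
    }

    // Confirm the resolved path (symlinks followed) still sits inside the storage directory.
    $prefix = $root . DIRECTORY_SEPARATOR;
    return strncmp($path, $prefix, strlen($prefix)) === 0 ? $path : null;
}

function sendStoredFile(string $path): void
{
    // Serve as a generic download and tell the browser not to guess the type.
    header('Content-Type: application/octet-stream');
    header('Content-Disposition: attachment');
    header('X-Content-Type-Options: nosniff');
    readfile($path); // copies the bytes to the output; nothing is executed
}
```

With this, a file containing <?php phpinfo(); ?> comes out as those literal characters rather than as the output of phpinfo().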
It might also be best to run a virus scan on upload. You should be able to run a scanner as a system command from your script and parse its output (or check its exit code) to see whether it found anything. Your site users should be checking files they download anyway.
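For instance, assuming ClamAV's clamscan is installed and on the PATH (that is my assumption; any command-line scanner would do), a sketch could look like this. The two function names are made up for the example.

```php
<?php
// Sketch only: run ClamAV's clamscan on a file and interpret its exit code.
// Assumes clamscan is installed and on PATH; helper names are hypothetical.

function interpretClamscanExit(int $code): string
{
    // clamscan exit codes: 0 = no virus found, 1 = virus found, other = error.
    if ($code === 0) {
        return 'clean';
    }
    if ($code === 1) {
        return 'infected';
    }
    return 'error';
}

function scanUpload(string $path): string
{
    $output = [];
    $code = 2; // treated as an error unless exec() reports otherwise

    // escapeshellarg() keeps an odd file name from being read as shell syntax.
    exec('clamscan --no-summary ' . escapeshellarg($path) . ' 2>&1', $output, $code);

    return interpretClamscanExit($code);
}
```

I would treat anything other than 'clean' as a reason to reject the upload, including 'error', since that is also what you get when the scanner is missing or fails to run.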
Otherwise, even a virus-laden user-uploaded file just sitting there on your server shouldn't harm anything... as far as I know.