
Word/Excel files corrupted when downloading from PHP

Tags: php, xlsx, docx

I'm building a simple file upload/file download functionality into my database. The only complicated part is that all files need to be encrypted using my fancy-shmancy encryption methods.

So what I do is make an SQL entry that stores things like: id_file, filename, extension, size, dateadded, etc.

Then once I've got the id_file I take the file contents, encrypt them, then save the contents to my server as [id_file].txt.

Then here's the code for downloading the file again:

header("Pragma: public");
header('Content-Disposition: attachment;filename="'.$file['name'].'.'.$file['extension'].'"');
header('Cache-Control: max-age=0');

echo someFunctionIMadeForGettingAndDecryptingFileContents($_GET['id_file']);

exit;

Really simple stuff, and it works PERFECTLY for all file types EXCEPT .docx and .xlsx. When I download a .docx or .xlsx file, Office gives me an error saying: "Word found unreadable content in "NAME OF FILE". Do you want to recover the contents of this document? If you trust the source... bla bla". I then click 'Yes', it thinks a bit, and the file opens up just fine. But obviously I can't have my clients using this if they're going to get that error every time.

The code I've written works perfectly for all other file types. Even .doc, .xls, and .zip files work fine.

My first thought was to look at the headers. I've tried all sorts of solutions like the ones listed here:

why my downloaded file is alwayes damaged or corrupted?
PHP downloading excel file becomes corrupt

Those didn't work.

I know an issue can be with extra padding or white space being added to the file. But if I upload a .txt file and then download it again... I can see that there isn't anything extra being added.

If I MD5 the original file (good.docx) and the downloaded version of the original file (bad.docx), the hashes ARE different.

If I rename good.docx to good.zip and unzip the archive, do the same for bad.docx, and then MD5 both directories, the hashes are the SAME. I've also hashed each file inside good.zip and bad.zip, and each file hash is the same.

Also to note, elsewhere on my server I use PHPWord and PHPExcel to generate Office files dynamically and those files all download great. The headers/code I use for PHPExcel are:

header("Pragma: public");
header('Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet');
header('Content-Disposition: attachment;filename="'.$filename.'.xlsx"');
header('Cache-Control: max-age=0');
$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'Excel2007');
$objWriter->save('php://output');
exit;

(Yes, I've tried using the "Content-Type" header on my other code above but that didn't help.)

I've also tried saving the file on my server, downloading it, and opening it. I get the same error when going through that process. Here is the code I used to do that:

$f=fopen("/myPath/temp.docx","w");
fwrite($f,someFunctionIMadeForGettingAndDecryptingFileContents($_GET['id_file']));
fclose($f);
exit;

I've also tried creating an empty Word file called "blank.docx" and, instead of having the function save a new file, replacing the contents of blank.docx with the decrypted file contents. But when I download blank.docx after that process I get the same result: an error, and then the file eventually opens. None of the file properties (like Template: Normal.dotm) that were originally on blank.docx are there on the modified blank.docx that gets served.

I'm using Office 2007

UPDATE

Here is a link to download the good (original) version of a file: http://empowerdb.org/good.docx

And here is a link to download the bad (processed) version of the file: http://empowerdb.org/bad.docx

SOLUTION

As Mr. Llama pointed out below, my encryption function was lopping off some extra null bytes. But it turned out the culprit wasn't as obvious as you'd think. Here's my encryption:

trim(base64_encode(IV.mcrypt_encrypt(MCRYPT_RIJNDAEL_128,ENCKEY,$contents,MCRYPT_MODE_CBC,IV)))

The problem wasn't with trim() or with base64_encode(). It was with the mcrypt function. The way I solved this was before passing my file contents to get encrypted I did another base64_encode(). So like this...

$file_contents_encrypted=myEncryptionFunction(base64_encode($file_contents));

And of course the reverse upon decryption.

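The base64_encode is technically being run twice, but I can see why it needs to run BEFORE mcrypt in this case. mcrypt pads the data with null bytes up to the block size, and .docx and .xlsx files (being zip archives) can legitimately end in null bytes, so when the padding is stripped on decryption the real trailing nulls go with it. Base64 text never ends in a null byte, so nothing real gets stripped.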
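To make the mechanism concrete, here is a minimal sketch that simulates it without mcrypt (which was removed from PHP in 7.2). The helpers zeroPad() and stripPad() are hypothetical stand-ins, not the code from the question: zeroPad() mimics the "\0" padding mcrypt applies up to the block size, and stripPad() assumes the decryption side trims trailing null bytes, which is what the symptoms suggest.

```php
<?php
// Block size of MCRYPT_RIJNDAEL_128, in bytes.
const BLOCK_SIZE = 16;

// Hypothetical stand-in for mcrypt's padding: fill with "\0" up to a
// multiple of the block size.
function zeroPad(string $data): string
{
    $remainder = strlen($data) % BLOCK_SIZE;
    return $remainder === 0
        ? $data
        : $data . str_repeat("\0", BLOCK_SIZE - $remainder);
}

// Hypothetical stand-in for a decrypt step that trims trailing "\0" bytes
// (an assumption about the asker's code, which is not shown).
function stripPad(string $data): string
{
    return rtrim($data, "\0");
}

// A payload that legitimately ends in four null bytes, like good.docx.
$original = "PK\x05\x06 example payload" . "\0\0\0\0";

// Direct round trip: padding and real trailing nulls are stripped together.
$direct = stripPad(zeroPad($original));

// Base64 first: the padded text never ends in "\0", so nothing real is lost.
$wrapped = base64_decode(stripPad(zeroPad(base64_encode($original))));

var_dump(strlen($original) - strlen($direct)); // int(4)
var_dump($wrapped === $original);              // bool(true)
```

The direct round trip comes back four bytes short, matching the difference between good.docx and bad.docx, while the base64-wrapped round trip returns the original bytes.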

asked Jul 10 '14 15:07 by rgbflawed


1 Answer

Your decryption function is lopping off null bytes at the end of files.

The good.docx file ends with four 0x00 bytes, while the bad.docx file ends with none. Aside from those missing bytes, the files are identical.

$ wc -c good.docx
25123 good.docx

$ wc -c bad.docx
25119 bad.docx

$ tail -c 32 good.docx | od -x
0000000 6666 6365 7374 782e 6c6d 4b50 0605 0000
0000020 0000 0010 0010 041c 0000 5df1 0000 0000

$ tail -c 32 bad.docx | od -x
0000000 7469 4568 6666 6365 7374 782e 6c6d 4b50
0000020 0605 0000 0000 0010 0010 041c 0000 5df1

If you skip the last four bytes of good.docx, the md5 sums match exactly:

$ head -c -4 good.docx | md5sum
fbd32fbcc02d62dfd8bd39d390252a4b *-

$ cat bad.docx | md5sum
fbd32fbcc02d62dfd8bd39d390252a4b *-
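
The same check can be done from PHP, which may be handy for testing a round trip on the server. This is only a sketch: the function names trailingNullCount() and differsOnlyByTrailingNulls() are made up for illustration, and the sample strings stand in for the real file contents (which you could load with file_get_contents()).

```php
<?php
// Count the "\0" bytes at the very end of a byte string.
function trailingNullCount(string $bytes): int
{
    return strlen($bytes) - strlen(rtrim($bytes, "\0"));
}

// True when two byte strings are different but become identical once
// trailing "\0" bytes are ignored, as with good.docx and bad.docx.
function differsOnlyByTrailingNulls(string $good, string $bad): bool
{
    return $good !== $bad && rtrim($good, "\0") === rtrim($bad, "\0");
}

// Stand-in data; replace with file_get_contents() on the real files.
$good = "document body" . "\0\0\0\0";
$bad  = "document body";

var_dump(trailingNullCount($good));                // int(4)
var_dump(trailingNullCount($bad));                 // int(0)
var_dump(differsOnlyByTrailingNulls($good, $bad)); // bool(true)
```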
answered Oct 08 '22 15:10 by Mr. Llama