
Determine if a byte[] is a pdf file

Is there any way of checking whether a byte[] is a PDF without opening it?

I have some code that displays a list of byte[] as PDF thumbnails. Previously I knew every byte[] was a PDF, because we filtered the servlet to return only those. Now the requirement has changed and I need to bring back all file types. Is there any way of checking what a byte[] is or, more specifically, determining that it isn't a PDF?

rik asked May 31 '11 11:05


4 Answers

Check the first 4 bytes of the array.

If those are 0x25 0x50 0x44 0x46 then it's most probably a PDF file.
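A minimal Java sketch of that check, in case it helps. The class and method names (`PdfMagic`, `startsWithPdfMagic`) are made up for illustration, and a match only means the data looks like a PDF, not that it is a valid one.

```java
final class PdfMagic {
    // "%PDF" in ASCII: 0x25 0x50 0x44 0x46
    private static final byte[] MAGIC = {0x25, 0x50, 0x44, 0x46};

    // Returns true if the array begins with the four magic bytes.
    // Hypothetical helper: it does not validate the rest of the file.
    static boolean startsWithPdfMagic(byte[] data) {
        if (data == null || data.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (data[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Anything for which `PdfMagic.startsWithPdfMagic(bytes)` returns false could then be skipped when building the thumbnails.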

a_horse_with_no_name answered Oct 22 '22 18:10


The first four bytes should be 0x25 0x50 0x44 0x46 (in hex; in ASCII that is %PDF). "Magic numbers" for other formats can be found here.

chopikadze answered Oct 22 '22 17:10


As far as I know, all PDFs start with %PDF, so you could check the first bytes against this string.

DanielB answered Oct 22 '22 17:10


While the marked answer and the other answers are correct, they will not succeed 100% of the time. The problem is that, according to the PDF spec, the %PDF-1.x header only needs to appear within the first 1024 bytes, not in the first 4. Some programs add data before %PDF, and the file is still valid.

I would recommend looking at the answer to the following Stack Overflow question: How to detect if a file is PDF or TIFF?
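A more lenient check along these lines could be sketched in Java as follows. The names (`PdfHeaderScan`, `hasPdfHeader`) are hypothetical, and the sketch assumes the whole `%PDF-` marker has to fit inside the first 1024 bytes, which is my reading rather than something the answer spells out.

```java
final class PdfHeaderScan {
    // "%PDF-" in ASCII
    private static final byte[] HEADER = {0x25, 0x50, 0x44, 0x46, 0x2D};
    // Search window taken from the 1024-byte figure in the answer above.
    private static final int SEARCH_LIMIT = 1024;

    // Returns true if "%PDF-" occurs anywhere in the first 1024 bytes.
    // Assumption: the marker must lie entirely inside that window.
    static boolean hasPdfHeader(byte[] data) {
        if (data == null) {
            return false;
        }
        int end = Math.min(data.length, SEARCH_LIMIT);
        for (int start = 0; start + HEADER.length <= end; start++) {
            boolean match = true;
            for (int i = 0; i < HEADER.length; i++) {
                if (data[start + i] != HEADER[i]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return true;
            }
        }
        return false;
    }
}
```

This accepts files that carry extra bytes before the header, at the cost of a higher chance of false positives than the strict four-byte check.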

Consulting Mechanic answered Oct 22 '22 18:10