Is it possible to verify whether a posted file is a PDF or not? [duplicate]

Tags: c#, asp.net, pdf

The website's primary job is to accept files from users and save them. Everything was fine until two months ago, when I was told to enforce a constraint to accept only PDF files.

Before that, users were in the habit of submitting various formats, from plain text and RTF to proper PDF.

I applied the constraint by checking the file extension. Simple, right? However, when the admin checked those files, a good 60% of them were corrupt.

I spent many sleepless nights trying to determine the cause of the corruption, then it suddenly occurred to me that the users might be submitting the files that way.

I went through the previous records and worked out the favourite file format of some of the users from whom we were getting corrupt files.

I changed the extension back to their favourite extension and boom, the file opened.

What I learned is that, despite being told in bold how to convert their files to PDF, some (many) users were just changing the extension and submitting. Since the website rewards users by the number of files submitted, the administration people are grumbling at me. Is there any way I can check whether a file is a PDF without relying on the extension?

I am using the FileUpload control in ASP.NET with C# (.NET 3.5).

asked Apr 15 '13 11:04 by Ratna


2 Answers

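Since PDF files start with the ASCII string "%PDF-", simply test the first few bytes of the file to check that they match this string. Note that this only verifies the header; it does not prove that the rest of the file is a well-formed PDF.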

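// Required namespaces: System.IO, System.Linq, System.Text
using System.IO;
using System.Linq;
using System.Text;

bool IsPdf(string path)
{
    // Every PDF begins with this ASCII header, followed by the version number.
    var pdfString = "%PDF-";
    var pdfBytes = Encoding.ASCII.GetBytes(pdfString);
    var len = pdfBytes.Length;
    var buf = new byte[len];
    var remaining = len;
    var pos = 0;
    using(var f = File.OpenRead(path))
    {
        // Read may return fewer bytes than requested, so loop until the buffer is full.
        while(remaining > 0)
        {
            var amtRead = f.Read(buf, pos, remaining);
            // End of file before the full header was read: too short to be a PDF.
            if(amtRead == 0) return false;
            remaining -= amtRead;
            pos += amtRead;
        }
    }
    // Compare the bytes that were read against the expected header.
    return pdfBytes.SequenceEqual(buf);
}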
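Since the question uses the FileUpload control, the same header test can be run on the uploaded stream before anything is written to disk. The following is only a sketch under assumptions: the class name PdfSniffer and the method name LooksLikePdf are made up for illustration, and the stream is restored to its starting position only when it is seekable.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not part of the original answer): applies the
// "%PDF-" header test to any Stream, e.g. an uploaded file's input stream.
public static class PdfSniffer
{
    private static readonly byte[] PdfHeader = Encoding.ASCII.GetBytes("%PDF-");

    public static bool LooksLikePdf(Stream stream)
    {
        // Remember where we started so the caller can still save the upload.
        long start = stream.CanSeek ? stream.Position : 0;
        var buf = new byte[PdfHeader.Length];
        var pos = 0;
        var complete = true;

        // Read may return fewer bytes than requested, so loop until full.
        while (pos < buf.Length)
        {
            var read = stream.Read(buf, pos, buf.Length - pos);
            if (read == 0) { complete = false; break; } // stream too short
            pos += read;
        }

        // Assumption: rewinding is only possible on seekable streams.
        if (stream.CanSeek) stream.Seek(start, SeekOrigin.Begin);
        if (!complete) return false;

        for (var i = 0; i < buf.Length; i++)
        {
            if (buf[i] != PdfHeader[i]) return false;
        }
        return true;
    }
}
```

In a code-behind page this might be called as PdfSniffer.LooksLikePdf(FileUpload1.PostedFile.InputStream) after checking FileUpload1.HasFile, and FileUpload1.SaveAs would only be called when it returns true. FileUpload1 is an assumed control name, not one taken from the question.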
answered Oct 24 '22 19:10 by spender


I've found this site very useful in helping to determine if a file matches its extension. It's a huge list of file signatures that you can use with spender's code.
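As a rough illustration of how a signature list could be used, the sketch below checks the leading bytes of a file against a few widely documented signatures (PDF, RTF, ZIP-based containers such as .docx, and the legacy OLE2 container used by old .doc files). The class name FileSignatures, the method name Identify, and the short labels it returns are assumptions made for this example, and the table is deliberately tiny.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: maps leading bytes to a short label.
// Only a handful of well-known signatures are listed here.
public static class FileSignatures
{
    private static readonly KeyValuePair<string, byte[]>[] Known =
    {
        // "%PDF-"
        new KeyValuePair<string, byte[]>("pdf",  new byte[] { 0x25, 0x50, 0x44, 0x46, 0x2D }),
        // "{\rtf"
        new KeyValuePair<string, byte[]>("rtf",  new byte[] { 0x7B, 0x5C, 0x72, 0x74, 0x66 }),
        // "PK" 03 04: ZIP, which also covers .docx, .xlsx and .odt containers
        new KeyValuePair<string, byte[]>("zip",  new byte[] { 0x50, 0x4B, 0x03, 0x04 }),
        // OLE2 compound file: legacy .doc, .xls, .ppt
        new KeyValuePair<string, byte[]>("ole2", new byte[] { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 }),
    };

    // Returns the label of the first matching signature, or null if none match.
    public static string Identify(byte[] head)
    {
        foreach (var entry in Known)
        {
            if (StartsWith(head, entry.Value)) return entry.Key;
        }
        return null;
    }

    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data == null || data.Length < prefix.Length) return false;
        for (var i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

Knowing what a rejected upload really is (for example "rtf" rather than just "not a PDF") could also make it possible to show the user a more specific error message.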

answered Oct 24 '22 19:10 by khelmar