
iTextSharp exception: PDF header signature not found

Tags: c#, .net, pdf, itext

I'm using iTextSharp to read the contents of PDF documents:

PdfReader reader = new PdfReader(pdfPath);
using (StringWriter output = new StringWriter())
{
    for (int i = 1; i <= reader.NumberOfPages; i++)
        output.WriteLine(PdfTextExtractor.GetTextFromPage(reader, i, new SimpleTextExtractionStrategy()));

    reader.Close();
    pdfText = output.ToString();
}

99% of the time it works just fine. However, there is this one PDF file that will sometimes throw this exception:

PDF header signature not found. StackTrace:
   at iTextSharp.text.pdf.PRTokeniser.CheckPdfHeader()
   at iTextSharp.text.pdf.PdfReader.ReadPdf()
   at iTextSharp.text.pdf.PdfReader..ctor(String filename, Byte[] ownerPassword)
   at Reader.PDF.DownloadPdf(String url) in

What's annoying is that I can't always reproduce the error. Sometimes it works, sometimes it doesn't. Has anyone encountered this problem?

asked May 16 '12 15:05 by broke


2 Answers

After some research, I found that this problem is caused either by a file that was corrupted during PDF generation, or by an object in the document that doesn't conform to the PDF standard as iTextSharp implements it. It also seems to happen only when the PDF is read from a file on disk.

I have not found a complete solution, only a workaround: open the document with iTextSharp's PdfReader first and check whether it throws, before processing the file normally.

Something along these lines:

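private bool IsValidPdf(string filepath)
{
    PdfReader reader = null;

    try
    {
        // The constructor parses the file and throws if it cannot be read as a PDF.
        reader = new PdfReader(filepath);
        return true;
    }
    catch
    {
        return false;
    }
    finally
    {
        // Close the reader whether or not parsing succeeded.
        if (reader != null)
            reader.Close();
    }
}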
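
If it is unclear whether a failing file is a PDF at all, a lighter check is to look for the "%PDF-" marker near the start of the data before handing it to iTextSharp. The following is only a sketch using plain .NET APIs. The class and method names are made up for illustration, and the 1024-byte search window is an assumption about how far into the file the library looks, not something confirmed by the question.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of iTextSharp.
static class PdfHeaderCheck
{
    // Returns true if "%PDF-" appears within the first 1024 bytes of the stream.
    // The 1024-byte window is an assumption made for this sketch.
    public static bool HasPdfHeader(Stream stream)
    {
        byte[] buffer = new byte[1024];
        int total = 0;
        int read;

        // Stream.Read may return fewer bytes than requested, so loop until
        // the buffer is full or the stream ends.
        while (total < buffer.Length &&
               (read = stream.Read(buffer, total, buffer.Length - total)) > 0)
        {
            total += read;
        }

        // ASCII decoding turns non-ASCII bytes into '?', which does not
        // affect a search for the ASCII marker.
        string head = Encoding.ASCII.GetString(buffer, 0, total);
        return head.IndexOf("%PDF-", StringComparison.Ordinal) >= 0;
    }

    // Convenience overload for a file on disk.
    public static bool HasPdfHeader(string path)
    {
        using (FileStream file = File.OpenRead(path))
        {
            return HasPdfHeader(file);
        }
    }
}
```

When a file fails this check, logging its first few bytes can show what it really contains, for example an HTML error page or an empty file.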
answered Oct 31 '22 14:10 by Anonymous coward


In my case the cause was calling new PdfReader(pdf) while the position of the pdf stream was at the end of the stream. Resetting the position to zero resolved the issue.

Before:

// Throws: InvalidPdfException: PDF header signature not found.
var pdfReader = new PdfReader(pdf);

After:

// Works correctly.
pdf.Position = 0;
var pdfReader = new PdfReader(pdf);
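
The reset above only works on a seekable stream such as a MemoryStream or FileStream. A small guard can make that explicit. This is a sketch using only standard .NET stream members, and the helper name TryRewind is hypothetical.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of iTextSharp.
static class StreamRewind
{
    // Moves a seekable stream back to its first byte.
    // Returns false when the stream cannot seek, in which case its
    // contents would need to be copied into a MemoryStream first.
    public static bool TryRewind(Stream stream)
    {
        if (!stream.CanSeek)
            return false;

        stream.Position = 0;
        return true;
    }
}
```

For example, right after pdf.Write(...) on a MemoryStream, pdf.ReadByte() returns -1 because the position is at the end. After StreamRewind.TryRewind(pdf) it returns the first byte that was written.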
answered Oct 31 '22 15:10 by Bern