
Reading PDF per Line

Tags: c#, pdf, itext

How can I read a PDF file line by line using iText 5 for .NET? I have searched the internet, but I only found examples that read a PDF file's content page by page.

Please see the code below.

public string ReadPdfFile(object Filename)
{

    string strText = string.Empty;
    try
    {
        PdfReader reader = new PdfReader((string)Filename);

        for (int page = 1; page <= reader.NumberOfPages; page++)
        {
            ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy();

            String s = PdfTextExtractor.GetTextFromPage(reader, page, its);

            s = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(s)));
            strText = strText + s;

        }
        reader.Close();
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
    return strText;
}
asked Dec 09 '11 08:12 by Bryan



1 Answer

Try using the LocationTextExtractionStrategy instead of the SimpleTextExtractionStrategy; it adds newline characters to the returned text. You can then call strText.Split('\n') to split the text into a string[] and process it line by line.
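As a rough sketch of that suggestion, assuming iTextSharp 5.x is referenced in the project: the class name PdfLineReader and the helpers SplitLines and ReadPdfLines are made up for illustration and are not part of iTextSharp. The text is split page by page, so the last line of one page is not joined to the first line of the next, which can happen when page strings are concatenated before splitting.

```csharp
using System;
using System.Collections.Generic;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper class, not part of iTextSharp.
public static class PdfLineReader
{
    // Splits extracted text on '\n' and drops a trailing '\r' from each line, if any.
    public static List<string> SplitLines(string text)
    {
        var lines = new List<string>();
        foreach (string line in text.Split('\n'))
        {
            lines.Add(line.TrimEnd('\r'));
        }
        return lines;
    }

    // Returns the lines of every page in the PDF, in page order.
    public static List<string> ReadPdfLines(string filename)
    {
        var lines = new List<string>();
        PdfReader reader = new PdfReader(filename);
        try
        {
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // A fresh strategy per page, since a strategy accumulates the text it has seen.
                ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
                string pageText = PdfTextExtractor.GetTextFromPage(reader, page, strategy);
                lines.AddRange(SplitLines(pageText));
            }
        }
        finally
        {
            // Release the file even if extraction throws.
            reader.Close();
        }
        return lines;
    }
}
```

Calling PdfLineReader.ReadPdfLines(path) and iterating over the result with foreach gives one string per line. Note that the line breaks come from the positions of the text chunks on the page, so the result may not match the visual lines exactly for multi-column or rotated layouts.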

answered Sep 30 '22 09:09 by Jonathan