
Reading a PDF File using iText5 for .NET

Tags: c#, pdf, itext

I'm using C# with iTextSharp to read PDF content. The code below works, but it reads the text one page at a time.

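        // Requires: using System; using System.Text; using System.Windows.Forms;
        //           using iTextSharp.text.pdf; using iTextSharp.text.pdf.parser;
        public string ReadPdfFile(object Filename)
        {
            StringBuilder text = new StringBuilder();
            PdfReader reader = null;
            try
            {
                reader = new PdfReader((string)Filename);

                // iText page numbers are 1-based.
                for (int page = 1; page <= reader.NumberOfPages; page++)
                {
                    // Use a fresh strategy for each page so text does not accumulate across pages.
                    ITextExtractionStrategy its = new SimpleTextExtractionStrategy();
                    string s = PdfTextExtractor.GetTextFromPage(reader, page, its);

                    // No re-encoding step: .NET strings are already Unicode, and a round trip
                    // through Encoding.Default can only lose characters.
                    text.Append(s);
                }
            }
            catch (Exception ex)
            {
                MessageBox.Show(ex.Message);
            }
            finally
            {
                // Close the reader even if extraction throws.
                if (reader != null) reader.Close();
            }
            return text.ToString();
        }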

Can anyone help me write code that reads the PDF content line by line?

Mark asked Dec 09 '11 07:12


1 Answer

Try using LocationTextExtractionStrategy instead of SimpleTextExtractionStrategy; it adds newline characters to the text it returns. You can then call strText.Split('\n') to split the text into a string[] and process it line by line.
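As a minimal sketch of the splitting step, the helper below turns one page's extracted text into an array of lines. The names PdfLines and SplitLines are made up for illustration and are not part of iTextSharp. Dropping empty lines and accepting "\r\n" as well as "\n" are assumptions about what is convenient, not something the library requires.

```csharp
using System;

public static class PdfLines
{
    // Splits extracted text into lines on "\r\n" or "\n" and drops empty lines.
    // Returns an empty array for null or empty input.
    public static string[] SplitLines(string pageText)
    {
        if (string.IsNullOrEmpty(pageText))
        {
            return new string[0];
        }
        return pageText.Split(
            new[] { "\r\n", "\n" },
            StringSplitOptions.RemoveEmptyEntries);
    }
}

// Usage with iTextSharp 5, inside the page loop from the question
// (needs using iTextSharp.text.pdf; and using iTextSharp.text.pdf.parser;):
//
//   string pageText = PdfTextExtractor.GetTextFromPage(
//       reader, page, new LocationTextExtractionStrategy());
//   foreach (string line in PdfLines.SplitLines(pageText))
//   {
//       // handle one line of the page here
//   }
```

If you concatenate all pages into one string first, as the question's code does, consider appending a '\n' after each page; otherwise the last line of one page may run into the first line of the next.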

Jonathan answered Nov 10 '22 20:11