
Reading PDF content with itextsharp dll in VB.NET or C#

How can I read PDF content with iTextSharp, using the PdfReader class? My PDF may include plain text or images of text.

user221185 asked Mar 31 '10 05:03




1 Answer

    using System.IO;
    using System.Text;
    using iTextSharp.text.pdf;
    using iTextSharp.text.pdf.parser;

    // Extracts text stored as text in the PDF. Text that exists only inside
    // images (scanned pages) is not returned; that requires separate OCR.
    public string ReadPdfFile(string fileName)
    {
        StringBuilder text = new StringBuilder();

        if (File.Exists(fileName))
        {
            PdfReader pdfReader = new PdfReader(fileName);
            try
            {
                // Page numbers are 1-based.
                for (int page = 1; page <= pdfReader.NumberOfPages; page++)
                {
                    // Use a fresh strategy per page so text does not carry over.
                    ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();

                    // The result is already a .NET string, so it is appended as-is;
                    // round-tripping it through Encoding.Default can lose characters.
                    text.Append(PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy));
                }
            }
            finally
            {
                pdfReader.Close();
            }
        }
        return text.ToString();
    }
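The question also mentions pages that contain images of text. iTextSharp's text extraction does not perform OCR, so those pages typically come back empty. The sketch below is one hedged way to flag such pages so they can be handed to a separate OCR step. The class name `ScannedPageDetector`, the method `FindPagesWithoutText`, and the `getPageText` callback are hypothetical names introduced for illustration, not part of iTextSharp, and "no extractable text" is only a heuristic for "scanned page".

```csharp
using System;
using System.Collections.Generic;

public static class ScannedPageDetector
{
    // Returns the 1-based numbers of pages whose extracted text is null,
    // empty, or whitespace. This is a heuristic: such pages may be scanned
    // images, but they may also simply be blank.
    // getPageText is a caller-supplied callback (an assumption of this sketch)
    // that returns the extracted text for a given page number.
    public static List<int> FindPagesWithoutText(int pageCount, Func<int, string> getPageText)
    {
        var pages = new List<int>();
        for (int page = 1; page <= pageCount; page++)
        {
            string pageText = getPageText(page);
            if (string.IsNullOrWhiteSpace(pageText))
            {
                pages.Add(page);
            }
        }
        return pages;
    }
}
```

With iTextSharp 5.x, the callback could be supplied as `p => PdfTextExtractor.GetTextFromPage(pdfReader, p)` together with `pdfReader.NumberOfPages` as the page count. Keeping the library call in a callback lets the detection logic be checked without a PDF on hand.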
ShravankumarKumar answered Sep 19 '22 10:09