How can I read PDF content with iTextSharp using the PdfReader class? My PDF may include plain text or images of the text.
What is iTextSharp? iTextSharp is a free and open source assembly that helps convert page output or HTML content into a PDF file. Now add that DLL to the application. Getting started: start Visual Studio, create a new website in ASP.NET, and add these 2 DLLs to the solution.
iTextSharp is an advanced library used for creating complex PDF reports. iText is used across different technologies: Android, .NET, Java, and GAE developers use it to enhance their applications with PDF functionality.
A Chunk is the smallest significant piece of text that you can work with.
using System.IO;
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Extracts the text of every page in the PDF at fileName.
// Only text stored in the PDF is returned; text that exists only as a scanned image needs OCR.
public string ReadPdfFile(string fileName)
{
    StringBuilder text = new StringBuilder();

    if (File.Exists(fileName))
    {
        PdfReader pdfReader = new PdfReader(fileName);
        try
        {
            for (int page = 1; page <= pdfReader.NumberOfPages; page++)
            {
                // Use a fresh strategy per page so text does not accumulate across pages.
                ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                // GetTextFromPage already returns a .NET string, so no encoding round trip is needed.
                string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
                text.Append(currentText);
            }
        }
        finally
        {
            // Release the file handle even if extraction throws.
            pdfReader.Close();
        }
    }
    return text.ToString();
}