How can I read PDF content with the itextsharp with the Pdfreader class. My PDF may include Plain text or Images of the text.
What is ITextSharp? iTextSharp is a free and open source assembly that helps to convert page output or HTML content in a PDF file. Now add that DLL in the application. Getting Started: Start Visual Studio and create a new website in ASP.Net and add these 2 DLLs to the solution.
Itextsharp is an advanced tool library which is used for creating complex pdf repors. itext is used by different techonologies -- Android , . NET, Java and GAE developer use it to enhance their applications with PDF functionality.
A Chunk is the smallest significant piece of text that you can work with.
using iTextSharp.text.pdf; using iTextSharp.text.pdf.parser; using System.IO; public string ReadPdfFile(string fileName) { StringBuilder text = new StringBuilder(); if (File.Exists(fileName)) { PdfReader pdfReader = new PdfReader(fileName); for (int page = 1; page <= pdfReader.NumberOfPages; page++) { ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy(); string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy); currentText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText))); text.Append(currentText); } pdfReader.Close(); } return text.ToString(); }
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With