How can I read a PDF file line by line using iText5 for .NET?
I have searched the internet, but I only found examples that read a PDF file's content page by page.
Please see the code below.
public string ReadPdfFile(object Filename)
{
    string strText = string.Empty;
    try
    {
        PdfReader reader = new PdfReader((string)Filename);
        for (int page = 1; page <= reader.NumberOfPages; page++)
        {
            ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy();
            String s = PdfTextExtractor.GetTextFromPage(reader, page, its);
            s = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(s)));
            strText = strText + s;
        }
        reader.Close();
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
    return strText;
}
The PDF reader object has a getPage() function that takes a page number (starting from index 0) as its argument and returns the page object. The page object has an extractText() function to extract text from the PDF page.
Try this: use the LocationTextExtractionStrategy instead of the SimpleTextExtractionStrategy. It will add newline characters to the text returned. Then you can use strText.Split('\n') to split your text into a string[] and consume it on a per-line basis.
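The suggestion above could be sketched roughly as follows, assuming iTextSharp 5.x is referenced. The class name PdfLineReader and the helpers SplitLines and ReadLines are hypothetical names chosen for illustration and are not part of iText. The "lines" are whatever the strategy reconstructs from text positions on the page, so they may not match the logical lines of the original document exactly.

```csharp
using System;
using System.Collections.Generic;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper class for illustration; not part of iTextSharp.
public static class PdfLineReader
{
    // Splits extracted text into lines, accepting both "\r\n" and "\n".
    public static string[] SplitLines(string text)
    {
        return text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);
    }

    // Returns the text of every page as a flat list of lines.
    public static List<string> ReadLines(string fileName)
    {
        var lines = new List<string>();
        PdfReader reader = new PdfReader(fileName);
        try
        {
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // Use a fresh strategy per page so text from earlier
                // pages is not accumulated into later results.
                ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
                string pageText = PdfTextExtractor.GetTextFromPage(reader, page, strategy);
                lines.AddRange(SplitLines(pageText));
            }
        }
        finally
        {
            // Release the file even if extraction throws.
            reader.Close();
        }
        return lines;
    }
}
```

A caller might then loop over the result, for example foreach (string line in PdfLineReader.ReadLines("input.pdf")) { Console.WriteLine(line); }, where "input.pdf" is a placeholder path.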