I'm using C# with iTextSharp to read PDF content. I used the code below to read the content, but it seems to read the text page by page.
    using System;
    using System.Text;
    using System.Windows.Forms;
    using iTextSharp.text.pdf;
    using iTextSharp.text.pdf.parser;

    public string ReadPdfFile(object Filename)
    {
        string strText = string.Empty;
        try
        {
            PdfReader reader = new PdfReader((string)Filename);
            // Extract the text of each page in turn (page numbers are 1-based).
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy();
                String s = PdfTextExtractor.GetTextFromPage(reader, page, its);
                // Re-encode the extracted text as UTF-8.
                s = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(s)));
                strText = strText + s;
            }
            reader.Close();
        }
        catch (Exception ex)
        {
            MessageBox.Show(ex.Message);
        }
        return strText;
    }
Can anyone help me write code that reads the PDF content line by line?
Try using LocationTextExtractionStrategy instead of SimpleTextExtractionStrategy; it adds newline characters to the returned text. You can then call strText.Split('\n') to split the text into a string[] and consume it line by line.
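As a rough sketch of that suggestion, assuming iTextSharp 5.x is referenced in the project: the class name PdfLineReader and the helpers SplitLines and ReadPdfLines are made up for illustration and are not part of iTextSharp. Each page is split separately, so the last line of one page is not joined to the first line of the next.

```csharp
using System;
using System.Collections.Generic;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper class, not part of iTextSharp.
public static class PdfLineReader
{
    // Splits extracted text on '\n' and drops a trailing '\r' from each line,
    // in case the text uses Windows-style line endings.
    public static List<string> SplitLines(string text)
    {
        var lines = new List<string>();
        foreach (string line in text.Split('\n'))
        {
            lines.Add(line.TrimEnd('\r'));
        }
        return lines;
    }

    // Returns the lines of every page, in page order.
    public static List<string> ReadPdfLines(string filename)
    {
        var lines = new List<string>();
        PdfReader reader = new PdfReader(filename);
        try
        {
            // Page numbers are 1-based.
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // Use a fresh strategy for each page so text from one page
                // is not carried over into the next.
                ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
                string pageText = PdfTextExtractor.GetTextFromPage(reader, page, strategy);
                lines.AddRange(SplitLines(pageText));
            }
        }
        finally
        {
            // Release the file even if extraction throws.
            reader.Close();
        }
        return lines;
    }
}
```

You could then loop over the result, for example foreach (string line in PdfLineReader.ReadPdfLines(path)), where path is whatever file you are reading. Keep in mind that the "lines" are reconstructed from text positions on the page, so they may not match the visual lines exactly for multi-column or rotated text.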