
How to read a PDF file line by line in C#?

In my Windows 8 application, I would like to read a PDF file line by line and store the lines in a string array. How can I do this?

    public StringBuilder addd= new StringBuilder();
    string[] array;

    private async void btndosyasec_Click(object sender, RoutedEventArgs e)
    {
        FileOpenPicker openPicker = new FileOpenPicker();
        openPicker.ViewMode = PickerViewMode.List;
        openPicker.SuggestedStartLocation = PickerLocationId.PicturesLibrary;
        openPicker.FileTypeFilter.Add(".pdf");

        StorageFile file = await openPicker.PickSingleFileAsync();



        if (file != null)
        {

            PdfReader reader = new PdfReader((await file.OpenReadAsync()).AsStream());

            for (int page = 1; page <= reader.NumberOfPages; page++)
            {

                addd.Append(PdfTextExtractor.GetTextFromPage(reader, page));
                string tmp= PdfTextExtractor.GetTextFromPage(reader, page);

                array[page] = tmp.ToString();

                reader.Close();
            }
        }
    }
Deniz asked Aug 21 '14 11:08


2 Answers

I had this problem too. The following code worked for me.

You will need a reference to the iTextSharp library.

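using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

PdfReader reader = new PdfReader(@"D:\test pdf\Blood Journal.pdf");
int intPageNum = reader.NumberOfPages;
string text;
string[] words;
string line;

// Page numbers are 1-based.
for (int i = 1; i <= intPageNum; i++)
{
    text = PdfTextExtractor.GetTextFromPage(reader, i, new LocationTextExtractionStrategy());

    // Split the text of the current page into lines.
    words = text.Split('\n');
    for (int j = 0, len = words.Length; j < len; j++)
    {
        // line holds the current line of the page; process it here.
        line = Encoding.UTF8.GetString(Encoding.UTF8.GetBytes(words[j]));
    }
}

// Release the reader once all pages have been read.
reader.Close();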

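Inside the outer loop, the words array contains the lines of the current page. It is overwritten on each iteration, so it does not hold the lines of the whole file at once.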
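Since the question asks for a single string array, one option is to collect the lines of every page in a list and convert it at the end. The following is only a minimal sketch, assuming the text of each page has already been extracted (for example with PdfTextExtractor.GetTextFromPage, as above). PdfLineHelper and SplitPagesIntoLines are hypothetical names used for illustration; they are not part of iTextSharp.

```csharp
using System;
using System.Collections.Generic;

public static class PdfLineHelper
{
    // Hypothetical helper: flattens the text of each page into one array of lines.
    // pageTexts is assumed to hold one already-extracted string per page.
    public static string[] SplitPagesIntoLines(IEnumerable<string> pageTexts)
    {
        var lines = new List<string>();
        foreach (string pageText in pageTexts)
        {
            // Accept both Windows (\r\n) and Unix (\n) line endings.
            // StringSplitOptions.None keeps blank lines, so an empty page
            // contributes a single empty string.
            string[] pageLines = pageText.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);
            lines.AddRange(pageLines);
        }
        return lines.ToArray();
    }
}
```

A possible way to use it is to add each page's text to a List<string> inside the page loop and then call PdfLineHelper.SplitPagesIntoLines(pageTexts) once after the loop. A list is used internally because the total number of lines is not known before the pages are read.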

mansureh answered Oct 07 '22 21:10


The code below works for iText 7.

using System;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

public void ExtractTextFromPDF(string filePath)
{
    PdfReader pdfReader = new PdfReader(filePath);
    PdfDocument pdfDoc = new PdfDocument(pdfReader);

    // Page numbers are 1-based.
    for (int page = 1; page <= pdfDoc.GetNumberOfPages(); page++)
    {
        // Create a new strategy for each page so that text from earlier pages is not carried over.
        ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
        string pageContent = PdfTextExtractor.GetTextFromPage(pdfDoc.GetPage(page), strategy);

        Console.WriteLine("pageContent : " + pageContent);
    }
    pdfDoc.Close();
    pdfReader.Close();
}
Somnath Kadam answered Oct 07 '22 19:10