
PDF Text search C# [closed]

I have a requirement to read a PDF file and search it for a piece of text. I need to display which pages the text appears on and the number of occurrences. I can already convert the PDF to text, but I also need to know the page number.

Thanks

asked Feb 04 '11 02:02 by dps123


1 Answer

You can use Docotic.Pdf for this (I work for Bit Miracle).

Here is a sample showing how to search for text in a PDF, page by page:

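// Requires: using System; using BitMiracle.Docotic.Pdf;
using (PdfDocument doc = new PdfDocument("file.pdf"))
{
    string textToSearch = "some text";
    for (int i = 0; i < doc.Pages.Count; i++)
    {
        // Extract the text of the current page only, so matches can be tied to a page.
        string pageText = doc.Pages[i].GetText();
        int count = 0;

        // Step through every match on this page; advancing by one character
        // means overlapping matches are counted too.
        int lastStartIndex = pageText.IndexOf(textToSearch, 0, StringComparison.CurrentCultureIgnoreCase);
        while (lastStartIndex != -1)
        {
            count++;
            lastStartIndex = pageText.IndexOf(textToSearch, lastStartIndex + 1, StringComparison.CurrentCultureIgnoreCase);
        }

        // Pages is zero-based, so report i + 1 as the human-readable page number.
        if (count != 0)
            Console.WriteLine("Page {0}: '{1}' found {2} times", i + 1, textToSearch, count);
    }
}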

To perform a case-sensitive search, remove the third argument of the IndexOf calls (or pass StringComparison.Ordinal instead).
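
If you need to switch between case-sensitive and case-insensitive searches, the counting loop above can be pulled out into a small helper that takes the comparison mode as a parameter. This is only a sketch: TextSearch and CountOccurrences are hypothetical names, not part of Docotic.Pdf, and the helper works on any string, such as the result of GetText() for one page.

```csharp
using System;

static class TextSearch
{
    // Hypothetical helper (not part of Docotic.Pdf): counts how many times
    // `term` occurs in `text` using the given comparison mode.
    // Overlapping matches are counted, as in the loop above.
    public static int CountOccurrences(string text, string term, StringComparison comparison)
    {
        // Nothing to search for, or nothing to search in.
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(term))
            return 0;

        int count = 0;
        int index = text.IndexOf(term, 0, comparison);
        while (index != -1)
        {
            count++;
            // Resume the search one character past the last match.
            index = text.IndexOf(term, index + 1, comparison);
        }
        return count;
    }
}
```

For example, for the text "PDF search: find pdf text in a Pdf file" and the term "pdf", passing StringComparison.OrdinalIgnoreCase gives 3, while StringComparison.Ordinal gives 1.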

answered Oct 04 '22 03:10 by Bobrovsky