
Search Particular Word in PDF using Itextsharp

Tags: c#, pdf, itextsharp

This is my first post on Stack Overflow.

I have a PDF file on my system drive. I want to write a C# program that references itextsharp.dll and searches that PDF for a particular word, say "StackOverFlow". If the PDF contains the word "StackOverFlow", it should return true; otherwise it should return false.

I have looked at many articles but have not found a solution yet.

This is what I have tried so far:

    // Requires: using System.IO; using System.Text;
    //           using iTextSharp.text.pdf; using iTextSharp.text.pdf.parser;
    public string ReadPdfFile(string fileName)
    {
        StringBuilder text = new StringBuilder();

        if (File.Exists(fileName))
        {
            PdfReader pdfReader = new PdfReader(fileName);

            for (int page = 1; page <= pdfReader.NumberOfPages; page++)
            {
                ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                // Extract the text of the current page.
                string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);

                // Re-encode the extracted text from the system default encoding to UTF-8.
                currentText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText)));
                text.Append(currentText);
            }
            pdfReader.Close();
        }
        return text.ToString();
    }

Thanks in advance, Sabya Dev

user2553159 asked Jul 05 '13 09:07


1 Answer

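The following method works. It returns the list of page numbers on which the text is found; an empty list means the text does not appear in the PDF. Note that string.Contains is case-sensitive, so "StackOverFlow" will not match "stackoverflow".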

    // Requires: using System.Collections.Generic; using System.IO;
    //           using iTextSharp.text.pdf; using iTextSharp.text.pdf.parser;
    public List<int> ReadPdfFile(string fileName, string searchText)
    {
        List<int> pages = new List<int>();
        if (File.Exists(fileName))
        {
            PdfReader pdfReader = new PdfReader(fileName);
            for (int page = 1; page <= pdfReader.NumberOfPages; page++)
            {
                ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();

                // Extract this page's text and record the page number on a match.
                string currentPageText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
                if (currentPageText.Contains(searchText))
                {
                    pages.Add(page);
                }
            }
            pdfReader.Close();
        }
        return pages;
    }
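
The question asks for a true/false result rather than page numbers, so here is a minimal sketch of that variant. It assumes iTextSharp 5.x. The names `PdfSearch`, `TextContains` and `PdfContainsText` are my own and are not part of the library. It stops at the first matching page, ignores case, and collapses runs of whitespace before comparing, which is an assumption about what counts as a match in your case.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class PdfSearch
{
    // Hypothetical helper: case-insensitive substring test after collapsing
    // every run of whitespace (including line breaks) to a single space.
    public static bool TextContains(string pageText, string searchText)
    {
        if (string.IsNullOrEmpty(pageText) || string.IsNullOrWhiteSpace(searchText))
        {
            return false;
        }
        string haystack = Regex.Replace(pageText, @"\s+", " ");
        string needle = Regex.Replace(searchText.Trim(), @"\s+", " ");
        return haystack.IndexOf(needle, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // Returns true as soon as any page contains searchText;
    // returns false if the file is missing or no page matches.
    public static bool PdfContainsText(string fileName, string searchText)
    {
        if (!File.Exists(fileName))
        {
            return false;
        }

        PdfReader pdfReader = new PdfReader(fileName);
        try
        {
            for (int page = 1; page <= pdfReader.NumberOfPages; page++)
            {
                string pageText = PdfTextExtractor.GetTextFromPage(
                    pdfReader, page, new SimpleTextExtractionStrategy());
                if (TextContains(pageText, searchText))
                {
                    return true;
                }
            }
        }
        finally
        {
            // Release the file handle even if extraction throws.
            pdfReader.Close();
        }
        return false;
    }
}
```

You would call it as `bool found = PdfSearch.PdfContainsText(fileName, "StackOverFlow");`. Each page is searched separately, so a phrase that is split across two pages will not be found, and a scanned PDF that contains only images has no text to extract.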
Lalitya answered Sep 21 '22 13:09