Can Selenium verify text inside a PDF loaded by the browser?

My web application loads a PDF in the browser. I have figured out how to check that the PDF has loaded correctly using:

verifyAttribute xpath=//embed/@src {URL of PDF goes here}
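Checking the `src` attribute only confirms which URL the `<embed>` points to. A hedged extra check, assuming you can fetch the bytes behind that URL (for example with a plain HTTP download), is that they start with the `%PDF-` header that the PDF format places at the beginning of a file. Some viewers tolerate a few junk bytes before the header, which this strict check would reject. The class and method names below are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: a strict sanity check that some downloaded bytes
// look like a PDF file, i.e. they start with the "%PDF-" header.
final class PdfHeaderCheck {
    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    static boolean looksLikePdf(byte[] data) {
        if (data == null || data.length < MAGIC.length) {
            return false;
        }
        // Compare only the first MAGIC.length bytes against the header.
        return Arrays.equals(Arrays.copyOf(data, MAGIC.length), MAGIC);
    }

    public static void main(String[] args) {
        byte[] pdfStart = "%PDF-1.7\n...".getBytes(StandardCharsets.US_ASCII);
        byte[] htmlError = "<html>404</html>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(looksLikePdf(pdfStart));  // true
        System.out.println(looksLikePdf(htmlError)); // false
    }
}
```

A typical failure this catches is an error or login page served where the PDF was expected.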

It would be really nice to be able to check the contents of the PDF with Selenium, for example to verify that some text is present. Is there any way to do this?

asked Aug 25 '10 at 05:08 by Daniel Alexiuc


People also ask

Can you verify PDF content in Selenium?

To handle PDF documents in Selenium test automation, we can use Apache PDFBox, an open-source Java library for working with PDF documents. We can use it to verify the text present in a document, extract text from a specific section, extract images, and so on.
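Text extracted from a PDF (for instance with PDFBox's `PDFTextStripper.getText(document)`) contains line breaks wherever the rendered text wrapped, so a phrase that spans two lines will not match a naive `contains` check. A minimal sketch, assuming the extracted text is already available as a `String`, is to collapse whitespace runs on both sides before searching. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper for checking text extracted from a PDF.
final class PdfTextMatch {
    // Whitespace plus the non-breaking space (U+00A0), which \s does not match in Java.
    private static final Pattern WHITESPACE = Pattern.compile("[\\s\\u00A0]+");

    // Collapse runs of whitespace (spaces, tabs, line breaks) to one space and trim.
    static String normalize(String s) {
        return WHITESPACE.matcher(s).replaceAll(" ").trim();
    }

    // True if `expected` occurs in `extractedText`, ignoring whitespace differences.
    static boolean containsText(String extractedText, String expected) {
        return normalize(extractedText).contains(normalize(expected));
    }

    public static void main(String[] args) {
        String extracted = "Invoice number: 1234\r\nTotal amount\ndue: 99.00 EUR\n";
        System.out.println(extracted.contains("Total amount due"));     // false
        System.out.println(containsText(extracted, "Total amount due")); // true
    }
}
```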

How do I validate the contents of a PDF?

If "validate" means validating digital signatures in Adobe Acrobat or Reader: open the Preferences dialog box, select Signatures under Categories, and click More next to Verification. To validate all signatures automatically whenever a PDF is opened, select Verify Signatures When The Document Is Opened.

How does Selenium detect the presence of text?

We can use the getPageSource() method to fetch the full page source and then check whether the text exists in it; the method returns the content as a String. We can also check whether some text exists by calling findElements with an XPath locator: a non-empty result means a matching element was found.
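When the text to look for goes into an XPath locator for `findElements`, quotes inside that text need care, because XPath 1.0 string literals have no escape character. A hedged sketch of a helper that builds such a locator, using `concat()` when the text contains both quote types, is below. The method names are hypothetical; the resulting string would be passed to something like `driver.findElements(By.xpath(...))`.

```java
// Hypothetical helpers for building an XPath 1.0 locator that finds elements
// whose normalized text content contains a given string.
final class TextXPath {

    // Quote s as an XPath 1.0 string literal. XPath 1.0 has no escape
    // sequences, so text containing both ' and " is built with concat().
    static String literal(String s) {
        if (!s.contains("'")) {
            return "'" + s + "'";
        }
        if (!s.contains("\"")) {
            return "\"" + s + "\"";
        }
        String[] parts = s.split("'", -1); // -1 keeps empty leading/trailing parts
        StringBuilder sb = new StringBuilder("concat(");
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(", \"'\", "); // re-insert the apostrophe between parts
            }
            sb.append('\'').append(parts[i]).append('\'');
        }
        return sb.append(')').toString();
    }

    // Matches every element (ancestors included) whose text contains `text`.
    static String containsText(String text) {
        return "//*[contains(normalize-space(.), " + literal(text) + ")]";
    }

    public static void main(String[] args) {
        System.out.println(containsText("Total due"));
        System.out.println(containsText("Say \"it's\" here"));
    }
}
```

Because `.` covers all descendant text, `html` and `body` also match, which is harmless when only checking for a non-empty result.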

How do you verify something in Selenium?

Verify in Selenium (also known as a soft assertion): with a hard assertion, a failed check terminates the test immediately. If the tester does not want the script to stop, hard assertions are not suitable; soft assertions overcome this by recording the failure and letting the test continue.
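The difference can be sketched without a test framework: a soft-assertion collector records each failed check and throws only once, at the end, listing every failure. TestNG ships a `SoftAssert` class (with `assertAll()`) that works along these lines; the minimal class below is a hypothetical stand-in, not TestNG's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal soft-assertion collector (not TestNG's SoftAssert).
final class SoftChecks {
    private final List<String> failures = new ArrayList<>();

    // Record a failure instead of throwing, so the test keeps running.
    void check(boolean condition, String message) {
        if (!condition) {
            failures.add(message);
        }
    }

    List<String> failures() {
        return List.copyOf(failures);
    }

    // Throw once at the end if any check failed, listing all failures.
    void assertAll() {
        if (!failures.isEmpty()) {
            throw new AssertionError(failures.size() + " check(s) failed: " + failures);
        }
    }

    public static void main(String[] args) {
        SoftChecks soft = new SoftChecks();
        String pdfText = "Invoice 1234 Total due";
        soft.check(pdfText.contains("Invoice"), "missing 'Invoice'");
        soft.check(pdfText.contains("Paid"), "missing 'Paid'"); // fails, test continues
        soft.check(pdfText.contains("Total"), "missing 'Total'");
        System.out.println(soft.failures()); // [missing 'Paid']
        try {
            soft.assertAll();
        } catch (AssertionError e) {
            System.out.println(e.getMessage()); // 1 check(s) failed: [missing 'Paid']
        }
    }
}
```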


2 Answers

While this is not natively supported, I have found a couple of ways to do it using the Java driver. One way is to open the PDF in your browser (with Adobe Acrobat installed), use keyboard shortcuts to select all text (CTRL+A) and copy it to the clipboard (CTRL+C), and then verify the text in the clipboard, e.g.:

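// JUnit 4 imports. session() (the Selenium RC session) and PAGE_LOAD_TIMEOUT
// (a String timeout in ms) are assumed to come from your own test setup.
import org.junit.Test;
import static org.junit.Assert.assertTrue;

// Returns the ID of the most recently opened browser window.
protected String getLastWindow() {
    return session().getEval("var windowId; for (var x in selenium.browserbot.openedWindows) { windowId = x; } windowId;");
}

@Test
public void testTextInPDF() throws InterruptedException {
    session().click("link=View PDF");
    String popupName = getLastWindow();
    session().waitForPopUp(popupName, PAGE_LOAD_TIMEOUT);
    session().selectWindow(popupName);

    session().windowMaximize();
    session().windowFocus();
    Thread.sleep(3000); // give the PDF plugin time to render the document

    session().keyDownNative("17");  // 17 = java.awt.event.KeyEvent.VK_CONTROL: press CTRL
    session().keyPressNative("65"); // 65 = KeyEvent.VK_A (Java key code, not a JavaScript one)
    session().keyUpNative("17");    // release CTRL
    Thread.sleep(1000);

    session().keyDownNative("17");  // press CTRL
    session().keyPressNative("67"); // 67 = KeyEvent.VK_C
    session().keyUpNative("17");    // release CTRL

    // TextTransfer is a small clipboard helper class not shown in this answer
    TextTransfer textTransfer = new TextTransfer();
    assertTrue(textTransfer.getClipboardContents().contains("Some text in my pdf"));
}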
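The `TextTransfer` helper used above is not shown. A hedged stand-in using only the standard `java.awt.datatransfer` API reads plain text from a `Clipboard`; the class name is hypothetical. In the real test you would pass `Toolkit.getDefaultToolkit().getSystemClipboard()`, which needs a non-headless JVM on the same machine as the browser. Taking the clipboard as a parameter also lets the helper be exercised with a private `Clipboard`.

```java
import java.awt.datatransfer.Clipboard;
import java.awt.datatransfer.DataFlavor;
import java.awt.datatransfer.StringSelection;
import java.awt.datatransfer.UnsupportedFlavorException;
import java.io.IOException;

// Hypothetical stand-in for the answer's TextTransfer helper.
final class ClipboardText {

    // Returns the clipboard's plain-text contents, or "" if it holds no text.
    static String read(Clipboard clipboard) {
        if (!clipboard.isDataFlavorAvailable(DataFlavor.stringFlavor)) {
            return "";
        }
        try {
            return (String) clipboard.getData(DataFlavor.stringFlavor);
        } catch (UnsupportedFlavorException | IOException e) {
            return ""; // contents changed or could not be read as text
        }
    }

    public static void main(String[] args) {
        // A private clipboard works without a display; in the Selenium test
        // you would pass Toolkit.getDefaultToolkit().getSystemClipboard().
        Clipboard local = new Clipboard("demo");
        local.setContents(new StringSelection("Some text in my pdf"), null);
        System.out.println(read(local).contains("Some text in my pdf")); // true
    }
}
```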

Another way, still in Java, is to download the PDF and then convert it to text with PDFBox; see http://www.prasannatech.net/2009/01/convert-pdf-text-parser-java-api-pdfbox.html for an example of how to do this.
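The download step can be sketched with the standard library alone, assuming the PDF URL (for example the `<embed>` element's `src`) is reachable from the test JVM. If the PDF sits behind a login, the browser's session cookie usually has to be forwarded; the optional `cookieHeader` parameter is for that, and you might build it from WebDriver's `driver.manage().getCookies()`. The saved file would then be loaded with PDFBox (`PDDocument.load(file)` in PDFBox 2.x, `Loader.loadPDF(file)` in 3.x) and passed to `PDFTextStripper.getText`. Class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: save the PDF behind a URL to a temporary file so it can
// be parsed outside the browser (e.g. with PDFBox).
final class PdfDownload {

    static Path download(URL url, String cookieHeader) throws IOException {
        URLConnection conn = url.openConnection();
        if (cookieHeader != null) {
            // e.g. "JSESSIONID=abc123", copied from the browser session
            conn.setRequestProperty("Cookie", cookieHeader);
        }
        Path target = Files.createTempFile("downloaded-", ".pdf");
        try (InputStream in = conn.getInputStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URL; pass the real PDF URL as the first argument.
        String address = args.length > 0 ? args[0] : "https://example.com/report.pdf";
        Path pdf = download(URI.create(address).toURL(), null);
        System.out.println("Saved " + Files.size(pdf) + " bytes to " + pdf);
    }
}
```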

answered Sep 22 '22 at 04:09 by AlexS


You cannot do this with WebDriver natively. However, the PDFBox API can be used to read the content of the PDF file. First, you will have to switch focus to the browser window in which the PDF file is open. You can then parse the full content of the PDF file and search for the desired text string.

Here is code that uses the PDFBox API to search within a PDF document.
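For the window-switching step in WebDriver, one common pattern is to record `driver.getWindowHandles()` before the click that opens the PDF, read it again afterwards, and switch to the handle that is new; `driver.getCurrentUrl()` then gives the PDF's URL to download and parse. The set arithmetic is sketched below with plain strings so it runs without Selenium; the class and method names and the handle values are hypothetical.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper: find the window handle that appeared after an action.
// With WebDriver, `before` and `after` would be driver.getWindowHandles()
// captured before and after clicking the link that opens the PDF.
final class NewWindow {

    static Optional<String> newHandle(Set<String> before, Set<String> after) {
        Set<String> added = new HashSet<>(after);
        added.removeAll(before);
        if (added.size() > 1) {
            throw new IllegalStateException("More than one new window: " + added);
        }
        return added.stream().findFirst();
    }

    public static void main(String[] args) {
        Set<String> before = Set.of("main-window");
        Set<String> after = Set.of("main-window", "pdf-window");
        // In a real test: driver.switchTo().window(handle); then driver.getCurrentUrl()
        System.out.println(newHandle(before, after).orElse("none")); // pdf-window
    }
}
```

Because the new window may open asynchronously, you would typically wait for the window count to grow (for example with `WebDriverWait` and `ExpectedConditions.numberOfWindowsToBe`) before reading `after`.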

answered Sep 23 '22 at 04:09 by Maharshi