Can Selenium verify text inside a PDF loaded by the browser?

2 Answers

While this is not natively supported, I have found a couple of ways to do it using the Java driver. One way is to open the PDF in your browser (with Adobe Acrobat installed), use keyboard shortcuts to select all the text (CTRL+A) and copy it to the clipboard (CTRL+C), and then verify the text in the clipboard. For example:

// Returns the last window ID listed in selenium.browserbot.openedWindows.
protected String getLastWindow() {
    return session().getEval("var windowId; for (var x in selenium.browserbot.openedWindows) { windowId = x; } windowId;");
}

@Test
public void testTextInPDF() throws InterruptedException {
    session().click("link=View PDF");
    String popupName = getLastWindow();
    session().waitForPopUp(popupName, PAGE_LOAD_TIMEOUT);
    session().selectWindow(popupName);

    session().windowMaximize();
    session().windowFocus();
    Thread.sleep(3000); // give the PDF viewer time to render

    // Key codes are java.awt.event.KeyEvent values: 17 = VK_CONTROL, 65 = VK_A, 67 = VK_C.
    session().keyDownNative("17");  // hold CTRL
    session().keyPressNative("65"); // press A: select all
    session().keyUpNative("17");    // release CTRL
    Thread.sleep(1000);

    session().keyDownNative("17");  // hold CTRL
    session().keyPressNative("67"); // press C: copy to clipboard
    session().keyUpNative("17");    // release CTRL

    // TextTransfer is a small clipboard helper class, not part of Selenium.
    TextTransfer textTransfer = new TextTransfer();
    assertTrue(textTransfer.getClipboardContents().contains("Some text in my pdf"));
}
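
The test relies on a TextTransfer class that the answer does not show. A minimal sketch of such a helper, built only on java.awt.datatransfer from the JDK, might look like the following. The class body is an assumption, not the author's code, and the constructor that takes a Clipboard is added purely for illustration so the helper can be tried without touching the system clipboard.

```java
import java.awt.Toolkit;
import java.awt.datatransfer.Clipboard;
import java.awt.datatransfer.DataFlavor;
import java.awt.datatransfer.StringSelection;
import java.awt.datatransfer.Transferable;
import java.awt.datatransfer.UnsupportedFlavorException;
import java.io.IOException;

// Hypothetical helper: the answer uses TextTransfer but does not define it.
class TextTransfer {
    private final Clipboard clipboard;

    // Uses the system clipboard, which is where CTRL+C in the browser puts the text.
    // Needs a graphical environment; a headless JVM throws HeadlessException here.
    public TextTransfer() {
        this(Toolkit.getDefaultToolkit().getSystemClipboard());
    }

    // Assumption for illustration: accept any Clipboard instance.
    public TextTransfer(Clipboard clipboard) {
        this.clipboard = clipboard;
    }

    // Places the given text on the clipboard.
    public void setClipboardContents(String text) {
        StringSelection selection = new StringSelection(text);
        clipboard.setContents(selection, selection);
    }

    // Returns the clipboard text, or an empty string if the clipboard holds no text.
    public String getClipboardContents() {
        Transferable contents = clipboard.getContents(null);
        if (contents == null || !contents.isDataFlavorSupported(DataFlavor.stringFlavor)) {
            return "";
        }
        try {
            return (String) contents.getTransferData(DataFlavor.stringFlavor);
        } catch (UnsupportedFlavorException | IOException e) {
            return "";
        }
    }
}
```

Returning an empty string when the clipboard has no text keeps the assertion in the test simple: it fails on the contains check rather than on a null pointer.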

Another way, still in Java, is to download the PDF and convert it to text with PDFBox. See http://www.prasannatech.net/2009/01/convert-pdf-text-parser-java-api-pdfbox.html for an example of how to do this.
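
The download step needs nothing beyond the JDK. Here is a rough sketch, assuming the PDF's URL is known (for instance, taken from the href of the "View PDF" link) and can be fetched without the browser's session cookies. PdfDownloader and downloadToTempFile are made-up names for illustration; the returned file is what you would then hand to PDFBox.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Selenium or PDFBox.
class PdfDownloader {
    // Copies the resource at pdfUrl into a new temporary file and returns its path.
    // Assumption: the URL is reachable without authentication or session cookies.
    public static Path downloadToTempFile(URI pdfUrl) throws IOException {
        Path target = Files.createTempFile("downloaded-", ".pdf");
        try (InputStream in = pdfUrl.toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }
}
```

If the PDF sits behind a login, this plain request will likely receive a login page instead of the document, so the session cookies from the browser would have to be sent along as well.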

130

answered Sep 22 '22 04:09

AlexS

You cannot do this natively with WebDriver. However, the PDFBox API can be used to read the content of the PDF file. First, switch focus to the browser window in which the PDF is open. You can then parse the content of the PDF and search for the desired text string.

Here is code that uses the PDFBox API to search within a PDF document.
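
Text extracted from a PDF usually keeps the line breaks of the page layout, so a plain contains check can miss a phrase that wraps across lines. Below is a minimal sketch of the search step only, assuming the text has already been extracted (for example with PDFBox's PDFTextStripper.getText). PdfTextSearch and its method names are hypothetical and not part of PDFBox.

```java
// Hypothetical helper for the "search for the desired text string" step.
class PdfTextSearch {
    // Collapses every run of whitespace, including line breaks, into one space.
    public static String normalize(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    // True if 'expected' occurs in 'extracted' once both are whitespace-normalized.
    // Assumption: 'extracted' is the plain text already pulled out of the PDF.
    public static boolean containsText(String extracted, String expected) {
        return normalize(extracted).contains(normalize(expected));
    }
}
```

In a test this would be used as assertTrue(PdfTextSearch.containsText(pdfText, "Some text in my pdf")), where pdfText is the extracted string.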

answered Sep 23 '22 04:09

Maharshi


Tags:

pdf

firefox

testing

selenium

selenium-ide

Daniel Alexiuc
