Selenium - driver.getPageSource() differs from the source viewed in the browser

I am trying to save the source code of a given URL to an HTML file using Selenium, but the output is not the same as the source I see when I view it in the browser, and I don't know why.

Below is the Java code I use to write the source to an HTML file:

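import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

private static void getHTMLSourceFromURL(String url, String fileName) {

    WebDriver driver = new FirefoxDriver();
    try {
        driver.get(url);
        Thread.sleep(5000);   // fixed wait so the page can finish loading

        List<String> pageSource = new ArrayList<String>(Arrays.asList(driver.getPageSource().split("\n")));

        // write to the file name passed in by the caller
        writeTextToFile(pageSource, fileName);

    } catch (InterruptedException e) {
        e.printStackTrace();
    } finally {
        // always close the browser, even if something above fails
        System.out.println("quitting webdriver");
        driver.quit();
    }
}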

/**
 * Creates the file fileName under "./HTML Sources" (if it does not exist yet)
 * and appends the content to it, one list entry per line.
 *
 * @param content  lines to write
 * @param fileName name of the output file
 */
private static void writeTextToFile(List<String> content, String fileName) {
    String outputFolder = ".";
    File dir = new File(outputFolder + '/' + "HTML Sources");
    // mkdirs() also returns false if the directory already exists, so check exists() first
    if (!dir.exists() && !dir.mkdirs()) {
        System.err.println(dir + " could not be created");
        return;
    }

    File output = new File(dir, fileName);
    // FileWriter creates the file if needed; "true" means append.
    // try-with-resources closes the writer even if writing fails.
    try (PrintWriter pw = new PrintWriter(new FileWriter(output, true))) {
        for (String line : content) {
            pw.print(line);
            pw.print("\n");
        }
    } catch (IOException ioe) {
        ioe.printStackTrace();
    }
}

Can someone shed some light on why this happens? How does WebDriver render the page, and how does the browser show the source?

asked Oct 14 '13 10:10

3 Answers

There are several places you can get the source from. You can try

String pageSource = driver.findElement(By.tagName("body")).getText();

and see what comes up. Note that getText() returns the visible text of the body, not its markup.

Generally you do not need to wait for the page to load. Selenium does that automatically, unless parts of the page are loaded separately by JavaScript/Ajax.

You might want to add the differences you are seeing, so that we can understand what you really mean.

WebDriver does not render the page on its own; it just returns the page as the browser sees it.

answered Sep 27 '22 01:09

Madusudanan

I encountered the same problem. I used this code to solve it:

......
String javascript = "return arguments[0].innerHTML";
String pageSource = (String) ((JavascriptExecutor) driver)
    .executeScript(javascript, driver.findElement(By.tagName("html")));
pageSource = "<html>" + pageSource + "</html>";
System.out.println(pageSource);
//FileUtils.write(new File("e:\\test.html"), pageSource,);
......

After reading the innerHTML property through JavaScript, it finally worked, and the question marks disappeared.
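One possible cause of question marks in a saved file, which is an assumption here and not something the posts confirm, is the encoding used when writing: FileWriter without an explicit charset uses the platform default encoding on JDKs before 18. A minimal sketch of writing the page source as UTF-8 with the standard library (Java 11+) could look like this. The class name Utf8PageWriter, the method writePageSource and the sample string are made up for illustration; in real use the string would come from driver.getPageSource().

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8PageWriter {

    /** Writes pageSource to outputDir/fileName as UTF-8, creating the directory if needed. */
    static Path writePageSource(String pageSource, Path outputDir, String fileName) throws IOException {
        Files.createDirectories(outputDir);
        Path target = outputDir.resolve(fileName);
        // Explicit charset: the result does not depend on the platform default encoding.
        return Files.writeString(target, pageSource, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for driver.getPageSource(); the text is a made-up sample with non-ASCII characters.
        String pageSource = "<html><body>caf\u00e9 \u4e2d\u6587</body></html>";
        Path written = writePageSource(pageSource, Path.of("HTML Sources"), "test.html");

        // Read the file back with the same charset and compare with what was written.
        String readBack = Files.readString(written, StandardCharsets.UTF_8);
        System.out.println("round trip intact: " + readBack.equals(pageSource));
    }
}
```

Unlike the append mode in the question, Files.writeString replaces the file by default, which is usually what you want when saving a single page.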

answered Sep 25 '22 01:09

mikemelon

The "source" code you get from Selenium does not seem to be the source at all. It seems to be the HTML for the current DOM. The source code you see in the browser is the HTML as sent by the server, before JavaScript makes any dynamic changes to it. If the DOM changes at all, the browser's source view doesn't reflect those changes, but Selenium's output does. If you want to see the current DOM in a browser, use the developer tools, not the source view.
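To illustrate the distinction, here is a rough sketch that fetches a page with a plain HTTP request, which runs no JavaScript and so returns roughly what "View Source" shows. It uses only the JDK (Java 11+). The class RawSourceDemo, its methods and the sample page are hypothetical; the page is served from a throwaway local server so the example does not depend on any real site.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class RawSourceDemo {

    // Made-up page: the server sends an empty div plus a script that fills it in later.
    static final String SERVER_HTML =
        "<html><body><div id=\"msg\"></div>"
        + "<script>document.getElementById('msg').textContent = 'filled by JS';</script>"
        + "</body></html>";

    /** Fetches the response body as sent by the server; no JavaScript is executed. */
    static String fetchRawSource(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString(StandardCharsets.UTF_8)).body();
    }

    /** Serves SERVER_HTML from a local server on a free port and returns what a plain fetch sees. */
    static String fetchFromLocalDemoServer() throws IOException, InterruptedException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            byte[] body = SERVER_HTML.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "text/html; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (var out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        try {
            int port = server.getAddress().getPort();
            return fetchRawSource("http://127.0.0.1:" + port + "/");
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        String raw = fetchFromLocalDemoServer();
        // The raw source still has the empty div and the script tag as sent by the server.
        System.out.println(raw);
    }
}
```

In the raw response the div is still empty. A browser driven by Selenium would run the script first, so getPageSource() would typically show the div already containing the text.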

answered Sep 27 '22 01:09

Indigenuity


Tags:

java

firefox

selenium

selenium-webdriver

webdriver

roger_that
