How to extract a Google link's href from search results with Selenium?

Tags: python, selenium, phantomjs

Ultimately, I am just trying to get the href of the first link in Google's search results.

The information I need also exists in an 'a' element, but it is stored in a 'data-href' attribute, and I could not figure out how to extract it (get_attribute('data-href') returns None).

I am using PhantomJS, but have also tried the Firefox web driver.

The href is displayed in a cite tag in a Google search (it can be found by inspecting the small green link text under each link in the search results).

The cite element is apparently found by Selenium, but the text returned (element.text, get_attribute('innerHTML'), or (text)) is not what is shown in the HTML.

For instance, there is a cite tag <cite class="_Rm">www.fcv.org.br/</cite>, but element.text shows “wikimapia.org/.../Fundação-Cristiano-Varella-Hospital...”.

I have tried to retrieve the cite element with by_css_selector, tag_name, class_name, and xpath, all with the same result.

links = driver.find_elements_by_css_selector('div.g') # div[class="g"]
link = links[0] # I am looking for the first link in the main links section
next = link.find_element_by_css_selector('div[class="s"]') # location of cite tag
nextB = next.find_element_by_tag_name('cite')

The div containing the cite tag (there is only one cite tag in the div):

    <div class="s">
         <div>
             <div class="f kv _SWb" style="white-space:nowrap">
                  <cite class="_Rm">www.fcv.org.br/</cite>

asked Feb 06 '16 12:02

Phillip

2 Answers

Find the first a element inside every search result and get its href attribute value:

from selenium import webdriver

driver = webdriver.PhantomJS()
driver.get("https://www.google.com/search?q=test")

results = driver.find_elements_by_css_selector('div.g')
link = results[0].find_element_by_tag_name("a")
href = link.get_attribute("href")

Then you can extract the actual URL from the href value with urlparse (this is the Python 2 module; in Python 3 the same functions live in urllib.parse):

import urlparse

print(urlparse.parse_qs(urlparse.urlparse(href).query)["q"])

Prints:

[u'http://www.speedtest.net/']
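
For Python 3, the same extraction can be sketched roughly as below. This is a minimal sketch, not a definitive implementation: the helper name extract_target_url is hypothetical, the sample href values are made up for illustration, and it assumes Google's redirect links have the form /url?q=<target>&..., which may not hold for every result page.

```python
from urllib.parse import urlparse, parse_qs


def extract_target_url(href):
    """Return the destination URL from a Google redirect link.

    Assumption: redirect links look like ``/url?q=<target>&...``.
    Any other href is returned unchanged.
    """
    parsed = urlparse(href)
    if parsed.path == "/url":
        # parse_qs maps each query key to a list of decoded values.
        targets = parse_qs(parsed.query).get("q")
        if targets:
            return targets[0]
    return href


# Hypothetical href values, for illustration only.
print(extract_target_url("https://www.google.com/url?q=http://www.speedtest.net/&sa=U"))
print(extract_target_url("http://www.speedtest.net/"))
```

Returning the href unchanged when there is no q parameter keeps the helper usable when the link already points directly at the target site.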

answered Sep 23 '22 02:09

alecxe

Try this one:

public class GoogleSearchPage {
    // locators
    @FindBy(id = "lst-ib")
    private WebElement searchInputBox;
    @FindBy(name = "btnG")
    private WebElement searchButton;
    @FindBy(id = "ires")
    private WebElement searchResultContainer;
    By searchResultHeader = By.tagName("h3");

    // perform search action with the given text
    public void searchText(String text) {
        searchInputBox.sendKeys(text);
        searchButton.click();
    }

    public List<String> readSearchResults() {
        List<WebElement> searchResults = searchResultContainer
                .findElements(searchResultHeader);

        List<String> searchResultsHeaderText = new ArrayList<String>();
        int size = searchResults.size();
        for (int i = 0; i < size; i++) {
            searchResultsHeaderText.add(searchResults.get(i).getText());
        }
        return searchResultsHeaderText;
    }

}

Complete source: https://github.com/jagdeepjain/ui-automation-testng

answered Sep 23 '22 02:09

Jagdeep
