
How to extract a Google link's href from search results with Selenium?

Ultimately, I am just trying to get the href of the first link in Google's search results.

The information I need also exists in an 'a' element, but it is stored in a 'data-href' attribute, and I could not figure out how to extract it (get_attribute('data-href') returns None).

I am using PhantomJS, but have also tried the Firefox web driver.


The href is displayed in a cite tag in a Google search (it can be found by inspecting the small green link text under each link in the search results).

Selenium apparently finds the cite element, but the text it returns (via element.text or get_attribute('innerHTML')) is not what is shown in the HTML.

For instance, there is a cite tag <cite class="_Rm">www.fcv.org.br/</cite>, but element.text shows “wikimapia.org/.../Fundação-Cristiano-Varella-Hospital...”

I have tried to retrieve the cite element with by_css_selector, tag_name, class_name, and xpath with the same results.

links = driver.find_elements_by_css_selector('div.g') # div[class="g"]
link = links[0] # I am looking for the first link in the main links section
next = link.find_element_by_css_selector('div[class="s"]') # location of cite tag
nextB = next.find_element_by_tag_name('cite') 

The div containing the cite tag (there is only one in the div):

    <div class="s">
         <div>
             <div class="f kv _SWb" style="white-space:nowrap">
                  <cite class="_Rm">www.fcv.org.br/</cite>

asked Feb 06 '16 12:02 by Phillip


2 Answers

Find the first a element inside every search result and get its href attribute value:

from selenium import webdriver

driver = webdriver.PhantomJS()
driver.get("https://www.google.com/search?q=test")

results = driver.find_elements_by_css_selector('div.g')
link = results[0].find_element_by_tag_name("a")
href = link.get_attribute("href")

Then you can extract the actual URL from the href value with urlparse (this is the Python 2 module; in Python 3 the same functions live in urllib.parse):

import urlparse

print(urlparse.parse_qs(urlparse.urlparse(href).query)["q"])

Prints:

[u'http://www.speedtest.net/']
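
For readers on Python 3, here is a minimal sketch of the same extraction step using urllib.parse. It assumes the href has the redirect form implied by the output above (a /url path with the target in the q parameter). The function name extract_target_url and the sample URL with its sa and ved parameters are hypothetical, chosen only for illustration. If the href is already a direct link, it is returned unchanged.

```python
from urllib.parse import urlparse, parse_qs


def extract_target_url(href):
    """Return the target of a Google-style redirect href, or href itself.

    Assumption: redirect links look like ".../url?q=<target>&...".
    Anything else is treated as a direct link and returned as-is.
    """
    parsed = urlparse(href)
    params = parse_qs(parsed.query)
    if parsed.path == "/url" and "q" in params:
        # parse_qs maps each key to a list of values; take the first one
        return params["q"][0]
    return href


# Illustrative input only; real hrefs carry other tracking parameters.
sample = "https://www.google.com/url?q=http://www.speedtest.net/&sa=U&ved=abc"
print(extract_target_url(sample))
```

Falling back to the original href keeps the helper usable when the browser exposes direct links instead of redirect URLs.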

answered Sep 23 '22 02:09 by alecxe


Try this one:

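import java.util.ArrayList;
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.support.FindBy;

public class GoogleSearchPage {
    // locators (the @FindBy fields are populated when the page object is
    // initialised, e.g. with PageFactory.initElements(driver, this))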
    @FindBy(id = "lst-ib")
    private WebElement searchInputBox;
    @FindBy(name = "btnG")
    private WebElement searchButton;
    @FindBy(id = "ires")
    private WebElement searchResultContainer;
    By searchResultHeader = By.tagName("h3");

    // perform search action with the given text
    public void searchText(String text) {
        searchInputBox.sendKeys(text);
        searchButton.click();
    }

    // collect the text of every h3 result heading inside the results container
    public List<String> readSearchResults() {
        List<WebElement> searchResults = searchResultContainer
                .findElements(searchResultHeader);

        List<String> searchResultsHeaderText = new ArrayList<String>();
        int size = searchResults.size();
        for (int i = 0; i < size; i++) {
            searchResultsHeaderText.add(searchResults.get(i).getText());
        }
        return searchResultsHeaderText;
    }

}

complete source: https://github.com/jagdeepjain/ui-automation-testng

answered Sep 23 '22 02:09 by Jagdeep