Ultimately, I am just trying to get the href of the first link in Google's search results.
The information I need also exists in an 'a' element, but it is stored in a 'data-href' attribute, and I could not figure out how to extract the data from it (get_attribute('data-href') returns None).
I am using PhantomJS, but have also tried the Firefox web driver.
The href is displayed in a cite tag in a Google search (which can be found by inspecting the small green link text under each link in Google search results).
The cite element is apparently found by Selenium, but the text returned (element.text, or get_attribute('innerHTML'), or (text)) is not what is shown in the HTML.
For instance, there is a cite tag <cite class="_Rm">www.fcv.org.br/</cite>, but element.text shows “wikimapia.org/.../Fundação-Cristiano-Varella-Hospital...”
I have tried to retrieve the cite element with by_css_selector, tag_name, class_name, and xpath, all with the same results.
links = driver.find_elements_by_css_selector('div.g') # div[class="g"]
link = links[0] # I am looking for the first link in the main links section
next = link.find_element_by_css_selector('div[class="s"]') # location of cite tag
nextB = next.find_element_by_tag_name('cite')
The div containing the cite tag (there is only one cite tag in the div):
<div class="s">
<div>
<div class="f kv _SWb" style="white-space:nowrap">
<cite class="_Rm">www.fcv.org.br/</cite>
How to read Google search results in Selenium:
1. Open Google in a web browser.
2. Search for “top 10 python books”.
3. Grab all the Google search result URLs related to the above search.
4. Print the results on the console.
How links are presented in Google search results can depend on your browser. Some browsers will show the direct links when you right click and copy them while others will show long URLs with loads of junk data. This is decided by your browser’s user agent string.
All elements are found using the BeautifulSoup method .find_all(), where we specify the element and class as inputs. For every search result obtained, we need to extract the hyperlink, which is stored as the href attribute of the <a> element. We now have all the code blocks required to obtain the links to Google search results.
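As a rough illustration of pulling the first link out of each result block, here is a minimal sketch that uses Python's standard-library HTMLParser in place of BeautifulSoup, so it runs without extra packages. The class name FirstLinkParser and the sample markup are hypothetical; the sketch assumes each result sits in a div with class "g" (as in the question) and falls back to a data-href attribute when href is missing. Real Google markup changes often, so treat the selectors as assumptions.

```python
from html.parser import HTMLParser


class FirstLinkParser(HTMLParser):
    """Collect the first <a> target found inside each <div class="g"> block."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._depth = 0      # <div> nesting depth inside the current result; 0 means outside
        self._found = False  # whether the current result already yielded a link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self._depth > 0:
                self._depth += 1
            elif "g" in (attrs.get("class") or "").split():
                self._depth = 1
                self._found = False
        elif tag == "a" and self._depth > 0 and not self._found:
            # Prefer href; fall back to data-href if href is absent or empty.
            target = attrs.get("href") or attrs.get("data-href")
            if target:
                self.links.append(target)
                self._found = True

    def handle_endtag(self, tag):
        if tag == "div" and self._depth > 0:
            self._depth -= 1


# Hypothetical markup, loosely modelled on the fragment in the question.
sample_html = """
<div class="g"><h3><a href="/url?q=http://www.fcv.org.br/">FCV</a></h3>
<div class="s"><div><cite class="_Rm">www.fcv.org.br/</cite></div></div>
<a href="/cached">Cached</a></div>
<div class="g"><a data-href="http://example.org/">Example</a></div>
"""

parser = FirstLinkParser()
parser.feed(sample_html)
print(parser.links)
```

With Selenium, the same parser could be fed driver.page_source instead of the sample string.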
Find the first a element inside every search result and get its href attribute value:
from selenium import webdriver
driver = webdriver.PhantomJS()
driver.get("https://www.google.com/search?q=test")
results = driver.find_elements_by_css_selector('div.g')
link = results[0].find_element_by_tag_name("a")
href = link.get_attribute("href")
Then you can extract the actual URL from the href value with urlparse:
import urlparse
print(urlparse.parse_qs(urlparse.urlparse(href).query)["q"])
Prints:
[u'http://www.speedtest.net/']
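The urlparse module used above exists only in Python 2. On Python 3 the same functions live in urllib.parse, so the extraction step might look like the sketch below. The helper name extract_target and the example redirect URL are assumptions for illustration; the sketch also assumes the real address is carried in a q query parameter, as in the answer above, and returns the href unchanged when that parameter is absent.

```python
from urllib.parse import parse_qs, urlparse


def extract_target(href):
    """Return the 'q' query parameter of a redirect-style href, else the href itself."""
    query = parse_qs(urlparse(href).query)
    targets = query.get("q")
    return targets[0] if targets else href


# Hypothetical redirect-style href of the kind Google may serve.
redirect = "https://www.google.com/url?q=http://www.speedtest.net/&sa=U"
print(extract_target(redirect))
print(extract_target("http://www.speedtest.net/"))
```

Falling back to the original href keeps the helper usable when the browser's user agent causes Google to serve direct links instead of redirect URLs.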
Try this one:
import java.util.ArrayList;
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.support.FindBy;

public class GoogleSearchPage {

    // locators
    @FindBy(id = "lst-ib")
    private WebElement searchInputBox;

    @FindBy(name = "btnG")
    private WebElement searchButton;

    @FindBy(id = "ires")
    private WebElement searchResultContainer;

    By searchResultHeader = By.tagName("h3");

    // perform search action with the given text
    public void searchText(String text) {
        searchInputBox.sendKeys(text);
        searchButton.click();
    }

    // collect the visible text of every h3 header inside the results container
    public List<String> readSearchResults() {
        List<WebElement> searchResults = searchResultContainer
                .findElements(searchResultHeader);
        List<String> searchResultsHeaderText = new ArrayList<String>();
        for (WebElement result : searchResults) {
            searchResultsHeaderText.add(result.getText());
        }
        return searchResultsHeaderText;
    }
}
complete source: https://github.com/jagdeepjain/ui-automation-testng