
Parse BeautifulSoup element into Selenium

I want to get the source code of a website using Selenium, find a particular element using BeautifulSoup, and then convert that element back into a Selenium selenium.webdriver.remote.webelement.WebElement object. Like so:

driver.get("https://www.google.com")
soup = BeautifulSoup(driver.page_source, "html.parser")
element = soup.find(title="Search")

element = Selenium.webelement(element)
element.click()

How can I achieve this?

Darth Ludius asked Jun 22 '16 23:06




2 Answers

A general solution that worked for me is to compute the XPath of the bs4 element, then use that XPath to find the element in Selenium:

xpath = xpath_soup(soup_element)
# find_element_by_xpath was removed in Selenium 4.3; find_element works in Selenium 4
selenium_element = driver.find_element(by='xpath', value=xpath)

...

def xpath_soup(element):
    """
    Generate xpath of soup element
    :param element: bs4 text or node
    :return: xpath as string
    """
    components = []
    # Text nodes have no tag name, so start from their enclosing tag
    child = element if element.name else element.parent
    for parent in child.parents:
        # parent is a bs4.element.Tag (or the BeautifulSoup root object)
        # Same-named direct children of parent, in document order
        siblings = parent.find_all(child.name, recursive=False)
        if len(siblings) == 1:
            components.append(child.name)
        else:
            # Compare by identity: Tag equality is content-based, so
            # contents.index(child) can match an earlier identical sibling
            index = next(i for i, s in enumerate(siblings, 1) if s is child)
            components.append('%s[%d]' % (child.name, index))
        child = parent
    components.reverse()
    return '/%s' % '/'.join(components)

Rob Hawkins answered Oct 19 '22 01:10


from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
driver.get("http://www.google.com")
soup = BeautifulSoup(driver.page_source, 'html.parser')
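# Locate the search box in the parsed HTML. The class names below are
# specific to Google's markup and may no longer match the live page.
search_soup_element = soup.find(title="Search")
input_element = soup.select('input.gsfi.lst-d-f')[0]

# Reuse the soup element's name attribute to find the same element in Selenium
search_box = driver.find_element(by='name', value=input_element.attrs['name'])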
search_box.send_keys('Hello World!')
search_box.send_keys(Keys.RETURN)

This mostly works. I can see reasons for combining webdriver and BeautifulSoup, but not necessarily in this example.
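
The same idea (reusing attributes of the soup element to locate it again in Selenium) can be generalized a little. The following is only a minimal sketch, not part of either answer: xpath_literal and xpath_from_attrs are hypothetical helper names, and the default choice of id, name and title as identifying attributes is an assumption. It builds an attribute-based XPath from a tag name and an attribute dict, such as the .name and .attrs of a bs4 tag, and needs no browser to run.

```python
def xpath_literal(value):
    """Quote a string as an XPath 1.0 string literal (hypothetical helper)."""
    if '"' not in value:
        return '"%s"' % value
    if "'" not in value:
        return "'%s'" % value
    # Both quote types present: split on double quotes and rejoin with concat()
    parts = value.split('"')
    return "concat(%s)" % ", '\"', ".join('"%s"' % p for p in parts)


def xpath_from_attrs(tag_name, attrs, keys=("id", "name", "title")):
    """Build an XPath like //input[@name="q"] from a tag name and attribute dict.

    `keys` is an assumed default list of identifying attributes. Attributes
    whose value is not a plain string (bs4 stores class as a list) are skipped.
    """
    predicates = [
        "@%s=%s" % (key, xpath_literal(attrs[key]))
        for key in keys
        if isinstance(attrs.get(key), str)
    ]
    return "//%s%s" % (tag_name, "".join("[%s]" % p for p in predicates))


if __name__ == "__main__":
    # Stand-in for input_element.name and input_element.attrs from bs4
    attrs = {"name": "q", "title": "Search", "class": ["gsfi"]}
    print(xpath_from_attrs("input", attrs))
    # prints //input[@name="q"][@title="Search"]
```

With a live driver, the result could then be passed to driver.find_element(by='xpath', value=...). Unlike a positional path, an attribute-based XPath may match more than one element, in which case find_element returns the first match.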

Brian A answered Oct 19 '22 01:10