<div id="a">This is some <div id="b">text</div> </div>
Getting "This is some" is non-trivial. For instance, this returns "This is some text":
from selenium.webdriver.common.by import By
driver.find_element(By.ID, 'a').text
How does one, in a general way, get the text of a specific element without including the text of its children?
(I'm providing an answer below but will leave the question open in case someone can come up with a less hideous solution).
The Selenium WebDriver interface predefines the getText() method (exposed in the Python client as the .text property), which retrieves the text of a specific web element. It returns the element's visible inner text (text not hidden by CSS), with leading and trailing whitespace removed.
That text includes the text of all the element's descendants, which is why .text on <div id="a"> also returns "text" from the nested <div id="b">.
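text() and contains(): text() is an XPath node test and contains() an XPath function; neither is a Selenium method, but both are commonly used with Selenium's XPath locator. text() selects an element's own text nodes, so an expression such as //div[text()='...'] locates an element by an exact text value; contains() tests for a substring, so //div[contains(text(), '...')] locates an element by a partial text match. They help locate elements by text, but they do not return text: Selenium's find_element calls must resolve to elements, so an expression like //div[@id='a']/text() cannot be used to fetch the text node directly.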
Here's a general solution:
def get_text_excluding_children(driver, element):
    return driver.execute_script("""
        return jQuery(arguments[0]).contents().filter(function() {
            return this.nodeType == Node.TEXT_NODE;
        }).text();
    """, element)
The element passed to the function can be anything obtained from the find_element...() methods (i.e. it can be a WebElement object).
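To see what the jQuery filter actually selects, here is a small offline illustration in Python using the standard-library xml.dom.minidom module, whose nodes expose the same childNodes / nodeType / TEXT_NODE vocabulary as the browser DOM. It is only a sketch: it does not use Selenium or a browser, it assumes the snippet is well-formed XML (true for the question's markup, not for HTML in general), and direct_text and all_text are hypothetical helper names.

```python
from xml.dom import minidom

# The question's markup. It happens to be well-formed XML, which minidom
# requires; arbitrary HTML usually is not.
SNIPPET = '<div id="a">This is some <div id="b">text</div> </div>'


def direct_text(element):
    """Join only the element's own text nodes, like the jQuery filter does."""
    return "".join(
        node.data
        for node in element.childNodes
        if node.nodeType == node.TEXT_NODE
    )


def all_text(element):
    """Join every descendant text node, roughly what DOM textContent gives."""
    parts = []
    for node in element.childNodes:
        if node.nodeType == node.TEXT_NODE:
            parts.append(node.data)
        elif node.nodeType == node.ELEMENT_NODE:
            parts.append(all_text(node))  # recurse into child elements
    return "".join(parts)


root = minidom.parseString(SNIPPET).documentElement  # <div id="a">
print(repr(all_text(root)))     # 'This is some text '
print(repr(direct_text(root)))  # 'This is some  '
```

The direct-text result keeps every text node that is a direct child, including the lone space after the nested <div>, so you will usually want to .strip() it.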
Or, if you don't have jQuery or don't want to use it, you can replace the body of the function above with this:
return driver.execute_script("""
    var parent = arguments[0];
    var child = parent.firstChild;
    var ret = "";
    while (child) {
        if (child.nodeType === Node.TEXT_NODE)
            ret += child.textContent;
        child = child.nextSibling;
    }
    return ret;
""", element)
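A different technique, if you would rather not run JavaScript at all, is to parse the markup in Python. The sketch below assumes you feed it driver.page_source (the serialized page as the browser reports it) and that the markup is reasonably well balanced; DirectTextParser and direct_text_from_source are hypothetical names, and only the standard-library html.parser module is used. Unlike the JavaScript versions it sees markup rather than the rendered DOM, so it ignores CSS visibility, and unclosed optional tags (e.g. a <p> with no </p>) would throw its depth count off.

```python
from html.parser import HTMLParser

# Void elements never have an end tag, so they must not change the depth count.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class DirectTextParser(HTMLParser):
    """Hypothetical helper: collect text that is a direct child of one element.

    Assumes balanced markup; unclosed optional tags would skew the depth.
    """

    def __init__(self, target_id):
        super().__init__()  # convert_charrefs=True: entities arrive decoded
        self.target_id = target_id
        self.depth = 0      # 0 = outside the target, 1 = directly inside it
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        if self.depth:
            self.depth += 1  # entering a descendant of the target
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1   # entering the target itself

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags such as <br/> hold no text and open no scope

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_ELEMENTS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 1:  # text whose parent is the target element
            self.parts.append(data)


def direct_text_from_source(page_source, element_id):
    parser = DirectTextParser(element_id)
    parser.feed(page_source)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


page = '<html><body><div id="a">This is some <div id="b">text</div> </div></body></html>'
print(repr(direct_text_from_source(page, "a")))  # 'This is some  '
```

Against a live session the call would look like direct_text_from_source(driver.page_source, "a").strip(), which for the question's markup yields "This is some".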
I'm actually using this code in a test suite.
In the HTML which you have shared:
<div id="a">This is some <div id="b">text</div> </div>
The text "This is some" is within a text node. To depict the text node in a structured way:
<div id="a">
    This is some
    <div id="b">text</div>
</div>
To extract and print the text "This is some" from the text node using Selenium's Python client, there are two ways:
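Using splitlines(): you can identify the parent element, i.e. <div id="a">, extract its innerHTML and then use splitlines() as follows. This only works when "This is some" sits on the first line of the innerHTML and the child <div> starts on a later line: with the one-line markup above, splitlines()[0] returns the entire innerHTML, and if the text begins on a new line after <div id="a">, the first line is empty.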
using xpath (both snippets need from selenium.webdriver.common.by import By):
print(driver.find_element(By.XPATH, "//div[@id='a']").get_attribute("innerHTML").splitlines()[0])
using css_selector:
print(driver.find_element(By.CSS_SELECTOR, "div#a").get_attribute("innerHTML").splitlines()[0])
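Because splitlines() depends on how the markup happens to be laid out, here is a small sketch on plain strings. The two strings stand in for what get_attribute("innerHTML") might return for a one-line and an indented layout (they are assumptions, not captured output), and text_before_first_child is a hypothetical alternative that cuts at the first "<" instead. That is safe for text because innerHTML serializes a literal "<" inside text as &lt;, but it only captures the text before the first child element.

```python
import html


def text_before_first_child(inner_html):
    """Hypothetical helper: text preceding the first child tag of innerHTML.

    Assumes nothing (such as an HTML comment) precedes the text.
    """
    leading = inner_html.split("<", 1)[0]
    return html.unescape(leading).strip()  # innerHTML keeps entities like &amp;


# innerHTML of <div id="a"> for the one-line markup in the question
one_line = 'This is some <div id="b">text</div> '
# innerHTML of the same element when the markup is indented over several lines
multi_line = '\n    This is some\n    <div id="b">text</div>\n'

print(repr(one_line.splitlines()[0]))       # 'This is some <div id="b">text</div> '
print(repr(multi_line.splitlines()[0]))     # ''
print(text_before_first_child(one_line))    # This is some
print(text_before_first_child(multi_line))  # This is some
```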
Using execute_script(): you can also use the execute_script() method, which synchronously executes JavaScript in the current window/frame, as follows:
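using xpath and firstChild (both snippets need from selenium.webdriver.common.by import By):
parent_element = driver.find_element(By.XPATH, "//div[@id='a']")
print(driver.execute_script('return arguments[0].firstChild.textContent;', parent_element).strip())
using xpath and childNodes[n] (childNodes counts text nodes as well as elements, so here the text is childNodes[0], the same node as firstChild, while childNodes[1] is the child <div id="b">):
parent_element = driver.find_element(By.XPATH, "//div[@id='a']")
print(driver.execute_script('return arguments[0].childNodes[0].textContent;', parent_element).strip())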
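Since the childNodes index counts text nodes as well as elements, it helps to list the child nodes explicitly. This is another offline sketch with xml.dom.minidom standing in for the browser DOM (which numbers the child nodes of this markup the same way); child_summary is a hypothetical helper, and the indented layout is an assumed variant of the question's markup.

```python
from xml.dom import minidom

# The question's markup, and the same element laid out over several lines.
ONE_LINE = '<div id="a">This is some <div id="b">text</div> </div>'
INDENTED = '<div id="a">\n    This is some\n    <div id="b">text</div>\n</div>'


def child_summary(markup):
    """Return (kind, value) for each child node of the outermost element."""
    root = minidom.parseString(markup).documentElement  # <div id="a">
    summary = []
    for node in root.childNodes:
        if node.nodeType == node.TEXT_NODE:
            summary.append(("#text", node.data))
        elif node.nodeType == node.ELEMENT_NODE:
            summary.append((node.tagName, node.getAttribute("id")))
    return summary


print(child_summary(ONE_LINE))
# [('#text', 'This is some '), ('div', 'b'), ('#text', ' ')]
print(child_summary(INDENTED))
# [('#text', '\n    This is some\n    '), ('div', 'b'), ('#text', '\n')]
```

In both layouts the wanted text is at index 0 and the nested <div> at index 1; the indented layout's text node also carries surrounding whitespace, which is why stripping the result matters.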