Get visible text of page

How do I get the visible text of a web page with Selenium WebDriver, without the HTML tags?

I need something equivalent to HtmlPage.asText() from HtmlUnit.

Fetching the source with WebDriver.getPageSource() and parsing it with jsoup is not enough, because the page may contain elements hidden by external CSS, and I am not interested in those.

David Michael Gang asked Aug 20 '13 13:08


2 Answers

Locate the body with driver.findElement(By.tagName("body")) (or use some other selector for the top-level element), then call getText() on that element. This returns all of the visible text, since getText() leaves out elements hidden by CSS.

Nathan Merrill answered Sep 23 '22 10:09


I can help with the C# bindings for Selenium.

The code below reads all the visible text on the current page and saves it to a text file at a location of your choice.

Make sure you have these using directives:

using System;
using System.IO;
using System.Text;
using OpenQA.Selenium;
using OpenQA.Selenium.Support.UI;

After navigating to the page, try this code:

// Grab the visible text of the whole page
IWebElement body = driver.FindElement(By.TagName("body"));
var result = body.Text;

// Folder location: one subfolder per day (a date without slashes is safe in a path)
var dir = Path.Combine(@"C:\Textfile", DateTime.Now.ToString("yyyy-MM-dd"));

// If the folder doesn't exist, create it
if (!Directory.Exists(dir))
    Directory.CreateDirectory(dir);

// Appends the page text to Copiedtext.txt, creating the file if it is missing
File.AppendAllText(Path.Combine(dir, "Copiedtext.txt"), result);
Anuraj S.L answered Sep 19 '22 10:09
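
The question seems to target the Java bindings (it mentions HtmlUnit, jsoup and getText()), so here is a rough Java counterpart of the file-saving step in the C# answer. Treat it as a sketch under assumptions: the class name VisibleTextSaver and its save helper are made up for illustration, the base folder under the temp directory is a placeholder, and the page text is assumed to come from driver.findElement(By.tagName("body")).getText(), which is left as a comment so the snippet runs without Selenium on the classpath.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.time.LocalDate;

// Hypothetical helper, not part of Selenium.
final class VisibleTextSaver {

    // Appends pageText to <baseDir>/<yyyy-MM-dd>/Copiedtext.txt and returns the file path.
    static Path save(String pageText, Path baseDir) {
        // LocalDate.toString() gives an ISO date (yyyy-MM-dd), which is safe in a folder name.
        Path dir = baseDir.resolve(LocalDate.now().toString());
        Path file = dir.resolve("Copiedtext.txt");
        try {
            // Creates the folder (and any missing parents); does nothing if it already exists.
            Files.createDirectories(dir);
            // Creates the file on first use, then appends on later calls.
            Files.write(file, pageText.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return file;
    }

    public static void main(String[] args) {
        // With Selenium available, the text would come from the page instead:
        //   String text = driver.findElement(By.tagName("body")).getText();
        String text = "Example visible text";

        // Placeholder base folder; replace with your preferred location.
        Path base = Paths.get(System.getProperty("java.io.tmpdir"), "Textfile");
        System.out.println("Saved to " + save(text, base));
    }
}
```

Because the file is opened in append mode, running this several times on the same day keeps adding to the same Copiedtext.txt, which matches what File.AppendAllText does in the C# version.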