
Jsoup: Extracting innertext from anchor tag

Here's my problem. I have HTML content like <div> <a href="#"> innerText </a> </div> and I need to extract the "innerText". When I tried this with Jsoup, I found that the inner text ends up outside the anchor tag in the parsed document.

Here's my code

Document doc = Jsoup.parse("<div>  <a href=\"#\"> innerText  </a> </div>");
System.out.println(doc.html());

output:

<html>
 <head></head>
 <body>
  <div >
   <a href="#"></a>innerText
  </div>
 </body>
</html>

Why is "innerText" moved outside the anchor tag?

Santhosh Thamaraiselvan asked Feb 23 '15 08:02



1 Answer

You can access the text by calling the text() method on the element.

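import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document doc = Jsoup.parse("<div>  <a href=\"#\"> innerText  </a> </div>");
System.out.println(doc.html());
// Collect every <a> element and print the text it contains
Elements links = doc.getElementsByTag("a");
for (Element element : links) {
    System.out.println("element = " + element.text());
}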

By the way, running your posted code with Jsoup 1.8.1 produces the following output, where the text stays inside the anchor tag:

<html>
    <head></head>
    <body>
        <div> 
            <a href="#"> innerText </a> 
        </div>
    </body>
</html>
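
If only the first link's text is needed, the same idea can be wrapped in a small helper. This is a minimal sketch rather than anything from the original post: the class name AnchorText and the method firstAnchorText are made up for illustration, and it assumes the Jsoup library is on the classpath. It uses select("a").first(), which returns null when no anchor is present, and text(), which returns the element's text with whitespace normalized and trimmed.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AnchorText {
    // Hypothetical helper (not part of Jsoup): returns the text of the
    // first <a> element in the given HTML, or null if there is no anchor.
    public static String firstAnchorText(String html) {
        Document doc = Jsoup.parse(html);
        // first() returns null when the selector matches nothing
        Element link = doc.select("a").first();
        return link == null ? null : link.text();
    }

    public static void main(String[] args) {
        String html = "<div>  <a href=\"#\"> innerText  </a> </div>";
        // text() normalizes and trims whitespace
        System.out.println(firstAnchorText(html)); // prints "innerText"
    }
}
```

Returning null for a missing anchor keeps the sketch short; a caller that prefers not to handle null could return an empty string instead.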
SubOptimal answered Oct 11 '22 07:10
