Jsoup selector on RSS <link> tag returns empty string with .text() method

Tags: java, rss, jsoup

I'm using jsoup to parse an RSS feed in Java. I can't get a result when I select the first <link> element in the document.

title.text() gives the expected result with this code:

Document doc = Jsoup.connect(BLOG_URL).get();
Element title = doc.select("rss channel title").first();
System.out.println(title.text()); // print the blog title...

However, link.text() doesn't work the same way:

Element link = doc.select("rss channel link").first();
System.out.println(link.text()); // prints empty string

When I inspect doc.select("rss channel link"), the link Element is populated, but the println() call prints only an empty string.

What makes .select("rss channel link") so dang special that I can't figure out how to use it?

Edit: The RSS response begins like this:

<?xml version="1.0" encoding="UTF-8"?>
<rss>
<channel>
<title>The Blog Title</title>
<link>http://www.the.blog/category</link>

Pat Grady asked Feb 10 '23 15:02

2 Answers

Your RSS feed is XML, not HTML. For this to work, you must tell jsoup to use its XML parser, Parser.xmlParser(). This will work:

String rss = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
  +"<rss><channel>"
  +  "<title>The Blog Title</title>"
  +  "<link>http://www.the.blog/category</link>"
  +"</channel></rss>";

Document doc = Jsoup.parse(rss, "", Parser.xmlParser());

Element link = doc.select("rss channel link").first();
System.out.println(link.text()); // prints http://www.the.blog/category

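Explanation:

In HTML, <link> is a void element: it carries its data in attributes and has no content or closing tag. jsoup's default HTML parser therefore treats the <link> in your RSS as an empty element, and the URL ends up as a text node after it instead of inside it, so text() returns an empty string. The XML parser makes no such assumption and keeps the URL as the element's text.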
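
To see the difference side by side, here is a minimal sketch that parses the feed from the question with both parsers. It assumes jsoup is on the classpath; the class name RssLinkDemo and the helper firstLinkText are made up for illustration and are not part of jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public class RssLinkDemo {
    // Sample feed copied from the question (hypothetical blog URL).
    static final String RSS =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<rss><channel>"
        + "<title>The Blog Title</title>"
        + "<link>http://www.the.blog/category</link>"
        + "</channel></rss>";

    // Illustrative helper: text of the first <link> under rss > channel,
    // or null if the selector matched nothing.
    static String firstLinkText(Document doc) {
        Element link = doc.select("rss channel link").first();
        return link == null ? null : link.text();
    }

    public static void main(String[] args) {
        // Default HTML parser: <link> is treated as a void (empty) element.
        Document asHtml = Jsoup.parse(RSS);
        // XML parser: <link> keeps the URL as its text content.
        Document asXml = Jsoup.parse(RSS, "", Parser.xmlParser());

        System.out.println("HTML parser: [" + firstLinkText(asHtml) + "]");
        System.out.println("XML parser:  [" + firstLinkText(asXml) + "]");
    }
}
```

With the HTML parser the brackets should come out empty, and with the XML parser they should contain the URL. If you fetch the feed over HTTP as in the question, the parser can be set on the connection instead: Jsoup.connect(BLOG_URL).parser(Parser.xmlParser()).get().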

luksch answered Feb 12 '23 06:02


jsoup added an XML parser for this, available as Parser.xmlParser():

try {
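    // Well-formed sample feed: <title> and <link> nested inside <rss><channel>
    String xml = "<rss><channel><title>The Blog Title</title><link>http://www.the.blog/category</link></channel></rss>";
    // Parse as XML so that <link> keeps its text content
    Document doc = Jsoup.parse(xml, "", Parser.xmlParser());

    Element title = doc.select("title").first();
    System.out.println(title.text()); // prints The Blog Title

    Element link = doc.select("link").first();
    System.out.println(link.text()); // prints http://www.the.blog/category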
} catch (Exception e) {
    e.printStackTrace();
}

Wilts C answered Feb 12 '23 06:02