
Parsing HTML meta tags with the jsoup library

I have just started exploring the jsoup library, as I will use it for one of my projects. I tried googling but could not find an answer that helps. Here is my problem: I have an HTML file with meta tags like the ones below.

<meta content="this is the title value" name="d.title">
<meta content="this is the description value" name="d.description">
<meta content="language is french" name="d.language">

And a Java POJO like this:

public class Example {
    private String title;
    private String description;
    private String language;

    public Example() {}

    // setters and getters go here
} 

Now I want to parse the HTML file, extract the "content" value of d.title and store it in Example.title, extract the "content" value of d.description and store it in Example.description, and so on.

What I have done so far, after reading the jsoup cookbook, is something like this:

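// Jsoup.parse(String) treats its argument as HTML markup, not as a path,
// so a File is passed here to read test.html from disk (UTF-8 is an assumption).
Document doc = Jsoup.parse(new File("test.html"), "UTF-8");
Elements metaTags = doc.getElementsByTag("meta");

for (Element metaTag : metaTags) {
    String content = metaTag.attr("content");
    String name = metaTag.attr("name");
}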

That walks through all the meta tags and gets the values of their "content" and "name" attributes. What I want is the value of the "content" attribute of the tag whose "name" attribute is, say, "d.title", so that I can store it in Example.title.

Update: @P.J.Meisch's answer below does solve the problem, but it is more code than I would like (I was trying to avoid doing exactly that). I was hoping it would be possible to do something like

String title = metaTags.getContent("d.title")

where d.title is the value of the "name" attribute. That would reduce the lines of code. I have not found such a method, but that may be because I am still new to jsoup, which is why I asked. If no such method exists (it would be nice if it did, since it makes life easier), I will just go with what P.J.Meisch said.

ivange asked Jun 02 '16 12:06



1 Answer

OK, here is all the code:

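import java.io.File;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Parse the file itself; Jsoup.parse("test.html") would treat the string as markup.
// The UTF-8 charset is an assumption, and this overload throws IOException.
Document doc = Jsoup.parse(new File("test.html"), "UTF-8");
Elements metaTags = doc.getElementsByTag("meta");

Example ex = new Example();

for (Element metaTag : metaTags) {
  String content = metaTag.attr("content");
  String name = metaTag.attr("name");

  // Copy the content into the field that matches the tag's name
  if("d.title".equals(name)) {
    ex.setTitle(content);
  }
  if("d.description".equals(name)) {
    ex.setDescription(content);
  }
  if("d.language".equals(name)) {
    ex.setLanguage(content);
  }
}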
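
If the loop above is more code than you want, jsoup's CSS-style attribute selector can do the name lookup directly, which comes close to the one-line call asked for in the update. The following is a minimal sketch, not a definitive implementation: metaContent and the class name MetaLookup are hypothetical names made up for illustration (jsoup has no getContent method), and the HTML is inlined as a string instead of being read from test.html so the example runs on its own.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class MetaLookup {

    // Hypothetical helper, not part of jsoup: returns the "content" attribute of the
    // first <meta> tag whose "name" attribute equals the given value, or an empty
    // string when no such tag exists. Assumes the name needs no escaping inside
    // the selector (no brackets or quotes).
    public static String metaContent(Document doc, String name) {
        return doc.select("meta[name=" + name + "]").attr("content");
    }

    public static void main(String[] args) {
        // Same meta tags as in the question, inlined for a self-contained example
        String html =
              "<meta content=\"this is the title value\" name=\"d.title\">"
            + "<meta content=\"this is the description value\" name=\"d.description\">"
            + "<meta content=\"language is french\" name=\"d.language\">";

        // Jsoup.parse(String) parses the string as HTML markup
        Document doc = Jsoup.parse(html);

        String title = metaContent(doc, "d.title");
        String description = metaContent(doc, "d.description");
        String language = metaContent(doc, "d.language");

        System.out.println(title);
        System.out.println(description);
        System.out.println(language);
    }
}
```

Each returned value can then be passed to the matching setter, for example ex.setTitle(metaContent(doc, "d.title")). The trade-off is one selector query per field instead of a single pass over all meta tags, which should not matter for a handful of tags.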
P.J.Meisch answered Sep 22 '22 08:09