I have just started exploring the jsoup library, as I will use it for one of my projects. I tried googling but could not find an answer that helps. Here is my problem: I have an HTML file with meta tags like the ones below.
<meta content="this is the title value" name="d.title">
<meta content="this is the description value" name="d.description">
<meta content="language is french" name="d.language">
And a Java POJO like so:
public class Example {
    private String title;
    private String description;
    private String language;

    public Example() {}

    // setters and getters go here
}
Now I want to parse the HTML file, extract the "content" value of the d.title tag and store it in Example.title, extract the "content" value of the d.description tag and store it in Example.description, and so on.
What I have done after reading the jsoup cookbook is something like this:
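// Parse the file itself; Jsoup.parse("test.html") would treat the string as markup.
// The UTF-8 charset is an assumption.
Document doc = Jsoup.parse(new File("test.html"), "UTF-8");
Elements metaTags = doc.getElementsByTag("meta");
for (Element metaTag : metaTags) {
    String content = metaTag.attr("content");
    String name = metaTag.attr("name");
}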
That walks through all the meta tags and gets the values of their "content" and "name" attributes. What I want instead is the value of the "content" attribute of the tag whose "name" attribute is, say, "d.title", so that I can store it in Example.title.
Update: @P.J.Meisch's answer below actually solves the problem, but it is too much code for my liking (I was trying to avoid doing exactly that). I was thinking it might be possible to do something like
String title = metaTags.getContent("d.title")
where d.title is the value of the "name" attribute. That would reduce the lines of code. I have not found such a method, but maybe that is because I am still new to jsoup, which is why I asked. If such a method does not exist (it would be nice if it did, because it makes life easier), I will just go with what P.J.Meisch said.
jsoup can parse HTML files, input streams, URLs, or even strings. It eases data extraction from HTML by offering Document Object Model (DOM) traversal methods and CSS and jQuery-like selectors. jsoup can manipulate the content: the HTML element itself, its attributes, or its text.
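Of the input sources listed above, files and strings are easy to mix up: Jsoup.parse(String) treats its argument as markup, not as a path. Here is a minimal sketch of the difference, assuming a UTF-8 file; the class name ParseSources and its two helper methods are hypothetical, made up for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ParseSources {
    // Hypothetical helper: parses an HTML file from disk.
    // The UTF-8 charset is an assumption about the file.
    static Document fromFile(File file) throws IOException {
        return Jsoup.parse(file, "UTF-8");
    }

    // Hypothetical helper: parses markup that is already held in a string.
    static Document fromString(String html) {
        return Jsoup.parse(html);
    }

    public static void main(String[] args) throws IOException {
        // Write a small sample file so the example is self-contained.
        File tmp = File.createTempFile("test", ".html");
        tmp.deleteOnExit();
        String html = "<meta content=\"language is french\" name=\"d.language\">";
        Files.write(tmp.toPath(), html.getBytes(StandardCharsets.UTF_8));

        // The file contains one meta tag.
        System.out.println(fromFile(tmp).getElementsByTag("meta").size());
        // A file name passed as a string is parsed as plain text, so no meta tags are found.
        System.out.println(fromString("test.html").getElementsByTag("meta").size());
    }
}
```

If the file's encoding is not known, UTF-8 may be the wrong guess, so treat the charset here as a placeholder.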
If you want to find out whether a given page is using meta tags, just right-click anywhere on the page and select “View Page Source.” A new tab will open in Chrome (in Firefox, it'll be a pop-up window). The part at the top, or “head” of the page, is where the meta tags would be.
OK, here is all the code:
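import java.io.File;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Parse the file itself; Jsoup.parse("test.html") would treat the string as markup.
// The UTF-8 charset is an assumption; this overload can throw IOException.
Document doc = Jsoup.parse(new File("test.html"), "UTF-8");
Elements metaTags = doc.getElementsByTag("meta");
Example ex = new Example();
for (Element metaTag : metaTags) {
    String content = metaTag.attr("content");
    String name = metaTag.attr("name");
    // Copy the content into the field that matches the tag's name.
    if ("d.title".equals(name)) {
        ex.setTitle(content);
    }
    if ("d.description".equals(name)) {
        ex.setDescription(content);
    }
    if ("d.language".equals(name)) {
        ex.setLanguage(content);
    }
}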
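For the shorter lookup wished for in the update, one option is a CSS attribute selector combined with Elements.attr, which returns the attribute from the first match or an empty string when nothing matches. This is only a sketch: the class MetaLookup and the helper metaContent are hypothetical names, not part of jsoup, and the name is assumed to contain no characters that are special in a selector (such as "]").

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class MetaLookup {
    // Hypothetical helper: returns the "content" of the first <meta> tag whose
    // "name" attribute equals the given value, or "" if there is no such tag.
    // Assumes the name has no selector-special characters such as ']'.
    static String metaContent(Document doc, String name) {
        return doc.select("meta[name=" + name + "]").attr("content");
    }

    public static void main(String[] args) {
        String html = "<meta content=\"this is the title value\" name=\"d.title\">"
                + "<meta content=\"language is french\" name=\"d.language\">";
        Document doc = Jsoup.parse(html);

        System.out.println(metaContent(doc, "d.title"));    // this is the title value
        System.out.println(metaContent(doc, "d.language")); // language is french
    }
}
```

With such a helper, filling the POJO becomes one line per field, for example ex.setTitle(MetaLookup.metaContent(doc, "d.title")). The trade-off is that each call searches the document again, while the loop above visits every meta tag only once.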