I am trying to extract the title and the content of the meta description tag from a URL. This is what I have:
fin[] // URLs in a string array
for (int f = 0; f < fin.length; f++)
{
    Document finaldoc = Jsoup.connect(fin[f]).get(); // fin[f] contains the URL at each iteration
    Elements finallink1 = finaldoc.select("title");
    out.println(finallink1);
    Elements finallink2 = finaldoc.select("meta");
    out.println(finallink2.attr("name"));
    out.println(fin[f]); // print the URL last
}
But it does not print the title. For the description it simply prints the word "description", and then it prints the URL.
Result:
description
plus.google.com
generator
en.wikipedia.org/wiki/google
description
earth.google.com
You can use this:
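import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Returns the content of the first <meta> tag whose name or property equals attr, or null if none has content.
String getMetaTag(Document document, String attr) {
    Elements elements = document.select("meta[name=" + attr + "]");
    for (Element element : elements) {
        final String s = element.attr("content");
        // attr() returns an empty string, never null, when the attribute is missing
        if (!s.isEmpty()) return s;
    }
    // Open Graph tags use property="..." instead of name="..."
    elements = document.select("meta[property=" + attr + "]");
    for (Element element : elements) {
        final String s = element.attr("content");
        if (!s.isEmpty()) return s;
    }
    return null;
}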
Then:
String title = document.title();
String description = getMetaTag(document, "description");
if (description == null) {
    description = getMetaTag(document, "og:description");
}
// and any others you need
String ogImage = getMetaTag(document, "og:image");
....
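As for why the loop in the question prints "description": calling attr("name") on the result of select("meta") returns the value of the name attribute from the first matched element that has one, not the tag's content. Here is a minimal sketch of the difference, assuming jsoup is on the classpath. The class name MetaDemo, its helper methods, and the inline HTML are made up for illustration; a real run would fetch the page with Jsoup.connect(url).get() instead.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class MetaDemo {
    // Hypothetical sample page standing in for a fetched document.
    static final String HTML =
        "<html><head><title>Example page</title>"
        + "<meta name=\"description\" content=\"A short summary\">"
        + "<meta name=\"generator\" content=\"SomeCMS\">"
        + "</head><body></body></html>";

    // What the question's loop printed: the name attribute of the
    // first <meta> element that has one.
    static String firstMetaName(Document doc) {
        return doc.select("meta").attr("name");
    }

    // Content of <meta name="description">, or null when absent or empty.
    static String description(Document doc) {
        Element meta = doc.select("meta[name=description]").first();
        if (meta == null) return null;
        String content = meta.attr("content");
        return content.isEmpty() ? null : content;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        System.out.println(doc.title());        // Example page
        System.out.println(firstMetaName(doc)); // description
        System.out.println(description(doc));   // A short summary
    }
}
```

Selecting by name first and then reading content is the same idea as getMetaTag above, just narrowed to a single tag.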