Is there a way in jsoup to extract an image's absolute URL, much like one can get a link's absolute URL?
Consider the following image element found at http://www.example.com/:
<img src="images/chicken.jpg" width="60px" height="80px">
I would like to get http://www.example.com/images/chicken.jpg. What should I do?
With XPath expressions, you can select elements within the HTML, using jsoup as the HTML parser.
jsoup can parse HTML files, input streams, URLs, or even strings. It eases data extraction from HTML by offering Document Object Model (DOM) traversal methods and CSS and jQuery-like selectors. jsoup can manipulate the content: the HTML element itself, its attributes, or its text.
Once you have the image element, e.g.:
Element image = document.select("img").first();
String url = image.absUrl("src");
// url = http://www.example.com/images/chicken.jpg
Alternatively:
String url = image.attr("abs:src");
jsoup has a built-in absUrl() method on all nodes that resolves an attribute to an absolute URL, using the node's base URL (which may differ from the URL the document was retrieved from). If the attribute is missing or cannot be resolved, absUrl() returns an empty string.
See also the Working with URLs jsoup documentation.
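To make the resolution step concrete, here is a minimal sketch of what "resolving against a base URL" means, using only java.net.URI from the standard library. It is not jsoup's implementation, and the class name ResolveImageUrl and the helper resolve are made up for this illustration; in real code you would simply call absUrl("src").

```java
import java.net.URI;

// Hypothetical helper class for illustration only; not part of jsoup.
public class ResolveImageUrl {

    // Resolves a (possibly relative) src value against the page's base URL.
    // An already absolute src is returned unchanged.
    static String resolve(String baseUrl, String src) {
        return URI.create(baseUrl).resolve(src).toString();
    }

    public static void main(String[] args) {
        // Relative path from the question
        System.out.println(resolve("http://www.example.com/", "images/chicken.jpg"));
        // Relative path that climbs out of a subdirectory
        System.out.println(resolve("http://www.example.com/blog/post.html", "../images/chicken.jpg"));
        // Already absolute URL (cdn.example.org is an assumed host)
        System.out.println(resolve("http://www.example.com/", "http://cdn.example.org/a.png"));
    }
}
```

Unlike absUrl(), which returns an empty string when it cannot build a URL, URI.create throws an IllegalArgumentException on malformed input, so this sketch is only meant to show the resolution rule, not to replace the jsoup call.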
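// Jsoup.connect needs a full URL including the protocol (http:// or https://)
Document doc = Jsoup.connect("http://www.abc.com").get();
Elements img = doc.getElementsByTag("img");
for (Element el : img) {
    // absUrl resolves the src attribute against the document's base URL
    String src = el.absUrl("src");
    System.out.println("Image Found!");
    System.out.println("src attribute is : " + src);
    getImages(src); // getImages is the poster's own helper, not shown here
}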