I am trying to extract all the image URLs from this webpage using jsoup. Can anyone offer help on how to do it? All the tags are formatted like this, but I only need the src value, not the ajaxsrc:
<IMG ajaxsrc="/pics32/160/MP/MPYXBXTSYVKAKJQ.20110918032436.jpg" src="http://image.cdnllnwnl.xosnetwork.com/pics32/160/MP/MPYXBXTSYVKAKJQ.20110918032436.jpg">
Here is the link: http://www.ncataggies.com/PhotoAlbum.dbml?DB_OEM_ID=24500&PALBID=417884
Is this the right way to do it?
Document doc = null;
try {
    doc = Jsoup.connect(articleLink).timeout(10000).get();
} catch (IOException ioe) {
    return null;
}
Element content = doc.getElementById("div.thumb-image preview");
Elements links = content.getElementsByAttribute("IMG");
for (Element link : links) {
    String source = link.attr("src");
    Elements imageLinks = link.getElementsByAttribute(source);
    for (Element imageLink : imageLinks) {
        // imageLink = picture link?
    }
}
That doesn't seem to be it. I have print statements in my code, and they aren't getting hit.
Element imageElement = document.select("img").first();
String absoluteUrl = imageElement.absUrl("src"); // absolute URL on src
String srcValue = imageElement.
The parse(String html) method parses the input HTML into a new Document. This Document object can be used to traverse the HTML DOM and read details from it.
Jsoup parses the source code as delivered from the server (or in this case loaded from file). It does not invoke client-side actions such as JavaScript or CSS DOM manipulation.
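To illustrate that point, here is a minimal sketch that parses the tag from the question as a static string, with no network access and no JavaScript involved, and reads both attributes exactly as they appear in the markup. The class name StaticParseDemo and the helper firstImage are made up for this example, and the base URI is an assumption taken from the host in the question's link.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class StaticParseDemo {
    // The tag from the question, as it would appear in the HTML sent by the server.
    static final String HTML =
        "<IMG ajaxsrc=\"/pics32/160/MP/MPYXBXTSYVKAKJQ.20110918032436.jpg\" "
        + "src=\"http://image.cdnllnwnl.xosnetwork.com/pics32/160/MP/MPYXBXTSYVKAKJQ.20110918032436.jpg\">";

    // Hypothetical helper: parse the static markup and return the first <img>.
    static Element firstImage() {
        // The second argument is the base URI used to resolve relative URLs
        // (assumed here to be the site from the question).
        Document doc = Jsoup.parse(HTML, "http://www.ncataggies.com/");
        return doc.select("img").first();
    }

    public static void main(String[] args) {
        Element img = firstImage();
        System.out.println(img.attr("src"));       // value as written in the markup
        System.out.println(img.attr("ajaxsrc"));   // relative path, as written
        System.out.println(img.absUrl("ajaxsrc")); // relative path resolved against the base URI
    }
}
```

If a page only fills in src from a script after loading, the src attribute would simply be missing or empty in what jsoup sees. In the tag quoted in the question, src is already present in the markup.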
You should be able to do something like this to get all img tags:
for (Element e : doc.select("img")) {
    System.out.println(e.attr("src"));
}
This should select all img tags and then grab the src attribute and print to the console.
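A slightly fuller sketch of the same idea, in case it helps: it collects the URLs into a list instead of printing them, skips img tags that have no src attribute, and uses absUrl so relative paths come back as absolute URLs. The class name ImageSrcExtractor, the method imageUrls, and the second img tag in the sample markup are invented for illustration. For a live page, the Document would come from Jsoup.connect(...).get() as in the question.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ImageSrcExtractor {
    // Returns the absolute src URL of every <img> that carries a src attribute.
    static List<String> imageUrls(Document doc) {
        List<String> urls = new ArrayList<>();
        // "img[src]" is a CSS selector: img elements that have a src attribute.
        for (Element img : doc.select("img[src]")) {
            String url = img.absUrl("src");
            if (!url.isEmpty()) { // absUrl returns "" when the URL cannot be resolved
                urls.add(url);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        // Sample markup: the first tag is from the question, the second is hypothetical.
        String html =
            "<IMG ajaxsrc=\"/pics32/160/MP/MPYXBXTSYVKAKJQ.20110918032436.jpg\" "
            + "src=\"http://image.cdnllnwnl.xosnetwork.com/pics32/160/MP/MPYXBXTSYVKAKJQ.20110918032436.jpg\">"
            + "<img ajaxsrc=\"/only-ajax.jpg\">";
        Document doc = Jsoup.parse(html, "http://www.ncataggies.com/");
        for (String url : imageUrls(doc)) {
            System.out.println(url);
        }
    }
}
```

As for why the code in the question never reaches its print statements: getElementById expects an element id rather than a CSS selector, and getElementsByAttribute matches attribute names rather than tag names, so neither call finds the img elements.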