
Extract image src using JSoup

Tags:

jsoup

I am trying to extract all the image URLs from this webpage using jsoup. Can anyone help me with how to do it? All the img tags are formatted like this, but I only need the src value, not the ajaxsrc:

<IMG ajaxsrc="/pics32/160/MP/MPYXBXTSYVKAKJQ.20110918032436.jpg" src="http://image.cdnllnwnl.xosnetwork.com/pics32/160/MP/MPYXBXTSYVKAKJQ.20110918032436.jpg">

Here is the link: http://www.ncataggies.com/PhotoAlbum.dbml?DB_OEM_ID=24500&PALBID=417884

Is this the right approach?

    Document doc = null;
    try {
        doc = Jsoup.connect(articleLink).timeout(10000).get();
    } catch (IOException ioe) {
        return null;
    }
    Element content = doc.getElementById("div.thumb-image preview");
    Elements links = content.getElementsByAttribute("IMG");
    for (Element link : links) {
        String source = link.attr("src");
        Elements imageLinks = link.getElementsByAttribute(source);
        for (Element imageLink : imageLinks) {
            // imageLink = picture link?
        }
    }

That doesn't seem to be it. I have print statements in my code, and they aren't getting hit.

Johnny Rocket asked May 04 '12 23:05


People also ask

How to get image src in jsoup?

    Element imageElement = document.select("img").first();
    String absoluteUrl = imageElement.absUrl("src"); // absolute URL on src
    String srcValue = imageElement.

What is Jsoup parse?

Description. The parse(String html) method parses the input HTML into a new Document. This document object can be used to traverse and get details of the html dom.

Can Jsoup parse JavaScript?

Jsoup parses the source code as delivered from the server (or in this case loaded from file). It does not invoke client-side actions such as JavaScript or CSS DOM manipulation.


1 Answer

You should be able to do something like this to get all img tags:

for (Element e : doc.select("img")) {
    System.out.println(e.attr("src"));
}

This should select all img tags and then grab the src attribute and print to the console.
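As for why the code in the question never reaches its print statements: getElementById expects an element id rather than a CSS selector, and getElementsByAttribute expects an attribute name rather than a tag name, so neither call is likely to match anything on that page. A minimal sketch of the select-based approach follows, assuming jsoup is on the classpath. The class name ImageSrcExtractor, the method extractImageSrcs, and the inline HTML string are made up for illustration; the base URI is only a guess at what relative paths on that site would resolve against.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper for illustration; not part of jsoup or of the question's code.
final class ImageSrcExtractor {

    // Collects the src of every <img> that has a src attribute.
    // baseUri (an assumption here) is used to resolve relative src values.
    static List<String> extractImageSrcs(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> urls = new ArrayList<>();
        // "img[src]" is a CSS selector: img elements carrying a src attribute.
        // The HTML parser normalizes tag names, so <IMG> matches as well.
        for (Element img : doc.select("img[src]")) {
            // absUrl resolves relative values against baseUri;
            // the ajaxsrc attribute is simply never read.
            urls.add(img.absUrl("src"));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Inline sample mirroring the tag format from the question.
        String html = "<div><IMG ajaxsrc=\"/pics32/160/MP/MPYXBXTSYVKAKJQ.20110918032436.jpg\" "
                + "src=\"http://image.cdnllnwnl.xosnetwork.com/pics32/160/MP/MPYXBXTSYVKAKJQ.20110918032436.jpg\"></div>";
        for (String url : extractImageSrcs(html, "http://www.ncataggies.com/")) {
            System.out.println(url);
        }
    }
}
```

With a live page, the Document returned by Jsoup.connect(articleLink).get() can be passed to the same loop directly. Note that jsoup does not run JavaScript, so if the page fills in images from ajaxsrc on the client side, only the src values present in the served HTML will show up.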

B. Anderson answered Nov 09 '22 02:11