JSOUP find all images in HTML file with ALT attribute?

Tags: java, html, jsoup

Hi, I am relatively new to Java, but I am hoping to write a class that uses JSOUP to find the ALT attribute of every image in an HTML file. I would like it to print an error message if an image has no alt text and, if alt text is present, to remind users to check it.

import java.io.File;
import org.jsoup.Jsoup;
import org.jsoup.parser.Parser;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.Elements;


public class grabImages {
    File input = new File("...HTML");
    Document doc = Jsoup.parse(input, "UTF-8", "file:///C:...HTML");

    Elements img = doc.getElementsByTag("img");
    Elements alttext = doc.getElementsByAttribute("alt");

    for (Element el : img) {
        if (el.attr("img").contains("alt")) {
            System.out.println("is the alt text relevant to the image? ");
        } else {
            System.out.println("no alt text found on image");
        }
    }
}
SOR asked Sep 05 '13 10:09


2 Answers

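I think your logic was a little off.

For example, here you are trying to load the 'img' attribute of the 'img' tag, which does not exist. jsoup returns an empty string for a missing attribute, so the contains("alt") check never succeeds:

el.attr("img")

To read the alt text, ask for the 'alt' attribute instead: el.attr("alt").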

Here's my implementation of the program. You should be able to alter it for your own needs.

import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Controller {

    public static void main(String[] args) throws IOException {

        // Connect to website. This can be replaced with your file loading implementation
        Document doc = Jsoup.connect("http://www.google.co.uk").get();

        // Get all img tags
        Elements img = doc.getElementsByTag("img");

        int counter = 0;

        // Loop through img tags
        for (Element el : img) {
            // attr() returns "" (never null) when the attribute is absent,
            // so this counts both missing and empty alt attributes
            if (el.attr("alt").isEmpty()) {
                counter++;
            }
            System.out.println("image tag: " + el.attr("src") + " Alt: " + el.attr("alt"));
        }
        System.out.println("Number of unset alt: " + counter);

    }

}
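
The comment in the code above says the connection can be swapped for file loading, which is what the question asked about. A minimal sketch of that variant follows, assuming the file is UTF-8 encoded and its path is passed as the first command-line argument. The names LocalFileAltCheck and countUnsetAlt are made up for illustration and are not part of the original answer.

```java
import java.io.File;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class LocalFileAltCheck {

    // Hypothetical helper: counts <img> elements whose alt attribute is absent or empty.
    static int countUnsetAlt(Document doc) {
        int counter = 0;
        for (Element el : doc.getElementsByTag("img")) {
            // attr() returns "" when the attribute is absent
            if (el.attr("alt").isEmpty()) {
                counter++;
            }
        }
        return counter;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: LocalFileAltCheck <path-to-html-file>");
            return;
        }
        // Assumption: the path comes from the command line and the file is UTF-8.
        File input = new File(args[0]);
        Document doc = Jsoup.parse(input, "UTF-8");
        System.out.println("Number of unset alt: " + countUnsetAlt(doc));
    }
}
```

Keeping the counting logic in its own method means it can be reused whether the Document came from a file, a URL, or a string.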
Ben Dale answered Nov 14 '22 21:11


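import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class grabImages {
    public static void main(String[] args) {
        Document doc;
        try {
            doc = Jsoup.connect("...HTML").get();
            Elements img = doc.getElementsByTag("img");

            for (Element el : img) {
                // hasAttr is true whenever the alt attribute exists, even if it is empty
                if (el.hasAttr("alt")) {
                    System.out.println("is the alt text relevant to the image? ");
                } else {
                    System.out.println("no alt text found on image");
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}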

el.hasAttr("alt") tells you whether the 'alt' attribute is present on the element. Note that it also returns true for an empty alt="".
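
Since hasAttr only reports whether the attribute exists, it can be combined with attr to tell apart a missing alt, an empty alt, and an alt with text. The following is a small sketch of that idea. The class and method names (AltStates, altState, altStates) and the sample HTML are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class AltStates {

    // Hypothetical helper: labels one <img> as MISSING, EMPTY, or PRESENT.
    static String altState(Element img) {
        if (!img.hasAttr("alt")) {
            return "MISSING";   // no alt attribute at all
        }
        if (img.attr("alt").trim().isEmpty()) {
            return "EMPTY";     // alt="" or whitespace only
        }
        return "PRESENT";       // alt has text that a person should review
    }

    // Parses an HTML string and returns one label per <img>, in document order.
    static List<String> altStates(String html) {
        Document doc = Jsoup.parse(html);
        List<String> states = new ArrayList<>();
        for (Element img : doc.getElementsByTag("img")) {
            states.add(altState(img));
        }
        return states;
    }

    public static void main(String[] args) {
        String html = "<img src='a.png'>"
                    + "<img src='b.png' alt=''>"
                    + "<img src='c.png' alt='A red bicycle'>";
        System.out.println(altStates(html)); // prints [MISSING, EMPTY, PRESENT]
    }
}
```

Whether an empty alt should count as an error depends on the page: it is commonly used on purely decorative images, so it may deserve a different message than a missing attribute.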

For more information, see http://jsoup.org/cookbook/extracting-data/example-list-links

NagarajSM answered Nov 14 '22 23:11