
Remove script elements from a page fetched by URL with jsoup

Tags:

java

jsoup

I want to remove the script elements when reading from a URL rather than from a file. Please help.

Document connect = Jsoup.connect("http://www.tutorialspoint.com/ant/ant_deploying_applications.htm");
Elements selects = connect.select("div.middle-col");
System.out.println(selects.removeAttr("script").html());
Long Than asked Feb 09 '23 20:02


2 Answers

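removeAttr("script") removes an attribute named script, not the script elements themselves. Also, Jsoup.connect(...) returns a Connection, so you need to call get() on it to fetch the Document. To remove the script elements, select them inside the container and call remove():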

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class TestJsoup {
    public static void main(String[] args) throws IOException {
        // connect() only builds the request; get() fetches and parses the page.
        Document doc = Jsoup.connect("http://www.tutorialspoint.com/ant/ant_deploying_applications.htm").get();

        Elements selects = doc.select("div.middle-col");
        for (Element container : selects) {
            // Find every <script> nested in this div and detach it from the DOM.
            Elements scripts = container.select("script");
            scripts.remove();
        }
        // The scripts are gone from the document, so the printed HTML no longer contains them.
        System.out.println(selects.html());
    }
}
Susheel Singh answered Feb 19 '23 21:02


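Additionally, you can use Jsoup.clean(html, safelist). It keeps only the tags and attributes allowed by the given Safelist (called Whitelist in older jsoup versions), so script elements are dropped.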
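
A minimal sketch of the clean approach, assuming a recent jsoup version in which the class is named Safelist (older versions call it Whitelist). The class name JsoupCleanSketch, the method stripScripts, and the sample HTML are made up for illustration and are not from the original answer.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

class JsoupCleanSketch {
    // Hypothetical helper: returns the body HTML with everything the
    // relaxed safelist does not allow (including <script>) removed.
    static String stripScripts(String bodyHtml) {
        return Jsoup.clean(bodyHtml, Safelist.relaxed());
    }

    public static void main(String[] args) {
        // Sample input standing in for the HTML of the selected div.
        String html = "<div class=\"middle-col\"><p>Hello</p><script>alert(1)</script></div>";
        System.out.println(stripScripts(html));
    }
}
```

Note that clean is more aggressive than removing only the script elements: it also drops any other tag or attribute the safelist does not allow (here, the class attribute on the div). If you need the rest of the markup untouched, selecting the script elements and calling remove() is the safer choice.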

Ma Tâm answered Feb 19 '23 21:02