
Jsoup parsing - parsing multiple links simultaneously

Tags:

java

jsoup

I have a program that fetches the HTML document for each site in a list, one by one, and then parses it.

ArrayList<String> links = new ArrayList<>();

for(String link : links) {
    try {
        Document doc = Jsoup.connect(link).get();
        getInfo(doc);
    } catch (IOException e) {
        e.printStackTrace();
    }
}

The problem is that fetching the HTML documents sequentially (site1, then site2, then site3, ...) takes too long.

Is it possible to make this code connect to 5 links at the same time and parse them, instead of going one by one?

user5564882 asked Sep 18 '25 21:09


1 Answer

Yes.

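Probably the simplest option in Java 8 is a parallel stream. Note that a parallel stream runs on the common ForkJoinPool by default, so the number of simultaneous connections depends on the available CPU cores rather than being fixed at 5: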

ArrayList<String> links = new ArrayList<>();

links.parallelStream().forEach(link -> {
  try {
    Document doc = Jsoup.connect(link).get();
    getInfo(doc);
  } catch (IOException e) {
    e.printStackTrace();
  }
});

Of course, there are many alternatives, including plain threads and executor pools; searching for Java concurrency and thread pools will turn up plenty of material.
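If exactly 5 simultaneous connections are wanted, an executor pool is one way to get that. Below is a minimal sketch, not a definitive implementation: the class ParallelFetch and its method fetchAll are hypothetical names made up for illustration, and the fetcher parameter stands in for whatever does the download (in the question, that would be Jsoup.connect(link).get()). A dummy fetcher is used in main so the sketch runs without jsoup or network access.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper (not part of jsoup): applies `fetcher` to every link
// using a fixed pool of `threads` worker threads.
final class ParallelFetch {

    static <R> List<R> fetchAll(List<String> links, int threads, Function<String, R> fetcher)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per link; at most `threads` of them run at once.
            List<Future<R>> futures = new ArrayList<>();
            for (String link : links) {
                Callable<R> task = () -> fetcher.apply(link);
                futures.add(pool.submit(task));
            }
            // Collect results in the same order as the input list.
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                try {
                    results.add(future.get());
                } catch (ExecutionException e) {
                    // A failed link is logged and recorded as null,
                    // so it does not abort the remaining links.
                    e.getCause().printStackTrace();
                    results.add(null);
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> links = Arrays.asList("site1", "site2", "site3");
        // Stand-in fetcher: returns the length of the link string
        // instead of downloading anything.
        List<Integer> lengths = fetchAll(links, 5, String::length);
        System.out.println(lengths);
    }
}
```

With jsoup, the fetcher would be a lambda that calls Jsoup.connect(link).get() and handles the IOException itself (for example by wrapping it in an UncheckedIOException), since Function.apply cannot throw checked exceptions. The pool size of 5 matches the number asked for in the question; it can be tuned to whatever the target sites tolerate.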

Gerald Mücke answered Sep 20 '25 11:09