What I do is:
- crawl the page
- fetch all the links on the page and put them in a list
- start a new crawler, which visits each link in the list
- download them
There must be a quicker way, where I can download the links directly while I visit the page? Thanks!
crawler4j automatically does this process for you. You first add one or more seed pages; these are the pages that are fetched and processed first. crawler4j then extracts all the links in these pages and passes them to your shouldVisit function. If you really want to crawl all of them, this function should simply return true for every URL. If you only want to crawl pages within a specific domain, you can check the URL and return true or false based on that.
The URLs for which shouldVisit returns true are then fetched by the crawler threads, and the same process is repeated on them, so the download happens as part of the visit itself rather than in a separate pass.
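For example, a crawler that restricts itself to one domain and handles each downloaded page in visit could look roughly like this. This is a minimal sketch assuming the crawler4j 4.x API (where shouldVisit takes a referring Page argument; older versions take only the WebURL); the www.example.com domain and the filter pattern are placeholders:

```java
import java.util.regex.Pattern;

import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.parser.HtmlParseData;
import edu.uci.ics.crawler4j.url.WebURL;

public class MyCrawler extends WebCrawler {

    // Skip typical static resources (placeholder filter, adjust as needed)
    private static final Pattern FILTERS =
            Pattern.compile(".*(\\.(css|js|gif|jpe?g|png|ico|pdf|zip))$");

    @Override
    public boolean shouldVisit(Page referringPage, WebURL url) {
        String href = url.getURL().toLowerCase();
        // Only follow links within the (placeholder) target domain
        return !FILTERS.matcher(href).matches()
                && href.startsWith("https://www.example.com/");
    }

    @Override
    public void visit(Page page) {
        // Called for every fetched page; the content is already downloaded at this point
        String url = page.getWebURL().getURL();
        if (page.getParseData() instanceof HtmlParseData) {
            HtmlParseData htmlData = (HtmlParseData) page.getParseData();
            String html = htmlData.getHtml();
            System.out.println("Downloaded " + url + " (" + html.length() + " chars)");
            // ... save the HTML to disk, extract data, etc.
        }
    }
}
```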
The example code here is a good starting point.
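A minimal setup that registers a seed page and starts the crawler could look like the sketch below, again assuming the crawler4j 4.x API; the storage folder, seed URL, and thread count are placeholders:

```java
import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;

public class CrawlerMain {

    public static void main(String[] args) throws Exception {
        CrawlConfig config = new CrawlConfig();
        config.setCrawlStorageFolder("/tmp/crawler4j-storage"); // placeholder folder
        config.setMaxDepthOfCrawling(3);                        // optional depth limit

        PageFetcher pageFetcher = new PageFetcher(config);
        RobotstxtConfig robotstxtConfig = new RobotstxtConfig();
        RobotstxtServer robotstxtServer = new RobotstxtServer(robotstxtConfig, pageFetcher);
        CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);

        // Seed page(s): crawling starts here; extracted links are passed to shouldVisit
        controller.addSeed("https://www.example.com/");

        // Start 4 crawler threads; this call blocks until the crawl finishes
        controller.start(MyCrawler.class, 4);
    }
}
```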