I am crawling a few sites with Apache Nutch 2.1.
While crawling, I see the following message for a lot of pages:
Skipping http://www.domainname.com/news/subcategory/111111/index.html; different batch id (null).
What causes this error?
How can I resolve this problem? The pages reported with different batch id (null) are not stored in the database.
The site that I crawled is based on Drupal, but I have also tried many other non-Drupal sites.
I don't think the message itself is a problem. A batch_id has not been assigned to every URL, so any URL whose batch_id is null is skipped. Those URLs are processed once a batch_id has been assigned to them.
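To illustrate the check described in the answer above, here is a minimal sketch in Python. It is not Nutch's actual code: the names `should_process` and `skip_message` and the sample batch id are hypothetical, and it only assumes that a URL is skipped whenever its stored batch id is missing or differs from the batch currently running.

```python
from typing import Optional


def should_process(mark: Optional[str], batch_id: str) -> bool:
    """Return True only if the URL's stored mark matches the running batch.

    Hypothetical helper, not part of the Nutch API.
    """
    if mark is None:
        # No batch_id was ever assigned to this URL, so it is skipped.
        return False
    return mark == batch_id


def skip_message(url: str, mark: Optional[str]) -> str:
    """Format a log line in the same shape as the one quoted in the question."""
    shown = "null" if mark is None else mark
    return f"Skipping {url}; different batch id ({shown})"


if __name__ == "__main__":
    current_batch = "batch-A"  # made-up id for illustration
    pages = {
        "http://www.domainname.com/a.html": "batch-A",  # assigned to this batch
        "http://www.domainname.com/b.html": None,       # never assigned
    }
    for url, mark in pages.items():
        if should_process(mark, current_batch):
            print(f"Processing {url}")
        else:
            print(skip_message(url, mark))
```

Under this assumption, a URL with a null batch id is not lost. It is only left out of the current batch and becomes eligible once a later run assigns it a batch id.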