I have a problem crawling my site. There is a form with two drop-down lists, and when I start a crawl, the crawler fetches only some of the links from the form: it picks up only part of the options from the first drop-down list, and likewise from the second. I tried changing some settings in the nutch-default.xml file, but the result is the same.
I changed the following (old value -> new value):
fetcher.threads.per.queue: 1 -> 10
db.ignore.internal.links: true -> false
db.ignore.external.links: false -> true
http.content.limit: 65536 -> 65536000
file.content.limit: 65536 -> 65536000
db.update.max.inlinks: 10000 -> 100000
Is there any other option that can help me crawl all the options in my form? Thanks for your answers.
Sorry, my rep is too low to post this as a comment.
Do you have a link?
Also, are the drop-downs populated via AJAX or something fancy? From memory, Nutch will only crawl what is in the page's HTML. That is, if you load the first 10 options on page load and only load the rest from a service when the user scrolls, I believe it can't find those.
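One rough way to check this yourself is to look at the raw HTML the server returns, before any JavaScript runs, and count the options in each drop-down. The sketch below uses only the Python standard library. The names `OptionCollector`, `options_in_static_html` and `fetch_html` are made up for illustration, and it assumes the drop-downs are plain `<select>`/`<option>` elements:

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class OptionCollector(HTMLParser):
    """Collects <option> values, grouped by the name of the enclosing <select>."""

    def __init__(self):
        super().__init__()
        self.options = {}     # select name -> list of option values
        self._current = None  # name of the <select> we are inside, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select":
            self._current = attrs.get("name") or "(unnamed)"
            self.options.setdefault(self._current, [])
        elif tag == "option" and self._current is not None:
            self.options[self._current].append(attrs.get("value"))

    def handle_endtag(self, tag):
        if tag == "select":
            self._current = None


def options_in_static_html(html):
    """Return {select name: [option values]} found in the given HTML string."""
    parser = OptionCollector()
    parser.feed(html)
    return parser.options


def fetch_html(url):
    """Download the page as the server sends it (no JavaScript is executed)."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling `options_in_static_html(fetch_html(your_url))` and comparing the counts with what you see in the browser should show whether options are missing from the static page. If they are, the rest are probably added by script after load, and changing the crawler settings alone is unlikely to help.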
Some more info about the page would be good.
Cheers, Robin