
Maximum number of Apache Nutch worker instances

Tags: hadoop, nutch

What is the maximum number of Apache Nutch crawler instances that can run at the same time with one master node?

asked Dec 17 '15 by Sanaz Marshall


People also ask

How does Apache Nutch work?

Nutch takes the injected URLs, stores them in the CrawlDB, and uses those links to go out to the web and scrape each URL. Then, it parses the scraped data into various fields and pushes any scraped hyperlinks back into the CrawlDB.
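In Nutch 1.x this cycle maps onto a handful of commands. A minimal sketch of one round (directory names such as urls/ and crawl/ are placeholders) might look like:

    # seed the CrawlDB with the injected URLs
    bin/nutch inject crawl/crawldb urls
    # select URLs due for fetching into a new segment
    bin/nutch generate crawl/crawldb crawl/segments -topN 1000
    SEGMENT=$(ls -d crawl/segments/* | tail -1)
    # fetch and parse the pages in that segment
    bin/nutch fetch $SEGMENT
    bin/nutch parse $SEGMENT
    # push scraped hyperlinks and fetch status back into the CrawlDB
    bin/nutch updatedb crawl/crawldb $SEGMENT

The bin/crawl script essentially loops over these steps for a given number of rounds.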

Is Apache Nutch open source?

Apache Nutch is a highly extensible and scalable open source web crawler software project.

What is nutch indexing?

An index writer is a component of the indexing job which sends documents from one or more segments to an external server. In Nutch these components are implemented as plugins, and several index writers ship out of the box.
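As a rough sketch, with an index writer plugin such as indexer-solr enabled via plugin.includes in conf/nutch-site.xml, a Nutch 1.x indexing run looks something like the following (the exact arguments vary between versions):

    # send documents from the parsed segments to the configured external server
    bin/nutch index crawl/crawldb -linkdb crawl/linkdb crawl/segments/*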


1 Answer

It's not clear what you mean by crawler instances. If you want to run the crawl script several times in parallel, e.g. because you have distinct crawls with separate configs, seeds, etc., then they will compete for slots on the Hadoop cluster. It then boils down to how many mapper/reducer slots are available on your cluster, which in turn depends on how many slave nodes there are.
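For illustration, two independent crawls launched in parallel might look roughly like this (directory names, conf dirs and the exact bin/crawl arguments are placeholders and differ between Nutch versions); both submit MapReduce jobs to the same cluster and therefore compete for its slots:

    # hypothetical parallel crawls, each with its own config dir, seed list and crawl dir
    NUTCH_CONF_DIR=/opt/nutch/conf-crawlA bin/crawl seeds/crawlA crawlA 2 &
    NUTCH_CONF_DIR=/opt/nutch/conf-crawlB bin/crawl seeds/crawlB crawlB 2 &
    wait

With classic MapReduce the per-slave slot counts come from mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum in mapred-site.xml; under YARN the equivalent limit is the memory and vcores available for containers on each node.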

Handling multiple Nutch crawls in parallel can get very tricky and resource-inefficient. Instead, rethink your architecture so that all the logical crawlers run as a single physical one, or have a look at StormCrawler, which should be a better fit for this kind of setup.

answered Oct 14 '22 by Julien Nioche