I am using Nutch 2.3. All jobs run one after another, i.e. first generate, then fetch, parse, index, etc. I want to run some jobs simultaneously. I know some jobs cannot run in parallel, but others can, e.g. the parse, dbupdate, and index jobs should be able to run alongside fetch.
Is it possible? My basic objective is to keep the fetcher job running all the time. I suppose we can do it with different timestamps. Can anyone guide me to the proper way?
a. Go to your GitHub repository.
b. Go to .github/workflows/*.yml.
c. In order to run the jobs in parallel, we have to define "n" jobs in our .yml file.
There is a chance that during any stage of Nutch (fetch, parse, index, etc.) a network problem occurs or the power goes out. How can I resume the previous incomplete job?
The --ntasks option is set to 28, so at most 28 tasks can be run simultaneously.

    #!/bin/sh
    # This script outputs some useful information so we can see what parallel
    # and srun are doing.

    sleepsecs=$[ ( $RANDOM % 10 ) + 10 ]s

    # $1 is arg1: {1} from GNU parallel.
    #
    # $PARALLEL_SEQ is a special variable from GNU parallel.
The parallel program executes tasks simultaneously until all tasks have been completed.
If you look at the Nutch web app server, you will find that it can execute multiple crawl jobs in parallel. You should check out the Nutch 2.3 source code for the web app (NutchUiServer). Hope this helps.
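To make the idea from the question concrete, here is a rough Python sketch of one way the overlap could be arranged: each batch gets its own batch ID, and while batch N+1 is being fetched, batch N is parsed, written back with updatedb, and indexed in a background thread. This is not how Nutch itself or NutchUiServer does it. The `bin/nutch` path, the `-topN 1000` value, the batch ID format, and the helper names (`run_nutch`, `post_fetch`, `crawl`) are all assumptions for illustration, and the command arguments follow the Nutch 2.x crawl script as I recall it, so please check them against the usage output of your own `bin/nutch`. Whether the storage backend tolerates a fetch running next to updatedb is also something to verify on your setup.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor

NUTCH = "bin/nutch"  # assumed path to the Nutch 2.x launcher script


def run_nutch(*args):
    """Run one Nutch command and raise if it exits with a non-zero status."""
    subprocess.run([NUTCH, *args], check=True)


def post_fetch(batch_id, run=run_nutch):
    """Parse, update the web DB, and index one already-fetched batch."""
    run("parse", batch_id)
    run("updatedb", batch_id)
    run("index", batch_id)


def crawl(rounds, run=run_nutch, new_batch_id=None):
    """Fetch batch N+1 while batch N is parsed, updated, and indexed."""
    if new_batch_id is None:
        # Hypothetical ID scheme: timestamp plus round number.
        new_batch_id = lambda n: f"{int(time.time())}-{n}"

    with ThreadPoolExecutor(max_workers=1) as background:
        fetched = None   # batch that has been fetched but not yet parsed
        pending = None   # background post-fetch work still in flight
        for n in range(rounds):
            batch_id = new_batch_id(n)
            if pending is not None:
                # Let the previous updatedb finish before generating again.
                pending.result()
            run("generate", "-topN", "1000", "-batchId", batch_id)
            if fetched is not None:
                pending = background.submit(post_fetch, fetched, run)
            run("fetch", batch_id)  # overlaps with post_fetch of the older batch
            fetched = batch_id
        if pending is not None:
            pending.result()
        if fetched is not None:
            post_fetch(fetched, run)  # last batch has nothing left to overlap with


if __name__ == "__main__":
    crawl(rounds=3)
```

The generate step is deliberately kept out of the overlap, so only fetch runs alongside parse, updatedb, and index. The price is that links discovered in batch N only become eligible from the generate of batch N+2 onward.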