I'm trying to make a Scrapy scraper work on Cloud Run. The main idea is that every 20 minutes a Cloud Scheduler cron should trigger the web scraper and get data from different sites. All sites have the same structure, so I would like to use the same code and parallelize the execution of the scraping job, doing something like scrapy crawl scraper -a site=www.site1.com and scrapy crawl scraper -a site=www.site2.com.
I have already deployed a version of the scraper, but it can only run scrapy crawl scraper. How can I make the site in that command change at execution time?
Also, should I be using a Cloud Run job or a service?
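For context, the spider picks up the site roughly along these lines (a simplified sketch, not my real code; the class name and selectors are placeholders):

```python
import scrapy


class SiteSpider(scrapy.Spider):
    # scrapy crawl scraper -a site=www.site1.com passes site to __init__
    name = "scraper"

    def __init__(self, site=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.start_urls = [f"https://{site}"]

    def parse(self, response):
        # all sites share the same structure, so one parse method covers them
        yield {"url": response.url, "title": response.css("title::text").get()}
```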
There is a trick described in the Cloud Run jobs documentation: the CLOUD_RUN_TASK_INDEX environment variable. That variable indicates the index of the task within the execution. For each index, pick one line in your file of websites (the line number equal to the env var value). That way, you can leverage Cloud Run jobs and their built-in parallelism.
The main tradeoff here is that the list of websites to scrape is static.
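A minimal sketch of that lookup, assuming the site list is baked into the image as a sites.txt file (one hostname per line) and the job's container entrypoint runs this script:

```python
import os
import subprocess


def main():
    # Cloud Run jobs set CLOUD_RUN_TASK_INDEX to 0..N-1 across the N tasks of an execution
    task_index = int(os.environ.get("CLOUD_RUN_TASK_INDEX", "0"))

    with open("sites.txt") as f:
        sites = [line.strip() for line in f if line.strip()]

    # each task scrapes the site whose line number matches its task index
    site = sites[task_index]
    subprocess.run(["scrapy", "crawl", "scraper", "-a", f"site={site}"], check=True)


if __name__ == "__main__":
    main()
```

Create the job with a task count equal to the number of lines in the file (--tasks) and whatever --parallelism you want; Cloud Scheduler then only has to trigger one execution every 20 minutes.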
You can pass in overrides. For example, when triggering a job execution from the gcloud CLI or the client SDKs, you can pass an args override containing the alternative arguments to be passed to your script.
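For instance, a sketch using the google-cloud-run Python client, assuming the container's entrypoint is the scrapy executable (project, region and job names are placeholders; double-check the override field names against the current client docs):

```python
from google.cloud import run_v2


def run_for_site(project: str, region: str, job: str, site: str):
    client = run_v2.JobsClient()
    request = run_v2.RunJobRequest(
        name=f"projects/{project}/locations/{region}/jobs/{job}",
        overrides=run_v2.RunJobRequest.Overrides(
            container_overrides=[
                run_v2.RunJobRequest.Overrides.ContainerOverride(
                    # replaces the container's args for this execution only
                    args=["crawl", "scraper", "-a", f"site={site}"],
                )
            ]
        ),
    )
    return client.run_job(request=request)


for site in ["www.site1.com", "www.site2.com"]:
    run_for_site("my-project", "europe-west1", "scraper-job", site)
```

The same overrides can also go in the HTTP body that Cloud Scheduler sends when it calls the job's run endpoint directly, which avoids baking the site list into the image.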