In Python Celery, how do I persist objects across consecutive worker calls?




I'm using Celery to automate some screen scraping. I use Selenium to open a Chrome webdriver, manipulate the page, save some data, and then move on to the next page in the queue. The problem is that a new webdriver is created and torn down for every task in the queue, which is very time-consuming and resource-intensive.

How do I persist objects across calls? I've read about connection pooling in Celery, but it's not clear to me how it works. Where do I create the webdriver: in the tasks file or in the main queueing file? If the latter, how do the workers know which webdriver to use?



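for row in rows:  # main queueing file
    scrape.delay(str(row['product_id']), str(row['pg_code']))


@app.task  # in tasks.py; app is the Celery application
def scrape(product_id, pg_code):
    # do some stuff
    ...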
Answer

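Celery creates each task only once per worker process: the task is registered as a global instance rather than instantiated for every request. That means you can create the webdriver in the tasks file and cache it on the task object, where every task that process executes will reuse it. Each worker process ends up with its own driver, so the queueing script never needs to know about it. The Celery documentation suggests exactly this approach for caching a database connection in a custom task class.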


