I am using pymongo 3.2 and I want to use it with multiprocessing:
client = MongoClient(JD_SEARCH_MONGO_URI, connect=False)
db = client.jd_search

with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
    for jd in db['sample_data'].find():
        jdId = jd["jdId"]
        for cv in db["sample_data"].find():
            itemId = cv["itemId"]
            executor.submit(intersect_compute, jdId, itemId)
            # print "done {} => {}".format(jdId, itemId)
but I get this error:

UserWarning: MongoClient opened before fork. Create MongoClient with connect=False, or create client after forking. See PyMongo's documentation for details: http://api.mongodb.org/python/current/faq.html#using-pymongo-with-multiprocessing

According to the documentation, I have set connect to False, as you can see above.
You followed the documentation (the URL in the exception) exactly, but the pattern you copied is from its "Never do this" section.

P.S. I updated your code sample at the end of this answer.
# Each process creates its own instance of MongoClient.
def func():
    db = pymongo.MongoClient().mydb
    # Do something with db.

proc = multiprocessing.Process(target=func)
proc.start()
client = pymongo.MongoClient()

# Each child process attempts to copy a global MongoClient
# created in the parent process. Never do this.
def func():
    db = client.mydb
    # Do something with db.

proc = multiprocessing.Process(target=func)
proc.start()
What you need to change is to initialize the database connection inside each forked process, so that each process gets its own independent connection.
with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
    client = MongoClient(JD_SEARCH_MONGO_URI, connect=False)
    db = client.jd_search

    for jd in db['sample_data'].find():
        jdId = jd["jdId"]
        for cv in db["sample_data"].find():
            itemId = cv["itemId"]
            executor.submit(intersect_compute, jdId, itemId)
            # print "done {} => {}".format(jdId, itemId)