
How can I join data from two MongoDB collections in Python?

I'm making a mini Twitter clone in Flask + MongoDB (w/ pymongo) as a learning exercise and I need some help joining data from two collections. I know and understand joins can't be done in MongoDB, that's why I'm asking how to do it in Python.

I have a collection to store user information. Documents look like this:

{
    "_id" : ObjectId("51a6c4e3eedc89e34ee46e32"),
    "email" : "[email protected]",
    "message" : [
        ObjectId("51a6c5e1eedc89e34ee46e36")
    ],
    "pw_hash" : "alexhash",
    "username" : "alex",
    "who_id" : [
        ObjectId("51a6c530eedc89e34ee46e33"),
        ObjectId("51a6c54beedc89e34ee46e34")
    ],
    "whom_id" : [ ]
}

and another collection to store messages (tweets):

{
    "_id" : ObjectId("51a6c5e1eedc89e34ee46e36"),
    "author_id" : ObjectId("51a6c4e3eedc89e34ee46e32"),
    "text" : "alex first twit",
    "pub_date" : ISODate("2013-05-30T03:22:09.462Z")
}

As you can see, each message document stores its author's "_id" in "author_id", and each user document stores the "_id" of its messages in the "message" array.

Basically, what I want to do is take every message's "author_id", get the corresponding username from the user collection and make a new dictionary containing the "username" + "text" + "pub_date". With that, I could easily render the data in my Jinja2 template.

I have the following code that sort of does what I want:

def getMessageAuthor():
    author_id = []
    # get a list of author_ids for every message
    for author in coll_message.find():
        author_id.append(author['author_id'])
    # iterate through every author_ids to find the corresponding username
    for item in author_id:
        message = coll_message.find_one({"author_id": item}, {"text": 1, "pub_date": 1})
        author = coll_user.find_one({"_id": item}, {"username": 1})
        merged = dict(chain((message.items() + author.items())))

Output looks like this:

{u'username': u'alex', u'text': u'alex first twit', u'_id': ObjectId('51a6c4e3eedc89e34ee46e32'), u'pub_date': datetime.datetime(2013, 5, 30, 3, 22, 9, 462000)}

Which is exactly what I want.

The code doesn't work though because I'm doing .find_one() so I always get the first message even if a user has two or more. Doing .find() might resolve this issue, but .find() returns a cursor and not a dictionary like .find_one(). I haven't figured out how to convert cursors to the same dictionary format as the output from .find_one() and merge them to get the same output as above.

This is where I'm stuck. I don't know how I should proceed to fix this. Any help is appreciated.

Thank you.

alexferl asked Oct 22 '22 08:10



1 Answer

Append an ("_id", "author_id") tuple for each message, so that "_id" is used to retrieve the corresponding message and "author_id" is used to get the username.

You just need a unique key to do that:

from itertools import chain

def getMessageAuthor():
    ids = []
    # collect the (message _id, author_id) pair for every message
    for message in coll_message.find():
        ids.append((message['_id'], message['author_id']))
    merged = []
    # look up each message by its unique _id and its author by author_id
    for message_id, author_id in ids:
        message = coll_message.find_one({"_id": message_id}, {"text": 1, "pub_date": 1})
        author = coll_user.find_one({"_id": author_id}, {"username": 1})
        # the author's "_id" overwrites the message's "_id" in the merged dict
        merged.append(dict(chain(message.items(), author.items())))
    return merged
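
A variation on the same idea, as a minimal sketch rather than a definitive implementation: a cursor returned by .find() yields plain dictionaries when you iterate over it, so the messages can be read in one pass and the usernames fetched with a single "$in" query instead of one find_one() per message. The function name merge_messages_with_authors and the choice to pass the two collections as parameters are assumptions made for illustration; the field names come from the documents in the question.

```python
def merge_messages_with_authors(coll_message, coll_user):
    """Return one dict per message holding username, text and pub_date."""
    # Materialize the cursor: each item it yields is already a plain dict.
    messages = list(
        coll_message.find({}, {"author_id": 1, "text": 1, "pub_date": 1})
    )

    # Fetch every referenced user in a single query and index them by _id.
    author_ids = list({m["author_id"] for m in messages})
    users = coll_user.find({"_id": {"$in": author_ids}}, {"username": 1})
    username_by_id = {u["_id"]: u["username"] for u in users}

    # Build the merged dicts; a message whose author is missing gets None.
    return [
        {
            "username": username_by_id.get(m["author_id"]),
            "text": m["text"],
            "pub_date": m["pub_date"],
        }
        for m in messages
    ]
```

The returned list can be passed straight to a Jinja2 template and looped over. On MongoDB 3.2 or later, the server-side "$lookup" aggregation stage is another option if you would rather not merge in Python.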
Zangetsu answered Oct 24 '22 10:10
