I'm making a mini Twitter clone in Flask + MongoDB (with pymongo) as a learning exercise, and I need some help joining data from two collections. I know and understand that joins can't be done in MongoDB, which is why I'm asking how to do it in Python.
I have a collection to store user information. Documents look like this:
{
    "_id" : ObjectId("51a6c4e3eedc89e34ee46e32"),
    "email" : "[email protected]",
    "message" : [
        ObjectId("51a6c5e1eedc89e34ee46e36")
    ],
    "pw_hash" : "alexhash",
    "username" : "alex",
    "who_id" : [
        ObjectId("51a6c530eedc89e34ee46e33"),
        ObjectId("51a6c54beedc89e34ee46e34")
    ],
    "whom_id" : [ ]
}
and another collection to store messages (tweets):
{
    "_id" : ObjectId("51a6c5e1eedc89e34ee46e36"),
    "author_id" : ObjectId("51a6c4e3eedc89e34ee46e32"),
    "text" : "alex first twit",
    "pub_date" : ISODate("2013-05-30T03:22:09.462Z")
}
As you can see, each message document references its author's "_id" in "author_id", and each user document references the "_id" of its messages in "message".
Basically, what I want to do is take every message's "author_id", get the corresponding username from the user collection and make a new dictionary containing the "username" + "text" + "pub_date". With that, I could easily render the data in my Jinja2 template.
I have the following code that sort of does what I want:
from itertools import chain

def getMessageAuthor():
    author_id = []
    # get a list of author_ids for every message
    for author in coll_message.find():
        author_id.append(author['author_id'])
    # iterate through every author_id to find the corresponding username
    for item in author_id:
        message = coll_message.find_one({"author_id": item}, {"text": 1, "pub_date": 1})
        author = coll_user.find_one({"_id": item}, {"username": 1})
        # chain the two item sequences so this works on Python 2 and 3
        merged = dict(chain(message.items(), author.items()))
The output looks like this:
{u'username': u'alex', u'text': u'alex first twit', u'_id': ObjectId('51a6c4e3eedc89e34ee46e32'), u'pub_date': datetime.datetime(2013, 5, 30, 3, 22, 9, 462000)}
Which is exactly what I want.
The code doesn't work though because I'm doing .find_one() so I always get the first message even if a user has two or more. Doing .find() might resolve this issue, but .find() returns a cursor and not a dictionary like .find_one(). I haven't figured out how to convert cursors to the same dictionary format as the output from .find_one() and merge them to get the same output as above.
This is where I'm stuck. I don't know how I should proceed to fix this. Any help is appreciated.
Thank you.
Append ("_id", "author_id") pairs instead of bare author ids, so that the message's own "_id" is used to retrieve the corresponding message and "author_id" is used to get the username.
You just need a unique key to do that:
from itertools import chain

def getMessageAuthor():
    author_id = []
    # get a list of (message _id, author_id) pairs for every message
    for author in coll_message.find():
        author_id.append((author['_id'], author['author_id']))
    merged = []
    # look up each message by its unique _id and its author by author_id
    for message_id, item in author_id:
        message = coll_message.find_one({"_id": message_id}, {"text": 1, "pub_date": 1})
        author = coll_user.find_one({"_id": item}, {"username": 1})
        # chain the two item sequences so this works on Python 2 and 3
        merged.append(dict(chain(message.items(), author.items())))
    return merged
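As a side note, iterating a cursor returned by .find() yields one dictionary per document, the same shape that .find_one() returns, so the merge can also be done in a single pass without a query per message. Below is a minimal sketch of that idea, not a definitive implementation. The function name join_messages_with_authors and the sample data are hypothetical, and plain strings stand in for ObjectId values so the snippet runs without a database.

```python
def join_messages_with_authors(messages, users):
    """Merge each message with its author's username.

    `messages` and `users` can be any iterables of dicts, for example
    pymongo cursors, which yield one dict per document when iterated.
    """
    # Build a lookup table once: user _id -> username.
    usernames = {user["_id"]: user["username"] for user in users}

    merged = []
    for message in messages:
        merged.append({
            # .get() leaves the username as None if the author is missing.
            "username": usernames.get(message["author_id"]),
            "text": message["text"],
            "pub_date": message["pub_date"],
        })
    return merged


# Hypothetical stand-in data; strings replace ObjectId and datetime values.
sample_users = [{"_id": "u1", "username": "alex"}]
sample_messages = [
    {"_id": "m1", "author_id": "u1", "text": "alex first twit", "pub_date": "2013-05-30"},
    {"_id": "m2", "author_id": "u1", "text": "alex second twit", "pub_date": "2013-05-31"},
]
timeline = join_messages_with_authors(sample_messages, sample_users)
```

With the collections from the question, this could presumably be called as join_messages_with_authors(coll_message.find(), coll_user.find({}, {"username": 1})), which needs two queries in total instead of two per message. The resulting list of dictionaries can be passed straight to the Jinja2 template.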