I have the following 4 documents in a MongoDB collection called favoriteColors:
{ "name" : "Johnny", "color" : "green" }
{ "name" : "Steve", "color" : "blue" }
{ "name" : "Ben", "color" : "red" }
{ "name" : "Timmy", "color" : "cyan" }
I'm trying to create an ORDERED list of color values that matches a different, ordered list of names.
For example, if I have the list ["Johnny", "Steve", "Ben", "Johnny"], the new list will be ["green", "blue", "red", "green"].
And if I have the list ["Steve", "Steve", "Ben", "Ben", "Johnny"], the new list will be ["blue", "blue", "red", "red", "green"].
What's a good way of doing this using Python and/or PyMongo? This is what I have so far, but it doesn't handle duplicates:
name_list = ["Steve", "Steve", "Ben", "Ben", "Johnny"]
color_list = []
for document in db.favoriteColors.aggregate([
    {"$match": {"name": {"$in": name_list}}},
    {"$project": {"_id": 0, "color": 1}}  # exclude _id so only the color is collected
]):
    for k, v in document.items():
        color_list.append(v)
print(color_list)
# ["blue", "red", "green"]
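The duplicates disappear because $in is a membership test: each stored document either matches or it doesn't, so it comes back at most once however often its name repeats in name_list. A minimal pure-Python sketch of that behavior, assuming an in-memory list docs that mirrors the four documents above (no MongoDB involved):

```python
# In-memory stand-in for the favoriteColors collection (illustrative assumption).
docs = [
    {"name": "Johnny", "color": "green"},
    {"name": "Steve", "color": "blue"},
    {"name": "Ben", "color": "red"},
    {"name": "Timmy", "color": "cyan"},
]
name_list = ["Steve", "Steve", "Ben", "Ben", "Johnny"]

# Like {"$match": {"name": {"$in": name_list}}}: iterate over the stored
# documents (not over name_list) and keep each one whose name is in the list.
matched = [d["color"] for d in docs if d["name"] in name_list]
print(matched)  # ['green', 'blue', 'red'] -- three colors for five names
```

Here the order simply follows docs; the server likewise returns matches in its own order rather than in name_list order.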
Actually, we can do this efficiently by combining the aggregation framework with some client-side processing.
import pymongo

client = pymongo.MongoClient()
db = client.test  # Or whatever your database is
favoriteColors = db.favoriteColors
first_list = ['Johnny', 'Steve', 'Ben', 'Johnny']
cursor = favoriteColors.aggregate([
    {'$match': {'name': {'$in': first_list}}},
    {'$project': {'part': {'$map': {
        'input': first_list,
        'as': 'inp',
        'in': {
            '$cond': [
                {'$eq': ['$$inp', '$name']},
                '$color',
                None
            ]
        }
    }}}},
    {'$group': {'_id': None, 'data': {'$push': '$part'}}}
])
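The $map/$cond expression in the $project stage builds, for every matched document, an array as long as first_list that holds the document's color wherever the name at that position equals the document's name, and None everywhere else. A plain-Python mimic of the three stages, assuming an in-memory docs list in insertion order and unique names (MongoDB does not promise that order, but the final client-side step doesn't depend on it):

```python
# In-memory stand-in for the collection (illustrative assumption).
docs = [
    {"name": "Johnny", "color": "green"},
    {"name": "Steve", "color": "blue"},
    {"name": "Ben", "color": "red"},
    {"name": "Timmy", "color": "cyan"},
]
first_list = ["Johnny", "Steve", "Ben", "Johnny"]

def project_part(doc, names):
    # Mimics {'$map': {'input': names, 'as': 'inp',
    #                  'in': {'$cond': [{'$eq': ['$$inp', '$name']}, '$color', None]}}}
    return [doc["color"] if inp == doc["name"] else None for inp in names]

# $match: keep documents whose name is in first_list (Timmy is dropped).
matched = [d for d in docs if d["name"] in first_list]
# $project + $group with _id None: push every 'part' array into one list.
data = [project_part(d, first_list) for d in matched]
for row in data:
    print(row)
```

Each row has one slot per entry in first_list, so a repeated name like "Johnny" fills two slots of the same row.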
Because we $group by None, the cursor contains a single document, which we can retrieve with next(cursor). We can verify this by printing the cursor's contents (note that this consumes the cursor, so run the aggregation again before calling next()):
>>> import pprint
>>> pprint.pprint(list(cursor))
[{'_id': None,
  'data': [['green', None, None, 'green'],
           [None, 'blue', None, None],
           [None, None, 'red', None]]}]
From here, we transpose the "data" field with zip so that each position in first_list gets its own tuple, flatten those tuples in order with chain.from_iterable, and filter out the elements that are None.
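To see what each of these steps produces, here is the same transformation applied to the data list printed above, with the intermediate zip result made explicit (a standalone sketch that does not touch the cursor):

```python
from itertools import chain

# The 'data' field printed above: one row per matched document,
# one column per position in first_list.
data = [['green', None, None, 'green'],
        [None, 'blue', None, None],
        [None, None, 'red', None]]

# zip(*data) transposes rows into columns: column i holds every
# document's candidate value for position i of first_list.
columns = list(zip(*data))
# [('green', None, None), (None, 'blue', None), (None, None, 'red'), ('green', None, None)]

# Flatten the columns in order and drop the None placeholders.
result = [item for item in chain.from_iterable(columns) if item is not None]
print(result)  # ['green', 'blue', 'red', 'green']
```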
from itertools import chain

result = [item
          for item in chain.from_iterable(zip(*next(cursor)['data']))
          if item is not None]
Which returns:
>>> result
['green', 'blue', 'red', 'green']
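If the server-side array building isn't needed, a simpler alternative (a different technique from the pipeline above) is to fetch each matching document once, build a name-to-color dict on the client, and look every name up in order. A minimal sketch, assuming names are unique in the collection; colors_for is a hypothetical helper name, and collection is any PyMongo Collection such as db.favoriteColors:

```python
def colors_for(collection, names):
    """Return the color for each name in `names`, in order (hypothetical helper).

    Assumes each name occurs at most once in the collection. Names with no
    matching document are skipped, as in the aggregation approach above.
    """
    # One query; $in returns each matching document once.
    cursor = collection.find(
        {"name": {"$in": list(set(names))}},
        {"_id": 0, "name": 1, "color": 1},
    )
    lookup = {doc["name"]: doc["color"] for doc in cursor}
    # Repeated names simply repeat the lookup, so duplicates are preserved.
    return [lookup[name] for name in names if name in lookup]
```

For the four documents in the question, colors_for(db.favoriteColors, ["Steve", "Steve", "Ben", "Ben", "Johnny"]) should give ["blue", "blue", "red", "red", "green"]. The trade-off is that the ordering work moves entirely to the client, which is usually fine when the list of names is small.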