Iterating over a dictionary to create a list

I have the following four documents in a MongoDB collection called favoriteColors:

{ "name" : "Johnny", "color" : "green" }
{ "name" : "Steve", "color" : "blue" }
{ "name" : "Ben", "color" : "red" }
{ "name" : "Timmy", "color" : "cyan" }

I'm trying to create an ordered list of color values that follows the order of a given list of names, including any repeated names.

For example, if I have the list ["Johnny", "Steve", "Ben", "Johnny"] the new list will be ["green", "blue", "red", "green"].

And if I have the list ["Steve", "Steve", "Ben", "Ben", "Johnny"] the new list will be ["blue", "blue", "red", "red", "green"].

What's a good way of doing this using Python and/or PyMongo? This is what I have so far, but it returns each color only once, so duplicate names are lost.

name_list = ["Steve", "Steve", "Ben", "Ben", "Johnny"]

color_list = []
for document in db.favoriteColors.aggregate([
    {"$match": {"name": {"$in": name_list }}},
    {"$project": {"_id": 0, "color": 1}}  # exclude _id so only the color is left
]):
    for k, v in document.items():
        color_list.append(v)

print(color_list)
# ["blue", "red", "green"]
Johnny Metz asked Nov 08 '22 04:11

1 Answer

We can do this efficiently by combining the aggregation framework with a small amount of client-side processing.

import pymongo


client = pymongo.MongoClient()
db = client.test  # or whichever database you use
favoriteColors = db.favoriteColors
first_list = ['Johnny', 'Steve', 'Ben', 'Johnny']

cursor = favoriteColors.aggregate([
    {'$match': {'name': {'$in': first_list}}}, 
    {'$project': {'part': {'$map': {
        'input': first_list, 
        'as': 'inp', 
        'in': {
            '$cond': [
                {'$eq': [ '$$inp', '$name']}, 
                '$color', 
                None
            ]
        }
    }}}},
    {'$group': {'_id': None, 'data': {'$push': '$part'}}}
])

Because we $group by None, the cursor yields a single document, which we can retrieve with next. We can verify this by printing list(cursor); note that doing so exhausts the cursor, so the aggregation must be run again before calling next on it.

>>> import pprint
>>> pprint.pprint(list(cursor))
[{'_id': None,
  'data': [['green', None, None, 'green'],
           [None, 'blue', None, None],
           [None, None, 'red', None]]}]
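
To see where these rows come from, the pipeline can be imitated in plain Python without a database. This is only a rough sketch: emulate_pipeline is a hypothetical helper, not part of PyMongo, and the sample documents are copied from the question.

```python
# Sample documents copied from the question; a real cursor would yield these.
documents = [
    {"name": "Johnny", "color": "green"},
    {"name": "Steve", "color": "blue"},
    {"name": "Ben", "color": "red"},
    {"name": "Timmy", "color": "cyan"},
]
first_list = ["Johnny", "Steve", "Ben", "Johnny"]


def emulate_pipeline(docs, names):
    """Hypothetical helper that mimics $match, $project with $map/$cond, and $group."""
    wanted = set(names)
    # $match with $in: keep only documents whose name appears in the list.
    matched = [doc for doc in docs if doc["name"] in wanted]
    # $map + $cond: one row per document, with the color at every position
    # where the input name equals the document's name, and None elsewhere.
    parts = [
        [doc["color"] if inp == doc["name"] else None for inp in names]
        for doc in matched
    ]
    # $group by None with $push: a single document collecting all rows.
    return [{"_id": None, "data": parts}] if parts else []


rows = emulate_pipeline(documents, first_list)[0]["data"]
for row in rows:
    print(row)
```

Each row has the same length as first_list, and "Timmy" produces no row because the $match stage filters that document out.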

From here, we transpose the rows of the "data" field with zip so that each tuple corresponds to one position in first_list, flatten those tuples with chain.from_iterable, and filter out the elements that are None.

from itertools import chain

result = [item 
          for item in chain.from_iterable(zip(*next(cursor)['data']))
          if item is not None]
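
The same three steps can be followed one at a time on a literal copy of the "data" field printed earlier, which may make the intermediate values easier to inspect. The variable names columns and flattened are introduced here purely for illustration.

```python
from itertools import chain

# Literal copy of the "data" field from the pprint output.
data = [
    ["green", None, None, "green"],
    [None, "blue", None, None],
    [None, None, "red", None],
]

# Transpose: one tuple per position in first_list.
columns = list(zip(*data))
# Flatten the tuples left to right, which preserves the order of first_list.
flattened = list(chain.from_iterable(columns))
# Keep only the real colors.
result = [item for item in flattened if item is not None]

print(columns[0])  # ('green', None, None)
print(result)      # ['green', 'blue', 'red', 'green']
```

Because every name belongs to at most one document here, each column holds at most one non-None value, so the order in which the documents were grouped does not affect the result.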

The list comprehension over the cursor returns:

>>> result
['green', 'blue', 'red', 'green']
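
For comparison, a simpler alternative (a different technique from the aggregation above) is to fetch the matching documents once, build a name-to-color dictionary on the client, and look each name up in order. The sketch below uses an in-memory list so that it runs without a database; colors_in_order is a hypothetical helper, and it assumes each name appears in at most one document.

```python
def colors_in_order(documents, names):
    """Return the color for each name in order; names without a document are skipped."""
    # Any iterable of documents works here, including a PyMongo cursor.
    color_by_name = {doc["name"]: doc["color"] for doc in documents}
    return [color_by_name[name] for name in names if name in color_by_name]


# Sample documents copied from the question.
documents = [
    {"name": "Johnny", "color": "green"},
    {"name": "Steve", "color": "blue"},
    {"name": "Ben", "color": "red"},
    {"name": "Timmy", "color": "cyan"},
]

name_list = ["Steve", "Steve", "Ben", "Ben", "Johnny"]
print(colors_in_order(documents, name_list))
# ['blue', 'blue', 'red', 'red', 'green']
```

With a live collection, the documents argument could presumably be the cursor from favoriteColors.find({'name': {'$in': name_list}}), which keeps the query to a single round trip while leaving the ordering and duplication to Python.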
styvane answered Nov 14 '22 21:11