Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Comparing massive lists of dictionaries in python

Tags:

python

I never actually thought I'd run into speed-issues with python, but I have. I'm trying to compare really big lists of dictionaries to each other based on the dictionary values. I compare two lists, with the first like so

biglist1=[{'transaction':'somevalue', 'id':'somevalue', 'date':'somevalue' ...}, {'transactio':'somevalue', 'id':'somevalue', 'date':'somevalue' ...}, ...]

With 'somevalue' standing for a user-generated string, int or decimal. Now, the second list is pretty similar, except the id-values are always empty, as they have not been assigned yet.

biglist2=[{'transaction':'somevalue', 'id':'', 'date':'somevalue' ...}, {'transactio':'somevalue', 'id':'', 'date':'somevalue' ...}, ...]

So I want to get a list of the dictionaries in biglist2 that match the dictionaries in biglist1 for all other keys except id.

I've been doing

for item in biglist2:
    for transaction in biglist1:
       if item['transaction'] == transaction['transaction']:
          list_transactionnamematches.append(transaction)

for item in biglist2:
    for transaction in list_transactionnamematches:
       if item['date'] == transaction['date']:
          list_transactionnamematches.append(transaction)

... and so on, not comparing id values, until I get a final list of matches. Since the lists can be really big (around 3000+ items each), this takes quite some time for python to loop through.

I'm guessing this isn't really how this kind of comparison should be done. Any ideas?

like image 237
Tuomas Avatar asked Dec 19 '08 22:12

Tuomas


People also ask

How do I compare dictionaries in Python?

For simple dictionaries, comparing them is usually straightforward. You can use the == operator, and it will work.

Can you directly compare lists in Python?

We can club the Python sort() method with the == operator to compare two lists. Python sort() method is used to sort the input lists with a purpose that if the two input lists are equal, then the elements would reside at the same index positions.

Can you compare dictionary values in Python?

The compare method cmp() is used in Python to compare values and keys of two dictionaries. If method returns 0 if both dictionaries are equal, 1 if dic1 > dict2 and -1 if dict1 < dict2.


2 Answers

Index on the fields you want to use for lookup. O(n+m)

matches = []
biglist1_indexed = {}

for item in biglist1:
    biglist1_indexed[(item["transaction"], item["date"])] = item

for item in biglist2:
    if (item["transaction"], item["date"]) in biglist1_indexed:
        matches.append(item)

This is probably thousands of times faster than what you're doing now.

like image 75
recursive Avatar answered Oct 16 '22 01:10

recursive


What you want to do is to use correct data structures:

  1. Create a dictionary of mappings of tuples of other values in the first dictionary to their id.

  2. Create two sets of tuples of values in both dictionaries. Then use set operations to get the tuple set you want.

  3. Use the dictionary from the point 1 to assign ids to those tuples.

like image 40
J S Avatar answered Oct 16 '22 03:10

J S