I have a class called "UserDatabaseRecord". It has a bunch of fields like "username", "expiration_date", and so on.
I have TWO lists of UserDatabaseRecord objects: List A and List B
I want to verify that for all UserDatabaseRecords in List A, the username field does not match any UserDatabaseRecords username field in List B.
I'm able to accomplish this very inefficiently:
for record_a in List_A:
for record_b in List_B:
if record_a.username == record_b.username:
print "Duplicate username: {0}".format(record_a.username)
It works, I guess. I'd just like to make it more efficient and/or "Pythonic".
This question is related, but ultimately I couldn't figure out how to apply it to a List of objects when comparing on just one field: one-liner to check if at least one item in list exists in another list?
The problem with this is that, for each element in list A, you're checking all of the elements in list B. So, if the lengths of your lists are N and M, that's N*M comparisons.
If you make a set of the usernames from list B, then you can just use the in
operator on it—which is not only simpler, it's instantaneous, instead of having to check all of the values one by one. So, you only need N lookups instead of N*M.
So:
b_names = {record.username for record in List_B}
for record_a in List_A:
if record_a.username in b_names:
print "Duplicate username: {0}".format(record_a.username)
Or, even simpler, use set intersection:
a_names = {record.username for record in List_A}
b_names = {record.username for record in List_B}
for name in a_names & b_names:
print "Duplicate username: {0}".format(name)
And really, you don't need both of them to be sets, you can make one a set and the other just an iterator, using a generator expression:
a_names = {record.username for record in List_A}
b_names = (record.username for record in List_B)
for name in a_names.intersection(b_names):
print "Duplicate username: {0}".format(name)
One of these may be a little faster than the others, but they'll all be in the same ballpark—and, more importantly, they're all linear instead of quadratic. So, I'd suggest using whichever one makes the most sense to you.
If you just need to know whether there are duplicates rather than get a list of them, or just need to get one of the duplicates arbitrarily rather than all of them, you can speed it up by "short-circuiting" early—e.g., adding a break
after the print
in the first one, or using isdisjoint
instead of intersection
in the last one.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With