Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Which one scales better? ORM's distinct() or python set()

and happy new year!

I have a model which will hold some hundreds of thousands of records. The model looks like this:

class Transaction(models.Model):
    user = models.ForeignKey(User)
    client = models.ForeignKey(Client)
    amount = models.FloatField()

I want to know all clients a certain user is dealing with. To get the unique client ids, I could use the Django ORM alone:

Transaction.objects.filter(user=the_user).distinct('client_id').values_list('client_id', flat=True)

or do the following:

set(Transaction.objects.filter(user=the_user).values_list('client_id', flat=True))

Both will produce the same result. But which one will be quicker given the large set of records? I know distinct is a relatively slow operation on databases, but how it compares to python's set()?

Finally, if it goes down to databases, my options are MySql and PostgreSql for production. Would there be any difference between the two of them for this specific operation?

like image 516
ppetrid Avatar asked Jan 02 '13 03:01

ppetrid


1 Answers

I usually use the orm functions, it's more readable and it runs on database level, so is your RDBMS who actually computes the distinct values, and you have your result in a single step.

You can accomplish the same with a python set, but you need to first bring all your data set, and then apply a set(). So you have to perform a two step operation to do the same.

In your first case (orm) you only have the I/O overhead, and in the second case, you have the I/O overhead + a function call, so I would go with the ORM's distinct.

like image 68
iferminm Avatar answered Sep 18 '22 23:09

iferminm