Which one scales better? ORM's distinct() or python set()

Question

and happy new year!

I have a model which will hold some hundreds of thousands of records. The model looks like this:

class Transaction(models.Model):
    user = models.ForeignKey(User)
    client = models.ForeignKey(Client)
    amount = models.FloatField()

I want to know all clients a certain user is dealing with. To get the unique client ids, I could use the Django ORM alone:

Transaction.objects.filter(user=the_user).distinct('client_id').values_list('client_id', flat=True)

or do the following:

set(Transaction.objects.filter(user=the_user).values_list('client_id', flat=True))

Both will produce the same result. But which one will be quicker given the large set of records? I know distinct is a relatively slow operation on databases, but how it compares to python's set()?

Finally, if it goes down to databases, my options are MySql and PostgreSql for production. Would there be any difference between the two of them for this specific operation?

iferminm · Accepted Answer

I usually use the orm functions, it's more readable and it runs on database level, so is your RDBMS who actually computes the distinct values, and you have your result in a single step.

You can accomplish the same with a python set, but you need to first bring all your data set, and then apply a set(). So you have to perform a two step operation to do the same.

In your first case (orm) you only have the I/O overhead, and in the second case, you have the I/O overhead + a function call, so I would go with the ORM's distinct.

Which one scales better? ORM's distinct() or python set()

Tags:

python

mysql

postgresql

django

ppetrid

1 Answers

iferminm

Recent Activity

Donate For Us

Which one scales better? ORM's distinct() or python set()

Tags:

python

mysql

postgresql

django

ppetrid

1 Answers

iferminm

Related questions

Recent Activity

Donate For Us