and happy new year!
I have a model which will hold some hundreds of thousands of records. The model looks like this:
class Transaction(models.Model):
user = models.ForeignKey(User)
client = models.ForeignKey(Client)
amount = models.FloatField()
I want to know all clients a certain user is dealing with. To get the unique client ids, I could use the Django ORM alone:
Transaction.objects.filter(user=the_user).distinct('client_id').values_list('client_id', flat=True)
or do the following:
set(Transaction.objects.filter(user=the_user).values_list('client_id', flat=True))
Both will produce the same result. But which one will be quicker given the large set of records? I know distinct
is a relatively slow operation on databases, but how it compares to python's set()
?
Finally, if it goes down to databases, my options are MySql and PostgreSql for production. Would there be any difference between the two of them for this specific operation?
I usually use the orm functions, it's more readable and it runs on database level, so is your RDBMS who actually computes the distinct values, and you have your result in a single step.
You can accomplish the same with a python set, but you need to first bring all your data set, and then apply a set(). So you have to perform a two step operation to do the same.
In your first case (orm) you only have the I/O overhead, and in the second case, you have the I/O overhead + a function call, so I would go with the ORM's distinct.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With