Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQLAlchemy - only one result being returned when count() says there are more

I'm having trouble with a really large result set only returning one row.

Session.query(TestSet).join(Instance).count()
>> 4283878
Session.query(TestSet).join(Instance).offset(0).limit(100).count()
>> 100
Session.query(TestSet).join(Instance).offset(0).limit(100).all()
>> [<model.testset.TestSet object at 0x043EC2F0>]

That is, all returns only one instance of my model, instead of 100. Now, for something even stranger:

len(Session.query(TestSet).join(Instance).offset(0).limit(100).distinct().all())
>> 100

So if I add distinct before all, I get back all 100 results. What's going on here?

like image 740
Sarah Vessels Avatar asked Dec 04 '22 07:12

Sarah Vessels


1 Answers

The Query object, when asked to iterate through results representing an entity like TestSet, performs uniquing on the result rows based on object identity, so that if the query were to return 100 rows each with the same TestSet primary key, you'd get only one result object back. This behavior has its origins in the "eager joining" feature of Query, where it's often the case that many result rows are being received each with the same primary identity, but also containing a varying secondary identity of a related row that's to be populated into a collection upon each primary identity - only one instance of the primary identity is desirable in this very common case.

Let's then consider what distinct() does. Suppose your query for 4M objects returns 1000 rows with id=1, 1000 rows with id=2, etc. The query with limit(100) hits the first 100 rows with id=1, Query uniquifies, and you get one result object back, since they are all id=1. But with distinct(), suddenly we are getting 100 rows with distinct identities, i.e. "id=1", "id=2", "id=3". Query then assigns each of these rows to a new TestSet object in the identity map, and you get 100 rows back.

Setting echo='debug' on your Engine temporarily will show the SQL being emitted as well as the result rows coming back. When you see many result rows all with the same primary key, you know that Query when asked to return full entities is going to unique all those redundant identities down to the single object represented for each row.

like image 158
zzzeek Avatar answered May 24 '23 23:05

zzzeek