I am struggling to understand how to best query a repository.
The three factors that are throwing me through a loop right now are:
Point 1
In regards to question one:
I have Repositories with lot of methods that return a combination of both Entities and scalar values. This seems to lead to "method explosion". Should I always return an Entity object? How should I query for objects where I only need one column?
Point 2 When running a query should I include every column in the table even if I only need one, or two columns? If I create specific queries for this it leads to more methods in the Repository
Point 3 How should I provide conditions for the query? I read about Specifications, but my understanding is that you loop through the returned records and filter out the ones that pass into a new collection. This doesn't seem like a good idea performance wise. Right Now I just make a new method in the Repo like getNameById() which encapsulates the condition.
Please not that I am not using an ORM, I just have raw sql in my Repositories.
Update
Point 1: Based on the answers and a bit more research would this be a good implementation?
Right now I have a large repository that return a mix of scalar and entity type objects (all same entity). I'm thinking I could reduce this greatly if I just use a GetUser(userId) method and forget writing methods that just return single column values.
For example if I need to return a user name I could call the GetUser(userId) method that hydrates the User object and then in the service layer just filter it down to the username.
Another way would be to use some sort of QueryBuilder class I could pass into the Repository which could be parsed to generate the proper sql.
Point 2
Looking back this is pretty similar to point one and my current solution would be to just grab all table fields. It's a tradeoff between performance and maintainability.
Point 3
I would need to provide some sort of where clause. I'm not sure if this make sense doing via Specification or just a sql string. My current solution is to make new methods for these types, but I would like something more generic for the Repository
Overall, still researching into this... I'd love to hear more input into this or links to books or references that kind of tie this all together.
Raw SQL, sometimes also called native SQL, is the most basic, most low-level form of database interaction. You tell the database what to do in the language of the database. Most developers should know basics of SQL. This means how to CREATE tables and views, how to SELECT and JOIN data, how to UPDATE and DELETE data.
ORMs provide a code-first API to express queries, whereas raw SQL queries are often expressed as plain strings. These query strings are often more concise than the equivalent ORM operation and allow queries to be represented in a language-agnostic way.
Methods. To measure the performance of both techniques, we developed a blog application and we ran database operations select, insert and update in both techniques. Conclusions. Results indicated that overall Raw SQL performed better than Eloquent ORM in our database operations.
Django uses the primary key to identify model instances, so it must always be included in a raw query.
I have Repositories with lot of methods that return a combination of both Entities and scalar values. This seems to lead to "method explosion". Should I always return an Entity object? How should I query for objects where I only need one column?
You can fight repository method explosion similar to how you would fight other SRP violations. You can create another repository for the same entity. See this answer to a similar question.
When running a query should I include every column in the table even if I only need one, or two columns? If I create specific queries for this it leads to more methods in the Repository
This is not a DDD question. Domain driven design does not deal with 'rows and columns'. There is always some redundancy in how much data you load to 'hydrate' the domain object, but you have to measure whether this really affects your performance. If this is really a performance bottleneck than it maybe a symptom of incorrect domain model.
How should I provide conditions for the query? I read about Specifications, but my understanding is that you loop through the returned records and filter out the ones that pass into a new collection. This doesn't seem like a good idea performance wise. Right Now I just make a new method in the Repo like getNameById() which encapsulates the condition.
This again is a data access issue. Nothing in DDD says that your repository can not convert Specification to a SQL query. It is up to you whether you do this or iterate over records in memory (as long as repository consumer only sees Specification and Repository and stays unaware of the actual implementation).
Regarding 'Raw SQL vs. ORM in DDD' you may find this answer interesting.
I agree with everything Dmitry says, but perhaps think you should have a read of CQRS.
I used to ask similar questions when getting started with DDD (regarding 'method explosion', not your SQL issues), and this lead me to CQRS. Personally, I don't really see how DDD is practical without it, and it answers a lot of these sorts of questions when it comes to querying data. Using it's principles what I'd suggest is:
This keeps your proper domain & repositories clean, whilst providing the means to create a thin data access layer that provides high performing queries.
Regarding the specification pattern: rather than converting it to a SQL query in code, you could provide public properties on the specification that represent the criteria. These values could then be added in the where clause of your SQL or sent as parameters to a SPROC.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With