Why is ORM considered good but "select *" considered bad?

Tags:

orm

Doesn't an ORM usually involve doing something like a select *?

If I have a table, MyThing, with column A, B, C, D, etc, then there typically would be an object, MyThing with properties A, B, C, D.

It would be evil if that object were incompletely instantiated by a select statement that looked like this, only fetching the A, B, not the C, D:

select A, B from MyThing /* don't get C and D, because we don't need them */

but it would also be evil to always do this:

select A, B, C, D /* get all the columns so that we can completely instantiate the MyThing object */

Does ORM make an assumption that database access is so fast now you don't have to worry about it and so you can always fetch all the columns?

Or, do you have different MyThing objects, one for each combo of columns that might happen to be in a select statement?

EDIT: Before you answer the question, please read Nicholas Piasecki's and Bill Karwin's answers. I guess I asked my question poorly because many misunderstood it, but Nicholas understood it 100%. Like him, I'm interested in other answers.

EDIT #2: Links that relate to this question:

Why do we need entity objects?

http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx, especially the section "The Partial-Object Problem and the Load-Time Paradox"

http://groups.google.com/group/comp.object/browse_thread/thread/853fca22ded31c00/99f41d57f195f48b?

http://www.martinfowler.com/bliki/AnemicDomainModel.html

http://database-programmer.blogspot.com/2008/06/why-i-do-not-use-orm.html

825

asked Nov 15 '08 00:11

Corey Trager

8 Answers

In my limited experience, things are as you describe--it's a messy situation and the usual cop-out "it depends" answer applies.

A good example would be the online store that I work for. It has a Brand object, and on the main page of the Web site, all of the brands that the store sells are listed on the left side. To display this menu of brands, all the site needs is the integer BrandId and the string BrandName. But the Brand object contains a whole boatload of other properties, most notably a Description property that can contain a substantially large amount of text about the Brand. No two ways about it, loading all of that extra information about the brand just to spit out its name in an unordered list is (1) measurably and significantly slow, usually because of the large text fields and (2) pretty inefficient when it comes to memory usage, building up large strings and not even looking at them before throwing them away.

One option provided by many ORMs is to lazy load a property. So we could have a Brand object returned to us, but that time-consuming and memory-wasting Description field is not until we try to invoke its get accessor. At that point, the proxy object will intercept our call and suck down the description from the database just in time. This is sometimes good enough but has burned me enough times that I personally don't recommend it:

It's easy to forget that the property is lazy-loaded, introducing a SELECT N+1 problem just by writing a foreach loop. Who knows what happens when LINQ gets involved.
What if the just-in-time database call fails because the transport got flummoxed or the network went out? I can almost guarantee that any code that is doing something as innocuous as string desc = brand.Description was not expecting that simple call to toss a DataAccessException. Now you've just crashed in a nasty and unexpected way. (Yes, I've watched my app go down hard because of just that. Learned the hard way!)

So what I've ended up doing is that in scenarios that require performance or are prone to database deadlocks, I create a separate interface that the Web site or any other program can call to get access to specific chunks of data that have had their query plans carefully examined. The architecture ends up looking kind of like this (forgive the ASCII art):

Web Site:         Controller Classes
                     |
                     |---------------------------------+
                     |                                 |
App Server:       IDocumentService               IOrderService, IInventoryService, etc
                  (Arrays, DataSets)             (Regular OO objects, like Brand)
                     |                                 |
                     |                                 |
                     |                                 |
Data Layer:       (Raw ADO.NET returning arrays, ("Full cream" ORM like NHibernate)
                   DataSets, simple classes)

I used to think that this was cheating, subverting the OO object model. But in a practical sense, as long as you do this shortcut for displaying data, I think it's all right. The updates/inserts and what have you still go through the fully-hydrated, ORM-filled domain model, and that's something that happens far less frequently (in most of my cases) than displaying particular subsets of the data. ORMs like NHibernate will let you do projections, but by that point I just don't see the point of the ORM. This will probably be a stored procedure anyway, writing the ADO.NET takes two seconds.

This is just my two cents. I look forward to reading some of the other responses.

148

answered Sep 28 '22 06:09

Nicholas Piasecki

People use ORM's for greater development productivity, not for runtime performance optimization. It depends on the project whether it's more important to maximize development efficiency or runtime efficiency.

In practice, one could use the ORM for greatest productivity, and then profile the application to identify bottlenecks once you're finished. Replace ORM code with custom SQL queries only where you get the greatest bang for the buck.

SELECT * isn't bad if you typically need all the columns in a table. We can't generalize that the wildcard is always good or always bad.

edit: Re: doofledorfer's comment... Personally, I always name the columns in a query explicitly; I never use the wildcard in production code (though I use it when doing ad hoc queries). The original question is about ORMs -- in fact it's not uncommon that ORM frameworks issue a SELECT * uniformly, to populate all the fields in the corresponding object model.

Executing a SELECT * query may not necessarily indicate that you need all those columns, and it doesn't necessarily mean that you are neglectful about your code. It could be that the ORM framework is generating SQL queries to make sure all the fields are available in case you need them.

answered Sep 28 '22 06:09

Bill Karwin

Linq to Sql, or any implementation of IQueryable, uses a syntax which ultimately puts you in control of the selected data. The definition of a query is also the definition of its result set.

This neatly avoids the select * issue by removing data shape responsibilities from the ORM.

For example, to select all columns:

from c in data.Customers
select c

To select a subset:

from c in data.Customers
select new
{
  c.FirstName,
  c.LastName,
  c.Email
}

To select a combination:

from c in data.Customers
join o in data.Orders on c.CustomerId equals o.CustomerId
select new
{
  Name = c.FirstName + " " + c.LastName,
  Email = c.Email,
  Date = o.DateSubmitted
}

answered Sep 29 '22 06:09

Bryan Watts

There are two separate issues to consider.

To begin, it is quite common when using an ORM for the table and the object to have quite different "shapes", this is one reason why many ORM tools support quite complex mappings.

A good example is when a table is partially denormalised, with columns containing redundant information (often, this is done to improve query or reporting performance). When this occurs, it is more efficient for the ORM to request just the columns it requires, than to have all the extra columns brought back and ignored.

The question of why "Select *" is evil is separate, and the answer falls into two halves.

When executing "select *" the database server has no obligation to return the columns in any particular order, and in fact could reasonably return the columns in a different order every time, though almost no databases do this.

Problem is, when a typical developer observes that the columns returned seem to be in a consistent order, the assumption is made that the columns will always be in that order, and then you have code making unwarranted assumptions, just waiting to fail. Worse, that failure may not be fatal, but may simply involve, say, using Year of Birth in place of Account Balance.

The other issue with "Select *" revolves around table ownership - in many large companies, the DBA controls the schema, and makes changes as required by major systems. If your tool is executing "select *" then you only get the current columns - if the DBA has removed a redundant column that you need, you get no error, and your code may blunder ahead causing all sorts of damage. By explicitly requesting the fields you require, you ensure that your system will break rather than process the wrong information.

answered Sep 30 '22 06:09

Bevan

I am not sure why you would want a partially hydrated object. Given a class of Customer with properties of Name, Address, Id. I would want them all to create a fully populated Customer object.

The list hanging off of Customers called Orders can be lazily loaded when accessed though most ORMs. And NHibernate anyway allows you to do projections into other objects. So if you had say a simply customer list where you displayed the ID and Name, you can create an object of type CustomerListDisplay and project your HQL query into that object set and only obtain the columns you need from the database.

Friends don't let friends premature optimize. Fully hydrate your object, lazy load it's associations. And then profile your application looking for problems and optimize the problem areas.

answered Sep 28 '22 06:09

NotMyself

Even ORMs need to avoid SELECT * to be effective, by using lazy loading etc.

And yes, SELECT * is generally a bad idea if you aren't consuming all the data.

So, do you have different kinds of MyThing objects, one for each column combo? – Corey Trager (Nov 15 at 0:37)

No, I have read-only digest objects (which only contain important information) for things like lookups and massive collections and convert these to fully hydrated objects on demand. – Cade Roux (Nov 15 at 1:22)

answered Sep 30 '22 06:09

Cade Roux

The case you describe is a great example of how ORM is not a panacea. Databases offer flexible, needs-based access to their data primarily through SQL. As a developer, I can easily and simply get all the data (SELECT *) or some of the data (SELECT COL1, COL2) as needed. My mechanism for doing this will be easily understood by any other developer taking over the project.

In order to get the same flexibility from ORM, a lot more work has to be done (either by you or the ORM developers) just to get you back to the place under the hood where you're either getting all or some of the columns from the database as needed (see the excellent answers above to get a sense of some of the problems). And all this extra stuff is just more stuff that can fail, making an ORM system intrinsically less reliable than straight SQL calls.

This is not to say that you shouldn't use ORM (my standard disclaimer is that all design choices have costs and benefits, and the choice of one or the other just depends) - knock yourself out if it works for you. I will say that I truly don't understand the popularity of ORM, given the amount of extra un-fun work it seems to create for its users. I'll stick with using SELECT * when (wait for it) I need to get every column from a table.

answered Oct 02 '22 06:10

MusiGenesis

ORMs in general do not rely on SELECT *, but rely on better methods to find columns like defined data map files (Hibernate, variants of Hibernate, and Apache iBATIS do this). Something a bit more automatic could be set up by querying the database schema to get a list of columns and their data types for a table. How the data gets populated is specific to the particular ORM you are using, and it should be well-documented there.

It is never a good idea to select data that you do not use at all, as it can create a needless code dependency that can be obnoxious to maintain later. For dealing with data internal to the class, things are a bit more complicated.

A short rule would be to always fetch all the data that the class stores by default. In most cases, a small amount of overhead won't make a huge difference, so your main goal is to reduce maintenance overhead. Later, when you performance profiling of the code, and have reason to believe that it may benefit from adjusting the behavior, that is the time to do it.

If I saw an ORM make SELECT * statements, either visibly or under its covers, then I would look elsewhere to fulfill my database integration needs.

answered Sep 30 '22 06:09

Lara Dougan

Related questions
                            
                                Best ORM to use with C# 4.0 [closed]
                            
                                Is an ORM redundant with a NoSQL API?
                            
                                What is a good alternative for static stored properties of generic types in swift?
                            
                                can we use jpa without hibernate
                            
                                Safely clearing Hibernate session in the middle of large transaction
                            
                                Spring 3.1 entityManagerFactory java.lang.NoSuchFieldError: NULL Error
                            
                                Does TypeORM supports raw SQL queries for input and output?
                            
                                How do I use the SQL WHERE IN construct with PetaPoco?
                            
                                What does JPA EntityManager.getSingleResult() return for a COUNT query?
                            
                                JPA: When to choose Multivalued Association vs. Element Collection Mapping
                            
                                What are the principles behind, and benefits of, the "party model"?
                            
                                "where exists" in Hibernate HQL
                            
                                a new object was found through a relationship that was not marked cascade PERSIST
                            
                                No operator matches the given name and argument type(s). You might need to add explicit type casts. -- Netbeans, Postgresql 8.4 and Glassfish
                            
                                Why is loading SQLAlchemy objects via the ORM 5-8x slower than rows via a raw MySQLdb cursor?
                            
                                NHibernate or Fluent NHibernate? [closed]
                            
                                What exactly is the difference between a data mapper and a repository?
                            
                                Hibernate exception _$$_javassist_0 cannot be cast to javassist.util.proxy.Proxy
                            
                                Is it possible to output generated SQL using EclipseLink without having to increase log verbosity?
                            
                                SQLAlchemy is convoluted? [closed]

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Why is ORM considered good but "select *" considered bad?

Tags:

orm

Corey Trager

People also ask

8 Answers

Nicholas Piasecki

Bill Karwin

Bryan Watts

Bevan

NotMyself

Cade Roux

MusiGenesis

Lara Dougan

Recent Activity

Donate For Us