Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it better to return null or empty collection?

Tags:

c#

collections

People also ask

Is it good practice to return null?

Returning Null is Bad Practice The FirstOrDefault method silently returns null if no order is found in the database. There are a couple of problems here: Callers of GetOrder method must implement null reference checking to avoid getting a NullReferenceException when accessing Order class members.

Should API return null or empty array?

Non existing numbers, strings, and booleans are usually represented as null . But string fields without value should also be represented as null , not "" . Note: Empty arrays shall not be null . If a field is some kind of list and it is represented by an array, an array shall be returned, even if empty.

What is the correct way of returning an empty collection?

Collections. emptyList() : method used to return an empty list. Collections. emptySet() : method used to return an empty set.

What happens when return null?

Returning null Creates More Work An ideal function, like an assistant cook, will encapsulate work and produce something useful. A function that returns a null reference achieves neither goal. Returning null is like throwing a time bomb into the software. Other code must a guard against null with if and else statements.


Empty collection. Always.

This sucks:

if(myInstance.CollectionProperty != null)
{
  foreach(var item in myInstance.CollectionProperty)
    /* arrgh */
}

It is considered a best practice to NEVER return null when returning a collection or enumerable. ALWAYS return an empty enumerable/collection. It prevents the aforementioned nonsense, and prevents your car getting egged by co-workers and users of your classes.

When talking about properties, always set your property once and forget it

public List<Foo> Foos {public get; private set;}

public Bar() { Foos = new List<Foo>(); }

In .NET 4.6.1, you can condense this quite a lot:

public List<Foo> Foos { get; } = new List<Foo>();

When talking about methods that return enumerables, you can easily return an empty enumerable instead of null...

public IEnumerable<Foo> GetMyFoos()
{
  return InnerGetFoos() ?? Enumerable.Empty<Foo>();
}

Using Enumerable.Empty<T>() can be seen as more efficient than returning, for example, a new empty collection or array.


From the Framework Design Guidelines 2nd Edition (pg. 256):

DO NOT return null values from collection properties or from methods returning collections. Return an empty collection or an empty array instead.

Here's another interesting article on the benefits of not returning nulls (I was trying to find something on Brad Abram's blog, and he linked to the article).

Edit- as Eric Lippert has now commented to the original question, I'd also like to link to his excellent article.


Depends on your contract and your concrete case. Generally it's best to return empty collections, but sometimes (rarely):

  • null might mean something more specific;
  • your API (contract) might force you to return null.

Some concrete examples:

  • an UI component (from a library out of your control), might be rendering an empty table if an empty collection is passed, or no table at all, if null is passed.
  • in a Object-to-XML (JSON/whatever), where null would mean the element is missing, while an empty collection would render a redundant (and possibly incorrect) <collection />
  • you are using or implementing an API which explicitly states that null should be returned/passed

There is one other point that hasn't yet been mentioned. Consider the following code:

    public static IEnumerable<string> GetFavoriteEmoSongs()
    {
        yield break;
    }

The C# Language will return an empty enumerator when calling this method. Therefore, to be consistant with the language design (and, thus, programmer expectations) an empty collection should be returned.


Empty is much more consumer friendly.

There is a clear method of making up an empty enumerable:

Enumerable.Empty<Element>()

It seems to me that you should return the value that is semantically correct in context, whatever that may be. A rule that says "always return an empty collection" seems a little simplistic to me.

Suppose in, say, a system for a hospital, we have a function that is supposed to return a list of all previous hospitalizations for the past 5 years. If the customer has not been in the hospital, it makes good sense to return an empty list. But what if the customer left that part of the admittance form blank? We need a different value to distinguish "empty list" from "no answer" or "don't know". We could throw an exception, but it's not necessarily an error condition, and it doesn't necessarily drive us out of the normal program flow.

I've often been frustrated by systems that cannot distinguish between zero and no answer. I've had a number of times where a system has asked me to enter some number, I enter zero, and I get an error message telling me that I must enter a value in this field. I just did: I entered zero! But it won't accept zero because it can't distinguish it from no answer.


Reply to Saunders:

Yes, I'm assuming that there's a difference between "Person didn't answer the question" and "The answer was zero." That was the point of the last paragraph of my answer. Many programs are unable to distinguish "don't know" from blank or zero, which seems to me a potentially serious flaw. For example, I was shopping for a house a year or so ago. I went to a real estate web site and there were many houses listed with an asking price of $0. Sounded pretty good to me: They're giving these houses away for free! But I'm sure the sad reality was that they just hadn't entered the price. In that case you may say, "Well, OBVIOUSLY zero means they didn't enter the price -- nobody's going to give a house away for free." But the site also listed the average asking and selling prices of houses in various towns. I can't help but wonder if the average didn't include the zeros, thus giving an incorrectly low average for some places. i.e. what is the average of $100,000; $120,000; and "don't know"? Technically the answer is "don't know". What we probably really want to see is $110,000. But what we'll probably get is $73,333, which would be completely wrong. Also, what if we had this problem on a site where users can order on-line? (Unlikely for real estate, but I'm sure you've seen it done for many other products.) Would we really want "price not specified yet" to be interpreted as "free"?

RE having two separate functions, an "is there any?" and an "if so, what is it?" Yes, you certainly could do that, but why would you want to? Now the calling program has to make two calls instead of one. What happens if a programmer fails to call the "any?" and goes straight to the "what is it?" ? Will the program return a mis-leading zero? Throw an exception? Return an undefined value? It creates more code, more work, and more potential errors.

The only benefit I see is that it enables you to comply with an arbitrary rule. Is there any advantage to this rule that makes it worth the trouble of obeying it? If not, why bother?


Reply to Jammycakes:

Consider what the actual code would look like. I know the question said C# but excuse me if I write Java. My C# isn't very sharp and the principle is the same.

With a null return:

HospList list=patient.getHospitalizationList(patientId);
if (list==null)
{
   // ... handle missing list ...
}
else
{
  for (HospEntry entry : list)
   //  ... do whatever ...
}

With a separate function:

if (patient.hasHospitalizationList(patientId))
{
   // ... handle missing list ...
}
else
{
  HospList=patient.getHospitalizationList(patientId))
  for (HospEntry entry : list)
   // ... do whatever ...
}

It's actually a line or two less code with the null return, so it's not more burden on the caller, it's less.

I don't see how it creates a DRY issue. It's not like we have to execute the call twice. If we always wanted to do the same thing when the list does not exist, maybe we could push handling down to the get-list function rather than having the caller do it, and so putting the code in the caller would be a DRY violation. But we almost surely don't want to always do the same thing. In functions where we must have the list to process, a missing list is an error that might well halt processing. But on an edit screen, we surely don't want to halt processing if they haven't entered data yet: we want to let them enter data. So handling "no list" must be done at the caller level one way or another. And whether we do that with a null return or a separate function makes no difference to the bigger principle.

Sure, if the caller doesn't check for null, the program could fail with a null-pointer exception. But if there's a separate "got any" function and the caller doesn't call that function but blindly calls the "get list" function, then what happens? If it throws an exception or otherwise fails, well, that's pretty much the same as what would happen if it returned null and didn't check for it. If it returns an empty list, that's just wrong. You're failing to distinguish between "I have a list with zero elements" and "I don't have a list". It's like returning zero for the price when the user didn't enter any price: it's just wrong.

I don't see how attaching an additional attribute to the collection helps. The caller still has to check it. How is that better than checking for null? Again, the absolute worst thing that could happen is for the programmer to forget to check it, and give incorrect results.

A function that returns null is not a surprise if the programmer is familiar with the concept of null meaning "don't have a value", which I think any competent programmer should have heard of, whether he thinks it's a good idea or not. I think having a separate function is more of a "surprise" problem. If a programmer is unfamiliar with the API, when he runs a test with no data he'll quickly discover that sometimes he gets back a null. But how would he discover the existence of another function unless it occurred to him that there might be such a function and he checks the documentation, and the documentation is complete and comprehensible? I would much rather have one function that always gives me a meaningful response, rather than two functions that I have to know and remember to call both.