I have an object that stores some data in a list. The implementation could change later, and I don't want to expose the internal implementation to the end user. However, the user must have the ability to modify and access this collection of data. Currently I have something like this:
public List<SomeDataType> getData() {
    return this.data;
}

public void setData(List<SomeDataType> data) {
    this.data = data;
}
Does this mean that I have allowed the internal implementation details to leak out? Should I be doing this instead?
public Collection<SomeDataType> getData() {
    return this.data;
}

public void setData(Collection<SomeDataType> data) {
    this.data = new ArrayList<SomeDataType>(data);
}
In a List, the data is kept in a particular order. A Set cannot contain the same element twice. A Collection is the most general of the three: it promises nothing about order, and whether it allows duplicates depends on the implementation.
The choice depends purely on the requirement. If you need only unique values, Set is your best bet, since every Set implementation rejects duplicates. If you need to maintain insertion order and duplicates are acceptable, List is the best option.
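To make the difference concrete, here is a minimal sketch. The class and method names are made up for illustration, and LinkedHashSet is chosen only so that the Set prints in a predictable order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class, not part of the question's code.
class ListVersusSetDemo {
    // Copies the input into a List: order and duplicates are both preserved.
    static List<String> asList(List<String> input) {
        return new ArrayList<>(input);
    }

    // Copies the input into a Set: duplicates collapse to a single element.
    // LinkedHashSet keeps first-insertion order; a plain HashSet would not.
    static Set<String> asSet(List<String> input) {
        return new LinkedHashSet<>(input);
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("b", "a", "b");
        System.out.println(asList(input)); // [b, a, b]
        System.out.println(asSet(input));  // [b, a]
    }
}
```

The same three inputs give three elements in the List and two in the Set, which is usually the deciding question when picking between them.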
A List is an ordered Collection (sometimes called a sequence). Lists may contain duplicate elements.
The Collections Framework is much higher level than arrays. It provides interfaces and classes that let us manage groups of objects in a more sophisticated way, with many methods already supplied by each collection type.
It depends: do you want your users to be able to index into the data? If so, use List. Both are interfaces, so you're not really leaking implementation details; you just need to decide the minimum functionality needed.
Returning a List is in line with programming to the Highest Suitable Interface.
Returning a Collection would leave the user guessing, since the returned collection could be a Set, a List, or a Queue.
Independent of the ability to index into the list via List.get(int), do the users (or you) have an expectation that the elements of the collection are in a reliable and predictable order? Can the collection have multiples of the same item? Both of these are expectations of lists that are not common to more general collections. These are the tests I use when determining which abstraction to expose to the end user.
When returning an implementation of an interface or class that is in a tall hierarchy, the rule of thumb is that the declared return type should be the HIGHEST level that provides the minimum functionality that you are prepared to guarantee to the caller, and that the caller reasonably needs. For example, suppose what you really return is an ArrayList. ArrayList implements List and Collection (among other things). If you expect the caller to need the get(int) method, then returning a Collection won't work; you'll need to return a List or an ArrayList. As long as you don't see any reason why you would ever change your implementation to use something other than a list (say, a Set), then the right answer is to return a List. ArrayList does have a few methods that List lacks, such as ensureCapacity(int) and trimToSize(), and the same reasoning would apply if the caller needed those.
On the other hand, once you do return a List instead of a Collection, you have locked in your implementation to some extent. The less you put in your API, the less restriction you put on future improvements.
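A small sketch of that trade-off, using invented names (ReturnTypeDemo, totalLength, first) and String elements in place of SomeDataType. Code written against Collection keeps working if the backing store changes from a list to a set; code that needs get(int) does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical demo class, not part of the question's code.
class ReturnTypeDemo {
    // Needs only iteration, so Collection is enough; any backing store works.
    static int totalLength(Collection<String> items) {
        int total = 0;
        for (String item : items) {
            total += item.length();
        }
        return total;
    }

    // Needs positional access, so the parameter has to be a List.
    static String first(List<String> items) {
        return items.get(0);
    }

    public static void main(String[] args) {
        List<String> asList = new ArrayList<>(Arrays.asList("ab", "c"));
        Collection<String> asSet = new LinkedHashSet<>(asList);
        System.out.println(totalLength(asList)); // 3
        System.out.println(totalLength(asSet));  // 3
        System.out.println(first(asList));       // ab
        // first(asSet) would not compile: Collection has no get(int).
    }
}
```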
(In practice, I almost always return a List in such situations, and it has never burned me. But I probably really should return a Collection.)
Using the most general type, which is Collection, makes the most sense unless there is some explicit reason to use the more specific type - List. But whatever you do, if this is an API for public consumption be clear in the documentation what it does; if it returns a shallow copy of the collection say so.
Yes, your first alternative does leak implementation details if it's not part of your interface contract that the method will always return a List. Also, allowing user code to replace your collection instance is somewhat dangerous, because the implementation they pass in may not behave as you expect.
Of course, it's all a matter of how much you trust your users. If you take the Python philosophy that "we're all consenting adults here" then the first method is just fine. If you think that your library will be used by inexperienced developers and you need to do all you can to "babysit" them and make sure they don't do something wrong then it's preferable not to let them set the collection and not to even return the actual collection. Instead return a (shallow) copy of it.
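For the cautious end of that spectrum, here is one possible sketch. DataStore and viewData are hypothetical names, and String stands in for SomeDataType. The setter copies whatever it is given, the getter hands out a shallow copy, and a second accessor shows the read-only-view alternative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical class illustrating defensive copies; names are assumptions.
class DataStore {
    private List<String> data = new ArrayList<>();

    // Shallow copy: callers can modify the result without touching our list.
    Collection<String> getData() {
        return new ArrayList<>(data);
    }

    // Read-only view: no copy is made, but mutation attempts throw
    // UnsupportedOperationException.
    Collection<String> viewData() {
        return Collections.unmodifiableCollection(data);
    }

    // Copies the argument, so we never hold on to the caller's own instance.
    void setData(Collection<String> newData) {
        this.data = new ArrayList<>(newData);
    }

    public static void main(String[] args) {
        List<String> callerList = new ArrayList<>(Arrays.asList("x", "y"));
        DataStore store = new DataStore();
        store.setData(callerList);

        callerList.add("z");       // does not affect the store
        store.getData().add("w");  // modifies only the returned copy
        System.out.println(store.getData()); // [x, y]
    }
}
```

The copies are shallow: the lists are independent, but the elements themselves are shared, which matters if the element type is mutable.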