Question: What are the pros and cons of writing an __init__ that takes a collection directly as an argument, rather than unpacking its contents?
Context: I'm writing a class to process data from several fields in a database table. I iterate through some large (~100 million rows) query result, passing one row at a time to a class that performs the processing. Each row is retrieved from the database as a tuple (or optionally, as a dictionary).
Discussion: Assume I'm interested in exactly three fields, but what gets passed into my class depends on the query, and the query is written by the user. The most basic approach might be one of the following:
class Direct:
    def __init__(self, names):
        self.names = names

class Simple:
    def __init__(self, names):
        self.name1 = names[0]
        self.name2 = names[1]
        self.name3 = names[2]

class Unpack:
    def __init__(self, names):
        self.name1, self.name2, self.name3 = names
Here are some examples of rows that might be passed to a new instance:
good = ('Simon', 'Marie', 'Kent') # Exactly what we want
bad1 = ('Simon', 'Marie', 'Kent', '10 Main St') # Extra field(s) behind
bad2 = ('15', 'Simon', 'Marie', 'Kent') # Extra field(s) in front
bad3 = ('Simon', 'Marie') # Forgot a field
When faced with the above, Direct always runs (at least to this point) but is very likely to be buggy (GIGO). It takes one argument and assigns it exactly as given, so this could be a tuple or list of any size, None, a function reference, etc. This is the most quick-and-dirty way I can think of to initialize the object, but I feel like the class should complain immediately when I give it data it's clearly not designed to handle.
Simple handles bad1 correctly, is buggy when given bad2, and throws an error when given bad3. It's convenient to be able to effectively truncate the inputs from bad1, but not worth the bugs that would come from bad2. This one feels naive and inconsistent.
Unpack seems like the safest approach, because it throws an error in all three "bad" cases. The last thing we want to do is silently fill our database with bad information, right? It takes the tuple directly, but allows me to identify its contents as distinct attributes instead of forcing me to keep referring to indices, and complains if the tuple is the wrong size.
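To make the comparison concrete, here is a small sketch that feeds each of the four sample rows to the three classes and records what happens. The helper `outcome` and the `rows` dictionary are made up for illustration; the classes are the ones defined above.

```python
class Direct:
    def __init__(self, names):
        self.names = names


class Simple:
    def __init__(self, names):
        self.name1 = names[0]
        self.name2 = names[1]
        self.name3 = names[2]


class Unpack:
    def __init__(self, names):
        self.name1, self.name2, self.name3 = names


rows = {
    'good': ('Simon', 'Marie', 'Kent'),
    'bad1': ('Simon', 'Marie', 'Kent', '10 Main St'),
    'bad2': ('15', 'Simon', 'Marie', 'Kent'),
    'bad3': ('Simon', 'Marie'),
}


def outcome(cls, row):
    """Hypothetical helper: return 'ok' or the name of the exception raised."""
    try:
        cls(row)
    except (IndexError, ValueError) as exc:
        return type(exc).__name__
    return 'ok'


# Map each class name to the outcome for every sample row.
results = {
    cls.__name__: {label: outcome(cls, row) for label, row in rows.items()}
    for cls in (Direct, Simple, Unpack)
}
for name, per_row in results.items():
    print(name, per_row)
```

Note that 'ok' only means no exception was raised: Simple reports 'ok' for bad2 even though name1 ends up holding '15'.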
On the other hand, why pass a collection at all? Since I know I always want three fields, I can define __init__ to explicitly accept three arguments, and unpack the collection using the *-operator as I pass it to the new object:
class Explicit:
    def __init__(self, name1, name2, name3):
        self.name1 = name1
        self.name2 = name2
        self.name3 = name3

names = ('Guy', 'Rose', 'Deb')
e = Explicit(*names)
The only differences I see are that the __init__ definition is a bit more verbose and we raise TypeError instead of ValueError when the tuple is the wrong size. Philosophically, it seems to make sense that if we are taking some group of data (a row of a query) and examining its parts (three fields), we should pass a group of data (the tuple) but store its parts (the three attributes). So Unpack would be better.
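The difference in exception type can be checked directly. This is a minimal sketch using a two-element tuple; the names `too_short` and `errors` are just for illustration.

```python
class Unpack:
    def __init__(self, names):
        self.name1, self.name2, self.name3 = names


class Explicit:
    def __init__(self, name1, name2, name3):
        self.name1 = name1
        self.name2 = name2
        self.name3 = name3


too_short = ('Simon', 'Marie')

errors = {}
try:
    Unpack(too_short)      # tuple unpacking fails inside __init__
except ValueError as exc:
    errors['Unpack'] = exc
try:
    Explicit(*too_short)   # the call itself fails: missing positional argument
except TypeError as exc:
    errors['Explicit'] = exc

for name, exc in errors.items():
    print(f'{name}: {type(exc).__name__}: {exc}')
```

In the Explicit case the error is raised at the call site before __init__ runs, whereas Unpack fails inside the method body.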
If I wanted to accept an indeterminate number of fields, rather than always three, I still have the choice to pass the tuple directly or use arbitrary argument lists (*args, **kwargs) and *-operator unpacking. So I'm left wondering, is this a completely neutral style decision?
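For the variable-length case, the two options might look like the sketch below. The class names `DirectAny` and `StarArgs` are hypothetical and not part of the original question.

```python
class DirectAny:
    """Hypothetical variant: receives the row as a single collection."""
    def __init__(self, names):
        self.names = tuple(names)


class StarArgs:
    """Hypothetical variant: receives the fields as separate positional arguments."""
    def __init__(self, *names):
        self.names = names  # *names always arrives as a tuple


row = ('Guy', 'Rose', 'Deb', 'Ann')
a = DirectAny(row)    # pass the collection itself
b = StarArgs(*row)    # unpack the collection at the call site
print(a.names == b.names)
```

Both objects end up storing the same tuple; the difference is only in who does the unpacking, the caller or the class.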
In Python, function arguments can be packed or unpacked. Unpacking: in a function call, we can unpack a list, tuple, range, or dict and pass its items as separate arguments. * unpacks an iterable into positional arguments. ** unpacks a dict into keyword arguments.
Note that pack and unpack are also the names of unrelated functions in some other languages, such as Perl. There, the pack function takes a list of values to be packed as a second argument and returns a scalar character string containing the packed values, and the unpack function takes a character string containing the values to be unpacked as a second argument and returns a list of individual values extracted from the string.
The asterisks are unpacking operators that unpack the values from iterable objects in Python. The single asterisk operator (*) commonly associated with args can be used on any iterable. The double asterisk (**) associated with kwargs can only be used on mappings, such as dictionaries.
The * operator is an unpacking operator that will unpack the values from any iterable object, such as lists, tuples, strings, etc… And that's it!
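A short sketch of both operators at a call site; the function `full_name` and its parameter names are invented for this example.

```python
def full_name(first, middle, last):
    """Hypothetical function used only to show argument unpacking."""
    return f'{first} {middle} {last}'


as_tuple = ('Simon', 'Marie', 'Kent')
as_dict = {'first': 'Simon', 'middle': 'Marie', 'last': 'Kent'}

from_tuple = full_name(*as_tuple)   # positional: order of the tuple matters
from_dict = full_name(**as_dict)    # keyword: keys must match parameter names
from_string = full_name(*'abc')     # any iterable works, even a string

print(from_tuple)
print(from_dict)
print(from_string)
```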
This question is probably best answered by trying out the different approaches and seeing what makes the most sense to you and is the most easily understood by others reading your code.
Now that I have the benefit of more experience, I'd ask myself, how do I plan to access these values?
When I access any one of the values in this collection, am I likely to be using most or all of the values in that same subroutine or section of code? If so, the "Direct" approach is a good choice; it's the most compact and it lets me think about the collection as a collection until the point that I absolutely need to pay attention to what's inside.
On the other hand, if I'm using some values here, some values there, I don't want to have to constantly remember which index to access or add verbosity in the form of dictionary keys when I could just be referring directly to the values using separately named attributes. I would probably avoid the "Direct" approach in this case so that I only have to even think about the fact that there's a collection when the class is first initialized.
Each of the remaining approaches involves splitting the collection up into different attributes, and I think the clear winner here is the "Explicit" approach. The "Simple" and "Unpack" approaches share a hidden dependency on the order of the collection, without offering any real advantage.
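One way to see the point about ordering: because "Explicit" names its parameters, a row retrieved as a dictionary can be passed with ** and the field order stops mattering. This sketch assumes the dictionary keys happen to match the parameter names, which may not be true of a real query result.

```python
class Explicit:
    def __init__(self, name1, name2, name3):
        self.name1 = name1
        self.name2 = name2
        self.name3 = name3


# Assumption: the row arrives as a dict keyed by the parameter names,
# in whatever order the query produced.
row = {'name3': 'Deb', 'name1': 'Guy', 'name2': 'Rose'}
e = Explicit(**row)
print(e.name1, e.name2, e.name3)

# An unexpected extra field is rejected at the call site.
try:
    Explicit(**row, address='10 Main St')
    extra_rejected = False
except TypeError:
    extra_rejected = True
print(extra_rejected)
```

With positional unpacking (Explicit(*names)) the order still matters, but it is the caller who decides it, not the class.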