Often I find myself running into the same question. A common pattern is that I create a class that performs some operations, e.g. loads data, transforms/cleans data, saves data. The question then arises of how to pass/save intermediate data. Look at the following two options:
    # read_csv_as_string and store_data_to_database are placeholder helpers,
    # assumed to live in a hypothetical module named `helpers`.
    from helpers import read_csv_as_string, store_data_to_database

    class DataManipulator:
        ''' Intermediate data states are saved in self.results'''

        def __init__(self):
            self.results = None

        def load_data(self):
            '''do stuff to load data, set self.results'''
            self.results = read_csv_as_string('some_file.csv')

        def transform(self):
            ''' transforms data, eg get first 10 chars'''
            transformed = self.results[:10]
            self.results = transformed

        def save_data(self):
            ''' stores string to database'''
            store_data_to_database(self.results)

        def run(self):
            self.load_data()
            self.transform()
            self.save_data()

    DataManipulator().run()
    class DataManipulator2:
        ''' Intermediate data states are not saved but passed along'''

        def load_data(self):
            ''' do stuff to load data, return results'''
            return read_csv_as_string('some_file.csv')

        def transform(self, results):
            ''' transforms data, eg get first 10 chars'''
            return results[:10]

        def save_data(self, data):
            ''' stores string to database'''
            store_data_to_database(data)

        def run(self):
            results = self.load_data()
            transformed_results = self.transform(results)
            self.save_data(transformed_results)

    DataManipulator2().run()
Now for writing tests, I find DataManipulator2 better since functions can be tested more easily in isolation. At the same time I also like the clean run function of DataManipulator. What is the most pythonic way?
Unlike what was said in the other answers, I don't think this is a matter of personal taste.
As you wrote, DataManipulator2 seems, at first sight, easier to test. (But as @AliFaizan stated, it's not so easy to unit test a function that needs a database connection.) And it seems easier to test because it's stateless. A stateless class is not automatically easier to test, but it is easier to understand: for one input, you always get the same output.
But that's not the only point: with DataManipulator2, the order of the actions in run can't be wrong, because each function passes some data to the next one, and the next one can't proceed without this data. That would be more obvious with a statically (and strongly) typed language, because you couldn't even compile an erroneous run function.
By contrast, DataManipulator is not easily testable, is stateful, and doesn't ensure the order of the actions. That's why the method DataManipulator.run is so clean. It's even too clean, because its implementation hides something very important: the function calls are ordered.
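To make the ordering point concrete, here is a minimal sketch. The class name StatefulManipulator and the hard-coded string are made up for illustration and stand in for the CSV helper. It shows that the stateful style accepts a wrongly ordered run and only fails once the code actually executes:

```python
class StatefulManipulator:
    """Hypothetical stand-in for DataManipulator, without file or DB access."""

    def __init__(self):
        self.results = None

    def load_data(self):
        # Assumption: a literal string replaces read_csv_as_string().
        self.results = "0123456789abcdef"

    def transform(self):
        self.results = self.results[:10]

    def run_wrong_order(self):
        # Nothing in the method signatures prevents this ordering mistake.
        self.transform()
        self.load_data()


def try_wrong_order():
    """Run the misordered pipeline and report what happened."""
    try:
        StatefulManipulator().run_wrong_order()
    except TypeError as exc:
        # self.results is still None when transform() slices it.
        return f"TypeError: {exc}"
    return "no error"


print(try_wrong_order())
```

In the stateless style, the same mistake would mean calling transform(results) before results is assigned, which is an undefined name that a linter will typically flag before the code ever runs.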
Hence, my answer: prefer the DataManipulator2 implementation to the DataManipulator implementation.
But is DataManipulator2 perfect? Yes and no. For a quick and dirty implementation, that's the way to go. But let's try to go further.
You need the function run to be public, but load_data, save_data and transform have no reason to be public (by "public" I mean: not marked as an implementation detail with an underscore). If you mark them with an underscore, they are not part of the contract anymore and you are not comfortable with testing them. Why? Because the implementation may change without breaking the class contract, and yet tests may fail. That's a cruel dilemma: either your class DataManipulator2 has the correct API or it is not fully testable.
Nevertheless, these functions should be testable, but as part of the API of another class. Think of a 3-tier architecture:

- load_data and save_data are in the data layer.
- transform is in the business layer.
- the run call is in the presentation layer.

Let's try to implement this:
    class DataManipulator3:
        def __init__(self, data_store, transformer):
            self._data_store = data_store
            self._transformer = transformer

        def run(self):
            results = self._data_store.load()
            transformed_results = self._transformer.transform(results)
            self._data_store.save(transformed_results)

    class DataStore:
        def load(self):
            ''' do stuff to load data, return results'''
            return read_csv_as_string('some_file.csv')

        def save(self, data):
            ''' stores string to database'''
            store_data_to_database(data)

    class Transformer:
        def transform(self, results):
            ''' transforms data, eg get first 10 chars'''
            return results[:10]

    DataManipulator3(DataStore(), Transformer()).run()
That's not bad, and the Transformer is easy to test. But:

- DataStore is not handy: the file to read is buried in the code, and so is the database.
- DataManipulator should be able to run a Transformer on multiple data samples.

Hence another version that addresses these issues:
    class DataManipulator4:
        def __init__(self, transformer):
            self._transformer = transformer

        def run(self, data_sample):
            data = data_sample.load()
            results = self._transformer.transform(data)
            data_sample.save(results)

    class DataSample:
        def __init__(self, filename, connection):
            self._filename = filename
            self._connection = connection

        def load(self):
            ''' do stuff to load data, return results'''
            return read_csv_as_string(self._filename)

        def save(self, data):
            ''' stores string to database'''
            store_data_to_database(self._connection, data)

    # get_db_connection is a placeholder for whatever opens your DB connection
    with get_db_connection() as conn:
        DataManipulator4(Transformer()).run(DataSample('some_file.csv', conn))
There's one more point: the filename. Prefer file-like objects to filenames as arguments, because you can then test your code with the io module:
    class DataSample2:
        def __init__(self, file, connection):
            self._file = file
            self._connection = connection

        ...

    dm = DataManipulator4(Transformer())
    with get_db_connection() as conn, open('some_file.csv') as f:
        dm.run(DataSample2(f, conn))
With mock objects, it's now very easy to test the behaviour of the classes.
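As a rough illustration of such tests, here is a sketch using unittest.mock and io.StringIO. The bodies of DataSample2.load and DataSample2.save are assumptions, since the answer elides them: load simply reads the file-like object, and save calls a hypothetical store method on the connection in place of store_data_to_database.

```python
import io
from unittest.mock import Mock


class Transformer:
    def transform(self, results):
        return results[:10]


class DataManipulator4:
    def __init__(self, transformer):
        self._transformer = transformer

    def run(self, data_sample):
        data = data_sample.load()
        results = self._transformer.transform(data)
        data_sample.save(results)


class DataSample2:
    def __init__(self, file, connection):
        self._file = file
        self._connection = connection

    def load(self):
        # Assumption: reading the whole file stands in for read_csv_as_string.
        return self._file.read()

    def save(self, data):
        # Assumption: `store` is a hypothetical method of the connection object.
        self._connection.store(data)


def test_run_saves_transformed_data():
    # The data sample is fully mocked: no file, no database.
    sample = Mock()
    sample.load.return_value = "0123456789abcdef"
    DataManipulator4(Transformer()).run(sample)
    sample.save.assert_called_once_with("0123456789")


def test_data_sample_uses_file_like_object():
    # io.StringIO replaces the real file, a Mock replaces the connection.
    conn = Mock()
    sample = DataSample2(io.StringIO("a,b,c\n1,2,3\n"), conn)
    assert sample.load() == "a,b,c\n1,2,3\n"
    sample.save("a,b,c")
    conn.store.assert_called_once_with("a,b,c")


test_run_saves_transformed_data()
test_data_sample_uses_file_like_object()
```

Neither test touches the filesystem or a real database, which is what makes each class checkable in isolation.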
Let's summarize the advantages of this code:

- [...] (as in DataManipulator2)
- the run method is as clean as it should be (as in DataManipulator2)
- [...] Transformer, or a new DataSample (load from a DB and save to a CSV file, for instance)

Of course, this is really (old style) Java-like. In Python, you can simply pass the function transform instead of an instance of the Transformer class. But as soon as your transform begins to be complex, a class is a good solution.
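A minimal sketch of that function-passing variant might look like this. The names DataManipulator5, InMemorySample and first_ten_chars are invented for the example, and the in-memory sample merely stands in for a real DataSample:

```python
class DataManipulator5:
    """Hypothetical variant that takes a plain callable instead of a Transformer."""

    def __init__(self, transform):
        self._transform = transform

    def run(self, data_sample):
        data = data_sample.load()
        data_sample.save(self._transform(data))


class InMemorySample:
    """Hypothetical data sample that keeps everything in memory."""

    def __init__(self, text):
        self._text = text
        self.saved = None

    def load(self):
        return self._text

    def save(self, data):
        self.saved = data


def first_ten_chars(text):
    # Same transformation as Transformer.transform in the answer.
    return text[:10]


sample = InMemorySample("0123456789abcdef")
DataManipulator5(first_ten_chars).run(sample)
print(sample.saved)
```

Any callable that takes the loaded data and returns the transformed data works here, including built-ins such as str.upper or a lambda.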