When writing a Python class that has separate functions for getting the data and parsing the data, what is the most correct way? You can write it so that you populate self.data... one by one and then run parse functions to populate self.parsed_data.... Or is it correct to write functions that accept the raw data as a parameter and return the parsed data?
Examples below. MyClass1 populates instance attributes, and MyClass2 passes the data as parameters. I think MyClass2 is the "most" correct.
So, which is correct, and why? I have been trying to decide between these two coding styles for a while, and I want to know which of them is considered best practice.
# `cache` is assumed to be a Django-style cache object
# (e.g. from django.core.cache import cache).
class MyClass1(object):
    def __init__(self):
        self.raw_data = None

    def _parse_data(self):
        # This is a fairly complex XML/JSON parser
        raw_data = self.raw_data
        data = raw_data  # Much work is done here to process raw_data
        cache.set('cache_key', data, 600)  # Cache for 10 minutes
        return data

    def _populate_data(self):
        # This function grabs data from an external source
        self.raw_data = 'some raw data, xml, json or alike..'

    def get_parsed_data(self):
        cached_data = cache.get('cache_key')
        if cached_data:
            return cached_data
        else:
            self._populate_data()
            return self._parse_data()

mc1 = MyClass1()
print(mc1.get_parsed_data())
class MyClass2(object):
    def _parse_data(self, raw_data):
        # This is a fairly complex XML/JSON parser
        data = raw_data  # After some complicated work of parsing raw_data
        cache.set('cache_key', data, 600)  # Cache for 10 minutes
        return data

    def _get_data(self):
        # This function grabs data from an external source
        return 'some raw data, xml, json or alike..'

    def get_parsed_data(self):
        cached_data = cache.get('cache_key')
        if cached_data:
            return cached_data
        else:
            # Pass the freshly fetched raw data straight to the parser
            return self._parse_data(self._get_data())

mc2 = MyClass2()
print(mc2.get_parsed_data())
Ultimately, it comes down to personal preference. But IMO, it's better to just have a module-level function called parse_data which takes in the raw data, does a bunch of work and returns the parsed data. I assume your cache keys are somehow derived from the raw data, which means the parse_data function can also implement your caching logic.
The reason I prefer a function over a full-blown class is simplicity. If you want a class which provides data fields pulled from your raw data, so that users of your objects can do something like obj.some_attr instead of having to look inside some lower-level data construct (e.g. JSON, XML, Python dict, etc.), I would make a simple "value object" class which only contains data fields and no parsing logic, and have the aforementioned parse_data function return an instance of this class (essentially acting as a factory function for your data class). This leads to less state, simpler objects and no laziness, making your code easier to reason about.
This would also make it easier to unit test consumers of this class, because in those tests, you can simply instantiate the data object with fields, instead of having to provide a big blob of test raw data.
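A minimal sketch of the value-object approach, assuming Python 3.7+ for dataclasses. The Report class, its fields, the JSON layout, and the summarize consumer are all hypothetical names invented for the example.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """Hypothetical value object: data fields only, no parsing logic."""
    title: str
    item_count: int


def parse_data(raw_data):
    """Factory function: turn raw JSON text into a Report."""
    payload = json.loads(raw_data)
    return Report(title=payload["title"], item_count=len(payload["items"]))


def summarize(report):
    """Example consumer that only needs the parsed fields."""
    return "{}: {} item(s)".format(report.title, report.item_count)


# In a unit test, the consumer can be given a hand-built object directly,
# with no raw JSON involved.
assert summarize(Report(title="demo", item_count=2)) == "demo: 2 item(s)"
```

The last line shows the testing benefit: summarize is exercised without any raw data blob, because the Report can be constructed from plain field values.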
For me the most correct class is the class the user understands and uses with as few errors as possible.
When I look at MyClass2, I ask myself how I would use it:
mc2 = MyClass2()
print(mc2.get_parsed_data())
I would prefer just:
print(get_parsed_data())
Sometimes it is better to not write classes at all.