I'm refactoring a Python script for a data science project, and have made it fairly modularised; each function "does one thing". In order to run the script I have a "top-level" function, and my question regards what is considered the best practice for how the functions call each other.
Is it better to have:
(a) a "single trunk with branches" structure where the "top-level" function contains ~12 function calls, each of which does not call any further functions itself
def main_function():
    connect_to_database()
    get_data()
    clean_data()
    ....
    train_model()
    test_model()
    analyse_results()
(b) a "branching tree" structure where the "top-level" function has 3-4 function calls, which in turn may call functions, which may call a third level of hierarchy.
def get_and_clean_data():
    connect_to_database()
    get_data()
    clean_data()

def train_and_deploy_model():
    train_model()
    test_model()

def main_function():
    get_and_clean_data()
    train_and_deploy_model()
    analyse_results()
Function names in both examples are for illustrative purposes, not actual code.
Approach (a) is relatively easy to follow sequentially, but either the top-level function gets very long, or the functions within it end up doing more than one thing. Approach (b) allows groupings of smaller functions that each have a clearly defined purpose, but I've found that with larger projects it can be trickier to chase bugs through a deep stack of nested calls.
I understand that there are no hard and fast rules but I'd like some intuition or rules of thumb about how others approach this.
The script in question is a single .py file ~300 lines long, though I'm interested in the question more broadly too, for example if the code is spread across multiple .py files. TIA
It really depends on the case, and in my experience both approaches can be used at the same time.
One function should do one thing and be short.
If what it does is complicated, it can deserve abstraction and black-box usage; that is where you will have subcalls to subfunctions. This makes the code easier to understand and to test. Moreover, moving a piece of code into a function is self-documenting: you have to find a name for what it does, and that name makes the calling code easier to read.
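As a small illustration of the black-box idea, here is a minimal sketch. The names `normalise_postcode` and `clean_records` are hypothetical and not from the question; the only point is that the fiddly part gets a descriptive name and can be tested on its own.

```python
def normalise_postcode(raw):
    """Hypothetical 'complicated' step, hidden behind a descriptive name."""
    # Remove all whitespace (leading, trailing, internal), then upper-case.
    return "".join(raw.split()).upper()


def clean_records(records):
    """The caller stays short: it skips blank entries and delegates the rest."""
    return [normalise_postcode(r) for r in records if r.strip()]


if __name__ == "__main__":
    print(clean_records([" ab1 2cd ", "   ", "x y"]))
```

Because the helper takes a value and returns a value, it can be checked in isolation without running the rest of the script.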
If what it does consists of several steps, each step can be its own function, and you then have an orchestration function that calls them in order.
"""This can be the main module."""
def orchestrate():  # orchestration function
    data = retrieve_data()
    model = retrieve_model(data)  # pass the data down explicitly
    return model

"""This can be in a separate module for data"""
def retrieve_data():  # sub orchestration function
    conn = connect_to_database()
    data = get_data(conn)
    data = clean_data(data)
    return data

def connect_to_database():
    pass

def get_data(conn):
    pass

def clean_data(data):
    data = do_complex_sanitization_on_data(data)  # black box for complex
    data = data.upper()  # simple is kept there
    return data

def do_complex_sanitization_on_data(data):
    return data

"""This can be in a separate module for model"""
def retrieve_model(data):  # sub orchestration function
    model = train_model(data)
    test_model(model)
    analyse_results(model)
    return model  # hand the trained model back to the caller

def train_model(data):
    pass

def test_model(model):
    pass

def analyse_results(model):
    pass