
Architecture best practice in Python script - one function calling all the others, or a "tree" where one function calls functions that call others?

I'm refactoring a Python script for a data science project, and have made it fairly modularised; each function "does one thing". In order to run the script I have a "top-level" function, and my question regards what is considered the best practice for how the functions call each other.

Is it better to have:

(a) a "single trunk with branches" structure where the "top-level" function contains ~12 function calls, each of which does not call any further functions itself

def main_function():
    connect_to_database()
    get_data()
    clean_data()
    ...
    train_model()
    test_model()
    analyse_results()

(b) a "branching tree" structure where the "top-level" function has 3-4 function calls, which in turn may call functions, which may call a third level of hierarchy.

def get_and_clean_data():
    connect_to_database()
    get_data()
    clean_data()

def train_and_deploy_model():
    train_model()
    test_model()

def main_function():
    get_and_clean_data()
    train_and_deploy_model()
    analyse_results()
    

Function names in both examples are for illustrative purposes, not actual code.

Approach (a) is relatively easy to follow through sequentially, but either the top-level function can get very long, or the functions within it end up doing more than one thing. Approach (b) allows groupings of smaller functions that each have a clearly defined purpose, but I've found that with larger projects it can be trickier to chase bugs through several levels of nested calls.

I understand that there are no hard and fast rules but I'd like some intuition or rules of thumb about how others approach this.

The script in question is a single .py file ~300 lines long, though I'm interested in the question more broadly too, for example if the code is spread across multiple .py files. TIA

user3140106 asked Oct 27 '25 04:10


1 Answer

It really depends on the case, and in my experience both approaches can be used at the same time.

One function should do one thing and be short.

If what it does is complicated, it can deserve abstraction and black-box usage: that is where you will have sub-calls to sub-functions. This makes the code easier to understand and to test. Moreover, extracting a piece of code into a function creates self-documentation, since you have to find a name for what it does, which in turn makes it easier to understand.
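
As a rough sketch of that point (the helper names `strip_punctuation` and `clean_record` are made up for illustration and are not part of the question's script), a complicated step can be pulled into a named helper and checked on its own, without a database or a model:

```python
def strip_punctuation(text: str) -> str:
    """Hypothetical black-box helper: keep only letters, digits and whitespace."""
    return "".join(ch for ch in text if ch.isalnum() or ch.isspace())


def clean_record(text: str) -> str:
    """Delegate the complicated part to a named helper; keep the simple part inline."""
    text = strip_punctuation(text)
    # Collapse runs of whitespace, then normalise the case.
    return " ".join(text.split()).upper()


# Each level can be tested in isolation.
assert strip_punctuation("a-b, c!") == "ab c"
assert clean_record("  hello,   world! ") == "HELLO WORLD"
```

The name of the helper documents what the step does, and a failing test points at one small function rather than at the whole pipeline.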

If what it does happens in several steps, each step can be its own function, and you then have an orchestration function that calls them in order.

"""This can be the main module."""


def orchestrate():  # orchestration function
    data = retrieve_data()
    model = retrieve_model(data)  # the model step needs the cleaned data


"""This can be in a separate module for data"""


def retrieve_data(): # sub orchestration function
    conn = connect_to_database()
    data = get_data(conn)
    data = clean_data(data)
    return data


def connect_to_database():
    pass


def get_data(conn):  # takes the connection returned by connect_to_database()
    pass


def clean_data(data):
    data = do_complex_sanitization_on_data(data)  # black box for complex
    data = data.upper()  # simple is kept there
    return data


def do_complex_sanitization_on_data(data):
    return data



"""This can be in a separate module for model"""


def retrieve_model(data): # sub orchestration function
    model = train_model(data)
    test_model(model)
    analyse_results(model)
    return model  # hand the trained model back to the caller


def train_model(data):
    pass


def test_model(model):
    pass


def analyse_results(model):
    pass
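
On the worry about chasing bugs through several levels of calls: a minimal sketch, assuming made-up stand-in data and a deliberately broken row, suggests that the nesting is at least visible in the traceback, one frame per level. The function `failing_call_chain` is a hypothetical helper added only to inspect that traceback.

```python
import traceback


def clean_data(rows):
    cleaned = []
    for row in rows:
        # Raises AttributeError when a row is None instead of a string.
        cleaned.append(row.strip().upper())
    return cleaned


def retrieve_data():
    rows = [" a ", None, " c "]  # hypothetical stand-in for a database query
    return clean_data(rows)


def orchestrate():
    return retrieve_data()


def failing_call_chain():
    """Return the function names on the stack at the point the error was raised."""
    try:
        orchestrate()
    except AttributeError as exc:
        return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]
    return []


print(failing_call_chain())
```

The list ends with `orchestrate`, `retrieve_data`, `clean_data`, so the path from the top-level function down to the failing step can be read straight off the traceback, provided each level has a descriptive name.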

Floh answered Oct 29 '25 19:10


