How to structure Machine Learning projects using Object Oriented programming in Python? [closed]

I have observed that statisticians and machine learning scientists generally don't follow OOP for ML/data science projects when using Python (or other languages).

This is most likely due to a lack of understanding of software engineering best practices in OOP when developing ML code for production, because they mostly come from a math and statistics background rather than computer science.

The days when ML scientists develop ad hoc prototype code and a separate software team makes it production-ready are over in the industry.

[image not available]

Questions

1. How do we structure code using OOP for an ML project?
2. Should every major task (from the picture above), such as data cleaning, feature transformation, grid search, model validation etc., be an individual class? What are the recommended code design practices for ML?
3. Are there any good GitHub links with well-structured code for reference (maybe a well-written Kaggle solution)?
4. Should every class, such as data cleaning, have fit(), transform() and fit_transform() functions for every process like remove_missing() and outlier_removal()? When this is done, why is scikit-learn's BaseEstimator usually inherited?
5. What should be the structure of a typical config file for ML projects in production?

989

asked Oct 28 '17 13:10

GeorgeOfTheRF

1 Answer

You are right that one thing is special about ML: data scientists are generally clever people, so they have no problem expressing their ideas in code. The problem is that they tend to create fire-and-forget code, because they lack software development craftsmanship - but ideally this shouldn't be the case.

When writing code it shouldn't make any difference what the code is for¹. ML is just another domain like anything else, and should follow clean code principles.

The most important aspect should always be the SOLID principles. Many important qualities follow directly from them: maintainability, readability, flexibility, testability, reliability etc. What you can add to this mix is the risk of change. It doesn't matter whether a piece of code is pure ML, banking business logic, or an audiological algorithm for a hearing instrument. It is all the same - the implementation will be read by other developers, will contain bugs to fix, will be tested (hopefully) and possibly refactored and extended.

Let me try to explain this in more detail while addressing some of your questions:

1, 2) You shouldn't treat OOP as a goal in itself. If a concept can be modeled as a class, and doing so makes it easier for other developers to use, read, extend and test, and harder to introduce bugs, then of course make it a class. But unless it's needed, you shouldn't follow the BDUF (Big Design Up Front) antipattern. Start with free functions and evolve into a better interface if needed.
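As a rough illustration of "start with free functions and evolve", here is a minimal sketch in plain Python. The names remove_missing and clip_outliers, and the OutlierClipper class, are hypothetical and chosen only for this example. The stateless steps stay free functions; a class appears only where state (bounds learned from training data) has to be carried over to new data.

```python
from statistics import mean, stdev


def remove_missing(values):
    """Drop None entries; stateless, so a free function is enough."""
    return [v for v in values if v is not None]


def clip_outliers(values, low, high):
    """Clamp every value into the closed interval [low, high]."""
    return [min(max(v, low), high) for v in values]


class OutlierClipper:
    """Hypothetical class, introduced only once state is needed:
    bounds learned on training data are reused unchanged on new data."""

    def __init__(self, n_std=2.0):
        self.n_std = n_std

    def fit(self, values):
        # Learn the clipping bounds from the training values.
        centre, spread = mean(values), stdev(values)
        self.low_ = centre - self.n_std * spread
        self.high_ = centre + self.n_std * spread
        return self

    def transform(self, values):
        # Reuse the existing free function instead of duplicating its logic.
        return clip_outliers(values, self.low_, self.high_)


train = remove_missing([1.0, 2.0, None, 3.0, 2.0, 100.0])
clipper = OutlierClipper(n_std=1.0).fit(train)
cleaned_new = clipper.transform([2.5, 500.0])
```

The free functions remain usable and testable on their own, so moving to the class later does not require rewriting them.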

4) Such complex inheritance hierarchies are typically created to allow the implementation to be extensible (see the "O" in SOLID, the open/closed principle). In this case, library users can inherit from BaseEstimator, and it's easy to see which methods they can override and how this will fit into scikit-learn's existing structure.
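The extensibility argument can be sketched without any library. The code below is not scikit-learn's implementation; TransformerBase, MeanCenterer and run_pipeline are hypothetical names used only to show the mechanism. The base class fixes the shared behaviour and the extension points, and library-side code written against the base class keeps working with subclasses added later.

```python
class TransformerBase:
    """Hypothetical stand-in for a library base class (not scikit-learn code)."""

    def fit(self, X):
        # Extension point: subclasses learn their state here.
        return self

    def transform(self, X):
        # Extension point: subclasses must provide the transformation.
        raise NotImplementedError

    def fit_transform(self, X):
        # Shared behaviour, written once in the base class.
        return self.fit(X).transform(X)


class MeanCenterer(TransformerBase):
    """User-defined extension: only the two extension points are overridden."""

    def fit(self, X):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


def run_pipeline(steps, X):
    """Library-side code: closed for modification, open to new subclasses."""
    for step in steps:
        X = step.fit_transform(X)
    return X


centered = run_pipeline([MeanCenterer()], [1.0, 2.0, 3.0])
```

In scikit-learn itself the roles are split: BaseEstimator supplies get_params() and set_params(), while TransformerMixin supplies fit_transform().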

5) Almost the same principles apply as for code, but with the people who will create/edit these config files in mind. What will be the easiest format for them? How should you choose parameter names so that it is obvious what they mean, even to a beginner who is just starting to use your product?
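One possible way to apply this, shown as a sketch rather than a recommended standard: keep the config in a format people already know (JSON is assumed here), use self-explanatory parameter names, and fail early with a readable message. TrainingConfig, load_config and all field names are made up for this example.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical config: names say what they mean, units are in the name."""
    training_data_path: str
    validation_fraction: float = 0.2
    max_training_minutes: int = 60

    def __post_init__(self):
        # Validate early so a typo in the file is reported before training starts.
        if not 0.0 < self.validation_fraction < 1.0:
            raise ValueError(
                "validation_fraction must be strictly between 0 and 1, "
                f"got {self.validation_fraction}"
            )


def load_config(text):
    """Parse JSON text into a TrainingConfig, rejecting unknown keys."""
    raw = json.loads(text)
    known = {f.name for f in fields(TrainingConfig)}
    unknown = set(raw) - known
    if unknown:
        raise ValueError(f"Unknown config keys: {sorted(unknown)}")
    return TrainingConfig(**raw)


config = load_config(
    '{"training_data_path": "data/train.csv", "validation_fraction": 0.25}'
)
```

Rejecting unknown keys is a design choice aimed at the person editing the file: a misspelled parameter produces an error instead of being silently ignored.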

All of this should be combined with an estimate of how likely it is that a given piece of code will be changed/extended in the future. If you are sure something will stay set in stone, don't worry about all aspects too much (e.g. focus only on readability), and direct your efforts to more critical parts of the implementation.

To sum up: think about the people who will interact with what you create in the future. In the case of products/config files/user interfaces it should always be "user first". In the case of code, try to put yourself in the shoes of a future developer who will want to fix/extend/understand your code.

¹ There are of course some special cases, like code that needs to be formally proven correct or extensively documented because of formal regulations, where this main goal imposes some particular constructs/practices.

130

answered Sep 22 '22 12:09

BartoszKP
