
Optimal multiple return values in scientific python

I'm using SciPy/NumPy for research code instead of MATLAB. There is one flaw I keep running into. I found a workaround, but I want to check whether there is a best practice or a better solution. Imagine some mathematical optimisation:

import numpy as np

def calculation(data, max_it=10000, tol=1e-5):
    k = 0
    rmse = np.inf 
    while k < max_it and rmse > tol:
        #calc and modify data - rmse becomes smaller in each iteration
        k += 1
    return data

It works fine, and I call it from multiple places in my code, e.g.:

import module
d = module.calculation(data)

But sometimes I want further insight and need multiple return values. If I simply append more return values, I have to modify all the other code to unpack the first return value. This is one of the few situations where I prefer MATLAB to SciPy. In MATLAB only the first return value is evaluated unless you explicitly request the rest.

So my workaround for MATLAB-like (= optimal) multiple return values is to use global variables of the module:

def calculation(data, max_it=10000, tol=1e-5):
    global k
    global rmse
    k = 0
    rmse = np.inf 
    while k < max_it and rmse > tol:
        #calc and modify data - rmse becomes smaller in each iteration
        k += 1
    return data

My function calls work without modification, and if I want to verify something in IPython, I declare some variables global, call reload(module), and inspect the value with module.rmse.

But I could also imagine an OO approach from the beginning, using pdb, or using some other IPython magic.
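For comparison, the OO approach might look roughly like the sketch below. The class name `Calculation` and the update step are hypothetical placeholders, and plain lists stand in for NumPy arrays so that the snippet runs without dependencies.

```python
import math


class Calculation:
    """Hypothetical wrapper: the instance keeps the diagnostics of its last run."""

    def __init__(self, max_it=10000, tol=1e-5):
        self.max_it = max_it
        self.tol = tol
        self.k = 0
        self.rmse = math.inf

    def __call__(self, data):
        data = list(data)  # work on a copy so the caller's list is untouched
        target = sum(data) / len(data)
        self.k = 0
        self.rmse = math.inf
        while self.k < self.max_it and self.rmse > self.tol:
            # Placeholder update: halve each value's distance to the mean.
            data = [x - 0.5 * (x - target) for x in data]
            self.rmse = math.sqrt(sum((x - target) ** 2 for x in data) / len(data))
            self.k += 1
        return data


calc = Calculation()
d = calc([0.0, 2.0, 4.0])  # the call itself still returns only the data
print(calc.k, calc.rmse)   # diagnostics stay available on the instance
```

Each instance carries its own `k` and `rmse`, so two independent runs do not share state the way module globals do.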

asked Mar 24 '23 04:03 by user421929


1 Answer

You could specify that you want more info returned using an info=True argument when calling calculation. This is the approach taken by np.unique (with its return_inverse and return_index parameters) and scipy.optimize.leastsq (with its full_output parameter):

def calculation(data, max_it=10000, tol=1e-5, info=False):
    k = 0
    rmse = np.inf 
    while k < max_it and rmse > tol:
        #calc and modify data - rmse becomes smaller in each iteration
        k += 1
    if info:
        return data, k, rmse
    else:
        return data
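As a rough sketch of how call sites look with this flag, here is a version that actually runs. The update step is a made-up placeholder, and plain lists stand in for NumPy arrays so that the snippet has no dependencies.

```python
import math


def calculation(data, max_it=10000, tol=1e-5, info=False):
    """Toy stand-in: halve each value's distance to the mean per iteration."""
    data = list(data)  # work on a copy so the caller's list is untouched
    target = sum(data) / len(data)
    k = 0
    rmse = math.inf
    while k < max_it and rmse > tol:
        # Hypothetical update step; the real one depends on your problem.
        data = [x - 0.5 * (x - target) for x in data]
        rmse = math.sqrt(sum((x - target) ** 2 for x in data) / len(data))
        k += 1
    if info:
        return data, k, rmse
    return data


# Existing call sites keep working unchanged.
d = calculation([0.0, 2.0, 4.0])

# Opt in to the diagnostics only where you need them.
d_info, k, rmse = calculation([0.0, 2.0, 4.0], info=True)
```

Because `info` defaults to `False`, none of the existing callers has to unpack a tuple.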

Or, you could assign additional attributes on the calculation function:

def calculation(data, max_it=10000, tol=1e-5):
    k = 0
    rmse = np.inf 
    while k < max_it and rmse > tol:
        #calc and modify data - rmse becomes smaller in each iteration
        k += 1
    calculation.k = k
    calculation.rmse = rmse
    return data

The added info would then be accessible with

import module
d = module.calculation(data)
rmse = module.calculation.rmse

Note that this latter approach will not work well if calculation is run concurrently from multiple threads...

In CPython, the GIL lets only one thread execute Python bytecode at any given time, so there is little attraction to running calculation in multiple threads. But who knows? There may be some situation that calls for threads on a small scale, perhaps in a GUI. There, calculation.k or calculation.rmse might hold values from a different call than the one you just made.
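The underlying problem is that the attributes are shared state, which can be shown even without threads: any later call replaces them. With threads, that replacement can simply happen between your call and your read. A minimal sketch, again with a made-up update step and plain lists instead of NumPy arrays:

```python
import math


def calculation(data, max_it=10000, tol=1e-5):
    """Toy stand-in that stores its diagnostics as function attributes."""
    data = list(data)
    target = sum(data) / len(data)
    k = 0
    rmse = math.inf
    while k < max_it and rmse > tol:
        # Hypothetical update step: halve each value's distance to the mean.
        data = [x - 0.5 * (x - target) for x in data]
        rmse = math.sqrt(sum((x - target) ** 2 for x in data) / len(data))
        k += 1
    calculation.k = k
    calculation.rmse = rmse
    return data


a = calculation([0.0, 2.0, 4.0])            # runs until rmse <= tol
k_after_a = calculation.k

b = calculation([0.0, 2.0, 4.0], max_it=3)  # stops early after 3 iterations
k_after_b = calculation.k

# calculation.k now describes run b only; the diagnostics of run a are gone.
```

Returning the diagnostics from the call itself avoids this, since each caller then holds its own copy.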

Moreover, the Zen of Python says, "Explicit is better than implicit".

So I would recommend the first approach over the second.

answered Apr 02 '23 05:04 by unutbu