How can I capture return value with Python timeit module?

Tags:

python

timeit

python-2.7

scikit-learn

I'm running several machine learning algorithms with sklearn in a for loop and want to see how long each of them takes. The problem is that I also need the return value and don't want to run each algorithm more than once, because each one takes so long. Is there a way to capture the return value 'clf' using Python's timeit module, or a similar one, with a function like this:

from sklearn import ensemble

def RandomForest(train_input, train_output):
    clf = ensemble.RandomForestClassifier(n_estimators=10)
    clf.fit(train_input, train_output)
    return clf

when I call the function like this

from timeit import Timer

t = Timer(lambda : RandomForest(trainX,trainy))
print(t.timeit(number=1))

P.S. I also don't want to set a global 'clf', because I might want to use multithreading or multiprocessing later.


asked Jul 17 '14 19:07

Leon

3 Answers

In Python 3.5 you can override the value of timeit.template so that the generated timing function also returns the statement's value:

timeit.template = """
def inner(_it, _timer{init}):
    {setup}
    _t0 = _timer()
    for _i in _it:
        retval = {stmt}
    _t1 = _timer()
    return _t1 - _t0, retval
"""

unutbu's answer works for Python 3.4 but not 3.5, because the _template_func function appears to have been removed in 3.5.
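As a rough sketch of how the template override might be used, assuming Python 3.5 or later: the helper name timeit_with_result and the function slow_answer are made up for illustration. The template is read when the Timer is constructed, so this sketch swaps it in only for that moment and then restores the original.

```python
import timeit

# Same template as above: the loop body stores the statement's value,
# and inner() returns (elapsed_seconds, last_value).
_PATCHED_TEMPLATE = """
def inner(_it, _timer{init}):
    {setup}
    _t0 = _timer()
    for _i in _it:
        retval = {stmt}
    _t1 = _timer()
    return _t1 - _t0, retval
"""

def timeit_with_result(func, number=1):
    """Hypothetical helper: time func() and return (seconds, return value)."""
    original = timeit.template
    timeit.template = _PATCHED_TEMPLATE
    try:
        # Timer compiles the template in its constructor.
        timer = timeit.Timer(func)
    finally:
        # Restore the module-level template for other users of timeit.
        timeit.template = original
    return timer.timeit(number=number)

def slow_answer():
    # Stand-in for an expensive call such as fitting a classifier.
    return 42

elapsed, value = timeit_with_result(slow_answer)
print(value, elapsed)
```

With a string statement, this template only works if the statement is an expression, since it is placed on the right-hand side of an assignment. The swap changes module-level state for a short time, so it is probably best done before any threads are started.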

answered Oct 19 '22 01:10

Brendan Cody-Kenny

The problem boils down to timeit._template_func not returning the function's return value:

def _template_func(setup, func):
    """Create a timer function. Used if the "statement" is a callable."""
    def inner(_it, _timer, _func=func):
        setup()
        _t0 = _timer()
        for _i in _it:
            _func()
        _t1 = _timer()
        return _t1 - _t0
    return inner

We can bend timeit to our will with a bit of monkey-patching:

import timeit
import time

def _template_func(setup, func):
    """Create a timer function. Used if the "statement" is a callable."""
    def inner(_it, _timer, _func=func):
        setup()
        _t0 = _timer()
        for _i in _it:
            retval = _func()
        _t1 = _timer()
        return _t1 - _t0, retval
    return inner

timeit._template_func = _template_func

def foo():
    time.sleep(1)
    return 42

t = timeit.Timer(foo)
print(t.timeit(number=1))

prints

(1.0010340213775635, 42)

The first value is the timeit result (in seconds), the second value is the function's return value.

Note that the monkey-patch above only affects the behavior of timeit when a callable is passed to timeit.Timer. If you pass a string statement, you would have to monkey-patch the timeit.template string in a similar way.

answered Oct 19 '22 03:10

unutbu

Funnily enough, I'm also doing machine-learning, and have a similar requirement ;-)

I solved it by writing a function that:

runs your function
prints the running time, along with the name of your function
returns the results

Let's say you want to time:

clf = RandomForest(train_input, train_output)

Then do:

clf = time_fn( RandomForest, train_input, train_output )

Stdout will show something like:

mymodule.RandomForest: 0.421609s

Code for time_fn:

import timeit

def time_fn( fn, *args, **kwargs ):
    # timeit.default_timer exists on Python 2 and 3; time.clock was removed in Python 3.8
    start = timeit.default_timer()
    results = fn( *args, **kwargs )
    end = timeit.default_timer()
    fn_name = fn.__module__ + "." + fn.__name__
    print(fn_name + ": " + str(end-start) + "s")
    return results
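If you would rather collect the timings than print them, one possible variant returns the elapsed time next to the result. This is only a sketch, assuming Python 3.3 or later for time.perf_counter; timed_call and train_stub are hypothetical names, and train_stub stands in for a real training function such as RandomForest.

```python
import time

def timed_call(fn, *args, **kwargs):
    """Call fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

def train_stub(train_input, train_output):
    # Placeholder for an expensive fit; returns a small summary dict.
    return {"n_samples": len(train_input)}

clf, seconds = timed_call(train_stub, [[0], [1]], [0, 1])
print("train_stub: {:.6f}s".format(seconds))
```

Because nothing is stored in a global, each thread or process can call timed_call on its own and keep its own result and timing.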

answered Oct 19 '22 01:10

Hugh Perkins
