
Why is using tanh definition of logistic sigmoid faster than scipy's expit?

I'm using a logistic sigmoid in an application. I compared the run time of scipy.special's expit function against an equivalent expression built from the hyperbolic tangent definition of the sigmoid.

I found the hyperbolic tangent version to be about 3 times as fast. What is going on here? I also timed both on a sorted array to see whether the result changed.
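For reference, the two expressions agree because of a standard identity relating tanh to the logistic function:

$$\frac{1}{2}\tanh\!\left(\frac{x}{2}\right) + \frac{1}{2}
= \frac{1}{2}\cdot\frac{1 - e^{-x}}{1 + e^{-x}} + \frac{1}{2}
= \frac{(1 - e^{-x}) + (1 + e^{-x})}{2\,(1 + e^{-x})}
= \frac{1}{1 + e^{-x}} = \operatorname{expit}(x),$$

using $\tanh(x/2) = (1 - e^{-x})/(1 + e^{-x})$, which follows from multiplying the numerator and denominator of tanh's definition by $e^{-x/2}$.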

Here is an example that was run in IPython:

In [1]: from scipy.special import expit

In [2]: myexpit = lambda x: 0.5*tanh(0.5*x) + 0.5

In [3]: x = randn(100000)

In [4]: allclose(expit(x), myexpit(x))
Out[4]: True

In [5]: timeit expit(x)
100 loops, best of 3: 15.2 ms per loop

In [6]: timeit myexpit(x)
100 loops, best of 3: 4.94 ms per loop

In [7]: y = sort(x)

In [8]: timeit expit(y)
100 loops, best of 3: 15.3 ms per loop

In [9]: timeit myexpit(y)
100 loops, best of 3: 4.37 ms per loop
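For anyone not running IPython with pylab-style imports (the session above relies on `tanh`, `randn`, etc. being in the namespace), here is a self-contained version of the same comparison. The helper name `expit_tanh` is mine; absolute timings will of course vary by machine:

```python
import timeit

import numpy as np
from scipy.special import expit

def expit_tanh(x):
    # Identity: 1 / (1 + exp(-x)) == 0.5 * tanh(0.5 * x) + 0.5
    return 0.5 * np.tanh(0.5 * x) + 0.5

x = np.random.randn(100000)

# Sanity check: both implementations agree to floating-point tolerance.
assert np.allclose(expit(x), expit_tanh(x))

for name, fn in [("expit", expit), ("tanh-based", expit_tanh)]:
    per_call = min(timeit.repeat(lambda: fn(x), number=100, repeat=3)) / 100
    print(f"{name}: {per_call * 1e3:.2f} ms per call")
```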

Edit:

Machine info:

  • Ubuntu 16.04
  • RAM: 7.4 GB
  • Intel Core i7-3517U CPU @ 1.90GHz × 4

Numpy/Scipy info:

In [1]: np.__version__
Out[1]: '1.12.0'

In [2]: np.__config__.show()
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
blis_info:
  NOT AVAILABLE
openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
lapack_mkl_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE

In [3]: import scipy

In [4]: scipy.__version__
Out[4]: '0.18.1'
asked Mar 26 '17 by Matt Hancock


1 Answer

Edit:

I'll refer future readers to this question.


To summarize results from helpful comments:

"Why is using tanh definition of logistic sigmoid faster than scipy's expit?"

Answer: It's not; there's some funny business going on with the tanh and exp C functions on my specific machine.

It turns out that on my machine, the C library's tanh is faster than exp. Why that is the case obviously belongs to a different question. When I run the C++ code listed below, I see

tanh: 5.22203
exp: 14.9393

which matches the roughly 3x speedup of the tanh-based version when called from Python. The strange thing is that when I run the identical code on a separate machine with the same OS, I get similar timing results for tanh and exp.

#include <iostream>
#include <cmath>
#include <ctime>

using namespace std;

int main() {
    double a = -5;
    double b =  5;
    int N =  10001;
    double x[10001];
    double y[10001];
    double h = (b-a) / (N-1);  // grid spacing over [a, b]

    clock_t begin, end;

    // Evenly spaced sample points in [-5, 5]
    for(int i=0; i < N; i++)
        x[i] = a + i*h;

    begin = clock();

    // The inner j-loop repeats each evaluation N times so the total
    // runtime is large enough for clock() to measure reliably.
    for(int i=0; i < N; i++)
        for(int j=0; j < N; j++)
            y[i] = tanh(x[i]);

    end = clock();

    cout << "tanh: " << double(end - begin) / CLOCKS_PER_SEC << "\n";

    begin = clock();

    for(int i=0; i < N; i++)
        for(int j=0; j < N; j++)
            y[i] = exp(x[i]);

    end = clock();

    cout << "exp: " << double(end - begin) / CLOCKS_PER_SEC << "\n";

    return 0;
}
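The same libm-level comparison can also be sketched from Python by timing the scalar math.tanh and math.exp functions, which call the C library directly. This is a quick sanity check rather than a rigorous benchmark; the numbers are machine-dependent:

```python
import math
import timeit

# Repeatedly call the scalar C-library functions on a fixed argument,
# bypassing NumPy's vectorized machinery entirely.
n = 1_000_000
t_tanh = timeit.timeit("math.tanh(0.5)", globals=globals(), number=n)
t_exp = timeit.timeit("math.exp(0.5)", globals=globals(), number=n)

print(f"math.tanh: {t_tanh:.3f} s for {n:,} calls")
print(f"math.exp:  {t_exp:.3f} s for {n:,} calls")
```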
answered Sep 30 '22 by Matt Hancock