I'm using a logistic sigmoid for an application. I compared the running times of the scipy.special function expit against an implementation based on the hyperbolic tangent definition of the sigmoid. I found that the hyperbolic tangent version was about 3 times as fast. What is going on here? I also timed both on a sorted array to see if the result was any different.
Here is an example that was run in IPython:
In [1]: import numpy as np
In [2]: from scipy.special import expit
In [3]: myexpit = lambda x: 0.5*np.tanh(0.5*x) + 0.5
In [4]: x = np.random.randn(100000)
In [5]: np.allclose(expit(x), myexpit(x))
Out[5]: True
In [6]: %timeit expit(x)
100 loops, best of 3: 15.2 ms per loop
In [7]: %timeit myexpit(x)
100 loops, best of 3: 4.94 ms per loop
In [8]: y = np.sort(x)
In [9]: %timeit expit(y)
100 loops, best of 3: 15.3 ms per loop
In [10]: %timeit myexpit(y)
100 loops, best of 3: 4.37 ms per loop
Machine info:
Numpy/Scipy info:
In [1]: np.__version__
Out[1]: '1.12.0'
In [2]: np.__config__.show()
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
blis_info:
  NOT AVAILABLE
openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
lapack_mkl_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE
In [3]: import scipy
In [4]: scipy.__version__
Out[4]: '0.18.1'
As background on the two functions: the logistic sigmoid outputs values in (0, 1), which are often interpreted as probabilities (in, say, logistic regression). The hyperbolic tangent is a rescaled logistic sigmoid whose outputs range over (-1, 1); both are S-shaped curves, and the choice between them usually comes down to the desired output range. Note also that the maximum gradient of tanh is four times that of the sigmoid, so a tanh activation yields larger gradients, and hence larger weight updates, during training.
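A quick numerical check of both claims (a minimal sketch; the grid and the finite-difference step are arbitrary choices):
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5, 5, 1001)

# The identity used by myexpit above: sigmoid(x) == 0.5*tanh(0.5*x) + 0.5
print(np.allclose(sigmoid(x), 0.5 * np.tanh(0.5 * x) + 0.5))  # True

# Peak slopes via a central finite difference: tanh's is ~4x the sigmoid's.
h = 1e-6
d_sig = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
d_tanh = (np.tanh(x + h) - np.tanh(x - h)) / (2 * h)
print(d_tanh.max() / d_sig.max())  # ~4.0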
I'll refer future readers to this question. To summarize the results from the helpful comments:
"Why is using tanh definition of logistic sigmoid faster than scipy's expit?"
Answer: It's not; there's some funny business going on with the tanh
and exp
C functions on my specific machine.
It's turns out that on my machine, the C function for tanh
is faster than exp
. The answer to why this is the case obviously belongs to a different question. When I run the C++ code listed below, I see
tanh: 5.22203
exp: 14.9393
which matches the roughly 3x speed difference seen when the functions are called from Python. The strange thing is that when I run the identical code on a separate machine with the same OS, I get similar timing results for tanh and exp.
#include <iostream>
#include <cmath>
#include <ctime>

using namespace std;

int main() {
    // Evaluate tanh and exp on 10001 points in [-5, 5]; each point is
    // evaluated 10001 times so the timings are large enough to measure.
    double a = -5;
    double b = 5;
    int N = 10001;
    double x[10001];
    double y[10001];
    double h = (b - a) / (N - 1);
    clock_t begin, end;

    for (int i = 0; i < N; i++)
        x[i] = a + i*h;

    begin = clock();
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            y[i] = tanh(x[i]);
    end = clock();
    cout << "tanh: " << double(end - begin) / CLOCKS_PER_SEC << "\n";

    begin = clock();
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            y[i] = exp(x[i]);
    end = clock();
    cout << "exp: " << double(end - begin) / CLOCKS_PER_SEC << "\n";

    return 0;
}
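For completeness, the same comparison can also be run from Python without the C++ harness. A minimal sketch, assuming NumPy's tanh and exp ufuncs dispatch to the system math library (the array size and loop count are arbitrary choices):
import timeit
import numpy as np

x = np.random.randn(100000)

# Time the two ufuncs directly; on the affected machine, tanh should come
# out well ahead of exp, mirroring the C++ numbers above.
for fn in (np.tanh, np.exp):
    t = timeit.timeit(lambda: fn(x), number=100)
    print(fn.__name__, t)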