Optimal way of defining a numerically stable sigmoid function for a list in Python

Tags:

python

sigmoid

For a scalar variable x, we know how to write down a numerically stable sigmoid function in Python:

import numpy as np

def sigmoid(x):
    if x >= 0:
        return 1. / (1. + np.exp(-x))
    else:
        return np.exp(x) / (1. + np.exp(x))

For a list of scalars, say z = [x_1, x_2, x_3, ...], where we don't know the sign of each x_i beforehand, we can generalize the above definition and try:

def sigmoid(z):
    result = []
    for x in z:
        if x >= 0:
            result.append(1. / (1. + np.exp(-x)))
        else:
            result.append(np.exp(x) / (1. + np.exp(x)))
    return result

This seems to work. However, I feel this is perhaps not the most pythonic way. How should I improve the definition in terms of 'cleanness'? Say, is there a way to use comprehension to shorten the function definition?
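
For instance, one comprehension version I can think of (just a sketch of what I mean by shorter, keeping the same per-element branching) would be:

def sigmoid(z):
    return [1. / (1. + np.exp(-x)) if x >= 0
            else np.exp(x) / (1. + np.exp(x))
            for x in z]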

I'm sorry if this has been asked, because I cannot find similar questions on SO. Thank you very much for your time and help!

RandomWalker asked Aug 22 '18


2 Answers

You are right: you can do better by using np.where, NumPy's element-wise equivalent of if/else:

def sigmoid(x):
    return np.where(x >= 0, 
                    1 / (1 + np.exp(-x)), 
                    np.exp(x) / (1 + np.exp(x)))

This function takes a numpy array x and returns a numpy array, too:

data = np.arange(-5,5)
sigmoid(data)
#array([0.00669285, 0.01798621, 0.04742587, 0.11920292, 0.26894142,
#       0.5       , 0.73105858, 0.88079708, 0.95257413, 0.98201379])
DYZ answered Oct 16 '22

A fully correct answer (no warnings) was provided by @hao peng, but the solution wasn't explained clearly. This would be too long for a comment, so I'll write it up as an answer.

Let's start with an analysis of a few answers (pure numpy answers only):

@DYZ accepted answer

This one is mathematically correct but still gives us warnings. Let's look at the code:

def sigmoid(x):
    return np.where(
            x >= 0, # condition
            1 / (1 + np.exp(-x)), # For positive values
            np.exp(x) / (1 + np.exp(x)) # For negative values
    )

As both branches are evaluated (they are function arguments, so they have to be), the first branch will raise a warning for large negative values and the second for large positive values.

Although the warnings are raised, the overflowed values are never selected by np.where, hence the result is correct.
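
If you only want to silence those warnings while keeping this approach, numpy's errstate context manager can do that locally. A minimal sketch (my addition, not part of the original answer; the computation itself is unchanged):

def sigmoid(x):
    # Both branches are still evaluated; we merely suppress the
    # overflow warnings from the branch that np.where discards.
    with np.errstate(over='ignore'):
        return np.where(x >= 0,
                        1 / (1 + np.exp(-x)),
                        np.exp(x) / (1 + np.exp(x)))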

Downsides

  • unnecessary evaluation of both branches (twice as many operations as needed)
  • warnings are thrown (unless explicitly silenced, as sketched above)

@ynn answer

This one is almost correct, but it only works on floating-point inputs; see below:

def sigmoid(x):
    return np.piecewise(
        x,
        [x > 0],
        [lambda i: 1 / (1 + np.exp(-i)), lambda i: np.exp(i) / (1 + np.exp(i))],
    )


sigmoid(np.array([0.0, 1.0]))  # [0.5 0.73105858] correct
sigmoid(np.array([0, 1]))  # [0, 0] incorrect

Why? A longer answer was provided by @mhawke in another thread, but the main point is:

It seems that piecewise() converts the return values to the same type as the input, so when an integer is input, an integer conversion is performed on the result, which is then returned.
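
A simple workaround (my suggestion, not part of the original answer) is to cast the input to float before calling piecewise:

sigmoid(np.array([0, 1], dtype=float))   # [0.5 0.73105858] now correct
sigmoid(np.array([0, 1]).astype(float))  # equivalent explicit cast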

Downsides

  • no automatic casting, due to the surprising type-preserving behavior of piecewise (hence the manual cast shown above)

Improved @hao peng answer

The idea of the stable sigmoid comes from the fact that:

sigmoid(x) = 1 / (1 + e^(-x)) = e^x / (1 + e^x)

Both versions are equally efficient in terms of operations if coded correctly (one exp evaluation is enough). Now:

  • e^x will overflow for large positive x
  • e^-x will overflow for large negative x

Hence we have to branch at x = 0. Using numpy's boolean masking we can apply the appropriate sigmoid implementation to only the positive or only the negative part of the array.

See code comments for additional points:

def _positive_sigmoid(x):
    return 1 / (1 + np.exp(-x))


def _negative_sigmoid(x):
    # Cache exp so you won't have to calculate it twice
    exp = np.exp(x)
    return exp / (exp + 1)


def sigmoid(x):
    positive = x >= 0
    # Boolean array inversion is faster than another comparison
    negative = ~positive

    # np.empty is faster than np.zeros: it doesn't zero-out the freshly
    # allocated memory, and every element is overwritten below anyway
    # (np.float is deprecated/removed in modern NumPy; plain float works)
    result = np.empty_like(x, dtype=float)
    result[positive] = _positive_sigmoid(x[positive])
    result[negative] = _negative_sigmoid(x[negative])

    return result
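
As a quick sanity check (my own example, not from the original answer), this version handles extreme inputs without any warnings:

sigmoid(np.array([-1000.0, -10.0, 0.0, 10.0, 1000.0]))
# array([0.00000000e+00, 4.53978687e-05, 5.00000000e-01,
#        9.99954602e-01, 1.00000000e+00])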

Time measurements

Results (the 50-iteration test case from ynn):

289.5070939064026 #DYZ
222.49267292022705 #ynn
230.81086134910583 #this

Indeed, piecewise seems faster here (I'm not sure why; perhaps the boolean masking and the extra fancy-indexing operations make the masked version slower).

Code below was used:

import time

import numpy as np


def _positive_sigmoid(x):
    return 1 / (1 + np.exp(-x))


def _negative_sigmoid(x):
    # Cache exp so you won't have to calculate it twice
    exp = np.exp(x)
    return exp / (exp + 1)


def sigmoid(x):
    positive = x >= 0
    # Boolean array inversion is faster than another comparison
    negative = ~positive

    # empty is faster to allocate than zeros because it doesn't
    # zero-out the memory (its initial contents are junk)
    result = np.empty_like(x)
    result[positive] = _positive_sigmoid(x[positive])
    result[negative] = _negative_sigmoid(x[negative])

    return result


N = int(1e4)
x = np.random.uniform(size=(N, N))  # note: values lie in [0, 1)

start: float = time.time()
for _ in range(50):
    y1 = np.where(x > 0, 1 / (1 + np.exp(-x)), np.exp(x) / (1 + np.exp(x)))
    y1 += 1
end: float = time.time()
print(end - start)

start: float = time.time()
for _ in range(50):
    y2 = np.piecewise(
        x,
        [x > 0],
        [lambda i: 1 / (1 + np.exp(-i)), lambda i: np.exp(i) / (1 + np.exp(i))],
    )
    y2 += 1
end: float = time.time()
print(end - start)

start: float = time.time()
for _ in range(50):
    y3 = sigmoid(x)
    y3 += 1
end: float = time.time()
print(end - start)
Szymon Maszke answered Oct 16 '22