I just came across one of these Kernels and couldn't understand what <code>numpy.log1p()</code> does in the third pipeline of this code (House Prediction dataset on Kaggle). The NumPy documentation says it returns an array with the natural logarithm of x + 1 for every element x of the input array. What is the purpose of taking the log with one added when comparing the skewness of the original and the transformed array of the same features? What does it actually do?

The NumPy docs give a hint:
<blockquote> For real-valued input, <code>log1p</code> is accurate also for <code>x</code> so small that <code>1 + x == 1</code> in floating-point accuracy. </blockquote>
For example, add a tiny non-zero number to <code>1.0</code>. Rounding makes the sum exactly <code>1.0</code>:
<pre class="prettyprint"><code>>>> 1e-100 == 0.0
False
>>> 1e-100 + 1.0 == 1.0
True
</code></pre>
If we take the <code>log</code> of that incorrectly rounded sum, we get an incorrect result (compare to WolframAlpha):
<pre class="prettyprint"><code>>>> np.log(1e-100 + 1)
0.0
</code></pre>
But if we use <code>log1p()</code>, we get the correct result:
<pre class="prettyprint"><code>>>> np.log1p(1e-100)
1e-100
</code></pre>
The same principle holds for <code>expm1()</code>, which is more accurate than <code>exp(x) - 1</code> for small <code>x</code>. <code>logaddexp()</code> is similar in spirit: it computes <code>log(exp(x1) + exp(x2))</code> without the overflow and precision loss of the naive formula.
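As a rough check of the point above, here is a minimal sketch that compares the naive formulas with `np.log1p` and `np.expm1` for the same tiny input. The variable names are only illustrative, and the exact printed representation may differ between NumPy versions.

```python
import numpy as np

x = 1e-100  # tiny but non-zero

# Naive: 1.0 + x is rounded to exactly 1.0 before the log is taken.
naive_log = np.log(1.0 + x)

# log1p evaluates log(1 + x) without ever forming the sum 1 + x.
stable_log = np.log1p(x)

# expm1(x) = exp(x) - 1 is the inverse of log1p and avoids the same rounding.
naive_exp = np.exp(x) - 1.0
stable_exp = np.expm1(x)

print("log(1 + x):", naive_log)
print("log1p(x):  ", stable_log)
print("exp(x) - 1:", naive_exp)
print("expm1(x):  ", stable_exp)
```

For an input this small, the naive versions lose the value of `x` entirely, while the dedicated functions return a result close to `x` itself.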

<img src="https://i.stack.imgur.com/giq8G.png" alt="https://docs.scipy.org/doc/numpy/reference/generated/numpy.log1p.html"> <img src="https://i.stack.imgur.com/ycPOC.png" alt="enter image description here"> If x is in the range 0...+Inf, <code>log1p(x)</code> is always finite, whereas <code>log(0)</code> is undefined (NumPy returns <code>-inf</code> with a warning). It is not always the best choice, though: as the plots show, you lose the steep part of the curve that <code>log</code> has for inputs close to zero, which is one of the most useful properties of the log function.
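To connect this with the skewness comparison in the question, here is a small sketch under stated assumptions: the data is synthetic (a made-up right-skewed feature that contains zeros, not the Kaggle dataset), and `skewness` is a hypothetical helper written for this example rather than a NumPy function.

```python
import numpy as np

def skewness(a):
    """Biased sample skewness: the third standardized moment."""
    a = np.asarray(a, dtype=float)
    centered = a - a.mean()
    return (centered ** 3).mean() / (centered ** 2).mean() ** 1.5

# Synthetic right-skewed feature with some exact zeros (an assumption
# standing in for something like an area or price column).
rng = np.random.default_rng(0)
values = rng.lognormal(mean=3.0, sigma=1.0, size=10_000)
values[:100] = 0.0

# log1p maps 0 to 0, so the zeros do not become -inf as they would with log.
transformed = np.log1p(values)

# expm1 undoes the transform, e.g. to map predictions back to original units.
restored = np.expm1(transformed)

print("skewness before:", skewness(values))
print("skewness after: ", skewness(transformed))
```

On data like this, the transformed feature should be much closer to symmetric, which is presumably why the kernel compares skewness before and after applying `log1p`.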

Purpose of `numpy.log1p( )`?

2 Answers


answered Oct 17 '22 07:10

Nils Werner


answered Oct 17 '22 08:10

Evalds Urtans

Tags:

python

numpy

Sabah
