I have the above distribution with a mean of -0.02, a standard deviation of 0.09, and a sample size of 13905.
I am just not sure why the distribution is left-skewed given the large sample size. In the bin [-2.0, -0.5] there are only 10 samples (outliers), which explains the shape.
I am wondering whether it is possible to normalize the data to make it smoother and closer to a normal distribution. The purpose is to feed it into a model while reducing the standard error of the predictor.
Dealing with Non-Normal Distributions: you can transform the data with a function, forcing it to fit a normal model. However, if you have a very small sample, a skewed sample, or one that naturally fits another distribution type, you may want to run a non-parametric test instead.
The short answer: yes, you do need to worry about your data's distribution not being normal, because standardization does not change the underlying shape of the distribution. If X ∼ N(μ, σ²), then you can transform it to a standard normal by standardizing: Y := (X − μ)/σ ∼ N(0, 1). But if X is skewed, the standardized variable is exactly as skewed, just shifted and rescaled.
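To illustrate, here is a minimal sketch (using made-up exponential data, which is right-skewed) showing that standardizing shifts and rescales a sample but leaves its skewness untouched:

```python
import numpy as np
from scipy.stats import skew

# Hypothetical skewed sample: exponential data is right-skewed
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=10_000)

# Standardize: subtract the mean, divide by the standard deviation
z = (x - x.mean()) / x.std()

# The mean and sd change, but the shape (skewness) does not:
# the two printed values are identical up to floating point
print(skew(x), skew(z))
```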
It is worth noting that z-scores can still be interpreted for non-normal distributions via Chebyshev's inequality. This theorem states that for any distribution (normal or not), at least 75% of the values lie within two standard deviations of the mean and at least 88.9% lie within three standard deviations.
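A quick sanity check of Chebyshev's bounds on a deliberately non-normal (here, made-up exponential) sample:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(size=100_000)  # clearly non-normal

mu, sigma = x.mean(), x.std()
within2 = np.mean(np.abs(x - mu) <= 2 * sigma)
within3 = np.mean(np.abs(x - mu) <= 3 * sigma)

# Chebyshev guarantees at least 1 - 1/k^2 of the mass within k sd:
# at least 0.75 within 2 sd, at least ~0.889 within 3 sd
print(within2 >= 0.75, within3 >= 1 - 1 / 9)
```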
The standard deviation is defined the same way whether the distribution is normal or not. Specifically, it is the square root of the mean squared deviation from the mean. So the standard deviation tells you how spread out the data are around the mean, regardless of distribution.
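The definition above can be checked directly against NumPy (the numbers are made up):

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 7.0])  # made-up numbers

# Standard deviation = square root of the mean squared deviation from the mean
manual = np.sqrt(np.mean((x - x.mean()) ** 2))

print(np.isclose(manual, x.std()))  # → True
```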
You have two options here: the Box-Cox transform or the Yeo-Johnson transform. The issue with the Box-Cox transform is that it applies only to positive numbers. To use it on data that includes non-positive values, you would have to exponentiate the data, perform the Box-Cox transform, and then take the log to get back to the original scale. The Box-Cox transform is available in scipy.stats.
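For reference, a minimal sketch of scipy.stats.boxcox on made-up positive, right-skewed data; boxcox returns both the transformed values and the fitted λ:

```python
import numpy as np
from scipy.stats import boxcox

# Box-Cox requires strictly positive data; lognormal data qualifies
rng = np.random.default_rng(2)
x = rng.lognormal(mean=0.0, sigma=1.0, size=1000)  # right-skewed, positive

# boxcox returns the transformed data and the lambda chosen by MLE.
# For lognormal data, lambda should come out near 0 (a log transform).
y, lam = boxcox(x)
```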
You can avoid those steps and simply use the Yeo-Johnson transform; sklearn provides an API for that:
from scipy.stats import normaltest
import numpy as np
from sklearn.preprocessing import PowerTransformer

data = np.array([-0.35714286, -0.28571429, -0.00257143, -0.00271429, -0.00142857, 0., 0., 0., 0.00142857, 0.00285714, 0.00714286, 0.00714286, 0.01, 0.01428571, 0.01428571, 0.01428571, 0.01428571, 0.01428571, 0.01428571, 0.02142857, 0.07142857])

# Yeo-Johnson handles zero and negative values, unlike Box-Cox
pt = PowerTransformer(method='yeo-johnson')

# sklearn expects a 2-D array of shape (n_samples, n_features)
data = data.reshape(-1, 1)
pt.fit(data)
transformed_data = pt.transform(data)
We have transformed our data, but we need a way to measure whether we have moved in the right direction. Since our goal was to move toward a normal distribution, we will use a normality test.
k2, p = normaltest(data)
transformed_k2, transformed_p = normaltest(transformed_data)
The test returns two values, k2 and p. The value of p is of interest here.
If p is less than some threshold (e.g. 0.001 or so), we reject the hypothesis that data comes from a normal distribution. In the example above, you'll see that p is less than 0.001 while transformed_p is greater than this threshold, indicating that we are moving in the right direction.
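If the transformed values feed a model whose predictions you later need on the original scale, PowerTransformer also provides inverse_transform, which undoes both the power transform and the standardization. A small round-trip sketch with made-up data:

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

data = np.array([-0.357, -0.286, 0.0, 0.014, 0.071]).reshape(-1, 1)

pt = PowerTransformer(method='yeo-johnson')
transformed = pt.fit_transform(data)

# Round-trip back to the original scale
restored = pt.inverse_transform(transformed)
print(np.allclose(restored, data))  # → True
```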