Is there a way to test correlation between Data X and Binary output Y?

Tags: python, optimization, correlation

I'm trying to find a Python method/library for testing correlation between an independent variable X and a binary output Y.

So, for example, let's say I have the following data and output:

X           Y
0.65       1
0.11       0
0.13       0
0.35       1
0.21       0
...

Let's say the output Y is 1 if X > 0.3 and 0 otherwise. If I don't know this relationship (the threshold value 0.3), is there a statistical method/test to find the degree of correlation between X and Y?

So for example, some method that returns

x = [0.65, 0.11, 0.13, 0.31, 0.21]
y = [1, 0, 0, 1, 0]
print(some_test(x, y))

==> returns "degree of correlation = 1.0"

Thanks

asked Mar 12 '15 22:03

user2436815

1 Answer

You are looking for the point-biserial correlation, which is used when one of your variables is dichotomous and the other is continuous.

from scipy import stats
# returns the correlation coefficient and its two-sided p-value
r, p_value = stats.pointbiserialr(x, y)
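
As a minimal sketch, here is that call applied to the sample lists from the question. One caveat worth knowing: the point-biserial coefficient is mathematically the same as Pearson's r computed with a 0/1 variable, so it measures linear association. Even when Y is a perfect threshold function of X, it will generally come out below 1.0 (roughly 0.82 for this data). The variable names r_pb and r_pearson are just illustrative.

```python
from scipy import stats

# Sample data from the question: y is 1 exactly when x > 0.3
x = [0.65, 0.11, 0.13, 0.31, 0.21]
y = [1, 0, 0, 1, 0]

# Point-biserial correlation and its two-sided p-value
r_pb, p_pb = stats.pointbiserialr(x, y)

# Pearson's r on the same data, for comparison
r_pearson, p_pearson = stats.pearsonr(x, y)

print(f"point-biserial r = {r_pb:.3f} (p = {p_pb:.3f})")
print(f"Pearson r        = {r_pearson:.3f}")
```

With only five points the p-value is not very informative; the coefficient itself is the "degree of correlation" asked about.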

If you simply want to know whether the mean of X differs between the Y = 0 and Y = 1 groups, you should use a t-test instead.
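
A rough sketch of the t-test route, again on the question's sample data: split X by the value of Y and compare the two groups with scipy.stats.ttest_ind. Using equal_var=False (Welch's t-test) is my assumption here, chosen because nothing says the two groups share a variance; the group names are illustrative.

```python
from scipy import stats

# Sample data from the question
x = [0.65, 0.11, 0.13, 0.31, 0.21]
y = [1, 0, 0, 1, 0]

# Split x into the two groups defined by y
x_when_1 = [xi for xi, yi in zip(x, y) if yi == 1]
x_when_0 = [xi for xi, yi in zip(x, y) if yi == 0]

# Welch's t-test: does not assume the two groups have equal variance
t_stat, p_value = stats.ttest_ind(x_when_1, x_when_0, equal_var=False)

print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```

A positive t statistic means X is larger on average when Y is 1. With groups this small the test has very little power, so treat the result on this toy data as a demonstration of the call rather than evidence.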

answered Oct 12 '22 06:10

Jeff
