I have written a code to calculate the correlation between two Pandas Series. Can you tell me what is wrong with my code?

Below is the code:

import numpy as np
import pandas as pd

def correlation(x, y):
    std_x = (x - x.mean())/x.std(ddof = 0)
    std_y = (y - y.mean())/y.std(ddof = 0)
    return (std_x * std_y).mean

a = pd.Series([2, 4, 5, 7, 9])
b = pd.Series([12, 10, 9, 7, 3])
ca = correlation(a, b)
print(ca)

It does not return the value of the correlation; instead, it returns a Series with index labels 0, 1, 2, 3, 4 and values -1.747504, -0.340844, -0.043282, -0.259691, -2.531987.

Please help me understand the problem behind this.

asked Mar 08 '23 by python_noob

1 Answer

You need to actually call mean():

return (std_x * std_y).mean()

not just reference it:

return (std_x * std_y).mean

which returns the method itself rather than calling it.
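For illustration (a minimal sketch with a hypothetical Series s, not part of the original answer): accessing a method without parentheses yields the bound method object, and printing that object shows a repr that embeds the whole Series, which is exactly the output described in the question:

import pandas as pd

s = pd.Series([1, 2, 3])
print(s.mean)    # <bound method ... of 0    1 ...> -- the method object, not a number
print(s.mean())  # 2.0 -- the computed mean

Full code: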

import pandas as pd

def correlation(x, y):
    # Standardize each series (z-scores, population std via ddof=0)
    std_x = (x - x.mean()) / x.std(ddof=0)
    std_y = (y - y.mean()) / y.std(ddof=0)
    # Pearson correlation is the mean of the products of the z-scores
    return (std_x * std_y).mean()

a = pd.Series([2, 4, 5, 7, 9])
b = pd.Series([12, 10, 9, 7, 3])
ca = correlation(a, b)
print(ca)

Output:

-0.984661667628
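As a sanity check (an addition to the original answer), pandas' built-in Series.corr, which computes the Pearson correlation by default, gives the same value:

import pandas as pd

a = pd.Series([2, 4, 5, 7, 9])
b = pd.Series([12, 10, 9, 7, 3])
print(a.corr(b))  # -0.984661667628, matches the result above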
answered May 01 '23 by Mike Müller