Perhaps I am doing something wrong while z-normalizing my array. Can someone take a look at this and suggest what's going on?
In R:
> data <- c(2.02, 2.33, 2.99, 6.85, 9.20, 8.80, 7.50, 6.00, 5.85, 3.85, 4.85, 3.85, 2.22, 1.45, 1.34)
> data.mean <- mean(data)
> data.sd <- sqrt(var(data))
> data.norm <- (data - data.mean) / data.sd
> print(data.norm)
[1] -0.9796808 -0.8622706 -0.6123005 0.8496459 1.7396910 1.5881940 1.0958286 0.5277147 0.4709033 -0.2865819
[11] 0.0921607 -0.2865819 -0.9039323 -1.1955641 -1.2372258
In Python using numpy:
>>> import numpy as np
>>> data = np.array([np.array([2.02, 2.33, 2.99, 6.85, 9.20, 8.80, 7.50, 6.00, 5.85, 3.85, 4.85, 3.85, 2.22, 1.45, 1.34])])
>>> data -= np.split(np.mean(data, axis=1), data.shape[0])
>>> data *= np.split(1.0/data.std(axis=1), data.shape[0])
>>> print(data)
[[-1.01406602 -0.89253491 -0.63379126 0.87946705 1.80075126 1.64393692
1.13429034 0.54623659 0.48743122 -0.29664045 0.09539539 -0.29664045
-0.93565885 -1.23752644 -1.28065039]]
Am I using numpy incorrectly?
The reason you're getting different results has to do with how the standard deviation/variance is calculated. R calculates using denominator N-1, while numpy calculates using denominator N. You can get a numpy result equal to the R result by using data.std(ddof=1), which tells numpy to use N-1 as the denominator when calculating the variance.
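To make that concrete, here is a minimal sketch using the data from the question. The variable names z_sample and z_population are just illustrative; the only thing that changes between them is the ddof argument to std().

```python
import numpy as np

data = np.array([2.02, 2.33, 2.99, 6.85, 9.20, 8.80, 7.50, 6.00,
                 5.85, 3.85, 4.85, 3.85, 2.22, 1.45, 1.34])

# ddof=1 divides by N-1 (sample standard deviation), as R's var()/sd() do
z_sample = (data - data.mean()) / data.std(ddof=1)

# The default ddof=0 divides by N (population standard deviation)
z_population = (data - data.mean()) / data.std()

print(z_sample)      # should agree with the R output in the question
print(z_population)  # should agree with the numpy output in the question
```

Which convention is appropriate depends on whether you treat the array as a sample or as the whole population; the point here is only that the two tools use different defaults.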
I believe that your NumPy result is correct. I would do the normalization in a simpler way, though:
>>> data = np.array([2.02, 2.33, 2.99, 6.85, 9.20, 8.80, 7.50, 6.00, 5.85, 3.85, 4.85, 3.85, 2.22, 1.45, 1.34])
>>> data -= data.mean()
>>> data /= data.std()
>>> data
array([-1.01406602, -0.89253491, -0.63379126, 0.87946705, 1.80075126,
1.64393692, 1.13429034, 0.54623659, 0.48743122, -0.29664045,
0.09539539, -0.29664045, -0.93565885, -1.23752644, -1.28065039])
The difference between your two results lies in the normalization: with r as the R result:
>>> r / data
array([ 0.96609173, 0.96609173, 0.96609173, 0.96609179, 0.96609179, 0.96609181, 0.9660918 , 0.96609181,
0.96609179, 0.96609179, 0.9660918 , 0.96609179, 0.96609175, 0.96609176, 0.96609177])
Thus, your two results are simply proportional to each other (up to the rounding of the printed R values). You may therefore want to compare the standard deviations obtained with R and with Python.
PS: Now that I think of it, the variance may not be defined the same way in NumPy and in R: for N elements, some tools normalize by N-1 instead of N when calculating the variance. You may want to check this.
PPS: Here is the reason for the discrepancy: the difference in factors comes from two different normalization conventions: the observed factor is simply sqrt(14/15) = 0.9660917… (because the data has 15 elements). Thus, in order to obtain in R the same result as in Python, you need to divide the R result by this factor.
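As a quick check of that factor, the sketch below compares the ratio of the two standard deviations with sqrt((N-1)/N). It reuses the data from the question; the names n, ratio, and expected are only for illustration.

```python
import numpy as np

data = np.array([2.02, 2.33, 2.99, 6.85, 9.20, 8.80, 7.50, 6.00,
                 5.85, 3.85, 4.85, 3.85, 2.22, 1.45, 1.34])
n = data.size  # 15 elements

# Ratio of the N-denominator std to the (N-1)-denominator std;
# this is also the ratio of the R z-scores to the NumPy z-scores
ratio = data.std(ddof=0) / data.std(ddof=1)
expected = np.sqrt((n - 1) / n)

print(ratio, expected)
```

The sum of squared deviations cancels in the ratio, which is why the factor depends only on the number of elements and not on the data values.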