I want to create a CDF with NumPy. My code is the following:
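import numpy as np

# data: 2-D array of integer values in [0, 4096); width, height: its dimensions
histo = np.zeros(4096, dtype=np.int32)
for x in range(0, width):
    for y in range(0, height):
        histo[data[x][y]] += 1   # count how often each value occurs

# running total of the counts
q = 0
cdf = list()
for i in histo:
    q = q + i
    cdf.append(q)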
I am looping over the array, but the program takes a long time to run. Is there a built-in function that does this?
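A vectorized sketch of the same computation, assuming `data` is a 2-D NumPy array of non-negative integers below 4096 (the size of `histo` in the question). `np.bincount` counts the occurrences of each value in a single pass, and `np.cumsum` produces the running totals that the `for i in histo` loop builds by hand. The helper name `cumulative_histogram` and the 4×4 array are illustrative only.

```python
import numpy as np

def cumulative_histogram(data, n_levels=4096):
    """Return per-value counts and their running total (an unnormalized CDF).

    Assumes `data` holds non-negative integers; values >= n_levels would
    make the result longer than n_levels, since minlength is only a minimum.
    """
    # Count how many elements take each value 0..n_levels-1 in one pass.
    histo = np.bincount(np.asarray(data).ravel(), minlength=n_levels)
    # Running total: cdf[i] = number of elements with value <= i.
    cdf = np.cumsum(histo)
    return histo, cdf

# Hypothetical example data: a small 4x4 "image".
data = np.array([[0, 1, 1, 2],
                 [2, 2, 3, 3],
                 [3, 3, 0, 1],
                 [4095, 4095, 0, 2]])
histo, cdf = cumulative_histogram(data)
# Divide by the total element count to get a CDF between 0 and 1.
cdf_normalized = cdf / cdf[-1]
```

Both calls run in compiled code, so they avoid the per-element Python loop entirely.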
A CDF (cumulative distribution function) plot is a graph with the sorted values on the x-axis and the cumulative distribution on the y-axis. So I would create a new pandas Series with the sorted values as the index and the cumulative distribution as the values.
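A minimal sketch of that idea, assuming the samples are in a 1-D array; the names `values` and `cdf` and the random data are illustrative only. The index holds the sorted samples and the values hold the running fraction 1/N, 2/N, ..., 1.

```python
import numpy as np
import pandas as pd

# Illustrative data only: 1,000 draws from a standard normal distribution.
rng = np.random.default_rng(0)
values = rng.normal(size=1000)

# Index = sorted sample values, values = cumulative fraction 1/N, 2/N, ..., 1.
n = len(values)
cdf = pd.Series(np.arange(1, n + 1) / n, index=np.sort(values))

# A Series plots its index on the x-axis and its values on the y-axis,
# so cdf.plot() (which requires matplotlib) draws the empirical CDF.
```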
Using a histogram is one solution, but it involves binning the data, which is not necessary for plotting the CDF of empirical data. Let F(x) be the number of entries less than or equal to x. Then F goes up by one exactly where we see a measurement. Thus, if we sort our samples, increment the count by one (or the fraction by 1/N) at each point, and plot one against the other, we will see the "exact" (i.e. un-binned) empirical CDF.
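The same sort-and-count argument can evaluate F at arbitrary points, not only at the samples themselves. After sorting once, `np.searchsorted` with `side="right"` returns how many samples are less than or equal to each query point, and dividing by N gives the fraction. The helper name `ecdf` is hypothetical; this is a sketch, not a library function.

```python
import numpy as np

def ecdf(samples, x):
    """Evaluate the empirical CDF of `samples` at the point(s) `x`.

    Returns the fraction of samples that are <= x; no binning is involved.
    """
    sorted_samples = np.sort(np.asarray(samples))
    # side="right" also counts samples equal to x, i.e. #{samples <= x}.
    counts = np.searchsorted(sorted_samples, x, side="right")
    return counts / len(sorted_samples)

samples = [3.0, 1.0, 2.0, 2.0]
result = ecdf(samples, [0.5, 1.0, 2.0, 2.5, 3.0])
# result holds 0.0, 0.25, 0.75, 0.75, 1.0: F jumps by 1/N at each sample
# (by 2/N at 2.0, which appears twice) and stays flat in between.
```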
The following code sample demonstrates both methods:
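import numpy as np
import matplotlib.pyplot as plt

N = 100
Z = np.random.normal(size=N)

# Method 1: histogram-based CDF (bins the data).
# density=True replaces the old normed=True argument, which newer NumPy versions no longer accept.
H, X1 = np.histogram(Z, bins=10, density=True)
dx = X1[1] - X1[0]
F1 = np.cumsum(H) * dx          # cumulative area up to each right bin edge

# Method 2: empirical CDF (no binning).
X2 = np.sort(Z)
F2 = np.arange(1, N + 1) / N    # fraction of samples <= each sorted value

plt.plot(X1[1:], F1, label="method 1 (histogram)")
plt.plot(X2, F2, label="method 2 (sorted samples)")
plt.legend()
plt.show()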
It outputs a plot comparing the two CDFs: the binned curve from method 1 and the un-binned empirical curve from method 2.
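One way to check how the two methods relate: with `density=True`, `np.cumsum(H) * dx` at each right bin edge is the fraction of samples in all bins up to that edge, which is what the un-binned CDF gives there, up to floating-point rounding. The sketch below relies on NumPy's documented binning rule (every bin is half-open [a, b) except the last, which is closed), so it counts samples strictly below each interior edge. The seed and the name `F_exact` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded, illustrative data only
N = 100
Z = rng.normal(size=N)

# Method 1: binned CDF evaluated at the right edge of each bin.
H, X1 = np.histogram(Z, bins=10, density=True)
dx = X1[1] - X1[0]
F1 = np.cumsum(H) * dx

# Un-binned CDF at the same right edges. Bins are half-open [a, b) except
# the last one, so count samples strictly below each interior edge and all
# N samples at the final edge.
X2 = np.sort(Z)
F_exact = np.searchsorted(X2, X1[1:], side="left") / N
F_exact[-1] = 1.0

print(np.allclose(F1, F_exact))  # prints True
```

In other words, the histogram method only samples the empirical CDF at the bin edges and draws straight lines between them, while the sorted-samples method keeps every step.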