How to get the cumulative distribution function with NumPy?

I want to compute a CDF with NumPy. My code is as follows:

    import numpy as np

    # data, width and height are defined elsewhere
    histo = np.zeros(4096, dtype=np.int32)
    for x in range(0, width):
        for y in range(0, height):
            histo[data[x][y]] += 1

    q = 0
    cdf = list()
    for i in histo:
        q = q + i
        cdf.append(q)

I am looping over the array, but the program takes a long time to run. Is there a built-in function that does this?

omar asked May 17 '12 17:05





1 Answer

Using a histogram is one solution, but it involves binning the data, which is not necessary for plotting the CDF of empirical data. Let F(x) be the number of entries less than x; it goes up by one exactly where we see a measurement. Thus, if we sort the samples, increment the count by one at each point (or the fraction by 1/N), and plot one against the other, we see the "exact" (i.e. un-binned) empirical CDF.
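To make the definition concrete, here is a minimal sketch that evaluates F(x) at arbitrary points by counting the samples strictly less than x with `np.searchsorted`. The helper name `ecdf_at` and the sample values are illustrative assumptions, not part of the original answer.

```python
import numpy as np

def ecdf_at(samples, x):
    """Fraction of samples strictly less than each value in x (hypothetical helper)."""
    sorted_samples = np.sort(np.asarray(samples))
    # side="left" gives, for each x, the number of sorted entries strictly less than x
    counts = np.searchsorted(sorted_samples, x, side="left")
    return counts / sorted_samples.size

samples = [3.0, 1.0, 2.0, 2.0]  # made-up data for illustration
fractions = ecdf_at(samples, [0.5, 2.0, 2.5, 10.0])
print(fractions)
```

Using `side="right"` instead would count entries less than or equal to x, which is the other common convention for an empirical CDF.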

The following code sample demonstrates the method:

    import numpy as np
    import matplotlib.pyplot as plt

    N = 100
    Z = np.random.normal(size=N)

    # Method 1: binned CDF from a normalized histogram
    H, X1 = np.histogram(Z, bins=10, density=True)
    dx = X1[1] - X1[0]
    F1 = np.cumsum(H) * dx

    # Method 2: un-binned empirical CDF from the sorted samples
    X2 = np.sort(Z)
    F2 = np.array(range(N)) / float(N)

    plt.plot(X1[1:], F1)
    plt.plot(X2, F2)
    plt.show()

It plots both CDF curves: the binned one from method 1 and the un-binned one from method 2.
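For the integer histogram in the question, a vectorized sketch could use `np.bincount` followed by `np.cumsum` instead of Python loops. It assumes `data` is an array of non-negative integers below 4096, as the size of the question's `histo` suggests; the random `data` here is only a stand-in.

```python
import numpy as np

# Stand-in for the question's data: assumed 2-D, integer values in [0, 4096)
rng = np.random.default_rng(0)
data = rng.integers(0, 4096, size=(64, 48))

# Count occurrences of every value at once; minlength keeps all 4096 bins
histo = np.bincount(data.ravel(), minlength=4096)

# Running total of the counts, equivalent to the manual loop over histo
cdf = np.cumsum(histo)

# Optional: scale to [0, 1] to get a cumulative distribution
cdf_normalized = cdf / cdf[-1]
```

Both calls run in compiled code, so this should be much faster than nested Python loops on large arrays.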

Dan answered Sep 21 '22 03:09
