I have an unsorted list named d that looks like:
[0.0000, 123.9877, 0.0000, 9870.9876, ...]
I simply want to plot a CDF graph based on this list using Matplotlib in Python, but I don't know whether there is a function I can use.
    d = []
    d_sorted = []
    for line in fd.readlines():
        (addr, videoid, userag, usertp, timeinterval) = line.split()
        d.append(float(timeinterval))
    d_sorted = sorted(d)

    class discrete_cdf:
        def __init__(data):
            self._data = data  # must be sorted
            self._data_len = float(len(data))

        def __call__(point):
            return (len(self._data[:bisect_left(self._data, point)]) /
                    self._data_len)

    cdf = discrete_cdf(d_sorted)
    xvalues = range(0, max(d_sorted))
    yvalues = [cdf(point) for point in xvalues]
    plt.plot(xvalues, yvalues)
I am now using this code, but I get the following error message:
    Traceback (most recent call last):
      File "hitratioparea_0117.py", line 43, in <module>
        cdf = discrete_cdf(d_sorted)
    TypeError: __init__() takes exactly 1 argument (2 given)
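A Cumulative Distribution Function (CDF) plot is usually drawn on linear axes (lin-lin), optionally with a data overlay and confidence limits. For each value on the horizontal axis it shows the fraction of the data set that is less than or equal to that value (i.e., cumulative probability vs. size).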
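For a list of raw samples like d, the definition above can be sketched directly with NumPy. The helper name empirical_cdf and the sample values below are illustrative assumptions, not part of the original question or answer:

```python
import numpy as np


def empirical_cdf(samples):
    """Return (x, y) where x holds the sorted samples and
    y[i] = (i + 1) / n is the cumulative fraction of samples
    up to and including the i-th sorted sample."""
    x = np.sort(np.asarray(samples, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


def plot_empirical_cdf(samples):
    """Draw the empirical CDF as a step function (hypothetical helper)."""
    import matplotlib.pyplot as plt  # imported here so the helper above needs only NumPy

    x, y = empirical_cdf(samples)
    plt.step(x, y, where="post")
    plt.xlabel("value")
    plt.ylabel("cumulative fraction")
    plt.show()


# Illustrative values only; the real d would come from the input file.
d = [0.0, 123.9877, 0.0, 9870.9876]
x, y = empirical_cdf(d)
```

Calling plot_empirical_cdf(d) would then draw the curve. Because the raw samples are sorted and counted directly, no separate normalization to a PDF is needed in this variant.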
As mentioned, cumsum from numpy works well. Make sure that your data is a proper PDF (i.e., it integrates to one); otherwise the CDF won't end at unity as it should. Here is a minimal working example:
    import numpy as np
    import matplotlib.pyplot as plt

    # Create some test data
    dx = 0.01
    X = np.arange(-2, 2, dx)
    Y = np.exp(-X ** 2)

    # Normalize the data to a proper PDF
    Y /= (dx * Y).sum()

    # Compute the CDF
    CY = np.cumsum(Y * dx)

    # Plot both
    plt.plot(X, Y)
    plt.plot(X, CY, 'r--')
    plt.show()
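As for the TypeError in the question: Python passes the instance as the first argument to every method, so __init__ and __call__ each need an explicit self parameter. A minimal sketch of the question's class with that fix is below. It also makes two changes that are my assumptions rather than anything the question asked for: it uses bisect_right so that the result counts samples less than or equal to the point, and it converts the maximum to an int because range does not accept floats. The sample values are illustrative.

```python
from bisect import bisect_right


class discrete_cdf:
    def __init__(self, data):
        # Sort a copy so callers do not have to pre-sort the list.
        self._data = sorted(data)
        self._data_len = float(len(self._data))

    def __call__(self, point):
        # Number of samples <= point, divided by the total count.
        return bisect_right(self._data, point) / self._data_len


# Illustrative values only; the real d would come from the input file.
d = [0.0, 123.9877, 0.0, 9870.9876]
cdf = discrete_cdf(d)

# range() needs integers; go one step past the maximum so the curve reaches 1.0.
xvalues = range(0, int(max(d)) + 2)
yvalues = [cdf(point) for point in xvalues]
```

With matplotlib.pyplot imported as plt, plt.plot(xvalues, yvalues) followed by plt.show() would then draw the curve as in the question.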