How to convert a DataFrame of counts to a probability density function

Q: How do you convert density to probability?

To translate the probability density ρ(x) into a probability, imagine that Iₓ is some small interval around the point x. Then, assuming ρ is continuous, the probability that X is in that interval will depend both on the density ρ(x) and on the length of the interval: Pr(X ∈ Iₓ) ≈ ρ(x) × length of Iₓ.
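As a quick numerical check of this approximation, the sketch below uses a standard normal distribution (an assumption, chosen only because its density and CDF are easy to write with the standard library) and compares ρ(x) × length of Iₓ with the exact probability obtained from the CDF.

```python
import math

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density rho(x) of a normal distribution."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Exact Pr(X <= x), computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

x = 0.5
h = 0.01  # length of the small interval I_x = [x - h/2, x + h/2]

approx = normal_pdf(x) * h                             # rho(x) * length of I_x
exact = normal_cdf(x + h / 2) - normal_cdf(x - h / 2)  # true Pr(X in I_x)
print(f"approx={approx:.8f} exact={exact:.8f}")
```

Shrinking h makes the two numbers agree more closely, which is exactly the sense in which ρ(x) is "probability per unit length".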

Q: How do you plot a PDF and CDF in Python?

With Matplotlib in Python: compute the histogram of the data with bins=10, turn the bin counts into a probability distribution function (pdf) by dividing each count by the total count, compute the cdf as the cumulative sum of that pdf, and plot the cdf using the plot() method with label "CDF".
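The steps above can be sketched with NumPy as follows. The sample data is hypothetical (random normal draws), and the plotting helper imports Matplotlib lazily so the computation itself does not depend on it. Note that this "pdf" is the probability of each bin, not a density; for a true density, divide by the bin width or use `np.histogram(..., density=True)`.

```python
import numpy as np

def histogram_pdf_cdf(data, bins=10):
    """Return bin edges, per-bin probabilities and their cumulative sum."""
    counts, bin_edges = np.histogram(data, bins=bins)
    pdf = counts / counts.sum()  # probability of each bin; sums to 1
    cdf = np.cumsum(pdf)         # running total; last value is 1.0
    return bin_edges, pdf, cdf

def plot_cdf(bin_edges, cdf):
    """Plot the CDF against the right edge of each bin."""
    import matplotlib.pyplot as plt  # lazy import: only needed for plotting
    plt.plot(bin_edges[1:], cdf, label="CDF")
    plt.legend()
    plt.show()

rng = np.random.default_rng(0)
data = rng.normal(size=1000)  # hypothetical sample data for illustration
bin_edges, pdf, cdf = histogram_pdf_cdf(data, bins=10)
```

Calling `plot_cdf(bin_edges, cdf)` then draws the curve.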

Tags: python, pandas, scikit-learn

Suppose that I have the following observations of integers:

df = pd.DataFrame({'observed_scores': [100, 100, 90, 85, 100, ...]})

I know that this can be used as an input to make a density plot:

df['observed_scores'].plot.density()

but suppose that what I have is a counts table:

df = pd.DataFrame({'observed_scores': [100, 95, 90, 85, ...], 'counts': [1534, 1399, 3421, 8764, ...]})

which is cheaper to store than the full observed_scores Series (I have LOTS of observations).
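Before any smoothing, the counts table can already be turned into probabilities, and into densities by dividing by the length of the interval each score stands for. The sketch below uses only the four rows visible in the question and assumes (hypothetically) that scores lie on a grid with spacing 5, so each score represents an interval of length 5.

```python
import pandas as pd

# Only the rows visible in the question; the elided rows are not filled in.
df = pd.DataFrame({'observed_scores': [100, 95, 90, 85],
                   'counts': [1534, 1399, 3421, 8764]})

# Probability of each observed score: counts normalized to sum to 1.
df['probability'] = df['counts'] / df['counts'].sum()

# Assumption: each score represents an interval of length 5 (grid spacing).
# Density = probability / interval length, so density * width sums to 1.
bin_width = 5
df['density'] = df['probability'] / bin_width
print(df)
```

This is a histogram-style density; the smooth curve that `plot.density()` draws is a kernel density estimate, which needs the counts as weights.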

I know it's possible to plot the histogram using the counts, but how do I plot the density plot? If possible, can it be done without having to unstack/unravel the counts table into thousands of rows?
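For reference, a weighted Gaussian KDE is short enough to write by hand, which shows why no unravelling is needed: each score contributes one Gaussian bump scaled by its share of the total count. This is a plain NumPy sketch, not the statsmodels or SciPy algorithm; the bandwidth of 3.0 and the evaluation grid are hand-picked assumptions. (SciPy 1.2 and later also accept a `weights` argument in `scipy.stats.gaussian_kde`.)

```python
import numpy as np

def weighted_gaussian_kde(points, weights, grid, bandwidth):
    """Evaluate a Gaussian KDE in which point i counts weights[i] times.

    With a fixed bandwidth this equals a KDE on the expanded data (each point
    repeated weights[i] times), without ever building the expanded array.
    """
    points = np.asarray(points, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize the counts to probabilities
    grid = np.asarray(grid, dtype=float)
    z = (grid[:, None] - points[None, :]) / bandwidth  # shape (n_grid, n_points)
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernels @ w  # weighted sum of the Gaussian bumps at each grid point

scores = [100, 95, 90, 85]        # visible rows of the counts table
counts = [1534, 1399, 3421, 8764]
grid = np.linspace(60, 125, 1301)  # hypothetical evaluation grid, step 0.05
density = weighted_gaussian_kde(scores, counts, grid, bandwidth=3.0)  # assumed bandwidth
```

`plt.plot(grid, density)` then draws the curve. A rule-of-thumb bandwidth would have to be computed from the weighted data rather than from the four distinct scores.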


asked Jun 22 '20 15:06

irene

1 Answer

IIUC, statsmodels lets you fit a weighted KDE:

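import matplotlib.pyplot as plt
import pandas as pd
from statsmodels.nonparametric.kde import KDEUnivariate

df = pd.DataFrame({'observed_scores': [100, 95, 90, 85],
                   'counts': [1534, 1399, 3421, 8764]})
scores = df['observed_scores'].to_numpy(dtype=float)

# Weighted KDE: each score counts as many times as its count.
# statsmodels only supports weights when fft=False.
kde_weighted = KDEUnivariate(scores)
kde_weighted.fit(weights=df['counts'].to_numpy(dtype=float), fft=False)

# Unweighted KDE of the distinct scores, for comparison.
kde_noweight = KDEUnivariate(scores)
kde_noweight.fit()

plt.plot(kde_weighted.support, kde_weighted.density)
plt.plot(kde_noweight.support, kde_noweight.density)
plt.legend(['weighted', 'unweighted'])
plt.show()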

Output: a plot of the weighted and unweighted density curves (image not available).

answered Oct 04 '22 09:10

Juan C
