I have a set of real data and I want to use it to find a probability distribution, and then use that distribution's properties to generate random points according to its PDF. A sample of my data set is as follows:
#Mag Weight
21.9786 3.6782
24.0305 6.1120
21.9544 4.2225
23.9383 5.1375
23.9352 4.6499
23.0261 5.1355
23.8682 5.9932
24.8052 4.1765
22.8976 5.1901
23.9679 4.3190
25.3362 4.1519
24.9079 4.2090
23.9851 5.1951
22.2094 5.1570
22.3452 5.6159
24.0953 6.2697
24.3901 6.9299
24.1789 4.0222
24.2648 4.4997
25.3931 3.3920
25.8406 3.9587
23.1427 6.9398
21.2985 7.7582
25.4807 3.1112
25.1935 5.0913
25.2136 4.0578
24.6990 3.9899
23.5299 4.6788
24.0880 7.0576
24.7931 5.7088
25.1860 3.4825
24.4757 5.8500
24.1398 4.9842
23.4947 4.4730
20.9806 5.2717
25.9470 3.4706
25.0324 3.3879
24.7186 3.8443
24.3350 4.9140
24.6395 5.0757
23.9181 4.9951
24.3599 4.1125
24.1766 5.4360
24.8378 4.9121
24.7362 4.4237
24.4119 6.1648
23.8215 5.9184
21.5394 5.1542
24.0081 4.2308
24.5665 4.6922
23.5827 5.4992
23.3876 6.3692
25.6872 4.5055
23.6629 5.4416
24.4821 4.7922
22.7522 5.9513
24.0640 5.8963
24.0361 5.6406
24.8687 4.5699
24.8795 4.3198
24.3486 4.5305
21.0720 9.5246
25.2960 3.0828
23.8204 5.8605
23.3732 5.1161
25.5097 2.9010
24.9206 4.0999
24.4140 4.9073
22.7495 4.5059
24.3394 3.5061
22.0560 5.5763
25.4404 5.4916
25.4795 4.4089
24.1772 3.8626
23.6042 4.7476
23.3537 6.4804
23.6842 4.3220
24.1895 3.6072
24.0328 4.3273
23.0243 5.6789
25.7042 4.4493
22.1983 6.1868
22.3661 5.9132
20.9426 4.8079
20.3806 10.1128
25.0105 4.4296
23.6648 6.6482
25.2780 4.4933
24.6870 4.4836
25.4565 4.0990
25.0415 3.9384
24.6098 4.6057
24.7796 4.2042
How could I do this? My first attempt was to fit a polynomial to the binned data and find the probability distribution of weights in each magnitude bin, but I reckon there might be a smarter way to do it, for instance using scipy.stats.rv_continuous to sample data from the given distribution. However, I don't know how it works and there aren't enough examples.
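For reference, a minimal sketch of how scipy.stats.rv_continuous can be subclassed and sampled from; the PDF below (f(x) = 2x on [0, 1]) is only a toy example, not a fit to the data above:

```python
import numpy as np
from scipy import stats

class ToyDist(stats.rv_continuous):
    """Toy distribution with pdf f(x) = 2x on [0, 1] (illustration only)."""
    def _pdf(self, x):
        return 2.0 * x

toy = ToyDist(a=0.0, b=1.0, name="toy")  # a, b are the support bounds
samples = toy.rvs(size=1000)             # sampling via numerical CDF inversion (can be slow)
print(samples.mean())                    # should be close to 2/3 for this pdf
```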
Update:
Since I got a lot of comments suggesting KDE, I used scipy.stats.gaussian_kde and got the following results.
I am wondering whether this is a good probability distribution to represent the properties of my data. First, how could I test it, and second, is it possible to fit more than one Gaussian KDE with scipy.stats?
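A sketch of this kind of gaussian_kde usage on the Weight column, with a one-sample Kolmogorov-Smirnov test against the KDE's own CDF as one possible sanity check; the filename data.txt and the choice of column are assumptions, and testing against the same data the KDE was built from is only a rough check:

```python
import numpy as np
from scipy import stats

# Assumes the table above is saved as data.txt; the '#' header line is skipped by default.
weight = np.loadtxt("data.txt", usecols=1)

kde = stats.gaussian_kde(weight)  # bandwidth chosen by Scott's rule by default

# CDF of the KDE, obtained by integrating the density up to x.
def kde_cdf(xs):
    return np.array([kde.integrate_box_1d(-np.inf, x) for x in np.atleast_1d(xs)])

# One-sample KS test of the data against the KDE's CDF (rough check only,
# since the KDE was estimated from the same data).
stat, pvalue = stats.kstest(weight, kde_cdf)
print(stat, pvalue)
```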
If we wish to generate a random number between two numbers, we can use the formula RAND() * (b - a) + a, where a is the smallest and b is the largest number in the desired range.
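The same idea in Python (numpy also provides this directly as np.random.uniform(a, b)); the range 20 to 26 is just an illustrative choice roughly matching the Mag column:

```python
import random

a, b = 20.0, 26.0                  # desired range (illustrative)
x = random.random() * (b - a) + a  # uniform random number in [a, b)
print(x)
```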
The most general way is the following. Let P(X) be the probability that a random number generated according to your distribution is less than X. You start by generating a uniform random X between zero and one. After that, you find Y such that P(Y) = X and output Y. This is known as inverse transform sampling.
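A sketch of how this can be applied to the empirical CDF of the Weight column; the filename data.txt, the column choice, and the linear interpolation between sorted data points are all assumptions made for illustration:

```python
import numpy as np

weight = np.loadtxt("data.txt", usecols=1)  # assumes the table is saved as data.txt

# Empirical CDF: P(Y <= sorted_w[i]) ~= (i + 1) / n
sorted_w = np.sort(weight)
cdf = np.arange(1, len(sorted_w) + 1) / len(sorted_w)

# Inverse transform sampling: draw uniform X in (0, 1), then find Y with P(Y) = X
# by linearly interpolating the inverse of the empirical CDF.
u = np.random.uniform(0, 1, size=1000)
samples = np.interp(u, cdf, sorted_w)
print(samples.mean(), weight.mean())  # should be roughly similar
```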
(1) If you have an idea about the distribution from which these data are sampled, then fit that distribution to the data (i.e., adjust its parameters via maximum likelihood or whatever) and then sample from it.
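A sketch of approach (1), assuming for illustration that the weights are roughly normal (that assumption would need to be checked; any scipy.stats distribution with a fit method could be swapped in), and that the table is saved as data.txt:

```python
import numpy as np
from scipy import stats

weight = np.loadtxt("data.txt", usecols=1)  # assumes the table is saved as data.txt

# Fit a normal distribution by maximum likelihood (the choice of normal is an assumption).
mu, sigma = stats.norm.fit(weight)

# Sample new points from the fitted distribution.
new_points = stats.norm.rvs(loc=mu, scale=sigma, size=1000)
print(mu, sigma, new_points[:5])
```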
(2) For a more nearly empirical approach, select one datum at random (with equal probability) and then pretend it is the center of a little Gaussian bump, and sample from that bump. This is equivalent to constructing a kernel density estimate and sampling from it. You will have to pick a standard deviation for the bumps.
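A sketch of approach (2) with scipy.stats.gaussian_kde, whose resample method does exactly this; the bandwidth (the standard deviation of the bumps) defaults to Scott's rule and can be overridden with the bw_method argument. Here both columns are used so the joint (Mag, Weight) distribution is estimated; the filename data.txt is an assumption:

```python
import numpy as np
from scipy import stats

mag, weight = np.loadtxt("data.txt", usecols=(0, 1), unpack=True)  # assumes data.txt

kde = stats.gaussian_kde(np.vstack([mag, weight]))  # 2 x N array of data points
new_mag, new_weight = kde.resample(1000)            # pick a datum, add a Gaussian bump
print(new_mag[:5], new_weight[:5])
```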
(3) For an entirely empirical approach, select one datum at random (with equal probability). This is equivalent to assuming the empirical distribution is the same as the actual distribution.
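A sketch of approach (3), which is just resampling the observed values with replacement (again assuming the table is saved as data.txt):

```python
import numpy as np

weight = np.loadtxt("data.txt", usecols=1)  # assumes the table is saved as data.txt

# Draw new "samples" by picking observed values at random, with replacement.
samples = np.random.choice(weight, size=1000, replace=True)
print(samples[:10])
```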