I'm calculating the Gini coefficient (similar to: Python - Gini coefficient calculation using Numpy), but I get an odd result. For a uniform distribution sampled from np.random.rand(), the Gini coefficient is about 0.3, but I would have expected it to be close to 0 (perfect equality). What is going wrong here?
import numpy as np
import matplotlib.pyplot as plt

def G(v):
    bins = np.linspace(0., 100., 11)
    total = float(np.sum(v))
    yvals = []
    for b in bins:
        bin_vals = v[v <= np.percentile(v, b)]
        bin_fraction = (np.sum(bin_vals) / total) * 100.0
        yvals.append(bin_fraction)
    # perfect equality area
    pe_area = np.trapz(bins, x=bins)
    # lorenz area
    lorenz_area = np.trapz(yvals, x=bins)
    gini_val = (pe_area - lorenz_area) / float(pe_area)
    return bins, yvals, gini_val

v = np.random.rand(500)
bins, result, gini_val = G(v)

plt.figure()
plt.subplot(2, 1, 1)
plt.plot(bins, result, label="observed")
plt.plot(bins, bins, '--', label="perfect eq.")
plt.xlabel("fraction of population")
plt.ylabel("fraction of wealth")
plt.title("GINI: %.4f" % (gini_val))
plt.legend()
plt.subplot(2, 1, 2)
plt.hist(v, bins=20)
plt.show()
For the given set of numbers, the above code calculates the fraction of the distribution's total value that falls within each percentile bin.
The result: the Lorenz curve bends well below the perfect-equality line, and the plot title reports a Gini of about 0.33 (the lower panel is a histogram of v). A uniform distribution should be near "perfect equality", so the bend in the Lorenz curve looks wrong to me.
Gini in Python

Sorting the data points in ascending order and assigning a positional index i (from 1 to n) yields

$$G = \frac{\sum_i (2i - n - 1)\, x_i}{n \sum_i x_i},$$

which is even speedier to compute.
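As a rough sketch of that sorted-index formula (not code from the original posts; the name gini_sorted is just an illustrative choice), it could be implemented as:

import numpy as np

def gini_sorted(x):
    # Gini via the sorted-index formula:
    # G = sum_i (2*i - n - 1) * x_i / (n * sum_i x_i),
    # with x sorted ascending and i running from 1 to n.
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    return np.sum((2 * i - n - 1) * x) / (n * np.sum(x))

This is O(n log n) because of the sort, and for the same data it should agree with the pairwise mean-absolute-difference implementation shown further down.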
The Gini coefficient can be calculated using the formula: Gini Coefficient = A / (A + B), where A is the area above the Lorenz Curve and B is the area below the Lorenz Curve.
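A minimal sketch of that area-based calculation (again not from the original posts; lorenz_gini is a hypothetical helper name), using cumulative shares on [0, 1] so that A + B = 0.5. The questioner's code follows the same idea but works in percent units; for small samples the trapezoidal approximation will differ slightly from the exact pairwise formula.

import numpy as np

def lorenz_gini(x):
    # Lorenz curve: cumulative share of wealth vs. cumulative share of population.
    x = np.sort(np.asarray(x, dtype=float))
    cum_wealth = np.concatenate(([0.0], np.cumsum(x) / np.sum(x)))
    cum_pop = np.linspace(0.0, 1.0, len(x) + 1)
    B = np.trapz(cum_wealth, x=cum_pop)   # area under the Lorenz curve
    return (0.5 - B) / 0.5                # G = A / (A + B) = 1 - 2*B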
This is to be expected. A random sample from a uniform distribution does not result in uniform values (i.e. values that are all relatively close to each other). With a little calculus, it can be shown that the expected value (in the statistical sense) of the Gini coefficient of a sample from the uniform distribution on [0, 1] is 1/3, so getting values around 1/3 for a given sample is reasonable.
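For reference (this step is not spelled out in the original answer), the calculation uses the fact that a distribution's Gini coefficient equals E|X − Y| / (2 E[X]) for independent draws X and Y. For the uniform distribution on [0, 1]:

$$\mathbb{E}\lvert X - Y\rvert = \int_0^1\!\!\int_0^1 \lvert x - y\rvert\,dx\,dy = \tfrac{1}{3},\qquad \mathbb{E}[X] = \tfrac{1}{2},\qquad G = \frac{\mathbb{E}\lvert X - Y\rvert}{2\,\mathbb{E}[X]} = \frac{1/3}{2\cdot 1/2} = \frac{1}{3}.$$

A finite sample only approximates this, but at n = 500 the sample values already scatter close to 1/3.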
You'll get a lower Gini coefficient with a sample such as v = 10 + np.random.rand(500). Those values are all close to 10.5; the relative variation is lower than for the sample v = np.random.rand(500). In fact, the expected value of the Gini coefficient for the sample base + np.random.rand(n) is 1/(6*base + 3).
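The same population-level calculation (again a sketch, not part of the original answer) explains that formula: shifting by base leaves |X − Y| unchanged but raises the mean, so for X, Y uniform on [base, base + 1]:

$$\mathbb{E}\lvert X - Y\rvert = \tfrac{1}{3},\qquad \mathbb{E}[X] = \text{base} + \tfrac{1}{2},\qquad G = \frac{1/3}{2\,(\text{base} + 1/2)} = \frac{1}{6\,\text{base} + 3},$$

which reduces to 1/3 when base = 0.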
Here's a simple implementation of the Gini coefficient. It uses the fact that the Gini coefficient is half the relative mean absolute difference.
import numpy as np

def gini(x):
    # (Warning: This is a concise implementation, but it is O(n**2)
    # in time and memory, where n = len(x). *Don't* pass in huge
    # samples!)

    # Mean absolute difference
    mad = np.abs(np.subtract.outer(x, x)).mean()
    # Relative mean absolute difference
    rmad = mad / np.mean(x)
    # Gini coefficient
    g = 0.5 * rmad
    return g
Here's the Gini coefficient for several samples of the form v = base + np.random.rand(500):
In [80]: v = np.random.rand(500)

In [81]: gini(v)
Out[81]: 0.32760618249832563

In [82]: v = 1 + np.random.rand(500)

In [83]: gini(v)
Out[83]: 0.11121487509454202

In [84]: v = 10 + np.random.rand(500)

In [85]: gini(v)
Out[85]: 0.01567937753659053

In [86]: v = 100 + np.random.rand(500)

In [87]: gini(v)
Out[87]: 0.0016594595244509495
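These empirical values line up with the 1/(6*base + 3) formula, which gives roughly 0.333, 0.111, 0.0159, and 0.00166 for these bases. A quick check (illustrative, not from the original answer):

# Theoretical values from 1/(6*base + 3), for comparison with the samples above.
for base in (0, 1, 10, 100):
    print(base, 1.0 / (6 * base + 3))
# base=0   -> 0.333...  (1/3)
# base=1   -> 0.111...  (1/9)
# base=10  -> ~0.0159   (1/63)
# base=100 -> ~0.00166  (1/603)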