Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

"Top 20% of the people earn 80% of the money" ... this type of result using R

Tags:

r

Let me give you an example:

person | salary
----------------
1      | 30'000
2      | 10'000
3      | 15'000
4      | 25'000
5      | 80'000
6      | 56'000
...    | ...

The steps to take to get to this result would be to order the salaries and then create a new table giving the share of rows/people from start until the respective row and the share of the sum of the salaries from start until the respective row (of the total salaries).

Then one just has to take the row closest to 20% for the people and we know how much they earn.

This is a pretty standard question - but as I don't know how to refer to it verbally I cannot google it.

So I would appreciate if somebody could tell me what to "call this" and how to compute and plot this in R the most easiest - so no loops and stuff. My intuition tells me there are at least 5 packages and 10 functions addressing this scenario. Maybe something similar to summary() with fixed quantiles.

So let's assume the above table to be available as a data frame:

salaries <- data.frame(person = c(1,2,3,...), salary = c(30000,...))

like image 428
Raffael Avatar asked Nov 12 '22 22:11

Raffael


1 Answers

Use the SLID income dataset from the car-package:

library(car)

dat <- SLID[!is.na(SLID$wage),]       # Remove missing values
dat$income <- dat$wage*40*50          # "translate" the wages to their full time annual earnings equivalent.
dat$id <- seq(1,nrow(dat))       

# Create a data.frame with a person ID and their annual income:
keep <- data.frame(id = seq(1, nrow(dat)), 
                   income = dat$income)
keep <- keep[order(keep$income, decreasing = TRUE),]  # Descending ordering according to income
keep$accum <- cumsum(keep$income)                     # Cumulative sum of the descending incomes
keep$pct <- keep$accum/sum(keep$income)*100           # % of the total income

keep$check <- keep$pct<80                      # Check where the % is smaller than 80%
threshold <- min(which(keep$check == FALSE))   # First line where % is larger than 80%
border <- threshold/nrow(keep)*100             # Check which percentile that was
border <- round(border, digits = 2)
paste0(border, "% of the people earn 80% of the income")

#[1] "62.41% of the people earn 80% of the income"

The classical 80-20 rule as we would expect it, would show "20% of the people earn 80% of the income". This rule does not apply here, as you can see..

The reversed argument:

# The 20% of the people earn X % of total income:

linenr <- ceiling(1/5*nrow(keep))
outcome2 <- round(keep$pct[linenr], digits = 2)
paste0(outcome2, "% of total income is earned by the top 20% of the people")

# [1] "36.07% of total income is earned by the top 20% of the people"

Note that the numbers presented here are not representative for the real world :)

Also, Wikipedia has some more information on the Pareto principle, which is also known as the 80-20 rule. It seems that this rule appears in multiple settings, such as business, economics and mathematics.

like image 139
KenHBS Avatar answered Nov 15 '22 05:11

KenHBS