I am trying to calculate the percentile rank of a value in a data frame, and I also have an associated frequency column to weight by. I'm struggling to come up with a solution that calculates the percentile of each original value as if the overall distribution were every value replicated by its own frequency.
For example:
library(dplyr)
groceries <- tribble(
  ~item,    ~price, ~freq,
  "apple",  1,      20,
  "banana", 2,      5,
  "carrot", 3,      1
)
groceries %>%
  mutate(reg_ptile = percent_rank(price),
         wtd_ptile = weighted_percent_rank(price, wt = freq))
# the expected result would be:
# A tibble: 3 x 5
item price freq reg_ptile wtd_ptile
<chr> <dbl> <dbl> <dbl> <dbl>
1 apple 1 20 0.0 0.0
2 banana 2 5 0.5 0.8
3 carrot 3 1 1.0 1.0
percent_rank() is an actual dplyr function. How would the function weighted_percent_rank() be written? I'm not sure how to make this work with a data frame and pipes. It would be swell if the solution could also work with groups.
Edit: Using uncount() doesn't really work, because uncounting the data I'm using would result in 800 billion rows. Any other ideas?
You can use tidyr::uncount to expand the rows according to their frequency, compute the percentile on the expanded data, and then reduce them back down with summarize, as in this reprex:
library(dplyr)
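groceries <- tribble(
  ~item,    ~price, ~freq,
  "apple",  1,      10,
  "banana", 2,      5,
  "carrot", 3,      1
)
groceries %>%
  tidyr::uncount(freq) %>%                     # one row per unit of frequency
  mutate(wtd_ptile = percent_rank(price)) %>%  # rank on the expanded data
  group_by(item) %>%
  summarize_all(~.[1]) %>%                     # collapse back to one row per item
  mutate(ptile = percent_rank(price))          # unweighted rank, for comparison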
#> # A tibble: 3 x 4
#> item price wtd_ptile ptile
#> <chr> <dbl> <dbl> <dbl>
#> 1 apple 1 0 0
#> 2 banana 2 0.667 0.5
#> 3 carrot 3 1 1
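Note that there are different ranking functions you can choose. Also note that this example uses a frequency of 10 for apple rather than the 20 from the question, so banana's weighted percentile here is 0.667 (10/(16 - 1)); with the question's data it is 20/(26 - 1) = 0.8, which matches the expected result.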
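For comparison, the same expand-then-rank idea can be reproduced in base R on the question's own frequencies (20 for apple). This is only a sketch for small data, since it builds the expanded vector just as uncount() does; the (min rank - 1)/(n - 1) formula is the one dplyr::percent_rank() applies.

```r
price <- c(1, 2, 3)   # apple, banana, carrot
freq  <- c(20, 5, 1)  # the frequencies from the question
expanded <- rep(price, freq)  # 26 values, the same expansion tidyr::uncount() does
# (min rank - 1) / (n - 1), computed on the expanded values
pr <- (rank(expanded, ties.method = "min") - 1) / (length(expanded) - 1)
# keep the first copy of each price
wtd_ptile <- pr[match(price, expanded)]
wtd_ptile
# [1] 0.0 0.8 1.0
```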
EDIT
An alternative that does not involve creating billions of rows:
groceries %>%
  arrange(price) %>%
  # total freq of all earlier (cheaper) rows, scaled by total freq - 1
  mutate(wtd_ptile = lag(cumsum(freq), default = 0) / (sum(freq) - 1))
#> # A tibble: 3 x 4
#> item price freq wtd_ptile
#> <chr> <dbl> <dbl> <dbl>
#> 1 apple 1 10 0
#> 2 banana 2 5 0.667
#> 3 carrot 3 1 1
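The arrange-and-lag version gives rows that tie on price different percentiles, whereas percent_rank() gives tied values the same (minimum) rank, and it also reorders the rows. A sketch of the weighted_percent_rank() the question asks for, which sums the weight at each distinct value instead, might look like the following. It is a hypothetical helper, not a dplyr function; it assumes there are no missing values and that the weights are non-negative, and the store column in the grouped example is made up for illustration.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): percent rank of `x` as if each
# value were repeated `wt` times, computed without expanding any rows.
# Assumes no missing values in `x` or `wt` and non-negative weights.
weighted_percent_rank <- function(x, wt) {
  values <- sort(unique(x))            # distinct values, ascending
  idx <- match(x, values)              # position of each row's value in `values`
  w_at <- as.numeric(rowsum(wt, idx))  # total weight at each distinct value
  # weight strictly below each distinct value; tied rows share it,
  # mirroring the min-rank tie rule of percent_rank()
  below <- cumsum(w_at) - w_at
  below[idx] / (sum(wt) - 1)
}

groceries <- tribble(
  ~item,    ~price, ~freq,
  "apple",  1,      20,
  "banana", 2,      5,
  "carrot", 3,      1
)

# wtd_ptile is 0, 0.8 and 1, the expected result from the question
groceries %>%
  mutate(wtd_ptile = weighted_percent_rank(price, wt = freq))

# Grouped use with a made-up `store` column:
# store a gives apple 0 and carrot 1, store b gives banana 0
groceries %>%
  mutate(store = c("a", "b", "a")) %>%
  group_by(store) %>%
  mutate(wtd_ptile = weighted_percent_rank(price, wt = freq))
```

Because mutate() evaluates the function separately within each group, the same call works after group_by(). The work is a sort and a rowsum() over the distinct values, so the expanded rows are never built.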