How do I figure cumulative distribution within groups (as cume_dist does) but separate tied values in R?

Question

Here is a data set where YEAR is a grouping variable.

dat <- data.frame(YEAR = c(rep(1999,4),rep(2002,3)), VALUE = c(1,2,3,2,1,2,3))

I would like to add a column that looks at the VALUE column and says "where, within its year, does this value sit?" I'm having trouble phrasing it concisely, but pretty much exactly what cume_dist does except that cume_dist lumps tied values and I want them separated. cume_dist takes 1,2,2,3 and gives back 0.25, 0.75, 0.75, 1.00 and I need the ties separated: 1,2,2,4 should give 0.25, 0.50, 0.75, 1.0.

Here's a line based on cume_dist that works except for the tied values:

dat %>% group_by(YEAR) %>% mutate(cumdist = cume_dist(VALUE))

I tried to deconstruct cume_dist (which is rank in group divided by size of group) and use row_number, which separates ties, for the numerator and divide it by the number of rows in each year. This gives me the correct numerator:

dat %>% group_by(YEAR) %>% mutate(rownumber = row_number(VALUE))

But how do I divide those ranks by the number of values in each year (that is, divide all the ranks in 1999 by 4, and the ranks from 2002 by 3)?

ThomasIsCoding · Accepted Answer

Is this what you are after?

dat %>%
  mutate(cumdist = seq.int(n())[order(VALUE)] / n(), .by = YEAR)

which gives

  YEAR VALUE   cumdist
1 1999     1 0.2500000
2 1999     2 0.5000000
3 1999     3 1.0000000
4 1999     2 0.7500000
5 2002     1 0.3333333
6 2002     2 0.6666667
7 2002     3 1.0000000

How do I figure cumulative distribution within groups (as cume_dist does) but separate tied values in R?

Tags:

r

group-by

cumulative-frequency

chill

1 Answers

ThomasIsCoding

Recent Activity

Donate For Us

How do I figure cumulative distribution within groups (as cume_dist does) but separate tied values in R?

Tags:

r

group-by

cumulative-frequency

chill

1 Answers

ThomasIsCoding

Related questions

Recent Activity

Donate For Us