R - Insert Missing Numbers in A Sequence by Group's Max Value

Tags:

r

I'd like to insert the missing numbers in the index column, subject to these three conditions:

  1. Partitioned by multiple columns
  2. The minimum value is always 1
  3. The maximum value is always the maximum for the group and type

Current Data:

group   type    index   vol
A       1       1       200
A       1       2       244
A       1       5       33

A       2       2       66
A       2       3       2
A       2       4       199
A       2       10      319

B       1       4       290
B       1       5       188
B       1       6       573
B       1       9       122

Desired Data:

group   type    index   vol
A       1       1       200
A       1       2       244
A       1       3       0
A       1       4       0
A       1       5       33

A       2       1       0
A       2       2       66
A       2       3       2
A       2       4       199
A       2       5       0
A       2       6       0
A       2       7       0
A       2       8       0
A       2       9       0
A       2       10      319

B       1       1       0
B       1       2       0
B       1       3       0
B       1       4       290
B       1       5       188
B       1       6       573
B       1       7       0
B       1       8       0
B       1       9       122

I've just added in spaces between the partitions for clarity.

Hope you can help out!

lostinsql asked Mar 08 '19 06:03


1 Answer

You can do the following:

library(dplyr)
library(tidyr)

my_df %>% 
  group_by(group, type) %>% 
  complete(index = 1:max(index), fill = list(vol = 0))

#    group type index vol
# 1      A    1     1 200
# 2      A    1     2 244
# 3      A    1     3   0
# 4      A    1     4   0
# 5      A    1     5  33
# 6      A    2     1   0
# 7      A    2     2  66
# 8      A    2     3   2
# 9      A    2     4 199
# 10     A    2     5   0
# 11     A    2     6   0
# 12     A    2     7   0
# 13     A    2     8   0
# 14     A    2     9   0
# 15     A    2    10 319
# 16     B    1     1   0
# 17     B    1     2   0
# 18     B    1     3   0
# 19     B    1     4 290
# 20     B    1     5 188
# 21     B    1     6 573
# 22     B    1     7   0
# 23     B    1     8   0
# 24     B    1     9 122

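With group_by you specify the partitions, i.e. the groups you separated with blank lines in the question (here every combination of group and type). With complete you specify which column should be completed and with which values (here index, running from 1 to the maximum of each partition), and then what should be filled in for the remaining columns in the newly created rows (here vol = 0; the default would be NA).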
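
If it helps to see the same idea without dplyr/tidyr, here is a rough base R sketch. It is not how complete is implemented internally, only an illustration of the logic: split by partition, build the full 1:max(index) sequence, left-join the observed rows onto it, and replace the resulting NA values with 0. The helper name fill_index is made up for this example, and the data is retyped from the question with group as a plain character column.

```r
# Data retyped from the question (group kept as character, not factor).
my_df <- data.frame(
  group = c("A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B"),
  type  = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L),
  index = c(1L, 2L, 5L, 2L, 3L, 4L, 10L, 4L, 5L, 6L, 9L),
  vol   = c(200L, 244L, 33L, 66L, 2L, 199L, 319L, 290L, 188L, 573L, 122L),
  stringsAsFactors = FALSE
)

# fill_index() is a hypothetical helper name, not part of any package.
fill_index <- function(df) {
  # One data frame per observed (group, type) combination;
  # drop = TRUE discards combinations that never occur (e.g. B / 2).
  pieces <- split(df, list(df$group, df$type), drop = TRUE)

  filled <- lapply(pieces, function(piece) {
    # Skeleton with every index from 1 to this partition's maximum.
    skeleton <- data.frame(
      group = piece$group[1],
      type  = piece$type[1],
      index = seq_len(max(piece$index)),
      stringsAsFactors = FALSE
    )
    # Left join: keep all skeleton rows; unmatched rows get vol = NA.
    out <- merge(skeleton, piece, by = c("group", "type", "index"), all.x = TRUE)
    # Replace the NA values introduced by the join with 0.
    out$vol[is.na(out$vol)] <- 0L
    out
  })

  # Stack the partitions and restore a stable row order.
  result <- do.call(rbind, filled)
  result <- result[order(result$group, result$type, result$index), ]
  rownames(result) <- NULL
  result
}

result <- fill_index(my_df)
```

On the question's data, result should contain the same 24 rows as the desired output (5 for A/1, 10 for A/2, 9 for B/1). The dplyr/tidyr version above is shorter and is probably the better choice whenever those packages are available.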

Data

my_df <- 
  structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"), 
                 type = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), 
                 index = c(1L, 2L, 5L, 2L, 3L, 4L, 10L, 4L, 5L, 6L, 9L), 
                 vol = c(200L, 244L, 33L, 66L, 2L, 199L, 319L, 290L, 188L, 573L, 122L)), 
            class = "data.frame", row.names = c(NA, -11L))
kath answered Oct 12 '22 12:10