
R repeat in column based on value in row

I have a dataframe like the following:

Name    School   Weight Days
Antoine Bach     0.03   5
Antoine Ken      0.02   7
Barbara Franklin 0.04   3

I would like to obtain an output like the following:

Name    School   1    2    3    4    5    6    7
Antoine Bach     0.03 0.03 0.03 0.03 0.03 NA   NA
Antoine Ken      0.02 0.02 0.02 0.02 0.02 0.02 0.02
Barbara Franklin 0.04 0.04 0.04 NA   NA   NA   NA

Reproducible Sample Data:

library(tibble)  # tribble() comes from the tibble package

df <- tribble(
  ~Name,    ~School,   ~Weight, ~Days,
  "Antoine", "Bach",     0.03,   5,
  "Antoine", "Ken",      0.02,   7,
  "Barbara", "Franklin", 0.04,   3
)

user15462606 asked Apr 11 '21 15:04



2 Answers

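Using data.table, you can create a long version by repeating the Weight value Days times for each row, then dcast to a wide format, using the row id within each Name/School group as the column. Numbering rows by group rather than by the repeated value keeps the result correct even when two rows share the same Weight.

library(data.table)
setDT(df)  # convert df to a data.table by reference

# Long form: Weight repeated Days times within each (Name, School) group
long <- df[, .(Weight = rep(Weight, Days)), by = .(Name, School)]

# Wide form: rowid() numbers the rows within each group (1, 2, ..., Days)
dcast(long, Name + School ~ rowid(Name, School), value.var = "Weight")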

#       Name   School    1    2    3    4    5    6    7
# 1: Antoine     Bach 0.03 0.03 0.03 0.03 0.03   NA   NA
# 2: Antoine      Ken 0.02 0.02 0.02 0.02 0.02 0.02 0.02
# 3: Barbara Franklin 0.04 0.04 0.04   NA   NA   NA   NA
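
To make the intermediate step easier to inspect, here is a minimal sketch of the long table that exists before reshaping. The names `long`, `counts`, and the explicit `Day` column are introduced here for illustration only, and the sample data is rebuilt with data.table() so the block runs on its own.

```r
library(data.table)

# Sample data from the question, rebuilt as a data.table
df <- data.table(
  Name   = c("Antoine", "Antoine", "Barbara"),
  School = c("Bach", "Ken", "Franklin"),
  Weight = c(0.03, 0.02, 0.04),
  Days   = c(5, 7, 3)
)

# One row per (Name, School, Day): Weight repeated Days times within each group
long <- df[, .(Day = seq_len(Days), Weight = rep(Weight, Days)),
           by = .(Name, School)]

# The number of rows per group should match that group's Days value
counts <- long[, .N, by = .(Name, School)]
print(counts)
```

With an explicit Day column like this, the reshape can also be written as dcast(long, Name + School ~ Day, value.var = "Weight").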

Alternatively, you can repeat Weight Days times and then pad with NA up to the maximum number of days, which builds each wide row directly.

max_days <- max(df$Days) 

df[, as.list(rep(c(Weight, NA), c(Days, max_days - Days))), 
   .(Name, School)]

#       Name   School   V1   V2   V3   V4   V5   V6   V7
# 1: Antoine     Bach 0.03 0.03 0.03 0.03 0.03   NA   NA
# 2: Antoine      Ken 0.02 0.02 0.02 0.02 0.02 0.02 0.02
# 3: Barbara Franklin 0.04 0.04 0.04   NA   NA   NA   NA
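
The padding relies on rep() accepting a vector of repeat counts, one per element. A small base R sketch of that behavior for a single row (the variable names `weight`, `days`, `padded`, and `full` are illustrative, not from the answer):

```r
# Values for one row of the sample data, plus the longest run of days
weight   <- 0.03
days     <- 5
max_days <- 7

# times = c(5, 2): the weight is repeated 5 times, NA is repeated 2 times
padded <- rep(c(weight, NA), times = c(days, max_days - days))
print(padded)

# When days equals max_days, NA is repeated zero times and simply drops out
full <- rep(c(0.02, NA), times = c(7, 0))
print(full)
```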
IceCreamToucan answered Sep 21 '22 14:09


You can use pmap_dfr to apply a function across the rows and then row bind the resulting list into a tibble. The function's named arguments are matched to columns of the same name, and the remaining row values are captured by the ellipsis (...).

library(purrr)
library(dplyr)

pmap_dfr(df, function(Weight, Days, ...) c(..., setNames(rep(Weight, Days), 1:Days))) %>% 
  mutate(across(3:last_col(), as.numeric))

Because vectors in R are atomic, c() coerces everything in the row to character. The mutate step therefore converts the newly created columns back to numeric.

setNames is used to name the newly created columns, which is required to bind by row.
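
The coercion and naming can be checked in isolation with base R. This is only a sketch for one shortened row; the variable names `values` and `row` are illustrative.

```r
# setNames() attaches names; integer names are stored as character strings
values <- setNames(rep(0.03, 3), 1:3)
print(names(values))

# Mixing character and numeric values in c() yields a character vector
row <- c(Name = "Antoine", School = "Bach", values)
print(class(row))

# The numeric part has to be converted back explicitly
print(as.numeric(row[3:5]))
```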

Output

  Name    School     `1`   `2`   `3`   `4`   `5`   `6`   `7`
  <chr>   <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Antoine Bach      0.03  0.03  0.03  0.03  0.03 NA    NA   
2 Antoine Ken       0.02  0.02  0.02  0.02  0.02  0.02  0.02
3 Barbara Franklin  0.04  0.04  0.04 NA    NA    NA    NA   

Note: pmap_dfr is from the purrr package, and mutate, across, and last_col are all from dplyr.

How it works

When you use pmap in the way shown above, the named function arguments are matched to columns with the same name. So Weight and Days, as function arguments, are matched to the columns of the same name in each row.

The ... collects the remaining columns that are still passed to the function, but are unused (by name) in the function. Essentially, the ellipsis collects Name and School in your case.
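
The matching described above is ordinary R argument matching. As a rough base R approximation of what happens for one row (this uses do.call rather than pmap itself, and the function `f` and variable names are hypothetical):

```r
# Sample data from the question as a plain data frame
df <- data.frame(
  Name   = c("Antoine", "Antoine", "Barbara"),
  School = c("Bach", "Ken", "Franklin"),
  Weight = c(0.03, 0.02, 0.04),
  Days   = c(5, 7, 3),
  stringsAsFactors = FALSE
)

# Weight and Days are matched by name; everything else lands in ...
f <- function(Weight, Days, ...) {
  list(weight = Weight, days = Days, rest = names(list(...)))
}

# Call f with the first row's values passed as named arguments
first_row <- as.list(df[1, ])
res <- do.call(f, first_row)
print(res$rest)
```

Here res$rest holds the names of the columns that were not claimed by a named argument, which is the role the ellipsis plays in the pmap_dfr call.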

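Since Name and School already have names, they are passed to c() first to maintain your column order. The repeated Weight values are then appended and given names as well. The output for a single row is then this (the NA entries under 6 and 7 only appear once the rows have been bound together, since this row contributes just five values):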

     Name    School         1         2         3         4         5         6 
"Antoine"    "Bach"    "0.03"    "0.03"    "0.03"    "0.03"    "0.03"        NA 
        7 
       NA 

pmap on its own returns a list. The _dfr variant additionally row binds (hence the r) those list elements into a data frame/tibble (hence the df).

LMc answered Sep 19 '22 14:09