I have a dataframe like the following:
Name School Weight Days
Antoine Bach 0.03 5
Antoine Ken 0.02 7
Barbara Franklin 0.04 3
I would like to obtain an output like the following:
Name School 1 2 3 4 5 6 7
Antoine Bach 0.03 0.03 0.03 0.03 0.03 NA NA
Antoine Ken 0.02 0.02 0.02 0.02 0.02 0.02 0.02
Barbara Franklin 0.04 0.04 0.04 NA NA NA NA
Reproducible Sample Data:
library(tibble)
df <- tribble(
~Name, ~School, ~Weight, ~Days,
"Antoine", "Bach", 0.03, 5,
"Antoine", "Ken", 0.02, 7,
"Barbara", "Franklin", 0.04, 3
)
How do you repeat column values in an R data frame according to the values in another column? First, create the data frame. Then use rep() together with cbind() to repeat each value the number of times given by the other column.
In R, the simplest way to repeat rows is to index the data frame with row numbers generated by rep(). rep() replicates the elements of a vector, so repeated row numbers select repeated copies of those rows. Alternatively, you can pass the repeated row numbers to slice() from the dplyr package.
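As a minimal sketch of the row-repetition idea in base R, the following uses a small illustrative data frame (the two rows and their Days values are assumptions made for the example, not the question's data). rep() with a times vector repeats each element a different number of times, and the same trick applied to row numbers repeats whole rows.

```r
# Small illustrative data frame (values chosen for the example only)
df <- data.frame(
  Name = c("Antoine", "Barbara"),
  Days = c(2, 3),
  stringsAsFactors = FALSE
)

# Repeat each name as many times as its Days value
repeated_names <- rep(df$Name, times = df$Days)

# Repeat whole rows by indexing with repeated row numbers
long <- df[rep(seq_len(nrow(df)), times = df$Days), ]
```

Here long has one row per day for each person, which is the "long" shape that the answers go on to reshape into a wide table.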
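Using data.table, you can create a long version by repeating the Weight value Days times for each row, then dcast to a wide format using the rowid of the new variable as the column. Note that rowid(V1) numbers rows within each distinct value of V1, so this relies on every row having a different Weight.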
library(data.table)
setDT(df)
dcast(df[, .(rep(Weight, Days)), .(Name, School)],
Name + School ~ rowid(V1))
# Name School 1 2 3 4 5 6 7
# 1: Antoine Bach 0.03 0.03 0.03 0.03 0.03 NA NA
# 2: Antoine Ken 0.02 0.02 0.02 0.02 0.02 0.02 0.02
# 3: Barbara Franklin 0.04 0.04 0.04 NA NA NA NA
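If two rows could share the same Weight, one possible variant is to build the day counter explicitly instead of deriving it from the repeated values. The sketch below assumes each Name and School pair occurs only once; the names dt, long, wide and Day are introduced here for illustration.

```r
library(data.table)

dt <- data.table(
  Name   = c("Antoine", "Antoine", "Barbara"),
  School = c("Bach", "Ken", "Franklin"),
  Weight = c(0.03, 0.02, 0.04),
  Days   = c(5, 7, 3)
)

# Long format: one row per (Name, School, Day) with an explicit day counter
long <- dt[, .(Day = seq_len(Days), Weight = rep(Weight, Days)),
           by = .(Name, School)]

# Wide format: one column per day; days a row does not reach become NA
wide <- dcast(long, Name + School ~ Day, value.var = "Weight")
```

Passing value.var explicitly also avoids the message dcast prints when it has to guess the value column.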
You could also repeat Weight Days times, then pad with NA enough times to complete the row.
max_days <- max(df$Days)
df[, as.list(rep(c(Weight, NA), c(Days, max_days - Days))),
.(Name, School)]
# Name School V1 V2 V3 V4 V5 V6 V7
# 1: Antoine Bach 0.03 0.03 0.03 0.03 0.03 NA NA
# 2: Antoine Ken 0.02 0.02 0.02 0.02 0.02 0.02 0.02
# 3: Barbara Franklin 0.04 0.04 0.04 NA NA NA NA
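The same pad-with-NA idea can be sketched in base R with rep() and cbind(), without data.table. This is one possible formulation, not the answer's own code; the names padded and wide are introduced for the example.

```r
df <- data.frame(
  Name   = c("Antoine", "Antoine", "Barbara"),
  School = c("Bach", "Ken", "Franklin"),
  Weight = c(0.03, 0.02, 0.04),
  Days   = c(5, 7, 3),
  stringsAsFactors = FALSE
)

max_days <- max(df$Days)

# For each row: Weight repeated Days times, then NA up to max_days.
# mapply returns a max_days x nrow matrix, so transpose it.
padded <- t(mapply(
  function(w, d) rep(c(w, NA), c(d, max_days - d)),
  df$Weight, df$Days
))
colnames(padded) <- seq_len(max_days)

# Attach the identifying columns to the padded matrix
wide <- cbind(df[c("Name", "School")], padded)
```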
You can use pmap_dfr to apply a function across the rows and then row-bind the resulting list into a tibble. The function matches its named arguments to column names; the remaining row values are captured by the ellipsis (...).
library(purrr)
library(dplyr)
pmap_dfr(df, function(Weight, Days, ...) c(..., setNames(rep(Weight, Days), 1:Days))) %>%
mutate(across(3:last_col(), as.numeric))
Because vectors are atomic in R, c() coerces everything in the row to character, so the mutate() call converts the newly created columns back to numeric.
setNames is used to name the newly created columns, which is required to bind by row.
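The coercion and naming behaviour described above can be checked in isolation. This small sketch uses made-up values (one name and two repeated weights) purely for illustration.

```r
# Combine a character value with a named numeric vector
row <- c(Name = "Antoine", setNames(rep(0.03, 2), 1:2))

# The whole vector is now character, including the former numbers
row_class <- class(row)

# setNames() supplied the names "1" and "2" for the repeated values
row_names <- names(row)

# Converting back recovers the numeric weight
weight_back <- as.numeric(row[["1"]])
```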
Output
Name School `1` `2` `3` `4` `5` `6` `7`
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Antoine Bach 0.03 0.03 0.03 0.03 0.03 NA NA
2 Antoine Ken 0.02 0.02 0.02 0.02 0.02 0.02 0.02
3 Barbara Franklin 0.04 0.04 0.04 NA NA NA NA
Note: pmap_dfr is from the purrr package, and mutate, across, and last_col are all from dplyr.
How it works
When you use pmap in the way shown above, the named function arguments are matched to columns with the same name. So Weight and Days as function arguments are matched to the columns with those names in each row.
The ... collects the remaining columns that are still passed to the function, but are unused (by name) in the function. Essentially, the ellipsis collects Name and School in your case.
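To see the argument matching on its own, here is a small sketch with plain pmap() on an illustrative data frame. The column values and the returned fields total and extra_names are assumptions made for the example.

```r
library(purrr)

df <- data.frame(
  Name   = c("A", "B"),
  Weight = c(0.1, 0.2),
  Days   = c(2, 3),
  stringsAsFactors = FALSE
)

# Weight and Days are matched by name; Name falls through into `...`
res <- pmap(df, function(Weight, Days, ...) {
  extras <- list(...)
  list(total = Weight * Days, extra_names = names(extras))
})
```

Each element of res corresponds to one row of df, and extra_names shows which columns the ellipsis picked up.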
Since Name and School already have names, they are passed to c() first to maintain your column order. The repeated Weight values are then appended and named 1 through Days. The output for a single row (the first one here) is then this:
Name School 1 2 3 4 5
"Antoine" "Bach" "0.03" "0.03" "0.03" "0.03" "0.03"
Columns 6 and 7 are missing for this row; they are filled with NA when the rows are bound together.
The output of pmap is a list. The _dfr suffix selects the variant that row-binds (hence the r) these list elements into a data frame/tibble (hence the df).