I have a dataframe like the following:
Name School Weight Days
Antoine Bach 0.03 5
Antoine Ken 0.02 7
Barbara Franklin 0.04 3
I would like to obtain an output like the following:
Name School 1 2 3 4 5 6 7
Antoine Bach 0.03 0.03 0.03 0.03 0.03 NA NA
Antoine Ken 0.02 0.02 0.02 0.02 0.02 0.02 0.02
Barbara Franklin 0.04 0.04 0.04 NA NA NA NA
Reproducible Sample Data:
library(tibble)
df <- tribble(
~Name, ~School, ~Weight, ~Days,
"Antoine", "Bach", 0.03, 5,
"Antoine", "Ken", 0.02, 7,
"Barbara", "Franklin", 0.04, 3
)
How do you repeat column values in an R data frame by the values in another column? First create the data frame, then use rep() to repeat each value the required number of times and cbind() to combine the results into a matrix.
In R, a common way to repeat rows is with the rep() function: it repeats row indices, and indexing the data frame with them creates one or more copies of the selected observations. Alternatively, you can use the slice() function from the dplyr package to repeat rows.
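As a minimal sketch of the rep() idea described above, the following base R snippet repeats each row as many times as its Days value. The names df_example, row_index and df_long are made up for illustration and are not part of the original question.

```r
# Small data frame mirroring the question's sample data (base R only)
df_example <- data.frame(
  Name   = c("Antoine", "Antoine", "Barbara"),
  School = c("Bach", "Ken", "Franklin"),
  Weight = c(0.03, 0.02, 0.04),
  Days   = c(5, 7, 3)
)

# Repeat each row index Days times, then index the data frame with the result
row_index <- rep(seq_len(nrow(df_example)), times = df_example$Days)
df_long <- df_example[row_index, ]

nrow(df_long)  # 5 + 7 + 3 = 15 rows
```

This yields a long data frame, which is the intermediate shape that the answers below reshape into one column per day.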
Using data.table you can create a long version by repeating the Weight value Days number of times for each row, then dcasting to a wide format with the rowid of the new variable as the column.
library(data.table)
setDT(df)
dcast(df[, .(rep(Weight, Days)), .(Name, School)],
Name + School ~ rowid(V1))
# Name School 1 2 3 4 5 6 7
# 1: Antoine Bach 0.03 0.03 0.03 0.03 0.03 NA NA
# 2: Antoine Ken 0.02 0.02 0.02 0.02 0.02 0.02 0.02
# 3: Barbara Franklin 0.04 0.04 0.04 NA NA NA NA
You could also rep Weight the number of Days, then rep NA enough times to complete the row.
max_days <- max(df$Days)
df[, as.list(rep(c(Weight, NA), c(Days, max_days - Days))),
.(Name, School)]
# Name School V1 V2 V3 V4 V5 V6 V7
# 1: Antoine Bach 0.03 0.03 0.03 0.03 0.03 NA NA
# 2: Antoine Ken 0.02 0.02 0.02 0.02 0.02 0.02 0.02
# 3: Barbara Franklin 0.04 0.04 0.04 NA NA NA NA
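For readers without data.table, the same padding idea can be sketched in base R. This is only an illustrative equivalent under stated assumptions, not the answer's own method; df_example, pad_row, wide and result are hypothetical names.

```r
# Sample data as a plain data frame (base R only)
df_example <- data.frame(
  Name   = c("Antoine", "Antoine", "Barbara"),
  School = c("Bach", "Ken", "Franklin"),
  Weight = c(0.03, 0.02, 0.04),
  Days   = c(5, 7, 3)
)

max_days <- max(df_example$Days)

# Repeat the weight `days` times, then pad with NA up to max_days
pad_row <- function(weight, days) {
  rep(c(weight, NA), c(days, max_days - days))
}

# mapply returns one column per input row, so transpose to one row per input row
wide <- t(mapply(pad_row, df_example$Weight, df_example$Days))
colnames(wide) <- seq_len(max_days)

# Bind the identifier columns to the padded weights
result <- cbind(df_example[c("Name", "School")], wide)
result
```

Computing max_days once up front keeps every padded vector the same length, which is what allows the rows to be stacked into a rectangular table.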
You can use pmap_dfr to apply a function across the rows and then row bind the resulting list into a tibble object. The function matches its named arguments to column names, and the rest of the row values are captured in the ellipsis (...).
library(purrr)
library(dplyr)
pmap_dfr(df, function(Weight, Days, ...) c(..., setNames(rep(Weight, Days), 1:Days))) %>%
mutate(across(3:last_col(), as.numeric))
Because vectors are atomic in R, c() will coerce everything in the row to character, so the mutate converts the newly created columns back to numeric. setNames is used to name the newly created columns, which is required to bind by row.
Output
Name School `1` `2` `3` `4` `5` `6` `7`
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Antoine Bach 0.03 0.03 0.03 0.03 0.03 NA NA
2 Antoine Ken 0.02 0.02 0.02 0.02 0.02 0.02 0.02
3 Barbara Franklin 0.04 0.04 0.04 NA NA NA NA
Note: pmap_dfr is from the purrr package, and mutate, across, and last_col are all from dplyr.
How it works
When you use pmap in the way above, the named function arguments are matched to columns with the same name. So Weight and Days as function arguments are matched to the columns with those names in each row.
The ... collects the remaining columns that are still passed to the function but are not used by name in the function. Essentially, the ellipsis collects Name and School in your case.
Since Name and School already have names, they are passed to c() first to maintain your column order. In addition, we combine the other values and give them names as well. The output for a single row is then this:
Name School 1 2 3 4 5 6
"Antoine" "Bach" "0.03" "0.03" "0.03" "0.03" "0.03" NA
7
NA
The output of pmap is a list. The _dfr suffix selects the variant that row binds (hence the r) these list elements into a dataframe/tibble (hence the df).
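The argument matching described above can be imitated for a single row with base R's do.call, which passes the elements of a named list as named arguments. This is only a stand-in to show the matching behaviour and is not how purrr is implemented; row_fun, one_row and out are illustrative names.

```r
# Same shape as the function passed to pmap_dfr in the answer
row_fun <- function(Weight, Days, ...) {
  # Name and School arrive through ..., so they come first in the result
  c(..., setNames(rep(Weight, Days), seq_len(Days)))
}

# One row of the data, written as a named list
one_row <- list(Name = "Antoine", School = "Bach", Weight = 0.03, Days = 5)

# Weight and Days are matched by name; Name and School fall into ...
out <- do.call(row_fun, one_row)

names(out)  # "Name" "School" "1" "2" "3" "4" "5"
class(out)  # "character", because c() coerces the mixed values
```

Because the result is a character vector, the weights come back as strings such as "0.03", which is why the answer converts the day columns back to numeric afterwards.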