A sample of my data is :
dat <- read.table(text = " ID BC1 DC1 DE1 MN2 DC2 PO2 SA3 BC3 KL3 AA4 AP4 BC4 PO4
1 2 1 2 3 1 3 1 1 3 2 2 2 2
2 3 1 1 2 3 1 1 2 3 1 1 3 2
3 2 3 2 3 2 3 2 1 1 3 1 1 1
4 3 3 1 1 1 1 1 2 2 1 2 1 2", header = TRUE)
I want to get the following table and missing data are blank
ID Group1 Group2 Group3 Group4
1 2 1 2
2 3 1 1
3 2 3 2
4 3 3 1
1 3 1 3
2 2 3 1
3 3 2 3
4 1 1 1
1 1 1 3
2 1 2 3
3 2 1 1
4 1 2 2
1 2 2 2 2
2 1 1 3 2
3 3 1 1 1
4 1 2 1 2
The number in front of each column is where the columns are separated from each other. For example BC1, DC1 and DE1. They form the first four rows with their Ids and MN2, DC2 and PO2 form the second rows with their IDs and so on.
What about using the row numbers with some pivoting?
library(dplyr)
library(tidyr)
dat |>
pivot_longer(-ID, names_sep = "(?=\\d)", names_to = c(NA, "id")) |>
group_by(ID, id) |>
mutate(name = row_number()) |>
pivot_wider(c(ID, id), names_prefix = "Group") |>
arrange(id) |>
ungroup() |>
select(-id)
Or using a map:
library(purrr)
library(dplyr)
paste(1:4) |> # unique(readr::parse_number(names(dat |> select(-ID))))
map(\(x) select(dat, ID, ends_with(x)) |> rename_with(\(x) names(x) <- paste0("Group", 1:length(x)), -ID)) |>
bind_rows()
Output:
# A tibble: 16 × 5
ID Group1 Group2 Group3 Group4
<int> <int> <int> <int> <int>
1 1 2 1 2 NA
2 2 3 1 1 NA
3 3 2 3 2 NA
4 4 3 3 1 NA
5 1 3 1 3 NA
6 2 2 3 1 NA
7 3 3 2 3 NA
8 4 1 1 1 NA
9 1 1 1 3 NA
10 2 1 2 3 NA
11 3 2 1 1 NA
12 4 1 2 2 NA
13 1 2 2 2 2
14 2 1 1 3 2
15 3 3 1 1 1
16 4 1 2 1 2
Update 13-01: Now the first solution returns the correct ID (not id) + another approach added.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With