Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Grouping and stacking data

Tags:

dataframe

r

pivot

A sample of my data is :

dat <- read.table(text = " ID   BC1 DC1 DE1 MN2 DC2 PO2 SA3 BC3 KL3 AA4 AP4 BC4 PO4
        1   2   1   2   3   1   3   1   1   3   2   2   2   2
        2   3   1   1   2   3   1   1   2   3   1   1   3   2
        3   2   3   2   3   2   3   2   1   1   3   1   1   1
        4   3   3   1   1   1   1   1   2   2   1   2   1   2", header = TRUE)

I want to get the following table and missing data are blank

ID Group1  Group2  Group3  Group4
 1      2       1       2      
 2      3       1       1      
 3      2       3       2      
 4      3       3       1      
 1      3       1       3      
 2      2       3       1      
 3      3       2       3      
 4      1       1       1      
 1      1       1       3      
 2      1       2       3      
 3      2       1       1      
 4      1       2       2      
 1      2       2       2      2
 2      1       1       3      2
 3      3       1       1      1
 4      1       2       1      2

The number in front of each column is where the columns are separated from each other. For example BC1, DC1 and DE1. They form the first four rows with their Ids and MN2, DC2 and PO2 form the second rows with their IDs and so on.

like image 371
user330 Avatar asked Feb 24 '26 01:02

user330


1 Answers

What about using the row numbers with some pivoting?

library(dplyr)
library(tidyr)

dat |>
  pivot_longer(-ID, names_sep = "(?=\\d)", names_to = c(NA, "id")) |>
  group_by(ID, id) |>
  mutate(name = row_number()) |>
  pivot_wider(c(ID, id), names_prefix = "Group") |>
  arrange(id) |>
  ungroup() |>
  select(-id)

Or using a map:

library(purrr)
library(dplyr)

paste(1:4) |> # unique(readr::parse_number(names(dat |> select(-ID))))
  map(\(x) select(dat, ID, ends_with(x)) |> rename_with(\(x) names(x) <- paste0("Group", 1:length(x)), -ID)) |> 
  bind_rows()

Output:

# A tibble: 16 × 5
      ID Group1 Group2 Group3 Group4
   <int>  <int>  <int>  <int>  <int>
 1     1      2      1      2     NA
 2     2      3      1      1     NA
 3     3      2      3      2     NA
 4     4      3      3      1     NA
 5     1      3      1      3     NA
 6     2      2      3      1     NA
 7     3      3      2      3     NA
 8     4      1      1      1     NA
 9     1      1      1      3     NA
10     2      1      2      3     NA
11     3      2      1      1     NA
12     4      1      2      2     NA
13     1      2      2      2      2
14     2      1      1      3      2
15     3      3      1      1      1
16     4      1      2      1      2

Update 13-01: Now the first solution returns the correct ID (not id) + another approach added.

like image 58
harre Avatar answered Feb 26 '26 19:02

harre