Split rows to columns in R

Question

I have a text data file like this (with multiple rows for respective category [A,B,C]):

A=1,2,3,6,
7,9
10
B=3,4,5
C=5,7,8,10,11,
13,14

When I load to R it becomes a data frame with 6 obs. and 1 variable

df <- read.delim("file.text",header = FALSE)

    v1
1   A=1,2,3,6,
2   7,9
3   10
4   B=3,4,5
5   C=5,7,8,10,11,
6   13,14

How can I change to this format?

Thanks!

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer

The basic approach would be to read the data in as a string, split on = and ,, and then figure out the best way to identify which group each number belongs with.

In the following approach, I've intentionally used type conversion to create the groups. Since this will involve coercion from character to numeric, since there are characters in the split values, expect several warning messages.

# Load the data.table package.
library(data.table)

# Read in the data.
x <- fread("A=1,2,3,6,
7,9
10
B=3,4,5
C=5,7,8,10,11,
13,14", sep = "
", header = FALSE)

x[, unlist(strsplit(V1, "=|,"), use.names = FALSE, recursive = FALSE), .I][
  , list(ind = as.integer(V1), col = rep(V1[1], .N)), cumsum(is.na(as.integer(V1)))][
    , dcast(na.omit(.SD), ind ~ col, value.var = "col", fill = "")]
#     ind A B C
#  1:   1 A    
#  2:   2 A    
#  3:   3 A B  
#  4:   4   B  
#  5:   5   B C
#  6:   6 A    
#  7:   7 A   C
#  8:   8     C
#  9:   9 A    
# 10:  10 A   C
# 11:  11     C
# 12:  13     C
# 13:  14     C

Here's another alternative using cSplit from my "splitstackshape" package. "x" is the same data read in using fread.

library(splitstackshape)
cSplit(
  cSplit(x[, toString(V1), cumsum(grepl("[A-Z]", V1))], "V1", "="), "V1_2", ",", "long")[
    , dcast(.SD, V1_2 ~ V1_1, value.var = "V1_1", fill = "")]

Ronak Shah · Answer

Here is a tidyverse approach.

library(dplyr)
library(tidyr)

df %>%
  #Remove commas at the end of the string
  mutate(V1 = sub(",$", "", V1)) %>%
  #Divide data into separate column based on "=" sign
  separate(V1, into = c("let", "num"), sep = "=", fill = "left") %>%
  #fill the NA values with it's respective group
  fill(let) %>%
  #Convert comma-separated value into different rows
  separate_rows(num, convert = TRUE) %>%
  #Convert data to wide format filling missing values with empty string
  pivot_wider(names_from = let, values_from = let, values_fill = list(let = "")) %>%
  #Arrange data according to numbers
  arrange(num)

#  # A tibble: 13 x 4
#     num A     B     C    
#   <int> <chr> <chr> <chr>
# 1     1 A     ""    ""   
# 2     2 A     ""    ""   
# 3     3 A     B     ""   
# 4     4 ""    B     ""   
# 5     5 ""    B     C    
# 6     6 A     ""    ""   
# 7     7 A     ""    C    
# 8     8 ""    ""    C    
# 9     9 A     ""    ""   
#10    10 A     ""    C    
#11    11 ""    ""    C    
#12    13 ""    ""    C    
#13    14 ""    ""    C

If you want num column as rownames add %>% column_to_rownames('num').

data

df <- read.table(text = "A=1,2,3,6,
7,9
10
B=3,4,5
C=5,7,8,10,11,
13,14")

Split rows to columns in R

Tags:

r

reshape

melt

CHONG

2 Answers

A5C1D2H2I1M1N2O1R2T1

Ronak Shah

Recent Activity

Donate For Us

Split rows to columns in R

Tags:

r

reshape

melt

CHONG

2 Answers

A5C1D2H2I1M1N2O1R2T1

Ronak Shah

Related questions

Recent Activity

Donate For Us