Split column label by number of letters/characters in R

Question

I have a large dataset where all column headers are individual IDs, each 8 characters in length. I would like to split those individual IDs into 2 rows, where the first row of IDs contains the first 7 characters, and the second row contains just the last character.

Current dataset:

ID1:    Indiv01A    Indiv01B    Indiv02A    Indiv02B    Speci03A    Speci03B

Intended dataset:

ID1:    Indiv01 Indiv01 Indiv02 Indiv02 Speci03 Speci03  
ID2:    A   B   A   B   A   B

I've looked through other posts on splitting data, but they all seem to have a unique delimiter separating the parts of the column name (i.e., a comma or a period between the 2 components).

This is the code I'm thinking would work best, but I just can't figure out how to code for "7 characters" as the split point, rather than a comma:

strsplit(as.character(d$ID), ",")

Any help would be appreciated.

Sven Hohenstein · Accepted Answer

Here's a regular expression for a solution with strsplit. It splits the string between the 7th and the 8th character:

ID1 <- c("Indiv01A", "Indiv01B", "Indiv02A", "Indiv02B", "Speci03A", "Speci03B")

res <- strsplit(ID1, "(?<=.{7})", perl = TRUE)

# [[1]]
# [1] "Indiv01" "A"      
# 
# [[2]]
# [1] "Indiv01" "B"      
# 
# [[3]]
# [1] "Indiv02" "A"      
# 
# [[4]]
# [1] "Indiv02" "B"      
# 
# [[5]]
# [1] "Speci03" "A"      
# 
# [[6]]
# [1] "Speci03" "B"

Now, you can use rbind to create two columns:

do.call(rbind, res)
#      [,1]      [,2]
# [1,] "Indiv01" "A" 
# [2,] "Indiv01" "B" 
# [3,] "Indiv02" "A" 
# [4,] "Indiv02" "B" 
# [5,] "Speci03" "A" 
# [6,] "Speci03" "B"
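
The question asks for two rows rather than two columns. Here is a minimal sketch of that layout, assuming the IDs are the column names of a data frame. The data frame `d` and its values are made up for illustration and are not from the question. Binding the pieces with cbind instead of rbind gives one column per ID and one row per part:

```r
# Hypothetical data frame whose column names are the 8-character IDs
ID1 <- c("Indiv01A", "Indiv01B", "Indiv02A", "Indiv02B", "Speci03A", "Speci03B")
d <- as.data.frame(matrix(seq_len(12), nrow = 2, dimnames = list(NULL, ID1)))

# Split each column name after the 7th character
res <- strsplit(names(d), "(?<=.{7})", perl = TRUE)

# One column per ID, one row per part (row labels follow the question's layout)
parts <- do.call(cbind, res)
rownames(parts) <- c("ID1", "ID2")
parts
```

Calling t() on the rbind result above should give the same matrix; cbind just skips the extra step.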

Explanation of the regex pattern:

(?<=.{7})

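(?<=...) is a positive lookbehind: a zero-width assertion that matches any position preceded by the specified pattern. Here, the pattern is .{7}: the dot (.) matches any character, and {7} repeats it 7 times. The regex therefore matches a position that has 7 characters immediately before it. In an 8-character string, strsplit splits at the first such position, which lies between the 7th and the 8th character. perl = TRUE is required because R's default regex engine does not support lookbehind.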
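
Because the split point is a fixed position, the same pieces can also be extracted without a regex. This is a sketch of an alternative using base R's substr, not part of the strsplit approach above. It assumes every ID is exactly 8 characters long, which the stopifnot line checks:

```r
ID1 <- c("Indiv01A", "Indiv01B", "Indiv02A", "Indiv02B", "Speci03A", "Speci03B")

# Guard the assumption that every ID is exactly 8 characters long
stopifnot(all(nchar(ID1) == 8))

prefix <- substr(ID1, 1, 7)  # characters 1 to 7
suffix <- substr(ID1, 8, 8)  # the 8th character only

# Two-row layout, labelled as in the question
rbind(ID1 = prefix, ID2 = suffix)
```

substr is vectorised over the input, so no sapply or loop is needed here.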

Tags: regex, split, r, multiple-columns

KKL234
