
Separate string into many columns

I'd like to split a string into its individual letters or symbols and build a new data.frame with one column per character. I tried the separate function from the tidyr package, but the result is not what I expected.

df <- data.frame(x = c('house', 'mouse'), y = c('count', 'apple'), stringsAsFactors = F)

# unexpected result
df[1, ] %>% separate(x, c('A1', 'A2', 'A3', 'A4', 'A5'), sep = '')
  A1 A2 A3 A4 A5     y
1                count

Expected output

A1  A2  A3  A4  A5
 h   o   u   s   e
 m   o   u   s   e

Solutions using stringr are welcome.

Wagner Jorge asked Apr 13 '26 08:04


2 Answers

We can use regex lookarounds in sep to match the zero-width boundary between adjacent characters. The pattern below only splits between lowercase letters (a-z).

library(dplyr)
library(tidyr)
library(stringr)
df %>%
   select(x) %>% 
   separate(x, into = str_c("A", 1:5), sep= "(?<=[a-z])(?=[a-z])")
#  A1 A2 A3 A4 A5
#1  h  o  u  s  e
#2  m  o  u  s  e
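
Because the [a-z] lookaround above only splits between lowercase letters, strings containing other symbols may need a different approach. As a minimal base R sketch (not part of the original answer), strsplit with an empty pattern splits a string into single characters. The name out is illustrative, and the code assumes every string in x has the same length; with unequal lengths rbind would recycle values and warn.

```r
# Same example data as in the question
df <- data.frame(x = c("house", "mouse"), y = c("count", "apple"),
                 stringsAsFactors = FALSE)

# strsplit() with "" returns one character vector per string
chars <- strsplit(df$x, "")

# Stack the vectors row-wise; assumes all strings have equal length
out <- as.data.frame(do.call(rbind, chars), stringsAsFactors = FALSE)
names(out) <- paste0("A", seq_len(ncol(out)))

out
#   A1 A2 A3 A4 A5
# 1  h  o  u  s  e
# 2  m  o  u  s  e
```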
akrun answered Apr 14 '26 22:04


We can use cSplit from splitstackshape with stripWhite = FALSE and sep = "" to split every letter in a column.

splitstackshape::cSplit(df, "x", sep = "", stripWhite = FALSE)

#       y x_1 x_2 x_3 x_4 x_5
#1: count   h   o   u   s   e
#2: apple   m   o   u   s   e
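
Since the question also asks about stringr, here is a hedged sketch (not from either answer) using str_split_fixed, where an empty pattern splits at character boundaries. The names mat and res are illustrative, and n = 5 assumes each string has at most five characters; a longer string would keep its remainder in the last column.

```r
library(stringr)

# Same example data as in the question
df <- data.frame(x = c("house", "mouse"), y = c("count", "apple"),
                 stringsAsFactors = FALSE)

# An empty pattern splits into single characters; the result is a
# character matrix with one row per string and n columns
mat <- str_split_fixed(df$x, "", n = 5)

res <- as.data.frame(mat, stringsAsFactors = FALSE)
names(res) <- str_c("A", 1:5)
res
```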
Ronak Shah answered Apr 14 '26 22:04


