
Split character string multiple times every two characters

I have a character column in my dataframe that looks like

df <- data.frame(a = c("AaBbCC", "AABBCC", "AAbbCC"))

> df
       a
1 AaBbCC
2 AABBCC
3 AAbbCC

I would like to split this column every two characters, so in this case I would like to obtain three columns named VA, VB and VC. I tried

library(tidyr)
library(dplyr)
df <- data.frame(a = c("AaBbCC", "AABBCC", "AAbbCC")) %>%
  separate(a, c(paste("V", LETTERS[1:3], sep = "")), sep = c(2, 2))

> df
  VA VB   VC
1 Aa    BbCC
2 AA    BBCC
3 AA    bbCC

but this is not the desired result. I would like the content that now ends up in VC to be split into VB (all the B letters) and VC (all the C letters). How do I get R to split every two characters? The string length is always the same for every row (6 in this example), and I will also have strings of length > 10.

Asked by user2386786 on Jan 09 '16 at 15:01



2 Answers

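You were actually quite close. A numeric sep is interpreted as character positions counted from the left of the string, not as piece widths, so you need to specify the separator positions as sep = c(2,4) instead of sep = c(2,2):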

df <- separate(df, a, c(paste0("V",LETTERS[1:3])), sep = c(2,4))

you get:

> df
  VA VB VC
1 Aa Bb CC
2 AA BB CC
3 AA bb CC
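
For the longer strings mentioned in the question, the cut positions do not have to be typed by hand. The following is a minimal sketch, assuming every row has the same even length of at least 4 characters; the data frame df_long and the names n, cuts and new_names are made up for illustration:

```r
library(tidyr)

# Hypothetical example data: 12-character strings, i.e. six 2-character pieces
df_long <- data.frame(a = c("AaBbCCddEEff", "AABBCCDDEEFF"),
                      stringsAsFactors = FALSE)

n <- nchar(df_long$a[1])                 # assumes all rows have this length
cuts <- seq(2, n - 2, by = 2)            # cut positions 2, 4, ..., n - 2
new_names <- paste0("V", LETTERS[seq_len(n / 2)])

res <- separate(df_long, a, into = new_names, sep = cuts)
res
```

With more than 26 pieces LETTERS runs out, so a numeric suffix such as paste0("V", seq_len(n / 2)) would be needed instead.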

In base R, starting again from the original df with its single column a, you could do (borrowing from @rawr's comment):

l <- ave(as.character(df$a), FUN = function(x) strsplit(x, '(?<=..)', perl = TRUE))
df <- data.frame(do.call('rbind', l))

which gives:

> df
  X1 X2 X3
1 Aa Bb CC
2 AA BB CC
3 AA bb CC
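
If the lookbehind regex feels opaque, the same split can be sketched with substring() and explicit start positions. This is an alternative to the approach above, not part of the original answer, and it assumes all strings share the length of the first one; x, starts, pieces and out are illustrative names:

```r
x <- c("AaBbCC", "AABBCC", "AAbbCC")

# Start position of every 2-character piece: 1, 3, 5, ...
starts <- seq(1, nchar(x[1]), by = 2)

# substring() is vectorised over the start/stop positions,
# so each string yields one character vector of pieces
pieces <- do.call(rbind, lapply(x, function(s) substring(s, starts, starts + 1)))

out <- as.data.frame(pieces, stringsAsFactors = FALSE)
names(out) <- paste0("V", LETTERS[seq_along(starts)])
out
```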
Answered by Jaap on Oct 23 '22 at 07:10


We could do this with base R

read.csv(text=gsub('(..)(?!$)', '\\1,', df$a, 
    perl=TRUE),col.names=paste0("V", LETTERS[1:3]), header=FALSE)
#  VA VB VC
#1 Aa Bb CC
#2 AA BB CC
#3 AA bb CC

If we are reading directly from a file, another option is read.fwf, which reads fixed-width fields (skip=1 skips the first line of the file, for example a header):

read.fwf(file="yourfile.txt", widths=c(2,2,2), skip=1)
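
To try read.fwf without a file on disk, the strings can be fed in through a text connection. This is only a sketch on the example data from the question; x, con and res_fwf are illustrative names, and colClasses is set so that the pieces stay character:

```r
x <- c("AaBbCC", "AABBCC", "AAbbCC")

# read.fwf accepts a connection in place of a file name
con <- textConnection(x)
res_fwf <- read.fwf(con,
                    widths = rep(2, 3),                      # three fields, 2 characters each
                    col.names = paste0("V", LETTERS[1:3]),
                    colClasses = "character")
close(con)  # the connection was opened here, so it is closed here

res_fwf
```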
Answered by akrun on Oct 23 '22 at 05:10