I have a character column in my dataframe that looks like
df <- data.frame(a = c("AaBbCC", "AABBCC", "AAbbCC"))
df
a
1 AaBbCC
2 AABBCC
3 AAbbCC
I would like to split this column every two characters. So in this case I would like to obtain three columns named VA, VB, VC.
I tried
library(tidyr)
library(dplyr)
df <- data.frame(a = c("AaBbCC", "AABBCC", "AAbbCC")) %>%
  separate(a, c(paste("V", LETTERS[1:3], sep = "")), sep = c(2, 2))
  VA VB   VC
1 Aa    BbCC
2 AA    BBCC
3 AA    bbCC
but this is not the desired result. I would like the content that is now in VC to be split into VB (the B letters) and VC (the C letters). How do I get R to split every two characters? The length of the string in the column is always the same for every row (6 in this example).
I will have strings that are of length >10.
You were actually quite close. A numeric sep is interpreted as the character positions to split at, so you need to specify the separator positions as sep = c(2,4) instead of sep = c(2,2):
df <- separate(df, a, c(paste0("V",LETTERS[1:3])), sep = c(2,4))
you get:
> df
  VA VB VC
1 Aa Bb CC
2 AA BB CC
3 AA bb CC
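Since the question mentions strings longer than 10 characters, the split positions can be computed rather than typed out. The following is only a sketch under a few assumptions: the 10-character example data is made up for illustration, every row is assumed to have the same length, that length is assumed to be a multiple of the chunk width, and the names width, n_chars, cuts, new_names and result are not from the original post.

```r
library(tidyr)

# Hypothetical example data: 10 characters per row
df <- data.frame(a = c("AaBbCCDdEE", "AABBCCDDEE", "AAbbCCddEE"),
                 stringsAsFactors = FALSE)

width   <- 2                 # chunk size
n_chars <- nchar(df$a[1])    # assumes all rows share this length

# Split positions: 2 4 6 8 (one fewer than the number of new columns)
cuts <- seq(width, n_chars - width, by = width)

# One column name per chunk: VA, VB, VC, VD, VE
new_names <- paste0("V", LETTERS[seq_len(n_chars / width)])

result <- separate(df, a, into = new_names, sep = cuts)
result
```

LETTERS only has 26 entries, so strings with more than 26 chunks would need a different naming scheme.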
In base R you could do (borrowing from @rawr's comment):
l <- ave(as.character(df$a), FUN = function(x) strsplit(x, '(?<=..)', perl = TRUE))
df <- data.frame(do.call('rbind', l))
which gives:
> df
  X1 X2 X3
1 Aa Bb CC
2 AA BB CC
3 AA bb CC
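A regex-free base R variant is also possible with substring(), which may be easier to adapt to other chunk widths. This is a sketch rather than part of the answer above: the example vector is made up, all strings are assumed to have the same length, and the names x, width, starts, parts and out are chosen here for illustration.

```r
# Hypothetical example data: 10 characters per string
x <- c("AaBbCCDdEE", "AABBCCDDEE", "AAbbCCddEE")

width  <- 2
starts <- seq(1, nchar(x[1]), by = width)   # start position of each chunk

# substring() is vectorised over first/last, so each string yields one
# chunk per start position; sapply() returns chunks in columns, so transpose
parts <- t(sapply(x, substring,
                  first = starts, last = starts + width - 1,
                  USE.NAMES = FALSE))

out <- as.data.frame(parts, stringsAsFactors = FALSE)
names(out) <- paste0("V", LETTERS[seq_along(starts)])
out
```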
We could do this with base R:
read.csv(text = gsub('(..)(?!$)', '\\1,', df$a, perl = TRUE),
         col.names = paste0("V", LETTERS[1:3]), header = FALSE)
# VA VB VC
#1 Aa Bb CC
#2 AA BB CC
#3 AA bb CC
If we are reading directly from the file, another option is read.fwf
read.fwf(file="yourfile.txt", widths=c(2,2,2), skip=1)