I have a string in R in the following form:
example <- c("namei1 namej1, surname1, name2, surnamei2 surnamej2, name3, surname3")
And I wish to obtain two columns:
namei1 namej1 | surname1
name2 | surnamei2 surnamej2
name3 | surname3
I tried using str_split from the stringr package:
library(stringr)
example <- c("namei1 namej1, surname1, name2, surnamei2 surnamej2, name3, surname3")
pattern <- "\\,+[[:space:]]"
str_split(example, pattern)
But I get stuck from here…
read.csv(text = gsub("([^,]+,[^,]+),", "\\1\n", example),
         header = FALSE, stringsAsFactors = FALSE,
         strip.white = TRUE)  # drop the space left after each comma
#              V1                  V2
# 1 namei1 namej1            surname1
# 2         name2 surnamei2 surnamej2
# 3         name3            surname3
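To see what the gsub call in this answer does, it can help to look at the intermediate string before read.csv parses it. This is only a minimal sketch using the example string from the question; the names step1 and rows are introduced here for illustration.

```r
example <- "namei1 namej1, surname1, name2, surnamei2 surnamej2, name3, surname3"

# Match "field,field," and put back the two fields followed by a newline,
# so every second comma becomes a line break.
step1 <- gsub("([^,]+,[^,]+),", "\\1\n", example)

# cat() prints the comma-separated lines that read.csv will parse.
cat(step1, "\n")

# Splitting on the newline gives one piece per future data frame row.
rows <- strsplit(step1, "\n", fixed = TRUE)[[1]]
length(rows)
```

Every piece after the first starts with a space, because the space that followed the replaced comma is still there. Whether that matters depends on what is done with the values afterwards; trimws or strip.white = TRUE in read.csv would remove it.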
We can split the string at each comma followed by zero or more spaces (\\s*), then create a grouping variable based on the occurrence of the 'name' string, split the vector (v1) into a list of vectors, rbind the list elements, and convert the result to a data.frame:
v1 <- strsplit(example, ",\\s*")[[1]]
setNames(do.call(rbind.data.frame, split(v1, cumsum(grepl('\\bname',
v1)))), paste0("V", 1:2))
#             V1                  V2
#1 namei1 namej1            surname1
#2         name2 surnamei2 surnamej2
#3         name3            surname3
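The grouping step may be easier to follow when each intermediate value is kept in its own variable. This sketch only unpacks the one-liner above; the names is_name, grp and pieces are introduced here for illustration.

```r
example <- "namei1 namej1, surname1, name2, surnamei2 surnamej2, name3, surname3"
v1 <- strsplit(example, ",\\s*")[[1]]

# TRUE where "name" starts a word. "surname1" does not match, because
# \\b requires a word boundary immediately before "name".
is_name <- grepl("\\bname", v1)
is_name
# TRUE FALSE TRUE FALSE TRUE FALSE

# Running count of the TRUE values: each name field opens a new group.
grp <- cumsum(is_name)
grp
# 1 1 2 2 3 3

# One character vector of length 2 per group, ready to be row-bound.
pieces <- split(v1, grp)
length(pieces)
# 3
```

Note that this relies on the name fields literally starting with "name", as they do in the example.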
Another option is to read the fields with scan and convert them to a two-column matrix:
as.data.frame( matrix(trimws(scan(text = example, sep=",",
what = "", quiet = TRUE)), byrow = TRUE, ncol = 2))
#             V1                  V2
#1 namei1 namej1            surname1
#2         name2 surnamei2 surnamej2
#3         name3            surname3
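The matrix approach pairs fields purely by position, so it assumes the string holds an even number of fields. A small wrapper can make that assumption explicit. This is a sketch only: pairs_to_df is a hypothetical helper name, and it uses strsplit instead of scan for the splitting step.

```r
# Hypothetical helper: pair up comma-separated fields into two columns.
pairs_to_df <- function(x) {
  fields <- trimws(strsplit(x, ",", fixed = TRUE)[[1]])
  # With an odd count, matrix() would recycle values (with a warning),
  # so stop early instead of returning a misaligned table.
  if (length(fields) %% 2 != 0) {
    stop("expected an even number of comma-separated fields")
  }
  as.data.frame(matrix(fields, ncol = 2, byrow = TRUE),
                stringsAsFactors = FALSE)
}

example <- "namei1 namej1, surname1, name2, surnamei2 surnamej2, name3, surname3"
res <- pairs_to_df(example)
res
```

Unlike the variants that look for the 'name' string, this does not depend on what the fields contain, only on their order.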
Yet another option is gsub: replace each comma followed by a space and the 'name' string with \n followed by 'name', then pass the result to read.csv, which splits each line on the delimiter (,):
read.csv(text = gsub(", name", "\nname", example), header = FALSE,
         strip.white = TRUE)  # drop the space left after each comma
#             V1                  V2
#1 namei1 namej1            surname1
#2         name2 surnamei2 surnamej2
#3         name3            surname3