
Split string by last two characters in R? (/negative string indices)

Tags: split, dataframe, r

My data frame looks like:

b <- data.frame(height = c(190,165,174,176), name = c('John Smith 34','Mr.Turner 54', 'Antonio P. 23', 'John Brown 31'))

#   height          name
# 1    190 John Smith 34
# 2    165  Mr.Turner 54
# 3    174 Antonio P. 23
# 4    176 John Brown 31

As we can see, name and age are stored in the same column. So I want to split each string at its last two characters:

  height       name age
1    190 John Smith  34
2    165  Mr.Turner  54
3    174 Antonio P.  23
4    176 John Brown  31

How can I do that?

Denis asked Feb 28 '26 08:02


2 Answers

tidyr::separate makes separating columns simple by allowing you to pass an integer index of split position, including negatively indexed from the end of the string. (Regex works as well, of course.)

library(tidyr)

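# sep = -2 splits off the last two characters: current tidyr counts negative
# positions from -1 at the far right of the string
b %>% separate(name, into = c('name', 'age'), sep = -2, convert = TRUE)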
##   height        name age
## 1    190 John Smith   34
## 2    165  Mr.Turner   54
## 3    174 Antonio P.   23
## 4    176 John Brown   31

or separate by the final space:

b %>% separate(name, into = c('name', 'age'), sep = '\\s(?=\\S*?$)', convert = TRUE)

which returns the same thing.
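To see why that pattern picks out only the final space, here is a minimal base R sketch. The pattern matches one whitespace character that is followed by nothing but non-whitespace up to the end of the string. The variable names (x, pattern, pos, parts, no_space) are made up for illustration.

```r
x <- "John Smith 34"
pattern <- "\\s(?=\\S*?$)"   # whitespace followed only by non-whitespace until the end

# Position of the match: the second space, not the first
pos <- as.integer(regexpr(pattern, x, perl = TRUE))
pos
# 11

# Splitting on it yields the name and the age as separate strings
parts <- strsplit(x, pattern, perl = TRUE)[[1]]
parts
# "John Smith" "34"

# A string without any whitespace has no match (regexpr returns -1)
no_space <- as.integer(regexpr(pattern, "JohnSmith34", perl = TRUE))
```

The lookahead needs perl = TRUE in base R; without it the pattern is not valid.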

In base R, it's a bit more work:

b$name <- as.character(b$name)
split_name <- strsplit(b$name, '\\s(?=\\S*?$)', perl = TRUE)
split_name <- do.call(rbind, split_name)
colnames(split_name) <- c('name', 'age')
b <- data.frame(b[-2], split_name, stringsAsFactors = FALSE)
b$age <- type.convert(b$age, as.is = TRUE)

b
##   height       name age
## 1    190 John Smith  34
## 2    165  Mr.Turner  54
## 3    174 Antonio P.  23
## 4    176 John Brown  31
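
Another base R possibility, sketched here as an alternative rather than as part of the answer above, is to pull the two pieces out with sub() instead of splitting. It assumes every value ends in whitespace followed by digits, as in the example data.

```r
b <- data.frame(
  height = c(190, 165, 174, 176),
  name = c("John Smith 34", "Mr.Turner 54", "Antonio P. 23", "John Brown 31"),
  stringsAsFactors = FALSE
)

# Keep only the trailing digits for age (do this before name is overwritten)
b$age <- as.integer(sub("^.*\\s(\\d+)$", "\\1", b$name))

# Drop the trailing whitespace and digits from name
b$name <- sub("\\s+\\d+$", "", b$name)

b
```

Because the pattern anchors on digits rather than on a fixed width, this would also cope with ages of one or three digits.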
alistaire answered Mar 02 '26 22:03


There are many options here using regular expressions. I would use substr, because you know exactly how many characters to extract.

With data.table (for its syntactic sugar):

library(data.table)
setDT(b)[, c("name", "age") := list(
  substr(name, 1, nchar(name) - 3),                       # drop the trailing " NN"
  as.integer(substr(name, nchar(name) - 1, nchar(name)))  # last two characters as an integer
)]

   height       name age
1:    190 John Smith  34
2:    165  Mr.Turner  54
3:    174 Antonio P.  23
4:    176 John Brown  31

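Note that name must be a character column, not a factor. Since R 4.0.0 that is the default; in earlier versions, set stringsAsFactors = FALSE explicitly:

b <- data.frame(
  height = c(190, 165, 174, 176),
  name = c('John Smith 34', 'Mr.Turner 54', 'Antonio P. 23', 'John Brown 31'),
  stringsAsFactors = FALSE)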
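
The same substr idea also works without data.table. The following is a minimal base R sketch, assuming the age is always the last two characters and is preceded by a single space; n is just a helper variable introduced here.

```r
b <- data.frame(
  height = c(190, 165, 174, 176),
  name = c("John Smith 34", "Mr.Turner 54", "Antonio P. 23", "John Brown 31"),
  stringsAsFactors = FALSE
)

n <- nchar(b$name)                              # lengths, taken before name is overwritten
b$age  <- as.integer(substr(b$name, n - 1, n))  # last two characters
b$name <- trimws(substr(b$name, 1, n - 2))      # the rest, with the trailing space removed

b
```

Computing n once up front matters: after b$name has been shortened, nchar(b$name) would no longer point at the age.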
agstudy answered Mar 02 '26 20:03


