
Getting the second item after using str_split() in R

Tags: r, dplyr, stringr

I have a data frame that contains some questions. I want to drop the leading number and period from each question but keep everything else. I don't really understand regex, but this seems like a perfect use for str_split(), specifically within a dplyr pipeline. However, after splitting the string, I'm not sure how to grab the second item. I tried accessing it by position, and that didn't work.

x <- structure(list(question = c("01. I like my job.", 
                                 "02. I like my house.", 
                                 "03. I like my car.")), class = "data.frame", row.names = c(NA, -3L))

x %>% 
  mutate(words = str_split(question, "."))

Returns this:

question                        words
01. I like my job.         <chr [19]>           
02. I like my house.       <chr [21]>           
03. I like my car.         <chr [19]>   

I want it to look like this:

question                             words
01. I like my job.         I like my job.           
02. I like my house.       I like my house.     
03. I like my car.         I like my car.

I've also tried using separate() and strsplit(), but I couldn't make either of those work.

CurtLH asked Sep 19 '25 06:09


2 Answers

I think you're looking for str_replace (or sub in base R):

x %>% mutate(words = str_replace(question, "^\\d+\\.", ""))
#              question             words
#1   01. I like my job.    I like my job.
#2 02. I like my house.  I like my house.
#3   03. I like my car.    I like my car.

Explanation:

  1. ^ anchors the match at the start of the string
  2. \\d+\\. matches one or more digits followed by a literal full stop
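To see the two pattern pieces in action without loading any packages, here is a small sketch using the base R equivalent, sub(). The vector name question and the second pattern with a trailing \\s* are my own additions for illustration: the original pattern leaves the space that follows the period in place, and the variant removes it too, on the assumption that the leading space is not wanted.

```r
# Standalone copy of the questions (same strings as in the data frame x).
question <- c("01. I like my job.",
              "02. I like my house.",
              "03. I like my car.")

# Base R equivalent of str_replace(question, "^\\d+\\.", ""):
# "^" anchors at the start, "\\d+" matches the digits, "\\." the literal period.
no_number <- sub("^\\d+\\.", "", question)

# Variant (assumption: the space after the period should go as well):
# "\\s*" additionally consumes any whitespace that follows the period.
trimmed <- sub("^\\d+\\.\\s*", "", question)

print(no_number)
print(trimmed)
```

Inside the pipeline, the same variant pattern can be passed to str_replace() in place of the original one.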

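You can also use str_split, extracting the second piece of each row with map_chr (from purrr) and re-appending the full stop that the split removes:

x %>% mutate(words = paste0(map_chr(str_split(question, "\\."), 2), "."))

This gives the same result.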
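For context on why the attempt in the question produced long vectors of pieces: an unescaped "." is a regex wildcard, so every character is treated as a delimiter. The sketch below illustrates this with base R's strsplit() (which the question also mentions) rather than str_split(), so it runs without packages. The names question, wildcard_pieces, pieces, second and words are illustrative, and trimws() is my addition to drop the space left after the period.

```r
question <- c("01. I like my job.",
              "02. I like my house.",
              "03. I like my car.")

# Unescaped "." matches ANY character, so the string is cut at every
# character and only empty pieces remain.
wildcard_pieces <- strsplit(question[1], ".")[[1]]
print(all(wildcard_pieces == ""))

# fixed = TRUE treats "." as a literal period (same effect as escaping it).
pieces <- strsplit(question, ".", fixed = TRUE)

# strsplit() returns a list with one character vector per input string,
# so the second piece has to be taken from each list element in turn.
second <- vapply(pieces, function(p) p[2], character(1))

# Drop the leading space and put back the period removed by the split.
words <- paste0(trimws(second), ".")
print(words)
```

The vapply() call plays the role that map_chr(..., 2) plays in the pipeline version.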

Maurits Evers answered Sep 21 '25 21:09


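You can change the pattern to "\\. " (an escaped period followed by a space) and then take the second piece of each split for the words column. Note that str_split() returns a list with one element per row, so indexing it with [[1]][[2]] would pick the second piece of the first row only and repeat it in every row; str_split_fixed() with n = 2 returns a matrix instead, whose second column holds the second piece for each row.

library(tidyverse)

x %>% 
  mutate(words = str_split_fixed(question, "\\. ", n = 2)[, 2]) 
#               question            words
# 1   01. I like my job.   I like my job.
# 2 02. I like my house. I like my house.
# 3   03. I like my car.   I like my car.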
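One thing to watch with list-returning splitters is how positional indexing behaves. The sketch below uses base R only, with strsplit() standing in for str_split(); the column names recycled and words_per_row are hypothetical and exist just to contrast the two results.

```r
x <- data.frame(question = c("01. I like my job.",
                             "02. I like my house.",
                             "03. I like my car."),
                stringsAsFactors = FALSE)

# Split on the literal two-character delimiter ". " (period plus space).
pieces <- strsplit(x$question, ". ", fixed = TRUE)

# Pitfall: [[1]][[2]] is the second piece of the FIRST string only.
# It has length 1, so assigning it to a column repeats it in every row.
first_only <- pieces[[1]][[2]]
x$recycled <- first_only

# Per-row extraction: take element 2 from each list element.
x$words_per_row <- vapply(pieces, `[`, character(1), 2)

print(x)
```

Comparing the recycled and words_per_row columns makes the difference between indexing the list once and indexing each element visible.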
www answered Sep 21 '25 21:09