
R - Delete words in a character vector that begin with a capital letter

I have a character vector df:

df <- c("hello goodbye Delete Me", "Another Sentence good program", "hello world The End")

I want to remove every word that starts with a capital letter, giving:

c("hello goodbye", "good program", "hello world")

I have tried:

df <- grep("^[A-Z]", df, invert = TRUE, value = TRUE)

but this drops every element that starts with a capital letter, instead of removing only the capitalized words:

c("hello goodbye Delete Me", "hello world The End")

How do I do this?

asked Jul 17 '21 03:07 by aurelius_37809

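A small sketch of why the grep() attempt removes whole strings: grep() and grepl() test each element of the vector as a whole, so ^[A-Z] asks whether the entire string begins with a capital letter, not whether any word inside it does. Only the second element matches, so it is the one that gets dropped.

```r
df <- c("hello goodbye Delete Me", "Another Sentence good program", "hello world The End")

# grepl() returns one TRUE/FALSE per element (the whole string), not per word
starts_upper <- grepl("^[A-Z]", df)
print(starts_upper)
# [1] FALSE  TRUE FALSE

# grep(invert = TRUE, value = TRUE) therefore keeps or drops entire strings
kept <- grep("^[A-Z]", df, invert = TRUE, value = TRUE)
print(kept)
# kept is c("hello goodbye Delete Me", "hello world The End")
```

Removing individual words needs a per-word operation instead, such as a gsub() replacement inside each string.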

2 Answers

You can remove every capitalized word with gsub() and then trim the leftover spaces:

trimws(gsub('[A-Z]\\w+', '', df))
#[1] "hello goodbye" "good program"  "hello world" 
answered Oct 30 '22 22:10 by Ronak Shah
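A quick check of how the '[A-Z]\\w+' pattern behaves beyond the example input (the test strings below are made up for illustration). Because the replacement is an empty string, the spaces around a removed word stay behind, so a capitalized word in the middle of a string leaves a double space that trimws() does not touch. Also, \\w+ needs at least one word character after the capital, so single-letter words such as "I" are kept.

```r
x <- c("nice to meet You however", "I think So", "hello goodbye Delete Me")

# The pattern from the answer above: remove capitalized words, then trim the ends
out <- trimws(gsub('[A-Z]\\w+', '', x))
# out is c("nice to meet  however", "I think", "hello goodbye")
# (note the double space in the first element and the surviving "I")

# Collapsing runs of whitespace into one space removes the interior double space
squished <- gsub("\\s+", " ", out)
# squished is c("nice to meet however", "I think", "hello goodbye")
```

For the input in the question none of these cases occur, which is why the answer's output is already clean there.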


You may use the following regex pattern and replace each match with a single space:

\s*[A-Z]\w+\s*

This matches every word that begins with a capital letter and has at least two characters (\w+ needs one or more word characters after the capital), together with any whitespace on either side. Replacing each match with a single space keeps the neighboring words separated. The outer call to trimws() then removes any space left over at the very start or end of the string.

x <- c("nice to meet You however", "cat Ran away", "Cat", "Dog")
trimws(gsub('\\s*[A-Z]\\w+\\s*', ' ', x))

[1] "nice to meet however" "cat away"             ""                    
[4] ""
answered Oct 31 '22 00:10 by Tim Biegeleisen
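One edge case with '\\s*[A-Z]\\w+\\s*': when two capitalized words are adjacent in the middle of a string, each match is replaced by its own space, so a double space can remain ("a Foo Bar b" becomes "a  b"). The unanchored [A-Z] also matches capitals inside a word, so "iPhone" is cut down to "i". Below is a minimal sketch that handles both cases. The helper name drop_capitalized_words is hypothetical, and the \\b word boundary, \\w* (to include single-letter words like "I"), perl = TRUE and the final whitespace collapse are my own choices, not part of either answer.

```r
# Hypothetical helper: remove every word that starts with an uppercase letter.
drop_capitalized_words <- function(x) {
  # \b anchors the match at the start of a word, so capitals inside a word
  # (as in "iPhone") are left alone; \w* also matches single-letter words.
  no_caps <- gsub("\\b[[:upper:]]\\w*", "", x, perl = TRUE)
  # Collapse the runs of spaces left behind, then trim both ends.
  trimws(gsub("\\s+", " ", no_caps))
}

df <- c("hello goodbye Delete Me", "Another Sentence good program", "hello world The End")
drop_capitalized_words(df)
# returns c("hello goodbye", "good program", "hello world")

x <- c("nice to meet You however", "cat Ran away", "Cat", "a Foo Bar b", "so I said", "my iPhone")
drop_capitalized_words(x)
# returns c("nice to meet however", "cat away", "", "a b", "so said", "my iPhone")
```

Whether "iPhone" should be kept is a judgment call; dropping the \\b restores the behavior of the answers above for such words.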