
A regex to remove the pattern "[0-9]g"

Tags:

regex

r

I have the following sample dataset:

XYZ 185g
ABC 60G
Gha 20g

How do I remove the strings "185g", "60G" and "20g" without also removing the letters g and G that are part of the words themselves (such as the G in "Gha")? I tried the code below, but it replaces those letters in the words as well.

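library(stringr)
# Two independent replacements: first every digit, then every g/G anywhere
a$words <- str_replace_all(a$words, "[0-9]", " ")
a$words <- str_replace_all(a$words, "[gG]", " ")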
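A minimal base R sketch that reproduces the problem, assuming the data sits in a data frame column called words (the column name comes from the question's a$words; the data frame itself is a hypothetical reconstruction). It replaces with an empty string instead of a space so the damage is easier to see: the second step removes every g or G no matter what precedes it, so the capital G in "Gha" disappears too.

```r
# Hypothetical reproducible version of the question's sample data
a <- data.frame(words = c("XYZ 185g", "ABC 60G", "Gha 20g"))

# Step 1: drop every digit
naive <- gsub("[0-9]", "", a$words)
# Step 2: drop every g or G wherever it occurs; this also hits the word "Gha"
naive <- gsub("[gG]", "", naive)

print(naive)  # "Gha 20g" ends up as "ha "
```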
Shalvaze asked Sep 06 '21 11:09


2 Answers

You need to combine the two steps into a single pattern that only matches digits directly followed by g or G, for example:

a$words <- str_replace_all(a$words,"\\s*\\d+[gG]$", "")

The \s*\d+[gG]$ regex matches:

  • \s* - zero or more whitespace characters
  • \d+ - one or more digits
  • [gG] - g or G
  • $ - end of string.
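For readers without stringr, here is a sketch of the same end-anchored pattern in base R, again assuming a hypothetical data frame a with a words column. perl = TRUE selects the PCRE engine, which should treat \s, \d and $ the same way stringr's ICU engine does for inputs like these.

```r
# Hypothetical data frame holding the question's sample values
a <- data.frame(words = c("XYZ 185g", "ABC 60G", "Gha 20g"))

# \s*  optional whitespace before the number
# \d+  one or more digits
# [gG] the unit letter in either case
# $    only at the very end of the string
a$words <- gsub("\\s*\\d+[gG]$", "", a$words, perl = TRUE)

print(a$words)  # the G in "Gha" survives because it is not preceded by a digit
```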

If these strings can also occur in the middle of a string, not just at the end, you may use

a$words <- str_replace_all(a$words,"\\s*\\d+[gG]\\b", "")

where $ is replaced with \b, a word boundary, so the match no longer has to sit at the end of the string.
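To see the difference between $ and \b, here is a small base R comparison on hypothetical inputs (not from the question): one weight followed by more text, and one number followed by g inside a longer token. With \b the first case is cleaned up, while "5gx" is left alone because there is no word boundary between g and x.

```r
# Hypothetical inputs for illustration only
x <- c("XYZ 185g pack", "ABC 60G", "Model 5gx")

# $ only matches a weight at the very end of the string
with_end <- gsub("\\s*\\d+[gG]$", "", x, perl = TRUE)

# \b also matches a weight followed by a space or punctuation,
# but not one glued to further word characters such as "5gx"
with_boundary <- gsub("\\s*\\d+[gG]\\b", "", x, perl = TRUE)

print(with_end)       # "XYZ 185g pack" "ABC" "Model 5gx"
print(with_boundary)  # "XYZ pack" "ABC" "Model 5gx"
```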

To ignore case, so that a lowercase g in the pattern also matches G, use

a$words <- str_replace_all(a$words, regex("\\s*\\d+g\\b", ignore_case=TRUE), "")
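In base R, the same case-insensitive match can be written in two ways, sketched below: the ignore.case argument of gsub(), or PCRE's inline (?i) modifier together with perl = TRUE. On the sample values both should give the same result as the stringr call above.

```r
x <- c("XYZ 185g", "ABC 60G", "Gha 20g")

# Option 1: the ignore.case argument of gsub()
res_arg <- gsub("\\s*\\d+g\\b", "", x, ignore.case = TRUE, perl = TRUE)

# Option 2: the inline (?i) modifier inside the pattern (PCRE syntax)
res_inline <- gsub("(?i)\\s*\\d+g\\b", "", x, perl = TRUE)

print(res_arg)
print(res_inline)
```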
Wiktor Stribiżew answered Oct 19 '22 04:10


You can also do this in base R with gsub() and ignore.case = TRUE:

> gsub("\\s\\d+g$", "", c("XYZ 185g", "ABC 60G", "Gha 20g"), ignore.case = TRUE)
[1] "XYZ" "ABC" "Gha"
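One difference from the first answer: \s here requires exactly one whitespace character before the digits, whereas \s* also accepts none or several. The sketch below compares the two on hypothetical inputs (not from the question); with a single \s, a weight glued to the word is not removed, and a double space leaves one trailing space behind.

```r
# Hypothetical inputs: normal spacing, no space, and two spaces
x <- c("XYZ 185g", "XYZ185g", "XYZ  185g")

# Exactly one whitespace character required before the number
one_space <- gsub("\\s\\d+g$", "", x, ignore.case = TRUE)

# Zero or more whitespace characters allowed before the number
any_space <- gsub("\\s*\\d+g$", "", x, ignore.case = TRUE)

print(one_space)  # "XYZ" "XYZ185g" "XYZ "
print(any_space)  # "XYZ" "XYZ" "XYZ"
```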
ThomasIsCoding answered Oct 19 '22 03:10