
Regex to remove leading zeros in R, unless the final (or only) character is zero

gsub("(?<![0-9])0+", "", c("005", "0AB", "000", "0"), perl = TRUE)
#> [1] "5"  "AB" ""   ""
gsub("(^|[^0-9])0+", "\\1", c("005", "0AB", "000", "0"), perl = TRUE)
#> [1] "5"  "AB" ""   ""

The regular expressions above are from this SO thread explaining how to remove all leading zeros from a string in R. As a consequence, both "000" and "0" are transformed into "". Instead, I want to remove all leading zeros from a string, except when the final character happens to be zero or the only character is zero.

"005" would become "5"
"0AB" would become "AB"
"000" would become "0"
"0"   would become "0"

This other SO thread explains how to do what I want, but I don't think I'm getting the syntax quite right when applying the solution in R. I also don't really understand the distinction between the 1st and 2nd solutions below (assuming they worked at all).

gsub("s/^0*(\d+)$/$1/;", "", c("005", "0AB", "000", "0"), perl = TRUE)  # 1st solution
# Error: '\d' is an unrecognized escape in character string starting ""s/^0*(\d"
gsub("s/0*(\d+)/$1/;", "", c("005", "0AB", "000", "0"), perl = TRUE)    # 2nd solution
# Error: '\d' is an unrecognized escape in character string starting ""s/0*(\d"

What is the proper regex in R to get what I want?

Display name asked Dec 20 '19 16:12




2 Answers

You may remove all zeros from the start of a string but not the last one:

sub("^0+(?!$)", "", x, perl=TRUE)

See the regex demo.

Details

  • ^ - start of a string
  • 0+ - one or more zeros
  • (?!$) - a negative lookahead that fails the match if there is an end of string position immediately to the right of the current location

See the R demo:

x <- c("005", "0AB", "000", "0")
sub("^0+(?!$)", "", x, perl=TRUE)
## => [1] "5"  "AB" "0"  "0"
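
As a side note on the two Perl one-liners from the question: s/pattern/replacement/ is Perl's substitution operator, not part of the regex itself, and inside an R string each backslash has to be doubled. A rough translation, assuming sub() with "\\1" standing in for $1, might look like the sketch below. The input "A005" is an extra, made-up value added only to show where the anchored and unanchored versions differ. Note that neither version strips the zero from "0AB", since both require digits after the zeros.

```r
x <- c("005", "0AB", "000", "0", "A005")  # "A005" is a hypothetical extra input

# 1st solution: anchored at both ends, so only all-digit strings are changed
first <- sub("^0*(\\d+)$", "\\1", x, perl = TRUE)

# 2nd solution: unanchored, so the first run of digits anywhere is rewritten
second <- sub("0*(\\d+)", "\\1", x, perl = TRUE)

print(first)
## => [1] "5"    "0AB"  "0"    "0"    "A005"
print(second)
## => [1] "5"   "0AB" "0"   "0"   "A5"
```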
Wiktor Stribiżew answered Oct 21 '22 18:10


Another option is to use a non-word boundary, \B. See this demo at regex101 or an R demo at tio.run.

sub("^0+\\B", "", s)

This will not match the last zero, because there is no word character to the right of it.
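
For the four inputs in the question, the \B pattern and a negative lookahead such as (?!$) give the same result, but they are not equivalent in general. The sketch below compares them on two extra, made-up inputs ("0.5" and "00.5"), where the zeros are followed by a non-word character. It passes perl = TRUE to both calls so that they run on the same regex engine, which is an assumption of this example rather than something \B requires.

```r
x <- c("005", "0AB", "000", "0", "0.5", "00.5")  # last two are hypothetical extras

# Lookahead: strip leading zeros as long as something follows them
lookahead <- sub("^0+(?!$)", "", x, perl = TRUE)

# Non-word boundary: the stripped zeros must be followed by a word character
boundary <- sub("^0+\\B", "", x, perl = TRUE)

print(lookahead)
## => [1] "5"  "AB" "0"  "0"  ".5" ".5"
print(boundary)
## => [1] "5"   "AB"  "0"   "0"   "0.5" "0.5"
```

Which behaviour is preferable depends on the data: \B keeps one zero in front of a decimal point or other punctuation, while the lookahead removes every leading zero unless doing so would empty the string.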

bobble bubble answered Oct 21 '22 16:10