
Removing leading zeros from alphanumeric characters in R

I have a character vector d containing alphanumeric strings:

d <- c("012309 template", "separate 00340", "00045", "890 098", "3405 garage", "matter00908")

d
[1] "012309 template" "separate 00340"  "00045"           "890 098"         "3405 garage"     "matter00908"  

How can I remove the leading zeros from all the numbers in R? as.numeric only drops leading zeros when the whole string is a number, so it does not help with mixed strings like these. I have tried gsub with a regex but could not get the desired result.

The expected output is as follows:

out <- c("12309 template", "separate 340", "45", "890 98", "3405 garage", "matter908")
out
[1] "12309 template" "separate 340"   "45"             "890 98"         "3405 garage"    "matter908"
Crops asked May 08 '14 10:05


2 Answers

You could use a negative lookbehind to remove every run of zeros that is not preceded by a digit:

> d <- c("100001", "012309 template", "separate 00340", "00045", "890 098", "3405 garage", "matter00908")
> gsub("(?<![0-9])0+", "", d, perl = TRUE)
[1] "100001"         "12309 template" "separate 340"   "45"            
[5] "890 98"         "3405 garage"    "matter908"     
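
One edge case is worth checking: because the pattern deletes every run of zeros that is not preceded by a digit, a number made up only of zeros disappears entirely. Here is a minimal sketch of a variant, assuming you would rather have such numbers collapse to a single 0. It adds a lookahead so that zeros are only removed when another digit follows. The vector x and its values are made up for illustration:

```r
# Hypothetical input containing numbers that consist only of zeros
x <- c("0", "000", "10 0 007", "matter00908")

# Original pattern: a run of zeros with no digit before it is removed,
# even when nothing else is left of the number
strip_all <- gsub("(?<![0-9])0+", "", x, perl = TRUE)

# Variant: zeros are dropped only when another digit follows,
# so an all-zero number keeps its last 0
keep_one <- gsub("(?<![0-9])0+(?=[0-9])", "", x, perl = TRUE)

print(strip_all)  # ""  ""  "10  7"  "matter908"
print(keep_one)   # "0" "0" "10 0 7" "matter908"
```

Which behaviour is right depends on whether a bare 0 is a meaningful value in your data.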

Another way, using a capture group instead of a lookbehind:

> gsub("(^|[^0-9])0+", "\\1", d, perl = TRUE)
[1] "100001"         "12309 template" "separate 340"   "45"            
[5] "890 98"         "3405 garage"    "matter908"     
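
The "\\1" in the replacement matters here: unlike a lookbehind, the group (^|[^0-9]) consumes the character in front of the zeros, so it has to be put back. A small sketch of the difference, using one of the strings from the question (the variable names are only for illustration):

```r
x <- "separate 00340"

# Empty replacement: the space captured before the zeros is deleted too
without_ref <- gsub("(^|[^0-9])0+", "", x, perl = TRUE)

# Backreference \\1: the captured character (here a space) is restored
with_ref <- gsub("(^|[^0-9])0+", "\\1", x, perl = TRUE)

print(without_ref)  # "separate340"
print(with_ref)     # "separate 340"
```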
devnull answered Oct 21 '22 23:10


Here's a solution using stri_replace_all_regex from the stringi package:

d <- c("012309 template", "separate 00340", "00045",
       "890 098", "3405 garage", "matter00908")
library("stringi")
stri_replace_all_regex(d, "\\b0*(\\d+)\\b", "$1")
## [1] "12309 template" "separate 340"   "45"             "890 98"
## [5] "3405 garage"    "matter00908"   

Explanation: We match every sequence of digits that sits between word boundaries (\b). Leading zeros are matched greedily (0*). The remaining digits (\d denotes any digit, \d+ a non-empty sequence of them) are captured in a group ((...)). Each match is then replaced with the captured group only.
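
If stringi is not available, the same pattern can be tried with base R's gsub. This is only a sketch: it swaps in a different regex engine (PCRE via perl = TRUE rather than the ICU engine that stringi uses), and the replacement is written "\\1" instead of "$1". For this input it should give the same result as the stringi call:

```r
d <- c("012309 template", "separate 00340", "00045",
       "890 098", "3405 garage", "matter00908")

# \\b anchors the match at word boundaries, 0* consumes the leading zeros,
# and (\\d+) captures the digits that are kept via the backreference \\1
res <- gsub("\\b0*(\\d+)\\b", "\\1", d, perl = TRUE)
print(res)
```

As with the stringi version, "matter00908" is left untouched because there is no word boundary between the letters and the digits.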

If you also wish to strip leading 0s from numbers embedded in words (as in your example), just omit \b and call:

stri_replace_all_regex(d, "0*(\\d+)", "$1")
## [1] "12309 template" "separate 340"   "45"             "890 98"
## [5] "3405 garage"    "matter908"  
gagolews answered Oct 21 '22 23:10