
R: Add leading zero to string, with no fixed string format

Tags:

r

leading-zero

I have a column as below.

9453, 55489, 4588, 18893, 4457, 2339, 45489HQ, 7833HQ

I would like to add a leading zero if the number has fewer than 5 digits. However, some numbers have "HQ" at the end and some don't. (I did check other posts, but they don't deal with anything like the "HQ" part.)

So the desired output should be:

09453, 55489, 04588, 18893, 04457, 02339, 45489HQ, 07833HQ

Any idea how to do this? Thank you so much for reading my post!

C_Mu asked Jan 24 '18 21:01


2 Answers

A one-liner using regular expressions:

my_strings <- c("9453", "55489", "4588", 
      "18893", "4457", "2339", "45489HQ", "7833HQ")

gsub("^([0-9]{1,4})(HQ|$)", "0\\1\\2", my_strings)

[1] "09453"   "55489"   "04588"   "18893"   "04457"   "02339"   "45489HQ" "07833HQ"

Explanation:

^ start of string
[0-9]{1,4} one to four digits in a row
(HQ|$) the string "HQ" or the end of the string

Parentheses represent capture groups in order. So 0\\1\\2 means 0 followed by the first capture group [0-9]{1,4} and the second capture group HQ|$.

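Of course, if there are already five digits, the regex doesn't match, so the string is left unchanged. Note that the replacement adds exactly one zero, which fits the four-digit values in the question; a shorter number such as "123" would only become "0123".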
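To see the capture groups at work, here is a minimal sketch in base R using regexec() and regmatches(), which return the full match followed by each group. The names pattern and groups are only for illustration and are not part of the answer above.

```r
my_strings <- c("9453", "55489", "7833HQ")
pattern <- "^([0-9]{1,4})(HQ|$)"

# Each list element holds: full match, group 1 (digits), group 2 ("HQ" or "")
groups <- regmatches(my_strings, regexec(pattern, my_strings))

groups[[1]]  # "9453" "9453" ""       four digits, then end of string
groups[[2]]  # character(0)           five digits: no match, string left as is
groups[[3]]  # "7833HQ" "7833" "HQ"   four digits, then "HQ"
```

An empty second group means the match ended at the end of the string, which is why \\2 contributes nothing for the plain numbers.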

thc answered Oct 13 '22 02:10


I was going to use the sprintf approach, but found that the stringr package provides a very easy solution.

library(stringr)
x <- c("9453", "55489", "4588", "18893", "4457", "2339", "45489HQ", "7833HQ")
x
[1] "9453"    "55489"   "4588"    "18893"   "4457"    "2339"    "45489HQ" "7833HQ"

This can be padded with a single call to stringr::str_pad():

stringr::str_pad(x, 5, side="left", pad="0")
[1] "09453"   "55489"   "04588"   "18893"   "04457"   "02339"   "45489HQ" "7833HQ" 
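Note that the last element, "7833HQ", comes back unchanged. A quick check in base R suggests why: str_pad() pads to a total string width, and the letters count toward that width. This is only a small sketch; the variable name widths is made up for the example.

```r
x <- c("9453", "55489", "4588", "18893", "4457", "2339", "45489HQ", "7833HQ")

# Total width of each string, letters included
widths <- nchar(x)
widths
# 4 5 4 5 4 4 7 6

# Only strings shorter than 5 characters are padded when padding by total width
x[widths < 5]
# "9453" "4588" "4457" "2339"
```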

If the number must be padded even when the total string width is greater than 5, the number and the text need to be separated with a regex first. The following works by combining regex matching with the very helpful sprintf() function:

sprintf("%05.0f%s",                                    # zero-pad the number to width 5 (%05.0f), then append the text (%s)
        as.numeric(gsub("^(\\d+).*", "\\1", x)),       # extract the leading number
        gsub("[[:digit:]]+([a-zA-Z]*)$", "\\1", x))    # extract just the text at the end
[1] "09453"   "55489"   "04588"   "18893"   "04457"   "02339"   "45489HQ" "07833HQ"
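The split, pad, and recombine steps can also be wrapped in a small reusable function. The following is a minimal sketch in base R, not the method from the answer: the name pad_digits and its width argument are hypothetical, and it pads with strrep() instead of sprintf(), so the digits stay text and are never converted to numbers.

```r
# Hypothetical helper (not from the answer): pad the leading run of digits
# to `width` characters and keep whatever follows it unchanged.
pad_digits <- function(x, width = 5) {
  digits <- sub("^([0-9]*).*$", "\\1", x)  # leading digits, "" if there are none
  suffix <- sub("^[0-9]*", "", x)          # everything after the leading digits
  # Zeros to add: never negative, and none when the string has no leading digits
  n_zeros <- ifelse(nchar(digits) > 0, pmax(width - nchar(digits), 0), 0)
  paste0(strrep("0", n_zeros), digits, suffix)
}

x <- c("9453", "55489", "4588", "18893", "4457", "2339", "45489HQ", "7833HQ")
pad_digits(x)
# "09453" "55489" "04588" "18893" "04457" "02339" "45489HQ" "07833HQ"

pad_digits(c("12", "7HQ"))
# "00012" "00007HQ"
```

Because the padding depends only on the number of leading digits, values with more than one missing digit are handled as well, and any suffix (not just "HQ") is carried through.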
Matt L. answered Oct 13 '22 02:10