I have phone numbers in the following formats, where 33 is the area code:
+331234567
+3301234567
00331234567
003301234567
0331234567
033-123-456-7
0033.1234567
where I'm expecting only 331234567 in every case.
What I have tried so far to clean those numbers in R:
stringr::str_replace_all(c("+331234567", "033-123-456-7", "0033.1234567"), pattern = "[^0-9]", replacement = "") # removes non-numeric characters
stringr::str_replace_all("0331234567", pattern = "^0", replacement = "") # removes the leading 0
stringr::str_replace_all("00331234567", pattern = "^00", replacement = "") # removes the leading 00
My question is how to remove the zeros in between, as in 3301234567, 003301234567, +3301234567 or 03301234567.
I'd appreciate any help.
You can use
gsub("^(?:00?|\\+)330?|\\W", "", x, perl=TRUE)
If there can be more 0s after 33 before the number you need to extract, replace 0? with 0*.
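A minimal sketch of running this pattern over the sample numbers from the question, in base R. As written, the pattern also strips the 33 prefix, so every input reduces to 1234567. The paste0 step that puts 33 back is an assumption about the desired output (the question asks for 331234567), and the names local_part and cleaned are only illustrative.

```r
# Sample inputs taken from the question
x <- c("+331234567", "+3301234567", "00331234567", "003301234567",
       "0331234567", "033-123-456-7", "0033.1234567")

# Strip the leading 0, 00 or + together with 33 and one optional 0,
# then remove any remaining non-word characters (hyphens, dots, spaces)
local_part <- gsub("^(?:00?|\\+)330?|\\W", "", x, perl = TRUE)

# Assumption: the desired form keeps the 33 prefix, so add it back
cleaned <- paste0("33", local_part)

print(cleaned)
```

Note that the first alternative requires a leading 0, 00 or +, so a bare 3301234567 is left unchanged by this pattern.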
Details
^ - start of string
(?:00?|\+) - 00, 0 or +
330? - 33 or 330
| - or
\W - any non-word char.

You can use ^\+?0*3*0*|[^\s\d]
Pattern explanation:
^ - match beginning of the string
\+? - match + literally, zero or one time.
0* - match zero or more 0
3* - match zero or more 3
| - alternation
[^\s\d] - negated character class - match any character other than whitespace or a digit (you can remove \s if you handle one number at a time; it only prevents newlines from being matched when testing several numbers at once)
It matches the unwanted parts separately. The first alternative cleans the beginning of the number (an optional +, leading zeros, the 3s, and any zeros after them); the second removes non-digit characters inside the number.
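A minimal sketch of this second pattern in base R, using the sample numbers from the question plus the bare 3301234567 case. The backslashes are doubled because they sit inside an R string, and perl = TRUE is used so that \s and \d are accepted inside the character class. Like the first pattern, this one removes the 33 as well, so re-adding it with paste0 is an assumption about the desired output. The variable names are illustrative.

```r
# Sample inputs from the question, plus a number with no leading + or 0
x <- c("+331234567", "+3301234567", "00331234567", "003301234567",
       "0331234567", "033-123-456-7", "0033.1234567", "3301234567")

# First alternative: optional +, leading zeros, the 3s, trailing zeros.
# Second alternative: any character that is neither whitespace nor a digit.
local_part <- gsub("^\\+?0*3*0*|[^\\s\\d]", "", x, perl = TRUE)

# Assumption: the desired form keeps the 33 prefix, so add it back
cleaned <- paste0("33", local_part)

print(cleaned)
```

One caveat with this pattern: 3* and 0* are greedy, so a number whose own first digits are 3 or 0 loses them too (for example, +33345678 reduces to 45678). If that can occur in the data, a fixed 33 in place of 3* is the safer choice.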