
In R, remove all dots from string apart from the last

Tags:

regex

r

I have a list of strings like this:

mystr <- c("16.142.8",
           "52.135.1",
           "40.114.4",
           "83.068.8",
           "83.456.3",
           "55.181.5",
           "76.870.2",
           "96.910.2",
           "17.171.9",
           "49.617.4",
           "38.176.1",
           "50.717.7",
           "19.919.6")

I know that the first dot . is just a thousands separator, while the second one is the decimal separator.

I want to convert the strings to numbers, so the first one should become 16142.8, the second 52135.1, and so on.

I suspect that it might be done with regular expressions, but I'm not sure how. Any ideas?

ulima2_ asked Dec 07 '22 17:12


2 Answers

You need a lookahead-based PCRE regex with gsub:

gsub("\\.(?=[^.]*\\.)", "", mystr, perl=TRUE)

See an online R demo

Details

  • \\. - a literal dot
  • (?=[^.]*\\.) - a positive lookahead requiring that the dot be followed by zero or more characters other than . (matched with [^.]*) and then another literal dot. A lookahead (?=...) only checks that its pattern appears immediately to the right of the current position; the text it checks is not added to the match, and the regex position does not advance.
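
Since the question asks for numbers rather than strings, the result of gsub still has to be converted. A minimal sketch of the full round trip, assuming a shortened copy of the question's vector (the names cleaned and nums are only illustrative):

```r
# A few values taken from the question's vector.
mystr <- c("16.142.8", "52.135.1", "83.068.8")

# Remove every dot that still has another dot somewhere to its right,
# so only the last dot (the decimal separator) survives.
cleaned <- gsub("\\.(?=[^.]*\\.)", "", mystr, perl = TRUE)
# cleaned is now c("16142.8", "52135.1", "83068.8")

# Convert the cleaned strings to numeric values.
nums <- as.numeric(cleaned)
print(nums)
```

perl = TRUE is required here because lookaheads are not supported by R's default regex engine.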
Wiktor Stribiżew answered Feb 09 '23 01:02


A simple "sub" can achieve the same result here, because it replaces only the first match and each string in the question contains exactly one thousands separator. For example:

sub("\\.", "", mystr)
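
The difference between the two approaches only shows up when a value has more than one thousands separator. A small sketch comparing them, where the second input value is a made-up example that does not appear in the question:

```r
# First value is from the question; the second is a hypothetical
# larger number with two thousands separators.
x <- c("16.142.8", "1.234.567.8")

# sub() removes only the first dot in each string.
first_only <- sub("\\.", "", x)
# first_only is c("16142.8", "1234.567.8")

# The lookahead-based gsub() removes every dot except the last.
all_but_last <- gsub("\\.(?=[^.]*\\.)", "", x, perl = TRUE)
# all_but_last is c("16142.8", "1234567.8")

print(first_only)
print(all_but_last)
```

For data shaped like the question's vector both give the same result, so the simpler sub call is enough there.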
Sagar answered Feb 09 '23 01:02