
Match everything but numbers regular expression

Tags:

regex

r

I want a regular expression that matches anything that is not a correctly formed number. The list below is sample input for the regex:

1

1.7654

-2.5

2-

2.

m

2..3

2....233..6

2.2.8

2--5

6-4-9

So the first three (1, 1.7654 and -2.5) should not be selected and the rest should. This is closely related to another post, but because of its negative nature it is different.

I'm using R, but any regular expression will do, I guess. The following is the best attempt from the mentioned post:

a <- c("1", "1.7654", "-2.5", "2-", "2.", "m", "2..3", "2....233..6", "2.2.8", "2--5", "6-4-9")
grep(pattern="(-?0[.]\\d+)|(-?[1-9]+\\d*([.]\\d+)?)|0$", x=a)

which outputs:

[1]  1  2  3  4  5  7  8  9 10 11
asked Jul 13 '15 15:07 by Mehrad Mahmoudian


2 Answers

You can use the following regex:

^(?:((\d+(?=[^.]+|\.{2,})).)+|(\d\.){2,}).*|[^\d]+$

See demo https://regex101.com/r/tZ3uH0/6

Note that your regex engine must support look-ahead. If the strings are tested as lines of a single input, as in the demo, you also need the multi-line flag. As mentioned in the comments, you can pass perl=TRUE to enable look-ahead in R.
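As a minimal sketch of the perl=TRUE point (assuming the sample vector `a` from the question; the names `pat` and `hits` are introduced here for illustration), the pattern can be run in R as follows. Backslashes are doubled because the pattern is written as an R string literal.

```r
a <- c("1", "1.7654", "-2.5", "2-", "2.", "m", "2..3",
       "2....233..6", "2.2.8", "2--5", "6-4-9")

# Same regex as above, escaped for an R string literal
pat <- "^(?:((\\d+(?=[^.]+|\\.{2,})).)+|(\\d\\.){2,}).*|[^\\d]+$"

# perl = TRUE switches grep() to PCRE, which understands (?=...) look-ahead
hits <- grep(pat, a, perl = TRUE)
hits
# [1]  4  5  6  7  8  9 10 11
a[hits]
```

Each element of a character vector is matched on its own here, so no multi-line flag is needed in this form. The pattern was written against the sample list, so inputs outside it are worth checking: for example, `grepl(pat, "12", perl = TRUE)` also returns TRUE, because the look-ahead `[^.]+` accepts a following digit.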

This regex consists of two parts joined by an alternation (|). The first part is:

(?:((\d+(?=[^.]+|\.{2,})).)+|(\d\.){2,}).*

It matches one or more digits that are followed either by one or more characters other than a dot or by two or more dots (this is the look-ahead), and then consumes one more character. This whole group can repeat. As an alternative to this group, it matches a digit followed by a dot, two or more times (for strings such as 2.3.4.). The trailing .* then consumes the rest of the string.

The second part is [^\d]+$, which matches a run of non-digit characters at the end of the string.
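To see which of the two parts is responsible for each match, the alternation can be split and each half tested separately. This is only a sketch on the question's sample vector; the names `part1`, `part2`, `m1` and `m2` are introduced here for illustration.

```r
a <- c("1", "1.7654", "-2.5", "2-", "2.", "m", "2..3",
       "2....233..6", "2.2.8", "2--5", "6-4-9")

# The two halves of the alternation, escaped for R string literals
part1 <- "^(?:((\\d+(?=[^.]+|\\.{2,})).)+|(\\d\\.){2,}).*"
part2 <- "[^\\d]+$"

# grepl() returns one TRUE/FALSE per element; perl = TRUE enables look-ahead
m1 <- grepl(part1, a, perl = TRUE)
m2 <- grepl(part2, a, perl = TRUE)

# One row per input, showing which part matched
data.frame(input = a, part1 = m1, part2 = m2, either = m1 | m2)
```

On this sample, the first part catches the malformed strings that start with a digit ("2-", "2..3", "2....233..6", "2.2.8", "2--5", "6-4-9"), and the second part catches the strings that end in a non-digit ("2-", "2.", "m").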


answered Oct 04 '22 00:10 by Mazdak


a[grep("^-?\\d*(\\.?\\d*)$", a, invert=T)]

With a suggested edit from @Frank.
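A short sketch of what this pattern accepts, using `grepl` so there is one logical per element. The name `loose` is just a label for the pattern above. The `strict` pattern is not part of this answer; it is a hypothetical variant, shown as an assumption, that requires digits on both sides of the dot so that "2." is also flagged, as the question asks.

```r
a <- c("1", "1.7654", "-2.5", "2-", "2.", "m", "2..3",
       "2....233..6", "2.2.8", "2--5", "6-4-9")

# Pattern from the answer: optional minus, digits, optional dot, digits
loose <- "^-?\\d*(\\.?\\d*)$"
loose_out <- a[!grepl(loose, a)]
loose_out   # "2." is treated as a number here, so it is not returned

# Every piece of the loose pattern is optional, so these also count as numbers
grepl(loose, c("", "-", "."))

# Hypothetical stricter variant (an assumption, not from the answer):
# at least one digit, and a dot only when digits follow it
strict <- "^-?\\d+(\\.\\d+)?$"
strict_out <- a[!grepl(strict, a)]
strict_out  # now includes "2." as well
```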

Speed Test

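# Repeat the sample 10,000 times to get a measurable workload
a <- rep(a, 1e4)
# Check that the as.numeric() approach and a regex select the same elements
all.equal(a[is.na(as.numeric(a))], a[grep("^-?\\d+(\\.?\\d+)?$|^\\d+\\.$", a, invert=TRUE)])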
[1] TRUE

library(microbenchmark)
microbenchmark(dosc = a[is.na(as.numeric(a))],
           plafort = a[grep("^-?\\d*(\\.?\\d*)$", a, invert=T)])
# Unit: milliseconds
#     expr      min       lq     mean   median       uq      max neval
#     dosc 27.83477 28.32346 28.69970 28.51254 28.76202 31.24695   100
#  plafort 31.92118 32.14915 32.62036 32.33349 32.71107 35.12258   100
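The `dosc` expression in the benchmark is the non-regex approach: `as.numeric` returns NA (with a warning) for strings it cannot parse. A small sketch, assuming the original 11-element `a` rather than the repeated one. `suppressWarnings` is added here only to silence the coercion warning, and `not_num` and `extra` are illustrative names.

```r
a <- c("1", "1.7654", "-2.5", "2-", "2.", "m", "2..3",
       "2....233..6", "2.2.8", "2--5", "6-4-9")

# TRUE where the string cannot be converted to a number
not_num <- is.na(suppressWarnings(as.numeric(a)))
a[not_num]  # "2." converts to 2, so it is not in this result

# as.numeric() accepts more than plain decimals (scientific notation,
# hexadecimal, Inf), so it is more permissive than the regexes above
extra <- suppressWarnings(as.numeric(c("1e5", "0x1A", "Inf")))
extra
```

Whether that extra permissiveness is acceptable depends on what counts as a "correct number" for the data at hand.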
answered Oct 04 '22 01:10 by Pierre L