
regexp - find numbers in a string in any order

Tags:

regex

r

I need a regexp that finds strings containing all the required digits, each exactly once.

For example:

a <- c("12","13","112","123","113","1123","23","212","223","213","2123","312","323","313","3123","1223","1213","12123","2313","23123","13123")

I want to get:

"123" "213" "312"

Each of the digits 1, 2 and 3 should appear only once, in any order and at any position in the string.

I tried a lot of things, and this seemed to be the closest, although it is still very far from what I want:

grep('[1:3][1:3][1:3]', a, value=TRUE)
[1] "113"   "313"   "2313"  "13123"
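A likely reason for this output: inside a bracket expression, 1:3 is not a range. It lists the three literal characters 1, : and 3, so 2 is never matched. A short sketch on a made-up vector x (not from the question) shows the difference from a hyphenated range:

```r
# Inside [...], "1:3" lists the literal characters "1", ":" and "3";
# a character range is written with a hyphen, as in [1-3].
x <- c("113", "123", "1:3")  # hypothetical test strings
grepl("^[1:3]+$", x)  # TRUE FALSE TRUE
grepl("^[1-3]+$", x)  # TRUE TRUE FALSE
```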
Théo Gaboriau asked Dec 25 '22 13:12


2 Answers

"What I exactly need is to find all 3-digit numbers containing the digits 1, 2 AND 3"

Then you can first restrict the match to strings of exactly three characters made only of 1, 2 and 3:

grep('^[123]{3}$', a, value=TRUE)
##=> [1] "112" "123" "113" "212" "223" "213" "312" "323" "313"

The regex matches:

  • ^ - start of string
  • [123]{3} - exactly 3 characters, each of which is 1, 2 or 3
  • $ - end of string
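The anchors are what limit the match to the whole string. A minimal sketch on a hypothetical vector x compares the pattern with and without ^ and $:

```r
# Hypothetical strings: valid 3-digit, too long, too short.
x <- c("123", "1123", "12")
# Unanchored: any run of three such digits anywhere is enough.
grepl("[123]{3}", x)    # TRUE TRUE FALSE
# Anchored: the whole string must be exactly three such digits.
grepl("^[123]{3}$", x)  # TRUE FALSE FALSE
```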

Also, if you only need unique values, wrap the result in unique().
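For example, with a made-up vector x that contains a duplicate, unique() drops the repeated match while keeping the order of first appearance:

```r
# Hypothetical input with "123" listed twice.
x <- c("123", "213", "123", "1123")
res <- unique(grep("^[123]{3}$", x, value = TRUE))
res  # "123" "213"
```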

If the same digit must not appear more than once, you need a Perl-compatible (PCRE) regex, enabled with perl=TRUE:

grep('^(?!.*(.).*\\1)[123]{3}$', a, value=TRUE, perl=TRUE)
## => [1] "123" "213" "312"

Note the doubled backslash in the back-reference: the R string literal "\\1" passes \1 to the regex engine. The (?!.*(.).*\\1) negative look-ahead checks that the string has no repeated characters, using a capturing group (.) and a back-reference that requires the same captured text to appear again later in the string. If any character repeats, there is no match. See IDEONE demo.
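A small sketch of both points: the escaped back-reference is two characters once R has parsed the literal, and the look-ahead rejects any string with a repeated digit. The vector x is a made-up example.

```r
# "\\1" in R source is a backslash followed by "1": the regex back-reference \1.
nchar("\\1")  # 2

pat <- "^(?!.*(.).*\\1)[123]{3}$"
x <- c("123", "121", "312", "113")  # hypothetical test strings
# Look-ahead syntax needs the PCRE engine, hence perl = TRUE.
grepl(pat, x, perl = TRUE)  # TRUE FALSE TRUE FALSE
```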

The (?!.*(.).*\\1) is a negative look-ahead. It only asserts the absence of a pattern after the current regex engine position: it returns true if the pattern inside does not match, and false otherwise. It does not "consume" characters, so the regex engine stays at the same location in the input string, which in this regex is the beginning of the string (^). From there, the engine matches .* (any character except a newline, 0 or more times), captures 1 character (.) into group 1, matches 0 or more characters again with .*, and then tries to match the text of group 1 again with \\1. Thus, for 121 there is no match: the look-ahead finds two 1s and returns false.
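To see what the look-ahead's inner pattern actually finds, this sketch runs (.).*\\1 on its own (without the (?!...) wrapper) via regexec() and regmatches(), and extracts group 1, the repeated character. It assumes an R version whose regexec() accepts perl = TRUE (R 3.3.0 and later); the vector x is made up for illustration.

```r
x <- c("121", "123", "2313")  # hypothetical test strings
# Each list element holds the full match and group 1; character(0) means no match.
m <- regmatches(x, regexec("(.).*\\1", x, perl = TRUE))
# Keep group 1 (the repeated character), or NA when nothing repeats.
repeated <- vapply(m, function(v) if (length(v) > 0) v[2] else NA_character_,
                   character(1))
repeated  # "1" NA "3"
```

For "2313" the first start position fails (2 does not repeat), so the engine moves on and captures 3. Inside the look-ahead, any such capture makes the whole regex fail.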

Wiktor Stribiżew answered Jan 12 '23 10:01


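You can also use back-references. Group 1 captures the first digit, the look-ahead (?!\\1) makes the second digit differ from it, and (?!\\2|\\1) makes the third digit differ from both. Restricting every position to [123] and anchoring the end with $ keeps longer strings such as "3123" and "2313" out of the result:

grep('^([123])((?!\\1)[123])(?!\\2|\\1)[123]$', a, value=TRUE, perl=TRUE)
## => [1] "123" "213" "312"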

see demo
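A quick check of why the end anchor and the [123] class matter for this back-reference approach: the sketch compares a loose variant (\d at positions 2 and 3, no trailing $) with a strict one ([123] at every position, anchored with $), run on the vector a from the question.

```r
a <- c("12","13","112","123","113","1123","23","212","223","213","2123",
       "312","323","313","3123","1223","1213","12123","2313","23123","13123")
# Loose: \d accepts any digit, and without $ trailing characters are allowed.
loose <- grep('^([123])((?!\\1)\\d)(?!\\2|\\1)\\d', a, value = TRUE, perl = TRUE)
# Strict: exactly three distinct digits, each from 1-3.
strict <- grep('^([123])((?!\\1)[123])(?!\\2|\\1)[123]$', a, value = TRUE, perl = TRUE)
loose   # "123" "213" "312" "3123" "2313" "23123"
strict  # "123" "213" "312"
```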

james jelo4kul answered Jan 12 '23 11:01