
Regex - find anagrams and sub-anagrams

Tags: regex, anagram

I have a pool of characters and I want to match all the words which are anagrams of those chars or of a subset of those chars using a regular expression.

Example: given the string "ACNE" the regex should give me these results:

  • ACNE [T]
  • CENA [T]
  • CAN [T]
  • CAAN [F]
  • CANEN [F]

I've tried \b[acne]{1,4}\b, but it accepts words that repeat a character. How can I make the regex use each character at most once?

HghlnDR asked Jan 28 '13 12:01


1 Answer

The sub-anagrams of the word "acne" are the words that

  • consist only of the letters acne
  • do not contain a more than once
  • do not contain c more than once
  • do not contain n more than once
  • do not contain e more than once

Compiling this into a regex:

^(?!.*a.*a)(?!.*c.*c)(?!.*n.*n)(?!.*e.*e)[acne]*$

Test: regexpal
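As a quick check, the pattern above can be run against the five examples from the question. This is a minimal Python sketch, not part of the original answer: the helper name `is_sub_anagram_of_acne` is hypothetical, and `re.IGNORECASE` is an added assumption so that the upper-case examples match the lower-case pattern.

```python
import re

# Pattern copied from the answer; each negative lookahead rejects a word
# that contains the given letter twice. re.IGNORECASE is an assumption
# added here so that "ACNE" and "acne" are treated alike.
SUB_ANAGRAM_ACNE = re.compile(
    r"^(?!.*a.*a)(?!.*c.*c)(?!.*n.*n)(?!.*e.*e)[acne]*$",
    re.IGNORECASE,
)

def is_sub_anagram_of_acne(word):
    """Return True if word uses only a, c, n, e, each at most once."""
    return SUB_ANAGRAM_ACNE.match(word) is not None

# The examples from the question, with their expected [T]/[F] verdicts.
results = {w: is_sub_anagram_of_acne(w)
           for w in ["ACNE", "CENA", "CAN", "CAAN", "CANEN"]}
print(results)
```

Note that `[acne]*` also accepts the empty string; replacing `*` with `+` would reject it, if that is the desired behaviour.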

Alternatively, since "acne" does not contain any letter more than once, the sub-anagrams of the word "acne" are the words that

  • consist only of the letters acne
  • do not contain any letter more than once.

Compiling this into a regex:

^(?!.*(.).*\1)[acne]*$

Test: regexpal
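The backreference version can be checked the same way. The sketch below is illustrative only: the helper name `is_sub_anagram` is hypothetical, and lower-casing the input is an assumption made so the question's upper-case examples can be tested.

```python
import re

# (.) captures any character and \1 requires the same character to appear
# again later; the negative lookahead therefore rejects any repeated letter.
NO_REPEATS_ACNE = re.compile(r"^(?!.*(.).*\1)[acne]*$")

def is_sub_anagram(word):
    """Return True if word uses only a, c, n, e and repeats no letter."""
    # Lower-casing is an assumption added for this sketch.
    return NO_REPEATS_ACNE.match(word.lower()) is not None

for word in ["ACNE", "CENA", "CAN", "CAAN", "CANEN"]:
    print(word, is_sub_anagram(word))
```

This shorter form only works because every letter of the pool is allowed exactly once; a pool with duplicate letters needs per-letter limits.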

Note: the sub-anagrams of the word "magmoid" can be matched as

^(?!.*([agoid]).*\1)(?!(.*m){3})[magoid]*$

(no letter of agoid may appear more than once, and m may appear at most twice)
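The "magmoid" case suggests a general recipe: for each letter that occurs k times in the pool, forbid k + 1 occurrences with a negative lookahead. The following Python sketch builds such a pattern for an arbitrary pool. The function name `sub_anagram_regex` is hypothetical, and it emits one lookahead per letter rather than the grouped form shown above, which is a simplification chosen here, not the answer's exact output.

```python
import re
from collections import Counter

def sub_anagram_regex(pool):
    """Build a regex matching anagrams of any subset of the letters in pool."""
    counts = Counter(pool.lower())
    # One lookahead per letter: a letter available n times must not occur
    # n + 1 times anywhere in the candidate word.
    lookaheads = "".join(
        "(?!(?:.*%s){%d})" % (re.escape(ch), n + 1)
        for ch, n in sorted(counts.items())
    )
    # Character class restricting the word to letters from the pool.
    char_class = "[" + "".join(re.escape(ch) for ch in sorted(counts)) + "]"
    return re.compile("^" + lookaheads + char_class + "*$")

magmoid = sub_anagram_regex("magmoid")
for word in ["magmoid", "dogma", "mom", "mammoid", "good"]:
    print(word, magmoid.match(word) is not None)
```

Here "mom" is accepted because m is available twice, while "mammoid" (three m) and "good" (two o) are rejected.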

John Dvorak answered Sep 23 '22 08:09