 

python regex where a set of options can occur at most once in a list, in any order

Tags:

python

regex

perl

I'm wondering if there's any way in Python or Perl to build a regex where each option in a set can appear at most once, in any order. For example, I would like a derivative of foo(?: [abc])* where a, b, and c could each appear only once. So:

foo a b c
foo b c a
foo a b
foo b

would all be valid, but

foo b b

would not be
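Here is a quick illustration of the problem. It is a sketch that uses Python's re module and the sample strings from the question, and it shows that the starting pattern accepts every example, including the one that should be rejected:

```python
import re

# The starting pattern from the question: "foo" followed by any number of " a", " b" or " c"
naive = re.compile(r"foo(?: [abc])*")

for s in ["foo a b c", "foo b c a", "foo a b", "foo b", "foo b b"]:
    # fullmatch requires the whole string to match, so no anchors are needed
    print(f"{s!r}: {naive.fullmatch(s) is not None}")
```

All five lines print True, so the "at most once" rule has to be enforced by something extra in the pattern.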

HardcoreHenry asked Oct 07 '21 19:10


6 Answers

You may use a regex with a capture group and a negative lookahead. For Perl, you can use this variant, which relies on forward referencing:

^foo((?!.*\1) [abc])+$


RegEx Details:

  • ^: Start
  • foo: Match foo
  • (: Start a capture group #1
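    • (?!.*\1): Negative lookahead to assert that the text captured by group #1 in the previous repetition does not appear anywhere later in the input (on the first repetition group #1 is still unset, so the lookahead passes)
    • " [abc]": Match a space followed by a, b, or c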
  • )+: End capture group #1. Repeat this group 1+ times
  • $: End

As mentioned earlier, this regex uses a feature called forward referencing: a back-reference to a group that has not been closed yet at the point where the reference appears (here \1 sits inside group #1 itself). JGsoft, .NET, Java, Perl, PCRE, PHP, Delphi, and Ruby allow forward references, but Python doesn't.
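This small check shows how Python reacts to the Perl pattern above. It assumes CPython's built-in re module, where compiling the pattern raises re.error instead of producing a pattern object:

```python
import re

# The Perl pattern: \1 appears inside group 1, before that group is closed
forward_ref = r"^foo((?!.*\1) [abc])+$"

try:
    re.compile(forward_ref)
    compiled = True
except re.error as exc:
    # CPython's re refuses a reference to a group that is still open
    compiled = False
    print("re.error:", exc)
```

The exact error message may differ between Python versions, so the sketch only relies on re.error being raised.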


Here is a workaround for Python that doesn't use forward referencing:

^foo(?!.* ([abc]).*\1)(?: [abc])+$

Here we use a negative lookahead before the repeated group to fail the match if any of the allowed items (i.e. [abc]) is repeated.
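Here is a sketch of the Python workaround in use, run against the examples from the question. The loop and the sample list are illustrative additions:

```python
import re

# One lookahead up front rejects the string if any " [abc]" letter occurs again later
pattern = re.compile(r"^foo(?!.* ([abc]).*\1)(?: [abc])+$")

for s in ["foo a b c", "foo b c a", "foo a b", "foo b", "foo b b"]:
    print(f"{s!r}: {pattern.match(s) is not None}")
```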


anubhava answered Oct 17 '22 04:10


You can use a negative lookahead to assert that no space-and-letter pair occurs a second time further to the right:

foo(?!(?: [abc])*( [abc])(?: [abc])*\1)(?: [abc])*
  • foo Match foo literally
  • (?! Negative lookahead
    • (?: [abc])* Match optional repetitions of a space and a, b, or c
    • ( [abc]) Capture group 1 (a space and a letter), compared against the backreference below
    • (?: [abc])* Match optional repetitions of a space and a, b, or c
    • \1 Backreference to group 1, i.e. the same space and letter again
  • ) Close lookahead
  • (?: [abc])* Match optional repetitions of a space and a, b, or c


If you don't want to match foo on its own, change the quantifier to one or more: (?: [abc])+
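Here is a sketch of this lookahead approach in Python, using the + variant. It uses re.fullmatch because the pattern has no ^ or $ anchors. The name is_valid is a hypothetical helper name:

```python
import re

# The lookahead pattern with the + quantifier, so a bare "foo" is rejected
pattern = re.compile(r"foo(?!(?: [abc])*( [abc])(?: [abc])*\1)(?: [abc])+")

def is_valid(s):
    # The pattern has no ^/$ anchors, so fullmatch supplies them
    return pattern.fullmatch(s) is not None

for s in ["foo a b c", "foo b c a", "foo a b", "foo b", "foo b b"]:
    print(f"{s!r}: {is_valid(s)}")
```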


A variant in Perl reuses the first subpattern via (?1), which refers to capture group 1, ([abc]):

^foo ([abc])(?: (?!\1)((?1))(?: (?!\1|\2)(?1))?)?$
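Python's built-in re module has no subroutine calls such as (?1). In this pattern, (?1) only re-runs the character class [abc], so an equivalent Python pattern can be sketched by inlining it by hand. That substitution holds for this specific pattern and is not a general translation of (?1):

```python
import re

# The Perl pattern with each (?1) replaced by its body [abc]
pattern = re.compile(r"^foo ([abc])(?: (?!\1)([abc])(?: (?!\1|\2)[abc])?)?$")

for s in ["foo a b c", "foo b c a", "foo a b", "foo b", "foo b b"]:
    print(f"{s!r}: {pattern.match(s) is not None}")
```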


The fourth bird answered Oct 17 '22 02:10


If it doesn't have to be a regex:

import collections

# python >=3.10
def is_a_match(sentence):
    words = sentence.split()
    return (
      (len(words) > 0)
      and (words[0] == 'foo')
      and (collections.Counter(words) <= collections.Counter(['foo', 'a', 'b', 'c']))
    )

# python <3.10
def is_a_match(sentence):
    words = sentence.split()
    return (
      (len(words) > 0)
      and (words[0] == 'foo')
      and not (collections.Counter(words) - collections.Counter(['foo', 'a', 'b', 'c']))
    )

# TESTING
#foo a b c True
#foo b c a True
#foo a b True
#foo b True
#foo b b False

Or with a set and the walrus operator (Python 3.8+):

def is_a_match(sentence):
    words = sentence.split()
    return (
      (len(words) > 0)
      and (words[0] == 'foo')
      and (
        (s := set(words[1:])) <= set(['a', 'b', 'c'])
        and len(s) == len(words) - 1
      )
    )
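Here is a hypothetical generalization of the set-based check, with the leading word and the allowed options passed as parameters. The name is_a_match_general and its parameters are illustrative and are not part of the answer:

```python
def is_a_match_general(sentence, head="foo", options=("a", "b", "c")):
    # Same idea as the set-based check, with head and options as parameters
    words = sentence.split()
    if not words or words[0] != head:
        return False
    rest = words[1:]
    unique = set(rest)
    # Every word must be an allowed option, and none may repeat
    # (a set drops duplicates, so a repeat makes it shorter than the list)
    return unique <= set(options) and len(unique) == len(rest)
```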
Stef answered Oct 17 '22 02:10


You can do it using references to previously captured groups.

foo(?: ([abc]))?(?: (?!\1)([abc]))?(?: (?!\1|\2)([abc]))?$

This gets quite long with many options. Such a regex can be generated dynamically, if necessary.

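def match_sequence_without_repeats(options, separator):
    # Negative lookahead that rejects anything already captured by groups 1..n
    def prevent_previous(n):
        if n == 0:
            return ""
        groups = "|".join(rf"\{i}" for i in range(1, n + 1))
        return f"(?!{groups})"

    # One optional "separator + option" group per option; each group may not
    # repeat what the earlier groups captured
    return "".join(
        f"(?:{separator}{prevent_previous(i)}([{options}]))?"
        for i in range(len(options))
    )


print(f"foo{match_sequence_without_repeats('abc', ' ')}$")
# foo(?: ([abc]))?(?: (?!\1)([abc]))?(?: (?!\1|\2)([abc]))?$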
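This self-contained sketch compiles the generated pattern and runs it on the question's examples. The backreferences in each lookahead are joined with |, so the third group's lookahead reads (?!\1|\2), as in the hand-written regex above. The options and separator are assumed to be regex-safe, because they are inserted verbatim:

```python
import re

def match_sequence_without_repeats(options, separator):
    # Group i+1 may not repeat anything captured by groups 1..i
    def prevent_previous(n):
        if n == 0:
            return ""
        return "(?!" + "|".join(f"\\{i}" for i in range(1, n + 1)) + ")"

    # options and separator are pasted in verbatim (assumed regex-safe)
    return "".join(
        f"(?:{separator}{prevent_previous(i)}([{options}]))?"
        for i in range(len(options))
    )

generated = f"foo{match_sequence_without_repeats('abc', ' ')}$"
pattern = re.compile(generated)

for s in ["foo a b c", "foo b c a", "foo a b", "foo b", "foo b b"]:
    print(f"{s!r}: {pattern.match(s) is not None}")
```

Because every group is optional, a bare foo also matches, just as it does with the hand-written version.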
LeopardShark answered Oct 17 '22 04:10


Here is a modified version of anubhava's answer that uses a backreference instead of a forward reference (this works in Python and is easier to understand, at least for me).

Match using [abc] inside a capturing group, then check that the text matched by the capturing group does not appear again anywhere after it:

^foo(?:( [abc])(?!.*\1))+$


  • ^: Start
  • foo: Match foo
  • (?:: Start a non-capturing group
    • ( [abc]): Capturing group 1, matching a space followed by a, b, or c
    • (?!.*\1): Negative lookahead that fails if the text captured by group 1 appears again anywhere later in the string (after zero or more characters matched by .*)
  • )+: End non-capturing group and match it 1 or more times
  • $: End
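Here is a sketch of this pattern in Python, run against the question's examples. Unlike the forward-reference version, it compiles with the built-in re module. The name is_valid is a hypothetical helper name:

```python
import re

# The backreference comes after group 1 has closed, so CPython's re accepts it
pattern = re.compile(r"^foo(?:( [abc])(?!.*\1))+$")

def is_valid(s):
    return pattern.match(s) is not None

for s in ["foo a b c", "foo b c a", "foo a b", "foo b", "foo b b"]:
    print(f"{s!r}: {is_valid(s)}")
```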
irregular espresso answered Oct 17 '22 03:10


I have assumed that the elements of the string can be in any order, and that other text may appear any number of times, while each substring of interest may appear at most once. For example, 'a foo' should match and 'a foo b foo' should not.

You can do that with a series of alternations employing lookaheads, one for each substring of interest, but it becomes a bit of a dog's breakfast when there are many strings to consider. Let's suppose you wanted to match zero or one 'foo' and/or zero or one 'a'. You could use the following regular expression:

^(?:(?!.*\bfoo\b)|(?=(?:(?!\bfoo\b).)*\bfoo\b(?!(.*\bfoo\b))))(?:(?!.*\ba\b)|(?=(?:(?!\ba\b).)*\ba\b(?!(.*\ba\b))))


This matches, for example, 'foofoo', 'aa' and 'afooa', because the word boundaries (\b) only count 'foo' and 'a' as whole words. If those strings should not match, remove the word boundaries.
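This sketch checks these claims with Python's re module. The name at_most_once is a hypothetical helper name. The pattern consists only of assertions, so a successful match is an empty string, which is why the helper tests for None rather than for the matched text:

```python
import re

# The two-word pattern for "foo" and "a", split over two raw strings
pattern = re.compile(
    r"^(?:(?!.*\bfoo\b)|(?=(?:(?!\bfoo\b).)*\bfoo\b(?!(.*\bfoo\b))))"
    r"(?:(?!.*\ba\b)|(?=(?:(?!\ba\b).)*\ba\b(?!(.*\ba\b))))"
)

def at_most_once(s):
    # A match object (even an empty one) means every assertion passed
    return pattern.match(s) is not None

for s in ["a foo", "a foo b foo", "foofoo", "aa", "afooa"]:
    print(f"{s!r}: {at_most_once(s)}")
```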

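Notice that this expression begins by asserting the start of the string (^), followed by two non-capturing groups of lookaheads, one for 'foo' and one for 'a'. Each group succeeds either if the word does not occur at all (the negative lookahead) or if it occurs exactly once (the positive lookahead). To also check for, say, 'c', one would tack on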

(?:(?!.*\bc\b)|(?=(?:(?!\bc\b).)*\bc\b(?!(.*\bc\b))))

which is the same as

(?:(?!.*\ba\b)|(?=(?:(?!\ba\b).)*\ba\b(?!(.*\ba\b))))

with \ba\b changed to \bc\b.
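Because each added group follows the same template, the pattern can be assembled programmatically. This is a minimal sketch using Python's re module. The name build_at_most_once is hypothetical, re.escape is applied so that each word is matched literally, and the words are assumed to start and end with word characters, since \b is used:

```python
import re

def build_at_most_once(words):
    # One "absent, or present exactly once" group per word, anchored at ^
    groups = []
    for word in words:
        w = rf"\b{re.escape(word)}\b"
        groups.append(f"(?:(?!.*{w})|(?=(?:(?!{w}).)*{w}(?!(.*{w}))))")
    return "^" + "".join(groups)

pattern = re.compile(build_at_most_once(["foo", "a", "c"]))

for s in ["c a foo", "c foo c"]:
    print(f"{s!r}: {pattern.match(s) is not None}")
```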

It would be nice to be able to use back-references but I don't see how that could be done.


Note that

(?!\bfoo\b).

matches one character, provided that character does not begin the whole word 'foo'. Therefore

(?:(?!\bfoo\b).)*

matches a substring that contains no whole-word 'foo'. In the pattern above, it consumes characters up to, but not including, the first 'foo'.
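This small demonstration uses Python's re module to show where the repeated group stops. Note that 'food' is consumed, because it is not the whole word 'foo':

```python
import re

# Consumes characters up to, but not including, the first whole-word "foo"
prefix = re.compile(r"(?:(?!\bfoo\b).)*")

print(repr(prefix.match("a b foo c foo").group()))  # 'a b '
print(repr(prefix.match("food foo").group()))       # 'food '
```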

Would I advocate this approach in practice, as opposed to using simple string methods? Let me ponder that.

Cary Swoveland answered Oct 17 '22 04:10