Sorry about the title, I couldn't come up with a clean way to ask my question.
In Python I would like to match an expression 'c[some stuff]t', where [some stuff] could be any number of consecutive a's, b's, or c's, in any order, with each letter appearing in at most one consecutive run.
For example, these work: 'ct', 'cat', 'cbbt', 'caaabbct', 'cbbccaat'
but these don't: 'cbcbbaat', 'caaccbabbt'
Edit: a's, b's, and c's are just an example but I would really like to be able to extend this to more letters. I'm interested in regex and non-regex solutions.
Not thoroughly tested, but I think this should work:
import re
words = ['ct', 'cat', 'cbbt', 'caaabbct', 'cbbccaat', 'cbcbbaat', 'caaccbabbt']
pat = re.compile(r'^c(?:([abc])\1*(?!.*\1))*t$')
for w in words:
    print(w, "matches" if pat.match(w) else "doesn't match")
#ct matches
#cat matches
#cbbt matches
#caaabbct matches
#cbbccaat matches
#cbcbbaat doesn't match
#caaccbabbt doesn't match
This matches runs of a, b or c (that's the ([abc])\1* part), while the negative lookahead (?!.*\1) makes sure no other instance of that character is present after the run.
(edit: fixed a typo in the explanation)
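Since the question asks about extending this to more letters, here is a rough sketch of how the same pattern could be built from an arbitrary set of letters. The helper name build_pattern and its parameters are my own invention for illustration, not part of the answer above. It assumes letters is non-empty and does not contain the closing character.

```python
import re

def build_pattern(letters, first="c", last="t"):
    # Hypothetical helper: same regex as above, with the character class
    # built from `letters`. Assumes `letters` is non-empty and that `last`
    # is not one of the letters.
    # re.escape keeps characters such as '-' or ']' literal inside the class.
    char_class = "[" + re.escape(letters) + "]"
    return re.compile(
        "^" + re.escape(first)
        + "(?:(" + char_class + r")\1*(?!.*\1))*"
        + re.escape(last) + "$"
    )

pat = build_pattern("abcde")
for w in ["ct", "cadddeet", "cabdat", "cxt"]:
    print(w, "matches" if pat.match(w) else "doesn't match")
```

Here 'cabdat' is rejected because a second run of a's starts after the first one has ended, and 'cxt' is rejected because x is not in the allowed set.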
Not sure how attached you are to regex, but here is a solution using a different method:
from itertools import groupby
words = ['ct', 'cat', 'cbbt', 'caaabbct', 'cbbccaat', 'cbcbbaat', 'caaccbabbt']
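for w in words:
    match = False
    if w.startswith('c') and w.endswith('t'):
        temp = w[1:-1]  # the characters between the leading 'c' and trailing 't'
        s = set(temp)
        # one group per distinct letter means no letter starts a second run
        match = s <= set('abc') and len(s) == len(list(groupby(temp)))
    print(w, "matches" if match else "doesn't match")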
The string matches if a set of the middle characters is a subset of set('abc') and the number of groups returned by groupby() is the same as the number of elements in the set.
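To make the groupby() idea easier to reuse with other letters, the check could be wrapped in a function along these lines. This is only a sketch: the function name matches and its parameters letters, first and last are made up for illustration, and the length guard is an extra I added so that first and last cannot both be satisfied by a single character.

```python
from itertools import groupby

def matches(word, letters="abc", first="c", last="t"):
    # Hypothetical wrapper around the groupby() check described above.
    if len(word) < 2 or not (word.startswith(first) and word.endswith(last)):
        return False
    middle = word[1:-1]
    # groupby() yields one (key, group) pair per run of equal characters,
    # so `runs` holds the letter of each consecutive run, in order.
    runs = [key for key, _ in groupby(middle)]
    # Every run must use an allowed letter, and no letter may have two runs.
    return set(runs) <= set(letters) and len(runs) == len(set(runs))

words = ['ct', 'cat', 'cbbt', 'caaabbct', 'cbbccaat', 'cbcbbaat', 'caaccbabbt']
for w in words:
    print(w, "matches" if matches(w) else "doesn't match")
```

For example, groupby('aaabbc') produces the run keys a, b, c (three runs, three distinct letters), while 'bcbbaa' produces b, c, b, a (four runs, three distinct letters), which is why 'cbcbbaat' fails.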