I'm trying to learn how to use regular expressions but have a question. Let's say I have the string
line = 'Cow Apple think Woof'
I want to see if line has at least two words that begin with capital letters (which, of course, it does). In Python, I tried to do the following
import re
test = re.search(r'(\b[A-Z]([a-z])*\b){2,}',line)
print(bool(test))
but that prints False. If I instead do
test = re.search(r'(\b[A-Z]([a-z])*\b)',line)
I find that print(test.group(1)) is Cow but print(test.group(2)) is w, the last letter of the first match (there are no other elements in test.group).
Any suggestions on pinpointing this issue and/or how to approach the problem better in general?
The last letter of the match ends up in group 2 because of the inner parentheses: a capturing group that is repeated keeps only its last repetition. Just drop those and you'll be fine.
>>> t = re.findall('([A-Z][a-z]+)', line)
>>> t
['Cow', 'Apple', 'Woof']
>>> t = re.findall('([A-Z]([a-z])+)', line)
>>> t
[('Cow', 'w'), ('Apple', 'e'), ('Woof', 'f')]
The count of capitalised words is, of course, len(t).
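As for why the original {2,} attempt prints False: as far as I can tell, the quantifier repeats the group back to back, so a second capitalised word would have to start exactly where the first one ends, and the space in between never fits the pattern. Here is a minimal sketch of that, plus a non-capturing group (?:...) as an alternative to dropping the inner parentheses. The variable names adjacent and words are just for illustration.

```python
import re

line = 'Cow Apple think Woof'

# {2,} repeats the group with nothing in between, so the space after 'Cow'
# can never be consumed and the search fails.
adjacent = re.search(r'(\b[A-Z][a-z]*\b){2,}', line)
print(bool(adjacent))

# (?:...) groups without capturing, so findall returns whole matches
# as plain strings instead of tuples.
words = re.findall(r'\b[A-Z](?:[a-z])*\b', line)
print(words)
print(len(words) >= 2)
```

Because the pattern has no capturing groups left, each element of words is the full matched word.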
I use the findall function to find all instances that match the regex. Then use len to see how many matches there are; in this case, it is 3. You can check whether the length is at least 2 and return True or False.
import re
line = 'Cow Apple think Woof'
test = re.findall(r'(\b[A-Z]([a-z])*\b)',line)
print(len(test) >= 2)
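If you need this check in more than one place, the counting idea could be wrapped in a small helper. This is only a sketch: the names CAP_WORD and has_capitalized_words and the minimum parameter are mine, not anything from the re module, and "capitalized word" here means one uppercase ASCII letter followed by lowercase ASCII letters only.

```python
import re

# Hypothetical helper names; the pattern has no capturing groups,
# so each match is one whole word.
CAP_WORD = re.compile(r'\b[A-Z][a-z]*\b')

def has_capitalized_words(text, minimum=2):
    """Return True if text contains at least `minimum` capitalized words."""
    # finditer yields one match object per capitalized word
    count = sum(1 for _ in CAP_WORD.finditer(text))
    return count >= minimum

line = 'Cow Apple think Woof'
print(has_capitalized_words(line))
print(has_capitalized_words(line, minimum=4))
```

Changing minimum is all it takes to ask for three or more words, which is the main advantage of counting over a single hard-coded pattern.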
If you want to use only regex, you can search for a capitalized word, then any characters in between, and then another capitalized word.
test = re.search(r'(\b[A-Z][a-z]*\b)(.*)(\b[A-Z][a-z]*\b)',line)
print(bool(test))
(\b[A-Z][a-z]*\b) - finds a capitalized word
(.*) - matches 0 or more characters
(\b[A-Z][a-z]*\b) - finds the second capitalized word
This method isn't as flexible, since as written it will not work for matching 3 capitalized words.
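One possible way to stretch the regex-only approach to n words is to repeat a non-capturing group that contains the word followed by a lazy .*? filler. Treat this as a sketch rather than the definitive way to do it: has_n_capitalized is a made-up name, and I'm assuming re.DOTALL is wanted so the filler can also cross newlines.

```python
import re

def has_n_capitalized(text, n):
    """Return True if text contains at least n capitalized words (regex only)."""
    # (?:...) groups without capturing; .*? lazily skips whatever sits
    # between one capitalized word and the next; {n} repeats the pair n times.
    pattern = r'(?:\b[A-Z][a-z]*\b.*?){%d}' % n
    # re.DOTALL lets '.' match newlines too (an assumption about the input)
    return re.search(pattern, text, flags=re.DOTALL) is not None

line = 'Cow Apple think Woof'
print(has_n_capitalized(line, 2))
print(has_n_capitalized(line, 3))
print(has_n_capitalized(line, 4))
```

For long inputs the counting version with findall is probably easier to read and reason about, since a failing search here has to backtrack through the filler.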