
Regex - finding capital words in string

Tags:

python

regex

I'm trying to learn how to use regular expressions but have a question. Let's say I have the string

line = 'Cow Apple think Woof'

I want to see if line has at least two words that begin with capital letters (which, of course, it does). In Python, I tried to do the following

import re
test = re.search(r'(\b[A-Z]([a-z])*\b){2,}',line)
print(bool(test))

but that prints False. If I instead do

test = re.search(r'(\b[A-Z]([a-z])*\b)',line)

I find that print(test.group(1)) is Cow but print(test.group(2)) is w, the last letter of the first match (there are no other elements in test.group).

Any suggestions on pinpointing this issue and/or how to approach the problem better in general?

Argon asked Apr 15 '17 04:04


2 Answers

The last letter of the match ends up in group 2 because of the inner parentheses: a capturing group that is repeated keeps only its last iteration. Just drop the inner parentheses and you'll be fine.

>>> t = re.findall('([A-Z][a-z]+)', line)
>>> t
['Cow', 'Apple', 'Woof']
>>> t = re.findall('([A-Z]([a-z])+)', line)
>>> t
[('Cow', 'w'), ('Apple', 'e'), ('Woof', 'f')]

The count of capitalised words is, of course, len(t).
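To make the grouping behaviour above concrete, here is a minimal sketch that reuses the line string from the question. The variable names m and words are only illustrative. It shows the repeated inner group keeping its last letter, and a non-capturing group (?:...) as one alternative to dropping the parentheses entirely.

```python
import re

line = 'Cow Apple think Woof'

# A repeated capturing group keeps only its final iteration, hence 'w'.
m = re.search(r'(\b[A-Z]([a-z])*\b)', line)
print(m.group(1), m.group(2))  # Cow w

# (?:...) groups without capturing, so findall returns plain strings
# instead of tuples.
words = re.findall(r'\b[A-Z](?:[a-z])*\b', line)
print(words)            # ['Cow', 'Apple', 'Woof']
print(len(words) >= 2)  # True
```

Note that this pattern uses * rather than +, so a lone capital letter such as 'A' would also count as a word; whether that is wanted depends on the use case.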

Synedraacus answered Oct 18 '22 12:10


I use the findall function to find all instances that match the regex, then use len to see how many matches there are; in this case, there are 3. You can check whether the length is at least 2 to get True or False.

import re

line = 'Cow Apple think Woof'

test = re.findall(r'(\b[A-Z]([a-z])*\b)',line)
print(len(test) >= 2)

If you want to use only regex, you can search for a capitalized word, then any characters in between, and then another capitalized word.

test = re.search(r'(\b[A-Z][a-z]*\b)(.*)(\b[A-Z][a-z]*\b)',line)
print(bool(test))
  • (\b[A-Z][a-z]*\b) - finds a capitalized word
  • (.*) - matches 0 or more characters
  • (\b[A-Z][a-z]*\b) - finds the second capitalized word

This method is less flexible, since the pattern has to be rewritten by hand to require 3 or more capitalized words.
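One possible way to generalise the regex-only approach is sketched below. The original {2,} attempt fails because nothing inside the repeated group can consume the space between words, so two capitalized words can never match back to back. Adding a lazy .*? inside a non-capturing group fixes that. The helper name has_capitalized_words and its parameter n are hypothetical, not something from the question or the answers.

```python
import re

def has_capitalized_words(text, n=2):
    """Return True if text contains at least n capitalized words.

    Hypothetical helper: each repetition matches one capitalized word
    followed by as few characters as possible before the next one.
    """
    pattern = r'(?:\b[A-Z][a-z]*\b.*?){%d,}' % n
    return re.search(pattern, text) is not None

line = 'Cow Apple think Woof'
print(has_capitalized_words(line))       # True
print(has_capitalized_words(line, n=3))  # True
print(has_capitalized_words(line, n=4))  # False
```

This is a sketch for short, single-line input: . does not match newlines by default, and for long strings counting with findall is likely the simpler and cheaper option.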

davidhu answered Oct 18 '22 12:10