I'm looking for a regex to match hyphenated words in Python.
The closest I've managed to get is: '\w+-\w+[-\w+]*'
import re
text = "one-hundered-and-three- some text foo-bar some--text"
hyphenated = re.findall(r'\w+-\w+[-\w+]*', text)
which returns the list ['one-hundered-and-three-', 'foo-bar'].
This is almost perfect except for the trailing hyphen after 'three'. I only want the additional hyphen if it is followed by a 'word', i.e. instead of '[-\w+]*' I need something like '(-\w+)*', which I thought would work but doesn't (it returns ['-three', '']). In other words, I need something that matches |word followed by hyphen followed by word followed by hyphen_word zero or more times|.
In regular expressions, the hyphen ("-") has special meaning inside a character class: it indicates a range, so [0-9] matches any digit from 0 to 9. To match a literal hyphen inside a character class, such as the hyphens in a social security number, escape it with a backslash ("\"). Outside a character class the hyphen is an ordinary character and needs no escaping.
It means "dash." They probably expect some negative numbers (e.g. -0.5). The () means that the enclosed part of the pattern is a capturing group.
Inside a character class, - denotes a range, e.g. 0-9. If you want to include a literal -, write it at the beginning or end of the character class, like [-0-9] or [0-9-]. You also don't need to escape . inside a character class.
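A short sketch of how this applies to the pattern in the question may help. The sample strings and variable names below are only illustrative. Because [-\w+] is a character class, the + inside it is a literal plus sign rather than a quantifier, and the class matches any single hyphen, word character, or plus sign, which is why the trailing hyphen gets picked up:

```python
import re

text = "one-hundered-and-three- some text foo-bar some--text"

# [-\w+]* repeats a ONE-character class: hyphen, word character, or literal '+'.
# It happily consumes the trailing '-' after 'three'.
with_class = re.findall(r'\w+-\w+[-\w+]*', text)
print(with_class)   # ['one-hundered-and-three-', 'foo-bar']

# A hyphen at the start (or end) of a class is literal...
literal_hyphen = re.findall(r'[-0-9]+', "tel 555-0199")
print(literal_hyphen)   # ['555-0199']

# ...while between two characters it denotes a range.
digits_only = re.findall(r'[0-9]+', "tel 555-0199")
print(digits_only)   # ['555', '0199']
```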
Try this:
re.findall(r'\w+(?:-\w+)+',text)
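Here we consider a hyphenated word to be: one or more word characters, followed by one or more repetitions of a single hyphen followed by one or more word characters.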
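To see why the capturing version from the question misbehaves, here is a minimal comparison using the question's sample text (the variable names are just for illustration). When a pattern contains a capturing group, re.findall returns the group's contents instead of the whole match; a non-capturing group (?:...) avoids that, and re.finditer is another option if you want to keep the capturing group:

```python
import re

text = "one-hundered-and-three- some text foo-bar some--text"

# Capturing group: findall returns only the last text captured by the group,
# or '' when the group did not take part in the match.
capturing = re.findall(r'\w+-\w+(-\w+)*', text)
print(capturing)   # ['-three', '']

# Non-capturing group: findall returns the full matches.
non_capturing = re.findall(r'\w+(?:-\w+)+', text)
print(non_capturing)   # ['one-hundered-and-three', 'foo-bar']

# Alternative: keep the capturing group but read the whole match via finditer.
full_matches = [m.group(0) for m in re.finditer(r'\w+-\w+(-\w+)*', text)]
print(full_matches)   # ['one-hundered-and-three', 'foo-bar']
```

Note that 'some--text' is not matched by either pattern, since a hyphen must be followed directly by a word character.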