
Python Regex for hyphenated words

I'm looking for a regex to match hyphenated words in python.

The closest I've managed to get is: '\w+-\w+[-\w+]*'

import re

text = "one-hundered-and-three- some text foo-bar some--text"
hyphenated = re.findall(r'\w+-\w+[-\w+]*', text)

which returns the list ['one-hundered-and-three-', 'foo-bar'].

This is almost perfect except for the trailing hyphen after 'three'. I only want an additional hyphen if it is followed by a 'word'. That is, instead of '[-\w+]*' I need something like '(-\w+)*', which I thought would work but doesn't (it returns ['-three', '']). In other words, I want something that matches: a word, followed by a hyphen, followed by a word, followed by hyphen-plus-word zero or more times.
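A small sketch of why both attempts misbehave, run on the question's own sample text. Two standard `re` facts explain it: '[-\w+]' is a character class, so it matches any single '-', word character or literal '+', and nothing forces a word after the last hyphen; and when a pattern contains a capturing group, re.findall returns the group's contents (here the last repetition, or '' when the group never matched) instead of the whole match.

```python
import re

text = "one-hundered-and-three- some text foo-bar some--text"

# '[-\w+]*' is a character class repeated: any run of '-', word characters
# or '+' is accepted, so the trailing '-' after 'three' is swallowed too.
first_try = re.findall(r'\w+-\w+[-\w+]*', text)
# -> ['one-hundered-and-three-', 'foo-bar']

# '(-\w+)*' fixes the matching, but the parentheses form a capturing group,
# so findall reports only what the group captured: its last repetition
# ('-three') for the first match, and '' for 'foo-bar', where it never matched.
second_try = re.findall(r'\w+-\w+(-\w+)*', text)
# -> ['-three', '']
```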

asked Dec 05 '11 09:12 by Sixhobbits


People also ask

How do you specify a hyphen in regex?

In regular expressions, the hyphen ("-") has a special meaning only inside a character class, where it indicates a range: [0-9] matches any single digit from 0 to 9. Outside a character class it matches a literal hyphen, so a pattern for the hyphens in a social security number, such as \d{3}-\d{2}-\d{4}, needs no escaping. Escaping it with a backslash ("\-") is also allowed.
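A minimal Python sketch of the range-versus-literal distinction described above. The sample strings are made up for illustration, and the number format 123-45-6789 is assumed as the social security number layout.

```python
import re

# Inside a character class, a hyphen between two characters forms a range:
# [0-9] matches any single digit from 0 to 9 (the literal '-' is not matched).
digits = re.findall(r'[0-9]', "a1-b2")  # -> ['1', '2']

# Outside a character class the hyphen is literal, so no escaping is needed.
ssn = re.compile(r'\d{3}-\d{2}-\d{4}')
plain = ssn.fullmatch("123-45-6789")

# A backslash-escaped hyphen is also accepted outside a class.
escaped = re.fullmatch(r'\d{3}\-\d{2}\-\d{4}', "123-45-6789")

print(bool(plain), bool(escaped))  # -> True True
```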

What does a hyphen mean in Python?

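Outside a character class, a hyphen in a Python regular expression simply matches a literal dash. In a numeric pattern it typically allows for a minus sign, as in negative numbers such as -0.5. Parentheses () around part of a pattern create a capturing group, which records the text that part of the pattern matched.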
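The answer above seems to refer to a pattern that isn't shown, so this is only a sketch under an assumption: a hypothetical pattern for signed decimal numbers, with made-up sample strings. It shows the literal '-' acting as an optional minus sign, and capturing groups recording parts of a match.

```python
import re

# '-?' is an optional literal minus sign; (?:...) groups without capturing,
# so findall returns the whole matched number.
numbers = re.findall(r'-?\d+(?:\.\d+)?', "x = -0.5, y = 2")
# -> ['-0.5', '2']

# Plain parentheses capture: group(1) holds the sign, group(2) the digits.
m = re.search(r'(-?)(\d+)', "t = -42")
sign, digits = m.group(1), m.group(2)
# -> '-', '42'
```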

How do you add a hyphen to a regular expression in Java?

Inside a character class, - denotes a range, e.g. 0-9. To include a literal -, put it at the beginning or end of the class, as in [-0-9] or [0-9-]; there it doesn't need to be escaped.
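The same placement rule holds in Python's re module, the language of the question. A short sketch with illustrative sample strings, comparing a hyphen placed first or last in the class, an escaped hyphen, and a range:

```python
import re

text = "a-1 b+2"
# A hyphen first or last in the class is taken literally.
leading = re.findall(r'[-0-9]', text)   # -> ['-', '1', '2']
trailing = re.findall(r'[0-9-]', text)  # -> ['-', '1', '2']

sample = "0-5-9"
# Between two characters it is a range: any digit from 0 to 9.
as_range = re.findall(r'[0-9]', sample)     # -> ['0', '5', '9']
# Escaped, it is just one more member: only '0', '-' or '9'.
as_literal = re.findall(r'[0\-9]', sample)  # -> ['0', '-', '-', '9']
```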


1 Answer

Try this:

re.findall(r'\w+(?:-\w+)+',text)

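Here we consider a hyphenated word to be:

  • one or more word characters
  • followed by one or more of:
    • a single hyphen
    • followed by one or more word characters

The (?:...) makes the group non-capturing, so re.findall returns the whole hyphenated word rather than only what the group matched.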
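A quick check of the answer's pattern on the question's sample text. For comparison, the second pattern is the asker's own '\w+-\w+(-\w+)*' idea with the group written as (?:...); both describe the same set of strings and give the same result here.

```python
import re

text = "one-hundered-and-three- some text foo-bar some--text"

# The answer's pattern: word characters, then one or more "-word" parts.
answer = re.findall(r'\w+(?:-\w+)+', text)

# The question's pattern with a non-capturing group: an equivalent spelling.
asker_fixed = re.findall(r'\w+-\w+(?:-\w+)*', text)

print(answer)       # -> ['one-hundered-and-three', 'foo-bar']
print(asker_fixed)  # -> ['one-hundered-and-three', 'foo-bar']
```

Neither pattern matches 'some--text', since a word must follow every hyphen.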
answered Sep 21 '22 18:09 by a'r