
Regex to match words with hyphens and/or apostrophes

I was looking for a regex to match words with hyphens and/or apostrophes. So far, I have:

(\w+([-'])(\w+)?[']?(\w+))

and that works most of the time, though if there's an apostrophe and then a hyphen, like "qu'est-ce", it doesn't match. I could append more optional groups, but perhaps there's a more efficient way?

Some examples of what I'm trying to match: Mary's, High-school, 'tis, Chambers', Qu'est-ce.
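
To see where the pattern above falls short, here is a minimal Python sketch that checks each example as a whole word. The helper name `fully_matched` and the use of Python's `re.fullmatch` are assumptions made for illustration; the question does not say which regex engine is in use.

```python
import re

# The pattern from the question, unchanged.
ASKER_PATTERN = re.compile(r"(\w+([-'])(\w+)?[']?(\w+))")

EXAMPLES = ["Mary's", "High-school", "'tis", "Chambers'", "Qu'est-ce"]


def fully_matched(pattern, words):
    """Return the words that the pattern matches from start to end."""
    return [w for w in words if pattern.fullmatch(w)]


matched = fully_matched(ASKER_PATTERN, EXAMPLES)
missed = [w for w in EXAMPLES if w not in matched]

print("matched:", matched)
print("missed:", missed)
```

Under these assumptions the pattern covers "Mary's" and "High-school" in full, but not the words with a leading or trailing apostrophe, nor "Qu'est-ce".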

empedocle asked Aug 10 '15 02:08


4 Answers

Use this pattern:

(?=\S*['-])([a-zA-Z'-]+)


(?=                 # Look-Ahead
  \S                # <not a whitespace character>
  *                 # (zero or more)(greedy)
  ['-]              # Character in ['-] Character Class
)                   # End of Look-Ahead
(                   # Capturing Group (1)
  [a-zA-Z'-]        # Character in [a-zA-Z'-] Character Class
  +                 # (one or more)(greedy)
)                   # End of Capturing Group (1)
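
A quick way to try this pattern is to run it over a sample sentence. This is only a sketch in Python (an assumption, since the answer does not name an engine); the variable names and the sample text are made up for illustration.

```python
import re

# Lookahead pattern from the answer above, unchanged.
LOOKAHEAD_PATTERN = re.compile(r"(?=\S*['-])([a-zA-Z'-]+)")

# Hypothetical sample text: the question's examples plus one plain word.
text = "Mary's High-school 'tis Chambers' Qu'est-ce plain"

# With a single capturing group, findall returns the group's text.
found = LOOKAHEAD_PATTERN.findall(text)
print(found)

# The character class also accepts runs made only of punctuation.
punctuation_only = LOOKAHEAD_PATTERN.findall("a -- b")
print(punctuation_only)
```

The plain word is skipped because the lookahead finds no apostrophe or hyphen in it. Note that a bare "--" is accepted too, so a stricter pattern may be needed if that matters.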
alpha bravo answered Oct 19 '22 21:10


[\w'-]+ matches pretty much any word with (or without) hyphens and apostrophes, including cases where those characters are adjacent to each other. (?:\w|['-]\w)+ matches only when every hyphen or apostrophe is immediately followed by a word character, so those characters can't be adjacent.

If you need to be sure that the word contains at least one hyphen or apostrophe and that those characters aren't adjacent, maybe try \w*(?:['-](?!['-])\w*)+. Note that this would also match a lone ' or -.
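
The differences between these three patterns are easier to see side by side. The following is a rough Python sketch; the constant names, the `accepts` helper, and the sample strings are assumptions chosen for illustration, and each string is tested as a whole word with `fullmatch`.

```python
import re

ANY_CHARS = re.compile(r"[\w'-]+")                       # first pattern
NO_ADJACENT = re.compile(r"(?:\w|['-]\w)+")              # second pattern
REQUIRED_MARK = re.compile(r"\w*(?:['-](?!['-])\w*)+")   # third pattern


def accepts(pattern, word):
    """True if the pattern matches the whole string."""
    return pattern.fullmatch(word) is not None


samples = ["plain", "qu'est-ce", "a--b", "'"]

# Map each sample to (ANY_CHARS, NO_ADJACENT, REQUIRED_MARK) results.
results = {
    w: (accepts(ANY_CHARS, w), accepts(NO_ADJACENT, w), accepts(REQUIRED_MARK, w))
    for w in samples
}

for word, row in results.items():
    print(repr(word), row)
```

Only the third pattern rejects "plain", and it is also the one that accepts a lone apostrophe.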

OrderNChaos answered Oct 19 '22 20:10


debuggex.com is a great resource for visualizing these sorts of things.

\b\w*[-']\w*\b should do the trick for words with a single internal hyphen or apostrophe.
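
As a sanity check, this short Python sketch runs that pattern against one word with a single apostrophe and one with both an apostrophe and a hyphen. Python and the variable names are assumptions made for illustration.

```python
import re

# Pattern from the answer above: exactly one hyphen or apostrophe per match.
SINGLE_MARK = re.compile(r"\b\w*[-']\w*\b")

single = SINGLE_MARK.findall("Mary's")
double = SINGLE_MARK.findall("qu'est-ce")

print(single)
print(double)
```

Since the pattern allows only one separator, "qu'est-ce" comes back as two pieces rather than one word.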

Patrick answered Oct 19 '22 19:10


The problem you're running into is that you actually have three possible sub-patterns: one or more chars, an apostrophe followed by one or more chars, and a hyphen followed by one or more chars.

This presumes you don't wish to accept words that begin or end with apostrophes or hyphens or have hyphens next to apostrophes (or vice versa).

I believe the best way to represent this in a RegExp would be:

/\b[a-z]+(?:['-]?[a-z]+)*\b/i

which is described as:

\b                   # word boundary
[a-z]+               # one or more letters
(?:                  # start non-capturing group
  ['-]?              # zero or one apostrophe or hyphen
  [a-z]+             # one or more letters
)*                   # end of non-capturing group, repeated zero or more times
\b                   # word boundary

which will match any word that begins and ends with a letter and can contain zero or more groups of either an apostrophe or a hyphen followed by one or more letters.
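
Here is a small Python sketch of that pattern applied to the examples from the question. It assumes case-insensitive matching (via `re.IGNORECASE`) so that capitalized words such as "Mary's" are covered; the variable names and sample text are made up for illustration.

```python
import re

# Same pattern as above; case-insensitivity is an assumption of this sketch.
WORD_PATTERN = re.compile(r"\b[a-z]+(?:['-]?[a-z]+)*\b", re.IGNORECASE)

text = "Mary's High-school 'tis Chambers' Qu'est-ce"

# The group is non-capturing, so findall returns the full matches.
words = WORD_PATTERN.findall(text)
print(words)
```

Leading and trailing apostrophes are left out of the matches, in line with the presumption stated earlier, so "'tis" comes back as "tis" and "Chambers'" as "Chambers".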

Rob Raisch answered Oct 19 '22 20:10