
Regex to match words and those with an apostrophe

Update: As per comments regarding the ambiguity of my question, I've increased the detail in the question.

(Terminology: by words I am referring to any succession of alphanumeric characters.)

I'm looking for a regex to match the following, verbatim:

  • Words.
  • Words with one apostrophe at the beginning.
  • Words with any number of non-contiguous apostrophes throughout the middle.
  • Words with one apostrophe at the end.

I would also like to match the following, not verbatim but with the apostrophes removed:

  • Words with an apostrophe at the beginning and at the end would be matched to the word, without the apostrophes. So 'foo' would be matched to foo.
  • Words with more than one contiguous apostrophe in the middle would be resolved to two different words: the fragment before the contiguous apostrophes and the fragment after the contiguous apostrophes. So, foo''bar would be matched to foo and bar.
  • Words with more than one contiguous apostrophe at the beginning or at the end would be matched to the word, without the apostrophes. So, ''foo would be matched to foo and ''foo'' to foo.

Examples: these would be matched verbatim:

  • 'bout
  • it's
  • persons'

But these would be ignored:

  • '
  • ''

And, for 'open', open would be matched.
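
One possible reading of the rules above, written as a small Python sketch rather than a single regex. The helper name `extract_words` is hypothetical, and the sketch assumes that "alphanumerical" means ASCII letters and digits only and that the straight apostrophe `'` is the only apostrophe character.

```python
import re

# Assumption: a candidate run is any stretch of ASCII letters, digits and
# apostrophes. `extract_words` is an illustrative name, not from the question.
_RUN = re.compile(r"[A-Za-z0-9']+")


def extract_words(text):
    """Return the words in `text` according to the rules listed above."""
    words = []
    for run in _RUN.findall(text):
        # Two or more contiguous apostrophes act as a separator and are dropped.
        for piece in re.split(r"'{2,}", run):
            # A piece wrapped in one apostrophe on each side loses both.
            if len(piece) > 2 and piece.startswith("'") and piece.endswith("'"):
                piece = piece[1:-1]
            # Ignore empty pieces and pieces made only of apostrophes.
            if any(ch.isalnum() for ch in piece):
                words.append(piece)
    return words


print(extract_words("'bout it's persons' ' '' 'open' foo''bar ''foo''"))
```

Under these assumptions, 'bout, it's and persons' come back verbatim, the lone ' and '' are ignored, 'open' becomes open, and foo''bar yields foo and bar.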

asked Apr 08 '10 00:04 by Humphrey Bogart




2 Answers

Try using this:

(?=.*\w)^(\w|')+$

'bout     # pass
it's      # pass
persons'  # pass
'         # fail
''        # fail

Regex Explanation

NODE      EXPLANATION
  (?=       look ahead to see if there is:
    .*        any character except \n (0 or more times
              (matching the most amount possible))
    \w        word characters (a-z, A-Z, 0-9, _)
  )         end of look-ahead
  ^         the beginning of the string
  (         group and capture to \1 (1 or more times
            (matching the most amount possible)):
    \w        word characters (a-z, A-Z, 0-9, _)
   |         OR
    '         '\''
  )+        end of \1 (NOTE: because you're using a
            quantifier on this capture, only the LAST
            repetition of the captured pattern will be
            stored in \1)
  $         before an optional \n, and the end of the
            string
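
As a quick check, the pattern above can be run against the question's examples. This is a minimal Python sketch; the helper name `passes` is made up for illustration. Note that the pattern validates a whole string rather than extracting words from longer text, that \w also accepts underscores, and that 'open' passes with both apostrophes still attached.

```python
import re

# Pattern copied verbatim from the answer above.
WHOLE = re.compile(r"(?=.*\w)^(\w|')+$")


def passes(candidate):
    """True if the entire candidate string is accepted by the pattern."""
    return WHOLE.match(candidate) is not None


for sample in ["'bout", "it's", "persons'", "'", "''", "'open'", "it's fine"]:
    print(repr(sample), "pass" if passes(sample) else "fail")
```

The first three samples and 'open' pass; the lone ', the '' and the two-word string fail.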
answered Oct 15 '22 08:10 by maček


/('\w+)|(\w+'\w+)|(\w+')|(\w+)/
  • '\w+ Matches a ' followed by one or more word characters (letters, digits or underscore), OR
  • \w+'\w+ Matches one or more word characters followed by a ' followed by one or more word characters, OR
  • \w+' Matches one or more word characters followed by a ', OR
  • \w+ Matches one or more word characters
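
To see how this alternation behaves on running text, here is a minimal Python sketch; the helper name `tokens` is hypothetical. It uses `finditer` and `group(0)` because the pattern contains four capture groups, so `findall` would return tuples instead of the matched text.

```python
import re

# Pattern copied from the answer above, without the surrounding slashes.
TOKEN = re.compile(r"('\w+)|(\w+'\w+)|(\w+')|(\w+)")


def tokens(text):
    """Return every non-overlapping match of the pattern, left to right."""
    return [m.group(0) for m in TOKEN.finditer(text)]


print(tokens("'bout it's persons' ' ''"))
print(tokens("'open'"))
print(tokens("foo''bar"))
```

The first call returns 'bout, it's and persons' and skips the bare apostrophes. The other two return 'open (leading apostrophe kept) and foo' plus 'bar, so the apostrophe-stripping cases from the question would still need a separate step.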
answered Oct 15 '22 09:10 by WhirlWind