regex word boundary excluding the hyphen

I need a regex that matches an expression ending with a word boundary, but which does not treat the hyphen as a boundary. That is, get all expressions matched by

type ([a-z])\b

but do not match, for example,

type a-1

To rephrase: I want an equivalent of the word boundary operator \b which, instead of using the word character class [A-Za-z0-9_], uses the extended class [A-Za-z0-9_-].

How do you escape a hyphen in regex?

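In regular expressions, the hyphen ("-") has a special meaning inside a character class: it indicates a range, so [0-9] matches any digit from 0 to 9. To match a literal hyphen inside a character class, escape it with a backslash ("\") or place it first or last in the class. Outside a character class, for example when matching the literal hyphens in a social security number, the hyphen has no special meaning and does not need to be escaped.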
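As a minimal sketch of the range versus literal behaviour, here is how it plays out in Python's re module. The sample strings, including the SSN-shaped value, are made up for illustration.

```python
import re

# Inside a character class, a hyphen between two characters forms a range.
digits = re.findall(r"[0-9]", "a1-b2")

# A hyphen placed last in the class, or escaped, is a literal hyphen.
literal_last = re.findall(r"[0-9-]", "a1-b2")
literal_escaped = re.findall(r"[0-9\-]", "a1-b2")

# Outside a character class the hyphen needs no escape.
# "123-45-6789" is a hypothetical SSN-shaped sample, not real data.
ssn = re.fullmatch(r"\d{3}-\d{2}-\d{4}", "123-45-6789")

print(digits)           # ['1', '2']
print(literal_last)     # ['1', '-', '2']
print(literal_escaped)  # ['1', '-', '2']
print(ssn is not None)  # True
```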

What does \b mean in regex?

The metacharacter \b is an anchor, like the caret and the dollar sign. It matches at a position called a “word boundary”, and the match is zero-length. Three positions qualify as word boundaries: before the first character in the string, if the first character is a word character; after the last character in the string, if the last character is a word character; and between two characters in the string, where one is a word character and the other is not.

How does word boundary work in regex?

In Python's regex syntax, \b matches a word boundary at these positions: before the first character in the string if the first character is a word character (\w); between two characters in the string if one is a word character (\w) and the other is not (\W, the inverse character set of \w); and after the last character in the string if the last character is a word character.
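To see why the hyphen counts as a boundary, this small Python sketch lists every position where \b matches in the question's sample string, then runs the question's original pattern against it.

```python
import re

text = "type a-1"

# Every zero-length position where \b matches.
boundaries = [m.start() for m in re.finditer(r"\b", text)]

# The question's original pattern: \b succeeds between "a" and "-",
# because "-" is not a word character.
match = re.search(r"type ([a-z])\b", text)

print(boundaries)      # [0, 4, 5, 6, 7, 8]
print(match.group(1))  # a
```

Position 6 sits between "a" and "-", which is exactly the boundary the question wants to ignore.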

I had a pretty similar problem except I didn't want to consider the '*' as a boundary character. Here's what I did:

\b(?<!\*)([^\s\*]+)\b(?!\*)

Basically, if you're at a word boundary, look back one character and don't match if the previous character was an '*'. If you're in the middle, don't match on a space or asterisk. If you're at the end, make sure the end isn't an asterisk. In your case, I think you could use \w instead of \s. For me, this worked in these situations:

*word wo*rd word*
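Here is a rough Python sketch of how this pattern behaves. The asterisk in the final lookahead is escaped as (?!\*), since an unescaped * there is rejected by Python's re module as a quantifier with nothing to repeat. The second sample string is an assumption added for contrast.

```python
import re

# Word boundaries that refuse to treat "*" as a boundary character.
pattern = re.compile(r"\b(?<!\*)([^\s\*]+)\b(?!\*)")

# The three cases listed above: an asterisk before, inside, and after a word.
starred = pattern.findall("*word wo*rd word*")

# A made-up sample with no asterisks, for contrast.
plain = pattern.findall("a word here")

print(starred)  # []
print(plain)    # ['a', 'word', 'here']
```

In this sketch, none of the three starred words produces a match, while plain words match as usual.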

You can use a lookahead for this. The shortest option is a negative lookahead:

type ([a-z])(?![\w-])

(?![\w-]) would mean "fail the match if the next character is in \w or is a -".

Here is an option that uses a normal lookahead:

type ([a-z])(?=[^\w-]|$)

You can read (?=[^\w-]|$) as "only match if the next character is not in the character class [\w-], or this is the end of the string".

See it working: http://www.rubular.com/r/NHYhv72znm
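The linked demo uses Rubular (Ruby), but both lookaheads behave the same way in Python's re module. Here is a minimal sketch comparing them, with sample strings that are assumptions chosen to cover the end of string, a hyphen, punctuation, and word characters.

```python
import re

negative = re.compile(r"type ([a-z])(?![\w-])")     # negative lookahead
positive = re.compile(r"type ([a-z])(?=[^\w-]|$)")  # positive lookahead

# Hypothetical test inputs, not taken from the original post.
samples = ["type a", "type a-1", "type a.", "type ab", "type a_"]

# Map each sample to (negative matched?, positive matched?).
results = {
    s: (bool(negative.search(s)), bool(positive.search(s)))
    for s in samples
}

for sample, outcome in results.items():
    print(f"{sample!r}: {outcome}")
```

Both patterns accept "type a" and "type a." and reject "type a-1", "type ab" and "type a_", so on these inputs they are interchangeable.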

Tags:

regex

2 Answers

Jonathan

Andrew Clark
