
regex last character of a WORD

Tags:

string

regex

I'm attempting to match the last character in a WORD.

A WORD is a sequence of non-whitespace characters '[^\n\r\t\f ]', or an empty line matching ^$.

The expression I made to do this is: "[^ \n\t\r\f]\(?:[ \$\n\t\r\f]\)"

The regex is meant to match a non-whitespace character that is followed by a whitespace character or by the end of the line.

But I don't know how to stop it from including the following whitespace character in the result, or why it doesn't match a character that precedes the end of the line.

Using the string "Hi World!", I would expect "i" and "!" to be matched.

Instead I get: "i ".

What steps can I take to solve this problem?

Aquaactress asked May 08 '17 21:05


1 Answer

"Word" as a sequence of non-whitespace characters

Note that the non-capturing group (?:...) in [^ \n\t\r\f](?:[ \$\n\t\r\f]) still matches (consumes) the whitespace char, so it becomes part of the match. The pattern also fails at the end of the string because $ is not an end-of-string anchor inside a character class; there it is parsed as a literal $ symbol.
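Both effects can be reproduced with a short check. This is only a sketch in Python's re module, which is an assumption on my part since the question does not name an engine; the variable names are illustrative.

```python
import re

# Pattern from the question, with the group parentheses unescaped as in this answer.
question_pattern = r"[^ \n\t\r\f](?:[ \$\n\t\r\f])"

# The whitespace is consumed, and "!" at the end of the string is never matched.
consumed = re.findall(question_pattern, "Hi World!")
print(consumed)  # ['i ']

# Inside the character class, $ is a literal dollar sign, not an anchor.
literal_dollar = re.findall(question_pattern, "a$")
print(literal_dollar)  # ['a$']
```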

You may use

\S(?!\S)

See the regex demo

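Here, \S matches a non-whitespace char, and the negative lookahead (?!\S) requires that it is not followed by another non-whitespace char. The lookahead consumes nothing, so the following whitespace stays out of the match, and it also succeeds at the end of the string.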
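As a quick illustration, here is a minimal sketch using Python's re module (an assumed engine; the names are illustrative). Note that in Python \S excludes every Unicode whitespace character, which is a slightly wider set than the [ \n\t\r\f] class in the question.

```python
import re

# A non-whitespace char that is NOT followed by another non-whitespace char.
last_char_of_word = re.compile(r"\S(?!\S)")

single_line = last_char_of_word.findall("Hi World!")
print(single_line)  # ['i', '!']

# Newlines and tabs count as whitespace too, so this works across lines.
multi_line = last_char_of_word.findall("foo\nbar\tbaz")
print(multi_line)  # ['o', 'r', 'z']
```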

General "word" case

If a word consists of just letters, digits and underscores, that is, if it is matched with \w+, you may simply use

\w\b

Here, \w matches a "word" char, and the word boundary \b asserts there is no word char right after.

See another regex demo.
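The difference between the two definitions of a word shows up on the question's sample string. Again, this is just a sketch assuming Python's re module, with illustrative names:

```python
import re

text = "Hi World!"

# \w\b: last letter, digit or underscore of each \w+ run; "!" is not a word char.
word_char_ends = re.findall(r"\w\b", text)
print(word_char_ends)  # ['i', 'd']

# \S(?!\S): last character of each whitespace-delimited chunk, punctuation included.
non_space_ends = re.findall(r"\S(?!\S)", text)
print(non_space_ends)  # ['i', '!']
```

So \w\b is only the right choice if trailing punctuation should not count as part of the word.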

Wiktor Stribiżew answered Sep 20 '22 23:09