Regex to return the word before the match

I've been trying to extract the word before the match. For example, I have the following sentence:

"Allatoona was a town located in extreme southeastern Bartow County, Georgia."

I want to extract the word before "County".

I've tried the following regex to extract that word:

\w\sCounty,

What I get back is "w County," when what I want is just the word "Bartow".

Any assistance would be greatly appreciated. Thanks!

You can use this regex with a lookahead to find the word before County:

\w+(?=\s+County)

(?=\s+County) is a positive lookahead: it asserts that one or more whitespace characters followed by the word County come right after the current position, without making them part of the match.
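As a minimal sketch of the lookahead in C#, assuming the sentence from the question is held in a variable named s (the variable names here are illustrative):

```csharp
using System;
using System.Text.RegularExpressions;

// Sample sentence from the question.
var s = "Allatoona was a town located in extreme southeastern Bartow County, Georgia.";

// \w+ consumes the word; (?=\s+County) only checks what follows,
// so " County" is not included in the match.
var m = Regex.Match(s, @"\w+(?=\s+County)");

if (m.Success)
{
    Console.WriteLine(m.Value);  // the word alone: Bartow
    Console.WriteLine(m.Index);  // position of that word within s
}
```

Because the lookahead consumes nothing, m.Value is already the word you want and no group access is needed.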

If you want to avoid the lookahead, you can use a capture group:

(\w+)\s+County

and extract capture group 1 from the match result.
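The capture-group variant could be wrapped roughly like this. TryGetWordBeforeCounty is a hypothetical helper name introduced for illustration, not something from the answer:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: reports whether a word precedes "County" and returns it via 'word'.
static bool TryGetWordBeforeCounty(string text, out string word)
{
    var m = Regex.Match(text, @"(\w+)\s+County");
    // Groups[0] is the whole match (e.g. "Bartow County");
    // Groups[1] is the first parenthesized group, the word itself.
    word = m.Success ? m.Groups[1].Value : "";
    return m.Success;
}

var s = "Allatoona was a town located in extreme southeastern Bartow County, Georgia.";

if (TryGetWordBeforeCounty(s, out var county))
{
    Console.WriteLine(county);
}
```

Checking Success first matters: on a failed match, Groups[1].Value is an empty string rather than an exception, so a missing "County" could otherwise go unnoticed.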

Your \w\sCounty, regex returns w County, because \w matches a single character that is a letter, a digit, or _. It does not match a whole word.

To match one or more characters, add a + quantifier. To capture just the part you need to extract, wrap it in a capturing group, (...).

So you can fix your pattern simply by replacing \w with (\w+) and then, after getting a match, reading Match.Groups[1].Value.

However, if the county name contains a non-word character such as a hyphen, \w+ will match only the part after it. \S+, which matches one or more non-whitespace characters, may be a better option in that case.

See a C# demo:

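using System;
using System.Text.RegularExpressions;

var s = "Allatoona was a town located in extreme southeastern Bartow County, Georgia.";
var m = Regex.Match(s, @"(\S+)\s+County"); // group 1: the non-whitespace run before "County"
if (m.Success)
{
     Console.WriteLine(m.Groups[1].Value);
}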
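To see how \w+ and \S+ differ on a hyphenated name, here is a small comparison. The input sentence is made up for illustration and is not from the question:

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative input with a hyphenated county name (an assumption, not from the question).
var s = "The office is in Miami-Dade County, Florida.";

// \w+ cannot cross the hyphen, so it captures only the part after it.
var word = Regex.Match(s, @"(\w+)\s+County").Groups[1].Value;

// \S+ takes the whole run of non-whitespace characters.
var nonSpace = Regex.Match(s, @"(\S+)\s+County").Groups[1].Value;

Console.WriteLine(word);      // Dade
Console.WriteLine(nonSpace);  // Miami-Dade
```

Note that \S+ is greedy about anything that is not whitespace, so it would also pick up adjacent punctuation such as an opening quote or parenthesis. Which pattern fits better depends on the input you expect.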

Tags:

c#

regex

Andy Evans

2 Answers

anubhava

Wiktor Stribiżew
