I've been trying to extract the word before the match. For example, I have the following sentence:
"Allatoona was a town located in extreme southeastern Bartow County, Georgia."
I want to extract the word before "County", which here is "Bartow".
I've tried the following regex to extract that word:
\w\sCounty,
What I get returned is "w County" when what I wanted is just the word Bartow.
Any assistance would be greatly appreciated. Thanks!
You can use this regex with a lookahead to find the word before County:
\w+(?=\s+County)
(?=\s+County) is a positive lookahead that asserts the presence of one or more whitespace characters followed by the word County ahead of the current match.
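As a rough sketch of how the lookahead behaves in C#, the following assumes a hypothetical helper class LookaheadDemo with a method WordBeforeCounty (neither name comes from the original answer). Because the lookahead is only asserted and not consumed, Match.Value already holds just the word, with no group needed.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class LookaheadDemo
{
    // Returns the word immediately before "County", or an empty string if none is found.
    // (?=\s+County) must hold at the end of \w+, but its text is not part of the match.
    public static string WordBeforeCounty(string input)
    {
        Match m = Regex.Match(input, @"\w+(?=\s+County)");
        return m.Success ? m.Value : "";
    }
}
```

For the sentence from the question, WordBeforeCounty returns "Bartow".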
If you want to avoid a lookahead, you can use a capture group:
(\w+)\s+County
and extract captured group #1 from the match result.
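A minimal sketch contrasting the whole match with group #1, assuming a hypothetical CaptureGroupDemo.Extract helper (not from the original answer) that returns both values as a tuple:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class CaptureGroupDemo
{
    // Returns the whole match and the contents of group 1 for (\w+)\s+County.
    // The whole match includes the whitespace and "County"; group 1 holds only the word.
    public static (string Whole, string Word) Extract(string input)
    {
        Match m = Regex.Match(input, @"(\w+)\s+County");
        return m.Success ? (m.Value, m.Groups[1].Value) : ("", "");
    }
}
```

On the question's sentence, Whole is "Bartow County" and Word is "Bartow", which is why you read the group rather than the match itself.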
Your \w\sCounty, regex returns w County because \w matches a single character that is either a letter, digit, or _. It does not match a whole word.
To match one or more word characters, you need to use the + quantifier, and to capture the part you need to extract, you can rely on capturing groups, (...).
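To see the difference the quantifier and the group each make, here is a sketch that runs three variants of the pattern against the question's sentence. The class QuantifierDemo and its method names are hypothetical, used only for illustration:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class QuantifierDemo
{
    private const string Sample =
        "Allatoona was a town located in extreme southeastern Bartow County, Georgia.";

    // Original pattern: \w consumes exactly one character, so the match is "w County,".
    public static string Original() => Regex.Match(Sample, @"\w\sCounty,").Value;

    // With +, \w repeats and the whole word is consumed: "Bartow County,".
    public static string WithQuantifier() => Regex.Match(Sample, @"\w+\sCounty,").Value;

    // Wrapping \w+ in (...) isolates the word in group 1: "Bartow".
    public static string WithGroup() => Regex.Match(Sample, @"(\w+)\sCounty,").Groups[1].Value;
}
```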
So, you can fix your pattern by merely replacing \w with (\w+) and then, after getting a match, accessing Match.Groups[1].Value.
However, if the county name contains a non-word character, like a hyphen, \w+ won't match the whole name; it will only capture the part after the hyphen. A \S+, which matches one or more non-whitespace characters, might be a better option in that case.
See a C# demo:
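using System;
using System.Text.RegularExpressions;
var s = "Allatoona was a town located in extreme southeastern Bartow County, Georgia.";
var m = Regex.Match(s, @"(\S+)\s+County"); // group 1: the non-whitespace run before "County"
if (m.Success)
{
    Console.WriteLine(m.Groups[1].Value); // prints Bartow
}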
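To make the \w+ versus \S+ trade-off concrete, here is a small sketch comparing both patterns on a hyphenated county name. The class HyphenDemo, its method names, and the sample sentence are hypothetical, chosen only for illustration:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class HyphenDemo
{
    // \w+ only covers letters, digits and underscore, so it stops at a hyphen.
    public static string WithWordChars(string input) =>
        Regex.Match(input, @"(\w+)\s+County").Groups[1].Value;

    // \S+ covers any run of non-whitespace characters, hyphens included.
    public static string WithNonWhitespace(string input) =>
        Regex.Match(input, @"(\S+)\s+County").Groups[1].Value;
}
```

For "It is in Miami-Dade County, Florida.", WithWordChars yields "Dade" while WithNonWhitespace yields "Miami-Dade". The trade-off runs the other way too: \S+ also swallows adjacent punctuation, so text containing "(Bartow County" would yield "(Bartow".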