Trying to use a wildcard in C# to grab information from a webpage source, but I cannot seem to figure out what to use as the wildcard character. Nothing I've tried works!
The wildcard only needs to allow for numbers, but as the page is generated the same every time, I may as well allow for any characters.
Regex statement in use:
Regex guestbookWidgetIDregex = new Regex("GuestbookWidget(' INSERT WILDCARD HERE ', '(.*?)', 500);", RegexOptions.IgnoreCase);
If anyone can figure out what I'm doing wrong, it would be greatly appreciated!
The wildcard character is the dot (.).
To match any number of arbitrary characters, use .* (which means zero or more .) or .+ (which means one or more .).
Note that you need to escape your parentheses as \\( and \\) (or \( and \) in an @"" string).
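As a rough sketch of that advice, the snippet below puts .* in place of the wildcard and escapes the parentheses in an @"" string. The sample input line and the WildcardDemo and ExtractSecondArg names are made up for illustration; the real page source may look different.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WildcardDemo
{
    // Returns the second quoted argument of GuestbookWidget(...), or "" if nothing matches.
    public static string ExtractSecondArg(string source)
    {
        // @"" string: \( and \) are literal parentheses, .* matches any characters.
        Regex regex = new Regex(@"GuestbookWidget\('.*', '(.*?)', 500\);", RegexOptions.IgnoreCase);
        Match m = regex.Match(source);
        return m.Success ? m.Groups[1].Value : "";
    }

    public static void Main()
    {
        // Hypothetical sample line, not taken from the actual page.
        Console.WriteLine(ExtractSecondArg("GuestbookWidget('12345', 'abcdef', 500);")); // abcdef
    }
}
```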
In regular expressions, the dot . matches almost any character. The only characters it doesn't normally match are the newline characters. For the dot to match all characters, you must enable what is called single-line mode (aka "dot all").
In C#, this is specified using RegexOptions.Singleline. You can also embed this as (?s) in the pattern.
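A small sketch of the difference, assuming a two-line input string invented for the example (the DotAllDemo class name is also just illustrative):

```csharp
using System;
using System.Text.RegularExpressions;

public static class DotAllDemo
{
    // Sample text with a newline between 'a' and 'b'.
    public const string Text = "a\nb";

    public static bool Matches(string pattern, RegexOptions options)
    {
        return Regex.IsMatch(Text, pattern, options);
    }

    public static void Main()
    {
        Console.WriteLine(Matches("a.b", RegexOptions.None));       // False: . stops at the newline
        Console.WriteLine(Matches("a.b", RegexOptions.Singleline)); // True: . now matches the newline
        Console.WriteLine(Matches("(?s)a.b", RegexOptions.None));   // True: same option, embedded inline
    }
}
```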
The . isn't the only regex metacharacter. They are:
( ) { } [ ] ? * + - ^ $ . | \
Depending on where they appear, if you want these characters to be taken literally (e.g. . as a period), you may need to do what is called "escaping". This is done by preceding the character with a \.
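To illustrate escaping, here is a short sketch with made-up inputs. It escapes the dot by hand, and also shows Regex.Escape, the .NET helper that escapes a whole literal string for you; the EscapeDemo class and its method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Unescaped dot: matches any single character between 'a' and 'b'.
    public static bool MatchesAnyChar(string input)
    {
        return Regex.IsMatch(input, @"^a.b$");
    }

    // Escaped dot: matches only a literal period between 'a' and 'b'.
    public static bool MatchesLiteralDot(string input)
    {
        return Regex.IsMatch(input, @"^a\.b$");
    }

    public static void Main()
    {
        Console.WriteLine(MatchesAnyChar("axb"));    // True
        Console.WriteLine(MatchesLiteralDot("axb")); // False
        Console.WriteLine(MatchesLiteralDot("a.b")); // True
        // Regex.Escape adds the backslashes for metacharacters in a literal string.
        Console.WriteLine(Regex.Escape("a.b"));      // a\.b
    }
}
```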
Of course, a \ is also an escape character for C# string literals. To get a literal \, you need to double it in your string literal (i.e. "\\" is a string of length one). Alternatively, C# also has what are called @-quoted string literals, where escape sequences are not processed. Thus, the following two strings are equal:
"c:\\Docs\\Source\\a.txt"
@"c:\Docs\Source\a.txt"
Since \ is used a lot in regular expressions, @-quoting is often used to avoid excessive doubling.
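A quick way to convince yourself of this is to compare the two forms directly. The VerbatimDemo class below is only an illustrative sketch:

```csharp
using System;

public static class VerbatimDemo
{
    public const string Regular = "c:\\Docs\\Source\\a.txt";  // backslashes doubled
    public const string Verbatim = @"c:\Docs\Source\a.txt";   // @-quoted, no doubling

    public static void Main()
    {
        Console.WriteLine(Regular == Verbatim); // True
        Console.WriteLine("\\".Length);         // 1
        // The same applies to regex patterns: both literals hold the 3 characters \d+
        Console.WriteLine("\\d+" == @"\d+");    // True
    }
}
```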
Regular expression engines allow you to define character classes, e.g. [aeiou] is a character class containing the 5 vowel letters. You can also use the - metacharacter to define a range, e.g. [0-9] is a character class containing all 10 digit characters.
Since digit characters are so frequently used, regex also provides a shorthand notation for them, which is \d. In C#, this will also match decimal digits from other Unicode character sets, unless you're using RegexOptions.ECMAScript, where it's strictly just [0-9].
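The Unicode behaviour of \d can be checked with a non-ASCII digit. The sketch below uses U+0663 (ARABIC-INDIC DIGIT THREE) as an arbitrary test character; the DigitDemo class is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DigitDemo
{
    // U+0663 ARABIC-INDIC DIGIT THREE: a decimal digit outside the ASCII range.
    public const string ArabicIndicThree = "\u0663";

    public static bool IsDigit(string input, string pattern, RegexOptions options)
    {
        return Regex.IsMatch(input, pattern, options);
    }

    public static void Main()
    {
        Console.WriteLine(IsDigit("7", @"\d", RegexOptions.None));                    // True
        Console.WriteLine(IsDigit(ArabicIndicThree, @"\d", RegexOptions.None));       // True
        Console.WriteLine(IsDigit(ArabicIndicThree, @"\d", RegexOptions.ECMAScript)); // False
        Console.WriteLine(IsDigit(ArabicIndicThree, "[0-9]", RegexOptions.None));     // False
    }
}
```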
It looks like the following will work for you:
  @-quoting                 digits   anything but ', captured
          |                   / \    /     \
new Regex(@"GuestbookWidget\('\d*', '([^']*)', 500\);", RegexOptions.IgnoreCase);
                           \/                     \/
                        escape (               escape )
Note that I've modified the pattern slightly so that it uses a negated character class instead of reluctant wildcard matching. This causes a slight difference in behavior if you allow ' to be escaped in your input string, but neither pattern handles this case perfectly. If you're not allowing ' to be escaped, however, this pattern is definitely better.
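One way to see why the negated character class is the safer choice is to run both patterns over a line containing two widget calls. The input below is invented for the comparison, and the NegatedClassDemo class and its method names are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

public static class NegatedClassDemo
{
    // Reluctant wildcard: (.*?) may run across quotes until ', 500); finally matches.
    public static string CaptureLazy(string source)
    {
        Match m = Regex.Match(source, @"GuestbookWidget\('\d*', '(.*?)', 500\);", RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : "";
    }

    // Negated class: ([^']*) can never cross a quote, so the capture stays inside one argument.
    public static string CaptureNegated(string source)
    {
        Match m = Regex.Match(source, @"GuestbookWidget\('\d*', '([^']*)', 500\);", RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : "";
    }

    public static void Main()
    {
        // Made-up input: the first call ends in 400, only the second ends in 500.
        string source = "GuestbookWidget('1', 'a', 400); GuestbookWidget('2', 'b', 500);";
        Console.WriteLine(CaptureLazy(source));    // a', 400); GuestbookWidget('2', 'b
        Console.WriteLine(CaptureNegated(source)); // b
    }
}
```

The reluctant version starts matching at the first call and keeps expanding until it finds ', 500); so it swallows text from both calls, while the negated class fails on the first call and matches only the second.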