I wrote a script to grab different fields from an HTML file and populate variables with the results. I'm having trouble with the regular expression for grabbing the email address. Here is some sample code:
$txt='<p class=FillText><a name="InternetMail_P3"></a>[email protected]</p>'
$re='.*?'+'([\\w-+]+(?:\\.[\\w-+]+)*@(?:[\\w-]+\\.)+[a-zA-Z]{2,7})'
if ($txt -match $re)
{
$email1=$matches[1]
write-host "$email1"
}
I get the following error:
Bad argument to operator '-match': parsing ".*?([\\w-+]+(?:\\.[\\w-+]+)*@(?:[\\w-]+\\.)+[a-zA-Z]{2,7})([\\w-+]+(?:\\.[\\w-+]+)*@(?:[\\w-]+\\.)+[a-zA-Z]{2,7})" - [x-y] range in reverse order..
At line:7 char:16
+ if ($txt -match <<<< $re)
+ CategoryInfo : InvalidOperation: (:) [], RuntimeException
+ FullyQualifiedErrorId : BadOperatorArgument
What am I missing here? Also, is there a better regex for email?
Thanks in advance.
A regular expression is a pattern used to match text. It can be made up of literal characters, operators, and other constructs. This article demonstrates regular expression syntax in PowerShell. PowerShell has several operators and cmdlets that use regular expressions.
This command syntactically validates the email address [email protected] and returns a boolean value which tells if the address is valid or not. Note that the default verification level is Syntax. To specify a different verification level use the -Level parameter.
PowerShell's regular expression flavor is the .NET implementation, and .NET in turn essentially uses Perl 5's regular expression syntax, with a few added features such as named captures.
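To make the named-capture feature concrete, here is a minimal sketch. The address and the group names (user, domain) are hypothetical and chosen only for illustration; the pattern is deliberately loose and is not meant as an email validator.

```powershell
# Hypothetical sample address, used only for illustration.
$address = 'jane.doe@example.com'

# (?<name>...) defines a named capture group in the .NET regex flavor.
# After a successful -match, $Matches holds each named group under its name.
if ($address -match '^(?<user>[^@]+)@(?<domain>.+)$') {
    $user   = $Matches['user']
    $domain = $Matches['domain']
    Write-Host "user=$user domain=$domain"
}
```

Named groups can make later code easier to read than numeric indexes such as $Matches[1].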
Actually, any regex that works in .NET or C# will also work in PowerShell, and you can find plenty of samples on Stack Overflow and elsewhere on the internet. For example, from How to Find or Validate an Email Address, section The Official Standard: RFC 2822:
$txt='<p class=FillText><a name="InternetMail_P3"></a>[email protected]</p>'
# Single quotes keep the backtick and $ literal; '' is an escaped single quote.
$re='[a-z0-9!#$%&''*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&''*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?'
[regex]::Match($txt, $re, 'IgnoreCase')
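As for the error in the question itself: in a single-quoted PowerShell string the backslash is not an escape character, so \\w reaches the regex engine as an escaped backslash followed by a literal w. Inside the brackets, w-+ is then read as a range from w to +, and since + sorts before w the engine reports a range in reverse order. Below is a minimal sketch of a fix that keeps the question's structure. The address is a placeholder (jane.doe@example.com), because the original one is not shown.

```powershell
# Placeholder input; the address is an assumption, not the one from the question.
$txt = '<p class=FillText><a name="InternetMail_P3"></a>jane.doe@example.com</p>'

# Single backslashes, and the hyphen moved to the end of each class so that
# it is a literal character instead of a range operator.
$re = '([\w+-]+(?:\.[\w+-]+)*@(?:[\w-]+\.)+[a-zA-Z]{2,7})'

if ($txt -match $re) {
    $email1 = $Matches[1]
    Write-Host $email1
}
```

The leading .*? from the original pattern is dropped here because -match already searches anywhere in the input.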
There is another part to this answer, though: regex is by nature not well suited to parsing XML/HTML. You can find more details here: Using regular expressions to parse HTML: why not?
To provide a real solution, I recommend first
When it comes to email validation, I usually choose the short version of RFC 2822:
[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
You can find more info about email validation here
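For validating a whole string rather than searching inside a larger text, the short RFC 2822 pattern can be anchored with ^ and $. The sketch below assumes a hypothetical helper name (Test-EmailSyntax) and made-up sample addresses. It checks syntax only and says nothing about whether the mailbox exists.

```powershell
function Test-EmailSyntax {
    # Hypothetical helper, not a built-in cmdlet.
    param([string]$Address)

    # Single-quoted so that $ and the backtick stay literal; '' is an escaped quote.
    # ^ and $ require the whole string to be an address.
    $pattern = '^[a-z0-9!#$%&''*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&''*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$'

    # -match is case-insensitive by default, so no IgnoreCase option is needed.
    return $Address -match $pattern
}

$good = Test-EmailSyntax 'Jane.Doe@Example.COM'
$bad  = Test-EmailSyntax 'not an address'
```

With these inputs, $good should be $true and $bad should be $false.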