
Using Regex in PowerShell to grab email

I have written a script that grabs different fields from an HTML file and populates variables with the results. I'm having trouble with the regular expression for grabbing the email address. Here is some sample code:

$txt='<p class=FillText><a name="InternetMail_P3"></a>[email protected]</p>'

$re='.*?'+'([\\w-+]+(?:\\.[\\w-+]+)*@(?:[\\w-]+\\.)+[a-zA-Z]{2,7})'

if ($txt -match $re)
{
    $email1=$matches[1]
    write-host "$email1"
}

I get the following error:

Bad argument to operator '-match': parsing ".*?([\\w-+]+(?:\\.[\\w-+]+)*@(?:[\\w-]+\\
.)+[a-zA-Z]{2,7})([\\w-+]+(?:\\.[\\w-+]+)*@(?:[\\w-]+\\.)+[a-zA-Z]{2,7})" - [x-y] range in reverse order..
At line:7 char:16
+ if ($txt -match <<<<  $re)
    + CategoryInfo          : InvalidOperation: (:) [], RuntimeException
    + FullyQualifiedErrorId : BadOperatorArgument

What am I missing here? Also, is there a better regex for email?

Thanks in advance.

gp80586 asked Jul 19 '12 15:07



2 Answers

Any regex that works in .NET or C# will also work in PowerShell, and you can find plenty of samples on Stack Overflow and elsewhere on the web. For example, from "How to Find or Validate an Email Address" (The Official Standard: RFC 2822):

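$txt='<p class=FillText><a name="InternetMail_P3"></a>[email protected]</p>'
# Single-quoted so that $ and ` stay literal (a double-quoted string would drop the backtick); '' escapes the apostrophe.
$re='[a-z0-9!#$%&''*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&''*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?'
# Returns a Match object; its Value property holds the matched address.
[regex]::Match($txt, $re, 'IgnoreCase')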
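
As a side note on the error in the question: inside a single-quoted PowerShell string a backslash is already literal, so `\\w-+` reaches the regex engine as an escaped backslash followed by the range `w-+`, which runs backwards and triggers "[x-y] range in reverse order". The following is a minimal sketch of the original pattern with single backslashes and the hyphen moved to the end of each character class. The address in `$txt` is a hypothetical placeholder (the one in the question is obscured), and the leading `.*?` is dropped because `-match` already searches anywhere in the string.

```powershell
# Hypothetical sample address; the one in the question is obscured.
$txt = '<p class=FillText><a name="InternetMail_P3"></a>user.name@example.com</p>'

# One backslash is enough in a single-quoted string. The hyphen goes last in
# each character class so it cannot be read as a range operator.
$re = '([\w+-]+(?:\.[\w+-]+)*@(?:[\w-]+\.)+[a-zA-Z]{2,7})'

if ($txt -match $re) {
    $email1 = $matches[1]   # first capture group: the address
    Write-Host $email1      # prints user.name@example.com
}
```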

There is a second part to this answer, though. Regex is by nature not well suited to parsing XML/HTML. You can find more details here: Using regular expressions to parse HTML: why not?

For a more robust solution, I recommend the following:

  1. convert HTML → XHTML
  2. walk over XML tree
  3. work with individual nodes one by one, even using regex.
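
The steps above can be sketched roughly as follows. This assumes step 1 has already happened, so the attribute value is quoted and the fragment is well-formed XML. The address, the XPath expression and the simplified pattern are illustrative assumptions, not part of the original answer.

```powershell
# Assumes the HTML was already converted to XHTML (note the quoted attribute).
# The address is a hypothetical placeholder.
[xml]$doc = '<p class="FillText"><a name="InternetMail_P3"></a>user@example.com</p>'

# Simple illustrative pattern, not a full RFC 2822 check.
$re = '[\w.+-]+@(?:[\w-]+\.)+[a-zA-Z]{2,}'

# Step 2: walk the tree. Step 3: apply the regex to each node's text only.
$emails = foreach ($node in $doc.SelectNodes('//p[@class="FillText"]')) {
    if ($node.InnerText -match $re) { $matches[0] }
}
$emails
```

Because the regex only ever sees the text of one node, it no longer has to cope with tags or attributes.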
Akim answered Nov 15 '22 06:11


For email validation, I usually use the short version of the RFC 2822 pattern:

[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?

You can find more info about email validation here
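
For validation rather than extraction, the pattern can be anchored so that the whole string has to be an address. A minimal sketch, assuming a hypothetical helper named `Test-EmailSyntax` and placeholder inputs; it checks syntax only, not whether the mailbox exists.

```powershell
# Short RFC 2822-style pattern in a single-quoted string
# ('' is an escaped apostrophe; $ and ` need no escaping here).
$pattern = '[a-z0-9!#$%&''*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&''*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?'

# Hypothetical helper: anchors the pattern so the entire input must match.
# -match is case-insensitive by default, so uppercase letters are accepted too.
function Test-EmailSyntax([string]$Address) {
    return $Address -match ('^(?:' + $pattern + ')$')
}

Test-EmailSyntax 'user@example.com'   # True
Test-EmailSyntax 'not an address'     # False
```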

Pierluc SS answered Nov 15 '22 07:11