What would be the regular expressions to extract the name and email from strings like these?
[email protected]
John <[email protected]>
John Doe <[email protected]>
"John Doe" <[email protected]>
It can be assumed that the email is valid. The name will be separated from the email by a single space, and might be quoted.
The expected results are:
[email protected]
Name: nil
Email: [email protected]
John <[email protected]>
Name: John
Email: [email protected]
John Doe <[email protected]>
Name: John Doe
Email: [email protected]
"John Doe" <[email protected]>
Name: John Doe
Email: [email protected]
This is my progress so far:
(("?(.*)"?)\s)?(<?(.*@.*)>?)
(which can be tested here: http://regexr.com/?337i5)
In the example, we created a function with the regex /([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)/ to extract email IDs (addresses) from a long text.
To extract emails from text, we can make use of a regular expression. We use the regular expression package to define the pattern of an email ID and then use the findall() function to retrieve the text that matches this pattern.
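The findall() approach described above might look roughly like this in Python. This is only a sketch: the function name `extract_emails` and the sample addresses are hypothetical, and the pattern is a simplified one along the lines of the regex quoted above, not a full validator for email addresses.

```python
import re

# Simplified email pattern (an assumption for illustration, not RFC-complete):
# local part, "@", domain labels, a literal dot, then a final label.
EMAIL_RE = re.compile(r"([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)")


def extract_emails(text):
    """Return every substring of `text` that matches EMAIL_RE, in order."""
    # With a single capturing group, findall() returns a list of that group.
    return EMAIL_RE.findall(text)


if __name__ == "__main__":
    sample = "Contact alice@example.com or bob.smith@example.org for details."
    print(extract_emails(sample))
```

Because the final label excludes the dot, a sentence-ending period directly after an address is not swallowed into the match.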
(?i) makes the regex case insensitive. (?s), for "single line mode", makes the dot match all characters, including line breaks.
\\. matches the literal character "." . The first backslash is interpreted as an escape character by the Emacs string reader, which, combined with the second backslash, inserts a literal backslash character into the string being read. The regular expression engine then receives the string \.
The following regex appears to work on all of the sample inputs above and uses only two capturing groups:
(?:"?([^"]*)"?\s)?(?:<?(.+@[^>]+)>?)
http://regex101.com/r/dR8hL3
Thanks to @RohitJain and @burning_LEGION for introducing the idea of non-capturing groups and character exclusion respectively.
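As a quick check, here is a small Python sketch that applies the regex above to the four input shapes from the question. The helper name `parse_contact` and the address `john@example.com` are made up for illustration, and using `fullmatch` (so the whole string must match) is my own choice, not something the question requires. Python's `None` plays the role of `nil` here.

```python
import re

# The two-group regex from the answer: group 1 is the optional name
# (without quotes), group 2 is the email (without angle brackets).
CONTACT_RE = re.compile(r'(?:"?([^"]*)"?\s)?(?:<?(.+@[^>]+)>?)')


def parse_contact(s):
    """Return (name, email) for a string, or None if it does not match.

    `name` is None when the string holds only an email address.
    """
    m = CONTACT_RE.fullmatch(s)
    if m is None:
        return None
    return m.group(1), m.group(2)


if __name__ == "__main__":
    samples = [
        "john@example.com",
        "John <john@example.com>",
        "John Doe <john@example.com>",
        '"John Doe" <john@example.com>',
    ]
    for s in samples:
        print(s, "->", parse_contact(s))
```

The name group relies on `[^"]*` backtracking to the last whitespace before the address, which is why a multi-word name such as John Doe is captured whole.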