From what I understand, regular expressions are not the best tool for finding email addresses within a given document. Are there any alternatives, or a best practice that I'm unaware of?
For parsing jobs it is usually a good idea to rely on a library. You are right: a mature library has typically dealt with the problem in more detail than a hand-written regular expression, covering edge cases you might not think of.
One Ruby library for parsing emails is Mail:
Mail is an internet library for Ruby that is designed to handle emails generation, parsing and sending in a simple, rubyesque manner.
[...] Mail has been designed with a very simple object oriented system that really opens up the email messages you are parsing, if you know what you are doing, you can fiddle with every last bit of your email directly.
Here is an example of how the email's content is accessed:
mail = Mail.read('/path/to/message.eml')
mail.envelope_from #=> '[email protected]'
mail.from.addresses #=> ['[email protected]', '[email protected]']
mail.sender.address #=> '[email protected]'
mail.to #=> '[email protected]'
mail.cc #=> '[email protected]'
mail.subject #=> "This is the subject"
mail.date.to_s #=> '21 Nov 1997 09:55:06 -0600'
mail.message_id #=> '<[email protected]>'
mail.body.decoded #=> 'This is the body of the email...
It also enables you to parse a multipart email, as well as test and extract the attachments.
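If the goal is to pick addresses out of free text rather than to parse a message file, one possible approach is to split the text into tokens and check each one against the pattern that ships with Ruby's standard library, URI::MailTo::EMAIL_REGEXP, instead of writing your own. This is only a rough sketch: the helper name extract_email_candidates and the punctuation-trimming rule are my own assumptions, and the stdlib pattern is a simplification that does not cover everything RFC 5322 allows.

```ruby
require 'uri'

# Hypothetical helper (not part of any library): returns the tokens in
# `text` that the stdlib pattern accepts as email addresses.
def extract_email_candidates(text)
  text.split(/\s+/)
      # Trim brackets, quotes and sentence punctuation around each token
      # (assumed rule; adjust it for your documents).
      .map { |token| token.gsub(/\A[<("'\[]+|[>)"'\],.;:!?]+\z/, '') }
      # EMAIL_REGEXP is anchored, so it must match the whole token.
      .select { |token| token.match?(URI::MailTo::EMAIL_REGEXP) }
      .uniq
end

sample = "Contact alice@example.com, or Bob <bob@example.org>. Not an address: foo@ or @bar."
found = extract_email_candidates(sample)
puts found
```

Whitespace tokenizing will miss addresses that are glued to other text without a space, so treat the result as a list of candidates rather than a definitive answer.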