I have a string which is like:
Return-Path: [email protected]
Received-SPF: pass (fake.link.com: Sender is authorized to use '[email protected]' in 'mfrom' identity (mechanism 'include:spf.smtp2go.com' matched)) receiver=pmxlab01.permission.email; identity=mailfrom; envelope-from="[email protected]"; helo=e2i353.smtp2go.com; client-ip=103.2.141.97
Received: from e2i353.smtp2go.com (e2i353.smtp2go.com [103.2.141.97])
by mailserver.fake.com(Proxmox) with ESMTP id A4F983E1048
for <[email protected]>; Tue, 24 Aug 2021 14:47:20 +0100 (BST)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
d=smtpcorp.com; s=a1-4; h=Feedback-ID:X-Smtpcorp-Track:Message-Id:Subject:
Date:To:From:Reply-To:Sender:List-Unsubscribe;
bh=cTg4MkkE2uaIjpApjJYQFK3RgYiMF3bwCj8UZjFO4NE=; b=STU7lctit7L5LJ2tA3Re1fe4II
lXJbY/SBXTGqCHh9p4K86aLK5Bvz98Q7eR9xwjFib6x4NoZZ5L1fke0XQERd1eQvxkl9R+kRIGU8A
QOtrLPpt8coN8P+syoaTRR4pDJQG9OfJO1fON9OaOP8HwnEg/91ie6Cm+wQRxjwyat859uAcu89Xv
6/mrcequkSp6kfiQN4goZ7vMYJYfBYuooslbTciaK4SYIfxdINyrrWGA6QhJPobdW0uuedRNY5jBG
OdMbVmm7FTpxDJs51rB1PTIcFQ8W1oypcttqSgCjI+5eMVrabU/IoIxhX5F0Cn3zm7E9CHlaJuLt1
CRXVbwdw==;
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=fake.com; [email protected]; q=dns/txt; s=s575655;
t=1629812840; h=from : subject : to : message-id : date;
bh=cTg4MkkE2uaIjpApjJYQFK3RgYiMF3bwCj8UZjFO4NE=;
b=TEeEsPNLf7Wi6b8aaxE6JvfymfBKYjLq7izcUVrOXTW7sGIznxOA5udhfmDh15Fgp6Qgh
Kv5HX9uPNa8TEeoaJ+gV/4KERuscnc4GXEHwo0eclktx6f6JI5h1/q+qCe34+cN/EweaP5n
iOs+nrzsRuWn/iQ0Yck+b4IXVWHoTW8298xmBNuC1JF4jIVXREJFAC0nACfGU03OlpjDXf/
qvI6Ffnn5YGTNxgIkOdrtymaqOvjG9NM0PWtgSkvsTCJdUvxkrI+rRUG6ixiNi+vifqwvox
aQ6BRnMmeNK7A954Dy9r9r09QzbTthsBsi+lORKH7DntBKhm7Rb5/Q9j0xVA==
Received: from [10.176.58.103] (helo=SmtpCorp) by smtpcorp.com with esmtpsa
(TLS1.2:ECDHE_SECP256R1__RSA_SHA256__AES_256_GCM:256)
(Exim 4.94.2-S2G) (envelope-from <[email protected]>)
id 1mIWls-TRjyEC-AK for [email protected]; Tue, 24 Aug 2021 13:47:20 +0000
Received: from [10.86.20.232] (helo=DESKTOP-69OG2R3)
by smtpcorp.com with esmtpsa (TLS1.2:ECDHE_RSA_SECP256R1__AES_256_GCM:256)
(Exim 4.94.2-S2G) (envelope-from <[email protected]>)
id 1mIWlr-9EFPsz-U0 for [email protected]; Tue, 24 Aug 2021 13:47:19 +0000
MIME-Version: 1.0
From: [email protected]
To: [email protected]
Date: 24 Aug 2021 14:46:30 +0100
Subject: Test Email 2xM9e5Dj
Content-Type: multipart/alternative;
boundary=--boundary_11_ddba370a-13e2-4ffc-8b36-0eb7a5cde80e
Message-Id: <[email protected]>
X-Smtpcorp-Track: 1XmW_r9EFeszl0.JChXLDDjoy7xH
Feedback-ID: 575655m:575655aVI_MaS:575655sNpPp5WOdD
X-Report-Abuse: Please forward a copy of this message, including all headers,
to <[email protected]>
----boundary_11_ddba370a-13e2-4ffc-8b36-0eb7a5cde80e
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
This is a text message
----boundary_11_ddba370a-13e2-4ffc-8b36-0eb7a5cde80e
Content-Type: text/html; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
This is a html message
----boundary_11_ddba370a-13e2-4ffc-8b36-0eb7a5cde80e--
This is stored in a variable called $emailText.
I'm trying to use a regex to extract the From line from the text:
From: [email protected]
My regex skills aren't very strong, but in my testing this appears to work: (?<=From: ).*.
But when I try to extract the text in bash, I can't get the regex to match:
echo [[ $emailText =~ (?<=From: ).*. ]]
bash regex doesn't support lookbehind or lookahead assertions.
It is much easier to use a non-regex approach using awk here:
awk -F ': ' '$1 == "From" {print $2}' <<< "$emailText"
[email protected]
With bash:
[[ "$emailText" =~ From:\ ([^$'\n']*) ]] && echo "${BASH_REMATCH[1]}"
Output:
[email protected]
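One caveat with the pattern above is that it matches `From: ` anywhere in the text, for example inside an `X-From:` header (a made-up header name, used here only for illustration). Bash's `=~` has no multiline mode, so one rough way to anchor the match to the start of a line is to accept either the start of the string or a newline before `From:`. The following is a minimal sketch under those assumptions, not a full header parser; the function name `extract_from` and the sample addresses are hypothetical.

```bash
#!/usr/bin/env bash
# extract_from: hypothetical helper that prints the value of the first
# "From:" header that starts a line; returns 1 when there is none.
extract_from() {
  local text=$1
  local nl=$'\n'
  # Keeping the pattern in a variable avoids quoting problems inside [[ ]].
  # Group 1 is the line start (string start or newline), group 2 is the value.
  local re="(^|${nl})From: ([^${nl}]*)"
  if [[ $text =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[2]}"
  else
    return 1
  fi
}

# Sample input (made-up addresses).
sample=$'X-From: other@example.com\nFrom: sender@example.com\nTo: rcpt@example.com'
extract_from "$sample"   # prints sender@example.com
```

Because the function returns a non-zero status when no header is found, it can be used directly in a condition such as `if sender=$(extract_from "$emailText"); then ...; fi`.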
With your shown samples, please try the following awk code. In short, it checks whether the 1st field is From: and, if so, prints the 2nd field of that line.
awk '$1=="From:"{print $2}' Input_file
2nd solution: If there is only one From: entry in the whole file, try the following, which uses exit to stop reading Input_file after printing the matched line and so avoids reading the rest of the file unnecessarily.
awk '$1=="From:"{print $2;exit}' Input_file
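Assuming you only want the email address, here's a quick and dirty Awk script. It stops at the first blank line (the end of the headers) and prints the text between the angle brackets when they are present.
awk '/^$/ { exit 1 }
  /^From: .* <[^<>@]+@[^<>]+>/ {
    split($0, g, /[<>]/); print g[2]; exit }
  /^From: / { print $2; exit }' file.eml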
This should work correctly for all these cases:
From: Real Name <[email protected]>
From: "Name, Real" <[email protected]>
From: [email protected]
From: [email protected] (Real Name)
From: =?utf-8?q?Real_N=A3=E4me?= <[email protected]>
As especially the last example should convince you, you will need significantly more work if you also need the full name of the correspondent in normalized form.
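As a quick sanity check, the script can be wrapped in a shell function and fed a few of the header forms listed above. This is only a sketch: the function name `from_address` and the `example.com` addresses are placeholders, and the element printed is `g[2]` (here `parts[2]`), the text between the angle brackets.

```sh
# from_address: hypothetical wrapper; reads a message on stdin and prints
# the address from the first From: header, stopping at the end of the headers.
from_address() {
  awk '
    /^$/ { exit 1 }                        # blank line: headers are over
    /^From: .* <[^<>@]+@[^<>]+>/ {         # "Name <user@host>" form
      split($0, parts, /[<>]/)
      print parts[2]                       # text between the angle brackets
      exit
    }
    /^From: / { print $2; exit }           # bare "user@host" form
  '
}

# Made-up sample headers covering three of the forms.
printf 'From: Real Name <sender@example.com>\n' | from_address
printf 'From: "Name, Real" <sender@example.com>\n' | from_address
printf 'From: sender@example.com (Real Name)\n' | from_address
```

Each of the three calls prints `sender@example.com`. A `From:` line that appears only after the first blank line (that is, in the body) is ignored, because the script exits at the end of the headers.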
If the From line should contain only a mail address, you can match it with awk (no lookarounds needed, which awk does not support anyway):
awk 'match($0, /^From: [^[:space:]@]+@[^[:space:]@]+$/) {
print $2
}' <<< "$emailText"
Output
[email protected]