 

Extracting string to variable using regex bash

Tags: regex, bash, awk

I have a string like this:

Return-Path: [email protected]
Received-SPF: pass (fake.link.com: Sender is authorized to use '[email protected]' in 'mfrom' identity (mechanism 'include:spf.smtp2go.com' matched)) receiver=pmxlab01.permission.email; identity=mailfrom; envelope-from="[email protected]"; helo=e2i353.smtp2go.com; client-ip=103.2.141.97
Received: from e2i353.smtp2go.com (e2i353.smtp2go.com [103.2.141.97])
    by mailserver.fake.com(Proxmox) with ESMTP id A4F983E1048
    for <[email protected]>; Tue, 24 Aug 2021 14:47:20 +0100 (BST)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
    d=smtpcorp.com; s=a1-4; h=Feedback-ID:X-Smtpcorp-Track:Message-Id:Subject:
    Date:To:From:Reply-To:Sender:List-Unsubscribe;
    bh=cTg4MkkE2uaIjpApjJYQFK3RgYiMF3bwCj8UZjFO4NE=; b=STU7lctit7L5LJ2tA3Re1fe4II
    lXJbY/SBXTGqCHh9p4K86aLK5Bvz98Q7eR9xwjFib6x4NoZZ5L1fke0XQERd1eQvxkl9R+kRIGU8A
    QOtrLPpt8coN8P+syoaTRR4pDJQG9OfJO1fON9OaOP8HwnEg/91ie6Cm+wQRxjwyat859uAcu89Xv
    6/mrcequkSp6kfiQN4goZ7vMYJYfBYuooslbTciaK4SYIfxdINyrrWGA6QhJPobdW0uuedRNY5jBG
    OdMbVmm7FTpxDJs51rB1PTIcFQ8W1oypcttqSgCjI+5eMVrabU/IoIxhX5F0Cn3zm7E9CHlaJuLt1
    CRXVbwdw==;
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=fake.com; [email protected]; q=dns/txt; s=s575655;
 t=1629812840; h=from : subject : to : message-id : date;
 bh=cTg4MkkE2uaIjpApjJYQFK3RgYiMF3bwCj8UZjFO4NE=;
 b=TEeEsPNLf7Wi6b8aaxE6JvfymfBKYjLq7izcUVrOXTW7sGIznxOA5udhfmDh15Fgp6Qgh
 Kv5HX9uPNa8TEeoaJ+gV/4KERuscnc4GXEHwo0eclktx6f6JI5h1/q+qCe34+cN/EweaP5n
 iOs+nrzsRuWn/iQ0Yck+b4IXVWHoTW8298xmBNuC1JF4jIVXREJFAC0nACfGU03OlpjDXf/
 qvI6Ffnn5YGTNxgIkOdrtymaqOvjG9NM0PWtgSkvsTCJdUvxkrI+rRUG6ixiNi+vifqwvox
 aQ6BRnMmeNK7A954Dy9r9r09QzbTthsBsi+lORKH7DntBKhm7Rb5/Q9j0xVA==
Received: from [10.176.58.103] (helo=SmtpCorp) by smtpcorp.com with esmtpsa
 (TLS1.2:ECDHE_SECP256R1__RSA_SHA256__AES_256_GCM:256)
 (Exim 4.94.2-S2G) (envelope-from <[email protected]>)
 id 1mIWls-TRjyEC-AK for [email protected]; Tue, 24 Aug 2021 13:47:20 +0000
Received: from [10.86.20.232] (helo=DESKTOP-69OG2R3)
 by smtpcorp.com with esmtpsa (TLS1.2:ECDHE_RSA_SECP256R1__AES_256_GCM:256)
 (Exim 4.94.2-S2G) (envelope-from <[email protected]>)
 id 1mIWlr-9EFPsz-U0 for [email protected]; Tue, 24 Aug 2021 13:47:19 +0000
MIME-Version: 1.0
From: [email protected]
To: [email protected]
Date: 24 Aug 2021 14:46:30 +0100
Subject: Test Email 2xM9e5Dj
Content-Type: multipart/alternative;
 boundary=--boundary_11_ddba370a-13e2-4ffc-8b36-0eb7a5cde80e
Message-Id: <[email protected]>
X-Smtpcorp-Track: 1XmW_r9EFeszl0.JChXLDDjoy7xH
Feedback-ID: 575655m:575655aVI_MaS:575655sNpPp5WOdD
X-Report-Abuse: Please forward a copy of this message, including all headers,
 to <[email protected]>


----boundary_11_ddba370a-13e2-4ffc-8b36-0eb7a5cde80e
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable

This is a text message
----boundary_11_ddba370a-13e2-4ffc-8b36-0eb7a5cde80e
Content-Type: text/html; charset=us-ascii
Content-Transfer-Encoding: quoted-printable

This is a html message
----boundary_11_ddba370a-13e2-4ffc-8b36-0eb7a5cde80e--


This is stored in a variable called $emailText.

I'm trying to use a regex to extract the From part of the text:

From: [email protected]

My regex isn't super strong, but in my testing this seems to work: (?<=From: ).*.

But when I try to extract the text in bash, I can't get the regex to work:

echo [[ $emailText =~ (?<=From: ).*. ]]

asked Aug 24 '21 14:08 by MissCoder87



5 Answers

Bash regex doesn't support lookbehind or lookahead assertions: the =~ operator uses POSIX extended regular expressions.

It is much easier to use a non-regex approach with awk here:

awk -F ': ' '$1 == "From" {print $2}' <<< "$emailText"

[email protected]
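Since the question title asks for the result in a variable, here is a minimal sketch that captures the awk output with command substitution. The sample emailText and its example.com addresses are hypothetical stand-ins for the obfuscated addresses above, and exit is added (an assumption, not part of the answer) so that only the first From line is printed.

```bash
# Hypothetical sample headers standing in for the real $emailText.
emailText=$'Return-Path: sender@example.com\nFrom: sender@example.com\nTo: rcpt@example.net\nSubject: Test'

# Split each line on ": ", print the value of the first From header,
# and store awk's output in a shell variable.
from=$(awk -F ': ' '$1 == "From" { print $2; exit }' <<< "$emailText")
printf '%s\n' "$from"   # prints sender@example.com
```

Note that -F ': ' splits on every ": ", so a header value that itself contains ": " would be truncated at that point.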
answered Oct 12 '22 10:10 by anubhava


With bash:

[[ "$emailText" =~ From:\ ([^$'\n']*) ]] && echo "${BASH_REMATCH[1]}"

Output:

[email protected]
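One caveat with the unanchored pattern: a header such as X-Original-From: would match first. Bash regexes have no multiline mode, so ^ only matches at the very start of the string. The sketch below is one way to anchor "From: " to the start of a line by keeping the regex in a variable; the sample text and addresses are hypothetical.

```bash
# Hypothetical sample; the X-Original-From line shows why anchoring matters.
emailText=$'X-Original-From: other@example.org\nFrom: sender@example.com\nTo: rcpt@example.net'

# Keeping the regex in a variable avoids quoting problems inside [[ ]].
# (^|\n) requires "From: " at the start of the string or right after a newline.
re=$'(^|\n)From: ([^\n]*)'

if [[ $emailText =~ $re ]]; then
    from=${BASH_REMATCH[2]}   # group 2 holds the text after "From: "
    printf '%s\n' "$from"     # prints sender@example.com
fi
```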
answered Oct 12 '22 12:10 by Cyrus


With your shown samples and attempts, please try the following awk code. Simple explanation: check whether the 1st field is From: and, if so, print the 2nd field of that line.

awk '$1=="From:"{print $2}' Input_file

2nd solution: If there is only 1 From: entry in the whole file, try the following, which uses exit to stop right after printing the matched line, so awk doesn't needlessly read the rest of Input_file.

awk '$1=="From:"{print $2;exit}' Input_file
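The same field comparison can be wrapped in a small reusable function. This is a sketch under assumptions: get_header is a hypothetical helper name, it reads from stdin instead of Input_file, compares header names case-insensitively (mail header names are case-insensitive), and stops at the blank line that ends the headers so a "From:" line in the body is never picked up.

```bash
# get_header (hypothetical): print the first value of the named header on stdin.
get_header() {
    awk -v name="$1" '
        /^$/ { exit }                                          # end of headers
        tolower($1) == tolower(name ":") { print $2; exit }    # first matching header
    '
}

emailText=$'from: sender@example.com\nTo: rcpt@example.net\n\nFrom: body@example.org'
from=$(get_header From <<< "$emailText")
printf '%s\n' "$from"   # prints sender@example.com
```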
answered Oct 12 '22 12:10 by RavinderSingh13


Assuming you only want the email address itself, here's a quick and dirty Awk script.

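awk '/^$/ { exit 1 }                      # blank line = end of headers, no From: found
    /^From: .* <[^<>@]+@[^<>]+>/ {         # "Real Name <address>" form
        split($0, g, /[<>]/); print g[2]; exit }   # g[2] is the text between < and >
    /^From: / { print $2; exit }' file.eml   # bare address form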

This should work correctly for all these cases:

From: Real Name <[email protected]>
From: "Name, Real" <[email protected]>
From: [email protected]
From: [email protected] (Real Name)
From: =?q?utf-8?Real_N=A3=E4me?= <[email protected]>

As the last example in particular should convince you, you will need significantly more work if you also want the correspondent's full name in normalized form.
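If only the address is needed, the cases listed above can also be handled without awk, using bash parameter expansion on a single From line. This is a sketch, not the answer's method: from_address is a hypothetical helper, and it assumes the line has already been isolated and is not folded across several lines.

```bash
# from_address (hypothetical): print only the address from one "From:" line.
from_address() {
    local line=$1 addr
    if [[ $line == *'<'*'>'* ]]; then
        addr=${line##*<}     # drop everything up to the last "<"
        addr=${addr%%>*}     # drop the ">" and anything after it
    else
        addr=${line#From: }  # bare form: drop the header name
        addr=${addr%% *}     # and any trailing " (Real Name)" comment
    fi
    printf '%s\n' "$addr"
}

from_address 'From: "Name, Real" <real@example.com>'   # prints real@example.com
from_address 'From: real@example.com (Real Name)'      # prints real@example.com
```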

answered Oct 12 '22 11:10 by tripleee


If an email address should be present, you can match it first using awk (no lookarounds needed, which bash regex doesn't support anyway):

awk 'match($0, /^From: [^[:space:]@]+@[^[:space:]@]+$/) {
  print $2
}' <<< "$emailText"

Output

[email protected]
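The pattern above only matches when the whole line is a bare address. A variation, sketched here under the assumption that your awk supports POSIX character classes such as [:space:] (gawk and recent mawk do), uses the RSTART and RLENGTH variables that match() sets, so substr() can pull the address out of a "Real Name <address>" line as well. The sample text is hypothetical.

```bash
emailText=$'To: rcpt@example.net\nFrom: Real Name <real@example.com>\nSubject: Test'

# match() stores the start and length of the matched text in RSTART/RLENGTH;
# substr() then extracts just the address-like token, even inside <...>.
from=$(awk '/^From: / && match($0, /[^[:space:]<>@]+@[^[:space:]<>@]+/) {
    print substr($0, RSTART, RLENGTH); exit
}' <<< "$emailText")
printf '%s\n' "$from"   # prints real@example.com
```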
answered Oct 12 '22 12:10 by The fourth bird