Match only the first paragraph using bash

Question

We have

...a file containing paragraphs separated by two newlines (\n\n or \r\n\r\n). The paragraphs themselves may contain single newlines (\n or \r\n). The goal is to use a Bash one-liner to match only the first paragraph and print it to stdout.

E.G.:

$ cat foo.txt
Foo
* Bar

Baz
* Foobar

Even more stuff to match here.

results in:

$ cat foo.txt | <some-command>
Foo
* Bar

I've already tried

...this regex (?s)(.+?)(\r?\n){2}|.+?$ with grep using

Git Bash on Windows (GNU grep 3.1),
Bash on Lubuntu 20.04.1 LTS (GNU grep 3.4) and
iTerm+Fish on Mac (BSD grep 2.5.1-FreeBSD).

The first two environments both produced:

$ grep -Poz '(?s)(.+?)(\r?\n){2}|.+?$' foo.txt
Foo
* Bar

Baz
* Foobar

The approach on Mac failed, due to differences between BSD grep and GNU grep.

But

...on regex101.com this regex works on foo.txt: https://regex101.com/r/uoej8O/1. Could this be because the global flag is disabled there?
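One way to check the "global flag" suspicion is to look at what grep -o does on a tiny input. The sketch below is only an illustration with a made-up sample string, not the command from the question: it shows that -o prints every non-overlapping match on a line, which behaves much like a global search, so keeping just the first match takes an extra step.

```shell
# grep -o prints each non-overlapping match on its own output line,
# so a single input line can yield several results.
printf 'Foo Bar Baz\n' | grep -o '[[:alpha:]][[:alpha:]]*'

# Keeping only the first match therefore needs a second step, e.g. head.
printf 'Foo Bar Baz\n' | grep -o '[[:alpha:]][[:alpha:]]*' | head -n 1
```

Note that with -z the whole file is treated as one record and each match is terminated by a NUL byte instead of a newline, so head -n 1 would not cut the output of the original command in the same way.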

anubhava · Accepted Answer

This is a tailor-made problem for GNU awk with a custom record separator. We can use a custom RS that splits the file data on 2 or more occurrences of an optional \r followed by \n:

awk -v RS='(\r?\n){2,}' 'NR == 1' file

This outputs:

Foo
* Bar

If you want awk to be more efficient when input is very big:

awk -v RS='(\r?\n){2,}' '{print; exit}' file
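The regex record separator above is a GNU awk solution. For systems without GNU awk (such as the Mac from the question), a possible alternative is awk's POSIX "paragraph mode", where an empty RS splits records on blank lines. This is a different technique from the one in this answer, sketched under the assumption that stripping every \r from the input is acceptable; first_paragraph is a hypothetical helper name, and the sample file only mimics foo.txt.

```shell
# Hypothetical helper: print the first blank-line-separated paragraph of a file.
first_paragraph() {
  # Remove carriage returns so that CRLF blank lines become truly empty,
  # then let awk's paragraph mode (empty RS) split records on blank lines.
  tr -d '\r' < "$1" | awk -v RS= '{ print; exit }'
}

# Sample input with CRLF line endings, mirroring foo.txt from the question.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf 'Foo\r\n* Bar\r\n\r\nBaz\r\n* Foobar\r\n\r\nEven more stuff to match here.\r\n' > "$sample"

first_paragraph "$sample"
```

With this sample the call should print the two lines Foo and * Bar. Unlike the RS regex, this variant also discards any \r inside the paragraph itself.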

James Brown · Answer

For GNU awk, if the paragraphs are separated by \n\n or \r\n\r\n:

$ awk -v RS="\r?\n\r?\n" '{print $0;exit}' file

Output:

Foo
* Bar

Wiktor Stribiżew · Answer

You can use a GNU grep like this:

grep -Poz '(?s)^.+?(?=\R{2}|$)' file

See the PCRE regex demo.

Details

(?s) - a DOTALL inline modifier that makes . match all chars including linebreak chars
^ - start of the whole string
.+? - any 1 or more chars, as few as possible
(?=\R{2}|$) - a positive lookahead that matches a location immediately followed with a double line break sequence (\R{2}) or end of string ($).
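The lookahead above expresses "stop right before the first blank line or the end of input". Since the question is also tagged sed, the same idea can be sketched without PCRE support. This is a separate technique, not part of the grep answer, and it assumes that the file does not start with a blank line and that dropping \r characters is acceptable; first_paragraph_sed is a hypothetical helper name.

```shell
# Hypothetical helper: print lines up to, but not including, the first blank line.
first_paragraph_sed() {
  # tr strips carriage returns so CRLF blank lines become empty lines.
  # sed -n disables automatic printing; the first expression quits at the
  # first empty line, the second prints every line seen before that.
  tr -d '\r' < "$1" | sed -n -e '/^$/q' -e 'p'
}

# Sample input mirroring foo.txt from the question (LF line endings here).
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf 'Foo\n* Bar\n\nBaz\n* Foobar\n\nEven more stuff to match here.\n' > "$sample"

first_paragraph_sed "$sample"
```

If the input has no blank line at all, sed simply prints the whole file, which corresponds to the end-of-string branch of the lookahead.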
