...a file containing paragraphs, separated by two newlines (\r\n\r\n or \n\n). The paragraphs themselves may contain single newlines (\r\n or \n). The goal is to use a Bash one-liner to match only the first paragraph and print it to stdout.
$ cat foo.txt
Foo
* Bar

Baz
* Foobar

Even more stuff to match here.
Expected result:
$ cat foo.txt | <some-command>
Foo
* Bar
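To try any of the commands against both line-ending styles, the sample can be generated rather than typed. This is a minimal sketch, assuming blank lines separate the three paragraphs of foo.txt; the file names foo_lf.txt and foo_crlf.txt are placeholders, not part of the original question.

```shell
# Write the sample with LF line endings; blank lines separate the paragraphs.
printf 'Foo\n* Bar\n\nBaz\n* Foobar\n\nEven more stuff to match here.\n' > foo_lf.txt

# Derive a CRLF copy by ending every line with \r\n instead of \n.
awk '{ printf "%s\r\n", $0 }' foo_lf.txt > foo_crlf.txt

# Make the otherwise invisible line endings visible.
od -c foo_crlf.txt | head -n 3
```

Both files hold the same text, so a command that handles the two separators correctly should select the same first paragraph from each.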
...this regex (?s)(.+?)(\r?\n){2}|.+?$ with grep
using
The first two approaches resulted in:
$ grep -Poz '(?s)(.+?)(\r?\n){2}|.+?$' foo.txt
Foo
* Bar
Baz
* Foobar
The approach on Mac failed, due to differences between BSD grep and GNU grep.
... on regex101.com this regex works on foo.txt: https://regex101.com/r/uoej8O/1. This may be due to disabling the global flag?
This is a tailor-made problem for GNU awk with a custom record separator. We can set RS to a regex that splits the input on two or more line breaks, each being an optional \r followed by \n:
awk -v RS='(\r?\n){2,}' 'NR == 1' file
This outputs:
Foo
* Bar
If you want awk to be more efficient when the input is very big, exit after the first record instead of reading the rest of the file:
awk -v RS='(\r?\n){2,}' '{print; exit}' file
For GNU awk, if the paragraphs are separated by \r\n\r\n or \n\n:
$ awk -v RS="\r?\n\r?\n" '{print $0;exit}' file
Output:
Foo
* Bar
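Both awk answers rely on a regex RS, which POSIX leaves unspecified and which awks other than GNU awk may not support. As a hedged alternative (not taken from the answers above), a line-based sketch avoids RS entirely: it skips any leading blank lines, prints lines until the next blank line, then exits. The function name first_paragraph is a placeholder chosen for illustration.

```shell
# Hypothetical helper, not from the answers: print the first paragraph of the
# given file(s), or of stdin when no file is given.
first_paragraph() {
  # A "blank" line is empty or holds only a carriage return (CRLF input).
  # Blank lines before the first paragraph are skipped; the first blank line
  # after it ends the output.
  awk '/^\r?$/ { if (seen) exit; next } { seen = 1; print }' "$@"
}

# LF-separated input
printf 'Foo\n* Bar\n\nBaz\n* Foobar\n' | first_paragraph

# CRLF-separated input
printf 'Foo\r\n* Bar\r\n\r\nBaz\r\n' | first_paragraph
```

Unlike the RS-based commands, this keeps each line's own ending, so with CRLF input the printed lines still end in \r\n.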
You can use GNU grep like this:
grep -Poz '(?s)^.+?(?=\R{2}|$)' file
See the PCRE regex demo.
Details
(?s) - a DOTALL inline modifier that makes . match all chars, including line break chars
^ - start of the whole string
.+? - any 1 or more chars, as few as possible
(?=\R{2}|$) - a positive lookahead that matches a location immediately followed by a double line break sequence (\R{2}) or the end of the string ($).