...a file containing paragraphs separated by two newlines (\r\n\r\n or \n\n). The paragraphs themselves may contain single newlines (\r\n or \n). The goal is to use a Bash one-liner that matches only the first paragraph and prints it to stdout.
$ cat foo.txt
Foo
* Bar

Baz
* Foobar
Even more stuff to match here.
Running it through the desired command should give:
$ cat foo.txt | <some-command>
Foo
* Bar
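To try the commands on this page, here is a minimal sketch that recreates the sample as foo.txt with LF line endings, plus a CRLF copy named foo_crlf.txt (a hypothetical file name, not from the question). It assumes the blank line after "* Bar" is the only paragraph break, which is what the expected output implies.

```bash
#!/usr/bin/env bash
# Recreate the question's sample with LF line endings.
# Assumption: the only blank line (paragraph break) is the one after "* Bar".
printf '%s\n' 'Foo' '* Bar' '' 'Baz' '* Foobar' 'Even more stuff to match here.' > foo.txt

# Hypothetical CRLF copy: re-emit every line with a \r before the \n.
awk '{ printf "%s\r\n", $0 }' foo.txt > foo_crlf.txt
```

Generating the CRLF copy with awk rather than sed keeps it portable, since BSD sed does not interpret \r in a replacement.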
...this regex (?s)(.+?)(\r?\n){2}|.+?$ with grep using
The first two approaches resulted in:
$ grep -Poz '(?s)(.+?)(\r?\n){2}|.+?$' foo.txt
Foo
* Bar
Baz
* Foobar
The approach failed on Mac due to differences between BSD grep and GNU grep.
... On regex101.com, this regex works on foo.txt: https://regex101.com/r/uoej8O/1. Could that be because the global flag is disabled there?
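A likely explanation: without the global flag, regex101 reports only the first match, while grep -o prints every match it finds, so grep also prints the second paragraph. With -z, GNU grep ends each printed match with a NUL byte, so one GNU-only workaround (a sketch, not taken from the answers below) is to keep just the first NUL-terminated match with GNU head -z. The first alternative of the regex also consumes the two line breaks, which is why they are stripped at the end.

```bash
#!/usr/bin/env bash
# grep -o prints every match; with -z each match is terminated by a NUL byte.
# GNU head -z -n 1 keeps only the first NUL-terminated match,
# and tr deletes that NUL before the text reaches the shell.
first=$(printf 'Foo\n* Bar\n\nBaz\n* Foobar\n' |
  grep -Poz '(?s)(.+?)(\r?\n){2}|.+?$' |
  head -z -n 1 |
  tr -d '\0')

# Command substitution drops the trailing newlines that the
# (\r?\n){2} part of the regex included in the match.
printf '%s\n' "$first"
```

This still depends on GNU grep (-P) and GNU head (-z), so it does not solve the macOS case.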
This is a tailor-made problem for GNU awk, which accepts a regular expression as a custom record separator. We can set RS to a regex that splits the file data on 2 or more occurrences of an optional \r followed by \n:
awk -v RS='(\r?\n){2,}' 'NR == 1' file
This outputs:
Foo
* Bar
If the input is very big, make awk exit right after printing the first record so it does not read the rest of the file:
awk -v RS='(\r?\n){2,}' '{print; exit}' file
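A minimal sketch of the record-separator approach wrapped in a shell function (first_paragraph is a hypothetical name) and checked against LF and CRLF input. It uses \r?\n(\r?\n)+, which matches the same text as (\r?\n){2,} but avoids the interval expression, since some awk implementations (older mawk releases, for example) do not support {n,m}. It assumes an awk that treats a multi-character RS as a regex, as gawk and mawk do; POSIX leaves a multi-character RS unspecified.

```bash
#!/usr/bin/env bash
# Hypothetical helper. RS is a regex: a line break (optional \r before \n)
# followed by one or more further line breaks, i.e. at least one blank line.
# "exit" stops awk after the first record, so the rest of the input is not read.
first_paragraph() {
  awk -v RS='\r?\n(\r?\n)+' '{ print; exit }' "$@"
}

lf=$(printf 'Foo\n* Bar\n\nBaz\n* Foobar\n' | first_paragraph)
crlf=$(printf 'Foo\r\n* Bar\r\n\r\nBaz\r\n' | first_paragraph)

printf '%s\n' "$lf"
# A CRLF record keeps the \r of its inner line break; cat -v shows it as ^M.
printf '%s\n' "$crlf" | cat -v
```

With no file argument, "$@" expands to nothing and awk reads stdin, so the function works both as first_paragraph foo.txt and in a pipe.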
With GNU awk, if the paragraphs are separated by \r\n\r\n or \n\n:
$ awk -v RS="\r?\n\r?\n" '{print $0;exit}' file
Output:
Foo
* Bar
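Because each \r in RS="\r?\n\r?\n" is optional on its own, the separator also matches mixed sequences such as \r\n\n. The sketch below uses a made-up mixed-ending sample (the sample function name is hypothetical) to show that, and to show that the \r of the paragraph's inner line break stays in the printed record. The gsub variant is an optional extra step, not part of the answer.

```bash
#!/usr/bin/env bash
# Hypothetical sample: CRLF inside the first paragraph, "\r\n\n" as the break.
sample() { printf 'Foo\r\n* Bar\r\n\nBaz\n'; }

# The answer's command: RS is a regex for two line breaks,
# each of which may be preceded by its own optional \r.
raw=$(sample | awk -v RS="\r?\n\r?\n" '{print $0;exit}')

# Optional extra step: delete every \r in the record so the
# paragraph is printed with plain \n line endings.
clean=$(sample | awk -v RS="\r?\n\r?\n" '{gsub(/\r/, ""); print; exit}')

printf '%s\n' "$clean"
```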
You can use GNU grep like this:
grep -Poz '(?s)^.+?(?=\R{2}|$)' file
See the PCRE regex demo.
Details
(?s) - a DOTALL inline modifier that makes . match all chars, including line break chars
^ - start of the whole string
.+? - any 1 or more chars, as few as possible
(?=\R{2}|$) - a positive lookahead that matches a location immediately followed by a double line break sequence (\R{2}) or the end of the string ($).
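A quick check of the lookahead version against LF and CRLF input, as a sketch assuming GNU grep built with PCRE support (-P); first_paragraph_grep is a hypothetical wrapper name. Because of -z, the match is printed with a trailing NUL byte, which tr removes here. \R matches \r\n as a single line break, so one CRLF inside a paragraph is not mistaken for \R{2}, and the inner \r is kept in the output.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the answer's command.
# -z makes the whole input a single NUL-terminated record, so ^ is the
# start of the input; the match is printed with a trailing NUL, which tr deletes.
first_paragraph_grep() {
  grep -Poz '(?s)^.+?(?=\R{2}|$)' "$@" | tr -d '\0'
}

lf=$(printf 'Foo\n* Bar\n\nBaz\n* Foobar\n' | first_paragraph_grep)
crlf=$(printf 'Foo\r\n* Bar\r\n\r\nBaz\r\n' | first_paragraph_grep)

printf '%s\n' "$lf"
```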