Regex to match all until blank line in Javascript

Question

What I am trying to accomplish is matching all text as well as blank lines in a file until it finds a completely blank line. The text itself looks something like this:

===Substantiv===

Det var en gång en liten höna som gick på bio, fast det visste hon inte först. Alltså visste hon inte. Fast ändå var det ganska roligt för henne.

==Annat==

Trots att det var roligt var det inte det.

What I would like to match is everything from "===Substantiv===" to the blank line just above "==Annat==". Since there happen to be more lines with three equal signs, I would also like the code to be somewhat easy to change to another word rather than "===Substantiv===".

What I have tried so far, using regex, is something like:

===Adjektiv(.|\n)+

But as you can probably tell from the structure of that pattern, nothing makes it stop at a blank line, so it keeps matching all the way to the end of the file instead of ending where I would like it to.

Best regards,

Wiktor Stribiżew · Accepted Answer

You may use

/===Substantiv===(.*(?:\r?\n(?!\r?\n).*)*)/g
                  ^^^^^^^^^^^^^^^^^^^^^^^

See the regex demo, your value is inside Group 1. You may trim it after a match is found.

The .*(?:\r?\n(?!\r?\n).*)* part captures into Group 1 any zero or more chars other than line break chars (.*), then zero or more occurrences (due to (?:...)*) of a line break sequence (\r\n or \n, see \r?\n) that is not followed by another line break sequence (see the negative lookahead (?!\r?\n)), and then any zero or more chars other than line break chars.
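Since the question asks for the heading to be easy to swap, here is a minimal sketch that builds the same pattern from a heading string. The helper names escapeRegExp and sectionAfter, and the sample text, are assumptions made for illustration and are not part of the original answer.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: return the text that follows `heading`
// up to (not including) the first completely blank line, or null if absent.
function sectionAfter(heading, text) {
  // Same unrolled pattern as above; backslashes are doubled inside the string.
  var pattern = new RegExp(escapeRegExp(heading) + "(.*(?:\\r?\\n(?!\\r?\\n).*)*)");
  var m = pattern.exec(text);
  return m ? m[1].trim() : null;
}

// Made-up sample text with two triple-equals headings.
var sample = "===Substantiv===\nen höna\nen bio\n\n===Adjektiv===\nrolig\n\n==Annat==\nslut";
console.log(sectionAfter("===Substantiv===", sample));
console.log(sectionAfter("===Adjektiv===", sample));
```

Escaping the heading is not strictly needed for a string made only of = and letters, but it keeps the helper safe if a heading ever contains characters such as ( or +.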

Note it is an unrolled variant of

/===Substantiv===([\s\S]*?)(?=(?:\r?\n){2}|$)/g

which is slower than the above pattern, but looks a bit more readable. See the regex demo. Here, ([\s\S]*?) captures any 0+ chars, as few as possible, up to the first double line break ((?:\r?\n){2}) or the end of string ($).
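As a quick sanity check, the following sketch runs both variants against one made-up sample string and compares what Group 1 holds. It only illustrates that the two agree on this input; it does not measure speed.

```javascript
// Unrolled variant: a line break is consumed only if no second line break follows.
var unrolled = /===Substantiv===(.*(?:\r?\n(?!\r?\n).*)*)/;
// Lazy variant: expand char by char until a double line break or the end of string.
var lazy = /===Substantiv===([\s\S]*?)(?=(?:\r?\n){2}|$)/;

// Made-up sample: two text lines, a blank line, then another heading.
var sample = "===Substantiv===\nrad ett\nrad två\n\n==Annat==\nrad tre";

var fromUnrolled = unrolled.exec(sample)[1];
var fromLazy = lazy.exec(sample)[1];

// Both captures start with the line break after the heading, hence the trim().
console.log(fromUnrolled.trim());
console.log(fromUnrolled === fromLazy);
```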

If by a blank line you mean a line that may contain tabs, spaces, etc. you may use

/===Substantiv===(.*(?:\r?\n(?!\s*\r?\n).*)*)/g
                               ^^^

or

/===Substantiv===(.*(?:\r?\n(?![^\S\r\n]*\r?\n).*)*)/g
                               ^^^^^^^^^^

See another demo
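To make the difference concrete, here is a small sketch comparing the strict pattern with the whitespace-tolerant one on a made-up string whose "blank" line contains two spaces. The variable names and the sample are assumptions for illustration.

```javascript
// Strict: stops only at a line break directly followed by another line break.
var strict = /===Substantiv===(.*(?:\r?\n(?!\r?\n).*)*)/;
// Tolerant: also stops when the next line holds only horizontal whitespace.
var tolerant = /===Substantiv===(.*(?:\r?\n(?![^\S\r\n]*\r?\n).*)*)/;

// The line between "line one" and "==Annat==" contains two spaces.
var sample = "===Substantiv===\nline one\n  \n==Annat==\nrest";

var strictText = strict.exec(sample)[1].trim();
var tolerantText = tolerant.exec(sample)[1].trim();

// The strict pattern never sees an empty line, so it runs to the end of the string.
console.log(strictText.indexOf("==Annat==") !== -1);
// The tolerant pattern stops before the whitespace-only line.
console.log(tolerantText);
```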

JS demo:

var regex = /===Substantiv===(.*(?:\r?\n(?!\s*\r?\n).*)*)/g;
var str = "===Substantiv===\nDet var en gång en liten höna som gick på bio, fast det visste hon inte först.\nAlltså visste hon inte.\nFast ändå var det ganska roligt för henne.\n\n  \n==Annat==\nTrots att det var roligt var det inte det.";
var res = [], m;
// Collect Group 1 of every match, without the surrounding line breaks
while ((m = regex.exec(str)) !== null) {
   res.push(m[1].trim());
}
console.log(res);
// Getting all but the matches above
var regex = /===Substantiv===.*(?:\r?\n(?!\s*\r?\n).*)*/;
console.log(str.split(regex).filter(Boolean));

Another idea to get all the ===Substantiv=== sections: split the text on blank lines and keep only the chunks that start with the heading:

// A blank line: a line break, optional whitespace, then another line break
var regex = /\r?\n\s*\r?\n/;
var str = "===Substantiv===\nDet var en gång en liten höna som gick på bio, fast det visste hon inte först.\nAlltså visste hon inte.\nFast ändå var det ganska roligt för henne.\n\n  \n==Annat==\nTrots att det var roligt var det inte det.\n\n===Substantiv===\nAnother substantive";
// Keep the chunks that start with the heading, then drop the 16-char heading itself
var res = str.split(regex).filter(function (m) {return m.startsWith("===Substantiv===");}).map(function (x) {return x.slice(16).trim();});
console.log(res);

Tags:

javascript

regex

D. Ataro

1 Answers

Wiktor Stribiżew
