What I am trying to accomplish is matching all text as well as blank lines in a file until it finds a completely blank line. The text itself looks something like this:
===Substantiv===
Det var en gång en liten höna som gick på bio, fast det visste hon inte först. Alltså visste hon inte. Fast ändå var det ganska roligt för henne.

==Annat==
Trots att det var roligt var det inte det.
What I would like to match is everything from "===Substantiv===" to the blank line just above "==Annat==". Since there happen to be more lines with three equal signs, I would also like the code to be somewhat easy to change to another word rather than "===Substantiv===".
What I have tried so far, using regex, is something like:
===Adjektiv(.|\n)+
But as you can probably tell from its structure, nothing in this pattern makes it stop at a blank line: (.|\n)+ also matches line breaks, so the match runs on to the very end of the text instead of ending where I would like it to.
Best regards,
You may use
/===Substantiv===(.*(?:\r?\n(?!\r?\n).*)*)/g
                  ^^^^^^^^^^^^^^^^^^^^^^^
See the regex demo; your value is inside Group 1. Group 1 begins with the line break that follows the heading, so you may trim it after a match is found.
The .*(?:\r?\n(?!\r?\n).*)* part captures into Group 1 zero or more characters other than line break characters (.*), followed by zero or more occurrences (the (?:...)* group) of a line break sequence (\r\n or \n, matched by \r?\n) that is not followed by another line break sequence (the negative lookahead (?!\r?\n)) and then zero or more characters other than line break characters.
Note it is an unrolled variant of
/===Substantiv===([\s\S]*?)(?=(?:\r?\n){2}|$)/g
which is slower than the above pattern, but looks a bit more readable. See the regex demo. Here, ([\s\S]*?) captures any 0+ chars as few as possible up to the first double line break ((?:\r?\n){2}) or the end of string ($).
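To check that the two patterns agree, here is a minimal sketch that runs both against a shortened stand-in for the sample text. The `sample` string and the variable names are made up for illustration and are not part of the original question.

```javascript
// Hypothetical sample: a heading, two lines of text, a blank line, then another heading.
const sample = "===Substantiv===\nrad ett\nrad två\n\n==Annat==\nrad tre";

// Unrolled variant: consume a line break only when it is not followed by another one.
const unrolled = /===Substantiv===(.*(?:\r?\n(?!\r?\n).*)*)/;
// Lazy variant: expand one character at a time until a double line break or the end of input.
const lazy = /===Substantiv===([\s\S]*?)(?=(?:\r?\n){2}|$)/;

// Group 1 starts with the line break after the heading in both cases, hence trim().
const unrolledBody = sample.match(unrolled)[1].trim();
const lazyBody = sample.match(lazy)[1].trim();

console.log(unrolledBody === lazyBody); // true
console.log(JSON.stringify(unrolledBody)); // "rad ett\nrad två"
```

Both patterns stop before the blank line, so the choice between them mostly comes down to readability versus the number of steps the regex engine has to take.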
If by a blank line you mean a line that may also contain only tabs, spaces, etc., you may use
/===Substantiv===(.*(?:\r?\n(?!\s*\r?\n).*)*)/g
                               ^^^
or
/===Substantiv===(.*(?:\r?\n(?![^\S\r\n]*\r?\n).*)*)/g
                               ^^^^^^^^^^
See another demo
JS demo:
// Capture everything after the heading up to (not including) the first blank or whitespace-only line
var regex = /===Substantiv===(.*(?:\r?\n(?!\s*\r?\n).*)*)/g;
var str = "===Substantiv===\nDet var en gång en liten höna som gick på bio, fast det visste hon inte först.\nAlltså visste hon inte.\nFast ändå var det ganska roligt för henne.\n\n \n==Annat==\nTrots att det var roligt var det inte det.";
var res = [], m;
while ((m = regex.exec(str)) !== null) {
  res.push(m[1].trim()); // Group 1 starts with a line break, so trim it
}
console.log(res);
// Getting all but the matches above: split on the same pattern (no capturing group, no g flag)
var splitRegex = /===Substantiv===.*(?:\r?\n(?!\s*\r?\n).*)*/;
console.log(str.split(splitRegex).filter(Boolean));
Another idea to get all the Substantiv substrings: split the text at blank lines and keep only the chunks that start with the heading:
var regex = /\r?\n\s*\r?\n/; // one or more blank (or whitespace-only) lines
var heading = "===Substantiv===";
var str = "===Substantiv===\nDet var en gång en liten höna som gick på bio, fast det visste hon inte först.\nAlltså visste hon inte.\nFast ändå var det ganska roligt för henne.\n\n \n==Annat==\nTrots att det var roligt var det inte det.\n\n===Substantiv===\nAnother substantive";
var res = str.split(regex)
  .filter(function (chunk) { return chunk.startsWith(heading); }) // keep chunks that begin with the heading
  .map(function (chunk) { return chunk.slice(heading.length).trim(); }); // drop the heading itself
console.log(res);
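Since the question also asks for the heading to be easy to swap, one possible approach is to build the pattern with the RegExp constructor. The sketch below is only an illustration: `escapeRegExp`, `sectionBodies`, and the `text` sample are hypothetical names and data that do not appear in the question or in the answer above.

```javascript
// Escape regex metacharacters so the heading is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: return the trimmed body of every section that starts with `heading`.
function sectionBodies(text, heading) {
  // Same whitespace-tolerant pattern as above; backslashes are doubled inside the string literal.
  const pattern = new RegExp(
    escapeRegExp(heading) + "(.*(?:\\r?\\n(?!\\s*\\r?\\n).*)*)",
    "g"
  );
  const bodies = [];
  let m;
  while ((m = pattern.exec(text)) !== null) {
    bodies.push(m[1].trim());
  }
  return bodies;
}

// Made-up sample with two three-equals headings and one two-equals heading.
const text = "===Substantiv===\nen höna\n\n===Adjektiv===\nliten\nrolig\n\n==Annat==\nbio";
console.log(JSON.stringify(sectionBodies(text, "===Adjektiv==="))); // ["liten\nrolig"]
console.log(JSON.stringify(sectionBodies(text, "===Substantiv==="))); // ["en höna"]
```

Escaping matters little for headings made of letters and equal signs, but it keeps the helper safe if a heading ever contains characters such as parentheses or dots.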