I have the following input string:
Lorem ipsum dolor sit amet consectetur adipiscing elit sed doeiusmod tempor incididunt ut Duis aute irure dolor in reprehenderit in esse cillum dolor eu fugia ...
Splitting rules, by example:
[
"Lorem ipsum dolor", // A: Three words <6 letters
"sit amet", // B: Two words <6 letters if next word >=6 letters
"consectetur", // C: One word >=6 letters if next word >=6 letters
"adipiscing elit", // D: Two words: first >=6, second <6 letters
"sed doeiusmod", // E: Two words: first <6, second >=6 letters
"tempor", // rule C
"incididunt ut", // rule D
"Duis aute irure", // rule A
"dolor in", // rule B
"reprehenderit in", // rule D
"esse cillum", // rule E
"dolor eu fugia", // rule A
...
]
So, as you can see, each string in the array can have a minimum of one and a maximum of three words. I tried to do it as follows, but it doesn't work. How can I do it?
let s="Lorem ipsum dolor sit amet consectetur adipiscing elit sed doeiusmod tempor incididunt ut Duis aute irure dolor in reprehenderit in esse cillum dolor eu fugia";
let a=[""];
s.split(' ').map(w=> {
let line=a[a.length-1];
let n= line=="" ? 0 : line.match(/ /g).length // num of words in line
if(n<3) line+=w+' ';
n++;
if(n>=3) a[a.length-1]=line
});
console.log(a);
UPDATE
Boundary conditions: if the last word or words do not match any rule, just add them as the last array element (but two long words can never be in one string).
SUMMARY AND INTERESTING CONCLUSIONS
We got 8 nice answers to this question, and some of them led to a discussion about self-describing (or self-explanatory) code. Self-describing code is code that a person who has not read the question can easily understand at first glance. Sadly, none of the answers presents such code, so this question is an example showing that self-describing code is probably a myth.
You can express your rules as abbreviated regular expressions, build a real regex from them and apply it to your input:
const text = "Lorem ipsum, dolor. sit amet? consectetur, adipiscing, elit! sed doeiusmod tempor incididunt ut Duis aute irure dolor in reprehenderit in esse cillum dolor eu fugia bla?";
// S = short word (<6 letters), L = long word (>=6 letters); (.+) catches whatever is left
const rules = ['(SSS)', '(SS(?=L))', '(L(?=L))', '(SL)', '(LS)', '(.+)'];
const regex = new RegExp(
rules
.join('|')
.replace(/S/g, '\\w{1,5}\\W+') // short word plus trailing separators
.replace(/L/g, '\\w{6,}\\W+') // long word plus trailing separators
, 'g');
console.log(text.match(regex));
If the rules don't change, the regex construction part is only needed once.
Note that this also handles punctuation in a reasonable way.
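Since each match keeps its trailing separators, it may help to wrap the construction in a small helper that builds the regex once and trims every chunk. The sketch below is only an illustration; the names `buildSplitter` and `splitChunks` are hypothetical and not part of the answer above.

```javascript
// Hypothetical helper: builds the regex once and returns a reusable splitter.
function buildSplitter(ruleList) {
  const pattern = ruleList
    .join('|')
    .replace(/S/g, '\\w{1,5}\\W+') // S: short word (<6) plus trailing separators
    .replace(/L/g, '\\w{6,}\\W+'); // L: long word (>=6) plus trailing separators
  const regex = new RegExp(pattern, 'g');
  // match() returns null when nothing matches, so fall back to an empty array.
  return (input) => (input.match(regex) || []).map((chunk) => chunk.trim());
}

const splitChunks = buildSplitter(['(SSS)', '(SS(?=L))', '(L(?=L))', '(SL)', '(LS)', '(.+)']);

const sample = "Lorem ipsum dolor sit amet consectetur adipiscing elit sed doeiusmod tempor incididunt ut Duis aute irure dolor in reprehenderit in esse cillum dolor eu fugia";
console.log(splitChunks(sample));
```

Note that `trim()` only removes surrounding whitespace; punctuation that was matched as a separator stays attached to its chunk.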
One option is to first create an array of rules, like:
const rules = [
// [# of words to splice if all conditions met, condition for word1, condition for word2, condition for word3...]
[3, 'less', 'less', 'less'],
// the above means: splice 3 words if the next 3 words' lengths are <6, <6, <6
[2, 'less', 'less', 'eqmore'],
// the above means: splice 2 words if the next 3 words' lengths are <6, <6, >=6
[1, 'eqmore', 'eqmore'],
[2, 'eqmore', 'less'],
[2, 'less', 'eqmore']
];
Then iterate through the array of rules, finding the rule that matches, extracting the appropriate number of words to splice from the matching rule, and push to the output array:
const rules = [
[3, 'less', 'less', 'less'],
[2, 'less', 'less', 'eqmore'],
[1, 'eqmore', 'eqmore'],
[2, 'eqmore', 'less'],
[2, 'less', 'eqmore']
];
const s = "Lorem ipsum dolor sit amet consectetur adipiscing elit sed doeiusmod tempor incididunt ut Duis aute irure dolor in reprehenderit in esse cillum dolor eu fugia";
const words = s.split(' ');
const output = [];
const verify = (cond, word) => cond === 'less' ? word.length < 6 : word.length >= 6;
while (words.length) {
const [wordCount] = rules.find(
([wordCount, ...conds]) => conds.every((cond, i) => verify(cond, words[i]))
);
output.push(words.splice(0, wordCount).join(' '));
}
console.log(output);
Of course, the .find assumes that every input string will always have a matching rule for each position spliced.
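To make that assumption explicit, the lookup can be wrapped so that a missing rule produces a descriptive error rather than an opaque TypeError (the `verify` above reads `word.length` even when fewer words remain than a rule expects). This is a minimal sketch; `findRule`, `fits`, `ruleTable` and `LIMIT` are hypothetical names introduced for illustration.

```javascript
const LIMIT = 6; // word-length threshold used by the rules

// A missing word (undefined) never satisfies a condition.
const fits = (cond, word) =>
  word !== undefined && (cond === 'less' ? word.length < LIMIT : word.length >= LIMIT);

const ruleTable = [
  [3, 'less', 'less', 'less'],
  [2, 'less', 'less', 'eqmore'],
  [1, 'eqmore', 'eqmore'],
  [2, 'eqmore', 'less'],
  [2, 'less', 'eqmore']
];

// Returns the first matching rule, or throws with a readable message.
function findRule(table, remainingWords) {
  const rule = table.find(([, ...conds]) =>
    conds.every((cond, i) => fits(cond, remainingWords[i]))
  );
  if (!rule) {
    throw new Error(`No rule matches the words starting at "${remainingWords[0]}"`);
  }
  return rule;
}

console.log(findRule(ruleTable, ['sit', 'amet', 'consectetur'])[0]); // 2
```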
For the additional rule that any words not matched by the previous rules should just be added to the output, put [1] at the bottom of the rules array (the .every callback below also checks that words[i] exists before verifying it):
const rules = [
[3, 'less', 'less', 'less'],
[2, 'less', 'less', 'eqmore'],
[1, 'eqmore', 'eqmore'],
[2, 'eqmore', 'less'],
[2, 'less', 'eqmore'],
[1]
];
const s = "Lorem ipsum dolor sit amet consectetur adipiscing elit sed doeiusmod tempor incididunt ut Duis aute irure dolor in reprehenderit in esse cillum dolor eu fugia";
const words = s.split(' ');
const output = [];
const verify = (cond, word) => cond === 'less' ? word.length < 6 : word.length >= 6;
while (words.length) {
const [wordCount] = rules.find(
([wordCount, ...conds]) => conds.every((cond, i) => words[i] && verify(cond, words[i]))
);
output.push(words.splice(0, wordCount).join(' '));
}
console.log(output);
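The same rule-table idea can be packaged as a reusable function. The variant below is a sketch under a few assumptions of its own: it uses predicate functions instead of the 'less'/'eqmore' strings, walks the word list with an index instead of splicing it, and splits on any whitespace. The names `splitByRules`, `predicateRules`, `isShort` and `isLong` are hypothetical.

```javascript
const isShort = (word) => word.length < 6;
const isLong = (word) => word.length >= 6;

// [number of words to take, ...predicates for the upcoming words]
const predicateRules = [
  [3, isShort, isShort, isShort],
  [2, isShort, isShort, isLong],
  [1, isLong, isLong],
  [2, isLong, isShort],
  [2, isShort, isLong],
  [1] // fallback: no predicates, so it always matches and takes one word
];

function splitByRules(text, table = predicateRules) {
  const words = text.split(/\s+/).filter(Boolean); // drop empty strings
  const output = [];
  let pos = 0;
  while (pos < words.length) {
    // A rule matches only if enough words remain and every predicate passes.
    const [take] = table.find(([, ...tests]) =>
      tests.every((test, i) => pos + i < words.length && test(words[pos + i]))
    );
    output.push(words.slice(pos, pos + take).join(' '));
    pos += take;
  }
  return output;
}

console.log(splitByRules("sed doeiusmod tempor incididunt ut"));
// [ 'sed doeiusmod', 'tempor', 'incididunt ut' ]
```

As in the answer above, the fallback emits leftover words one at a time, and the function relies on the table ending with that catch-all entry.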