
Split a string into an array of strings of 1-3 words depending on word length

Tags:

javascript

I have the following input string:

Lorem ipsum dolor sit amet consectetur adipiscing elit sed doeiusmod tempor incididunt ut Duis aute irure dolor in reprehenderit in esse cillum dolor eu fugia ...

Splitting rules by example

[
     "Lorem ipsum dolor",  // A: Three words <6 letters
     "sit amet",           // B: Two words <6 letters if next word >=6 letters
     "consectetur",        // C: One word >=6 letters if next word >=6 letters
     "adipiscing elit",    // D: Two words: first >=6, second <6 letters
     "sed doeiusmod",      // E: Two words: first <6, second >=6 letters
     "tempor",             // rule C
     "incididunt ut",      // rule D
     "Duis aute irure",    // rule A
     "dolor in",           // rule B
     "reprehenderit in",   // rule D
     "esse cillum",        // rule E
     "dolor eu fugia",     // rule A
     ...
]

As you can see, each string in the array holds at least one and at most three words. I tried the following, but it doesn't work. How can I do this?

let s="Lorem ipsum dolor sit amet consectetur adipiscing elit sed doeiusmod tempor incididunt ut Duis aute irure dolor in reprehenderit in esse cillum dolor eu fugia";

let a=[""];
s.split(' ').map(w=> {
  let line=a[a.length-1];
  let n= line=="" ? 0 : line.match(/ /g).length // num of words in line
  if(n<3) line+=w+' ';
  n++;
  if(n>=3) a[a.length-1]=line 
}); 

console.log(a);

UPDATE

Boundary conditions: if the last word or words don't match any rule, just add them as the last array element (but two long words must never end up in the same string).

SUMMARY AND INTERESTING CONCLUSIONS

We got 8 nice answers to this question, and some of them sparked a discussion about self-describing (or self-explanatory) code. Self-describing code is code that a person who has not read the question can understand at first glance. Sadly, none of the answers presents such code, so this question is an example suggesting that self-describing code is probably a myth.

Kamil Kiełczewski asked Jun 06 '19 21:06



2 Answers

You can express your rules as abbreviated regular expressions, build a real regex from them and apply it to your input:

const text = "Lorem ipsum, dolor. sit amet? consectetur,   adipiscing,  elit! sed doeiusmod tempor incididunt ut Duis aute irure dolor in reprehenderit in esse cillum dolor eu fugia bla?";

// S = short word (1-5 characters), L = long word (6+ characters),
// each followed by at least one separator character
const rules = ['(SSS)', '(SS(?=L))', '(L(?=L))', '(SL)', '(LS)', '(.+)'];

// Expand the abbreviations into a real regex
const regex = new RegExp(
    rules
        .join('|')
        .replace(/S/g, '\\w{1,5}\\W+')
        .replace(/L/g, '\\w{6,}\\W+')
    , 'g');

console.log(text.match(regex));

If the rules don't change, the regex construction part is only needed once.

Note that this also handles punctuation in a reasonable way.
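To make the "build once" point concrete, here is a minimal sketch that constructs the regex a single time and reuses it from a helper. The name splitWords and the trimming step are assumptions added for illustration, not part of the answer above; the trimming removes the separators that each match carries at its end. Note that \w only covers ASCII letters, digits and underscore, so words with other letters would need a different character class.

```javascript
// S = short word (1-5 chars), L = long word (6+ chars), as in the answer above.
const RULES = ['(SSS)', '(SS(?=L))', '(L(?=L))', '(SL)', '(LS)', '(.+)'];

// Built once at load time and reused for every call.
const REGEX = new RegExp(
  RULES
    .join('|')
    .replace(/S/g, '\\w{1,5}\\W+')
    .replace(/L/g, '\\w{6,}\\W+'),
  'g'
);

// splitWords is a hypothetical helper name used only in this sketch.
function splitWords(text) {
  // match() returns null when nothing matches (for example, an empty string).
  const chunks = text.match(REGEX) || [];
  // Each match ends with the separators after its last word; strip them (assumption).
  return chunks.map(chunk => chunk.replace(/\W+$/, ''));
}

console.log(splitWords('Lorem ipsum, dolor. sit amet? consectetur, adipiscing, elit!'));
// logs four chunks: 'Lorem ipsum, dolor', 'sit amet', 'consectetur', 'adipiscing, elit'
```

Punctuation between the words of one chunk is kept; only the trailing separators are dropped.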

georg answered Nov 15 '22 22:11



One option is to first create an array of rules, like:

const rules = [
  // [# of words to splice if all conditions met, condition for word1, condition for word2, condition for word3...]
  [3, 'less', 'less', 'less'],
  // the above means: splice 3 words if the next 3 words' lengths are <6, <6, <6
  [2, 'less', 'less', 'eqmore'],
  // the above means: splice 2 words if the next 3 words' lengths are <6, <6, >=6
  [1, 'eqmore', 'eqmore'],
  [2, 'eqmore', 'less'],
  [2, 'less', 'eqmore']
];

Then iterate through the array of rules, find the rule that matches, take the number of words to splice from that rule, and push those words to the output array:

const rules = [
  [3, 'less', 'less', 'less'],
  [2, 'less', 'less', 'eqmore'],
  [1, 'eqmore', 'eqmore'],
  [2, 'eqmore', 'less'],
  [2, 'less', 'eqmore']
];
const s = "Lorem ipsum dolor sit amet consectetur adipiscing elit sed doeiusmod tempor incididunt ut Duis aute irure dolor in reprehenderit in esse cillum dolor eu fugia";

const words = s.split(' ');
const output = [];
const verify = (cond, word) => cond === 'less' ? word.length < 6 : word.length >= 6;
while (words.length) {
  const [wordCount] = rules.find(
    ([wordCount, ...conds]) => conds.every((cond, i) => verify(cond, words[i]))
  );
  output.push(words.splice(0, wordCount).join(' '));
}
console.log(output);

Of course, this assumes that a rule matches at every position where words are spliced; if none does (for example, when too few words remain), the code throws a TypeError.
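To see where that assumption breaks, here is a small sketch that isolates the rule lookup. The helper name findRule and the bounds check are assumptions added for illustration; the rules and verify are copied from the answer.

```javascript
const rules = [
  [3, 'less', 'less', 'less'],
  [2, 'less', 'less', 'eqmore'],
  [1, 'eqmore', 'eqmore'],
  [2, 'eqmore', 'less'],
  [2, 'less', 'eqmore']
];
const verify = (cond, word) => cond === 'less' ? word.length < 6 : word.length >= 6;

// findRule is a hypothetical helper: it returns the first rule whose conditions
// all hold for the leading words, or undefined when no rule applies.
// The i < words.length check (an assumption) keeps verify from reading past the end.
const findRule = words => rules.find(
  ([, ...conds]) => conds.every((cond, i) => i < words.length && verify(cond, words[i]))
);

console.log(findRule(['Lorem', 'ipsum', 'dolor'])); // the three-short-words rule
console.log(findRule(['Lorem', 'ipsum']));          // undefined: two short words, nothing after them
console.log(findRule(['consectetur']));             // undefined: a single long word
```

With these five rules, the lookup only comes back empty at the end of the input, when one word or two short words remain.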

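For the additional rule that any words not matched by the previous rules just be added to the output, put [1] at the bottom of the rules array. A rule with no conditions always matches, so it takes one word. The words[i] && check keeps verify from being called on a missing word: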

const rules = [
  [3, 'less', 'less', 'less'],
  [2, 'less', 'less', 'eqmore'],
  [1, 'eqmore', 'eqmore'],
  [2, 'eqmore', 'less'],
  [2, 'less', 'eqmore'],
  [1]
];
const s = "Lorem ipsum dolor sit amet consectetur adipiscing elit sed doeiusmod tempor incididunt ut Duis aute irure dolor in reprehenderit in esse cillum dolor eu fugia";

const words = s.split(' ');
const output = [];
const verify = (cond, word) => cond === 'less' ? word.length < 6 : word.length >= 6;
while (words.length) {
  const [wordCount] = rules.find(
    ([wordCount, ...conds]) => conds.every((cond, i) => words[i] && verify(cond, words[i]))
  );
  output.push(words.splice(0, wordCount).join(' '));
}
console.log(output);
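Note that the [1] fallback emits leftover words one at a time, so two short words at the very end become two separate elements. If the question's boundary condition is read as "put the leftover words into one last element", a variation along these lines might fit better. This is only a sketch under that reading; the wrapper name splitByRules and the leftover handling are assumptions, not part of the answer above.

```javascript
const rules = [
  [3, 'less', 'less', 'less'],
  [2, 'less', 'less', 'eqmore'],
  [1, 'eqmore', 'eqmore'],
  [2, 'eqmore', 'less'],
  [2, 'less', 'eqmore']
];
const verify = (cond, word) => cond === 'less' ? word.length < 6 : word.length >= 6;

// splitByRules is a hypothetical wrapper name used only in this sketch.
function splitByRules(text) {
  const words = text.split(' ').filter(Boolean); // drop empty strings from repeated spaces
  const output = [];
  while (words.length) {
    const rule = rules.find(
      ([, ...conds]) => conds.every((cond, i) => i < words.length && verify(cond, words[i]))
    );
    // With these rules, no match means one word or two short words remain,
    // so emitting the whole remainder never joins two long words (assumed reading).
    const count = rule ? rule[0] : words.length;
    output.push(words.splice(0, count).join(' '));
  }
  return output;
}

console.log(splitByRules('Lorem ipsum dolor sit amet'));
// logs two chunks: 'Lorem ipsum dolor' and 'sit amet'
```

For the input string from the question, this produces the same twelve chunks as listed there, because a rule matches at every position.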
CertainPerformance answered Nov 15 '22 22:11
