
How to process an array of sentences to return another array with longest possible sentences below x characters?

Tags:

javascript

I have an array of sentences of varying lengths. Let's assume it looks like this:

sentences = [
   "Far far away, behind the word mountains, far from the countries Vokalia and Consonantia, there live the blind texts.",
   "I never thought that would happen!",
   "This one?",
   "No, no, that one.",
   "Okay but please ensure your sentences are long enough to be split when longer than 100 characters, although some could be too short as well.",
   "This is also a random text like all others",
]

What I need is to build another array of sentences from the first one, with each element as long as possible without exceeding 100 characters. Conversely, sentences longer than 100 characters should be split into smaller chunks. So, if there are 5 sentences in the original array with the following lengths:

[0] => 150
[1] => 10
[2] => 35
[3] => 5
[4] => 70

Then the new array should have the following element lengths:

[0] => 100 // Split since longer than 100 chars
[1] => 100 // 50 carried forward from [0] + 10 + 35 + 5
[2] => 70

Please note that I do not want to split words in the process.

I have tried something like the following:

let para = [];

let index = 0;
let i = 0;
while(nsentences[i]) {
  let bigsentence = nsentences[i];
  let x = i + 1;

  let bs = bigsentence + ' ' + nsentences[x];
  console.log(bs);
  while(bs.length < 140){
    console.log(bs);

  }


  while(x) {
    let bs = bigsentence + ' ' + nsentences[x];
    if(bs.length < 100) {
      bigsentence += ' ' + nsentences[x];
      x++;
      i += x;
    } else {
      para.push(bigsentence);
      break;
    }
  }
}

But as you'd expect, it doesn't work. The snippet just returns an infinite loop of the first two sentences concatenated!

TheLearner asked Aug 14 '19 09:08


3 Answers

Join the array of sentences by spaces, then match up to 100 characters with a regular expression, and end at a position followed by a space (or the end of the string), to ensure that the last character matched is at the end of a word:

const sentences = [
   "Far far away, behind the word mountains, far from the countries Vokalia and Consonantia, there live the blind texts.",
   "I never thought that would happen!",
   "This one?",
   "No, no, that one.",
   "Okay but please ensure your sentences are long enough to be split when longer than 100 characters, although some could be too short as well.",
   "This is also a random text like all others",
];

const words = sentences.join(' ');
const output = words.match(/\S.{1,99}(?= |$)/g);
console.log(output);

The \S at the beginning of the pattern is there to ensure that the first character matched is not a space.
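
If the limit needs to vary, the same idea can be wrapped in a small helper that builds the pattern at runtime. This is only a sketch and not part of the answer above: the name chunkByRegex and its limit parameter are hypothetical. It uses {0,limit-1} instead of {1,99} so that a trailing one-character chunk is still matched, and it falls back to an empty array because match() returns null when nothing matches. Like the original pattern, it does not keep a word longer than the limit intact (the leading characters of such a word are skipped).

```javascript
// Hypothetical helper: chunkByRegex and limit are illustrative names, not from the answer.
function chunkByRegex(sentences, limit) {
  const text = sentences.join(' ');
  // One non-space character, then up to limit - 1 more characters,
  // ending right before a space or at the end of the string.
  const pattern = new RegExp(`\\S.{0,${limit - 1}}(?= |$)`, 'g');
  // String.prototype.match returns null when there is no match at all.
  return text.match(pattern) || [];
}

const chunks = chunkByRegex(
  ['This one?', 'No, no, that one.', 'I never thought that would happen!'],
  30
);
console.log(chunks);
console.log(chunks.map(c => c.length));
```

Building the pattern with new RegExp keeps the limit in one place; the backslash in \S has to be doubled because the pattern is written as a string.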

CertainPerformance answered Oct 13 '22 06:10


Here is a slightly different approach relying on a generator function.

Since I wasn't sure exactly how your output should be constrained, this solution:

  • Joins the array into a single string, separated by spaces.
  • Splits that string on spaces into words.
  • Yields a chunk whose length is <= 100, as close as possible to 100.
  • Continues until the words are exhausted.

It can probably be improved for quality and performance, but it should do the job correctly. The code below produces an array of four strings whose lengths are 99, 95, 96 and 70.

const sentences = [
   "Far far away, behind the word mountains, far from the countries Vokalia and Consonantia, there live the blind texts.",
   "I never thought that would happen!",
   "This one?",
   "No, no, that one.",
   "Okay but please ensure your sentences are long enough to be split when longer than 100 characters, although some could be too short as well.",
   "This is also a random text like all others",
];

function* splitToLength(arr, length) {
  // Join the original array of strings and split it into words.
  const words = arr.join(' ').split(' ');
  let strlength = 0, acc = []; // Length counter (each word plus one space) and accumulator.
  for (const word of words) { // Iterate over each word.
    // strlength already counts the separating space, so the joined length
    // after adding this word would be exactly strlength + word.length.
    if (strlength + word.length <= length) {
      acc.push(word); // The word fits: accumulate it...
      strlength += word.length + 1; // ...and count its length plus one space.
    } else {
      if (acc.length > 0) yield acc.join(' '); // Otherwise, yield the current chunk (never an empty one).
      acc = [word]; // Reset the accumulator with just the current word.
      strlength = word.length + 1; // Reset the counter to the current word length plus one space.
    }
  }
  if (acc.length > 0) yield acc.join(' '); // Finally, yield the last chunk if it has not been yielded yet.
}

const res = [...splitToLength(sentences, 100)];
console.log(res);
console.log(res.map(i => i.length));
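
One consequence of using a generator is that chunks are produced on demand, so a consumer can stop after the first few without building the rest. The sketch below illustrates this under stated assumptions: chunkWords is a simplified, hypothetical generator written in the spirit of splitToLength (not the code above), and take is a hypothetical helper.

```javascript
// Hypothetical, simplified generator in the spirit of splitToLength.
function* chunkWords(words, limit) {
  let acc = [];
  let len = 0; // Length of acc.join(' ').
  for (const word of words) {
    // Length the chunk would have if this word were appended.
    const next = acc.length === 0 ? word.length : len + 1 + word.length;
    if (acc.length > 0 && next > limit) {
      yield acc.join(' '); // Chunk is full: emit it and start a new one.
      acc = [word];
      len = word.length;
    } else {
      acc.push(word);
      len = next;
    }
  }
  if (acc.length > 0) yield acc.join(' ');
}

// Hypothetical helper: take at most n values from an iterable, then stop.
function take(iterable, n) {
  const out = [];
  if (n <= 0) return out;
  for (const value of iterable) {
    out.push(value);
    if (out.length >= n) break; // Leaving for...of early also closes the generator.
  }
  return out;
}

const firstTwo = take(chunkWords('one two three four five six'.split(' '), 9), 2);
console.log(firstTwo);
```

Spreading the generator into an array, as the answer does, is the simpler choice when all chunks are needed.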

briosheje answered Oct 13 '22 04:10


I've done this using simple loops. The algorithm works as follows.

  1. Create an array of all words
  2. Take each word ensuring that the limit won't be reached
  3. Create a new line when this limit is reached
  4. Return the lines when there aren't words left

const sentences = [
   "Far far away, behind the word mountains, far from the countries Vokalia and Consonantia, there live the blind texts.",
   "I never thought that would happen!",   
   "This one?",   
   "No, no, that one.",
   "Okay but please ensure your sentences are long enough to be split when longer than 100 characters, although some could be too short as well.",
   "This is also a random text like all others"
];

const lengths = sentences => sentences.map(s => s.length); 

const words = sentences.join(' ').split(' ');

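const takeWords = (charlimit, words) => {
  let currlinelength, lines = [], j = 0;
  for (let i = 0; ; i++) {
    currlinelength = 0;
    lines[i] = "";
    while (true) {
      if (j >= words.length) {
        // No words left: remove the trailing space from each line and return
        return lines.map(l => l.trim());
      }
      // Start a new line if adding this word would exceed the limit.
      // A word longer than the limit gets a line of its own (currlinelength is 0),
      // which avoids looping forever on it.
      if (currlinelength > 0 && (currlinelength + words[j].length) > charlimit) {
        break;
      }
      lines[i] += words[j] + " ";
      currlinelength += 1 + words[j].length;
      j++;
    }
  }
};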

console.log(lengths(sentences));
const result = takeWords(100, words);
console.log(result);
console.log(lengths(result));
// output
[
  "Far far away, behind the word mountains, far from the countries Vokalia and Consonantia, there live",
  "the blind texts. I never thought that would happen! This one? No, no, that one. Okay but please",
  "ensure your sentences are long enough to be split when longer than 100 characters, although some",
  "could be too short as well. This is also a random text like all others"
]
// length of each sentence
[
  99,
  95,
  96,
  70
]
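
To confirm that a result like the one above meets the requirements in the question, a small check can be run on it. This is a sketch only: checkChunks is a hypothetical helper, and it assumes that words are separated by single spaces, as in the sample data.

```javascript
// Hypothetical helper: checks the two properties the question asks for.
function checkChunks(sentences, chunks, limit) {
  // Every chunk must fit within the limit.
  const withinLimit = chunks.every(c => c.length <= limit);
  // Re-joining the chunks must reproduce the original text,
  // which means no word was split or lost.
  const sameText = chunks.join(' ') === sentences.join(' ');
  return withinLimit && sameText;
}

// Valid: both chunks fit and no word is broken.
console.log(checkChunks(['aa bb', 'cc'], ['aa', 'bb cc'], 5));
// Invalid: the word "bb" was split across two chunks.
console.log(checkChunks(['aa bb', 'cc'], ['aa b', 'b cc'], 5));
```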

David Lemon answered Oct 13 '22 04:10