
Javascript (jQuery) remove last sentence of long text

I'm looking for a JavaScript function that is smart enough to remove the last sentence of a long chunk of text (a single paragraph, actually). Some example text to show the complexity:

<p>Blabla, some more text here. Sometimes <span>basic</span> html code is used but that should not make the "selection" of the sentence any harder! I looked up the window and I saw a plane flying over. I asked the first thing that came to mind: "What is it doing up there?" She did not know, "I think we should move past the fence!", she quickly said. He later described it as: "Something insane."</p>

Now, I could split on "." and remove the last entry of the array, but that would not work for sentences ending with "?" or "!", and some sentences end with a quote, as in something: "stuff."
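To make the problem concrete, here is a minimal sketch of that naive approach. The helper name `naiveRemoveLastSentence` is made up for illustration; it only shows where a period-only split cuts in the wrong place.

```javascript
// Hypothetical helper: the naive "split on '.'" idea described above.
function naiveRemoveLastSentence(text) {
  var parts = text.trim().split('.');
  // A text ending in '.' leaves an empty piece after the final period.
  if (parts[parts.length - 1] === '') parts.pop();
  parts.pop(); // drop what is assumed to be the last sentence
  return parts.length ? parts.join('.') + '.' : '';
}

// Works when every sentence ends with a period.
var ok = naiveRemoveLastSentence('One. Two.');
// Removes two sentences, because '?' is not treated as a terminator.
var tooMuch = naiveRemoveLastSentence('It works. Does it? Yes!');
// Removes only the closing quote, because the period sits inside the quotes.
var tooLittle = naiveRemoveLastSentence('He left. She said: "Stop."');
```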

function removeLastSentence(text) {
  sWithoutLastSentence = ...; // ??
  return sWithoutLastSentence;
}

How to do this? What's a proper algorithm?

Edit - By long text I mean all the content in my paragraph, and by sentence I mean an actual sentence (not a line). So in my example the last sentence is: He later described it as: "Something insane." When that one is removed, the next one is: She did not know, "I think we should move past the fence!", she quickly said.

sougonde asked Sep 23 '11 15:09


2 Answers

Define your rules:

// 1. A sentence starts with a capital letter
// 2. A sentence is preceded by nothing or [.!?], but not [,:;]
// 3. A sentence may be preceded by quotes if not formatted properly, such as ["']
// 4. A sentence may be identified incorrectly in this case if the word following a quote is a name

Any additional rules?

Define your purpose:

// 1. Remove the last sentence

Assumptions: If you started from the last character in the string of text and worked backwards, then you'd identify the beginning of the sentence as:

1. The string of text before the character is [.?!], OR
2. The string of text before the character is ["'] and the word after it starts with a capital letter
3. Every [.] is followed by a space in the text
4. We aren't correcting for html tags
5. These assumptions are not robust and will need to be adapted regularly
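As a rough illustration, the first two assumptions can be written as a single predicate. This is a sketch only: `endsSentence` and its parameters are hypothetical names, and it is stricter about what counts as a capital letter than the code further down.

```javascript
// Hypothetical predicate for assumptions 1 and 2.
// `chunk` is a space-delimited piece of text; `nextChunk` is the piece after it.
function endsSentence(chunk, nextChunk) {
  if (chunk.length === 0) return false;
  var last = chunk.charAt(chunk.length - 1);
  // Assumption 1: a terminator closes the sentence.
  if (last === '.' || last === '!' || last === '?') return true;
  // Assumption 2: a closing quote counts only if a capitalized word follows.
  var isQuote = last === '"' || last === "'";
  return isQuote && /^[A-Z]/.test(nextChunk);
}

var afterPeriod = endsSentence('said.', 'He');         // terminator
var afterQuote = endsSentence('there?"', 'She');       // quote + capital
var midSentence = endsSentence('fence!",', 'she');     // ends in a comma
var quotedWord = endsSentence('"selection"', 'of');    // quote + lowercase
```

Rule 4 above still applies: a quoted phrase followed by a name would be reported as a sentence end.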

Possible Solution: Read in your string and split it on the space character to give us chunks of strings to review in reverse.

var characterGroups = $('#this-paragraph').html().split(' ').reverse();

If your string is:

Blabla, some more text here. Sometimes basic html code is used but that should not make the "selection" of the sentence any harder! I looked up the window and I saw a plane flying over. I asked the first thing that came to mind: "What is it doing up there?" She did not know, "I think we should move past the fence!", she quickly said. He later described it as: "Something insane."

var originalString = 'Blabla, some more text here. Sometimes <span>basic</span> html code is used but that should not make the "selection" of the sentence any harder! I looked up the window and I saw a plane flying over. I asked the first thing that came to mind: "What is it doing up there?" She did not know, "I think we should move past the fence!", she quickly said. He later described it as: "Something insane."';

Then your array in characterGroups would be:

    ['insane."', '"Something', 'as:', 'it', 'described', 'later', 'He',
     'said.', 'quickly', 'she', 'fence!",', 'the', 'past', 'move', 'should', 'we',
     'think', '"I', 'know,', 'not', 'did', 'She', 'there?"', 'up', 'doing', 'it',
     'is', '"What', 'mind:', 'to', 'came', 'that', 'thing', 'first', 'the', 'asked',
     'I', 'over.', 'flying', 'plane', 'a', 'saw', 'I', 'and', 'window', 'the', 'up',
     'looked', 'I', 'harder!', 'any', 'sentence', 'the', 'of', '"selection"', 'the',
     'make', 'not', 'should', 'that', 'but', 'used', 'is', 'code', 'html', 'basic',
     'Sometimes', 'here.', 'text', 'more', 'some', 'Blabla,']

Note: the <span> tags and other markup would be removed by using jQuery's .text() method instead of .html()

Each block is followed by a space, so once we have identified where the last sentence starts (by array index), we know how many chunks and spaces lie between that point and the end of the text, and we can split the original string at the corresponding space.
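One way to picture that index-to-position mapping is to add up the lengths of the trailing chunks plus the spaces between them. The sketch below is an illustration under that assumption, not the method used in the code that follows (which searches for a substring instead); `cutPosition` is a hypothetical name.

```javascript
// Hypothetical helper: offset in `original` where the last sentence begins,
// given the reversed chunks and the index of the chunk that starts it.
function cutPosition(original, reversedChunks, startIndex) {
  var tailLength = 0;
  for (var i = 0; i <= startIndex; i++) {
    tailLength += reversedChunks[i].length;
  }
  tailLength += startIndex; // one space between each pair of tail chunks
  return original.length - tailLength;
}

var original = 'First one. Second "two." Third one here.';
var chunks = original.split(' ').reverse();
// chunks: ['here.', 'one', 'Third', '"two."', 'Second', 'one.', 'First']
var start = cutPosition(original, chunks, 2); // 'Third' sits at index 2
var kept = original.substring(0, start).trim();
var removed = original.substring(start);
```

Because `split(' ')` keeps empty strings for consecutive spaces, the lengths still add up when the text contains double spaces.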

Give ourselves a variable to mark if we've found it or not and a variable to hold the index position of the array element we identify as holding the start of the last sentence:

var found = false;
var index = null;

Loop through the array and look for any element ending in [.!?] OR ending in " where the previous element started with a capital letter.

var position     = 1, // skip the first one since we know that's the end anyway
    elements     = characterGroups.length,
    element      = null,
    prevHadUpper = false,
    last         = null,
    lookFor      = ''; // declared here so it does not leak into the global scope

while (!found && position < elements) {
    element = characterGroups[position].split('');

    if (element.length > 0) {
       last = element[element.length - 1];

       // test last character rule
       if (
          last === '.'                      // ends in '.'
          || last === '!'                   // ends in '!'
          || last === '?'                   // ends in '?'
          || (last === '"' && prevHadUpper) // ends in '"' and previous started [A-Z]
       ) {
          found = true;
          index = position - 1;
          // terminator + space + first word of the last sentence
          lookFor = last + ' ' + characterGroups[position - 1];
       } else {
          // note: non-letters such as '"' also equal their own upper case
          prevHadUpper = (element[0] === element[0].toUpperCase());
       }
    } else {
       prevHadUpper = false;
    }
    position++;
}

If you run the above script it will correctly identify 'He' as the start of the last sentence.

console.log(characterGroups[index]); // He at index=6

Now you can run through the string you had before:

var trimPosition = originalString.lastIndexOf(lookFor)+1;
var updatedString = originalString.substr(0,trimPosition);
console.log(updatedString);

// Blabla, some more text here. Sometimes <span>basic</span> html code is used but that should not make the "selection" of the sentence any harder! I looked up the window and I saw a plane flying over. I asked the first thing that came to mind: "What is it doing up there?" She did not know, "I think we should move past the fence!", she quickly said.

Run it again and get: Blabla, some more text here. Sometimes basic html code is used but that should not make the "selection" of the sentence any harder! I looked up the window and I saw a plane flying over. I asked the first thing that came to mind: "What is it doing up there?"

Run it again and get: Blabla, some more text here. Sometimes basic html code is used but that should not make the "selection" of the sentence any harder! I looked up the window and I saw a plane flying over.

Run it again and get: Blabla, some more text here. Sometimes basic html code is used but that should not make the "selection" of the sentence any harder!

Run it again and get: Blabla, some more text here.

Run it again and get the same result, because no earlier sentence boundary is left to find: Blabla, some more text here.

So, I think this matches what you're looking for?

As a function:

function trimSentence(string){
    var found = false;
    var index = null;

    var characterGroups = string.split(' ').reverse();

    var position     = 1,//skip the first one since we know that's the end anyway
        elements     = characterGroups.length,
        element      = null,
        prevHadUpper = false,
        last         = null,
        lookFor      = '';

    while(!found && position < elements) {
        element = characterGroups[position].split('');

        if(element.length > 0) {
           last = element[element.length-1];

           // test last character rule
           if(
              last=='.' ||                // ends in '.'
              last=='!' ||                // ends in '!'
              last=='?' ||                // ends in '?'
              (last=='"' && prevHadUpper) // ends in '"' and previous started [A-Z]
           ) {
              found = true;
              index = position-1;
              lookFor = last+' '+characterGroups[position-1];
           } else {
              if(element[0] == element[0].toUpperCase()) {
                 prevHadUpper = true;
              } else {
                 prevHadUpper = false;
              }
           }
        } else {
           prevHadUpper = false;
        }
        position++;
    }


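    // If nothing was found, lookFor is '' and lastIndexOf returns string.length,
    // so the whole string comes back unchanged.
    var trimPosition = string.lastIndexOf(lookFor) + 1;
    return string.substring(0, trimPosition);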
}
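To reproduce the "run it again" sequence without calling the function by hand, one option is a small driver loop that keeps trimming until the text stops changing. `trimSteps` is a made-up name, and the `trimSentence` in this snippet is a condensed restatement of the function above, included only so the example runs on its own.

```javascript
// Condensed restatement of trimSentence, so this snippet is self-contained.
function trimSentence(string) {
  var groups = string.split(' ').reverse();
  var prevHadUpper = false;
  var lookFor = '';
  for (var position = 1; position < groups.length; position++) {
    var chunk = groups[position];
    if (chunk.length === 0) { prevHadUpper = false; continue; }
    var last = chunk.charAt(chunk.length - 1);
    if (last === '.' || last === '!' || last === '?' ||
        (last === '"' && prevHadUpper)) {
      lookFor = last + ' ' + groups[position - 1];
      break;
    }
    prevHadUpper = chunk.charAt(0) === chunk.charAt(0).toUpperCase();
  }
  return string.substring(0, string.lastIndexOf(lookFor) + 1);
}

// Hypothetical driver: collect each intermediate result until nothing changes.
function trimSteps(text, trimFn) {
  var steps = [];
  var current = text;
  var next = trimFn(current);
  while (next !== current) {
    steps.push(next);
    current = next;
    next = trimFn(current);
  }
  return steps;
}

var steps = trimSteps('One here. Is it "two?" Three now! Four.', trimSentence);
// steps[0]: 'One here. Is it "two?" Three now!'
// steps[1]: 'One here. Is it "two?"'
// steps[2]: 'One here.'
```

The loop stops on its own because the function returns its input unchanged once a single sentence remains.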

It's trivial to make a plugin for it if needed, but beware the ASSUMPTIONS! :)
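A plugin wrapper could look roughly like the sketch below. The names `installTrimPlugin` and `trimLastSentence` are invented for this example; the only jQuery feature it relies on is `.html(function)`, which passes the index and the old HTML to a callback and uses the return value as the new HTML.

```javascript
// Hypothetical installer: registers `trimLastSentence` on a jQuery-like object.
// `$` is expected to be jQuery; `trimFn` is a function such as trimSentence.
function installTrimPlugin($, trimFn) {
  $.fn.trimLastSentence = function () {
    // Rewrite each matched element's HTML and return `this` for chaining.
    return this.html(function (index, oldHtml) {
      return trimFn(oldHtml);
    });
  };
}
```

With jQuery loaded, usage would be along the lines of `installTrimPlugin(jQuery, trimSentence);` followed by `$('#this-paragraph').trimLastSentence();`. Since it works on the element's HTML, assumption 4 (no correction for html tags) still applies.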

Does this help?

Thanks, AE

MyStream answered Sep 30 '22 16:09


This ought to do it.

/*
Assumptions:
- Sentence separators are a combination of terminators (.!?) + doublequote (optional) + spaces + capital letter. 
- I haven't preserved tags if it gets down to removing the last sentence. 
*/
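function removeLastSentence(text) {

    // index of the last sentence terminator in the original text
    var lastSeparator = Math.max(
        text.lastIndexOf("."),
        text.lastIndexOf("!"),
        text.lastIndexOf("?")
    );

    // search the reversed text so the first match is the last sentence boundary
    var revtext = text.split('').reverse().join('');
    var sep = revtext.search(/[A-Z]\s+(\")?[\.\!\?]/);
    // index of the '<' that opens the last closing tag, e.g. </p>
    // (text.length - 1 when there is no closing tag)
    var lastTag = text.length - revtext.search(/\/\</) - 2;

    var lastPtr = (lastTag > lastSeparator) ? lastTag : text.length;
    var sWithoutLastSentence;

    if (sep > -1) {
        // everything before the capital letter that starts the last sentence
        var text1 = revtext.substring(sep+1, revtext.length).trim().split('').reverse().join('');
        // trailing closing tag, with stray quotes removed
        var text2 = text.substring(lastPtr, text.length).replace(/['"]/g,'').trim();

        sWithoutLastSentence = text1 + text2;
    } else {
        sWithoutLastSentence = '';
    }
    return sWithoutLastSentence;
}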
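The reversed-text trick is easier to follow in isolation. The sketch below strips it down to the boundary search alone, ignoring tags; `reverseString` and `lastSentenceStart` are hypothetical helper names, not part of the function above.

```javascript
// Reverse a string character by character (fine for plain ASCII text).
function reverseString(s) {
  return s.split('').reverse().join('');
}

// Hypothetical helper: index in `text` where the last sentence starts, or -1.
function lastSentenceStart(text) {
  var reversed = reverseString(text);
  // In reversed text a boundary reads: capital, spaces, optional quote, terminator.
  var hit = reversed.search(/[A-Z]\s+"?[.!?]/);
  if (hit === -1) return -1;
  // Map the capital letter's position in the reversed text back to the original.
  return text.length - 1 - hit;
}

var sample = 'I saw a plane. "What is it?" She did not know.';
var start = lastSentenceStart(sample);
var lastSentence = sample.substring(start); // 'She did not know.'
```

Searching the reversed string means the first match is the last boundary, which avoids scanning every match from the front. A capitalized word after an abbreviation (for example a name after "Dr.") would also be treated as a boundary.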

/*
TESTS: 

var text = '<p>Blabla, some more text here. Sometimes <span>basic</span> html code is used but that should not make the "selection" of the text any harder! I looked up the window and I saw a plane flying over. I asked the first thing that came to mind: "What is it doing up there?" She did not know, "I think we should move past the fence!", she quickly said. He later described it as: "Something insane. "</p>';

alert(text + '\n\n' + removeLastSentence(text));
alert(text + '\n\n' + removeLastSentence(removeLastSentence(text)));
alert(text + '\n\n' + removeLastSentence(removeLastSentence(removeLastSentence(text))));
alert(text + '\n\n' + removeLastSentence(removeLastSentence(removeLastSentence(removeLastSentence(text)))));
alert(text + '\n\n' + removeLastSentence(removeLastSentence(removeLastSentence(removeLastSentence(removeLastSentence(text))))));
alert(text + '\n\n' + removeLastSentence(removeLastSentence(removeLastSentence(removeLastSentence(removeLastSentence(removeLastSentence(text)))))));
alert(text + '\n\n' + removeLastSentence('<p>Blabla, some more text here. Sometimes <span>basic</span> html code is used but that should not make the "selection" of the text any harder! I looked up the '));
*/
Poojan answered Sep 30 '22 15:09