Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

javascript remove words less than 3 characters

I am tired to remove all the words less than 3 characters, like in, on ,the....

My code not work for me, Uncaught TypeError: Object ... has no method 'replace' ask for a help.

var str = 'Proin néc turpis eget dolor dictǔm lacínia. Nullam nǔnc magna, tincidunt eǔ porta in, faucibus sèd magna. Suspendisse laoreet ornare ullamcorper. Nulla in tortòr nibh. Pellentesque sèd est vitae odio vestibulum aliquet in nec leo.';
var newstr = str.split(" ").replace(/(\b(\w{1,3})\b(\s|$))/g,'');
alert(newstr);
like image 431
cj333 Avatar asked Dec 04 '22 07:12

cj333


2 Answers

You need to change the order of split and replace:

var newstr = str.replace(/(\b(\w{1,3})\b(\s|$))/g,'').split(" ");

Otherwise, you end up calling replace on an array, which does not have this method.

See it in action.

Note: Your current regex does not correctly handle the case where a "short" word is immediately followed by a punctuation character. You can change it slightly to do that:

/(\b(\w{1,3})\b(\W|$))/g
                ^^

Apart from that, you also have to take care of the fact that the resulting array may contain empty strings (because deleting consecutive short words separated by spaces will end up leaving consecutive spaces in the string before it's split). So you might also want to change how you split. All of this gives us:

var newstr = str.replace(/(\b(\w{1,3})\b(\W|$))/g,'').split(/\s+/);

See it in action.

Update: As Ray Toal correctly points out in a comment, in JavaScript regexes \w does not match non-ASCII characters (e.g. characters with accents). This means that the above regexes will not work correctly (they will work correctly on certain other flavors of regex). Unfortunately, there is no convenient way around that and you will have to replace \w with a character group such as [a-zA-Zéǔí], and do the converse for \W.

Update:

Ugh, doing this in JavaScript regex is not easy. I came up with this regex:

([^ǔa-z\u00C0-\u017E]([ǔa-z\u00C0-\u017E]{1,3})(?=[^ǔa-z\u00C0-\u017E]|$))

...which I still don't like because I had to manually include the ǔ in there.

See it in action.

like image 185
Jon Avatar answered Dec 05 '22 20:12

Jon


Try this:

str = str.split( ' ' ).filter(function ( str ) {
    var word = str.match(/(\w+)/);
    return word && word[0].length > 3;
}).join( ' ' );

Live demo: http://jsfiddle.net/sTfEs/1/

like image 44
Šime Vidas Avatar answered Dec 05 '22 21:12

Šime Vidas