
Split a string into an array of words, punctuation and spaces in JavaScript

I have a string which I'd like to split into items contained in an array as the following example:

var text = "I like grumpy cats. Do you?"

// to result in:

var wordArray = ["I", " ", "like", " ", "grumpy", " ", "cats", ".", " ", "Do", " ", "you", "?"]

I've tried the following expression (and similar variations) without success:

var wordArray = text.split(/(\S+|\W)/)
//this disregards spaces and doesn't separate punctuation from words

In Ruby there's a regex operator (\b) that splits at any word boundary while preserving spaces and punctuation, but I can't find a similar one for JavaScript. I'd appreciate your help.

alopez02 asked Nov 30 '16 06:11



2 Answers

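Use the String#match method with the regex /\w+|\s+|[^\s\w]+/g. The alternatives are tried in order:

  1. \w+ - matches a run of word characters (letters, digits and underscore)
  2. \s+ - matches a run of whitespace
  3. [^\s\w]+ - matches a run of anything that is neither whitespace nor a word character, such as punctuation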

var text = "I like grumpy cats. Do you?";

console.log(
  text.match(/\w+|\s+|[^\s\w]+/g)
)


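FYI: if you want each special character as its own item, use \W or . as the last alternative instead of [^\s\w]+. Because \w+ and \s+ are tried first, that last alternative only ever consumes a single character that is neither a word character nor whitespace.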
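
To make the difference concrete, here is a small sketch comparing the two patterns on a string with consecutive punctuation. The sample string and the variable names grouped and single are only illustrative and are not part of the original question.

```javascript
// Illustrative input with runs of punctuation (not from the question).
var text = "Wait... what?!";

// [^\s\w]+ keeps each run of punctuation together as one item.
var grouped = text.match(/\w+|\s+|[^\s\w]+/g);

// \W as the last alternative yields one item per punctuation character,
// because \w+ and \s+ have already claimed words and whitespace.
var single = text.match(/\w+|\s+|\W/g);

console.log(grouped); // ["Wait", "...", " ", "what", "?!"]
console.log(single);  // ["Wait", ".", ".", ".", " ", "what", "?", "!"]
```

Which variant fits depends on whether something like "..." should count as one token or three.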

Pranav C Balan answered Nov 14 '22 21:11


The word boundary \b should work fine.

Example

"I like grumpy cats. Do you?".split(/\b/)
// ["I", " ", "like", " ", "grumpy", " ", "cats", ". ", "Do", " ", "you", "?"]

Edit

To separate the . from the space that follows it, we can also split just before any . or whitespace character:

Example

"I like grumpy cats. Do you?".split(/(?=[.\s]|\b)/)
// ["I", " ", "like", " ", "grumpy", " ", "cats", ".", " ", "Do", " ", "you", "?"]
  • (?=[.\s]) Positive lookahead, splits just before a . or a whitespace character without consuming it
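
As a rough sketch of how this split pattern behaves on other inputs, the snippet below wraps it in a helper. The name splitAtBoundaries and the extra sample strings are assumptions made for illustration; only the regex itself comes from the answer above.

```javascript
// Hypothetical helper wrapping the split pattern from the answer above.
function splitAtBoundaries(text) {
  // Split before a "." or whitespace character, or at any word boundary.
  return text.split(/(?=[.\s]|\b)/);
}

// The sentence from the question.
console.log(splitAtBoundaries("I like grumpy cats. Do you?"));
// ["I", " ", "like", " ", "grumpy", " ", "cats", ".", " ", "Do", " ", "you", "?"]

// Adjacent punctuation other than "." stays together as one item.
console.log(splitAtBoundaries("Really?! Yes."));
// ["Really", "?!", " ", "Yes", "."]

// Consecutive whitespace is split into one item per character.
console.log(splitAtBoundaries("a  b"));
// ["a", " ", " ", "b"]
```

Note that, unlike a match-based pattern using \s+, this approach returns every whitespace character as a separate item.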

nu11p01n73R answered Nov 14 '22 22:11