Suppose I've a long string containing newlines and tabs as:
var x = "This is a long string.\n\t This is another one on next line.";
So how can we split this string into tokens, using regular expression?
I don't want to use .split(' ')
because I want to learn Javascript's Regex.
A more complicated string could be this:
var y = "This @is a #long $string. Alright, lets split this.";
Now I want to extract only the valid words out of this string, without special characters, and punctuation, i.e I want these:
var xwords = ["This", "is", "a", "long", "string", "This", "is", "another", "one", "on", "next", "line"];
var ywords = ["This", "is", "a", "long", "string", "Alright", "lets", "split", "this"];
Here is a jsfiddle example of what you asked: http://jsfiddle.net/ayezutov/BjXw5/1/
Basically, the code is very simple:
var y = "This @is a #long $string. Alright, lets split this.";
var regex = /[^\s]+/g; // This is "multiple not space characters, which should be searched not once in string"
var match = y.match(regex);
for (var i = 0; i<match.length; i++)
{
document.write(match[i]);
document.write('<br>');
}
UPDATE: Basically you can expand the list of separator characters: http://jsfiddle.net/ayezutov/BjXw5/2/
var regex = /[^\s\.,!?]+/g;
UPDATE 2: Only letters all the time: http://jsfiddle.net/ayezutov/BjXw5/3/
var regex = /\w+/g;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With