I need a tokenizer that, given a string with arbitrary whitespace between words, creates an array of words without empty substrings.
For example, given the string:
" I dont know what you mean by glory  Alice said."
I use:
str2.split(" ")
This also returns empty sub-strings:
["", "I", "dont", "know", "what", "you", "mean", "by", "glory", "", "Alice", "said."]
How do I filter out empty strings from an array?
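A minimal sketch of filtering after the split, assuming the example string has one leading space and two spaces after "glory" (which is what the output above implies). The names `sentence`, `raw`, and `words` are illustrative only.

```javascript
// Example sentence: one leading space and two spaces after "glory".
const sentence = " I dont know what you mean by glory  Alice said.";

// split(" ") yields "" at the start and wherever two spaces are adjacent.
const raw = sentence.split(" ");

// Boolean("") is false, so filter(Boolean) keeps only the non-empty pieces.
const words = raw.filter(Boolean);

console.log(words);
```

This keeps the original split(" ") call and removes the empty strings afterwards; it only treats the space character as a separator, so tabs and newlines would stay inside the words.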
In Java, you can split a String on spaces or tabs with the split() method of the java.lang.String class. The method accepts a regular expression, so passing a regex that matches whitespace splits the string wherever words are separated by whitespace.
To split a string while keeping the whitespace, pass the regular expression /(\s+)/ to the split() method. The capturing group makes split() include each matched run of whitespace in the resulting array.
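As a small illustration of the capturing-group behaviour, here is a sketch with a made-up input (`text` and `parts` are illustrative names). Each run of whitespace appears as its own array element, so joining the parts reproduces the input.

```javascript
// Two spaces between "a" and "b", one space between "b" and "c".
const text = "a  b c";

// The parentheses form a capturing group, so the separators are kept.
const parts = text.split(/(\s+)/);

console.log(parts);                  // words and whitespace runs alternate
console.log(parts.join("") === text); // true: nothing was lost
```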
To split a string on spaces or commas, pass the regular expression /[, ]+/ to the split() method. The string is split at each run of spaces and commas, and the method returns an array containing the substrings.
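A short sketch of the space-or-comma split, using a made-up input (`list` and `items` are illustrative names):

```javascript
// Separators are mixed: ", " then "," then two spaces.
const list = "red, green,blue  yellow";

// The character class matches a comma or a space; + collapses adjacent separators.
const items = list.split(/[, ]+/);

console.log(items);
```

Note that a separator at the very start or end of the input would still produce an empty string at that end of the array.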
The str.split() method splits a string into an array of substrings at each occurrence of the separator passed as its argument. separator: the string or regular expression at which to split.
str.match(/\S+/g)
returns a list of non-space sequences ["I", "dont", "know", "what", "you", "mean", "by", "glory", "Alice", "said."]
(note that this includes the dot in "said.")
str.match(/\w+/g)
returns a list of all words: ["I", "dont", "know", "what", "you", "mean", "by", "glory", "Alice", "said"]
docs on match()
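One detail worth guarding against: with the g flag, match() returns null rather than an empty array when nothing matches. A minimal sketch of a wrapper, where `tokenize` is a hypothetical helper name and not part of any library:

```javascript
// Hypothetical helper: returns the whitespace-separated tokens of a string.
function tokenize(input) {
  // match() returns null for an empty or all-whitespace string,
  // so fall back to an empty array in that case.
  return input.match(/\S+/g) || [];
}

console.log(tokenize(" I dont know what you mean by glory  Alice said."));
console.log(tokenize("   ")); // empty array, not null
```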
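You probably don't even need to filter; just split using this regular expression:
" I dont know what you mean by glory  Alice said.".split(/\b\s+/)
Note that \b does not match before leading whitespace, so the first element comes back as " I"; trim the string first if that matters.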
You should trim the string before using split.
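var str = " I dont know what you mean by glory  Alice said.";
var trimmed = str.replace(/^\s+|\s+$/g, ''); // strip leading and trailing whitespace
var words = trimmed.split(/\s+/); // split the trimmed string on runs of whitespace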
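The trim-then-split approach can be sketched as a small function. This version assumes an environment with String.prototype.trim (ES5 and later) in place of the replace() call, and `splitWords` is a hypothetical name. It also handles one edge case: splitting an empty string returns [""], not [].

```javascript
// Hypothetical helper: trim first, then split on runs of whitespace.
function splitWords(input) {
  const trimmed = input.trim();
  // "".split(/\s+/) returns [""], so handle the empty case explicitly.
  return trimmed === "" ? [] : trimmed.split(/\s+/);
}

console.log(splitWords(" I dont know what you mean by glory  Alice said."));
console.log(splitWords("   ")); // empty array
```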
I recommend .match:
str.match(/\b\w+\b/g);
This matches runs of word characters between word boundaries, so whitespace is never matched and therefore never appears in the resulting array.
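To make the difference between the patterns in these answers concrete, here is a sketch on a made-up sentence containing an apostrophe and punctuation (`sample` and the result names are illustrative). \S+ keeps punctuation attached to the word, while \w+ drops it and also splits at the apostrophe.

```javascript
const sample = "I don't know, Alice said.";

// Non-whitespace runs: punctuation stays attached.
const byNonSpace = sample.match(/\S+/g);

// Word-character runs: punctuation is dropped and "don't" becomes "don" and "t".
const byWordChars = sample.match(/\w+/g);

// Adding \b on both sides gives the same result as plain \w+ here.
const byBoundaries = sample.match(/\b\w+\b/g);

console.log(byNonSpace);
console.log(byWordChars);
console.log(byBoundaries);
```

Which pattern is appropriate depends on whether punctuation and contractions should count as part of a word.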