Java: Split string into words, commas and full stops

Tags: java, regex, split

I have been using myString.split("\\s+"); to get each word. But now I also want the commas and full stops split off as separate elements. For example:

Mama always said life was like a box of chocolates, you never know what you're gonna get.

to:

{"Mama", "always", "said", "life", "was", "like", "a", "box", "of", "chocolates", ",", "you", "never", "know", "what", "you're", "gonna", "get", "."}

How would one go about doing this?

asked Nov 28 '12 22:11 by Reg


People also ask

How do you separate a string with a comma?

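To split a string on commas, use Java's split() method, e.g. str.split("[,]", 0); here "[,]" is a character class matching a single comma, and a limit of 0 discards trailing empty strings.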
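A short sketch of what the limit argument changes. The input string and the helper names splitDropTrailing/splitKeepTrailing are illustrative only; the behavior shown (limit 0 drops trailing empty strings, a negative limit keeps them) is the documented behavior of String.split.

```java
import java.util.Arrays;

// Sketch: splitting on commas with String.split(regex, limit).
public class CommaSplit {

    // "[,]" is a character class containing just a comma, so it behaves like ",".
    // With limit 0, trailing empty strings are discarded from the result.
    static String[] splitDropTrailing(String s) {
        return s.split("[,]", 0);
    }

    // With a negative limit, trailing empty strings are kept.
    static String[] splitKeepTrailing(String s) {
        return s.split("[,]", -1);
    }

    public static void main(String[] args) {
        String input = "red,green,,blue,,";
        System.out.println(Arrays.toString(splitDropTrailing(input))); // [red, green, , blue]
        System.out.println(Arrays.toString(splitKeepTrailing(input))); // [red, green, , blue, , ]
    }
}
```

Note that the empty string between the two adjacent commas survives in both cases; only the trailing ones are affected by the limit.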

How do you split a string by a space and a comma?

To split a string on spaces or commas, pass the regular expression "[, ]+" to split() (the /[, ]+/ form is JavaScript literal syntax; Java takes the pattern as a string). The method splits the string at each run of one or more spaces and/or commas and returns an array of the substrings between them.
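A minimal sketch of the "[, ]+" pattern in Java syntax; the class name and sample strings are made up for illustration.

```java
import java.util.Arrays;

// Sketch: splitting on spaces and/or commas.
public class SpaceOrCommaSplit {

    // Java takes the pattern as a string; "[, ]+" matches a run of one or more
    // characters, each of which is a comma or a space.
    static String[] split(String s) {
        return s.split("[, ]+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("one, two three,four"))); // [one, two, three, four]
        // A delimiter at the very start still yields an empty first element.
        System.out.println(Arrays.toString(split(", five"))); // [, five]
    }
}
```

If leading delimiters can occur, strip them (for example with trim() for spaces) before splitting.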

How do you parse a comma delimited string in Java?

To parse a comma-delimited String, pass "," as the delimiter; split() returns an array of Strings containing the individual values. split() is built on Java's regular expression API (java.util.regex), so the delimiter is interpreted as a regex.
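A sketch of parsing one comma-delimited line, assuming values never contain commas themselves (this is not a CSV parser; quoted fields are not handled). It also shows that a plain "," keeps surrounding whitespace; the "\\s*,\\s*" variant and the helper names are illustrative additions, not part of the answer above.

```java
import java.util.Arrays;

// Sketch: splitting one comma-delimited line (not a full CSV parser).
public class CommaDelimitedValues {

    // Plain "," delimiter: whitespace around each value is preserved.
    static String[] splitPlain(String line) {
        return line.split(",");
    }

    // "\\s*,\\s*" also swallows whitespace on either side of each comma;
    // trim() removes whitespace at the ends of the line.
    static String[] splitTrimmed(String line) {
        return line.trim().split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        String line = " apple , banana,cherry ";
        System.out.println(splitPlain(line).length);             // 3
        System.out.println(Arrays.toString(splitTrimmed(line))); // [apple, banana, cherry]
    }
}
```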

Can you split a string with multiple delimiters in Java?

Yes. Put the regex OR operator | between the delimiters. For example, "-|\\." splits on both hyphens and dots; the dot has to be escaped because an unescaped . matches any character.
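A sketch of splitting on two delimiters with |, including the unescaped-dot pitfall. The sample filename-like string is invented for illustration.

```java
import java.util.Arrays;

// Sketch: splitting on two delimiters, hyphen OR dot.
public class MultiDelimiterSplit {

    // "|" separates the alternatives. The dot is escaped as "\\." because an
    // unescaped "." in a regex matches any character.
    static String[] split(String s) {
        return s.split("-|\\.");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("2012-11-28.log"))); // [2012, 11, 28, log]
        // Pitfall: with an unescaped dot every character is a delimiter, every piece
        // is empty, and limit 0 (the default) discards them all.
        System.out.println("a.b".split(".").length); // 0
    }
}
```

The character class "[-.]" is an equivalent way to write the same delimiter set, since inside brackets a leading - and a . are both literal.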


2 Answers

If commas and periods are always followed by whitespace or end-of-string, then you can write:

myString.split("(?=[,.])|\\s+");
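A runnable sketch of this first pattern on the question's sentence, plus an input with no space after the comma to show why the caveat above matters. The class and method names are made up for illustration.

```java
import java.util.Arrays;

// Sketch: split on whitespace runs, or at the empty position just before "," or ".".
public class LookaheadSplit {

    static String[] tokenize(String s) {
        return s.split("(?=[,.])|\\s+");
    }

    public static void main(String[] args) {
        String s = "Mama always said life was like a box of chocolates, you never know what you're gonna get.";
        System.out.println(Arrays.toString(tokenize(s)));
        // [Mama, always, said, life, was, like, a, box, of, chocolates, ,, you, never, know, what, you're, gonna, get, .]

        // Without whitespace after the comma, the comma stays glued to the next word.
        System.out.println(Arrays.toString(tokenize("a,b"))); // [a, ,b]
    }
}
```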

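If they're not, and you want e.g. a,b to be split into three strings, then:

myString.split("\\s+|(?<=[,.])|(?=[,.])");

Keep \\s+ as the first alternative: Java tries the alternatives left to right, so a lookaround listed first would match the empty position right after a comma and leave the following space attached to the next word.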
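A sketch comparing two orderings of the same three alternatives. Java's regex alternation takes the first alternative that matches, and split() resumes one character past an empty match, so when the lookarounds come first the space after a comma ends up in the next token. The constant names are illustrative.

```java
import java.util.Arrays;

// Sketch: the order of alternatives matters when some of them are zero-width.
public class LookaroundOrder {

    // Lookarounds first: right after a comma the empty lookbehind match wins,
    // so the following space is never consumed as a delimiter.
    static final String LOOKAROUND_FIRST = "(?<=[,.])|(?=[,.])|\\s+";

    // Whitespace first: "\\s+" gets the first chance to consume that space.
    static final String WHITESPACE_FIRST = "\\s+|(?<=[,.])|(?=[,.])";

    public static void main(String[] args) {
        String s = "chocolates, you never know";
        System.out.println(Arrays.toString(s.split(LOOKAROUND_FIRST))); // [chocolates, ,,  you, never, know]
        System.out.println(Arrays.toString(s.split(WHITESPACE_FIRST))); // [chocolates, ,, you, never, know]
        System.out.println(Arrays.toString("a,b.c".split(WHITESPACE_FIRST))); // [a, ,, b, ., c]
    }
}
```

In the first line of output the fourth element is " you" with a leading space.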
answered Sep 28 '22 01:09 by ruakh


You could use a lookahead to split before dots and commas, too:

myString.split("\\s+|(?=[,.])");

The lookahead is not included in the actual match, so the character itself (comma or period) ends up in the resulting array.
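To see that the lookahead alternative really is zero-width, one can list each match's start and end offsets with java.util.regex.Matcher. A sketch; the class and helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: inspect the matches that split() would use as delimiters.
public class ZeroWidthDemo {

    // Collect "start-end" for every match; a lookahead match has start == end,
    // i.e. it consumes no characters, so nothing is removed from the output.
    static List<String> matchSpans(String regex, String input) {
        List<String> spans = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            spans.add(m.start() + "-" + m.end());
        }
        return spans;
    }

    public static void main(String[] args) {
        String input = "box of chocolates, you";
        System.out.println(matchSpans("\\s+|(?=[,.])", input)); // [3-4, 6-7, 17-17, 18-19]
        System.out.println(String.join(" | ", input.split("\\s+|(?=[,.])"))); // box | of | chocolates | , | you
    }
}
```

The span 17-17 is the empty match just before the comma at index 17, which is why the comma survives as its own element.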

answered Sep 28 '22 01:09 by Martin Ender