Split string on spaces in Java, except if between quotes (i.e. treat "hello world" as one token) [duplicate]

Tags:

java

How do I split a String based on space but take quoted substrings as one word?

Example:

Location "Welcome  to india" Bangalore Channai "IT city"  Mysore 

It should be stored in an ArrayList as:

Location
Welcome to india
Bangalore
Channai
IT city
Mysore
user1000535 asked Oct 18 '11 08:10



1 Answer

Here's how:

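import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "Location \"Welcome  to india\" Bangalore " +
             "Channai \"IT city\"  Mysore";

List<String> list = new ArrayList<String>();
// Each match is one token (group 1) plus any trailing whitespace.
Matcher m = Pattern.compile("([^\"]\\S*|\".+?\")\\s*").matcher(str);
while (m.find())
    list.add(m.group(1)); // Add .replace("\"", "") to remove surrounding quotes.

System.out.println(list);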

Output:

[Location, "Welcome  to india", Bangalore, Channai, "IT city", Mysore] 

The regular expression simply says

  • [^"]     - token starting with something other than "
  • \S*       - followed by zero or more non-space characters
  • ...or...
  • ".+?"   - a "-symbol followed by whatever, until another ".
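
The question asks for the tokens without their quotes. Besides the .replace("\"", "") mentioned in the code comment, one option is to let the pattern capture the quoted contents directly. The following is only a sketch of that idea, not the pattern from the answer above: the class name QuotedSplitter, the method split, and the pattern itself are made up for illustration. It assumes quotes are balanced, are never escaped, and always start a token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class QuotedSplitter {
    // Group 1: contents of a quoted token (quotes excluded).
    // Group 2: a bare token made of non-whitespace characters.
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    static List<String> split(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // Exactly one of the two groups participates in each match.
            tokens.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String str = "Location \"Welcome  to india\" Bangalore "
                   + "Channai \"IT city\"  Mysore";
        // Prints: [Location, Welcome  to india, Bangalore, Channai, IT city, Mysore]
        System.out.println(split(str));
    }
}
```

Whitespace inside the quotes is kept as-is, so "Welcome  to india" still contains two spaces. A quote in the middle of a bare token (for example foo"bar baz") is not treated as the start of a quoted section by this sketch.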
aioobe answered Sep 20 '22 12:09