
Java string split on alphanumeric and new lines?

Tags:

java

I have a test.txt file containing several lines, for example:

"h3llo, @my name is, bob! (how are you?)"

"i am fine@@@@@"

I want to split the text on everything that is not a letter, including the newline, and collect the pieces in an ArrayList, so the output would be

output = ["h", "llo", "my", "name", "is", "bob", "how", "are", "you", "i", "am", "fine"]

So far I have tried splitting my text with

output.split("\\P{Alpha}+")

But for some reason this adds an empty string in the first spot of the ArrayList and puts another empty string where the newline was:

output = ["", "h", "llo", "my", "name", "is", "bob", "how", "are", "you", "", "i", "am", "fine"]

Is there another way to fix this? Thank you!

--

EDIT: How can I make sure it ignores the new line?

evelyn asked Jan 13 '16 16:01



1 Answer

Java's String.split() behavior is pretty confusing. A much better splitting utility is Guava's Splitter. Their documentation goes into more detail about the problems with String.split():

The built in Java utilities for splitting strings can have some quirky behaviors. For example, String.split silently discards trailing separators, and StringTokenizer respects exactly five whitespace characters and nothing else.

Quiz: ",a,,b,".split(",") returns...

  1. "", "a", "", "b", ""
  2. null, "a", null, "b", null
  3. "a", null, "b"
  4. "a", "b"
  5. None of the above

The correct answer is none of the above: "", "a", "", "b". Only trailing empty strings are skipped. What is this I don't even.
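To see the quiz result for yourself, here is a minimal sketch using only the JDK. The class name SplitQuiz and its two helper methods are made up for illustration. It also shows that passing a negative limit to String.split keeps the trailing empty strings.

```java
import java.util.Arrays;

public class SplitQuiz {
    // Default limit (0): trailing empty strings are dropped, leading ones are kept.
    static String[] splitDefault(String s) {
        return s.split(",");
    }

    // Negative limit: trailing empty strings are kept as well.
    static String[] splitKeepTrailing(String s) {
        return s.split(",", -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitDefault(",a,,b,")));      // [, a, , b]
        System.out.println(Arrays.toString(splitKeepTrailing(",a,,b,"))); // [, a, , b, ]
    }
}
```

The leading empty string survives in both cases, which matches the empty entries the question shows at the start of each line's results.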

In your case this should work:

Splitter.onPattern("\\P{Alpha}+").omitEmptyStrings().splitToList(output);
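If you would rather not add Guava, a plain-JDK alternative (a different technique from the Splitter call above) is to match the runs of letters instead of splitting on the non-letters, so empty strings never appear in the first place. This is only a sketch: the class name AlphaWords and the method words are hypothetical, and the input is assumed to be the whole file content as one string with the quotes included.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlphaWords {
    // One or more alphabetic characters (ASCII letters unless UNICODE_CHARACTER_CLASS is set).
    private static final Pattern LETTERS = Pattern.compile("\\p{Alpha}+");

    // Collects every run of letters; digits, punctuation, and newlines act as separators.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = LETTERS.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "\"h3llo, @my name is, bob! (how are you?)\"\n\"i am fine@@@@@\"";
        // Prints: [h, llo, my, name, is, bob, how, are, you, i, am, fine]
        System.out.println(words(text));
    }
}
```

Because the newline is just another non-letter, it is skipped along with the punctuation, so nothing special is needed to ignore it.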

dimo414 answered Oct 27 '22 00:10