
What does this split line in Scala mean?

Tags: regex, scala

I came across a split call in some Scala code that I don't understand. Until now I have only used simple splits like this:

var newLine = line.split(",")

But what does this split mean?

var newLine2 = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)")

The line I need to split looks like this:

1966, "Green, Green Grass of Home", Tom Jones, 850000

Thanks in advance!

amko23 asked Jun 03 '13 07:06


2 Answers

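The string passed to the split method is a regular expression. The group (?=([^\"]*\"[^\"]*\")*[^\"]*$) is a positive lookahead assertion. That means: split on a comma, but only if the pattern ([^\"]*\"[^\"]*\")*[^\"]*$ matches the text following the comma. The lookahead itself consumes nothing, so only the comma is removed. (Each \" is just the Scala string escape for a double quote.)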

([^\"]*     # a series of non double quote characters
    \"      # a double quote
    [^\"]*  # a series of non double quote characters
\")         # a double quote
*           # repeat that whole group 0 or more times
[^\"]*$     # a series of non double quote characters till the end of the string

That means it will only split on a comma when an even number of double quotes follows it up to the end of the string. In other words, it splits only if the comma is not inside double quotes. (This works as long as the quotes in the string are balanced, i.e. they only occur in pairs.)
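To see the effect on the line from the question, here is a minimal sketch. The object name SplitDemo and the value names are made up for illustration, and the trim step is an addition that is not part of the original code.

```scala
object SplitDemo {
  // The sample line from the question.
  val line = "1966, \"Green, Green Grass of Home\", Tom Jones, 850000"

  // The regex from the question: a comma followed by an even number of quotes.
  val quoteAware = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)"

  // Plain split: also cuts at the comma inside the quoted title.
  val naive: Array[String] = line.split(",")

  // Quote-aware split; trim is an extra step to drop the space after each comma.
  val fields: Array[String] = line.split(quoteAware).map(_.trim)

  // Unbalanced quotes (hypothetical input): the lookahead counts one quote
  // after the first comma, so only the second comma is used as a separator.
  val unbalanced: Array[String] = "a, \"b, c".split(quoteAware).map(_.trim)

  def main(args: Array[String]): Unit = {
    println(naive.length)        // 5 pieces, the title is broken in two
    fields.foreach(println)      // 4 fields, the title stays in one piece
    unbalanced.foreach(println)  // 2 pieces: a, "b  and  c
  }
}
```

The quoted title keeps its surrounding double quotes; removing them would need a further step.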

stema answered Oct 13 '22 00:10


This is a regular expression ("regex"); see http://en.wikipedia.org/wiki/Regular_expression for an explanation.

Landei answered Oct 13 '22 00:10