 

Zero-length string being returned from String#split

Tags:

ruby

In Ruby 1.9.3 (and probably earlier versions, not sure), I'm trying to figure out why Ruby's String#split method gives me certain results. They seem counter-intuitive compared with what I would expect. Here's an example:

"abcabc".split("b") #=> ["a", "ca", "c"]
"abcabc".split("a") #=> ["", "bc", "bc"]
"abcabc".split("c") #=> ["ab", "ab"]

Here, the first example returns exactly what I would expect.

But in the second example, I'm confused as to why #split is returning a zero-length string as the first value of the returned array. What is the reasoning for this? This is what I would expect instead:

"abcabc".split("a") #=> ["bc", "bc"]

Along the same lines, why is there no trailing zero-length string in the third example? If the second example returns a zero-length string as its first value, then the third example should return one as its last value.

Enlighten me: what am I missing here?

EDIT: Looking into it more, I realize why this is the default behavior and why my thinking was completely wrong. If we were to go through a CSV file, for example, splitting each line into columns, our data would be thrown off if empty leading columns were ignored.
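To make the CSV point concrete, here is a minimal sketch. It assumes simple comma-separated rows with no quoted fields (real CSV data should go through Ruby's csv standard library instead), and COLUMNS and parse_row are hypothetical names used only for illustration. It sticks to Hash[] so it also runs on 1.9.3.

```ruby
# Hypothetical column layout for a simple, unquoted comma-separated file.
COLUMNS = [:name, :age, :city]

# Hypothetical helper: assigns each field to a column purely by position.
# This only works because split keeps leading (and interior) empty fields.
def parse_row(line)
  Hash[COLUMNS.zip(line.split(","))]
end

no_name = parse_row(",31,Oslo")

# The empty first column stays in place, so "31" still lands under :age.
# If split dropped the leading "", :name would get "31" and :age "Oslo".
p no_name[:name]  #=> ""
p no_name[:age]   #=> "31"
p no_name[:city]  #=> "Oslo"
```

One caveat: trailing empty fields are still dropped by default, so parse_row("Ann,31,") maps :city to nil rather than "". Passing a negative limit, line.split(",", -1), keeps that last empty field.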

Also, it's important to note that this behavior isn't specific to Ruby; I'm learning that many other languages behave in exactly the same way. I just happened to be using Ruby when I ran into it.

Threeve asked Jun 27 '12 at 20:06



2 Answers

The Ruby 1.9 documentation for String#split says:

If the limit parameter is omitted, trailing null fields are suppressed.

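So if we take your third example:

 "abcabc".split("c") #=> ["ab", "ab"]

And we include a negative limit value:

 "abcabc".split("c", -1)  #=> ["ab", "ab", ""]

you get the behavior you expected: the trailing empty field is kept, mirroring the leading one in your second example.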
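A short sketch comparing the ways of calling split on the same string, following the documented meaning of the limit argument (omitted, negative, or positive):

```ruby
s = "abcabc"

# Limit omitted: leading empty fields are kept, trailing ones are dropped.
p s.split("a")             #=> ["", "bc", "bc"]
p s.split("c")             #=> ["ab", "ab"]

# Negative limit: no cap on the number of fields, and trailing empties are kept.
p s.split("c", -1)         #=> ["ab", "ab", ""]

# Positive limit: at most that many fields; the last one holds the unsplit rest.
p s.split("c", 2)          #=> ["ab", "abc"]

# By default every trailing empty field is dropped, not just the last one.
p "a,b,,,".split(",")      #=> ["a", "b"]
p "a,b,,,".split(",", -1)  #=> ["a", "b", "", "", ""]
```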

tlehman answered Nov 20 '22 at 12:11



"abcabc".split("b") #=> ["a", "ca", "c"]
"abcabc".split("a") #=> ["", "bc", "bc"]
"abcabc".split("c") #=> ["ab", "ab"]

Suppose you were splitting on a comma. What behaviour would you expect from ",bc,bc".split(',')? Splitting on 'a' is no different. As for the third example, split omits trailing empty fields by default.
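A small sketch of the comma analogy: replacing each "a" in "abcabc" with a comma gives ",bc,bc", and splitting either string on its separator yields the same array. Replacing each "c" instead gives "ab,ab,", which shows the default suppression of the trailing empty field.

```ruby
# Swap the separator: "abcabc" with each "a" replaced by "," is ",bc,bc".
p "abcabc".tr("a", ",")                       #=> ",bc,bc"

# A leading separator yields a leading empty field, whatever the separator is.
p ",bc,bc".split(",")                         #=> ["", "bc", "bc"]
p "abcabc".split("a") == ",bc,bc".split(",")  #=> true

# With "c" swapped for ",", the trailing empty field is omitted by default.
p "abcabc".tr("c", ",")                       #=> "ab,ab,"
p "ab,ab,".split(",")                         #=> ["ab", "ab"]
```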

steenslag answered Nov 20 '22 at 13:11
