Split doesn't return empty string

Tags: split, ruby

Is there a way to obtain:

"[][][]".split('[]')
#=> ["", "", ""]

instead of

#=> []

without having to write a function?

The behavior is surprising because irb sometimes does return the leading empty string, as expected:

"[]a".split('[]')
#=> ["", "a"]
Sylvain Martin Saint Léon asked Dec 19 '22 19:12


2 Answers

From the docs:

If the limit parameter is omitted, trailing null fields are suppressed. If limit is a positive number, at most that number of fields will be returned (if limit is 1, the entire string is returned as the only entry in an array). If negative, there is no limit to the number of fields returned, and trailing null fields are not suppressed.

And so:

"[][][]".split("[]", -1)
# => ["", "", "", ""]

This yields four empty strings rather than the three you asked for, but it is the only result that makes sense. If you split ",,," on each comma, you would expect four empty strings as well, since there is one empty item before the first comma and one after the last.
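A quick sketch of the comma analogy, to make the count concrete: with a negative limit, the number of fields is the number of separators plus one. The variable names are only for illustration, and the last line assumes you really do want one field per separator, which is a guess at the intent of the question rather than anything split does for you.

```ruby
# With a negative limit, trailing empty fields are kept, so the result
# has one more field than there are separators in the string.
commas   = ",,,".split(",", -1)
brackets = "[][][]".split("[]", -1)

p commas    # => ["", "", "", ""]
p brackets  # => ["", "", "", ""]
p commas.length == ",,,".count(",") + 1  # => true

# Assumption: if exactly one field per separator is wanted,
# drop the final field after splitting.
one_per_separator = brackets[0...-1]
p one_per_separator  # => ["", "", ""]
```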

Jordan Running answered Dec 21 '22 08:12


String#split accepts two arguments: a pattern to split on and a limit on the number of fields returned. In this case, limit can help us.

The documentation for String#split says:

If the limit parameter is omitted, trailing null fields are suppressed. If limit is a positive number, at most that number of fields will be returned (if limit is 1, the entire string is returned as the only entry in an array).

The key phrase here is "trailing null fields are suppressed". In other words, if there are empty fields at the end of the result, they are dropped unless you have set a limit.

Here's an example:

"[]a[][]".split("[]")
#=> ["", "a"]

You might expect to get ["", "a", "", ""], but because trailing null fields are suppressed, every empty field after the last non-empty one (the "a") is dropped.

We could set a limit, and only get that many results:

"[]a[][]".split("[]", 3)
#=> ["", "a", "[]"]

In this case, since we've asked for 3 results, the final [] is not treated as a separator and is returned as part of the last field. This is useful when we know how many results we expect, but not so useful in your specific case.
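To see how a positive limit changes the result, here is a small sketch that tries limits 1 through 4 on the same string. The loop and the results hash are just scaffolding for the demonstration.

```ruby
input = "[]a[][]"

# Collect the result of splitting with each positive limit from 1 to 4.
results = (1..4).map { |limit| [limit, input.split("[]", limit)] }.to_h

results.each { |limit, fields| puts "#{limit}: #{fields.inspect}" }
# 1: ["[]a[][]"]
# 2: ["", "a[][]"]
# 3: ["", "a", "[]"]
# 4: ["", "a", "", ""]
```

Once the limit is large enough to cover every field, as with 4 here, the trailing empty strings show up, but you would need to know that number in advance.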

Fortunately, the docs continue:

If negative, there is no limit to the number of fields returned, and trailing null fields are not suppressed.

In other words, we can pass a limit of -1, and get all the matches, even the trailing empty ones:

"[]a[][]".split('[]', -1)
#=> ["", "a", "", ""]

This even works when all the matches are empty:

"[][][]".split('[]', -1)
#=> ["", "", "", ""]
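As a side-by-side check, this sketch runs the inputs from this thread through split with the default limit and with -1. It also suggests why "[]a" looked right in irb all along: it has no trailing empty fields to suppress. The hash layout and key names are only illustrative.

```ruby
inputs = ["[][][]", "[]a", "[]a[][]"]

# For each input, record the default result and the result with limit -1.
comparison = inputs.map do |input|
  [input, { default: input.split("[]"), keep_trailing: input.split("[]", -1) }]
end.to_h

comparison.each do |input, result|
  puts "#{input.inspect}: #{result[:default].inspect} vs #{result[:keep_trailing].inspect}"
end
# "[][][]": [] vs ["", "", "", ""]
# "[]a": ["", "a"] vs ["", "a"]
# "[]a[][]": ["", "a"] vs ["", "a", "", ""]
```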
georgebrock answered Dec 21 '22 07:12