How can I write a Ruby function that splits the input on any kind of whitespace and removes all the whitespace from the result? For example, if the input is
aa bbb cc dd ee
then the function should return the array ["aa", "bbb", "cc", "dd", "ee"].
split is a String instance method in Ruby that divides a string into an array of substrings based on a pattern. The pattern can be a regular expression or a string, and str is divided wherever the pattern matches. If the pattern is omitted, or is a single space " ", str is split on runs of whitespace and any leading or trailing whitespace is ignored.
Strings can therefore be converted to arrays by combining split with a regular expression. split breaks the string into distinct parts, each of which becomes an array element, and the regular expression tells split where the break points are.
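As a small illustration of splitting with a regular expression (the sample string is made up for this example), /\s+/ treats every run of spaces, tabs, or newlines as one break point. One difference from split with no argument is worth knowing: a regex pattern keeps an empty first element when the string starts with whitespace.

```ruby
# /\s+/ matches any run of whitespace (spaces, tabs, newlines, ...),
# so each run becomes a single break point.
text = " aa  bbb\tcc\ndd ee"

by_regex   = text.split(/\s+/) # leading whitespace yields an empty first element
by_default = text.split        # no pattern: leading/trailing whitespace is ignored

p by_regex    # => ["", "aa", "bbb", "cc", "dd", "ee"]
p by_default  # => ["aa", "bbb", "cc", "dd", "ee"]
```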
This is the default behavior of String#split:
input = <<-TEXT
aa bbb cc dd ee
TEXT

input.split
Result:
["aa", "bbb", "cc", "dd", "ee"]
This works in all versions of Ruby that I tested, including 1.8.7, 1.9.3, 2.0.0, and 2.1.2.
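To wrap this up as the function the question asks for, here is a minimal sketch. The method name split_on_whitespace is hypothetical, and the sample string with tabs, carriage returns, form feeds, and newlines is made up to show that all of them count as whitespace for the default split.

```ruby
# Hypothetical helper name; it simply delegates to String#split.
def split_on_whitespace(str)
  # With no argument (and $; left at its default of nil), split breaks on
  # runs of whitespace and ignores leading and trailing whitespace.
  str.split
end

sample = "\taa bbb\r\n\n  cc\fdd   ee \n"
words = split_on_whitespace(sample)

p words              # => ["aa", "bbb", "cc", "dd", "ee"]
p sample.split(" ")  # a single-space string pattern behaves the same way
```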
The following should work for the example you gave:
str.gsub(/\s+/m, ' ').strip.split(" ")
It returns:
["aa", "bbb", "cc", "dd", "ee"]
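Explanation of the code:
The pattern /\s+/m is the more complicated part. \s matches a single whitespace character (space, tab, newline, and so on), so \s+ matches one or more whitespace characters in a row. The trailing m is a modifier. In Ruby, m (multiline mode) makes . match a newline as well; it does not change how \s behaves. Since \s already matches newlines, /\s+/ without the m would work the same way here. Either way, the pattern finds every run of one or more whitespace characters.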
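A quick check of what the m option does in Ruby (the strings are arbitrary examples): it changes whether . can match a newline, and has no effect on a \s-based gsub.

```ruby
# With m, '.' may match "\n"; without it, '.' stops at newlines.
dot_plain     = "a\nb" =~ /a.b/   # no match, so nil
dot_multiline = "a\nb" =~ /a.b/m  # matches at index 0

# \s already matches "\n", so the m option makes no difference here.
text = "aa bbb\n\ncc\tdd ee"
same = text.gsub(/\s+/, ' ') == text.gsub(/\s+/m, ' ')

p dot_plain      # => nil
p dot_multiline  # => 0
p same           # => true
```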
gsub replaces every match, in this case with a single space.
strip is the equivalent of trim in other languages: it removes leading and trailing whitespace (including newlines) from the string.
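To make the chain concrete, here are the intermediate values for a made-up input containing tabs, newlines, and repeated spaces:

```ruby
str = "  aa\tbbb\n cc  dd ee \n"

step1 = str.gsub(/\s+/m, ' ') # collapse each whitespace run into one space
step2 = step1.strip           # drop the single leading and trailing space
step3 = step2.split(" ")      # split on the single spaces that remain

p step1  # => " aa bbb cc dd ee "
p step2  # => "aa bbb cc dd ee"
p step3  # => ["aa", "bbb", "cc", "dd", "ee"]
```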
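While writing the explanation, I wondered whether you could end up with an end-of-line character at the beginning or end of the string. With the version above that cannot actually happen: the first gsub turns every newline into a space, and strip removes whitespace from both ends. If you prefer to make the trimming explicit anyway, the code can be written as: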
str.gsub(/\s+/m, ' ').gsub(/^\s+|\s+$/m, '').split(" ")
So if you had:
str = "\n aa bbb\n cc dd ee\n\n"
Then you'd get:
["aa", "bbb", "cc", "dd", "ee"]
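For comparison, a short sketch that runs the shorter chain, the longer chain, and a plain split on this same input; all three produce the same array:

```ruby
str = "\n aa bbb\n cc dd ee\n\n"

first_version  = str.gsub(/\s+/m, ' ').strip.split(" ")
second_version = str.gsub(/\s+/m, ' ').gsub(/^\s+|\s+$/m, '').split(" ")
plain_split    = str.split # default whitespace splitting

p first_version == second_version  # => true
p second_version == plain_split    # => true
```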
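Explanation of the new code:
^\s+ matches a run of whitespace at the start of a line, and \s+$ matches a run of whitespace at the end of a line. In Ruby, ^ and $ always refer to line boundaries (\A and \z anchor to the start and end of the whole string), but because the first gsub has already replaced every newline with a space, the string is a single line at this point, so here they mark its beginning and end.
So gsub(/^\s+|\s+$/m, '') removes any run of whitespace from the beginning and the end of the string.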
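The distinction between line anchors and string anchors matters only when the string still contains newlines. A small sketch with a made-up two-line string:

```ruby
# ^ matches at the start of every line; \A matches only at the start of the string.
text = "x\n  y"

line_anchor   = text.gsub(/^\s+/, '')  # strips the indentation before "y"
string_anchor = text.gsub(/\A\s+/, '') # nothing to strip: the string starts with "x"

p line_anchor    # => "x\ny"
p string_anchor  # => "x\n  y"
```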