How to split text in Ruby without creating empty strings?

Splitting on whitespace, period, comma or double quotes, and not on single quotes:

str = %Q{this is the.string    to's split,real "ok" nice-like.}
str.split(/\s|\.|,|"/)
=> ["this", "is", "the", "string", "", "", "", "to's", "split", "real", "", "ok", "", "nice-like"]

How can I elegantly remove the empty strings?

How can I elegantly remove strings that are shorter than MIN_LENGTH?

B Seven asked Mar 15 '12 03:03


3 Answers

Using split is the wrong approach in this case. You should be using scan.

str = %Q{this is the.string    to's split,real "ok" nice-like.}
str.scan(/[\w'-]+/)
# => ["this", "is", "the", "string", "to's", "split", "real", "ok", "nice-like"]

To match only strings that are MIN_LENGTH characters or longer, do this:

MIN_LENGTH = 3
str.scan(/[\w'-]{#{MIN_LENGTH},}/)
# => ["this", "the", "string", "to's", "split", "real", "nice-like"]
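The two scan calls above can be folded into one method. This is only a sketch: the name words and its min_length parameter are hypothetical, not part of the answer. It assumes min_length is at least 1, because {0,} would also match empty strings and bring back exactly the problem the question is about.

```ruby
# Hypothetical helper (not from the answer): extract runs of word
# characters, apostrophes and hyphens that are at least min_length long.
def words(str, min_length = 1)
  # {0,} would match empty strings, so require a minimum of 1.
  raise ArgumentError, "min_length must be at least 1" if min_length < 1

  str.scan(/[\w'-]{#{min_length},}/)
end

str = %Q{this is the.string    to's split,real "ok" nice-like.}
p words(str)     # => ["this", "is", "the", "string", "to's", "split", "real", "ok", "nice-like"]
p words(str, 3)  # => ["this", "the", "string", "to's", "split", "real", "nice-like"]
```

With the default of 1, {1,} behaves the same as + in the first scan above.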

When to use split, when to use scan

  • When the delimiters are messy and making a regex match them is difficult, use scan.
  • When the substrings to extract are messy and making a regex match them is difficult, use split.
  • When you want to impose conditions on the form of the substrings to be extracted, use scan.
  • When you want to impose conditions on the form of the delimiters, use split.
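The first two rules can be illustrated with a couple of made-up strings (the sample data is invented for illustration, not taken from the question). In the first, the separators vary wildly but the wanted pieces all look alike. In the second, the pieces are messy but the separator is fixed.

```ruby
# Messy delimiters, uniform tokens: describe the tokens and scan for them.
prices = "a: 12; b:7 , c -- 30"
numbers = prices.scan(/\d+/)
p numbers  # => ["12", "7", "30"]

# Messy tokens, uniform delimiter: the fields contain commas, periods and
# '#', but they are always separated by " | ", so split on that.
record = "Smith, J. | 42 Main St. #3 | n/a"
fields = record.split(" | ")
p fields   # => ["Smith, J.", "42 Main St. #3", "n/a"]
```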
sawa answered Oct 13 '22 01:10


I'm not entirely clear on the problem domain, but if you just want to avoid the empty strings, why not split on one or more occurrences of your separators?

str.split(/[\s.,"]+/)
# => ["this", "is", "the", "string", "to's", "split", "real", "ok", "nice-like"]
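One caveat with splitting on runs of separators: split drops trailing empty strings by default, but if the input starts with a separator, the result still begins with one empty string. A minimal sketch with a made-up input (the constant name SEPARATORS is just for illustration):

```ruby
SEPARATORS = /[\s.,"]+/

text = %Q{"quoted" start}

# The separator at position 0 produces a leading empty string.
parts = text.split(SEPARATORS)
p parts    # => ["", "quoted", "start"]

# Dropping it explicitly.
cleaned = parts.reject(&:empty?)
p cleaned  # => ["quoted", "start"]
```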
Tobias Cohen answered Oct 13 '22 03:10


I would think a simple way to do that is as follows:

MIN_LENGTH = 3
str.split(/\s|\.|,|"/).select { |s| s.length >= MIN_LENGTH }
# => ["this", "the", "string", "to's", "split", "real", "nice-like"]
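If only the empty strings need to go (the first part of the question), a minimum length of 1 is enough. reject(&:empty?) says the same thing more directly. A small sketch comparing the two on the question's string:

```ruby
str = %Q{this is the.string    to's split,real "ok" nice-like.}
parts = str.split(/\s|\.|,|"/)

# reject(&:empty?) is shorthand for reject { |s| s.empty? }.
non_empty = parts.reject(&:empty?)
p non_empty
# => ["this", "is", "the", "string", "to's", "split", "real", "ok", "nice-like"]

# Same result as filtering with a minimum length of 1.
p non_empty == parts.select { |s| s.length >= 1 }  # => true
```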
Nikhil answered Oct 13 '22 02:10