
How to split a string without getting an empty string inserted in the array

Tags: regex, split, ruby

I'm having trouble splitting a leading character off a string with a regular expression when the pattern matches.

I want to split off either an "m" or an "f" character from the start of a string, provided it is followed by one or more digits, then optional whitespace, then one of the strings in an array I have.

I tried:

2.4.0 :006 > MY_SEPARATOR_TOKENS = ["-", " to "]
 => ["-", " to "] 
2.4.0 :008 > str = "M14-19"
 => "M14-19" 
2.4.0 :011 > str.split(/^(m|f)\d+[[:space:]]*#{Regexp.union(MY_SEPARATOR_TOKENS)}/i)
 => ["", "M", "19"] 

Notice the extraneous "" element at the beginning of the array, and also notice that the last element is just "19", whereas I want everything after the letter ("14-19").

How do I adjust my regular expression so that the array contains only the two parts I want ("M" and "14-19")?

Dave asked Jan 31 '26 10:01


2 Answers

I find match a bit more elegant than split when extracting parts of a string with a regular expression in Ruby:

string = "M14-19"
string.match(/\A(?<m>[MF])(?<digits>\d{2}(-| to )\d{2})/)[1, 2]
=> ["M", "14-19"]
# the named captures can also be read from the match by name
extract_string = string.match(/\A(?<m>[MF])(?<digits>\d{2}(-| to )\d{2})/)
[extract_string[:m], extract_string[:digits]]
=> ["M", "14-19"]
string = 'M14 to 14'
extract_string = string.match(/\A(?<m>[MF])(?<digits>\d{2}(-| to )\d{2})/)[1, 2]
=> ["M", "14 to 14"]
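One thing to watch with this approach: match returns nil when the string does not fit the pattern, so indexing the result directly raises NoMethodError. A minimal sketch of a nil-safe wrapper follows; the names PREFIX_RANGE and split_prefix are hypothetical and not part of the answer, and the pattern assumes an uppercase prefix and two-digit numbers, as above.

```ruby
# Hypothetical helper (not from the answer): returns [prefix, range],
# or nil when the string does not have the expected shape.
PREFIX_RANGE = /\A(?<prefix>[MF])(?<range>\d{2}(?:-| to )\d{2})/

def split_prefix(str)
  m = PREFIX_RANGE.match(str)
  # m is nil on no match, so guard before reading the named captures
  m && [m[:prefix], m[:range]]
end

split_prefix("M14-19")    # => ["M", "14-19"]
split_prefix("F20 to 29") # => ["F", "20 to 29"]
split_prefix("X14-19")    # => nil
```

Returning nil rather than raising lets the caller decide how to treat strings that do not carry a prefix.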
David Gross answered Feb 03 '26 04:02


 TOKENS = ["-", " to "]

 r = /
     (?<=\A[mMfF])             # match the beginning of the string and then one
                               # of the 4 characters in a positive lookbehind
     (?=                       # begin positive lookahead
       \d+                     # match one or more digits
       [[:space:]]*            # match zero or more whitespace characters
       (?:#{Regexp.union(TOKENS).source}) # match one of the tokens (escaped)
     )                         # close the positive lookahead
     /x                        # free-spacing regex definition mode

(?:#{Regexp.union(TOKENS).source}) is replaced by (?:\-|\ to\ ). The tokens are escaped because free-spacing mode ignores unescaped whitespace in the pattern, so a plain TOKENS.join('|') would be read as (?:-|to) there.

The same regex can of course be written on one line, without free-spacing mode:

r = /(?<=\A[mMfF])(?=\d+[[:space:]]*(?:#{TOKENS.join('|')}))/

When splitting on r you are splitting at a zero-width position between two characters (the point where both the positive lookbehind and the positive lookahead succeed), so no characters are consumed.

"M14-19".split r
  #=> ["M", "14-19"]
"M14     to 19".split r
  #=> ["M", "14     to 19"]
"M14     To 19".split r
  #=> ["M14     To 19"]

If ["M", "14     To 19"] should be returned in the last example, change [mMfF] to [mf] and /x to /xi.
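To see why the zero-width approach avoids the empty string, it may help to compare it with a consuming pattern like the one in the question. The sketch below is illustrative only: SEPARATORS, sep, consuming and zero_width are names chosen for this example, and the separators are made case-insensitive with an inline (?i:...) group, which is a different route from switching the whole regex to /xi.

```ruby
SEPARATORS = ["-", " to "]
sep = Regexp.union(SEPARATORS).source # escaped alternation of the separators

# Consuming pattern: the match starts at index 0 and eats "M14-", so split
# emits an empty leading field, then the captured letter, then the remainder.
consuming = /^(m|f)\d+[[:space:]]*(?:#{sep})/i
"M14-19".split(consuming)        # => ["", "M", "19"]

# Zero-width pattern: only the position between the letter and the digits
# matches, so nothing is removed from the string.
zero_width = /(?<=\A[mMfF])(?=\d+[[:space:]]*(?i:#{sep}))/
"M14-19".split(zero_width)       # => ["M", "14-19"]
"f7     To 12".split(zero_width) # => ["f", "7     To 12"]
"X14-19".split(zero_width)       # => ["X14-19"]
```

A string without a valid prefix comes back as a one-element array, so checking the length of the result is enough to tell whether the split happened.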

Cary Swoveland answered Feb 03 '26 06:02