
How to split a search term by ":"

Tags: regex, ruby

I have the following search term:

"login:17639 email:[email protected] ref:co-10000 common_name:testingdomain organization:'Internet Company'"

This term is derived from a params variable where everything to the left of the : is a filter term and everything on the right of : is the value of the filter. What I'm trying to do is to split the term into keys and values and create a hash from them. This is the end goal:

search_filters = {
  login:17639,
  email:'[email protected]',
  etc, etc,
}

I'm playing around with split, gsub, tr to get these values but I'm having a problem with the organization field. Here is what I have so far:

term.gsub(/'/,'').tr(':', ' ').split(" ")
term.gsub(":")

And, basically, many other variations like the above. The problem is the organization field: every attempt results in something like ["organization", "Internet", "Company"], with "Internet Company" split in two. I can't add a simple if/else statement just for this filter to glue the pieces back together, because there are more filters to process. Is there an easier way to split the term into filters and values based on the colon? Thank you.

Dan Rubio asked Dec 19 '19 18:12



2 Answers

Here's an example of how to start:

def splart(input)
  input.scan(/([^:]+):('[^']*'|"[^"]*"|\S+)/).to_h
end

That will tease out the data you need. You may have to clean it up afterwards: every key after the first keeps a leading space, and quoted values keep their quotes.
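One possible cleanup, as a rough sketch rather than part of the answer itself: exclude whitespace from the key, capture quoted values without their quotes, and symbolize the keys. The method name parse_filters and the shortened sample term are assumptions made for illustration.

```ruby
# parse_filters is a hypothetical name, not part of the answer above.
def parse_filters(input)
  # [^:\s]+ keeps whitespace out of the key; the three alternatives capture
  # a single-quoted, double-quoted, or bare value without the quotes.
  pattern = /([^:\s]+):(?:'([^']*)'|"([^"]*)"|(\S+))/
  input.scan(pattern).map do |key, single, double, bare|
    # Only one of the three value captures is non-nil for any given match.
    [key.to_sym, single || double || bare]
  end.to_h
end

term = "login:17639 ref:co-10000 common_name:testingdomain organization:'Internet Company'"
filters = parse_filters(term)
```

Here filters[:organization] comes back as "Internet Company" in one piece. All values stay strings, so something like login would still need to_i if an integer is wanted.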

tadman answered Oct 14 '22 23:10



str = "login:17639 email:[email protected] ref:co-10000 " + 
      "common_name:testingdomain organization:'ABC Internet Company'"

Hash[*str.split(/:| +(?![^'":]+['"])/)].transform_keys(&:to_sym)
  #=> {:login=>"17639", :email=>"[email protected]",
  #    :ref=>"co-10000", :common_name=>"testingdomain",
  #    :organization=>"'ABC Internet Company'"} 

See Hash::[] and Hash#transform_keys.

We can document the regular expression by writing it in free-spacing mode:

/
:         # match :
|         # or
[ ]+      # match > 0 spaces
(?!       # begin negative lookahead
  [^'":]+ # match > 0 chars other than ', " or :
  ['"]    # match ' or "
)         # end negative lookahead
/x        # free-spacing regex definition mode

In free-spacing mode, whitespace is stripped out before the expression is parsed, which is why spaces intended to be part of the regex must be protected. I've done that by enclosing a space in a character class ([ ]), but one could instead escape the space character, use the POSIX bracket expression [[:space:]] or the Unicode property \p{Space} or, if appropriate, \s. The last three also match tabs and newlines (and a few more characters).
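As a quick check of that point, the sketch below compares the compact regex with two free-spacing spellings, one protecting the space with a character class and one with a backslash. The variable names and the shortened sample string are assumptions for illustration only.

```ruby
compact   = /:| +(?![^'":]+['"])/
bracketed = / : | [ ]+ (?! [^'":]+ ['"] ) /x  # space protected by a character class
escaped   = / : | \ +  (?! [^'":]+ ['"] ) /x  # space protected by a backslash

sample = "login:17639 ref:co-10000 organization:'ABC Internet Company'"

pieces = sample.split(compact)
same_with_brackets = (pieces == sample.split(bracketed))
same_with_escape   = (pieces == sample.split(escaped))
```

Both comparisons should come out true, since only the layout of the pattern differs between the three.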

Suppose str were shorter and contained only two key-value pairs, and we computed:

arr = str.split(/:| +(?![^'":]+['"])/)
  #=> ["login", "17639", "email", "[email protected]"]

We would use Hash::[] as follows:

Hash["login", "17639", "email", "[email protected]"]
  #=> {"login"=>"17639", "email"=>"[email protected]"}

which is the same as:

Hash[*arr]
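
If the surrounding quotes are not wanted in the values, one option (a sketch on a shortened sample string, not part of the answer above) is to pair the pieces up with each_slice(2) instead of splatting them into Hash::[], and strip a leading or trailing quote from each value along the way:

```ruby
str = "login:17639 ref:co-10000 organization:'ABC Internet Company'"

# Split as before, then walk the flat array two elements at a time.
pairs = str.split(/:| +(?![^'":]+['"])/).each_slice(2)

search_filters = pairs.map do |key, value|
  # Remove one quote character at the very start and/or very end of the value.
  [key.to_sym, value.gsub(/\A['"]|['"]\z/, "")]
end.to_h
```

With this, search_filters[:organization] is "ABC Internet Company" without the quotes, and the unquoted values are left untouched.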
Cary Swoveland answered Oct 14 '22 23:10
