
How to use a one-line regular expression to get the matched content

Tags: regex, ruby

I'm new to Ruby, and I want to know whether I can do this job in just one line.

Take this site's search as an example. When a user types [ruby] regex, I can use the following code to get the tag and the keyword:

'[ruby] regex' =~ /\[(.*?)\](.*)/
tag, keyword = $1, $2

Can this be written in just one line?


UPDATE

Thank you so much! May I make it harder and more interesting? Suppose the input may contain more than one tag, like:

[ruby] [regex] [rails] one line 

Is it possible to get the array of tags and the keyword with one line of code? I tried, but failed.

Freewind asked Jun 22 '10 06:06



1 Answer

You need the Regexp#match method. If you write /\[(.*?)\](.*)/.match('[ruby] regex'), this will return a MatchData object. If we call that object matches, then, among other things:

  • matches[0] returns the whole matched string.
  • matches[n] returns the nth capturing group ($n).
  • matches.to_a returns an array consisting of matches[0] through matches[N], where N is the number of capturing groups.
  • matches.captures returns an array consisting of just the capturing groups (matches[1] through matches[N]).
  • matches.pre_match returns everything before the matched string.
  • matches.post_match returns everything after the matched string.
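
The accessors listed above can be tried out with a short sketch. The input string 'see [ruby] regex' is an assumed sample, chosen so that pre_match is not empty; the variable name matches simply follows the wording of the list.

```ruby
# Assumed sample input; the leading "see " makes pre_match non-empty.
matches = /\[(.*?)\](.*)/.match('see [ruby] regex')

p matches[0]          # => "[ruby] regex"  (whole matched string)
p matches[1]          # => "ruby"          (first capturing group, same as $1)
p matches.to_a        # => ["[ruby] regex", "ruby", " regex"]
p matches.captures    # => ["ruby", " regex"]
p matches.pre_match   # => "see "
p matches.post_match  # => ""
```

Note that the second group keeps the leading space (" regex"), because (.*) starts right after the closing bracket.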

There are more methods, which correspond to other special variables, etc.; you can check MatchData's docs for more. Thus, in this specific case, all you need to write is

tag, keyword = /\[(.*?)\](.*)/.match('[ruby] regex').captures 
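
One caveat with the one-liner: Regexp#match returns nil when the pattern does not match, so calling .captures on the result raises NoMethodError for input without a tag. A minimal sketch of a nil-safe variant follows, assuming Ruby 2.3 or later for the &. operator. The helper name parse_tag and the extra strip on the keyword are illustrative additions, not part of the original answer.

```ruby
# Hypothetical helper: returns [tag, keyword], or [nil, nil] when the
# input contains no bracketed tag.
def parse_tag(input)
  # &. (Ruby 2.3+) skips .captures when match returns nil, and
  # "tag, keyword = nil" leaves both variables nil.
  tag, keyword = /\[(.*?)\](.*)/.match(input)&.captures
  # Added assumption: trim the space that follows the closing bracket.
  [tag, keyword&.strip]
end

p parse_tag('[ruby] regex')  # => ["ruby", "regex"]
p parse_tag('plain search')  # => [nil, nil]
```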

Edit 1: Alright, for your harder task, you're going to instead want the String#scan method, which @Theo used; however, we're going to use a different regex. The following code should work:

# You could inline the regex, but comments would probably be nice.
tag_and_text = / \[([^\]]*)\] # Match a bracket-delimited tag,
                 \s*          # ignore spaces,
                 ([^\[]*) /x  # and match non-tag search text.
input        = '[ruby] [regex] [rails] one line [foo] [bar] baz'
tags, texts  = input.scan(tag_and_text).transpose

The input.scan(tag_and_text) will return a list of tag–search-text pairs:

[["ruby", ""], ["regex", ""], ["rails", "one line "], ["foo", ""], ["bar", "baz"]]

The transpose call flips that, so that you have a pair consisting of a tag list and a search-text list:

[["ruby", "regex", "rails", "foo", "bar"], ["", "", "one line ", "", "baz"]] 

You can then do whatever you want with the results. I might suggest, for instance:

search_str = texts.join(' ').strip.gsub(/\s+/, ' ') 

This will concatenate the search snippets with single spaces, get rid of leading and trailing whitespace, and replace runs of multiple spaces with a single space.
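
Putting the scan, transpose, and clean-up steps together, the whole thing could be wrapped in a small helper. This is only a sketch: the names TAG_AND_TEXT and parse_search are hypothetical, and the guard for input without any tags is an added assumption (scan returns an empty array in that case, and transposing it would leave both results nil).

```ruby
# Same pattern as above, written inline without the /x comments.
TAG_AND_TEXT = /\[([^\]]*)\]\s*([^\[]*)/

# Hypothetical helper: returns [tags, keyword] for a search string.
def parse_search(input)
  pairs = input.scan(TAG_AND_TEXT)
  # [].transpose is [], so fall back to two empty lists when no tag is found.
  tags, texts = pairs.empty? ? [[], []] : pairs.transpose
  [tags, texts.join(' ').strip.gsub(/\s+/, ' ')]
end

tags, keyword = parse_search('[ruby] [regex] [rails] one line')
p tags     # => ["ruby", "regex", "rails"]
p keyword  # => "one line"
```

Because the pattern only captures text that follows a tag, any search text placed before the first tag is not picked up by this approach.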

Antal Spector-Zabusky answered Sep 20 '22 21:09