
Regex: match everything before FIRST underscore and everything in between AFTER

Tags:

regex

I have an expression like

test_abc_HelloWorld_there could be more here.
  1. I'd like a regex that takes the first word before the first underscore. So get "test".

I tried [A-Za-z]{1,}_ but that didn't work.

  2. Then I'd like to get "abc" or anything in between the first 2 underscores.

I want 2 separate regular expressions, not one combined expression.

Any help is very appreciated!

Example:

for 1) the regex would match the word test
for 2) the regex would match the word abc

so any other match for either case would be wrong. As in, if I were to replace what I matched on then I would get something like this:

for case 1) match "test" and replace "test" with "Goat".

'Goat_abc_HelloWorld_there could be more here'

I don't want a replace, I just want a match on a word.

EKet asked May 12 '11 23:05



2 Answers

In both cases you can use lookaround assertions.

^[^_]+(?=_)

will get you everything up to the first underscore of the line, and

(?<=_)[^_]+(?=_)

will match any run of characters located between two underscores. The first such match is the text between the first two underscores.
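
As a minimal sketch of how the two patterns behave, here they are applied in Ruby to the sample string from the question. The variable names are made up for illustration, and the `scan` call is only there to show that the second pattern can match more than once, which is why you would take the first match.

```ruby
# Sample input taken from the question.
line = "test_abc_HelloWorld_there could be more here"

# Everything before the first underscore of the line.
first = line[/^[^_]+(?=_)/]

# First run of non-underscore characters enclosed by two underscores.
second = line[/(?<=_)[^_]+(?=_)/]

# Every segment enclosed by two underscores; the trailing segment is
# not included because no underscore follows it.
all_between = line.scan(/(?<=_)[^_]+(?=_)/)

puts first
puts second
p all_between
```

Neither pattern consumes the underscores themselves, so the matched text is exactly the word and nothing else.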

Thomas Hupkens answered Nov 15 '22 19:11


Step back and consider that maybe you're overengineering the solution here. Ruby has a split method for this, and other languages probably have their own equivalents.

Given something like "AAPL_annual_i.xls", you could just do this and take advantage of the fact that your data is already structured:

string_object = "AAPL_annual_i.xls"
ary = string_object.split("_")
#=> ["AAPL", "annual", "i.xls"]
extension = ary[2].split(".")[1]
#=> "xls"
filetype = ary[2].split(".")[0] #etc
#=> "i"

'doh!

But seriously, I've found that leaning on the split method is not only easier on me, it's easier on my associates who have to read my code and understand what it does.
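
Applied to the string from the question, the same idea might look like the sketch below. The variable names are hypothetical, and the limit argument of 3 is an assumption about what is wanted: it stops splitting after the first two underscores so the remainder stays in one piece.

```ruby
# Sample input taken from the question.
line = "test_abc_HelloWorld_there could be more here"

# A limit of 3 splits at the first two underscores only;
# everything after the second underscore is left untouched.
first, second, rest = line.split("_", 3)

puts first
puts second
puts rest
```

Without the limit, `split("_")` would also break the remainder at its later underscore, which is fine if only the first two pieces are needed.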

boulder_ruby answered Nov 15 '22 19:11