I have an expression like
test_abc_HelloWorld_there could be more here.
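1) I want a regex that matches the first word, before the first underscore:
"test"
I tried [A-Za-z]{1,}_
but that didn't work.
2) I want a second regex that matches the word between the first two underscores:
"abc"
These should be 2 separate regular expressions, not combined.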
Any help is much appreciated!
Example:
for 1) the regex would match the word test
for 2) the regex would match the word abc
Any other match for either case would be wrong. That is, if I were to replace what I matched, I would get something like this:
for case 1) match "test" and replace "test" with "Goat":
'Goat_abc_HelloWorld_there could be more here'
I don't want a replace, I just want a match on a word.
If you want . to match really everything, including newlines, you need to enable "dot-matches-all" mode in your regex engine of choice (for example, add the re.DOTALL flag in Python, or /s in PCRE).
(?=...) is a positive lookahead, a type of zero-width assertion. What it's saying is that the match must be followed by whatever is within the parentheses, but that part isn't included in the match. Your example means the match needs to be followed by zero or more characters and then a digit (but again, that part isn't included in the match).
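To see the zero-width behaviour described above, here is a small Ruby sketch. The input strings and the pattern `[a-z]+(?=.*\d)` are my own illustration of "followed by zero or more characters and then a digit", not something taken from the thread.

```ruby
# Hypothetical inputs, chosen only for illustration.
with_digit    = "abc-123"
without_digit = "abc-def"

# No lookahead: the digit is consumed and becomes part of the match.
consumed = with_digit[/[a-z]+.*\d/]

# Lookahead: a digit must appear later, but it is not part of the match.
asserted = with_digit[/[a-z]+(?=.*\d)/]

# If the lookahead cannot be satisfied, there is no match at all.
missing = without_digit[/[a-z]+(?=.*\d)/]

puts consumed.inspect  # "abc-123"
puts asserted.inspect  # "abc"
puts missing.inspect   # nil
```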
The _ (underscore) character in the regular expression means that the zone name must have an underscore immediately following the alphanumeric string matched by the preceding brackets. The . (period) matches any character (a wildcard).
The \f metacharacter matches form feed characters.
In both cases you can use lookaround assertions.
^[^_]+(?=_)
will get you everything up to the first underscore of the line, and
(?<=_)[^_]+(?=_)
will match any string located between two underscores; the first such match on the line is the one between the first two underscores.
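Here is a quick way to try both patterns against the string from the question, sketched in Ruby (any engine with lookbehind support should behave the same way). The variable names are mine. The last line also shows why the original attempt looked wrong: it includes the underscore in the match and matches more than one word.

```ruby
line = "test_abc_HelloWorld_there could be more here"

# Case 1: everything before the first underscore.
first_word = line[/^[^_]+(?=_)/]

# Case 2: String#[] returns the first match, i.e. the text
# between the first two underscores.
second_word = line[/(?<=_)[^_]+(?=_)/]

# The same pattern matches every field enclosed by underscores.
all_between = line.scan(/(?<=_)[^_]+(?=_)/)

# The attempt from the question keeps the trailing underscore.
attempt = line.scan(/[A-Za-z]{1,}_/)

puts first_word.inspect   # "test"
puts second_word.inspect  # "abc"
puts all_between.inspect  # ["abc", "HelloWorld"]
puts attempt.inspect      # ["test_", "abc_", "HelloWorld_"]
```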
Step back and consider that maybe you're overengineering the solution here. Ruby has a split method for this, and other languages probably have their own equivalents.
Given something like "AAPL_annual_i.xls", you could just do this and take advantage of the fact that your data is already structured:
string_object = "AAPL_annual_i.xls"
ary = string_object.split("_")
#=> ["AAPL", "annual", "i.xls"]
# the last element still holds the file name and its extension
extension = ary[2].split(".")[1]
#=> "xls"
filetype = ary[2].split(".")[0] #etc
#=> "i"
'doh!
But seriously, I've found that leaning on the split method is not only easier on me, it's easier on my associates who have to read my code and understand what it does.
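Applied to the string from the question, the split approach might look like the sketch below. The limit argument of 3 is my own choice: it keeps everything after the second underscore in one piece, so the later underscores are left alone.

```ruby
line = "test_abc_HelloWorld_there could be more here"

# Split into at most 3 pieces; the last piece keeps the remainder intact.
first_word, second_word, rest = line.split("_", 3)

puts first_word   # test
puts second_word  # abc
puts rest         # HelloWorld_there could be more here
```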