I am writing a parser for some Oracle commands, like
LOAD DATA
INFILE /DD/DATEN
TRUNCATE
PRESERVE BLANKS
INTO TABLE aaa.bbb
( some parameters... )
I have already created a regex that matches the entire command. Now I am looking for a way to capture the name of the input file ("/DD/DATEN" in this example). My problem is that the following regex only returns the last character of the file name in the first group ("N"):
^\s*LOAD DATA\s*INFILE\s*(\w|\\|/)+\s*$
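To reproduce the problem, here is a minimal sketch. It assumes Python's re module, since the question doesn't say which regex engine the parser uses, and it uses only the first two lines of the command above as input.

```python
import re

# The original pattern: the + sits OUTSIDE the capturing group, so group 1
# is refilled on every repetition and only keeps the last character.
PATTERN = r'^\s*LOAD DATA\s*INFILE\s*(\w|\\|/)+\s*$'

command = "LOAD DATA\nINFILE /DD/DATEN\n"

match = re.match(PATTERN, command)
print(repr(match.group(1)))  # 'N'
```

The whole match still spans the full file name. Only the group's stored value is reduced to the final iteration.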
Any ideas? Many thanks in advance
EDIT: Following @HamZa's question, here is the entire regex to parse the Oracle LOAD DATA INFILE command (simplified, though):
^\s*LOAD DATA\s*INFILE\s*((?:\w|\\|/)+)\s*((?:TRUNCATE|PRESERVE BLANKS)\s*){0,2}\s*INTO TABLE\s*((?:\w|\.)+)\s*\(\s*((\w+)\s*POSITION\s*\(\s*\d+\s*\:\s*\d+\s*\)\s*((DATE\s*\(\s*(\d+)\s*\)\s*\"YYYY-MM-DD\")|(INTEGER EXTERNAL)|(CHAR\s*\(\s*(\d+)\s*\)))\s*\,{0,1}\s*)+\)\s*$
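The full regex can be exercised in the same way. The sketch below again assumes Python's re module. The column list (id, name, created) is hypothetical and only stands in for "( some parameters... )". The pattern is the one above, split into adjacent raw strings for readability. Note that the repeated column group has the same behaviour as the original file-name group: group 5 only keeps the last column name.

```python
import re

# The simplified full regex from the EDIT, unchanged, split across lines.
FULL_PATTERN = (
    r'^\s*LOAD DATA\s*INFILE\s*((?:\w|\\|/)+)\s*'
    r'((?:TRUNCATE|PRESERVE BLANKS)\s*){0,2}\s*'
    r'INTO TABLE\s*((?:\w|\.)+)\s*'
    r'\(\s*((\w+)\s*POSITION\s*\(\s*\d+\s*\:\s*\d+\s*\)\s*'
    r'((DATE\s*\(\s*(\d+)\s*\)\s*\"YYYY-MM-DD\")|(INTEGER EXTERNAL)|(CHAR\s*\(\s*(\d+)\s*\)))'
    r'\s*\,{0,1}\s*)+\)\s*$'
)

# Hypothetical column definitions standing in for "( some parameters... )".
command = """LOAD DATA
INFILE /DD/DATEN
TRUNCATE
PRESERVE BLANKS
INTO TABLE aaa.bbb
( id POSITION(1:5) INTEGER EXTERNAL,
  name POSITION(6:20) CHAR(15),
  created POSITION(21:30) DATE(10) "YYYY-MM-DD" )
"""

match = re.match(FULL_PATTERN, command)
print(match.group(1))  # /DD/DATEN
print(match.group(3))  # aaa.bbb
print(match.group(5))  # created  (only the LAST column name survives)
```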
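A repeated capturing group only captures the last iteration; to capture all iterations, put a capturing group around the repeated group. In your regex101 session, try replacing your regex with (\w+),? and it will give you the same result. The key here is the g (global) flag, which applies your pattern repeatedly so that each iteration becomes a separate match with its own group.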
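Python has no g flag. As far as I know, the closest equivalent is re.findall (or re.finditer), which applies the pattern repeatedly over the string. This sketch contrasts that with a single match containing a repeated group. The column string and its names are hypothetical.

```python
import re

text = "a,b,c"

# One match with a repeated group: group 1 keeps only the last iteration.
single = re.fullmatch(r'(?:(\w+),?)+', text)
last_only = single.group(1)
print(last_only)  # c

# "Global" matching: findall applies (\w+),? again and again and
# returns group 1 from every match.
all_items = re.findall(r'(\w+),?', text)
print(all_items)  # ['a', 'b', 'c']

# The same idea collects every column name from a (hypothetical) column list.
columns = "id POSITION(1:5) INTEGER EXTERNAL, name POSITION(6:20) CHAR(15)"
column_names = re.findall(r'(\w+)\s*POSITION', columns)
print(column_names)  # ['id', 'name']
```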
What is a group in regex? A group is a part of a regex pattern enclosed in the parentheses () metacharacters. We create a group by placing the regex pattern inside a set of parentheses ( and ). For example, the regular expression (cat) creates a single group containing the letters 'c', 'a', and 't'.
In Python, the match object's groups() method returns a tuple containing all the subgroups of the match, from 1 up to however many groups are in the pattern.
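A short illustration of both points, assuming Python's re module. The input strings are arbitrary examples.

```python
import re

# (cat) is a single capturing group containing 'c', 'a' and 't'.
cat_match = re.search(r'(cat)', "concatenate")
print(cat_match.group(1))  # cat

# groups() returns every subgroup, numbered from 1, as a tuple.
pair = re.match(r'(\w+) (\w+)', "LOAD DATA")
print(pair.groups())  # ('LOAD', 'DATA')
```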
Let's point out the culprit in your regex: (\w|\\|/)+. What happens here?
You're matching either a word character or a back/forward slash and putting it in group 1: (\w|\\|/). After that, you're telling the regex engine to do this one or more times: +. Each repetition overwrites group 1, so only the last character ("N") is left in it. What you actually want is to match those characters several times before grouping them, so you might use a non-capturing group (?:): ((?:\w|\\|/)+).
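Here is a minimal check of the non-capturing-group fix, again assuming Python's re module. The backslash-separated path is a hypothetical second input that shows the \\ alternative at work.

```python
import re

# Fixed pattern: the + is applied inside group 1 through (?:...),
# so group 1 captures the whole run of characters.
PATTERN = r'^\s*LOAD DATA\s*INFILE\s*((?:\w|\\|/)+)\s*$'

commands = [
    "LOAD DATA\nINFILE /DD/DATEN",
    "LOAD DATA INFILE \\DD\\DATEN",  # hypothetical backslash-separated path
]

names = [re.match(PATTERN, c).group(1) for c in commands]
print(names)  # ['/DD/DATEN', '\\DD\\DATEN']
```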
You might notice that you could just use a character class after all: ([\w\\/]+). Hence, your regex could look like this:
^\s*LOAD DATA\s*INFILE\s*([\w\\/]+)\s*$
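To confirm that the character class accepts exactly the same file names as the alternation, here is a sketch that runs both patterns over a few sample inputs. It assumes Python's re module, and the helper infile() and the sample "data.txt" are hypothetical.

```python
import re

ALTERNATION = r'^\s*LOAD DATA\s*INFILE\s*((?:\w|\\|/)+)\s*$'
CHAR_CLASS = r'^\s*LOAD DATA\s*INFILE\s*([\w\\/]+)\s*$'

samples = [
    "LOAD DATA\nINFILE /DD/DATEN",
    "LOAD DATA INFILE \\DD\\DATEN",
    "LOAD DATA INFILE data.txt",  # '.' is in neither pattern, so both fail
]

def infile(pattern, command):
    """Hypothetical helper: return the captured file name, or None."""
    m = re.match(pattern, command)
    return m.group(1) if m else None

results_alt = [infile(ALTERNATION, s) for s in samples]
results_cls = [infile(CHAR_CLASS, s) for s in samples]
print(results_alt == results_cls)  # True
print(results_cls)  # ['/DD/DATEN', '\\DD\\DATEN', None]
```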
On a side note: that end anchor $ will cause your regex to fail if you're not using multiline mode. Or did you intentionally not post the full regex? :)
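Here is a sketch of the end-anchor issue, assuming Python's re module, where multiline mode is the re.MULTILINE flag. The input is the first lines of the command from the question.

```python
import re

PATTERN = r'^\s*LOAD DATA\s*INFILE\s*([\w\\/]+)\s*$'

full_command = """LOAD DATA
INFILE /DD/DATEN
TRUNCATE
PRESERVE BLANKS
INTO TABLE aaa.bbb
"""

# Without MULTILINE, $ only matches at the end of the string (or before a
# final newline), so the lines after the file name make the match fail.
plain = re.search(PATTERN, full_command)
print(plain)  # None

# With MULTILINE, $ also matches just before each newline.
m = re.search(PATTERN, full_command, re.MULTILINE)
print(m.group(1))  # /DD/DATEN
```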
Not tested but...
^\s*LOAD DATA\s*INFILE\s*(\S+)\s*$
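The \S+ variant is looser: it accepts any run of non-whitespace characters. This sketch, assuming Python's re module and a hypothetical file name with an extension, shows where it differs from the character-class version.

```python
import re

NON_SPACE = r'^\s*LOAD DATA\s*INFILE\s*(\S+)\s*$'
CHAR_CLASS = r'^\s*LOAD DATA\s*INFILE\s*([\w\\/]+)\s*$'

command = "LOAD DATA\nINFILE /DD/DATEN.TXT"  # hypothetical file name

# \S+ takes every non-whitespace character, including the dot...
loose = re.match(NON_SPACE, command)
print(loose.group(1))  # /DD/DATEN.TXT

# ...while the stricter character class cannot reach $ past the dot.
strict = re.match(CHAR_CLASS, command)
print(strict)  # None
```

Which one fits depends on how strictly the parser should validate file names. \S+ is simpler, but it also accepts characters the stricter pattern would reject.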