
RegEx: capture entire group content

Tags:

regex

parsing

I am writing a parser for some Oracle commands, like

LOAD DATA
  INFILE  /DD/DATEN
TRUNCATE
PRESERVE BLANKS
INTO TABLE aaa.bbb
( some parameters... )

I have already written a regex that matches the entire command. Now I am looking for a way to capture the name of the input file (here, "/DD/DATEN"). My problem is that the following regex returns only the last character of the first group ("N"):

^\s*LOAD DATA\s*INFILE\s*(\w|\\|/)+\s*$
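
A minimal Python sketch that reproduces the behavior. It uses the standard `re` module (the question itself does not name a language), and it assumes the input is only the first two lines of the command, so the trailing `\s*$` can match without multiline mode.

```python
import re

# Assumption: only the first two lines of the command, so \s*$ can match
# at the end of the string without multiline mode.
text = "LOAD DATA\n  INFILE  /DD/DATEN"

# The question's pattern: the capturing group itself is repeated by "+".
m = re.match(r"^\s*LOAD DATA\s*INFILE\s*(\w|\\|/)+\s*$", text)

print(m.group(0) == text)  # True: the pattern as a whole matches
print(m.group(1))          # N: each repetition overwrote group 1
```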

Any ideas? Many thanks in advance.

EDIT: following @HamZa's question, here is the entire (simplified) regex for parsing the Oracle LOAD DATA INFILE command:

^\s*LOAD DATA\s*INFILE\s*((?:\w|\\|/)+)\s*((?:TRUNCATE|PRESERVE BLANKS)\s*){0,2}\s*INTO TABLE\s*((?:\w|\.)+)\s*\(\s*((\w+)\s*POSITION\s*\(\s*\d+\s*\:\s*\d+\s*\)\s*((DATE\s*\(\s*(\d+)\s*\)\s*\"YYYY-MM-DD\")|(INTEGER EXTERNAL)|(CHAR\s*\(\s*(\d+)\s*\)))\s*\,{0,1}\s*)+\)\s*$
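
As a sanity check, here is the full pattern run through Python's `re` module against a sample command. The two-column list stands in for the elided "( some parameters... )" and is hypothetical, made up only so the pattern has something to match. Group 5 (the column name) is itself inside a repeated group, so it keeps only the last column.

```python
import re

# The full (simplified) pattern from the EDIT, split into adjacent raw strings
# that concatenate to exactly the original regex.
FULL = re.compile(
    r'^\s*LOAD DATA\s*INFILE\s*((?:\w|\\|/)+)\s*((?:TRUNCATE|PRESERVE BLANKS)\s*){0,2}'
    r'\s*INTO TABLE\s*((?:\w|\.)+)\s*\(\s*((\w+)\s*POSITION\s*\(\s*\d+\s*\:\s*\d+\s*\)'
    r'\s*((DATE\s*\(\s*(\d+)\s*\)\s*\"YYYY-MM-DD\")|(INTEGER EXTERNAL)|(CHAR\s*\(\s*(\d+)\s*\)))'
    r'\s*\,{0,1}\s*)+\)\s*$'
)

# Hypothetical column list replacing "( some parameters... )".
command = """LOAD DATA
  INFILE  /DD/DATEN
TRUNCATE
PRESERVE BLANKS
INTO TABLE aaa.bbb
( id POSITION(1:5) INTEGER EXTERNAL,
  name POSITION(6:25) CHAR(20) )"""

m = FULL.match(command)
print(m.group(1))  # /DD/DATEN: the whole file name
print(m.group(3))  # aaa.bbb: the target table
print(m.group(5))  # name: only the last column survives the repetition
```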

asked Nov 10 '13 19:11 by Jérémie

2 Answers

Let's point out the culprit in your regex: (\w|\\|/)+. What happens here?
You're matching either a word character or a backslash/forward slash and putting it in group 1 with (\w|\\|/); after that, you're telling the regex engine to do this one or more times with +. Each repetition overwrites the previous capture, so group 1 ends up holding only the last character matched. What you actually want is to match those characters several times and then capture the whole run. So you might use a non-capturing group (?:) inside the capturing one: ((?:\w|\\|/)+).
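
A small side-by-side check in Python's `re` module, assuming the input is just the first two lines of the command so the trailing `$` matches at the end of the string:

```python
import re

# Assumption: first two lines of the command only.
text = "LOAD DATA\n  INFILE  /DD/DATEN"

# Capturing group repeated by "+": keeps only the last character matched.
broken = re.match(r"^\s*LOAD DATA\s*INFILE\s*(\w|\\|/)+\s*$", text)
# "+" applied to a non-capturing group, wrapped in one capturing group.
fixed = re.match(r"^\s*LOAD DATA\s*INFILE\s*((?:\w|\\|/)+)\s*$", text)

print(broken.group(1))  # N
print(fixed.group(1))   # /DD/DATEN
```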

You might notice that you could just use a character class instead: ([\w\\/]+). Hence, your regex could look like:

^\s*LOAD DATA\s*INFILE\s*([\w\\/]+)\s*$
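
To check that the character class accepts the same characters as the alternation, both patterns can be run on a few inputs. This Python sketch uses the question's file name plus two hypothetical ones; `first_group` is a helper made up for the example.

```python
import re

# Alternation inside a non-capturing group, wrapped in one capturing group.
alternation = re.compile(r"^\s*LOAD DATA\s*INFILE\s*((?:\w|\\|/)+)\s*$")
# The same set of characters written as a single character class.
char_class = re.compile(r"^\s*LOAD DATA\s*INFILE\s*([\w\\/]+)\s*$")

# Only the first input comes from the question; the other two are hypothetical.
samples = [
    "LOAD DATA\n  INFILE  /DD/DATEN",
    "LOAD DATA INFILE data\\in_2013",   # backslash-separated name
    "LOAD DATA INFILE /DD/DATEN.dat",   # "." is in neither pattern's set
]


def first_group(pattern, text):
    """Hypothetical helper: return group 1, or None when there is no match."""
    m = pattern.match(text)
    return m.group(1) if m else None


results = [(first_group(alternation, s), first_group(char_class, s)) for s in samples]
for pair in results:
    print(pair)
```

Both patterns agree on every input; the character class is simply the shorter spelling.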

On a side note: that end anchor $ will cause your regex to fail on the full multi-line command unless you're using multiline mode, because otherwise $ only matches at the end of the input. Or did you intentionally leave out the rest of the regex? :)
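
A Python sketch of the `$` issue, using the full command from the question with its parameters left elided. `re.MULTILINE` is Python's name for multiline mode; many other engines spell it as an `m` flag.

```python
import re

# The full command from the question; "( some parameters... )" kept as-is.
command = (
    "LOAD DATA\n"
    "  INFILE  /DD/DATEN\n"
    "TRUNCATE\n"
    "PRESERVE BLANKS\n"
    "INTO TABLE aaa.bbb\n"
    "( some parameters... )"
)
pattern = r"^\s*LOAD DATA\s*INFILE\s*([\w\\/]+)\s*$"

# Without re.MULTILINE, $ matches only at the end of the string (or just
# before a final newline), so nothing after /DD/DATEN can satisfy it.
single = re.search(pattern, command)
# With re.MULTILINE, $ also matches just before each newline.
multi = re.search(pattern, command, re.MULTILINE)

print(single)          # None
print(multi.group(1))  # /DD/DATEN
```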

answered Oct 26 '22 00:10 (2 revs, 2 users 71%)

Not tested but...

^\s*LOAD DATA\s*INFILE\s*(\S+)\s*$
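
`\S+` also captures the whole file name, but it accepts any non-whitespace character, which is broader than `[\w\\/]+`. This Python sketch compares the two on the question's file name and on a hypothetical name containing a dot; `first_group` is a made-up helper.

```python
import re

non_space = re.compile(r"^\s*LOAD DATA\s*INFILE\s*(\S+)\s*$")
char_class = re.compile(r"^\s*LOAD DATA\s*INFILE\s*([\w\\/]+)\s*$")

# The first input is from the question; the dotted name is hypothetical.
inputs = ["LOAD DATA\n  INFILE  /DD/DATEN", "LOAD DATA\n  INFILE  /DD/DATEN.dat"]


def first_group(pattern, text):
    """Hypothetical helper: return group 1, or None when there is no match."""
    m = pattern.match(text)
    return m.group(1) if m else None


results = [(first_group(non_space, t), first_group(char_class, t)) for t in inputs]
for pair in results:
    print(pair)
```

Whether the broader match is desirable depends on which file names the parser should accept.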

answered Oct 26 '22 00:10 by Vorsprung