I have two cases, as shown in the main method.
In case 1 the match method returns exactly what I want, but
in case 2 it returns an empty list.
Please help me find a common regex that works in both cases. Example input string:
"\"NB_DAY_DIM\".\"MONTH_YEAR\"+\"l\".\"m\""
Desired output:
[{column=MONTH_YEAR, value="NB_DAY_DIM"."MONTH_YEAR", table=NB_DAY_DIM},{column=m, value="l"."m", table=l}]
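import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public static List<Map<String, String>> match(String source) {
    String pattern = "\"(.*?)\".\"(.*?)\"";
    // Declared with the element type of the return type so that the method compiles.
    List<Map<String, String>> list = new ArrayList<Map<String, String>>();
    Pattern r = Pattern.compile(pattern);
    Matcher m = r.matcher(source);
    while (m.find()) {
        Map<String, String> mp = new HashMap<String, String>();
        mp.put("value", m.group(0));
        mp.put("table", m.group(1));
        mp.put("column", m.group(2));
        list.add(mp);
        // Remove the matched text and restart matching on the shortened string.
        source = source.replace(m.group(0), "");
        m = r.matcher(source);
    }
    return list;
}
public static void main(String[] args) {
    System.out.println(match("\"NB_DAY_DIM\".\"MONTH_YEAR\"+\"l\".\"m\"")); // case 1
    System.out.println(match("NB_DAY_DIM.MONTH_YEAR+l.m")); // case 2
}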
The operator between NB_DAY_DIM.MONTH_YEAR and l.m can be any arithmetic or logical operator, e.g. "NB_DAY_DIM.MONTH_YEAR-l.m".
The pattern can also repeat, e.g. NB_DAY_DIM.MONTH_YEAR+l.m-xyz.abc*l.m
If the quotation marks are completely optional, try this expression (non-Java notation):
"?([^+"]+)"?\."?([^+"]+)"?
A short breakdown:
"? means optional quotation marks
([^+"]+) means a sequence of at least one character which is not a plus or quotation mark
Update:
Here's an expression for "either all quotation marks or none":
(?<=\+|^)("?)([^+"]+)\1\.\1([^+"]+)\1(?=\+|$)
The changes are:
\1 is a back reference to group one, i.e. if the first quotation mark is found, all the others have to be there as well; if it is not, no quotation marks are allowed.
(?=\+|$) requires a match to be followed by either a plus or the end of input. This is needed to reject cases which only have trailing quotation marks, e.g. NB_DAY_DIM.MONTH_YEAR"
(?<=\+|^) requires a match to be preceded by either a plus or the start of input. This prevents cases which only have leading quotation marks, e.g. "NB_DAY_DIM.MONTH_YEAR+l.m
This expression would match
NB_DAY_DIM.MONTH_YEAR+"l"."m"
"NB_DAY_DIM"."MONTH_YEAR"+"l"."m"
NB_DAY_DIM.MONTH_YEAR+l.m
but would reject the first pair in
"NB_DAY_DIM.MONTH_YEAR+"l"."m"
NB_DAY_DIM."MONTH_YEAR"+"l"."m"
NB_DAY_DIM.MONTH_YEAR"+l.m
Update 2:
Since the comment says the delimiter could be any arithmetic operator, just expand the disallowed characters to include them, e.g. instead of [^+"] use [^+\-*/"]. Additionally, expand the look-behind from \+|^ to [+\-*/]|^ and the look-ahead from \+|$ to [+\-*/]|$.
Here's an expanded expression; if there are additional requirements, feel free to add them:
(?<=[+\-*/]|^)("?)([^+\-*/"]+)\1\.\1([^+\-*/"]+)\1(?=[+\-*/]|$)
This would match NB_DAY_DIM.MONTH_YEAR+l.m-xyz.abc*l.m.
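
Since the expressions above are written in non-Java notation, here is a minimal sketch of how the expanded one might look as a Java string literal, where every quotation mark and backslash has to be escaped. The class name ConsistentQuotesDemo and the helper pairs are made up for illustration and are not part of the question's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class ConsistentQuotesDemo {
    // The expanded expression, escaped for a Java string literal.
    // Group 1 is the optional quotation mark, group 2 the table, group 3 the column.
    static final Pattern PAIR = Pattern.compile(
            "(?<=[+\\-*/]|^)(\"?)([^+\\-*/\"]+)\\1\\.\\1([^+\\-*/\"]+)\\1(?=[+\\-*/]|$)");

    // Returns "table.column" (quotes stripped) for every pair the expression accepts.
    static List<String> pairs(String source) {
        List<String> result = new ArrayList<>();
        Matcher m = PAIR.matcher(source);
        while (m.find()) {
            result.add(m.group(2) + "." + m.group(3));
        }
        return result;
    }

    public static void main(String[] args) {
        // All four pairs are accepted.
        System.out.println(pairs("NB_DAY_DIM.MONTH_YEAR+l.m-xyz.abc*l.m"));
        // The first pair has a stray trailing quotation mark and is skipped; l.m is still found.
        System.out.println(pairs("NB_DAY_DIM.MONTH_YEAR\"+l.m"));
    }
}
```

Note that find() looks for matches anywhere in the input, so a malformed pair is simply skipped while well-formed pairs around it are still reported. If the whole input should be rejected in that case, an additional check would be needed.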
Update 3:
In order to extract <table>.<column> pairs from your string, you can use an expression like this:
"?(\w+)"?\."?(\w+)"?
Note that this doesn't ensure that either all quotes are set or none and it also assumes you're only using word characters (i.e. [a-zA-Z0-9_]) for table and column names.
It might serve your purpose, however.
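
As a rough sketch of how this could be plugged into the match method from the question: the version below keeps the same map keys (value, table, column) but uses the \w-based expression and a single Matcher, without modifying the source string. The class name TableColumnExtractor is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class; only match() mirrors the method from the question.
class TableColumnExtractor {
    // "?(\w+)"?\."?(\w+)"? escaped for a Java string literal.
    private static final Pattern PAIR = Pattern.compile("\"?(\\w+)\"?\\.\"?(\\w+)\"?");

    public static List<Map<String, String>> match(String source) {
        List<Map<String, String>> list = new ArrayList<>();
        Matcher m = PAIR.matcher(source);
        // find() continues after the previous match, so the source is left untouched.
        while (m.find()) {
            Map<String, String> mp = new HashMap<>();
            mp.put("value", m.group(0));  // the pair as written, including any quotes
            mp.put("table", m.group(1));
            mp.put("column", m.group(2));
            list.add(mp);
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(match("\"NB_DAY_DIM\".\"MONTH_YEAR\"+\"l\".\"m\"")); // case 1
        System.out.println(match("NB_DAY_DIM.MONTH_YEAR+l.m"));                 // case 2
    }
}
```

Because the source string is not altered between matches, a pair that occurs more than once (such as l.m in NB_DAY_DIM.MONTH_YEAR+l.m-xyz.abc*l.m) is reported each time it appears.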
If you need additional help, please start a new question rather than putting it all into this one. If you need regular expressions more often, I'd advise diving into regex syntax (a good source would be http://regular-expressions.info); it's always good to know.
A last note on regular expressions: not all problems are best solved (or even solvable) with regex. Your examples are getting more and more complicated, and it seems you're actually attempting to write a parser. Regular expressions are of limited use there.