
parse the specific string into desired output

Tags: java, regex

I have two cases, as shown in the main method.

In case 1, the match method returns exactly what I want, but in case 2 it returns an empty list.

Please help me find a common regex that works in both cases. Example input string:

"\"NB_DAY_DIM\".\"MONTH_YEAR\"+\"l\".\"m\""

Desired output:

[{column=MONTH_YEAR, value="NB_DAY_DIM"."MONTH_YEAR", table=NB_DAY_DIM},{column=m, value="l"."m", table=l}]

public static List<Map<String, String>> match(String source) {
    String pattern = "\"(.*?)\".\"(.*?)\"";

    List<Map<String, String>> list = new ArrayList<Map<String, String>>();

    Pattern r = Pattern.compile(pattern);
    Matcher m = r.matcher(source);

    while (m.find()) {

        Map<String, String> mp = new HashMap<String, String>();
        mp.put("value", m.group(0));
        mp.put("table", m.group(1));
        mp.put("column", m.group(2));

        list.add(mp);
        source = source.replace(m.group(0), "");
        m = r.matcher(source);
    }

    return list;
}

public static void main(String[] args) {
    System.out.println(match("\"NB_DAY_DIM\".\"MONTH_YEAR\"+\"l\".\"m\""));//case 1
    System.out.println(match("NB_DAY_DIM.MONTH_YEAR+l.m"));//case 2
}

The operator between NB_DAY_DIM.MONTH_YEAR and l.m can be any arithmetic or logical operator, e.g. "NB_DAY_DIM.MONTH_YEAR-l.m". The pattern can also repeat, e.g. NB_DAY_DIM.MONTH_YEAR+l.m-xyz.abc*l.m

Shekhar asked Apr 16 '26 19:04


1 Answer

If the quotation marks are completely optional, try this expression (non-Java notation):

"?([^+"]+)"?\."?([^+"]+)"?

A short breakdown:

  • "? means optional quotation marks
  • ([^+"]+) means a sequence of at least one character which is not a plus or quotation mark
  • Thus the complete expression means "two sequences of at least one character which is neither a plus nor a quotation mark, separated by a dot and each surrounded by optional quotation marks".
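
As a rough sketch of how that expression could be used from Java (the class and method names here are made up for illustration, and the quotation marks and the dot need escaping inside a Java string literal):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question.
class OptionalQuotes {
    // "?([^+"]+)"?\."?([^+"]+)"? written as a Java string literal.
    static final Pattern PAIR =
            Pattern.compile("\"?([^+\"]+)\"?\\.\"?([^+\"]+)\"?");

    // Collects each match as "table/column" (group 1 and group 2).
    static List<String> pairs(String source) {
        List<String> result = new ArrayList<>();
        Matcher m = PAIR.matcher(source);
        while (m.find()) {
            result.add(m.group(1) + "/" + m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        // Both cases from the question print [NB_DAY_DIM/MONTH_YEAR, l/m]
        System.out.println(pairs("\"NB_DAY_DIM\".\"MONTH_YEAR\"+\"l\".\"m\""));
        System.out.println(pairs("NB_DAY_DIM.MONTH_YEAR+l.m"));
    }
}
```

Note that find() already continues after the previous match, so there is no need to remove the matched text from the source and create a new matcher inside the loop.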

Update:

Here's an expression for "either all quotation marks or none":

(?<=\+|^)("?)([^+"]+)\1\.\1([^+"]+)\1(?=\+|$)

The changes are:

  • group 1 is now the quotation mark, so you'd have to change your code that reads the groups (table and column are now groups 2 and 3)
  • \1 is a back reference to group 1, i.e. if the first quotation mark is found, all the others have to be there as well; if not, no quotation mark is allowed.
  • (?=\+|$) is used to define that a match must be followed by either a plus or the end of input. This is needed to reject cases which only have trailing quotation marks, e.g. NB_DAY_DIM.MONTH_YEAR"
  • likewise (?<=\+|^) is used to prevent cases which only have leading quotation marks, e.g. "NB_DAY_DIM.MONTH_YEAR+l.m

This expression would match

  • NB_DAY_DIM.MONTH_YEAR+"l"."m"
  • "NB_DAY_DIM"."MONTH_YEAR"+"l"."m"
  • NB_DAY_DIM.MONTH_YEAR+l.m
  • etc.

but not

  • "NB_DAY_DIM.MONTH_YEAR+"l"."m"
  • NB_DAY_DIM."MONTH_YEAR"+"l"."m"
  • NB_DAY_DIM.MONTH_YEAR"+l.m
  • etc.
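
A minimal Java sketch of the "all or none" variant, assuming the trailing group is the look-ahead (?=\+|$) described in the list of changes. The class and method names are hypothetical; the point is mainly that table and column are read from groups 2 and 3:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question.
class AllOrNoQuotes {
    // (?<=\+|^)("?)([^+"]+)\1\.\1([^+"]+)\1(?=\+|$) as a Java string literal.
    // Group 1 is the optional quote, group 2 the table, group 3 the column.
    static final Pattern PAIR = Pattern.compile(
            "(?<=\\+|^)(\"?)([^+\"]+)\\1\\.\\1([^+\"]+)\\1(?=\\+|$)");

    // Returns every accepted pair as "table.column" with the quotes stripped.
    static List<String> pairs(String source) {
        List<String> result = new ArrayList<>();
        Matcher m = PAIR.matcher(source);
        while (m.find()) {
            result.add(m.group(2) + "." + m.group(3));
        }
        return result;
    }

    public static void main(String[] args) {
        // Each pair is checked on its own, so mixing quoted and unquoted pairs is fine.
        System.out.println(pairs("NB_DAY_DIM.MONTH_YEAR+\"l\".\"m\""));
        // Pairs with only some of the quotes are not reported.
        System.out.println(pairs("NB_DAY_DIM.MONTH_YEAR\""));
        System.out.println(pairs("NB_DAY_DIM.\"MONTH_YEAR\""));
    }
}
```

Since find() scans the whole input, a string that contains one malformed pair and one well-formed pair still yields the well-formed one; only the malformed pair is skipped.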

Update 2: since the comment says the delimiter could be any arithmetic operator, just expand the disallowed characters to include them, e.g. instead of [^+"] use [^+\-*/"]. Additionally, expand the look-behind/look-ahead from \+|^ and \+|$ to [+\-*/]|^ and [+\-*/]|$.

Here's an expanded expression; if there are additional requirements, feel free to add them:

(?<=[+\-*/]|^)("?)([^+\-*/"]+)\1\.\1([^+\-*/"]+)\1(?=[+\-*/]|$)

This would match NB_DAY_DIM.MONTH_YEAR+l.m-xyz.abc*l.m.
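
For illustration, here is how the expanded expression might look in Java. This is only a sketch with made-up class and method names; the backslashes of the regex are doubled for the string literal:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question.
class OperatorDelimited {
    // (?<=[+\-*/]|^)("?)([^+\-*/"]+)\1\.\1([^+\-*/"]+)\1(?=[+\-*/]|$)
    static final Pattern PAIR = Pattern.compile(
            "(?<=[+\\-*/]|^)(\"?)([^+\\-*/\"]+)\\1\\.\\1([^+\\-*/\"]+)\\1(?=[+\\-*/]|$)");

    // Returns the full text of every operand, quotes included if present.
    static List<String> operands(String source) {
        List<String> result = new ArrayList<>();
        Matcher m = PAIR.matcher(source);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [NB_DAY_DIM.MONTH_YEAR, l.m, xyz.abc, l.m]
        System.out.println(operands("NB_DAY_DIM.MONTH_YEAR+l.m-xyz.abc*l.m"));
    }
}
```

Logical operators or parentheses would need to be added to the character classes in the same way.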

Update 3:

In order to extract <table>.<column> pairs from your string, you can use an expression like this:

"?(\w+)"?\."?(\w+)"?

Note that this doesn't ensure that either all quotes are set or none and it also assumes you're only using word characters (i.e. [a-zA-Z0-9_]) for table and column names.

It might serve your purpose, however.
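
Plugged into the match method from the question, it could look roughly like this. It is a sketch only: the class name is invented, and a LinkedHashMap is used here just to keep the keys in insertion order when printing:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class around the question's match method.
class TableColumnExtractor {
    // "?(\w+)"?\."?(\w+)"? written as a Java string literal.
    static final Pattern PAIR =
            Pattern.compile("\"?(\\w+)\"?\\.\"?(\\w+)\"?");

    static List<Map<String, String>> match(String source) {
        List<Map<String, String>> list = new ArrayList<>();
        Matcher m = PAIR.matcher(source);
        while (m.find()) {
            Map<String, String> mp = new LinkedHashMap<>();
            mp.put("value", m.group(0));  // whole match, quotes included
            mp.put("table", m.group(1));
            mp.put("column", m.group(2));
            list.add(mp);
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(match("\"NB_DAY_DIM\".\"MONTH_YEAR\"+\"l\".\"m\"")); // case 1
        System.out.println(match("NB_DAY_DIM.MONTH_YEAR+l.m"));               // case 2
        System.out.println(match("NB_DAY_DIM.MONTH_YEAR+l.m-xyz.abc*l.m"));   // repeated pattern
    }
}
```

Be aware that a decimal literal such as 1.5 also consists of word characters around a dot, so it would be reported as a table/column pair as well.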

If you need additional help, please start a new question rather than putting it all into this one. If you need regex more often, I'd advise diving into the syntax (a good source would be http://regular-expressions.info); it's always good to know.

A last note on regular expressions: not all problems are best solved (or even solvable) with regex. Your examples are getting more and more complicated, and it seems you're actually attempting to write a parser. Regular expressions are of limited use for that.

Thomas answered Apr 18 '26 08:04