
split complex string

I have a string like this:

1|f1|</a1|a2/></a3|a4/>|f2

I want to split it by '|' in Java, but I need to ignore any '|' that sits between </ and />. How can I do this? It seems like a job for a regexp.

The above string should split into:

1

f1

a1|a2

a3|a4

f2

Bassem Aly asked Apr 12 '13 19:04



3 Answers

The split method takes a regex as its parameter, and | is a special character in regex that means OR. To make it a normal character, place \\ before it, like this:

"yourString".split("\\|");
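A minimal sketch of the escaping point above. The class and method names (`PipeSplitDemo`, `splitOnPipe`, `splitOnPipeQuoted`) are made up for illustration; `Pattern.quote` is the standard-library way to get the same literal pattern without writing the backslashes by hand.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PipeSplitDemo {
    // Splits on a literal '|' by escaping it in the regex.
    static String[] splitOnPipe(String input) {
        return input.split("\\|");
    }

    // Same result, letting Pattern.quote build the literal pattern.
    static String[] splitOnPipeQuoted(String input) {
        return input.split(Pattern.quote("|"));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnPipe("a|b|c")));       // [a, b, c]
        System.out.println(Arrays.toString(splitOnPipeQuoted("a|b|c"))); // [a, b, c]
    }
}
```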

In your case you would also need a look-ahead, so your regex can look like

/></|(/>)?\\|(?=[^>]*(</|$))(</)?

It will split on

  • /></
  • | with an optional /> before it or </ after it, BUT ONLY if no > follows it before the next </ or the end of your input ($). This guarantees that the | is outside of </ />

Also, to avoid problems with input like "</a|b/>|c|</d|e/>", where </ is at the start and /> at the end, you need to handle those two separately: remove the leading </ before the split and treat the trailing /> as one more delimiter.

This is necessary because we don't want an empty String as the first or last element of the resulting array, as in "ab".split("a"), which produces {"", "b"}.
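To make the empty-element behaviour concrete, here is a small sketch using only `String.split`; the class name `EmptyTokenDemo` is just for illustration. It shows that a delimiter at the start leaves a leading empty string, while trailing empty strings are dropped unless a negative limit is passed.

```java
import java.util.Arrays;

class EmptyTokenDemo {
    public static void main(String[] args) {
        // Delimiter at the start: the leading empty string is kept.
        System.out.println(Arrays.toString("ab".split("a")));     // [, b]
        // Delimiter at the end: trailing empty strings are dropped...
        System.out.println(Arrays.toString("ba".split("a")));     // [b]
        // ...unless a negative limit asks split to keep them.
        System.out.println(Arrays.toString("ba".split("a", -1))); // [b, ]
    }
}
```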

Let's test it:

for (String s : "</a0|b0/>|1|f1|</a1|a2/></a3|a4/>|f2|</a5|a6/>"
        .replaceAll("^</", "").split("/></|/>$|(/>)?\\|(?=[^>]*(</|$))(</)?")) {
    System.out.println(s);
}

Output:

a0|b0
1
f1
a1|a2
a3|a4
f2
a5|a6
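If you want this as a reusable helper, one possible sketch is below. It uses the same regex and the same leading-`</` cleanup as the test above; the class, constant and method names (`WrappedSplit`, `DELIMITER`, `splitFields`) are invented for this example.

```java
import java.util.Arrays;

class WrappedSplit {
    // Delimiters: "/></" between two wrapped groups, a trailing "/>", or a '|'
    // outside a wrapper (optionally swallowing an adjacent "/>" or "</").
    private static final String DELIMITER = "/></|/>$|(/>)?\\|(?=[^>]*(</|$))(</)?";

    static String[] splitFields(String input) {
        // Drop a leading "</" first so the result does not start with an empty string.
        return input.replaceAll("^</", "").split(DELIMITER);
    }

    public static void main(String[] args) {
        // The string from the question.
        System.out.println(Arrays.toString(splitFields("1|f1|</a1|a2/></a3|a4/>|f2")));
        // [1, f1, a1|a2, a3|a4, f2]
    }
}
```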
Pshemo answered Nov 10 '22 08:11



You could try the following regex, which uses negative look-ahead.

(?!</[^\|]*)[\|](?![^\|]*/>)

This works out as:

[\|] matches occurrences of |

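(?!</[^\|]*) is meant to state that said matches must not be preceded by </sometext. Note that, as written, it is a negative look-ahead evaluated at the | itself, so it always succeeds; only a look-behind would actually inspect the preceding text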

(?![^\|]*/>) states that said matches must not be followed by sometext/>

Note: in the above, sometext is zero or more characters that are not a |
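A quick way to see what this pattern does is to run it on the string from the question. The sketch below only wraps `String.split`; the names `LookaheadSplitDemo` and `splitWithLookahead` are made up for illustration. As far as I can trace it, the pipes inside the wrappers are left alone, but the </ and /> markers stay in the output and the two adjacent wrapped groups come back as one element, so some post-processing would still be needed to get the exact result asked for.

```java
import java.util.Arrays;

class LookaheadSplitDemo {
    // The regex from this answer, written as a Java string literal.
    static String[] splitWithLookahead(String input) {
        return input.split("(?!</[^\\|]*)[\\|](?![^\\|]*/>)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitWithLookahead("1|f1|</a1|a2/></a3|a4/>|f2")));
        // [1, f1, </a1|a2/></a3|a4/>, f2]
    }
}
```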

Alan answered Nov 10 '22 09:11



This regex should match. Here is a list of patterns to try; if one fails, go to the next. The first uses \\b, with the backslash escaped for the Java string literal. Java might not need that escape, so the second uses a single \b. If both fail, move on to the last one, which accepts only characters in the range from capital A to lowercase z, plus digits, so spaces are not allowed at all. Note that A-z also takes in the ASCII punctuation between Z and a ([ \ ] ^ _ `), so A-Za-z0-9 is the stricter form.

The last pattern matches:

"<", then any characters (as few as possible), then ">"; if that fails, it matches:

any run of characters in the A-z or 0-9 ranges

"(<.*?>|[^|\\b]*)" "(<.*?>|[^|\b]*)" "(<.*?>|[A-z0-9]*)"

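import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Collects every match: either a <...> group or a run of non-'|' characters.
public String[] methodName(String s)
{
    List<String> list = new ArrayList<String>();
    // '+' rather than '*' so the empty matches found at each '|' are not collected
    Pattern p = Pattern.compile("(<.*?>|[^|]+)");
    Matcher match = p.matcher(s);
    while (match.find())
    {
        list.add(match.group());
    }
    // toArray has to be called on the list, not on the array
    return list.toArray(new String[list.size()]);
}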
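A variation on the same Matcher idea, as a sketch only: capture groups can return the text inside </ ... /> without the markers. The pattern and the names `FieldTokenizer`, `FIELD` and `tokenize` are my own assumptions, not part of the code above, and this version silently skips empty fields (for example between two adjacent pipes).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldTokenizer {
    // Group 1: text inside a "</ ... />" wrapper; group 2: a plain run without '|'.
    private static final Pattern FIELD = Pattern.compile("</(.*?)/>|([^|]+)");

    static List<String> tokenize(String input) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(input);
        while (m.find()) {
            // Exactly one of the two groups participates in each match.
            fields.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("1|f1|</a1|a2/></a3|a4/>|f2"));
        // [1, f1, a1|a2, a3|a4, f2]
    }
}
```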

Remember to vote if it helps. Cheers, mate.

Lpc_dark answered Nov 10 '22 07:11
