I have this string :
<meis xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" uri="localhost/naro-nei" onded="flpSW531213" identi="lemenia" id="75" lastStop="bendi" xsi:noNamespaceSchemaLocation="http://localhost/xsd/postat.xsd xsd/postat.xsd">
How can I get lastStop
property value in JAVA?
This regex worked when tested on http://www.myregexp.com/
But when I try it in java I don't see the matched text, here is how I tried :
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class SimpleRegexTest {
public static void main(String[] args) {
String sampleText = "<meis xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" uri=\"localhost/naro-nei\" onded=\"flpSW531213\" identi=\"lemenia\" id=\"75\" lastStop=\"bendi\" xsi:noNamespaceSchemaLocation=\"http://localhost/xsd/postat.xsd xsd/postat.xsd\">";
String sampleRegex = "(?<=lastStop=[\"']?)[^\"']*";
Pattern p = Pattern.compile(sampleRegex);
Matcher m = p.matcher(sampleText);
if (m.find()) {
String matchedText = m.group();
System.out.println("matched [" + matchedText + "]");
} else {
System.out.println("didn’t match");
}
}
}
Maybe the problem is that I use escape char in my test , but real string doesn't have escape inside. ?
UPDATE
Does anyone know why this doesn't work when used in java ? or how to make it work?
You can extract a substring from a String using the substring() method of the String class to this method you need to pass the start and end indexes of the required substring.
The substr() method extracts a part of a string. The substr() method begins at a specified position, and returns a specified number of characters. The substr() method does not change the original string. To extract characters from the end of the string, use a negative start position.
Ways to take string input in Java:Using BufferedReader class readLine() method. Using Scanner class nextLine() method. Through Scanner class next() method. Using Command-line arguments of the main() method.
Public String substring(int startIndex, int endIndex):This method is used to return a new String object that includes a substring of the given string with their indexes lying between startIndex and endIndex. If the second argument is given, the substring begins with the element at the startIndex to endIndex -1.
(?<=lastStop=[\"']?)[^\"]+
The reason it doesn't work as you expect is because of the *
in [^\"']*
. The lookbehind is matching at the position before the "
in lastStop="
, which is permitted because the quote is optional: [\"']?
. The next part is supposed to match zero or more non-quote characters, but because the next character is a quote, it matches zero characters.
If you change that *
to a +
, the second part will fail to match at that position, forcing the regex engine to bump ahead one more position. The lookbehind will match the quote, and [^\"']+
will match what follows. However, you really shouldn't be using a lookbehind for this in the first place. It's much easier to just match the whole sequence in the normal way and extract the part you want to keep via a capturing group:
String sampleRegex = "lastStop=[\"']?([^\"']*)";
Pattern p = Pattern.compile(sampleRegex);
Matcher m = p.matcher(sampleText);
if (m.find()) {
String matchedText = m.group(1);
System.out.println("matched [" + matchedText + "]");
} else {
System.out.println("didn’t match");
}
It will also make it easier to deal with the problem @Kobi mentioned. You're trying to allow for values contained in double-quotes, single-quotes or no quotes, but your regex is too simplistic. For one thing, a quoted value can contain whitespace, but an unquoted one can't. To deal with all three possibilities, you'll need two or three capturing groups, not just one.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With