 

How to read part of a string in Java

Tags:

java

regex

I have this string:

<meis xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" uri="localhost/naro-nei" onded="flpSW531213" identi="lemenia" id="75" lastStop="bendi" xsi:noNamespaceSchemaLocation="http://localhost/xsd/postat.xsd xsd/postat.xsd">

How can I get the value of the lastStop attribute in Java?

This regex worked when tested on http://www.myregexp.com/

But when I try it in Java I don't see the matched text. Here is what I tried:

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class SimpleRegexTest {
    public static void main(String[] args) {
        String sampleText = "<meis xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" uri=\"localhost/naro-nei\" onded=\"flpSW531213\" identi=\"lemenia\" id=\"75\" lastStop=\"bendi\" xsi:noNamespaceSchemaLocation=\"http://localhost/xsd/postat.xsd xsd/postat.xsd\">";
        String sampleRegex = "(?<=lastStop=[\"']?)[^\"']*";
        Pattern p = Pattern.compile(sampleRegex);
        Matcher m = p.matcher(sampleText);
        if (m.find()) {
            String matchedText = m.group();
            System.out.println("matched [" + matchedText + "]");
        } else {
            System.out.println("didn't match");
        }
    }
}

Maybe the problem is that I escape the quote characters in my test, whereas the real string doesn't contain any escapes?

UPDATE

Does anyone know why this doesn't work in Java, or how to make it work?

Gandalf StormCrow asked Apr 14 '10 08:04




2 Answers

(?<=lastStop=[\"']?)[^\"]+
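A minimal sketch of how this pattern might be written as a Java string literal and run against a tag like the one in the question. The class name `LookbehindPlusDemo`, the helper `lastStop`, and the shortened sample tag are illustrative assumptions, not part of the original post:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindPlusDemo {
    // The pattern above as a Java literal; \" is only how a quote is typed in source code.
    static final Pattern LAST_STOP = Pattern.compile("(?<=lastStop=[\"']?)[^\"]+");

    // Hypothetical helper: returns the first match, or null when nothing matches.
    static String lastStop(String tag) {
        Matcher m = LAST_STOP.matcher(tag);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Shortened version of the question's tag (an assumption for brevity).
        String tag = "<meis id=\"75\" lastStop=\"bendi\" uri=\"localhost/naro-nei\">";
        System.out.println("matched [" + lastStop(tag) + "]");
    }
}
```

Because `+` needs at least one non-quote character, the match cannot start directly in front of the opening quote.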
Hun1Ahpu answered Oct 13 '22 20:10


It doesn't work as you expect because of the * in [^\"']*. The lookbehind matches at the position just before the " in lastStop=", which is permitted because the quote is optional: [\"']?. The next part is supposed to match zero or more non-quote characters, but because the next character is a quote, it matches zero characters. The overall match therefore succeeds, but with an empty string.
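One way to see the empty match is to print where it starts and ends. This is a minimal sketch: the class name `EmptyMatchDemo`, the helper `describeFirstMatch`, and the shortened input are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyMatchDemo {
    // Hypothetical helper: describes the first match as "start..end [text]".
    static String describeFirstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "no match";
        }
        return m.start() + ".." + m.end() + " [" + m.group() + "]";
    }

    public static void main(String[] args) {
        String input = "lastStop=\"bendi\"";
        // Start and end are equal: the regex matched, but consumed no characters.
        System.out.println(describeFirstMatch("(?<=lastStop=[\"']?)[^\"']*", input));
    }
}
```

The question's program reports a successful match with nothing between the brackets for the same reason: `find()` returns true, and `group()` is the empty string.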

If you change that * to a +, the second part will fail to match at that position, forcing the regex engine to bump ahead one more position. The lookbehind will match the quote, and [^\"']+ will match what follows. However, you really shouldn't be using a lookbehind for this in the first place. It's much easier to just match the whole sequence in the normal way and extract the part you want to keep via a capturing group:

String sampleRegex = "lastStop=[\"']?([^\"']*)";
Pattern p = Pattern.compile(sampleRegex);
Matcher m = p.matcher(sampleText);
if (m.find()) {
    String matchedText = m.group(1);
    System.out.println("matched [" + matchedText + "]");
} else {
    System.out.println("didn't match");
}

It will also make it easier to deal with the problem @Kobi mentioned. You're trying to allow for values contained in double-quotes, single-quotes or no quotes, but your regex is too simplistic. For one thing, a quoted value can contain whitespace, but an unquoted one can't. To deal with all three possibilities, you'll need two or three capturing groups, not just one.
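A rough sketch of the three-group idea, with one alternative per quoting style. The exact pattern, the class name `QuotedAttrDemo`, and the helper `lastStop` are assumptions for illustration rather than anything prescribed above; in particular, treating whitespace, quotes, and `>` as the end of an unquoted value is a guess at reasonable behaviour.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedAttrDemo {
    // Group 1: double-quoted value, group 2: single-quoted value,
    // group 3: unquoted value (stops at whitespace, a quote, or '>').
    static final Pattern LAST_STOP = Pattern.compile(
            "lastStop=(?:\"([^\"]*)\"|'([^']*)'|([^\\s\"'>]+))");

    // Hypothetical helper: returns whichever group took part in the match, or null.
    static String lastStop(String tag) {
        Matcher m = LAST_STOP.matcher(tag);
        if (!m.find()) {
            return null;
        }
        for (int g = 1; g <= m.groupCount(); g++) {
            if (m.group(g) != null) {
                return m.group(g);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(lastStop("<meis lastStop=\"bendi two\" id=\"75\">"));
        System.out.println(lastStop("<meis lastStop='bendi' id='75'>"));
        System.out.println(lastStop("<meis lastStop=bendi id=75>"));
    }
}
```

Only one of the three groups participates in any given match, so the helper returns the first group that is not null. A quoted value may then contain whitespace, while an unquoted one ends at the first space.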

Alan Moore answered Oct 13 '22 19:10