Java: scanning string for a pattern

This is probably a quick one. Why does this code not print anything?

import java.util.Scanner;

public class MainClass {

    public static void main(String[] args) {
        try {
            Scanner sc = new Scanner("asda ASA adad");
            String pattern = "[A-Z]+";

            while (sc.hasNext(pattern)) {
                System.out.println(sc.next(pattern));
            }
            sc.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Markos Fragkakis asked Mar 05 '10 22:03



1 Answer

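hasNext(String pattern) returns true only if the next token matches the pattern. In your case the next token is "asda", which does NOT match "[A-Z]+", so the loop condition is false on the very first check and nothing is printed. hasNext never consumes input either: the documentation states that "[the] scanner does not advance past any input", so the scanner stays in front of "asda" and never reaches "ASA".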

If you change the pattern to "[A-Za-z]+", then all three tokens match and you'd get "asda", "ASA" and "adad", which may be what you intended.
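To make the difference concrete, here is a minimal sketch that runs the question's loop with both patterns. The class name HasNextPatternDemo and the helper leadingMatches are made up for this illustration; only Scanner.hasNext(String) and Scanner.next(String) come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class HasNextPatternDemo {

    // Same loop as in the question: keeps reading only while the NEXT
    // token matches the pattern, and stops at the first one that does not.
    static List<String> leadingMatches(String input, String pattern) {
        List<String> tokens = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            while (sc.hasNext(pattern)) {
                tokens.add(sc.next(pattern));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "asda" is the first token and is not all uppercase, so the loop never runs.
        System.out.println(leadingMatches("asda ASA adad", "[A-Z]+"));    // []
        // Every token is made of letters, so all three are returned.
        System.out.println(leadingMatches("asda ASA adad", "[A-Za-z]+")); // [asda, ASA, adad]
    }
}
```

Note that the loop stops at the first non-matching token, so with "[A-Z]+" it would also stop early on an input like "ASA asda BBB" and return only the first token.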

If in fact you only want to get tokens that match "[A-Z]+", then you can do any of the following:

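  • simply discard non-matching tokens: read each one with next() and ignore it if it does not match
  • useDelimiter("[^A-Z]+"), then simply invoke next()
  • use skip("[^A-Z]+") to move past non-matching input (it throws NoSuchElementException if nothing matches at the current position)
  • use findInLine("[A-Z]+"), which ignores delimiters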
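The four options could be sketched roughly as follows. The class and method names are invented for this example. One deliberate deviation: the skip variant uses "[^A-Z]*" (star instead of plus), because skip throws NoSuchElementException when the pattern does not match at the current position, and a pattern that can match nothing avoids that.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class UppercaseTokens {

    // Option 1: read every whitespace-separated token, keep only the matching ones.
    static List<String> discardNonMatching(String input) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            while (sc.hasNext()) {
                String token = sc.next();
                if (token.matches("[A-Z]+")) {
                    out.add(token);
                }
            }
        }
        return out;
    }

    // Option 2: treat every run of non-uppercase characters as a delimiter.
    static List<String> withDelimiter(String input) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(input).useDelimiter("[^A-Z]+")) {
            while (sc.hasNext()) {
                out.add(sc.next());
            }
        }
        return out;
    }

    // Option 3: skip ahead to the next uppercase letter, then read a token.
    // next() still splits on whitespace here, so "ABCdef" would come back whole.
    static List<String> withSkip(String input) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            while (true) {
                sc.skip("[^A-Z]*"); // may skip nothing, never throws
                if (!sc.hasNext()) {
                    break;
                }
                out.add(sc.next());
            }
        }
        return out;
    }

    // Option 4: search the current line for uppercase runs, ignoring delimiters.
    static List<String> withFindInLine(String input) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            String match;
            while ((match = sc.findInLine("[A-Z]+")) != null) {
                out.add(match);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "asda ASA adad";
        System.out.println(discardNonMatching(input)); // [ASA]
        System.out.println(withDelimiter(input));      // [ASA]
        System.out.println(withSkip(input));           // [ASA]
        System.out.println(withFindInLine(input));     // [ASA]
    }
}
```

All four agree on this input, but they are not interchangeable in general: only the first one insists that the whole token is uppercase, and findInLine only looks at the current line.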

Tip: if performance is critical, you'd want to use the precompiled Pattern overloads of these methods.
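As a rough illustration of the Pattern overloads, the sketch below compiles the regex once and passes the Pattern object to hasNext(Pattern) and next(Pattern). The class name, the UPPER constant and the method name are assumptions made for the example, and whether this is measurably faster depends on your workload.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class PrecompiledPatternDemo {

    // Compiled once and reused for every token check.
    private static final Pattern UPPER = Pattern.compile("[A-Z]+");

    // Keeps tokens that are entirely uppercase and discards the rest.
    static List<String> uppercaseTokens(String input) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            while (sc.hasNext()) {
                if (sc.hasNext(UPPER)) {
                    out.add(sc.next(UPPER));
                } else {
                    sc.next(); // consume and drop the non-matching token
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(uppercaseTokens("asda ASA adad")); // [ASA]
    }
}
```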

Tip: do keep in mind that "Xooo ABC" has two "[A-Z]+" matches, "X" and "ABC". If this is not what you want, then the regex will have to be a bit more complicated. Or you can always simply discard non-matching tokens.
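Here is a small sketch of that pitfall with findInLine. The stricter regex "\\b[A-Z]+\\b" is only one possible choice, and it assumes that what you want is a whole word of capitals (note that \b treats digits and underscores as word characters too). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class PartialMatchDemo {

    // Collects every match of the regex on the first line of the input.
    static List<String> findAllInLine(String input, String regex) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            String match;
            while ((match = sc.findInLine(regex)) != null) {
                out.add(match);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The plain pattern also picks up the leading "X" of "Xooo".
        System.out.println(findAllInLine("Xooo ABC", "[A-Z]+"));       // [X, ABC]
        // Word boundaries on both sides reject the partial match.
        System.out.println(findAllInLine("Xooo ABC", "\\b[A-Z]+\\b")); // [ABC]
    }
}
```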

polygenelubricants answered Nov 12 '22 16:11