Find all words between < and > with regex

Tags:

java

regex

I want to find the words between < and > in a String.

For example:

String str = "your mobile number is <A> and username is <B> thanks <C>";

I want to get A, B, C from the String.

I have tried:

import java.util.regex.*;

public class Main
{
  public static void main (String[] args)
  {
     String example = "your mobile number is <A> and username is <B> thanks <C>";
     Matcher m = Pattern.compile("\\<([^)]+)\\>").matcher(example);
     while(m.find()) {
       System.out.println(m.group(1));    
     }
  }
}

What's wrong with what I am doing?

Bhavesh asked May 08 '15 12:05



3 Answers

Use the following idiom with a capturing group to get the values for your A, B and C placeholders:

String example = "your mobile number is <A> and username is <B> thanks <C>";
//                           ┌ left delimiter - no need to escape here
//                           | ┌ group 1: 1+ of any character, reluctantly quantified
//                           | |   ┌ right delimiter
//                           | |   |
Matcher m = Pattern.compile("<(.+?)>").matcher(example);
while (m.find()) {
    System.out.println(m.group(1));
}

Output

A
B
C

Note

If you prefer a solution that uses look-arounds instead of an indexed capturing group, you can achieve the same with the following code:

String example = "your mobile number is <A> and username is <B> thanks <C>";
//                            ┌ positive look-behind for left delimiter
//                            |    ┌ 1+ of any character, reluctantly quantified
//                            |    |   ┌ positive look-ahead for right delimiter
//                            |    |   |
Matcher m = Pattern.compile("(?<=<).+?(?=>)").matcher(example);
while (m.find()) {
    // no group index needed here: group() returns the whole match
    System.out.println(m.group());
}

I personally find the latter less readable in this instance.
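If the placeholders are needed as a collection rather than printed one by one, the same capturing-group loop can fill a list. The following is a minimal sketch; the class name PlaceholderExtractor and the method extract are hypothetical names chosen for illustration, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class PlaceholderExtractor {
    // Compile once and reuse: Pattern instances are immutable and thread-safe.
    private static final Pattern PLACEHOLDER = Pattern.compile("<(.+?)>");

    // Returns group 1 of every <...> match, in order of appearance.
    static List<String> extract(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = PLACEHOLDER.matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String example = "your mobile number is <A> and username is <B> thanks <C>";
        System.out.println(extract(example)); // prints [A, B, C]
    }
}
```

Keeping the compiled Pattern in a constant avoids recompiling the regex on every call, which matters only if the method is called often.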

Mena answered Oct 04 '22 07:10


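You need to exclude > (or both < and >) in the negated character class. [^)]+ in your regex matches one or more of any character except ), so it also matches the < and > symbols. Because the quantifier is greedy, the single match runs from the first < to the last >, and group 1 comes back as A> and username is <B> thanks <C instead of A, B and C.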

 Matcher m = Pattern.compile("<([^<>]+)>").matcher(example);
 while(m.find()) {
   System.out.println(m.group(1));
 }

Or use lookarounds. Note that [^<>]* in this variant also matches the empty text inside <>:

 Matcher m = Pattern.compile("(?<=<)[^<>]*(?=>)").matcher(example);
 while(m.find()) {
   System.out.println(m.group());
 }
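
To see the difference concretely, the sketch below runs the pattern from the question and the corrected pattern against the same input. The class name GreedyVersusNegatedClass and the helper groups are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class GreedyVersusNegatedClass {
    // Collects group 1 of every match of the given regex in the input.
    static List<String> groups(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String example = "your mobile number is <A> and username is <B> thanks <C>";

        // Pattern from the question: [^)]+ is greedy and may cross < and >.
        System.out.println(groups("\\<([^)]+)\\>", example));
        // prints [A> and username is <B> thanks <C]

        // Corrected pattern: the class excludes both delimiters.
        System.out.println(groups("<([^<>]+)>", example));
        // prints [A, B, C]
    }
}
```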
Avinash Raj answered Oct 04 '22 09:10


Can you please try this? It puts the string literal in double quotes and captures with the reluctant quantifier .+?:

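import java.util.regex.*;

public class Main {
    public static void main(String[] args) {
        String example = "your mobile number is <A> and username is <B> thanks <C>";
        // The backslashes before < and > are harmless but not required in Java regex
        Matcher m = Pattern.compile("\\<(.+?)\\>").matcher(example);
        while (m.find()) {
            System.out.println(m.group(1));
        }
    }
}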
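
One caveat with the reluctant .+? form used above: it starts at the first < it sees, so a stray < in the text can be swallowed into the match. The sketch below contrasts it with the negated character class on an input invented for this illustration; the class name StrayBracketDemo and the helper groups are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names and the sample input are illustrative only.
class StrayBracketDemo {
    // Collects group 1 of every match of the given regex in the input.
    static List<String> groups(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // The first < is a comparison sign, not a placeholder delimiter.
        String input = "if a < b then use <C>";

        // Reluctant dot: matches from the stray < up to the first >.
        System.out.println(groups("<(.+?)>", input));
        // prints [ b then use <C]

        // Negated class: cannot cross another <, so only the real placeholder matches.
        System.out.println(groups("<([^<>]+)>", input));
        // prints [C]
    }
}
```

For input like the example in the question, where every < has a matching >, both patterns give the same result.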
akhil_mittal answered Oct 04 '22 08:10