
How to grab all words that start with capital letters?

Tags:

java

regex

I want to create a Java regular expression that grabs all words starting with a capital letter, followed by any number of uppercase or lowercase letters. The letters may carry accents.

Examples:

Where

Àdónde

Rápido

Àste

Can you please help me with that?

Brad asked Jan 19 '23 15:01


2 Answers

Regex:

\b\p{Lu}\p{L}*\b

Java string:

"(?U)\\b\\p{Lu}\\p{L}*\\b"

Explanation:

\b      # Match at a word boundary (start of word)
\p{Lu}  # Match an uppercase letter
\p{L}*  # Match any number of letters (any case)
\b      # Match at a word boundary (end of word)
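
As a rough illustration of the pattern explained above, the sketch below scans a sentence with Matcher.find() and collects every capitalized word. The class name CapitalizedWords and the method find are made up for this example, and the accented letters are written as \u escapes only so the source file's encoding does not matter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for this example only.
class CapitalizedWords {

    // (?U) turns on UNICODE_CHARACTER_CLASS so that \b treats accented
    // letters as word characters (requires Java 7 or later).
    static final Pattern CAPITALIZED =
            Pattern.compile("(?U)\\b\\p{Lu}\\p{L}*\\b");

    // Returns every word in the text that starts with an uppercase letter
    // and otherwise consists only of letters.
    static List<String> find(String text) {
        List<String> words = new ArrayList<String>();
        Matcher m = CAPITALIZED.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        // "Where is Àdónde going so Rápido, asked Àste"
        String text = "Where is \u00C0d\u00F3nde going so R\u00E1pido, asked \u00C0ste";
        for (String word : find(text)) {
            System.out.println(word);
        }
    }
}
```

Because find() searches inside the text instead of matching whole tokens, a word followed by punctuation (such as "Rápido,") is still picked up.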

Caveat: This relies on the (?U) flag (UNICODE_CHARACTER_CLASS), which is only available from Java 7 onward. In older versions you may need to substitute a longer sub-regex for \b. As described by @tchrist (kudos), you may need to use

(?:(?<=[\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]])(?![\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]])|(?<![\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]])(?=[\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]]))

in place of \b, so the Java string would look like this:

"(?:(?<=[\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}[\\p{InEnclosedAlphanumerics}&&\\p{So}]])(?![\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}[\\p{InEnclosedAlphanumerics}&&\\p{So}]])|(?<![\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}[\\p{InEnclosedAlphanumerics}&&\\p{So}]])(?=[\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}[\\p{InEnclosedAlphanumerics}&&\\p{So}]]))\\p{Lu}\\p{L}*(?:(?<=[\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}[\\p{InEnclosedAlphanumerics}&&\\p{So}]])(?![\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}[\\p{InEnclosedAlphanumerics}&&\\p{So}]])|(?<![\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}[\\p{InEnclosedAlphanumerics}&&\\p{So}]])(?=[\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}[\\p{InEnclosedAlphanumerics}&&\\p{So}]]))"
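
The long replacement for \b repeats one character class eight times, which makes the Java string hard to read and easy to mistype. One possible way to keep it manageable is to build it from named pieces, as in the sketch below. The names PortableBoundary, WORD, BOUNDARY and find are invented for this example; the character class itself is the one quoted above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for this example only.
class PortableBoundary {

    // The set of characters treated as "word" characters. The inner
    // brackets are a nested class, so they are NOT escaped.
    static final String WORD =
            "[\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}"
            + "[\\p{InEnclosedAlphanumerics}&&\\p{So}]]";

    // A boundary is a position with a word character on exactly one side.
    static final String BOUNDARY =
            "(?:(?<=" + WORD + ")(?!" + WORD + ")"
            + "|(?<!" + WORD + ")(?=" + WORD + "))";

    // Same shape as \b\p{Lu}\p{L}*\b, with the hand-built boundary.
    static final Pattern CAPITALIZED =
            Pattern.compile(BOUNDARY + "\\p{Lu}\\p{L}*" + BOUNDARY);

    // Collects every capitalized word found in the text.
    static List<String> find(String text) {
        List<String> words = new ArrayList<String>();
        Matcher m = CAPITALIZED.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        // "Where is Àdónde going so Rápido, asked Àste"
        System.out.println(find(
                "Where is \u00C0d\u00F3nde going so R\u00E1pido, asked \u00C0ste"));
    }
}
```

This version does not use the (?U) flag, so it is meant for the situation described in the caveat, where that flag is unavailable.
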
Tim Pietzcker answered Jan 25 '23 23:01


Code to detect the capitalized words in a given paragraph. In this case the input is read from the console.

import java.util.Scanner;

public class Problem9 {

    public static void main(String[] args) {
        // (?U) makes \b Unicode-aware (Java 7+)
        String pattern = "(?U)\\b\\p{Lu}\\p{L}*\\b";

        // Read one line from the console and split it on whitespace
        Scanner in = new Scanner(System.in);
        String line = in.nextLine();
        String[] words = line.split("\\s+");

        // Print every token that is, as a whole, a capitalized word
        for (String word : words) {
            if (word.matches(pattern)) {
                System.out.println(word);
            }
        }
    }
}

If you give it input such as

Input: This is my First Program

Output:

This
First
Program

agiles answered Jan 25 '23 23:01