
Regex expression to capture only words without numbers or symbols

I need a regex that, given the following string:

"test test3 t3st test: word%5 test! testing t[st"

matches only the words made up entirely of a-z characters:

Should match: test testing

Should not match: test3 t3st test: word%5 test! t[st

I have tried ([A-Za-z])\w+, but it still matches part of word%5, which should not be a match.

Digao asked Jul 21 '17 13:07



2 Answers

You may use

String patt = "(?<!\\S)\\p{Alpha}+(?!\\S)";

See the regex demo.

It matches 1 or more letters that are enclosed by whitespace or the start/end of the string. Alternative patterns are (?<!\S)[a-zA-Z]+(?!\S) (equivalent to the one above) and (?<!\S)\p{L}+(?!\S) (if you also want to match all Unicode letters).

Details:

  • (?<!\\S) - a negative lookbehind that fails the match if there is a non-whitespace char immediately to the left of the current location
  • \\p{Alpha}+ - 1 or more ASCII letters (same as [a-zA-Z]+, but if you use a Pattern.UNICODE_CHARACTER_CLASS modifier flag, \p{Alpha} will be able to match Unicode letters)
  • (?!\\S) - a negative lookahead that fails the match if there is a non-whitespace char immediately to the right of the current location.
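To see why the whitespace lookarounds matter, here is a minimal sketch that compares them with plain \b word boundaries on the string from the question. The class name BoundaryComparison and the helper findAll are made up for illustration; only java.util.regex is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryComparison {
    // Hypothetical helper: collects every match of the regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String s = "test test3 t3st test: word%5 test! testing t[st";

        // \b only needs a word/non-word transition, so punctuation counts as a boundary.
        System.out.println(findAll("\\b[a-zA-Z]+\\b", s));
        // prints [test, test, word, test, testing, t, st]

        // The lookarounds require whitespace (or a string edge) on both sides.
        System.out.println(findAll("(?<!\\S)[a-zA-Z]+(?!\\S)", s));
        // prints [test, testing]
    }
}
```

The \b version rejects test3 and t3st, because a digit is a word character, but it still accepts the letter runs next to :, %, ! and [.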

See a Java demo:

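import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "test test3 t3st test: word%5 test! testing t[st";
// Match runs of letters that have whitespace or a string edge on both sides
Pattern pattern = Pattern.compile("(?<!\\S)\\p{Alpha}+(?!\\S)");
Matcher matcher = pattern.matcher(s);
// Print every whole-token match
while (matcher.find()) {
    System.out.println(matcher.group(0));
}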

Output: test and testing.
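The demo uses the ASCII-only form. As a rough sketch, the two Unicode-aware options mentioned in this answer (\p{L}, and \p{Alpha} with Pattern.UNICODE_CHARACTER_CLASS) can be compared as below. The input string, the class name UnicodeLetterDemo and the helper findAll are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeLetterDemo {
    // Hypothetical helper: collects all matches of the regex compiled with the given flags.
    static List<String> findAll(String regex, int flags, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Illustrative input: "test café naïve t3st", written with escapes
        // so the source file encoding does not matter.
        String s = "test caf\u00e9 na\u00efve t3st";

        // Default \p{Alpha} is ASCII-only, so the accented words are skipped.
        System.out.println(findAll("(?<!\\S)\\p{Alpha}+(?!\\S)", 0, s));

        // With the flag, \p{Alpha} follows the Unicode Alphabetic property.
        System.out.println(findAll("(?<!\\S)\\p{Alpha}+(?!\\S)",
                Pattern.UNICODE_CHARACTER_CLASS, s));

        // \p{L} matches Unicode letters without any flag.
        System.out.println(findAll("(?<!\\S)\\p{L}+(?!\\S)", 0, s));
    }
}
```

The first call should return only test, while the other two should also return the accented words; t3st is rejected in all three cases.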

Wiktor Stribiżew answered Oct 11 '22 12:10


Try this

Pattern tokenPattern = Pattern.compile("[\\p{L}]+");

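[\\p{L}]+ matches each run of one or more Unicode letters. Note that without any boundary check it also matches the letter runs inside tokens such as test3 or word%5.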
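As a quick check of what this pattern returns on the string from the question, here is a small sketch. The class name LetterRunDemo and the helper findAll are hypothetical, and the second pattern simply adds whitespace lookarounds to the same character class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LetterRunDemo {
    // Hypothetical helper: collects every match of the regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String s = "test test3 t3st test: word%5 test! testing t[st";

        // Bare letter runs: every maximal group of letters is returned.
        System.out.println(findAll("[\\p{L}]+", s));
        // prints [test, test, t, st, test, word, test, testing, t, st]

        // Same class, but the run must be delimited by whitespace or a string edge.
        System.out.println(findAll("(?<!\\S)[\\p{L}]+(?!\\S)", s));
        // prints [test, testing]
    }
}
```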

Rajani answered Oct 11 '22 14:10