
Java regular expression for French names

Tags: java, regex

I need to modify a regular expression so that it allows all standard characters, French characters, spaces and dashes (hyphens), but only one hyphen at a time (no consecutive hyphens).

What I have right now is:

import java.util.regex.Pattern;

public class FrenchRegEx {

    static final String NAME_PATTERN = "[\u00C0-\u017Fa-zA-Z-' ]+";

    public static void main(String[] args) {

        String name;

        //name = "Jean Luc"; // allowed
        //name = "Jean-Luc"; // allowed
        //name = "Jean-Luc-Marie"; // allowed
        name = "Jean--Luc"; // NOT allowed

        if (!Pattern.matches(NAME_PATTERN, name)) {
            System.out.println("ERROR!");
        } else {
            System.out.println("OK!");
        }
    }
}

It accepts 'Jean--Luc' as a name, which should not be allowed.

Any help with this? Thanks.

Nenad Bulatović asked May 17 '26 16:05


2 Answers

So, you want a pattern that is one or more runs of name characters, with a single hyphen or space between consecutive runs. It's just a matter of writing the pattern that way:

"[\u00C0-\u017Fa-zA-Z']+([- ][\u00C0-\u017Fa-zA-Z']+)*"

This also assumes that names may not start or end with a hyphen or space, that two spaces may not appear in a row, and that a space may not directly follow or precede a hyphen.
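A quick way to check this pattern against the names from the question is a small harness like the one below. It is only a sketch: the class name SeparatedNameCheck and the helper isValid are made up for illustration, and the extra inputs (leading hyphen, double space, space next to a hyphen) are my assumption of cases worth trying, based on the restrictions just listed.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
final class SeparatedNameCheck {

    // Runs of letters/apostrophes, separated by exactly one hyphen or space.
    static final Pattern NAME = Pattern.compile(
            "[\u00C0-\u017Fa-zA-Z']+([- ][\u00C0-\u017Fa-zA-Z']+)*");

    static boolean isValid(String name) {
        // matches() succeeds only if the whole string fits the pattern
        return NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "Jean Luc", "Jean-Luc", "Jean-Luc-Marie", "Ren\u00E9e", // accepted
            "Jean--Luc", "-Jean", "Jean  Luc", "Jean -Luc"          // rejected
        };
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + (isValid(s) ? "OK!" : "ERROR!"));
        }
    }
}
```

The separator class [- ] matches exactly one character and must be followed by at least one name character, which is what rules out doubled, leading and trailing separators.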

You need to disallow consecutive hyphens. You may do it with a negative lookahead:

static final String NAME_PATTERN = "(?!.*--)[\u00C0-\u017Fa-zA-Z-' ]+";
                                    ^^^^^^^^
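
To see what the lookahead does and does not cover, here is a minimal sketch. The class name NoDoubleHyphenCheck and the helper isValid are hypothetical, and the sample inputs are my own picks. Note that only "--" is forbidden: a leading hyphen or a double space still passes.

```java
import java.util.regex.Pattern;

// Hypothetical helper class used only to illustrate the lookahead.
final class NoDoubleHyphenCheck {

    // (?!.*--) fails the match up front if "--" occurs anywhere in the input.
    static final Pattern NAME = Pattern.compile(
            "(?!.*--)[\u00C0-\u017Fa-zA-Z-' ]+");

    static boolean isValid(String name) {
        return NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "Jean-Luc-Marie", // accepted: hyphens are never adjacent
            "Jean--Luc",      // rejected by the lookahead
            "-Jean",          // still accepted: only "--" is forbidden
            "Jean  Luc"       // still accepted: double space is not checked
        };
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + (isValid(s) ? "OK!" : "ERROR!"));
        }
    }
}
```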

To disallow any of the special chars to be consecutive, use

static final String NAME_PATTERN = "(?!.*([-' ])\\1)[\u00C0-\u017Fa-zA-Z-' ]+";
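
A small sketch of how the backreference version behaves (NoRepeatedSpecialCheck and isValid are hypothetical names, and the inputs are chosen by me). Since \1 must repeat the same character that the group captured, two different special chars next to each other, such as a hyphen followed by a space, are still accepted.

```java
import java.util.regex.Pattern;

// Hypothetical helper class used only to illustrate the backreference.
final class NoRepeatedSpecialCheck {

    // ([-' ])\1 matches a special char immediately followed by itself.
    static final Pattern NAME = Pattern.compile(
            "(?!.*([-' ])\\1)[\u00C0-\u017Fa-zA-Z-' ]+");

    static boolean isValid(String name) {
        return NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "Jean-Luc",   // accepted
            "Jean--Luc",  // rejected: doubled hyphen
            "Jean  Luc",  // rejected: doubled space
            "O''Neil",    // rejected: doubled apostrophe
            "Jean- Luc"   // accepted: hyphen and space are different chars
        };
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + (isValid(s) ? "OK!" : "ERROR!"));
        }
    }
}
```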

Another way is to unroll the pattern a bit so that it matches strings where the special char(s) can appear between letters but cannot appear consecutively (e.g. if you need to match strings like Abc-def'here):

static final String NAME_PATTERN = "[\u00C0-\u017Fa-zA-Z]+(?:[-' ][\u00C0-\u017Fa-zA-Z]+)*";

or, to allow only one special char, which may appear only between letters (e.g. if you need to allow only strings like abc-def or abc'def):

static final String NAME_PATTERN = "[\u00C0-\u017Fa-zA-Z]+(?:[-' ][\u00C0-\u017Fa-zA-Z]+)?";
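
The difference between the two unrolled variants is just the final quantifier. A minimal sketch to compare them side by side follows; the class UnrolledCheck and the method names allowsMany and allowsOne are made up for this example.

```java
import java.util.regex.Pattern;

// Hypothetical helper class comparing the two unrolled patterns.
final class UnrolledCheck {

    static final String LETTERS = "[\u00C0-\u017Fa-zA-Z]+";

    // letters, then any number of (special char + letters) groups
    static final Pattern MANY = Pattern.compile(LETTERS + "(?:[-' ]" + LETTERS + ")*");

    // letters, then at most one (special char + letters) group
    static final Pattern ONE = Pattern.compile(LETTERS + "(?:[-' ]" + LETTERS + ")?");

    static boolean allowsMany(String s) {
        return MANY.matcher(s).matches();
    }

    static boolean allowsOne(String s) {
        return ONE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "Jean-Luc", "D'Artagnan", "Jean-Luc-Marie", "Jean--Luc", "Jean-" };
        for (String s : samples) {
            System.out.println("\"" + s + "\" many=" + allowsMany(s) + " one=" + allowsOne(s));
        }
    }
}
```

With these inputs, only "Jean-Luc-Marie" is treated differently: it passes the * version and fails the ? version. The last two samples fail both.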

Note that you do not need anchors here because you are using the pattern inside a .matches() method that requires a full string match.
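
If the same pattern is ever used with find() instead of matches(), the anchors do matter. The sketch below illustrates this, assuming the unrolled pattern from above; the class AnchorDemo and its method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo of matches() versus find() with and without anchors.
final class AnchorDemo {

    static final String UNROLLED =
            "[\u00C0-\u017Fa-zA-Z]+(?:[-' ][\u00C0-\u017Fa-zA-Z]+)*";

    // matches(): the whole input must fit the pattern
    static boolean fullMatch(String s) {
        return Pattern.matches(UNROLLED, s);
    }

    // find() without anchors: any matching substring is enough
    static boolean findUnanchored(String s) {
        return Pattern.compile(UNROLLED).matcher(s).find();
    }

    // find() with explicit anchors around the pattern
    static boolean findAnchored(String s) {
        return Pattern.compile("^(?:" + UNROLLED + ")$").matcher(s).find();
    }

    public static void main(String[] args) {
        String name = "Jean--Luc";
        System.out.println("fullMatch:      " + fullMatch(name));      // false
        System.out.println("findUnanchored: " + findUnanchored(name)); // true, it finds "Jean"
        System.out.println("findAnchored:   " + findAnchored(name));   // false
    }
}
```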

NOTE: you may tune the patterns further by moving any special chars that may appear anywhere in the string from the [-' ] character class into the [\u00C0-\u017Fa-zA-Z] character classes (for example, [\u00C0-\u017Fa-zA-Z'] if the apostrophe may appear anywhere), but watch out for -. To be treated as a literal hyphen, it should be placed at the end of the class, right before ].
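
To illustrate why the position of - matters, here is a tiny sketch with two made-up character classes (the class HyphenPlacementDemo and the method names are hypothetical). When the hyphen sits between two characters, it defines a range instead of matching itself.

```java
import java.util.regex.Pattern;

// Hypothetical demo of hyphen placement inside a character class.
final class HyphenPlacementDemo {

    // Hyphen last: space, apostrophe, or a literal hyphen.
    static final Pattern LITERAL = Pattern.compile("[ '-]");

    // Hyphen in the middle: a range from space (U+0020) to apostrophe (U+0027).
    static final Pattern RANGE = Pattern.compile("[ -']");

    static boolean literalMatches(String s) {
        return LITERAL.matcher(s).matches();
    }

    static boolean rangeMatches(String s) {
        return RANGE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(literalMatches("-")); // true
        System.out.println(literalMatches("#")); // false
        System.out.println(rangeMatches("-"));   // false: U+002D is outside the range
        System.out.println(rangeMatches("#"));   // true: U+0023 is inside the range
    }
}
```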

Wiktor Stribiżew answered May 19 '26 06:05