
grep for lines containing "standard" US characters only

Tags: regex, grep

I'm trying to figure out how to grep for lines that are made up of A-Z and a-z exclusively, that is, the "American" alphabet of letters. I would expect this to work, but it does not:

$ echo -e "Jutland\nJastrząb" | grep -x '[A-Za-z]*'
Jutland
Jastrząb

I want this to only print "Jutland", because ą is not a letter in the American alphabet. How can I achieve this?

pzkpfw asked Aug 30 '25 18:08


2 Answers

You can use a Perl-compatible regex (grep -P) to match lines made up of ASCII characters only:

$ echo -e "Jutland\nJastrząb" | grep -P '^[[:ascii:]]+$'
Jutland

Note that the grep man page describes -P as experimental:

-P, --perl-regexp
      Interpret  the  pattern as a Perl-compatible regular expression (PCRE).  This is experimental and
      grep -P may warn of unimplemented features.

EDIT

[[:ascii:]] also accepts digits and punctuation, such as an apostrophe. For letters only, use [A-Za-z]:

$ echo -e "L'Egyptienne\nJutland\nJastrząb" | grep -P '^[A-Za-z]+$'
Jutland

mrzasa answered Sep 02 '25 14:09


You need to add LC_ALL=C before grep:

printf '%b\n' "Jutland\nJastrząb" | LC_ALL=C grep -x '[A-Za-z]*'

Jutland

You may also use the -i switch to ignore case and shorten the regex:

printf '%b\n' "Jutland\nJastrząb" | LC_ALL=C grep -ix '[a-z]*'

LC_ALL=C avoids locale-dependent effects. Without it, the range [A-Za-z] is interpreted according to your current locale's collation order, which can make it match ą.
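
If you need this filter in more than one place, the LC_ALL=C approach can be wrapped in a small shell function. This is only a sketch: the function name ascii_letters_only is made up for illustration, and unlike the '[A-Za-z]*' pattern above it uses '[A-Za-z][A-Za-z]*', so empty lines are dropped as well.

```shell
# Hypothetical helper (the name is illustrative, not from the answers above):
# print only those stdin lines that consist entirely of the letters A-Z / a-z.
ascii_letters_only() {
    # LC_ALL=C is set for this grep invocation only, so the range is
    # evaluated byte-wise instead of by the locale's collation order.
    # -x requires the pattern to match the whole line.
    # '[A-Za-z][A-Za-z]*' needs at least one letter, so empty lines are skipped.
    LC_ALL=C grep -x '[A-Za-z][A-Za-z]*'
}

printf '%s\n' 'Jutland' 'Jastrząb' "L'Egyptienne" | ascii_letters_only
```

Because the variable assignment is placed directly in front of grep, the rest of the shell session keeps its normal locale.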

anubhava answered Sep 02 '25 16:09