How to use sed to delete Unicode characters in a given range?

Tags:

regex

sed

unicode

I want to remove Unicode characters in a certain range, e.g.:

echo "abcＡＢＣ123" | sed 's/[\uff21-\uff3b]//g'

I expect "abc123", but get:

sed: -e expression #1, char 20: Invalid range end

Or, with a literal character range:

echo "abcＡＢＣ123" | sed 's/[Ａ-Ｚ]//g'

I get:

sed: -e expression #1, char 14: Invalid collation character

asked Jan 09 '18 at 07:01 by user2524314


1 Answer

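Unicode support in sed is inconsistent. GNU sed has no \uXXXX code point escape, and the handling of multibyte characters in a bracket range depends on the current locale. You may be better off using Perl on the command line: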

echo "abcABC123" | perl -CS -pe 's/[\x{FF21}-\x{FF3B}]+//g'

abc123

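The -CS flag is important here. It makes Perl treat STDIN, STDOUT and STDERR as UTF-8, so the input is decoded into characters and the class [\x{FF21}-\x{FF3B}] is matched against code points rather than raw bytes.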

answered Oct 05 '22 at 21:10 by anubhava