 

Removing multibyte characters from a file using sed

Tags:

sed

multibyte

I need to remove all multibyte characters from a file. I don't know which characters they are, so I need to cover the whole range.

I can find them using grep like so: grep -P "[\x80-\xFF]" 'myfile'

I'm trying to do a similar thing with sed, but delete them instead.

Cheers

asked Aug 19 '10 at 11:08 by odtf
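A minimal sketch of the detection step from the question, assuming GNU grep built with -P (PCRE) support; the sample file and its contents are made up for illustration. In a UTF-8 locale, grep -P may decode the input as UTF-8, in which case [\x80-\xFF] matches the code points U+0080 to U+00FF rather than raw bytes and would miss characters such as Kanji. Forcing LC_ALL=C makes the match byte-wise, so every line containing any non-ASCII byte is reported.

```shell
#!/bin/sh
# Sketch: report every line that contains a byte in the range 0x80-0xFF.
# Assumes GNU grep with -P (PCRE) support; the sample file is hypothetical.
f=$(mktemp)
trap 'rm -f "$f"' EXIT

# Sample input: line 2 has "é" (bytes 0xC3 0xA9), line 3 has a Kanji
# character (U+6F22, bytes 0xE6 0xBC 0xA2); lines 1 and 4 are plain ASCII.
printf 'plain ascii\ncaf\303\251\n\346\274\242 kanji\nmore ascii\n' > "$f"

# LC_ALL=C makes grep treat the input as raw bytes, so the class is byte-wise.
# -n prefixes each matching line with its line number.
matches=$(LC_ALL=C grep -n -P '[\x80-\xFF]' "$f")
printf '%s\n' "$matches"
```

Both non-ASCII lines (2 and 3) are reported; the ASCII-only lines are not.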




1 Answer

Give this a try:

LC_ALL=C sed 's/[\x80-\xFF]//g' filename
answered Oct 23 '22 at 05:10 by Dennis Williamson
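A minimal sketch of the sed approach applied to a sample file, assuming GNU sed (the \xHH escapes are a GNU extension) and GNU grep with -P for the final check; the file contents are made up for illustration. The result is written to a second file rather than overwriting the input (GNU sed's -i option would edit in place instead). LC_ALL=C is used because it takes precedence over LC_CTYPE and LANG, so sed sees one character per byte even if those variables are set in the environment.

```shell
#!/bin/sh
# Sketch: delete every byte in 0x80-0xFF with GNU sed, then confirm none remain.
# Assumes GNU sed (for \xHH escapes) and GNU grep built with -P support.
f=$(mktemp)
out=$(mktemp)
trap 'rm -f "$f" "$out"' EXIT

# Sample input: "café" (é = 0xC3 0xA9) and a Kanji character
# (U+6F22 = 0xE6 0xBC 0xA2), plus one plain ASCII line.
printf 'caf\303\251\n\346\274\242 kanji\nascii only\n' > "$f"

# In the C locale each byte is one character, so the bracket range
# [\x80-\xFF] matches individual high bytes and //g deletes all of them.
LC_ALL=C sed 's/[\x80-\xFF]//g' "$f" > "$out"

cat "$out"

# Count lines in the output that still contain a high byte (expected: 0).
# grep -c prints 0 and exits with status 1 when nothing matches.
remaining=$(LC_ALL=C grep -c -P '[\x80-\xFF]' "$out")
echo "lines with high bytes left: $remaining"
```

This removes every byte of each multibyte character rather than transliterating it, so "café" becomes "caf" and the Kanji line keeps only " kanji".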