
How to remove non-ascii chars using sed

I want to remove non-ASCII characters from a file. I have already tried the following expressions:

sed -e 's/[\d00-\d128]//g'  # not working

cat /bin/mkdir | sed -e 's/[\x00-\x7F]//g' >/tmp/aa

but the output file still contains non-ASCII characters:

[root@asssdsada ~]$ hexdump /tmp/aa |more
          00 01 02 03 04 05 06 07 - 08 09 0A 0B 0C 0D 0E 0F  0123456789ABCDEF

00000000  45 4C 46 B0 F0 73 38 C0 - C0 BC BC FF FF 61 61 61  ELF..s8......aaa
00000010  A0 A0 50 E5 74 64 50 57 - 50 57 50 57 D4 D4 51 E5  ..P.tdPWPWPW..Q.
00000020  74 64 6C 69 62 36 34 6C - 64 6C 69 6E 75 78 78 38  tdlib64ldlinuxx8
00000030  36 36 34 73 6F 32 47 4E - 55 42 C8 C0 80 70 69 42  664so2GNUB...piB
00000040  44 47 BA E3 92 43 45 D5 - EC 46 E4 DE D8 71 58 B9  DG...CE..F...qX.
00000050  8D F1 EA D3 EF 4B 86 FC - A9 DA 79 ED 63 B5 51 92  .....K....y.c.Q.
00000060  BA 6C FC D1 69 78 30 ED - 74 F1 73 95 CC 85 D2 46  .l..ix0.t.s....F
00000070  A5 B4 6C 67 DA 4A E9 9A - 4B 58 77 A4 37 80 C0 4F  ..lg.J..KXw.7..O
00000080  F3 E9 B2 77 65 97 74 F9 - A2 C0 F2 CC 4A 9C 58 A1  ...we.t.....J.X.
Asked by user87005 on Feb 28 '13 at 10:02




1 Answer

This doesn't seem to work with sed. Perhaps tr will do? The following deletes every byte in the octal range 200-377 (decimal 128-255):

tr -d '\200-\377'

Or with the complement, which deletes every byte outside the octal range 000-177 (decimal 0-127):

tr -cd '\000-\177'
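
Since tr only reads standard input, here is a minimal sketch of how the complement form might be wrapped and applied to a file. The function name strip_non_ascii and the file names are hypothetical, and setting LC_ALL=C is an assumption on my part, intended to make tr treat its input as raw bytes on implementations that are locale-aware.

```shell
# Hypothetical helper: keep only bytes 0-127 (7-bit ASCII), drop the rest.
strip_non_ascii() {
    # LC_ALL=C is an assumption: it asks tr to work byte by byte
    # instead of decoding multibyte characters.
    LC_ALL=C tr -cd '\000-\177'
}

# Typical use with files (input.txt and output.txt are placeholder names):
#   strip_non_ascii < input.txt > output.txt

# Quick check: the two UTF-8 encoded accented letters are removed.
printf 'caf\303\251 na\303\257ve\n' | strip_non_ascii
# prints "caf nave"
```

Note that this drops the bytes outright rather than transliterating them, so "é" disappears instead of becoming "e".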
Answered by Thor on Nov 16 '22 at 01:11