Replace fullwidth punctuation characters with normal width equivalents [duplicate]

Question

file1 contains some ：s (fullwidth colons) that I'd like to turn into regular :s (the ordinary ASCII colon). How do I do this in bash? Perhaps with a Python script?

tchrist · Accepted Answer

With all due respect, python isn’t the right tool for this job; perl is:

perl -CSAD -Mutf8 -i.orig -pe 'tr[：][:]' file1

or

perl -CSAD -i.orig -pe 'tr[\x{FF1A}][:]' file1

or

perl -CSAD -i.orig -Mcharnames=:full -pe 'tr[\N{FULLWIDTH COLON}][:]' file1

or

perl -CSAD -i.orig -Mcharnames=:full -pe 'tr[\N{FULLWIDTH EXCLAMATION MARK}\N{FULLWIDTH QUOTATION MARK}\N{FULLWIDTH NUMBER SIGN}\N{FULLWIDTH DOLLAR SIGN}\N{FULLWIDTH PERCENT SIGN}\N{FULLWIDTH AMPERSAND}\N{FULLWIDTH APOSTROPHE}\N{FULLWIDTH LEFT PARENTHESIS}\N{FULLWIDTH RIGHT PARENTHESIS}\N{FULLWIDTH ASTERISK}\N{FULLWIDTH PLUS SIGN}\N{FULLWIDTH COMMA}\N{FULLWIDTH HYPHEN-MINUS}\N{FULLWIDTH FULL STOP}\N{FULLWIDTH SOLIDUS}][\N{EXCLAMATION MARK}\N{QUOTATION MARK}\N{NUMBER SIGN}\N{DOLLAR SIGN}\N{PERCENT SIGN}\N{AMPERSAND}\N{APOSTROPHE}\N{LEFT PARENTHESIS}\N{RIGHT PARENTHESIS}\N{ASTERISK}\N{PLUS SIGN}\N{COMMA}\N{HYPHEN-MINUS}\N{FULL STOP}\N{SOLIDUS}]' file1
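Since the question mentions Python, here is a rough Python counterpart of the tr one-liners, offered only as a sketch. It assumes "fullwidth punctuation" means every non-alphanumeric character in U+FF01 to U+FF5E (the Perl list stops at U+FF0F), relying on the fact that each of those sits exactly 0xFEE0 above its ASCII twin. str.translate plays the role of tr, and the backup copy mimics -i.orig. The names halfwidth_punctuation, rewrite_in_place and halfwidth.py are hypothetical.

```python
import shutil
import sys

# Assumption: "fullwidth punctuation" = every non-alphanumeric character in
# U+FF01..U+FF5E. Each sits exactly 0xFEE0 above its ASCII counterpart:
# U+FF01 -> U+0021 '!', ..., U+FF5E -> U+007E '~'.
OFFSET = 0xFEE0
FULLWIDTH_PUNCT = {
    cp: cp - OFFSET
    for cp in range(0xFF01, 0xFF5F)
    if not chr(cp - OFFSET).isalnum()  # leave fullwidth letters and digits alone
}


def halfwidth_punctuation(text: str) -> str:
    """Map fullwidth punctuation to ASCII, like Perl's tr[...][...]."""
    return text.translate(FULLWIDTH_PUNCT)


def rewrite_in_place(path: str, suffix: str = ".orig") -> None:
    """Rough analogue of `perl -i.orig -pe`: keep a backup, then rewrite."""
    shutil.copy2(path, path + suffix)
    # newline="" leaves the file's original line endings untouched.
    with open(path, encoding="utf-8", newline="") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(halfwidth_punctuation(text))


if __name__ == "__main__":
    # Hypothetical usage: python halfwidth.py file1
    for name in sys.argv[1:]:
        rewrite_in_place(name)
```

For example, halfwidth_punctuation("（注意）：ＡＢＣ！") returns "(注意):ＡＢＣ!", leaving the fullwidth letters alone. Python string literals also accept "\N{FULLWIDTH COLON}", much like Perl's \N{...}. unicodedata.normalize("NFKC", text) would fold these characters to ASCII as well, but it also rewrites fullwidth letters, digits and many other compatibility characters.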

ajk · Answer

I'd agree that Python is not the most effective tool for this purpose. The options presented so far work well, but sed is another useful tool to have around:

sed -i 's/\xEF\xBC\x9A/:/g' file.txt

With GNU sed, the -i option edits the file in place, as in tchrist's Perl examples; unlike -i.orig, though, a bare -i keeps no backup copy. Note that \xEF\xBC\x9A is the UTF-8 encoding of the code point U+FF1A, whose UTF-16 value is 0xFF1A (the \x{FF1A} in the Perl version). This page is a useful reference in case you need to deal with different encodings of the same Unicode value.
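To see where those byte values come from, a small Python check (a sketch; the helper name replace_utf8_bytes is hypothetical) encodes U+FF1A both ways and then does a byte-level replacement the way the sed command does, with no decoding step:

```python
# The same code point, U+FF1A FULLWIDTH COLON, under two encodings.
ch = "\N{FULLWIDTH COLON}"
utf8 = ch.encode("utf-8")       # the three bytes the sed command matches
utf16 = ch.encode("utf-16-be")  # big-endian UTF-16, no byte-order mark
print(f"U+{ord(ch):04X}", utf8.hex(" "), utf16.hex(" "))  # U+FF1A ef bc 9a ff 1a


def replace_utf8_bytes(data: bytes) -> bytes:
    """Byte-level replacement, like sed: the data is never decoded.

    Safe for valid UTF-8 because a lead byte (0xEF) can never be a
    continuation byte, so a match always starts on a character boundary.
    """
    return data.replace(utf8, b":")
```

This is why the sed approach works on UTF-8 files even though it knows nothing about characters; it would not work on a UTF-16 file, where the colon is stored as the bytes ff 1a (or 1a ff).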

Tags: python, bash, unicode

Mike
