
Skip/remove non-ascii character with sed

Tags: sed

Chip,Dirkland,DrobæSphere Inc,[email protected],usa

I've been trying to use sed to modify email addresses in a .csv file, but the line above keeps tripping me up. Using a command like:

sed -i 's/[\d128-\d255]//' FILENAME

from this Stack Overflow question doesn't work: I get an 'invalid collation character' error.

Ideally I don't want to change that æ character at all. I'd rather sed just skip right over it, since I'm not trying to manipulate that text, only the email addresses. As long as the æ is in there, though, my sed substitution fails after one line; if I delete the character, sed processes the whole file fine.

Any ideas?

xref asked Dec 20 '11 06:12



1 Answer

This might work for you (GNU sed), where \o346 is the octal value of the æ byte in Latin-1:

echo "Chip,Dirkland,DrobæSphere Inc,[email protected],usa" |
sed 's/\o346/a+e/g'
Chip,Dirkland,Droba+eSphere Inc,[email protected],usa

Then do whatever processing you need, and afterwards revert the placeholder with:

echo "Chip,Dirkland,Droba+eSphere Inc,[email protected],usa" | 
sed 's/a+e/\o346/g'
Chip,Dirkland,DrobæSphere Inc,[email protected],usa
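The swap, edit, revert sequence above can also be run as a single pipeline. This is only a minimal sketch, assuming GNU sed and Latin-1 input (so æ is the single byte with octal value 346). The address chip@old.example, the domain rewrite, and the helper name fix_emails are hypothetical stand-ins for whatever edit you actually need. Setting LC_ALL=C, so that sed treats every byte as one character, is an extra assumption and not part of the steps above.

```shell
#!/bin/sh
# Assumptions: GNU sed (for \o346) and Latin-1 input, where æ is byte 0xE6.
# chip@old.example and the domain swap are hypothetical examples.

# Build the sample line with printf so the æ byte is exact, regardless of
# how this script file itself is encoded.
line=$(printf 'Chip,Dirkland,Drob\346Sphere Inc,chip@old.example,usa')

fix_emails() {
    # LC_ALL=C: every byte counts as one character, so the lone 0xE6 byte
    # cannot be rejected as an invalid multibyte sequence.
    LC_ALL=C sed \
        -e 's/\o346/a+e/g' \
        -e 's/@old\.example/@new.example/' \
        -e 's/a+e/\o346/g'
}

result=$(printf '%s\n' "$line" | fix_emails)

# Show the result unambiguously; the æ byte appears as \346.
printf '%s\n' "$result" | LC_ALL=C sed -n 'l0'
```

Note that the revert step would also rewrite any literal "a+e" that was already in the data, so pick a placeholder you know cannot occur in the file.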

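If you have tricky characters in a string and want to understand how sed sees them, use the l0 command: l prints the pattern space in an unambiguous form, showing non-printable bytes as octal escapes, and the 0 disables line wrapping. It is also very useful for debugging difficult regexps.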

echo "Chip,Dirkland,DrobæSphere Inc,[email protected],usa" | 
sed -n 'l0'
Chip,Dirkland,Drob\346Sphere Inc,[email protected],usa$
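The same check can tell you which encoding a file uses, which matters because \o346 only matches æ when it is stored as a single Latin-1 byte. A small sketch, assuming GNU sed; the helper name show_bytes is made up for illustration, and LC_ALL=C is set so the listing is byte-by-byte:

```shell
#!/bin/sh
# Assumes GNU sed. printf writes the exact bytes, so the result does not
# depend on the encoding of this script file.

show_bytes() {
    # List each input line unambiguously, one octal escape per non-ASCII byte.
    LC_ALL=C sed -n 'l0'
}

# æ as a single Latin-1 byte (0xE6).
latin1=$(printf 'Drob\346Sphere\n' | show_bytes)

# æ as the two-byte UTF-8 sequence (0xC3 0xA6).
utf8=$(printf 'Drob\303\246Sphere\n' | show_bytes)

printf '%s\n%s\n' "$latin1" "$utf8"
# Drob\346Sphere$
# Drob\303\246Sphere$
```

If the listing shows \303\246 instead of \346, the file is UTF-8, and under LC_ALL=C the pattern would need to be \o303\o246 to match the character.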
potong answered Sep 18 '22 12:09