Chip,Dirkland,DrobæSphere Inc,[email protected],usa
I've been trying to use sed to modify email addresses in a .csv file, but the line above keeps tripping me up. A command like this one:
sed -i 's/[\d128-\d255]//' FILENAME
(taken from this Stack Overflow question) doesn't work; I get an 'invalid collation character' error.
Ideally I don't want to change that combined AE character (æ) at all. I'd rather sed just skip right over it, since I'm not trying to manipulate that text, only the email addresses. As long as the æ is in there, though, my sed substitution fails after one line. If I delete the character, sed processes the whole file fine.
Any ideas?
Use the .replace() method to replace the non-ASCII characters with the empty string.
The Unix/Linux "tr" command: tr is one of the true "filters" in the Unix operating system, because it works only on input/output streams and not on files. The -d flag tells tr to delete the characters you supply.
Non-ASCII characters are those that fall outside the ASCII character set and can only be represented in other encodings, such as Unicode (e.g. UTF-8) or Latin-1. ASCII is limited to 128 characters and was initially developed for the English language.
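As a minimal sketch of the tr -d filter described above, the following deletes every byte in the range 128-255 from a sample line. The address chip@example.com is a placeholder, and setting LC_ALL=C is an assumption added here so that tr works on raw bytes. Note that this removes the æ outright, which is not what the question ultimately wants.

```shell
# Sample line with the æ encoded as UTF-8 (bytes \303\246).
# chip@example.com is a hypothetical placeholder address.
line=$(printf 'Chip,Dirkland,Drob\303\246Sphere Inc,chip@example.com,usa')

# -d deletes the listed characters; '\200-\377' is the octal byte range 128-255.
# LC_ALL=C makes tr treat its input as single bytes.
stripped=$(printf '%s\n' "$line" | LC_ALL=C tr -d '\200-\377')

printf '%s\n' "$stripped"
```

Because tr only reads standard input, a file has to be redirected into it (tr ... < FILENAME) rather than passed as an argument.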
This might work for you (GNU sed):
echo "Chip,Dirkland,DrobæSphere Inc,[email protected],usa" |
sed 's/\o346/a+e/g'
Chip,Dirkland,Droba+eSphere Inc,[email protected],usa
Then do whatever you need to do, and afterwards revert with:
echo "Chip,Dirkland,Droba+eSphere Inc,[email protected],usa" |
sed 's/a+e/\o346/g'
Chip,Dirkland,DrobæSphere Inc,[email protected],usa
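The swap, edit, and revert steps above could be chained into one pipeline, roughly as sketched below. This assumes GNU sed (for \o346) and a Latin-1 encoded input where æ is the single byte with octal value 346. The addresses at example.com and example.org are placeholders, and LC_ALL=C is an added assumption so the byte is matched the same way regardless of the caller's locale. The a+e placeholder must not already occur in the data.

```shell
# Latin-1 sample line: \346 is the single byte for æ.
# The example.com / example.org addresses are hypothetical placeholders.
line=$(printf 'Chip,Dirkland,Drob\346Sphere Inc,chip@example.com,usa')

# Stage 1: swap the byte for an ASCII placeholder (GNU sed \oNNN escape).
# Stage 2: edit the now pure-ASCII line (a stand-in for the real email edit).
# Stage 3: turn the placeholder back into the original byte.
result=$(
  printf '%s\n' "$line" |
    LC_ALL=C sed 's/\o346/a+e/g' |
    sed 's/@example\.com/@example.org/' |
    LC_ALL=C sed 's/a+e/\o346/g'
)

# List the result unambiguously instead of printing the raw byte.
printf '%s\n' "$result" | LC_ALL=C sed -n 'l'
```

In a basic regular expression the + in a+e is a literal character, so the third stage matches only the placeholder text.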
If you have tricky characters in strings and want to understand how sed sees them, use the l0 command (see here); in GNU sed the 0 turns off line wrapping. It is also very useful for debugging difficult regexps.
echo "Chip,Dirkland,DrobæSphere Inc,[email protected],usa" |
sed -n 'l0'
Chip,Dirkland,Drob\346Sphere Inc,[email protected],usa$
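The listing command is also a quick way to check which encoding a file actually uses, since \o346 only matches the Latin-1 form of æ. The sketch below is an illustration with shortened sample strings; LC_ALL=C is an assumption added so every non-ASCII byte is shown as an octal escape, and the portable l is used in place of GNU's l0.

```shell
# The same character in two encodings:
#   Latin-1: one byte, \346
#   UTF-8:   two bytes, \303\246
latin1=$(printf 'Drob\346Sphere\n' | LC_ALL=C sed -n 'l')
utf8=$(printf 'Drob\303\246Sphere\n' | LC_ALL=C sed -n 'l')

printf 'Latin-1 input: %s\n' "$latin1"
printf 'UTF-8 input:   %s\n' "$utf8"
```

If the listing shows \303\246 rather than \346, the file is UTF-8 and the placeholder substitution would need to match both bytes.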