
Removing Unicode Line Separator "U+2028" in Bash

Tags:

bash

I have a text file that contains the Unicode line separator (U+2028).

I want to remove it using bash (I have seen implementations for Python, but not for bash). What command can I use to strip the Unicode line separator from the text file (output4.txt)?

See in vim below: [screenshot]

canadian_scholar asked May 14 '13 20:05


3 Answers

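A tr command might look tempting, but it is not reliable here: tr maps individual characters (individual bytes in GNU tr), not multi-byte sequences, so it would translate E2, 80, and A8 independently and could corrupt other UTF-8 characters that share those bytes:

tr '\xE2\x80\xA8' ' ' < inFile > outFile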

Working solution (thanks to the OP for finding this):

sed -i.old $'s/\xE2\x80\xA8/ /g' inFile
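
For shells that lack $'...' quoting, the same substitution can be sketched by building the byte sequence with printf octal escapes instead. The sample file and the variable names (tmp, sep, cleaned) are assumptions made up for this illustration:

```shell
# Create a small sample file containing U+2028 (UTF-8 bytes E2 80 A8,
# written here as octal escapes 342 200 250).
tmp=$(mktemp)
printf 'first\342\200\250second\n' > "$tmp"

# Store the three-byte sequence in a variable so it can be spliced
# into the sed expression as literal bytes.
sep=$(printf '\342\200\250')

# Replace every occurrence of the separator with a space.
cleaned=$(sed "s/$sep/ /g" "$tmp")
printf '%s\n' "$cleaned"   # prints: first second

rm -f "$tmp"
```

Because the bytes are expanded by the shell before sed runs, this does not depend on sed understanding \xHH escapes.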
anubhava answered Oct 18 '22 07:10


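I noticed from your screenshot that you already have the file open in vim, so why not do the substitution there?

In vim you can run:

:%s/(seebelow)//g

For the (seebelow) part, type Ctrl-V followed by u2028:

ctrl-v u2028

Alternatively, the pattern \%u2028 matches the same character without typing it literally.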

Kent answered Oct 18 '22 08:10


You can probably use sed, matching the three bytes that encode U+2028 in UTF-8 (E2 80 A8):

sed $'s/\xE2\x80\xA8//g' <file_in.txt >file_out.txt

To overwrite the original file:

sed -i $'s/\xE2\x80\xA8//g' file.txt

Edit (see chepner's comment): Make sure you match the bytes that actually appear in the file, which depend on its encoding, and then use sed to delete them. A hex dump, e.g. from od -t x1, shows the raw bytes and helps you work out the encoding.
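
As a rough sketch of that check, the following dumps a sample string containing U+2028 so the byte sequence is visible. The sample text and the variable name hex are assumptions for illustration; on a real file you would run od on the file itself:

```shell
# Dump a sample string with U+2028 between 'a' and 'b' as hex bytes.
# -An suppresses the offset column; tr strips spaces and newlines.
hex=$(printf 'a\342\200\250b' | od -An -t x1 | tr -d ' \n')
printf '%s\n' "$hex"   # prints: 61e280a862
```

Seeing e2 80 a8 in the dump indicates the file stores the separator as UTF-8; a different encoding would show different bytes, and the sed pattern would need to match those instead.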

Sir Athos answered Oct 18 '22 06:10