Removing Unicode Line Separator "U+2028" in Bash

Question

I have a text file that contains the Unicode line separator character (U+2028).

I want to remove it using bash (I have seen implementations in Python, but not in bash). What command could I use to transform the text file (output4.txt) so that it no longer contains the Unicode line separator?

Here is how it looks in vim:

[screenshot: the file open in vim]

anubhava · Accepted Answer

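A tr command looks tempting:

tr '\xE2\x80\xA8' ' ' < inFile > outFile

but it does not do the job: tr does not interpret \x escapes, and GNU tr translates single bytes rather than multibyte characters, so it cannot match the three-byte UTF-8 sequence E2 80 A8 as a unit.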
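A quick way to see the byte-by-byte behaviour is the sketch below. It assumes GNU-style tr semantics and forces the C locale so that tr treats its input as single bytes everywhere. SET1 is the three UTF-8 bytes of U+2028, so tr maps E2, 80 and A8 to a space individually, and an unrelated character that shares one of those bytes, such as the euro sign (E2 82 AC), is damaged as well. The function name tr_demo is purely illustrative.

```shell
#!/usr/bin/env bash
# Illustration only: tr treats SET1 as a list of single bytes (GNU tr / C locale),
# not as one three-byte sequence.
tr_demo() {
  local ls_sep=$'\xE2\x80\xA8'   # U+2028 LINE SEPARATOR encoded as UTF-8
  local euro=$'\xE2\x82\xAC'     # U+20AC EURO SIGN, which also starts with byte E2
  # Every E2, 80 or A8 byte becomes a space (SET2 is padded with its last char),
  # so the euro sign loses its first byte too; od shows the resulting bytes.
  printf 'a%sb %s\n' "$ls_sep" "$euro" | LC_ALL=C tr "$ls_sep" ' ' | od -An -t x1
}

tr_demo
```

The output shows 61 20 20 20 62 20 20 82 ac 0a: the separator became three spaces, and the euro sign is left as the invalid fragment 82 ac.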

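Working solution (thanks to the OP for finding it). Bash's $'...' quoting turns \xE2\x80\xA8 into the raw UTF-8 bytes of U+2028 before sed sees them, and -i.old edits inFile in place while keeping the original as inFile.old: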

sed -i.old $'s/\xE2\x80\xA8/ /g' inFile
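A slightly fuller sketch of the same sed approach, wrapped in a function that also reports whether any separators are left afterwards. It assumes a UTF-8 file and GNU sed (for the attached -i.old suffix); the function name strip_line_separators is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: replace every U+2028 with a space, then report how many lines
# still contain one. Assumes a UTF-8 file and GNU sed.
strip_line_separators() {
  local file=$1
  local ls_sep=$'\xE2\x80\xA8'   # bash turns the escapes into the raw UTF-8 bytes

  # -i.old edits in place and keeps the untouched original as "$file.old"
  sed -i.old "s/$ls_sep/ /g" "$file" || return

  # grep -c counts matching *lines*; it exits 1 when the count is 0, hence || true
  local left
  left=$(grep -c -F -- "$ls_sep" "$file" || true)
  echo "lines still containing U+2028: $left"
}
```

For the file from the question this would be called as strip_line_separators output4.txt; a result of 0 means every separator was replaced.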

Kent · Answer

I noticed in your screenshot that you already have the file open in vim, so why not just do the substitution there?

In vim you could run

:%s/<Ctrl-V>u2028//g

where <Ctrl-V>u2028 means: press Ctrl-V, then type u2028. This inserts the U+2028 character itself into the search pattern.

Sir Athos · Answer

You can probably use sed:

sed 's/\xE2\x80\xA8//g' <file_in.txt >file_out.txt

To overwrite the original file (GNU sed):

sed -i 's/\xE2\x80\xA8//g' file.txt
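Note that sed -i is not portable: GNU sed accepts a bare -i, while BSD/macOS sed expects a suffix argument (for example sed -i '' ...), and the \x escapes inside the pattern are a GNU sed feature that other seds may not support. A minimal sketch that avoids both, assuming a UTF-8 file, writes to a temporary file and copies the result back. The function name delete_line_separators is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: delete every U+2028 from a file in place without sed -i,
# whose syntax differs between GNU sed (-i) and BSD/macOS sed (-i '').
# Assumes the file is UTF-8, where U+2028 is the byte sequence E2 80 A8.
delete_line_separators() {
  local file=$1 tmp status=0
  tmp=$(mktemp) || return
  # $'...' makes bash produce the raw bytes, so sed needs no \x support of its own
  if sed $'s/\xE2\x80\xA8//g' "$file" > "$tmp"; then
    # Copy back through the existing file so its permissions and owner are kept
    cat -- "$tmp" > "$file" || status=$?
  else
    status=1
  fi
  rm -f -- "$tmp"
  return "$status"
}
```

Usage: delete_line_separators file.txt. Copying back is not atomic; moving the temporary file over the original with mv would be, but the file would then get mktemp's restrictive permissions.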

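Edit: (see chepner's comment) The bytes to delete depend on the file's encoding. In UTF-8, U+2028 is the three-byte sequence E2 80 A8; \x20\x28 is its UTF-16BE encoding, and in a UTF-8 file it would instead match a space followed by "(". Check which bytes your file actually contains, e.g. with od -t x1, which prints a hex dump, and then use sed to delete them.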
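To know what to look for in the od -t x1 output, it can help to print the bytes U+2028 occupies in a few common encodings. This is a small sketch that assumes iconv is installed; the function name show_ls_bytes is illustrative. It shows E2 80 A8 for UTF-8, 20 28 for UTF-16BE and 28 20 for UTF-16LE.

```shell
#!/usr/bin/env bash
# Sketch: print the hex bytes of U+2028 in a few encodings, for comparison
# with the output of `od -t x1 yourfile`. Assumes iconv is installed.
show_ls_bytes() {
  local ls_sep=$'\xE2\x80\xA8'   # U+2028 as UTF-8
  local enc
  for enc in UTF-8 UTF-16BE UTF-16LE; do
    printf '%-9s' "$enc"         # encoding name, padded to line up the columns
    # -An drops the offset column; -t x1 prints one hex byte at a time
    printf '%s' "$ls_sep" | iconv -f UTF-8 -t "$enc" | od -An -t x1
  done
}

show_ls_bytes
```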

Tags: bash

canadian_scholar
