How to remove non-UTF-8 characters from a text file

I have a bunch of Arabic, English, and Russian files which are encoded in UTF-8. When I try to process these files with a Perl script, I get this error:

Malformed UTF-8 character (fatal) 

Manually checking the content of these files, I found some strange characters in them. Now I'm looking for a way to automatically remove these characters from the files.

Is there any way to do this?

asked Oct 21 '12 by Hakim


People also ask

How do I remove a non-UTF-8 character from a text file in Linux?

To automatically find and delete non-UTF-8 characters, we're going to use the iconv command. It is used in Linux systems to convert text from one character encoding to another.
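For example, converting between two encodings looks like this (a minimal sketch; the file names and the Windows-1256 source encoding are just placeholders, and the exact encoding names available vary between systems):

# List the encodings this iconv installation knows about
iconv -l

# Convert a file from one encoding to another
iconv -f WINDOWS-1256 -t UTF-8 input.txt > output.txt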

What is a non-UTF-8 character?

A UTF-8 code unit is 8 bits (one byte). If by "character" you mean an 8-bit byte, then the non-UTF-8 characters are byte values, or byte sequences, that cannot appear in valid UTF-8-encoded text.
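One common way to locate such bytes (a sketch, assuming GNU grep and a UTF-8 locale) is to ask grep for the lines that are not made up entirely of valid characters:

# Print lines containing bytes that are not valid UTF-8
# -a: treat the file as text, -x: match whole lines only, -v: invert the match
grep -axv '.*' file.txt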

How do I get rid of UTF-8?

UTF-8 is simply one possible encoding for text. Since UTF-8 can represent every Unicode character, removing all UTF-8 characters would remove all of the text; what you usually want instead is to remove only the byte sequences that are not valid UTF-8.


1 Answer

This command:

iconv -f utf-8 -t utf-8 -c file.txt 

will clean up your UTF-8 file, skipping all the invalid characters.

-f is the source format
-t is the target format
-c skips any invalid sequence
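
Note that iconv writes the cleaned text to standard output, so you normally redirect it to a new file and replace the original afterwards. A minimal sketch (file names are placeholders):

# Clean one file, writing the result to a temporary name first
iconv -f utf-8 -t utf-8 -c file.txt > file_clean.txt && mv file_clean.txt file.txt

# Or clean every .txt file in the current directory
for f in *.txt; do
    iconv -f utf-8 -t utf-8 -c "$f" > "$f.clean" && mv "$f.clean" "$f"
done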
answered Oct 16 '22 by Palantir