How to convert a file from ASCII to UTF-8?

I'm trying to transcode a bunch of files from ASCII to UTF-8.

For that, I tried using iconv:

iconv -f US-ASCII -t UTF-8 infile > outfile

-f ENCODING the encoding of the input

-t ENCODING the encoding of the output

Still, the file was not converted to UTF-8. It is a .dat file.

Before posting this, I searched Google and found information like:

ASCII is a subset of UTF-8, so all ASCII files are already UTF-8 encoded. The bytes in the ASCII file and the bytes that would result from "encoding it to UTF-8" would be exactly the same bytes. There's no difference between them.

Force encode from US-ASCII to UTF-8 (iconv)

Best way to convert text files between character sets?

Still, the above links didn't help.

Even though an ASCII file is already valid UTF-8 (UTF-8 is a superset of ASCII), the other party who is going to receive the files from me needs the file encoding to be UTF-8. He just needs the file format to be UTF-8.

Any suggestions please.

Ram asked Feb 07 '15 08:02



1 Answer

I'm a little confused by the question, because, as you indicated, ASCII is a subset of UTF-8, so all ASCII files are already UTF-8 encoded.
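A quick way to confirm this is to run the conversion and compare the bytes. The following is only a sketch, assuming a POSIX shell with iconv, cmp and mktemp available; the file names are placeholders and the sample content is made up for illustration.

```shell
# Work in a scratch directory so nothing real is overwritten.
tmpdir=$(mktemp -d)
printf 'Stuff\n' > "$tmpdir/infile"

# Same conversion as in the question.
iconv -f US-ASCII -t UTF-8 "$tmpdir/infile" > "$tmpdir/outfile"

# cmp -s exits 0 only when the two files are byte-for-byte identical.
if cmp -s "$tmpdir/infile" "$tmpdir/outfile"; then
    result="identical"
else
    result="different"
fi
echo "$result"
```

If the input really is pure ASCII, the output should come back identical, which is why iconv appears to "do nothing" here.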

If you're sending files containing only ASCII characters to the other party, but the other party is complaining that they're not 'UTF-8 Encoded', then I would guess that they're referring to the fact that the ASCII file has no byte order mark explicitly indicating the contents are UTF-8.

If that is indeed the case, then you can add a byte order mark using the answer here:

iconv: Converting from Windows ANSI to UTF-8 with BOM
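For a file that is already ASCII, prepending the BOM by hand may be enough. The sketch below assumes a POSIX shell with head, od and mktemp available; add_bom is a hypothetical helper name and the .dat file names are placeholders, not anything taken from the linked answer.

```shell
# add_bom (hypothetical helper): write the UTF-8 BOM (bytes EF BB BF)
# followed by the contents of $1 into $2. If $1 already starts with
# a BOM, it is copied unchanged so the BOM is never doubled.
add_bom() {
    src=$1
    dst=$2
    first3=$(head -c 3 "$src" | od -An -tx1 | tr -d ' \n')
    if [ "$first3" = "efbbbf" ]; then
        cp "$src" "$dst"
    else
        # Octal escapes are used because \x escapes are not portable in printf.
        { printf '\357\273\277'; cat "$src"; } > "$dst"
    fi
}

tmpdir=$(mktemp -d)
printf 'Stuff\n' > "$tmpdir/plain.dat"
add_bom "$tmpdir/plain.dat" "$tmpdir/with_bom.dat"

# Show the first three bytes of the result in hex.
bom_bytes=$(head -c 3 "$tmpdir/with_bom.dat" | od -An -tx1 | tr -d ' \n')
echo "$bom_bytes"
```

For a whole directory, the same helper could be called from a loop such as: for f in *.dat; do add_bom "$f" "$f.utf8"; done (the .utf8 suffix is again just an illustration).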

If the other party indicates that he does not need the 'BOM' (Byte Order Mark), but is still complaining that the files are not UTF-8, then another possibility is that your initial file is not actually ASCII, but rather contains bytes outside the ASCII range, encoded using a Windows 'ANSI' code page (such as Windows-1252) or ISO-8859-1.
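One way to check for this is to count the bytes that fall outside the ASCII range, and only then convert from whatever encoding the file really uses. This is a sketch under assumptions: the sample file is made up, and ISO-8859-1 is only a guess at the source encoding, which has to be known (or confirmed with the sender) for real data.

```shell
tmpdir=$(mktemp -d)
# Sample data: 'caf' followed by byte 0xE9 (octal 351), "é" in ISO-8859-1.
printf 'caf\351\n' > "$tmpdir/latin1.dat"

# Delete every byte in the ASCII range (0x00-0x7F) and count what is left.
# A count of 0 means the file is pure ASCII.
non_ascii=$(LC_ALL=C tr -d '\000-\177' < "$tmpdir/latin1.dat" | wc -c | tr -d ' ')
echo "non-ASCII bytes: $non_ascii"

# Only convert when there is something to convert.
# ISO-8859-1 is an assumption; substitute the file's actual encoding.
if [ "$non_ascii" -gt 0 ]; then
    iconv -f ISO-8859-1 -t UTF-8 "$tmpdir/latin1.dat" > "$tmpdir/utf8.dat"
fi

# Hex dump of the converted file, with whitespace stripped.
utf8_hex=$(od -An -tx1 "$tmpdir/utf8.dat" | tr -d ' \n')
echo "$utf8_hex"
```

In this case the converted file does differ from the input, because each byte above 0x7F becomes a multi-byte UTF-8 sequence.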

Edited to add the following experiment, after a comment from Ram about the other party checking the type with the 'file' command. (This was run on macOS, where the MIME option is 'file -I'; on Linux the equivalent is 'file -i'.)

Tims-MacBook-Pro:~ tjohns$ echo 'Stuff' > deleteme
Tims-MacBook-Pro:~ tjohns$ cat deleteme
Stuff
Tims-MacBook-Pro:~ tjohns$ file -I deleteme
deleteme: text/plain; charset=us-ascii
Tims-MacBook-Pro:~ tjohns$ echo -ne '\xEF\xBB\xBF' > deleteme
Tims-MacBook-Pro:~ tjohns$ echo 'Stuff' >> deleteme
Tims-MacBook-Pro:~ tjohns$ cat deleteme
Stuff
Tims-MacBook-Pro:~ tjohns$ file -I deleteme
deleteme: text/plain; charset=utf-8
Timothy Johns answered Sep 26 '22 19:09
