UTF-16 to UTF-8 conversion (for scripting in Windows)

What is the best way to convert UTF-16 files to UTF-8? I need to use this in a cmd script.

Grzenio asked Nov 05 '08 14:11



4 Answers

You can do this easily with built-in PowerShell cmdlets, which you can invoke from cmd:

C:\> powershell -c "Get-Content mytext.txt | Set-Content -Encoding utf8 mytext_utf8.txt"

Edit: if you're already in PowerShell, you can drop the "powershell -c" wrapper and run the pipeline directly. Using aliases shortens it further:

> gc mytext.txt | sc -Encoding utf8 mytext_utf8.txt
Ben Collins answered Oct 15 '22 11:10


There is a GNU tool, recode, which you can also use on Windows. For example, this converts text.txt in place:

recode utf16..utf8 text.txt
Kaarel answered Oct 18 '22 03:10


An alternative to Ruby would be to write a small .NET program in C# (.NET 1.0 would be fine, although 2.0 would be simpler). It's a pretty trivial bit of code. Were you hoping to do it without any other applications at all? If you want a bit of code to do it, add a comment and I'll fill in the answer...

EDIT: Okay, this is without any kind of error checking, but...

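using System;
using System.IO;
using System.Text;

class FileConverter
{
  // Usage: FileConverter <inputFile> <outputFile>
  static void Main(string[] args)
  {
    string inputFile = args[0];
    string outputFile = args[1];
    // Encoding.Unicode is UTF-16 little-endian.
    using (StreamReader reader = new StreamReader(inputFile, Encoding.Unicode))
    {
      // false = overwrite rather than append; Encoding.UTF8 writes a byte order mark.
      using (StreamWriter writer = new StreamWriter(outputFile, false, Encoding.UTF8))
      {
        CopyContents(reader, writer);
      }
    }
  }

  // Copies decoded characters in 8192-char chunks until the reader is exhausted.
  static void CopyContents(TextReader input, TextWriter output)
  {
    char[] buffer = new char[8192];
    int len;
    // Read returns 0 at end of input.
    while ((len = input.Read(buffer, 0, buffer.Length)) != 0)
    {
      output.Write(buffer, 0, len);
    }
  }
}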
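
The same read-as-UTF-16, write-as-UTF-8 copy loop can also be sketched in Python, for anyone who would rather call a script from cmd than compile a .NET program. This is only a minimal sketch under some assumptions: a Python 3 interpreter is available on the machine, the function name convert_utf16_to_utf8 and the chunk_chars parameter are made up for illustration, and, like the C# above, it does no real error checking. One difference to be aware of: Python's "utf-8" codec writes no byte order mark, whereas Encoding.UTF8 in the C# version does; "utf-8-sig" could be swapped in if a BOM is wanted.

```python
import sys


def convert_utf16_to_utf8(src_path, dst_path, chunk_chars=8192):
    """Copy a UTF-16 text file to a UTF-8 text file in fixed-size chunks.

    Hypothetical helper for illustration; not part of any library.
    """
    # The "utf-16" codec uses the byte order mark to pick the endianness
    # and strips it from the decoded text.
    # newline="" on both files turns off newline translation, so CRLF
    # line endings pass through unchanged.
    with open(src_path, "r", encoding="utf-16", newline="") as reader, \
         open(dst_path, "w", encoding="utf-8", newline="") as writer:
        while True:
            chunk = reader.read(chunk_chars)
            if not chunk:  # empty string means end of file
                break
            writer.write(chunk)


# Only act as a command-line tool when exactly two paths are given.
if __name__ == "__main__" and len(sys.argv) == 3:
    convert_utf16_to_utf8(sys.argv[1], sys.argv[2])
```

Saved under a hypothetical name such as convert.py, it could be called from a cmd script as "python convert.py mytext.txt mytext_utf8.txt".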
Jon Skeet answered Oct 18 '22 01:10


Certainly, the easiest way is to load the script into Notepad, then save it again with the UTF-8 encoding. It's an option in the Save As dialog box.

Tor Haugen answered Oct 18 '22 01:10