
How to convert a batch file stored in utf-8 to something that works via another batch file and run it

I have a program that I use to create a batch file. My problem is that the program's output is UTF-8, so as soon as any characters with diacritical marks such as é, à, ö, or Ä appear in the batch file, it fails. I can't find a way to make the program that creates the batch file output anything but UTF-8.

So I was thinking of creating two batch files: the actual one, and another that converts the actual one from UTF-8 to ANSI (Windows code page 1252, or maybe cp 850) and then executes it. Of course, I'd add a chcp xxxx as the first command of the actual batch file.

So my question is: is there an alternative to iconv on Windows, or how does one convert a UTF-8 text file to a Windows code page using a second batch file? Is there anything built into Win XP and up that I could use, or is there a free and redistributable tool I might use for this?

Note:

chcp 65001

does not work for batch files.

EDIT 1:

On Windows XP I created two batch files to test the first answer.

1.bat, encoded as UTF-8 without BOM, contains:

chcp 1252
cd üöä

2.bat, also encoded as UTF-8 without BOM but containing no special characters, contains:

chcp 1252
type "1.bat" >"ansi_file.bat"

The resulting ansi_file.bat, created when one executes 2.bat, is still UTF-8 encoded and not ANSI encoded.

EDIT 2:

The mentioned reverse process works.

chcp 1252
echo ü > ansi.txt
cmd /u /c type ansi.txt > unicode.txt

but neither of the following subsequent lines

cmd /a /c type unicode.txt > back2ansi.txt
type unicode.txt > back2ansi_v2.txt

gets me back to ANSI. I tried this both on Win XP and Win 7. Can anyone help?

NOTE:

I'm aware of how to use the Windows Script Host and VBS. I'd like to avoid depending on the script host though. The VBS method is detailed here: http://msdn.microsoft.com/en-us/library/windows/desktop/aa368046%28v=vs.85%29.aspx

EDIT 3:

The text file created above, containing a Unicode ü, is not UTF-8.

The Windows Unicode file (UTF-16LE, no BOM) is HEX:

FC 00 20 00 0D 00 0A 00

UTF-8 without BOM would be HEX:

C3 BC 20 0D 0A

The VBS solution linked to only works with the unicode form but fails on the UTF-8 form. I need to convert UTF-8 to another code page so not even that one seems to work for me...
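
The byte sequences above can be reproduced with a short sketch. It assumes a Python 3 interpreter is at hand, which the question does not mention; the helper name hex_bytes is hypothetical and only serves the illustration. It encodes the same text (ü, a space, CR LF) as UTF-16LE, UTF-8, and code page 1252.

```python
# Sketch only: assumes Python 3 is available (not stated in the question).
# The text is what "echo ü > file" writes: ü, a space, then CR LF.
text = "\u00fc \r\n"


def hex_bytes(s, encoding):
    """Return s encoded with the given codec as space-separated upper-case hex."""
    return " ".join(f"{b:02X}" for b in s.encode(encoding))


# utf-16-le is the "Windows Unicode" form, cp1252 the ANSI target.
for name in ("utf-16-le", "utf-8", "cp1252"):
    print(name, hex_bytes(text, name))
```

The first two lines printed match the two hex dumps above; the cp1252 line (FC 20 0D 0A) is the single-byte form the converted batch file would need.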

C.O. asked Oct 29 '12 21:10


1 Answer

You have stated you don't want to rely on the script host, but there is no native batch command that can do what you want. You are going to have to use something beyond pure batch. The script host is native to Windows, so I should think it would not be a problem.

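The following UTF8toANSI.vbs script converts UTF-8 (with or without BOM) into ISO-8859-1, which matches code page 1252 for every byte outside the 0x80-0x9F range (where code page 1252 adds printable characters such as € and curly quotes). It is adapted from "VB6/VBScript change file / write file with encoding to ANSI".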

Option Explicit

Private Const adReadAll = -1
Private Const adSaveCreateOverWrite = 2
Private Const adTypeBinary = 1
Private Const adTypeText = 2
Private Const adWriteChar = 0

Private Sub UTF8toANSI(ByVal UTF8FName, ByVal ANSIFName)
  Dim strText

  With CreateObject("ADODB.Stream")
    .Open
    .Type = adTypeBinary
    .LoadFromFile UTF8FName
    .Type = adTypeText
    .Charset = "utf-8"
    strText = .ReadText(adReadAll)
    .Position = 0
    .SetEOS
    .Charset = "iso-8859-1"
    .WriteText strText, adWriteChar
    .SaveToFile ANSIFName, adSaveCreateOverWrite
    .Close
  End With
End Sub

UTF8toANSI WScript.Arguments(0), WScript.Arguments(1)

The VBS script would need to be in your current directory or your path.

A batch script to convert and run your UTF-8 encoded script could look something like this:

@echo off
UTF8toANSI "utf8.bat" "ansi.bat"
ansi.bat
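
For comparison, the same read-as-UTF-8, write-as-ANSI step can be sketched outside the script host. This is an illustration under assumptions, not part of the answer above: it assumes Python 3 is installed, the function name utf8_to_ansi is hypothetical, and it targets cp1252 rather than the ISO-8859-1 charset used by the VBS script. The demo input is the content of 1.bat from the question.

```python
# Sketch only: assumes Python 3 is installed; utf8_to_ansi is a hypothetical name.
import os
import tempfile


def utf8_to_ansi(src_path, dst_path, target="cp1252"):
    """Re-encode a UTF-8 file (with or without BOM) into a single-byte code page."""
    # "utf-8-sig" drops a leading BOM if present; newline="" keeps CR LF untouched.
    with open(src_path, "r", encoding="utf-8-sig", newline="") as src:
        text = src.read()
    # Characters missing from the target code page raise UnicodeEncodeError.
    with open(dst_path, "w", encoding=target, newline="") as dst:
        dst.write(text)


# Demo using the two lines of 1.bat from the question (cd followed by ü, ö, ä).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "utf8.bat")
    dst = os.path.join(tmp, "ansi.bat")
    with open(src, "wb") as f:
        f.write("chcp 1252\r\ncd \u00fc\u00f6\u00e4\r\n".encode("utf-8"))
    utf8_to_ansi(src, dst)
    with open(dst, "rb") as f:
        converted = f.read()

print(converted)
```

After conversion each of ü, ö, ä occupies one byte (FC, F6, E4) instead of two, which is what a batch file running under chcp 1252 expects.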


Original Answer: below is my original answer, which works for UTF-16 with BOM but not for UTF-8.

The output of internal commands is automatically converted to ANSI if output is piped or redirected to a file.

chcp 1252
type "utf_file.bat" >"ansi_file.bat"

The process can go in reverse if CMD is started with the /U option, but unfortunately the Unicode header bytes (the BOM) will be missing. But of course that is a non-issue for your situation.

dbenham answered Nov 14 '22 21:11