 

How to use Ruby to replace text in a VC++ resource file, when the encoding is all wacked out?

I have a plain managed VC++ project in a solution. It has a resource file, app.rc, that is used to store the assembly info (version, product, copyright, etc.). If I open the file in my text editor, it reports the encoding as Unicode (UTF-16 LE BOM). And Visual Studio only displays it correctly if I choose Unicode - Codepage 1200.

VS_VERSION_INFO VERSIONINFO
 FILEVERSION 0,0,0,0
 PRODUCTVERSION 0,0,0,0
 FILEFLAGSMASK 0x3fL
#ifdef _DEBUG
 FILEFLAGS 0x1L
#else
 FILEFLAGS 0x0L
#endif
 FILEOS 0x40004L
 FILETYPE 0x2L
 FILESUBTYPE 0x0L
BEGIN
    BLOCK "StringFileInfo"
    BEGIN
        BLOCK "040904b0"
        BEGIN
            VALUE "CompanyName", "My Company, LLC"
            VALUE "FileVersion", "0.0.0.0"
            VALUE "InternalName", "myassembly.dll"
            VALUE "LegalCopyright", "Copyright (C) 2011 My Company, LLC"
            VALUE "OriginalFilename", "myassembly.dll"
            VALUE "ProductName", "The Product"
            VALUE "ProductVersion", "0.0.0.0"
        END
    END
    BLOCK "VarFileInfo"
    BEGIN
        VALUE "Translation", 0x409, 1200
    END
END

I am using Ruby/Rake to configure and run local builds. Part of that build is replacing the version in the assembly resources. The plan is to read in the file, gsub the version numbers, and write the file back out. However, when I read in the file, I get garbage:

irb(main):001:0> File.open("source\\myproject\\app.rc").read
=> "\xFF\xFE/\x00/\x00 \x00M\x00i\x00c\x00r\x00o\x00s\x00o\x00f\x00t\x00 \x00V\x00i\x00s\x
00u\x00a\x00l\x00 \x00C\x00+\x00+\x00 \x00g\x00e\x00n\x00e\x00r\x00a\x00t\x00e\x00d\x00 \x
00r\x00e\x00s\x00o\x00u\x00r\x00c\x00e\x00 \x00s\x00c\x00r\x00i\x00p\x00t\x00.\x00\n\x00\n
\x00/\x00/\x00\n\x00\n\x00#\x00i\x00n\x00c\x00l\x00u\x00d\x00e\x00 \x00\"\x00r\x00e\x00s\x

Ruby thinks it's an IBM437 encoding.

irb(main):006:0> File.open("source\\myproject\\app.rc").external_encoding
=> #<Encoding:IBM437>
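Ruby does not inspect the file to pick that encoding. With no encoding given, it falls back to Encoding.default_external, which on a Windows console typically follows the console code page. A minimal sketch of checking the byte order mark yourself is shown here; sniff_bom is a hypothetical helper name, and the sample file is a stand-in for app.rc.

```ruby
require "tmpdir"

# Hypothetical helper: guess a file's Unicode encoding from its leading bytes.
# Only the UTF-8 and UTF-16 byte order marks are checked; a UTF-32LE file
# also starts with FF FE and would be misreported as UTF-16LE here.
def sniff_bom(path)
  head = File.binread(path, 3) || "".b   # binread may return nil for an empty file
  if head.start_with?("\xEF\xBB\xBF".b)
    Encoding::UTF_8
  elsif head.start_with?("\xFF\xFE".b)
    Encoding::UTF_16LE
  elsif head.start_with?("\xFE\xFF".b)
    Encoding::UTF_16BE
  end                                    # nil when no known BOM is present
end

# Demo against a throwaway file standing in for app.rc.
Dir.mktmpdir do |dir|
  sample = File.join(dir, "app.rc")
  File.binwrite(sample, "\xFF\xFE".b + "// Microsoft Visual C++".encode("UTF-16LE").b)

  p sniff_bom(sample)            # encoding suggested by the BOM
  p Encoding.default_external    # what Ruby assumes when no encoding is given
end
```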

And if I write the contents out to another file, I get some strange encoding (my text editor swears it's Unicode (UTF-16 BE BOM)).

䴀椀挀爀漀猀漀昀琀 嘀椀猀甀愀氀 䌀⬀⬀ 最攀渀攀爀愀琀攀搀 爀攀猀漀甀爀挀攀 猀挀爀椀瀀琀⸀ഊ
਀⼀⼀ഊ
਀⌀椀渀挀氀甀搀攀 ∀爀攀猀漀甀爀挀攀⸀栀∀ഊ
਀ഊ

How am I supposed to open/read this file properly?


Ok, so I can read the file properly now if I provide the correct encodings.

irb(main):003:0> File.open("source\\myproject\\app.rc", "rb:UTF-16LE")
=> #<File:source\myproject\app.rc>

However, I cannot substitute the version strings out yet.

irb(main):003:0> c = File.open("source\\myproject\\app.rc", "rb:UTF-16LE").read
irb(main):007:0> c.gsub("0.0.0.0", "0.0.5.0")
Encoding::CompatibilityError: incompatible encoding regexp match (US-ASCII regexp with UTF-16LE string)

But I can't pass an encoding to gsub. Do I have to encode the pattern and replacement strings as well?

Anthony Mastrean asked Mar 09 '12 14:03



1 Answer

Reading and Writing the File

The first thing I tried was looking for how to read/write UTF-16LE files in Ruby. I found this question and answer, which recommends always opening files in text file mode (t) on Windows.

When dealing with text files, you should always pass the t modifier. It doesn't make any difference on most operating systems (which is why, unfortunately, most Rubyists forget to pass it), but it is crucial on Windows, which is what you appear to be using.

So, I did that:

irb(main):002:0> File.open("source\\myproject\\app.rc", "rt:UTF-16LE")
ArgumentError: ASCII incompatible encoding needs binmode

The error mentions binmode, which is Ruby's term for binary file mode (b). UTF-16 is not ASCII-compatible, so Ruby refuses to read it in text mode without an encoding conversion. So, let's try that instead.

irb(main):003:0> File.open("source\\myproject\\app.rc", "rb:UTF-16LE")
=> #<File:source\myproject\app.rc>

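Eureka! Calling read on that file now returns readable text. The \r\n sequences that remain are not garbage: they are the file's Windows line endings, which binary mode leaves untranslated and irb displays escaped.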

\r\n//\r\n\r\nVS_VERSION_INFO VERSIONINFO\r\n FILEVERSION 0,0,0,0\r\n PRODUCTVERSION 0,0,0
,0\r\n FILEFLAGSMASK 0x3fL\r\n#ifdef _DEBUG\r\n FILEFLAGS 0x1L\r\n#else\r\n FILEFLAGS 0x0L
\r\n#endif\r\n FILEOS 0x40004L\r\n FILETYPE 0x2L\r\n FILESUBTYPE 0x0L\r\nBEGIN\r\n    BLOC
K \"StringFileInfo\"\r\n    BEGIN\r\n        BLOCK \"040904b0\"\r\n        BEGIN\r\n
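If the UTF-16LE string is awkward to work with, one alternative is to let Ruby strip the BOM and transcode to UTF-8 while reading. The sketch below assumes the "BOM|" prefix and the external:internal form of the mode string; read_rc_as_utf8 is a hypothetical helper name, and the temp file stands in for app.rc.

```ruby
require "tmpdir"

# Hypothetical helper: read a UTF-16LE resource file as a UTF-8 string.
# "BOM|UTF-16LE" consumes a leading byte order mark if one is present,
# and ":UTF-8" asks Ruby to transcode the contents while reading.
# The "b" flag keeps binary mode, so line endings are not translated.
def read_rc_as_utf8(path)
  File.open(path, "rb:BOM|UTF-16LE:UTF-8", &:read)
end

# Demo against a throwaway file standing in for app.rc.
Dir.mktmpdir do |dir|
  rc = File.join(dir, "app.rc")
  File.binwrite(rc, "\uFEFF FILEVERSION 0,0,0,0\r\n".encode("UTF-16LE"))

  text = read_rc_as_utf8(rc)
  puts text.encoding
  # Ordinary string literals now work as gsub arguments.
  puts text.gsub("0,0,0,0", "0,0,5,0").inspect
end
```

The result has to be encoded back to UTF-16LE, with the BOM restored, before it is written out again.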

Replacing the Strings

With the file contents read into c, you'll notice that doing a simple gsub like this produces an encoding error.

irb(main):004:0> c.gsub("0.0.0.0","0.0.5.0")
Encoding::CompatibilityError: incompatible encoding regexp match (US-ASCII regexp with UTF-16LE string)
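The mismatch can be checked without the resource file. This is a small sketch using Encoding.compatible?, which returns nil when two strings share no common encoding; gsub_compatible? is a hypothetical helper name, and the exact error message may differ between Ruby versions.

```ruby
# Hypothetical helper: true when the two strings share a compatible encoding,
# which is what gsub needs from its receiver and its pattern.
def gsub_compatible?(text, pattern)
  !Encoding.compatible?(text, pattern).nil?
end

utf16 = "VALUE \"FileVersion\", \"0.0.0.0\"".encode("UTF-16LE")
plain = "0.0.0.0"   # a literal in the source file's default encoding

puts gsub_compatible?(utf16, plain)                     # plain literal vs UTF-16LE text
puts gsub_compatible?(utf16, plain.encode("UTF-16LE"))  # both sides UTF-16LE

begin
  utf16.gsub(plain, "0.0.5.0")
rescue Encoding::CompatibilityError => e
  # Expected path: the pattern's encoding does not match the string's.
  puts "#{e.class}: #{e.message}"
end
```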

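The error message explains the problem: the pattern is US-ASCII while the string is UTF-16LE, and gsub needs the two to be encoding-compatible. Strings can be re-encoded with encode, so let's encode both the pattern and the replacement as UTF-16LE...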

irb(main):005:0> c.gsub("0.0.0.0".encode("UTF-16LE"),"0.0.5.0".encode("UTF-16LE"))
=> myproduct.dll\"\r\n            VALUE \"ProductName\", \"My Product\"\r\n            VALU
E \"ProductVersion\", \"0.0.5.0\"\r\n        END\r\n    END\r\n    BLOCK \"VarFileInfo\"\r
\n    BEGIN\r\n        VALUE \"Translation\", 0x409, 1200\r\n    END\r\nEND\r\n\r\n#endif

You can see some of the replacements working in the snippet I provided.
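Putting the pieces together, a build step might look roughly like the sketch below. It is not a definitive implementation: replace_in_rc is a hypothetical helper, the sample content is a cut-down stand-in for app.rc, and it assumes the file begins with a BOM that should be preserved.

```ruby
require "tmpdir"

# Hypothetical helper: replace literal strings in a UTF-16LE resource file.
# `replacements` maps old text to new text, given as ordinary Ruby strings.
def replace_in_rc(path, replacements)
  # The BOM is not stripped here; it stays in the string as U+FEFF.
  text = File.open(path, "rb:UTF-16LE", &:read)

  replacements.each do |old, new|
    # Pattern and replacement must share the string's encoding.
    text = text.gsub(old.encode("UTF-16LE"), new.encode("UTF-16LE"))
  end

  # binwrite stores the string's bytes unchanged, so the BOM and \r\n survive.
  File.binwrite(path, text)
end

# Demo against a throwaway file standing in for app.rc.
Dir.mktmpdir do |dir|
  rc = File.join(dir, "app.rc")
  sample = "\uFEFF FILEVERSION 0,0,0,0\r\nVALUE \"FileVersion\", \"0.0.0.0\"\r\n"
  File.binwrite(rc, sample.encode("UTF-16LE"))

  replace_in_rc(rc, "0,0,0,0" => "0,0,5,0", "0.0.0.0" => "0.0.5.0")

  puts File.open(rc, "rb:UTF-16LE", &:read).encode("UTF-8").inspect
end
```

Both the comma form (FILEVERSION) and the dotted form (the "FileVersion" value) are passed in, since the resource file stores the version both ways.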

Anthony Mastrean answered Nov 17 '22 08:11
