Character encoding of Localizable.strings, generated by genstrings

Question

In my "ViewController.swift", I have a localized string:

TheOutLabel.text = NSLocalizedString("hello", comment: "The \"hello\" word")

In the Terminal, to generate the "Localizable.strings" file, I typed:

cd Base.lproj/; genstrings ../*.swift; cat Localizable.strings

and got the following result:

??/* The \"hello\" word */
"hello" = "hello";

When typing od -c Localizable.strings, I get:

0000000  377 376   /  \0   *  \0      \0   T  \0   h  \0   e  \0      \0
0000020    \  \0   "  \0   h  \0   e  \0   l  \0   l  \0   o  \0   \  \0
0000040    "  \0      \0   w  \0   o  \0   r  \0   d  \0      \0   *  \0
0000060    /  \0  \n  \0   "  \0   h  \0   e  \0   l  \0   l  \0   o  \0
0000100    "  \0      \0   =  \0      \0   "  \0   h  \0   e  \0   l  \0
0000120    l  \0   o  \0   "  \0   ;  \0  \n  \0  \n  \0

When I type file Localizable.strings, it says:

Localizable.strings: Little-endian UTF-16 Unicode c program text

When I open the file with "emacs", it does not display these characters, and when I type M-x describe-current-coding-system RET, it says:

Coding system for saving this buffer:
  U -- utf-16le-with-signature-unix (alias: utf-16-le-unix)

So the octal bytes \377 and \376 at the beginning of the file appear to be a UTF-16LE byte order mark (BOM), and the encoding explains why each character is followed by a \0 (for this ASCII-only text, UTF-16 takes twice as many bytes as UTF-8).
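
One way to check this guess from the shell is to look at the first two bytes of the file and to compare byte counts. The sketch below is only illustrative: the helper name has_utf16le_bom and the file sample.strings are made up for this example, and the sample is built by hand with printf and iconv rather than by genstrings.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the file starts with the bytes FF FE
# (octal 377 376), i.e. the UTF-16LE byte order mark.
has_utf16le_bom() {
    [ "$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')" = "fffe" ]
}

# Build a sample file by hand: BOM first, then "hello" encoded as UTF-16LE.
workdir=$(mktemp -d)
sample="$workdir/sample.strings"
{ printf '\377\376'; printf 'hello' | iconv -f UTF-8 -t UTF-16LE; } > "$sample"

if has_utf16le_bom "$sample"; then
    echo "UTF-16LE BOM found"
fi

# 5 ASCII characters take 2 bytes each, plus 2 bytes for the BOM.
wc -c < "$sample"
```

The same helper can be pointed at a real Localizable.strings to see whether it still carries the BOM after any conversion step.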

Is this normal/useful/harmful?

Also, the standard *nix tools (grep, sed, awk) do not handle UTF-16 files well:

grep '=' Localizable.strings 
Binary file Localizable.strings matches

grep -a '=' Localizable.strings | sed -e 's/ = //'
"hello" = "hello";

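One possible workaround, sketched here on the assumption that iconv is installed, is to decode the file to UTF-8 on the fly and pipe the result into the usual tools. The helper name utf16_cat and the sample file are invented for this illustration; the original file is left untouched.

```shell
#!/bin/sh
# Hypothetical helper: print a UTF-16 file (with BOM) as UTF-8 on stdout.
# iconv uses the BOM to determine the byte order.
utf16_cat() {
    iconv -f UTF-16 -t UTF-8 "$1"
}

# Build a small UTF-16LE sample with a BOM, similar to genstrings output.
workdir=$(mktemp -d)
sample="$workdir/Localizable.strings"
{ printf '\377\376'; printf '"hello" = "hello";\n' | iconv -f UTF-8 -t UTF-16LE; } > "$sample"

# grep and sed now see plain UTF-8 text, so the substitution can match.
utf16_cat "$sample" | grep '=' | sed -e 's/ = //'
```

With the text decoded first, grep no longer reports a binary file and the sed pattern matches the separator.
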
I also edited Localizable.strings to replace "hello"; with "Hello";. SourceTree (my git client) is then unable to display the difference unless I apply the configuration proposed in "Can I make git recognize a UTF-16 file as text?":

echo '*.strings diff=localizablestrings' > .../.git/../.gitattributes
echo '[diff "localizablestrings"]' >> .../.git/config
echo '  textconv = "iconv -f utf-16 -t utf-8"' >> .../.git/config

Apple's Internationalization and Localization Guide says:

Note: If Xcode warns you that the Localizable.strings file appears to be Unicode (UTF-16), you can convert it to Unicode (UTF-8) using the File inspector.

So, should I remove / ignore the BOM?

It seems there is no genstrings option to generate a UTF-8 file.

Should I convert the file?

Nick Moore · Accepted Answer

The genstrings tool is hard-coded to output strings files in the encoding "UTF-16LE with BOM". I prefer to keep my strings files in UTF-8 and I use the following shell script to generate them:

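#!/bin/zsh
# Convert each strings file from UTF-16 (as written by genstrings) to UTF-8, in place.
function convert {
    for file in "$@"; do
        print "Converting $file to UTF-8"
        # Replace the original only if iconv succeeds, so a failed run loses nothing.
        iconv -f utf-16 -t utf-8 "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    done
}
# For Swift sources, pass *.swift instead of *.m.
genstrings -o en.lproj *.m
convert en.lproj/*.strings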
