In my "ViewController.swift", I have a localized string:
TheOutLabel.text = NSLocalizedString("hello", comment: "The \"hello\" word")
In the Terminal, to generate the "Localizable.strings" file, I typed:
cd Base.lproj/; genstrings ../*.swift; cat Localizable.strings
and got the following result:
??/* The \"hello\" word */
"hello" = "hello";
When I type od -c Localizable.strings, I get:
0000000 377 376   /  \0   *  \0      \0   T  \0   h  \0   e  \0      \0
0000020   \  \0   "  \0   h  \0   e  \0   l  \0   l  \0   o  \0   \  \0
0000040   "  \0      \0   w  \0   o  \0   r  \0   d  \0      \0   *  \0
0000060   /  \0  \n  \0   "  \0   h  \0   e  \0   l  \0   l  \0   o  \0
0000100   "  \0      \0   =  \0      \0   "  \0   h  \0   e  \0   l  \0
0000120   l  \0   o  \0   "  \0   ;  \0  \n  \0  \n  \0
When I type file Localizable.strings, it says:
Localizable.strings: Little-endian UTF-16 Unicode c program text
When I open the file with "emacs", it does not display these characters, and when I type M-x describe-current-coding-system RET, it says:
Coding system for saving this buffer:
U -- utf-16le-with-signature-unix (alias: utf-16-le-unix)
So the octal bytes \377 \376 (0xFF 0xFE) at the beginning of the file look like a UTF-16LE byte order mark (BOM). The UTF-16 encoding itself explains why each character is followed by \0: every ASCII character takes two bytes in UTF-16 instead of one in UTF-8, so in this case the file is twice as large.
Is this normal/useful/harmful?
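To put numbers on the size difference, here is a small POSIX sh sketch that encodes one entry from the file in UTF-8 and in UTF-16LE preceded by the FF FE BOM, then counts the bytes. It assumes an iconv that accepts the UTF-8 and UTF-16LE encoding names (GNU and macOS iconv both do).

```shell
#!/bin/sh
# One entry from Localizable.strings, as plain ASCII text.
entry='"hello" = "hello";'

# Byte count in UTF-8: one byte per ASCII character, plus the newline.
utf8_bytes=$(printf '%s\n' "$entry" | wc -c | tr -d ' ')

# Byte count in UTF-16LE with a BOM: the two bytes \377 \376 (0xFF 0xFE),
# then two bytes per ASCII character, the second of which is \0.
utf16_bytes=$({ printf '\377\376'; printf '%s\n' "$entry" | iconv -f UTF-8 -t UTF-16LE; } | wc -c | tr -d ' ')

echo "UTF-8: $utf8_bytes bytes, UTF-16LE with BOM: $utf16_bytes bytes"
# prints: UTF-8: 19 bytes, UTF-16LE with BOM: 40 bytes
```

For ASCII-only content the UTF-16 file is therefore about twice as large; outside ASCII the ratio differs (many CJK characters take 3 bytes in UTF-8 but 2 in UTF-16).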
Also, the standard *nix tools (grep, sed, awk) don't handle UTF-16 files nicely:
grep '=' Localizable.strings
Binary file Localizable.strings matches
grep -a '=' Localizable.strings | sed -e 's/ = //'
"hello" = "hello";
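One way to keep using grep, sed and awk is to decode the file to UTF-8 on the fly, so the tools never see the interleaved \0 bytes. This is a sketch; strings_entries is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: print the "key" = "value"; lines of a UTF-16
# .strings file, decoded to UTF-8 so that text tools treat it as text.
strings_entries() {
  # With "UTF-16" as the source encoding, iconv uses the BOM to pick the byte order.
  iconv -f UTF-16 -t UTF-8 "$1" | grep '='
}
```

Used as strings_entries Localizable.strings | sed -e 's/ = /=/', the sed substitution now matches, because " = " is again three consecutive bytes instead of being interleaved with \0.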
Also, I edited Localizable.strings to replace "hello"; with "Hello";. After that, "SourceTree" (my "git" client) cannot display the difference unless I do the following, as proposed in "Can I make git recognize a UTF-16 file as text?":
echo '*.strings diff=localizablestrings' > .../.git/../.gitattributes
echo '[diff "localizablestrings"]' >> .../.git/config
echo ' textconv = "iconv -f utf-16 -t utf-8"' >> .../.git/config
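The same diff driver can also be registered with git config instead of echoing into .git/config by hand, and appending with >> keeps any existing .gitattributes content instead of overwriting it. This is a sketch; setup_strings_diff is a hypothetical name, and the repository path is passed as its only argument.

```shell
#!/bin/sh
# Hypothetical helper: make git diff *.strings files through iconv.
# $1 is the path to the repository's working tree.
setup_strings_diff() {
  repo=$1
  # Append (rather than overwrite) so existing attributes are kept.
  printf '%s\n' '*.strings diff=localizablestrings' >> "$repo/.gitattributes"
  # Equivalent to adding [diff "localizablestrings"] with textconv = ... to .git/config.
  git -C "$repo" config diff.localizablestrings.textconv 'iconv -f utf-16 -t utf-8'
}
```

Note that textconv only changes how diffs are displayed; the file stored in the repository is still UTF-16.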
Apple's Internationalization and Localization Guide says:
Note: If Xcode warns you that the Localizable.strings file appears to be Unicode (UTF-16), you can convert it to Unicode (UTF-8) using the File inspector.
So, should I remove / ignore the BOM?
It seems there is no genstrings option to generate a UTF-8 file.
Should I convert the file?
The genstrings tool is hard-coded to output strings files in the encoding "UTF-16LE with BOM". I prefer to keep my strings files in UTF-8, and I use the following shell script to generate them:
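#!/bin/zsh
# Convert each given .strings file from UTF-16 (as written by genstrings) to UTF-8, in place.
function convert {
  local file
  for file in "$@"; do
    print "Converting $file to UTF-8"
    # Only replace the original if iconv succeeds, so a failure cannot wipe the file.
    if iconv -f utf-16 -t utf-8 "$file" > "$file.tmp"; then
      mv "$file.tmp" "$file"
    else
      rm -f "$file.tmp"
      print -u2 "Failed to convert $file"
    fi
  done
}

# Objective-C sources; use *.swift for a Swift project.
genstrings -o en.lproj *.m
convert en.lproj/*.strings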
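The convert function runs iconv on every file matching en.lproj/*.strings. If that folder also holds strings files that are already UTF-8 (for example a hand-edited InfoPlist.strings), decoding them as UTF-16 would produce garbage or an iconv error. A hedged POSIX sh variant that only converts files starting with a UTF-16 BOM follows; convert_if_utf16 is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: convert only files that start with a UTF-16 BOM
# (FF FE little-endian or FE FF big-endian); leave all other files untouched.
convert_if_utf16() {
  for file in "$@"; do
    # First two bytes as lowercase hex, e.g. "fffe".
    bom=$(head -c 2 "$file" | od -An -tx1 | tr -d ' \n')
    case $bom in
      fffe|feff)
        if iconv -f UTF-16 -t UTF-8 "$file" > "$file.tmp"; then
          mv "$file.tmp" "$file"
        else
          rm -f "$file.tmp"
          echo "failed to convert $file" >&2
          return 1
        fi
        ;;
      *)
        echo "skipping $file (no UTF-16 BOM)"
        ;;
    esac
  done
}
```

Called as convert_if_utf16 en.lproj/*.strings, it is safe to run more than once: after the first run the converted files no longer start with a BOM and are skipped.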