The msgfmt “invalid multibyte sequence” error on Polish text goes away when I manually edit the charset in the MIME Content-Type header of the template file. Is there a command or option in the xgettext, msginit, msgfmt sequence that sets this charset?
cat >plt.cxx <<EOF
// plt.cxx
#include <libintl.h>
#include <locale.h>
#include <iostream>
int main() {
  setlocale(LC_ALL, "");
  bindtextdomain("plt", ".");
  textdomain("plt");
  std::cout << gettext("Invalid input. Enter a string at least 20 characters long.") << std::endl;
}
EOF
g++ -o plt plt.cxx
xgettext --package-name plt --package-version 1.2 --default-domain plt --output plt.pot plt.cxx
sed --in-place plt.pot --expression='s/CHARSET/UTF-8/'
msginit --no-translator --locale pl_PL --output-file plt_polish.po --input plt.pot
sed --in-place plt_polish.po --expression='/#: /,$ s/""/"Nieprawidłowo wprowadzone dane. Wprowadź ciąg przynajmniej 20 znaków."/'
mkdir --parents ./pl_PL.utf8/LC_MESSAGES
msgfmt --check --verbose --output-file ./pl_PL.utf8/LC_MESSAGES/plt.mo plt_polish.po
LANGUAGE=pl_PL.utf8 ./plt
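A quick way to see where the error comes from is to compare the charset a PO file declares with the bytes it actually contains. The two shell functions below are a hypothetical sketch: po_charset and po_bytes_ok are made-up names, not gettext commands. The byte check assumes an iconv that exits non-zero on invalid input, as glibc's does. If the header still says CHARSET, or names an encoding that cannot represent the UTF-8 Polish text inserted by the sed step, po_bytes_ok should fail on plt_polish.po before msgfmt does.

```shell
#!/bin/sh
# po_charset FILE: print the charset declared on the header line that
# starts with "Content-Type: in a PO/POT file (hypothetical helper).
po_charset() {
  sed -n 's/^"Content-Type:.*charset=\([^\\"]*\).*/\1/p' "$1"
}

# po_bytes_ok FILE: succeed only if the whole file decodes in the declared
# charset (hypothetical helper). Assumes iconv exits non-zero on invalid
# input or on an unknown encoding name such as the CHARSET placeholder.
po_bytes_ok() {
  iconv -f "$(po_charset "$1")" -t UTF-8 "$1" >/dev/null
}

# Usage with the question's file name:
#   po_charset plt_polish.po
#   po_bytes_ok plt_polish.po || echo "bytes do not match the declared charset"
```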
Just give the full locale name, and msginit will set the charset correctly:
msginit --no-translator --input=xx.pot --locale=ru_RU.UTF-8
This results in:
"Language: ru\n"
"Content-Type: text/plain; charset=UTF-8\n"
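Applied to the question's script, this means passing pl_PL.UTF-8 rather than pl_PL to msginit. The sketch below wraps that in two hypothetical shell functions; full_locale and init_po are illustrative names. full_locale appends .UTF-8 to a locale name that has no codeset, keeping any @modifier last (language_TERRITORY.codeset@modifier), and init_po hands the result to msginit with the same options the question uses. It assumes msginit derives the charset from the named locale, which probably requires that locale to be installed (locale -a lists the installed ones); treat that as an assumption.

```shell
#!/bin/sh
# full_locale NAME: add ".UTF-8" to a locale name without a codeset,
# e.g. pl_PL -> pl_PL.UTF-8, sr_RS@latin -> sr_RS.UTF-8@latin.
# Names that already contain a codeset are printed unchanged.
full_locale() {
  case $1 in
    *.*) printf '%s\n' "$1" ;;
    *@*) printf '%s.UTF-8@%s\n' "${1%%@*}" "${1#*@}" ;;
    *)   printf '%s.UTF-8\n' "$1" ;;
  esac
}

# init_po LOCALE INPUT.pot OUTPUT.po: hypothetical wrapper around msginit
# that always passes a full locale name.
init_po() {
  msginit --no-translator --locale "$(full_locale "$1")" \
    --input "$2" --output-file "$3"
}

# Replacement for the question's msginit line:
#   init_po pl_PL plt.pot plt_polish.po
```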
There is no argument for setting the output character encoding directly, but in practice this should not be a problem: your PO editor will automatically pick a suitable character encoding when saving the PO file (one that supports all the characters used in the translation) and replace CHARSET in the file with the name of that encoding. If it doesn't, file a bug.
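If you script this step instead of using a PO editor, the replacement can at least be confined to the header. The hypothetical set_po_charset filter below rewrites only the charset on the line starting with "Content-Type:, whereas the question's s/CHARSET/UTF-8/ would also change any message line that happens to contain CHARSET. It assumes the charset name contains no / or & characters, which are special in a sed replacement.

```shell
#!/bin/sh
# set_po_charset CHARSET: copy a PO/POT file from stdin to stdout,
# rewriting the charset only on the header's Content-Type line
# (hypothetical helper).
set_po_charset() {
  sed '/^"Content-Type:/s/charset=[^\\"]*/charset='"$1"'/'
}

# Usage with the question's template:
#   set_po_charset UTF-8 < plt.pot > plt.pot.tmp && mv plt.pot.tmp plt.pot
```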
The only problem would be if the POT file contained non-ASCII characters, but xgettext has a --from-code argument for this, which specifies the encoding of the input files. If the input contains non-ASCII characters and --from-code is set to the correct encoding, the output POT file declares its character encoding as UTF-8 (which need not equal the input encoding). However, if the input files contain only ASCII characters, --from-code=UTF-8 unfortunately has no effect, and the header keeps the CHARSET placeholder.
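Whether this case applies can be checked before running xgettext. The hypothetical is_ascii function below succeeds when the given files contain only 7-bit bytes, using POSIX tr and wc. If it succeeds for all sources, as it would for the question's plt.cxx, expect plt.pot to keep charset=CHARSET even with --from-code=UTF-8, so the charset has to be set some other way.

```shell
#!/bin/sh
# is_ascii FILE...: succeed if every byte of the given files is in the
# ASCII range 0-127 (hypothetical helper). tr deletes all ASCII bytes;
# anything left over is non-ASCII.
is_ascii() {
  [ "$(cat -- "$@" | LC_ALL=C tr -d '\000-\177' | wc -c)" -eq 0 ]
}

# Usage with the question's source file:
#   is_ascii plt.cxx && echo "plt.pot will still declare charset=CHARSET"
```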
msginit does in fact automatically set the character encoding to something ‘appropriate’ for the chosen target locale. However, its mapping from locales to character encodings seems outdated; UTF-8 is now the best choice for all languages.
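To see which codeset your system associates with a locale name, the POSIX locale utility can be queried under that locale. The locale_charset function below is a hypothetical helper, and the assumption that msginit consults the same C-library locale data is mine, not the answer's. On a typical glibc system pl_PL is defined with ISO-8859-2 and pl_PL.UTF-8 with UTF-8, which would explain why the full locale name matters.

```shell
#!/bin/sh
# locale_charset NAME: print the codeset the C library associates with a
# locale name, via `locale charmap` (hypothetical helper). If NAME is not
# installed, glibc warns on stderr (discarded here) and reports the codeset
# of the C locale instead.
locale_charset() {
  LC_ALL=$1 locale charmap 2>/dev/null
}

# Usage: compare the short and full Polish locale names.
#   locale_charset pl_PL
#   locale_charset pl_PL.UTF-8
```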
An alternative would be to use pot2po instead of msginit. This always uses UTF-8 automatically, AFAICS. However, unlike msginit, it does not automatically fill in the Plural-Forms header of the PO file, which may or may not be a problem (some think this is the PO editor's job).
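If the PO file comes from a tool that leaves Plural-Forms out, the header line can also be added by script. The hypothetical add_plural_forms filter below inserts a Plural-Forms line right after the Content-Transfer-Encoding line unless one is already present; it does nothing if that line is missing. The Polish formula in the usage comment is the one listed in the GNU gettext manual.

```shell
#!/bin/sh
# add_plural_forms FORMULA: copy a PO file from stdin to stdout, adding
# "Plural-Forms: FORMULA\n" after Content-Transfer-Encoding if the header
# has no Plural-Forms line yet (hypothetical helper). Lines are buffered
# so the whole file is checked before anything is printed.
add_plural_forms() {
  awk -v pf="$1" '
    /^"Plural-Forms:/ { seen = 1 }
    { lines[NR] = $0 }
    END {
      for (i = 1; i <= NR; i++) {
        print lines[i]
        if (!seen && lines[i] ~ /^"Content-Transfer-Encoding:/)
          printf "\"Plural-Forms: %s\\n\"\n", pf
      }
    }'
}

# Usage for Polish:
#   add_plural_forms 'nplurals=3; plural=n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;' \
#     < plt_polish.po > plt_polish.po.tmp && mv plt_polish.po.tmp plt_polish.po
```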