
Extracting gettext strings from Javascript and HTML files (templates)

I know of at least two ways to extract gettext strings from .js files: using xgettext in Python mode (which I've heard has some shortcomings) and Babel, which is written in Python.

Is there a way to extract gettext strings from HTML files - or, to be more precise, from Javascript templates (underscore, mustache, etc.)? As far as I know, neither Babel nor gettext does it.

A friend of mine tried to adapt Babel to do it, but the result had some serious issues, such as missing some translations.

UPDATE: A friend pointed me in the right direction, so now it seems I can extract all the strings the way I want. The only thing I am missing is the "translator comments". The command I am using is this:

find . -iname '*.html' -o -iname '*.js' | xargs xgettext --language=Python --from-code=utf-8 --keyword=pgettext:1c,2 --keyword=npgettext:1c,2,3

This adds pgettext and npgettext to the keywords that xgettext looks for.

UPDATE 2: I discovered that to extract gettext messages that are inside HTML tag attributes, I have to insert a line break before the JS part of the attribute value. For example, I had to convert this:

<a href="" title="<%= ST.i18n.gettext('Click to add another row') %>"></a>

Into this:

<a href="" title="
<%= ST.i18n.gettext('Click to add another row') %>"></a>

xgettext in Python mode will NOT extract the gettext message if it is on the same line. This is a quick hack that seems to work for me though.

UPDATE 3: It seems that xgettext in PHP mode extracts messages from HTML with no issues (at least with Underscore templates), and that also applies to the translator comments.

find ../app -iname '*.html' | xargs xgettext --language=PHP --from-code=utf-8 -c --keyword=gettext --keyword=ngettext:1,2 --keyword=pgettext:1c,2 --keyword=npgettext:1c,2,3 -o translations.po

This way, I can keep normal formatting in my template files:

<a href="" title="<%= ST.i18n.gettext('Click to add another row') %>"></a>
ragulka asked Aug 19 '12 12:08


1 Answer

Babel's message extraction is extensible, but you need to create a dedicated extractor for each new file type.

You don't specify what 'serious issues' you (or your friend) found, so it's hard to help you here in more detail, but any issues with specific formats come down to faulty extraction code.

Babel supports loading extractors from eggs using entry_points, and as a result there is a large list of such extractors listed on PyPI (the linked search lists anything related to Babel, but a large number of those are extractors; there is no Trove classifier for Babel extractors yet). You can use additional PyPI searches for specific template systems to see if there are Babel extractors for those.
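
To give an idea of what such an extractor involves, here is a minimal sketch in Python. It follows Babel's extraction-method interface: a callable taking (fileobj, keywords, comment_tags, options) and yielding (lineno, funcname, message, comments) tuples. The function name extract_underscore and the regular expression are hypothetical and made up for illustration; the sketch only handles single-argument gettext('...') calls that sit on one line, does not unescape backslash sequences, and ignores translator comments, so treat it as a starting point rather than a complete extractor.

```python
import io
import re

# Deliberately simple pattern (an assumption for this sketch): matches
# gettext('...') or gettext("...") with a single string argument.
# pgettext/ngettext are not matched because of the leading \b.
CALL_RE = re.compile(
    r"""\b(?P<func>gettext)\(\s*(?P<quote>['"])(?P<msg>(?:\\.|(?!(?P=quote)).)*)(?P=quote)\s*\)"""
)


def extract_underscore(fileobj, keywords, comment_tags, options):
    """Hypothetical Babel-style extractor for Underscore templates.

    Babel hands extractors a file object opened in binary mode, so each
    line is decoded before matching. comment_tags is accepted to satisfy
    the interface but is not used in this sketch.
    """
    encoding = options.get("encoding", "utf-8")
    for lineno, raw in enumerate(fileobj, start=1):
        line = raw.decode(encoding)
        for match in CALL_RE.finditer(line):
            func = match.group("func")
            # Only report functions the caller asked for, if any were given.
            if keywords and func not in keywords:
                continue
            yield lineno, func, match.group("msg"), []


# Usage example on the template line from the question.
template = (
    b"<a href=\"\" title=\"<%= ST.i18n.gettext('Click to add another row') %>\"></a>\n"
)
for item in extract_underscore(io.BytesIO(template), ["gettext"], [], {}):
    print(item)
```

To make a function like this available to Babel, the package that contains it would declare it under the babel.extractors entry point group; the package and module names you pick for that are up to you.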

Martijn Pieters answered Nov 12 '22 10:11