Translation tooling

From Creative Commons
Revision as of 08:42, 27 November 2016 by CCID-georgeharipopescu (talk | contribs) (Corrected the link to Transifex)

Adding or changing strings

For details on this, read the "Structure of our translation toolchain" section below.

Extracting translations

Instead of providing a master.po file, we pull the same information automatically from the content of the trans tags in cc.engine's templates.

  1. Make your modifications to the cc.engine templates, then commit and push them.
  2. Switch to your cc.i18n checkout (using either buildout or virtualenv).
  3. git pull origin master
  4. ./runcheckout.sh && ./extract.sh
  5. git add cc/i18n/po/en/cc_org.po
  6. git commit -m "Extracting new strings for translation"
  7. git push origin master

...done!

Push source file up to Transifex

  1. ssh a7.creativecommons.org
  2. sudo su cronuser
  3. cd /home/cronuser/transifex.net_i18n_checkout/
  4. git pull
  5. tx push -s

That last command will push the source file (the English .po file you committed) up to Transifex.

Structure of our translation toolchain

Where are our translations?

First of all, we maintain our translations on Transifex [1]. Our affiliates mostly handle our translations.

What tools do we use?

Translations are in gettext format.

We used to use Zope's i18n toolchain for translations. Back then, strings were in "logical key" format: each translation had a symbolic identifier (almost like a variable that mapped to the string). We switched to English keys because that's what most of the world does, and doing otherwise required an insanely complex and fragile system that we spent a ton of time maintaining.

These days it's pretty simple... just mark a string for translation by wrapping it in gettext() or _() or whatever. Then we can auto-extract things.
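To make the "English keys" idea concrete, here is a minimal sketch using Python's standard gettext module. The DictTranslations class, the sample sentence, and the French string are made up for illustration; the real code loads compiled catalogs rather than a dict.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical stand-in for a real catalog: looks messages up in a dict."""

    def __init__(self, messages):
        super().__init__()
        self._messages = messages

    def gettext(self, message):
        # The English string is the lookup key; fall back to it if untranslated.
        return self._messages.get(message, message)


def license_notice(_, license_name):
    # Wrapping the literal in _() is what marks it for extraction.
    return _("This work is licensed under %(name)s.") % {"name": license_name}


english = gettext.NullTranslations().gettext
french = DictTranslations(
    {"This work is licensed under %(name)s.": "Cette œuvre est sous licence %(name)s."}
).gettext
```

Calling license_notice(english, "CC BY 4.0") returns the English sentence unchanged, while license_notice(french, "CC BY 4.0") returns the translated one. An untranslated string simply falls back to its English key, which is a large part of why this format is easier to live with than logical keys.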

If you read the "extracting translations" section above, or even ran the commands, you may have wondered, "Whoa, that ran like magic! All of these translations just got pulled out! How the hell did that work?"

The answer is pretty simple! We use Babel to extract strings.

When you run ./runcheckouts.sh, it checks out all the packages we extract translations from. Then ./extract.sh extracts the strings from them, reading babel.ini to find out what it should extract.

Most of the extractors are pretty standard (the jinja2 and python extractors are bundled with Jinja2 and Babel respectively), but we've defined our own for extracting from RDF in cc/i18n/tools/extractors.py (defined as an entry point in setup.py).
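For a feel of what a custom extractor looks like, here is a rough sketch that follows Babel's extraction-method interface: a callable taking (fileobj, keywords, comment_tags, options) and yielding (lineno, funcname, message, comments) tuples. This is not the code in cc/i18n/tools/extractors.py. The function name, the set of tags, and the rule of taking only xml:lang="en" literals are all assumptions made for the example.

```python
import io
import xml.parsers.expat

# Hypothetical: which RDF/XML elements carry translatable text.
TRANSLATABLE_TAGS = {"dc:title", "dc:description"}


def extract_rdf(fileobj, keywords, comment_tags, options):
    """Yield (lineno, funcname, message, comments) for English RDF literals.

    fileobj is a binary file object, which is what Babel hands to extractors.
    keywords, comment_tags and options are accepted but unused in this sketch.
    """
    results = []
    state = {"tag": None, "lineno": 0, "text": []}
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        # Assumption: only literals tagged as English are source strings.
        if name in TRANSLATABLE_TAGS and attrs.get("xml:lang") == "en":
            state.update(tag=name, lineno=parser.CurrentLineNumber, text=[])

    def chars(data):
        # Text may arrive in several chunks, so collect and join later.
        if state["tag"] is not None:
            state["text"].append(data)

    def end(name):
        if name == state["tag"]:
            message = "".join(state["text"]).strip()
            if message:
                # funcname is None: the string did not come from a _() call.
                results.append((state["lineno"], None, message, []))
            state["tag"] = None

    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    parser.ParseFile(fileobj)
    return iter(results)
```

An extractor like this is made visible to Babel by registering it under the babel.extractors entry point group in setup.py, after which babel.ini can refer to it by name.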

Transifex has a command-line client (tx) that we use to push up the new strings; see the push steps above for that.

There are actually two kinds of translation-related cronjobs: one pulls down new translations, and the other builds a new translation tarball. According to the crontab below, each sync script runs once an hour and the tarball is rebuilt every 15 minutes. (They're currently separate scripts, but maybe they could be merged?)

These commands can be found in the cronuser crontab.

 # Pull changes from Tx.net and push them to our repos
 5 * * * * ~/bin/sync_i18n_with_transifex.sh > /dev/null
 10 * * * * ~/bin/sync_i18n-ccsearch_with_transifex.sh > /dev/null
 
 # Update cc.i18n tarball
 */15 * * * * /usr/bin/ionicer && nice -n 19 bash /var/www/staging.creativecommons.org/make_i18n_sdist.sh > /dev/null
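If the schedule fields above are hard to read, this small sketch expands a crontab minute field into the minutes of the hour it matches. It only handles the three forms that appear in this crontab (a plain number, "*", and "*/step"), and the function name is made up for the example.

```python
def expand_minute_field(field):
    """Expand a crontab minute field into the minutes (0-59) it matches.

    Supports only a plain number, "*", and "*/step"; lists and ranges
    are deliberately left out of this sketch.
    """
    if field == "*":
        return list(range(60))
    if field.startswith("*/"):
        step = int(field[2:])
        return list(range(0, 60, step))
    return [int(field)]
```

So "5" and "10" each match a single minute per hour (the two sync scripts), while "*/15" matches minutes 0, 15, 30 and 45 (the tarball build).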

One more thing to note: we have a translation statistics tool that's run every time the sdist is built. It writes out a CSV file that keeps several bits of information, including percentages. We have a translation threshold at the top of cc/i18n/util.py; translations have to be above this level to show up in the "available languages" box on various pages of cc.engine!
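The threshold check boils down to something like the sketch below. The column names (lang, percent_trans), the sample data, and the threshold value of 80 are all assumptions for illustration; check cc/i18n/util.py and the generated CSV for the real ones.

```python
import csv
import io

# Hypothetical value; the real threshold is defined in cc/i18n/util.py.
TRANSLATION_THRESHOLD = 80


def available_languages(csv_file, threshold=TRANSLATION_THRESHOLD):
    """Return the language codes whose translated percentage is above threshold.

    Expects a CSV with a header row containing (assumed) columns
    "lang" and "percent_trans".
    """
    reader = csv.DictReader(csv_file)
    return [
        row["lang"]
        for row in reader
        if float(row["percent_trans"]) > threshold
    ]


# Made-up statistics, only to show the shape of the input.
SAMPLE_STATS = io.StringIO(
    "lang,percent_trans\n"
    "en,100\n"
    "fr,92.5\n"
    "xx,40\n"
)
```

With the sample data, available_languages(SAMPLE_STATS) keeps en and fr and drops xx, which is the behaviour you see in the "available languages" box when a translation falls behind.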