Difference between revisions of "Translation tooling"
Latest revision as of 14:12, 8 December 2016
Adding or changing strings
For details on this, read the section "Structure of our translation toolchain".
Extracting translations
Instead of maintaining a master.po file by hand, the source strings are pulled automatically from cc.engine's templates, from the content of the trans tags.
- Make modifications to cc.engine templates, commit, push, etc.
- In cc.i18n (using either buildout or virtualenv):
- git pull origin master
- run ./runcheckout.sh && ./extract.sh
- git add cc/i18n/po/en/cc_org.po
- git commit -m "Extracting new strings for translation"
- git push origin master
...done!
Push source file up to transifex
- ssh a7.creativecommons.org
- sudo su cronuser
- cd /home/cronuser/transifex.net_i18n_checkout/
- git pull
- tx push -s
That last command pushes the source file (the English .po file you committed) up to Transifex.
Structure of our translation toolchain
Where are our translations?
First of all, we maintain our translations on Transifex - https://www.transifex.com/nkinkade/CC/deeds-choosers/. Our affiliates mostly handle our translations.
What tools do we use?
Translations are in gettext format.
We used to use Zope's i18n toolchain for translations. Things used to be in "logical key" format, where there was a symbolic representation of each translation (almost like a variable that mapped to the string). We switched to English keys because that's what most of the world does, and doing otherwise required an insanely complex and fragile system that we spent a ton of time maintaining.
These days it's pretty simple: just mark a string for translation by wrapping it in gettext(), _(), or a similar marker function. The marked strings can then be extracted automatically.
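As a rough illustration of what "English keys" means in practice, here is a minimal sketch using only Python's standard gettext module. The DictTranslations class, the sample strings, and the Spanish text are made up for this example and are not taken from cc.engine or cc.i18n; real catalogs are loaded from compiled .po files rather than a dict.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Toy catalog that looks up English msgids in a plain dict (illustration only)."""

    def __init__(self, catalog):
        super().__init__()
        self._messages = catalog

    def gettext(self, message):
        # Fall back to the English key itself when no translation exists.
        return self._messages.get(message, message)


# Hypothetical Spanish catalog keyed by the English source strings.
spanish = DictTranslations({"Choose a License": "Elige una licencia"})
_ = spanish.gettext

translated = _("Choose a License")    # found in the catalog
untranslated = _("Share your work")   # missing, so the English key comes back
print(translated)
print(untranslated)
```

Because the key is the English text itself, a missing translation degrades to readable English instead of a symbolic identifier.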
If you read the "extracting translations" section above, or even ran the commands, you may have wondered, "Whoa, that ran like magic! All of these translations just got pulled out! How the hell did that work?"
The answer is pretty simple! We use Babel (http://babel.pocoo.org/en/latest/) to extract strings.
When you run ./runcheckouts.sh, it checks out all the packages we extract translations from. ./extract.sh then extracts the translatable strings from those packages, reading babel.ini to find out what it should extract.
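To make the "magic" a little less magical, here is a simplified stand-in for what a Python string extractor does, written with the standard ast module rather than Babel itself. The function name extract_marked_strings and the sample source are assumptions for illustration; Babel's real extractor handles many more cases (plurals, translator comments, keyword specs).

```python
import ast


def extract_marked_strings(source, keywords=("_", "gettext")):
    """Return (line number, message) pairs for string literals passed to marker functions."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in keywords
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            found.append((node.lineno, node.args[0].value))
    # ast.walk does not guarantee source order, so sort by line number.
    return sorted(found)


# Hypothetical module source: two marked strings and one unmarked string.
SAMPLE = '''
title = _("Choose a License")
help_text = gettext("This is not a license.")
internal = "not marked, so not extracted"
'''

messages = extract_marked_strings(SAMPLE)
for lineno, message in messages:
    print(lineno, message)
```

Only strings wrapped in a marker function are picked up, which is why an unwrapped string never reaches the .po file.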
Most of the extractors are pretty standard (the jinja2 and python extractors are provided by jinja2 and babel respectively), but we've defined our own for extracting from RDF in cc/i18n/tools/extractors.py (defined as an entry point in setup.py).
Transifex has a command-line client (tx) that we use to push up the new source strings; see "Push source file up to transifex" above.
There are two kinds of translation-related cron jobs: one pulls down new translations, and the other builds a new translation tarball, each on the schedule shown in the crontab below. (They're currently separate scripts, but maybe they could be merged?)
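For readers who have not written a Babel extractor, the sketch below shows the general shape of one: a callable taking (fileobj, keywords, comment_tags, options) and yielding (lineno, funcname, message, comments) tuples, registered under the babel.extractors entry point group. It is not the code in cc/i18n/tools/extractors.py. The function name, the sample RDF, and the rule "take the text of every element tagged xml:lang="en"" are assumptions made purely for illustration.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def extract_rdf_sketch(fileobj, keywords, comment_tags, options):
    """Yield (lineno, funcname, message, comments) tuples in the shape Babel expects.

    Hypothetical rule: any element marked xml:lang="en" holds a source string.
    """
    tree = ET.parse(fileobj)
    for element in tree.iter():
        text = (element.text or "").strip()
        if text and element.get(XML_LANG) == "en":
            # ElementTree does not track line numbers, so 0 is a placeholder.
            yield 0, None, text, []


# Made-up RDF snippet, not taken from the real license RDF.
SAMPLE = b"""<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/license">
    <dc:title xml:lang="en">Attribution</dc:title>
    <dc:title xml:lang="de">Namensnennung</dc:title>
  </rdf:Description>
</rdf:RDF>
"""

messages = list(extract_rdf_sketch(io.BytesIO(SAMPLE), [], [], {}))
print(messages)
```

The function itself does not import Babel; Babel only needs to find it through the entry point and call it with an open file.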
These commands can be found in the cronuser crontab.
# Pull changes from Tx.net and push them to our repos
5 * * * * ~/bin/sync_i18n_with_transifex.sh > /dev/null
10 * * * * ~/bin/sync_i18n-ccsearch_with_transifex.sh > /dev/null

# Update cc.i18n tarball
*/15 * * * * /usr/bin/ionicer && nice -n 19 bash /var/www/staging.creativecommons.org/make_i18n_sdist.sh > /dev/null
One more thing to note: we have a translation statistics tool that runs every time the sdist is built. It writes out a CSV file with several bits of information per language, including the percentage translated. A translation threshold is defined at the top of cc/i18n/util.py; a language has to be above this level to show up in the "available languages" box on various pages of cc.engine!
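The threshold check can be pictured roughly as follows. This is a sketch under stated assumptions, not the code in cc/i18n/util.py: the column names (lang, percent_trans), the threshold value of 80, and the sample figures are all invented for the example.

```python
import csv
import io

# Hypothetical value; the real threshold lives at the top of cc/i18n/util.py.
TRANSLATION_THRESHOLD = 80


def languages_above_threshold(csv_file, threshold=TRANSLATION_THRESHOLD):
    """Return language codes whose translated percentage is above the threshold."""
    reader = csv.DictReader(csv_file)
    return [
        row["lang"]
        for row in reader
        # "Above" is read here as strictly greater than the threshold.
        if float(row["percent_trans"]) > threshold
    ]


# Made-up statistics in an assumed column layout.
SAMPLE_STATS = io.StringIO(
    "lang,num_messages,num_trans,percent_trans\n"
    "de,100,95,95\n"
    "fr,100,80,80\n"
    "xx,100,12,12\n"
)

available = languages_above_threshold(SAMPLE_STATS)
print(available)
```

With these sample numbers only the language at 95 percent passes; one sitting exactly at the threshold does not, under the strict reading of "above".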