Translate Toolkit & Pootle

Tools to help you make your software local

Functional Specifications

Glossary

  • WordForge - A project to create processes and tools to improve the localisation of Free and Open Source Software.
  • Pootle - A Distributed translation management system (by the way, it also has a translation editor)
  • Pootle server - anything in Pootle that is not the editor
  • Pootle editor - the on-line translation editor integrated in Pootle.
  • Conversion Filters or Filters - computer programs that convert non-standard files containing source messages (and in many cases their translations) to the standard XLIFF or PO formats (Inbound Conversion Filters) or, once they have been translated, convert XLIFF and PO files back to the non-standard format required by an application (Outbound Conversion Filters).
  • Project Liaison - the person or script that mediates between the upstream project and Pootle
  • Translator - a person who translates
  • Reviewer - a person who reviews translations
  • Translation Manager - a person who manages translations in their language
  • TM - translation memory
  • TMX - Translation Memory eXchange format. A standard developed by LISA.
  • LISA - Localisation Industry Standards Association
  • Glossary - a list of terms in English and a target language.
  • TBX - TermBase eXchange. A standard format developed by LISA for glossaries.
  • PO - Portable Object. The standard format for Gettext which is used by most Free Software. It stores the original messages, translation and other information about the messages.
  • POT or PO templates - PO files that only contain original messages (no translations). They usually have the .POT extension.
  • XLIFF - XML Localization Interchange File Format. A new XML-based standard format for storing localisation data, including original messages, translations, process data, glossary information, translation memory information, etc.
  • XLIFFT or XLIFF templates - XLIFF files that do not have any translation nor language-dependent data in them.
  • Project - a set of files for a particular version of a given piece of software prepared or being translated to a specific language (such as OpenOffice 2.0.2 for Spanish).
  • Project template set - a set of template files for a particular version of a given piece of software (such as OpenOffice 2.0.2 XLIFFT or POT files).
  • Instantiation of a Project - the action of creating a new copy of all the template files of a Project template set and assigning it to a specific language. After the copy is made and the files are placed in the right location, the file extensions are changed to denote that the copy is not a template (but a set of working files), and the files are initialised with data denoting the language. After this, if there is an old version of the same Project in the system, the Project is initialised.
  • Initialisation of a Project - once a new instance of a file or Project is created, and if an older version of the Project exists in Pootle, the files are initialised by copying into them all the relevant information (translations, process data, etc.) from the old files. The old files are then removed from Pootle (but kept as back-up) and replaced by the new ones. At that point the Project has been upgraded to a new version.
  • Meta-data - To be defined.
  • Goal - an objective set by a project liaison, translation manager or individual (can be based on a date, file, directory or completeness, i.e. a percentage).
  • Trunk - the main, current development version of a project, also called HEAD
  • Branch - a soon to be released or stable version

Pootle

Pootle is a web-based tool designed to make translation and translation management easier and to increase the quality of translations.

What does Pootle do and does not do by itself

As a first functionality, Pootle is a file server for translatable files in PO or XLIFF format. The files are loaded onto the server and classified by Project (for us a Project is a specific version of a piece of software that needs to be translated to a given language). Translators, reviewers and translation managers can either edit the files on-line (using Pootle's On-line Editor) or download them to their computer, translate them with an Off-line Translation Editor and then upload them again to Pootle, replacing the old untranslated or partially translated files.

Translatable files for different FOSS projects are converted to PO or XLIFF format (if needed) using Conversion Filters (Inbound Conversion Filters in this case). Files are then uploaded to Pootle. Once they are translated and reviewed, they are converted back to the original format in which the FOSS project needs the data (using again the Conversion Filters, but now the Outbound Conversion Filters) so they can be integrated in the source, producing localised applications.
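As a rough illustration of the round trip described above, the sketch below treats a simple key=value file as the "non-standard" format. The function names and the in-memory unit layout are hypothetical and are not part of Pootle or of any existing filter; a real Conversion Filter would read and write actual PO or XLIFF files.

```python
# Hypothetical sketch of an inbound/outbound filter pair for a key=value format.
# Units are kept in memory as dicts; a real filter would write PO or XLIFF.

def inbound(text):
    """Convert key=value lines into translation units (source text only)."""
    units = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, source = line.partition("=")
        units.append({"id": key.strip(), "source": source.strip(), "target": ""})
    return units


def outbound(units):
    """Convert units back to key=value lines.

    Untranslated units fall back to the source text.
    """
    return "\n".join(f"{u['id']}={u['target'] or u['source']}" for u in units)


units = inbound("menu.open=Open\nmenu.quit=Quit")
units[0]["target"] = "Abrir"  # a translator fills in one message
localised = outbound(units)
```

Keeping the message identifier in each unit is what lets the outbound step rebuild the original file layout once translation is finished.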

In addition, Pootle can use Glossary and Translation Memory (a database of previously translated messages) to improve the contents of PO or XLIFF files that are handed to translators and reviewers, facilitating the job of the translator and helping to ensure the quality of translations. Quality is ensured through a number of Tests that can be run either by Pootle when files are uploaded back by the translator, or by an Off-line Translation Editor that integrates the tests.

All the components of the WordForge project are modular. They can either be integrated with Pootle or with other applications (such as Off-line Translation Editors). Glossary Management Tools integrate with Pootle by generating (and accepting back) glossaries stored in Open Standard formats, such as TBX, a standard specifically developed for sharing glossaries. Conversion Filters and Translation Memory Tools process PO or XLIFF files, and can be called by Pootle as external applications. Translation Memory Tools use also Open Standard file formats designed for sharing translation data information (TMX).

Pootle also manages information that is used to follow the translation process, such as when or by whom a translation was done, what is its deadline, if there are any detected errors in the translations, or any non-compliance with the glossary. Process information strongly simplifies the job of the Reviewer and the Translation Manager, who need to check the quality of the translations that have already been finished by the Translator.

And all this brings us to what Pootle really is, and what it calls from outside applications. It is probably better to start with what Pootle is not.

  • Pootle does not include a Glossary Management Tool, but can use glossaries in TBX format files generated by an external Glossary Management Tool. It can also maintain a list of proposed terms (proposed by Translators or Reviewers) for new inclusions in the Glossary (also in TBX format). The person in charge of the Glossary might decide to include them or not (using the external Glossary Management Tool), and generate a new Glossary TBX file to be used by Pootle in the future. The WordForge project does include a Glossary Management Tool, but it will be independent from Pootle.
  • Pootle does not include a Translation Memory Tool, but it dynamically calls an external Translation Memory Tool that uses information stored in TMX files to enrich specific PO or XLIFF files before they are handed by Pootle to Translators. Nor does it include a Translation Memory Management Tool that can handle translation memory information and generate TMX files. Pootle can, nevertheless, create a TMX file from information extracted from the PO or XLIFF files that it stores. The WordForge project does include a Translation Memory Management Tool, but it will be independent from Pootle.
  • Pootle does not include an Off-line Translation Editor, but it produces enriched PO and XLIFF files that can be translated by Translators or Reviewers using an Off-line Translation Editor. It can also receive files that have been translated off-line, enrich them again with information from tests and store them.
  • Pootle does not include Tests for checking the quality, adequacy or glossary-compliance of translations, but it calls an External Test Application to process PO and XLIFF files that are uploaded by translators, before storing them.
  • Pootle does not include Conversion Filters, but it calls them to convert files that originally have other formats to PO or XLIFF, so they can be stored and managed by Pootle. It also uses them to convert PO or XLIFF files that have already been translated back to the original format.



But, besides not doing many things by itself, Pootle does have a number of in-house functionalities:

  • Pootle is a Translation Management System. It manages files that have to be translated or that have already been translated, as well as process information about them. The information about the translation process generated or managed by Pootle allows control of the state of each file or project at any stage, either from a global point of view or down to the smallest detail, as it includes information about when and how each single message was translated, when it is expected back, who reviewed it and when, whether it complies with the glossary, whether any technical problems with the translation have been detected, etc. As part of this management, Pootle produces statistics that can be followed by Reviewers and Translation Managers to understand the state of each project.
  • Pootle is a file server that allows Translators, Reviewers and Translation Managers to either edit files on-line or to download them, work with them off-line with an Off-line Translation Editor, and upload them back once the work is done.
  • Pootle is a back-up system for all versions of translated files, keeping a copy of each prior version of the file. All process information in Pootle is kept inside the translatable files. No databases are managed.
  • Even more, Pootle is a Distributed Translation Management System. As it is web-based, it allows Translators and Reviewers to work from different locations through the Internet, while keeping strong management of the translation process if needed. For teams that have low connectivity, a local copy of the Pootle can be installed (for day to day work), and then synchronized with the central server.
  • Pootle also includes an On-line Translation Editor that allows Translators, Reviewers and Translation Managers to work on-line, either doing full translations (less usual), correcting or upgrading files. This On-line Translation Editor uses all the statistics that Pootle has, plus all the information that can be generated by the external programs that Pootle can make use of.

Pootle Server

File types

Pootle stores four major types of files:

  • Translatable file templates (XLIFFT or POT). These are normal XLIFF or PO files that contain no translations and which have not yet been assigned to a particular language or filled with glossary or translation memory information. They might contain process information such as when and by whom the files were generated, and when and by whom they were imported into Pootle.
  • Translatable files (XLIFF or PO). These are XLIFF or PO files that have been assigned to a specific language. They might not yet be translated, but they might also contain translations, glossary information, translation memory information and/or process data information.
  • Glossary files (TBX). Files that contain English/language glossaries for a specific language. Or files that contain proposed glossary terms to be integrated in the main glossary for that language.
  • Translation memory files (TMX). They contain prior translations for a particular language. TMX files can be project-specific or generic.
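The four file types can be told apart by extension alone. A minimal sketch follows, assuming the extensions named in this document; the category labels and the function name are illustrative only and are not identifiers used by Pootle.

```python
from pathlib import Path

# Extension-to-kind mapping taken from the list above. The kind names are
# illustrative labels, not identifiers used by Pootle.
FILE_KINDS = {
    ".pot": "template",
    ".xlifft": "template",
    ".po": "translatable",
    ".xliff": "translatable",
    ".tbx": "glossary",
    ".tmx": "translation memory",
}


def classify(filename):
    """Return the kind of a stored file, or "other" for unknown extensions."""
    return FILE_KINDS.get(Path(filename).suffix.lower(), "other")
```

Anything that falls through to "other" would correspond to the additional kinds of information discussed next (public information pages, statistics, original-format templates, configuration).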

Nevertheless, there are other forms of information that probably need to be stored:

  • Project Public Information Files. Language team leaders and/or project leaders could maintain an introductory page for volunteer translators in that language, which would not only suggest work that needs to be done, but would give them a start: links to project-specific help files, contact details for language teams and projects, quick tips. Pootle could well become an effective recruiting ground for new translators, as long as the process is handled positively and makes the best use of their available time.
  • Statistics about Projects (sets of files). This information might be either generated on demand or maintained in a separate Statistics file. Statistics about each file will be maintained in each file, making generating this information easy (and avoiding having to assure that the information is re-made after any file in the group is changed).
  • Original templates in the particular software's own format (not XLIFF or PO, one for each version of each software) will also be stored. Storage of these skeleton files allows Pootle to use the Conversion Filters to create translated files in the project's own format. By storing the files we allow Pootle to automate much of the format regeneration process, allowing it to integrate with the upstream project in its desired format.
  • Configuration files for data such as “which tests make sense to apply to translations into a given language”, or data for those tests (such as which Unicode character or set is used to mark the end of a sentence). Other configurations include: users, permissions and roles; user settings; goals. Configuration files might correspond to a language or (hierarchically) to a specific Project.
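The point about statistics is that, if each file carries its own counters, project statistics are a cheap sum rather than a re-scan of every message. A minimal sketch, assuming counters named "total", "translated" and "fuzzy" (these names are assumptions; the actual counters are to be defined on the Process Information page):

```python
from collections import Counter


def project_stats(file_stats):
    """Sum per-file counters into project totals and a completion percentage."""
    totals = Counter()
    for counters in file_stats.values():
        totals.update(counters)  # adds each counter to the running total
    total = totals["total"]
    percent = 100.0 * totals["translated"] / total if total else 0.0
    return dict(totals), percent


# Counters as they might be read from the header of each file.
stats = {
    "writer.po": {"total": 200, "translated": 150, "fuzzy": 10},
    "calc.po": {"total": 100, "translated": 0, "fuzzy": 0},
}
totals, percent = project_stats(stats)
```

When one file changes, only its own counters need regenerating; the project total is then recomputed from the stored per-file values.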

Pootle takes a file-based approach for the core data (PO and XLIFF files). Supporting data (TBX and TMX) does not need to reside in its base format and could be placed in some data store. Meta files (statistics, etc.) could be placed in a database if needed. Pootle uses technologies such as PyLucene to provide database-level performance. With a file-based approach and some thought put into pregenerating files, it is possible to get good performance from flat files.

Priorities could also be an interesting concept to manage. A volunteer translator reaches Pootle and wants to translate something. Can we respond to the question: what should I start with? Priorities for volunteers could quite well be very different from priorities for professional translators. The response to this question must be a combination of information in the Project information file, statistics on already translated materials and goals established inside the files.

About Projects

A Project is a set of files for a particular version of a given piece of software, translated or being translated to a specific language (such as OpenOffice 2.0.2 for Spanish).

Template files are uploaded to the system by the Project liaison, only once for each version of a given software (the Liaison never uploads the same set of files twice). The template files are grouped into a Project template set (the set of XLIFF or PO template files that belong to the same version of the same software). These template files can be used to either:

  • Create a new instance of a Project (for example IMP H3 4.0.4 for Lao language, where IMP was never before included in Pootle for Lao). In this case it just creates a new set of empty files (copy of the original templates), renames the file extensions from XLIFFT to XLIFF and from POT to PO, and includes some data in the files indicating for which language they have been created.
  • Update an older version of the same software (for example IMP H3 4.0.4 for Lao language when IMP 3.4 has already been translated and exists on this Pootle server). In this case, each one of the old files is upgraded using the files from the new set (other methods of upgrade/initialisation, such as through translation memory, are also possible). The new Project (new version) replaces the old Project for the same software and language.
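The first case (a new instance of a Project) can be sketched as a copy-and-rename over the template set. This is only an illustration under stated assumptions: the directory layout (one sub-directory per language) and the function name are hypothetical, and the step that writes language data into each file is left out.

```python
import shutil
import tempfile
from pathlib import Path

# Template extension -> working-file extension, as described above.
RENAMES = {".pot": ".po", ".xlifft": ".xliff"}


def instantiate(template_dir, project_dir, language):
    """Copy every template into project_dir/language with a working extension."""
    template_dir = Path(template_dir)
    target_dir = Path(project_dir) / language
    created = []
    for template in sorted(template_dir.rglob("*")):
        new_suffix = RENAMES.get(template.suffix.lower())
        if new_suffix is None or not template.is_file():
            continue  # not a template file
        relative = template.relative_to(template_dir).with_suffix(new_suffix)
        destination = target_dir / relative
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(template, destination)
        created.append(destination)
    return created


# Usage with throwaway directories (all names are examples).
with tempfile.TemporaryDirectory() as tmp:
    templates = Path(tmp) / "templates"
    templates.mkdir()
    (templates / "imp.pot").write_text('msgid "Inbox"\nmsgstr ""\n', encoding="utf-8")
    created = instantiate(templates, Path(tmp) / "imp-4.0.4", "lo")
    created_names = [p.relative_to(tmp).as_posix() for p in created]
```

The second case (upgrading an older version) would run the same copy and then carry the existing translations into the new files.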

Sometimes two different versions of the same software are maintained (different branches). They are considered as different Projects, such as OpenOffice 1.1.5 and OpenOffice 2.0, when both were being developed separately (1.1.5 for fixes of the stable version and 2.0 including big changes for future advanced development).

FIXME {JS Based on a prior comment from CS, I think that creating a concept of Group could be a good idea. At the top level we would have a group, which could be OpenOffice or Debian, and inside that, the real projects. Please see the structure below.

Pootle
 |
 |- OpenOffice
 |       |
 |       |- OpenOffice 2.0.3
 |       |        |
 |       |        |- French
 |       |        |- Khmer
 |       |        
 |       |- OpenOffice HEAD
 |       |        |
 |       |        |- French
 |       |        |- Xhosa
 |       |
 |- Debian
 |       |
 |       |- Debian Installer HEAD
 |       |        |
 |       |        |- Vietnamese
 |       |        |- Khmer
 |       | 
 |       |- Another Debian appli 1.0
 |       |        |
 |       |        |- English ZA
 |       |        |- Lao
 |           
 |             

If we consider a language based view, it would be:

Pootle
 |
 | Vietnamese
 |   |- OpenOffice
 |   |       |
 |   |       |- OpenOffice 2.0.3
 |   |       |        
 |   |       |- OpenOffice HEAD
 |   |  
 |   |- Debian
 |           |
 |           |- Debian Installer HEAD
 |           | 
 |           |- Another Debian appli 1.0
 |
 |- Khmer
 |   |
 |   |- OpenOffice
 |   |       |

FIXME {CS In terms of practical management, we need capacity for super-projects. Large, independent projects like OpenOffice, Debian, Gnome and KDE will be easier to manage if there is a super-project, with all its related projects listed and organized under it. Language-team leaders and translation co-ordinators will need to be able to cross project boundaries within that super-project. The distinction between OpenOffice 1.1.5 and OpenOffice 2.0 should not be the same as that between OpenOffice and Debian. We need an effective hierarchical structure that can be expressed in terms of projects [from super-project] or languages. If the meta-data structure below is capable of actual manipulation by such categories, that is what we need. A Gnome translation manager doesn't want to have to deal with Gnome HEAD, Gnome 2.12 and Gnome 2.14 always separately.}

Projects can be static (such as OpenOffice 2.0.2) or dynamic (such as OpenOffice HEAD). In the first case the Project is only upgraded to a new version when the Translation Manager for that language presses the Upgrade button for that Project. In the second case, when the Project liaison uploads a new version of the template files for that software, the upgrade to that version takes place automatically. Pootle uses external version control to protect against data loss during upgrades.

Projects are linked together through common meta-data, so it is possible to list all Projects that relate to a given language (Thai), or to a given upstream FOSS piece of software (Mozilla/GNOME).
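One way to picture this linking is a flat list of records that can be pivoted into either view. The sketch below assumes the Group level proposed in the FIXME notes above and uses a few entries from the example trees; the record layout and function names are hypothetical.

```python
from collections import defaultdict

# (group, project, language) records; a subset of the example trees above.
PROJECTS = [
    ("OpenOffice", "OpenOffice 2.0.3", "French"),
    ("OpenOffice", "OpenOffice 2.0.3", "Khmer"),
    ("OpenOffice", "OpenOffice HEAD", "French"),
    ("Debian", "Debian Installer HEAD", "Vietnamese"),
    ("Debian", "Debian Installer HEAD", "Khmer"),
]


def by_language(projects):
    """Language-based view: language -> group -> list of projects."""
    view = defaultdict(lambda: defaultdict(list))
    for group, project, language in projects:
        view[language][group].append(project)
    return {lang: dict(groups) for lang, groups in view.items()}


def by_group(projects):
    """Project-based view: group -> project -> list of languages."""
    view = defaultdict(lambda: defaultdict(list))
    for group, project, language in projects:
        view[group][project].append(language)
    return {group: dict(items) for group, items in view.items()}
```

Because both views are derived from the same records, a translation manager can cross project boundaries within a group without the data being stored twice.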

About Process

The page on Analysis of Process and WorkFlow studies what phases a translation and localisation process can go through, analysing the possible players who might participate in the process, the rights that each player must have in order to participate and, finally, the creation of workflows for localisation composed of different phases. The page on Process information on XLIFF files proposes a way of encoding the different phases inside this format, and the changes to the information about the translations that must be made when a new phase begins.

A file in Pootle will undergo several of the following processes:

  1. Creation of the XLIFF file. The creation will be done by some tool and responsible person (or automatic process). Information about who created the file and when, together with other useful bits (defined in the Process Information page), will be included in the file, together with the initialisation of the counters that indicate how many messages there are, how many are translated (0), etc.
  2. Introduction into the Pootle server. This happens either by action of the Project liaison or by some automatic mechanism, depending on if it is a project that relates to a stable version of a program (manual introduction) or it is a project that maintains the HEAD version of a piece of software (automatic introduction and upgrade, hopefully). FIXME {CS Further details needed: triggers and methods.} The file - in XLIFFT or POT format - will be part of a Project Template Set, the set of XLIFF or PO template files that belong to the same version of the same software. If there is a specific deadline by which the FOSS project requires the file back, it might also be introduced here FIXME {javier: how?, where?}.
  3. Instantiation of the file to a language. A copy of the file is assigned to a specific language. If there is an older version of this file for this language in Pootle, all the available information in the old file (translations, process, etc.) will be preserved in the new file. The work will be done string by string, carrying the existing translations and process data for each string (Translation Memory information for translated messages will probably not be carried to the new file). Once this is done, statistics about the state of the file are generated and placed in the header counters, and the process will be recorded in the file as a phase. This process will take place for all the files in a Project at the same time.
  4. Include a subset of the glossary in the XLIFF file. This would include inside the XLIFF file all the terms from the glossary that appear in any of the source messages of the XLIFF file. To facilitate processing by the editor, we will code in the file which glossary words appear in each source message of the file. If there are already translations in the file, it is not necessary to place glossary terms that are present, but already translated correctly in the file, unless the terms have changed (this means that all messages need to be checked, glossary words that are not used correctly need to be included, and messages that have them must be marked as fuzzy, unless there is an indication in the trans-unit that says that a reviewer approved usage of non-standard terminology for that message). Glossary terms for fuzzy messages should be included. This process might take place using processor idle time. The date and name of the glossary file that was used for this process is stored in the XLIFF file (reprocessing will be required later only if the date is different).
  5. Using processor idle time, include Translation Memory information in the file. The information that is included is based on a TMX file that is in Pootle. XLIFF (or PO) and TMX files are passed to an external process that decides which information will be placed in the file, and in which fields (as translation, as assistance), including the rate of matching, if different from 100%. For messages that are placed directly in the “translation” field (even if they are fuzzy), all available information in the TMX file will also be placed (who translated, when and date of TMX file). FIXME {CS We need to review this in practice. How much information, how is it displayed and where? Cluttering up the interface with information we don't often use may well be counter-productive. At the least, it will confuse the eye. One of the reasons I use Pootle is that its interface is clean, intuitive. Let's not get data-itis here. Perhaps extra information could be available as a popup?} When the XLIFF (or PO) file is returned to Pootle, the date of the TMX file is stored in the XLIFF (or PO) file (reprocessing will be required later only if this date is different). FIXME {DB Understand but needs clearer mind to clarify} FIXME {CS Would a muddy one do? ;)}
  6. Goal setting. The Translation manager may open the Project or specific files and set a specific goal, which might just be a deadline for the Project/file or something more complex (to be defined). FIXME {CS Examples: completing certain priority files in a project, reaching a certain percentage of completion by a certain date, specific translators completing specific tasks.} The goal is information that is stored in the XLIFF file (this would be harder to do with PO files). FIXME {CS If this can't be stored in the main PO file header [note: consult gettext gurus on the TP list], how about adding extra translator comments [# ] to the first string in the file? These could be removed from the output file if required.} FIXME {DB Some goals relate to groups of XLIFF files so a bit hard to store inside.} FIXME {CS Meta-data. As long as goals are accessible as both meta-data and from inside specific files, both types of goal, specific and shared, can be used.}
  7. Exporting the file for off-line editing (same process as preparing the file for on-line editing with the Pootle editor). This process takes place on demand. If there is no glossary or TM information, or if the dates of the TBX or TMX files present in Pootle are later than the dates stored in the XLIFF file, redo the glossary and TM inclusion processes. Download or free the file for translation. Export “as it is” could also be an option. FIXME {DB clear mind needed} FIXME {CS Essentially, this export option substitutes for presenting the file to Pootle for online editing. The same processes have been applied. The file may be presented, or exported, in a variety of formats if required, and initialized as required, carrying the same internal standard and extra information. Exporting is only an alternative view of online editing. Include it with online editing, as an option, and there's no need to describe it separately.}
  8. Importing a file into Pootle after off-line editing. Pootle will first send the file to run specific tests (pre-configured for that language). These tests will include their results (if needed) in the XLIFF/PO files themselves. When the files are back, Pootle will regenerate the statistical data for those files, include it in the files and then generate statistical data for the Project. This should not be too expensive (CPU), as statistical data for each file will already be in the file (it does not need to be recalculated for files that were not touched). FIXME {CS Performance is important here. If a user imports a file for editing, s/he wants it available a.s.a.p for that purpose. Could any processes not essential to translation input be performed in parallel while the user inputs the first translations? Could viewing the stats be an alternative to starting the file straight away? Many translators will simply want to get on with the file. Managers, on the other hand, will want to collate information and check everything is OK.}
  9. On-line editing. The on-line editor works only on one file at a time, but it permits seeing or following the statistics of a whole project at once. FIXME {CS At the same time? Are the stats a pop-up? Abandoning your place in the file to look at the stats takes time and is thus inefficient. Incidentally, we need a way to mark places in a file, so you can return to a specific string.} After preparing a file (see above), a copy of it is passed to the editor. The Pootle editor is capable of working on the file, testing messages individually once they have been translated. It gives very specific information about the health of the file that the reviewer needs (compliance with the glossary, untranslated messages, problems detected by the tests, etc.). The editor can be configured to run only certain tests, or to show only their results (specific to a language or to a Reviewer). The editor also permits extracting glossary information that has been added by off-line (or other on-line) translation, approving it and exporting it to a TBX file (glossary proposals file). After the file is edited, it is returned to the file system. Tests are run again and statistics for the file and the Project are updated. FIXME {DB review stopped} FIXME {CS Post-editing processes? Export/committal/create TMX/store file and perform other action etc. Pootle presents options to the user at this stage.}
  10. Review. The process of review can be considered separate from the Pootle editor, given that one is a tool and another one a human process (which will probably use that tool). FIXME {CS Permission to perform different functions will be assigned by team/project leader. Mutual review is thus also possible. Pootle both displays review comments with the relevant string, and summarizes them for the user, much like msgfmt error output.}
  11. Export to translation memory. When the translation of a file or Project is approved by the Reviewer, its contents are added to the TMX file. In case there are new translations for messages that were already translated, the old translation will also be kept in the TMX file for reference (there should not be too many of these cases). FIXME {CS This is part of maintaining TM and glossaries. It must be possible to maintain more than one TM or glossary, for different purposes. Thus we must be able to add the reviewed file to specific TM or glossaries, more than one in fact. It would be particularly useful to be able to add sections of a file to specific TM/glossaries. For example, the long iso-codes files maintain a list of official names of languages, countries and states. Many program files also try to maintain separate lists, which they make us translate, despite being encouraged to use iso-codes as a plugin. A translator or team/project leader trying to maintain a TM or glossary of language/country/state names would want to add such sections of the program files to that specific TM/glossary.}
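The string-by-string carry-over in step 3 can be sketched as follows. This is a simplified illustration, not the planned implementation: units are plain dicts, only exact source matches are carried over, and the counter names are assumptions.

```python
def upgrade(old_units, new_sources):
    """Carry translations from an old file into a new template, string by string.

    old_units maps source text to a dict with "target" and "process" data;
    new_sources is the ordered list of source strings in the new template.
    Returns the new units and the header counters for the new file.
    """
    new_units = []
    for source in new_sources:
        old = old_units.get(source)
        if old is not None:
            # Exact match: keep the translation and its process history.
            unit = {"source": source, "target": old["target"],
                    "process": list(old["process"])}
        else:
            # New or changed string: starts untranslated.
            unit = {"source": source, "target": "", "process": []}
        new_units.append(unit)
    translated = sum(1 for unit in new_units if unit["target"])
    counters = {"total": len(new_units), "translated": translated}
    return new_units, counters


old = {"Open": {"target": "Abrir", "process": ["translated", "reviewed"]}}
units, counters = upgrade(old, ["Open", "Save as..."])
```

Strings that changed slightly between versions come out untranslated here; recovering them (for example through translation memory) is a separate step.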

For all this it is necessary to define which Process Information will be coded in the XLIFF, TMX and TBX files, and how this coding will take place. FIXME {CS We need to get input and feedback on this from both translators and team/project leaders. Can we display user-specific sets of this information? For example, a translator may not want to see most of that info., if focussing only on translating a part of a file during her/his lunch-break.}
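As an example of the kind of information step 4 would attach to a file, the sketch below selects the glossary entries that occur in the source messages and records which terms appear in each message. The glossary is a plain dict here rather than a TBX file, and matching is a case-insensitive whole-word search, which is an assumption that suits English source text only.

```python
import re


def glossary_subset(glossary, sources):
    """Return (used, per_message).

    used holds the glossary entries whose term occurs in any source message;
    per_message lists, for each message, the terms it contains.
    """
    used = {}
    per_message = []
    for source in sources:
        found = [
            term for term in glossary
            if re.search(r"\b" + re.escape(term) + r"\b", source, re.IGNORECASE)
        ]
        per_message.append(found)
        for term in found:
            used[term] = glossary[term]
    return used, per_message


glossary = {"file": "fichier", "save": "enregistrer", "print": "imprimer"}
used, per_message = glossary_subset(glossary, ["Save the file", "Open profile"])
```

Note that "profile" does not trigger the term "file", because only whole words are matched.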

Version Control

See also version control for specifics about how to implement version control and sharing.

In order to protect against data loss (vandalism, bad translations) we will use a version control system. The version control system will be used internally by Pootle and will remain transparent to a Pootle user, unless of course a system problem arises. Pootle being file based will work from the latest version. Files held under version control will allow users to see changes that have been made and allow users to revert to old versions.

NOTE The part of version control that relates to sharing data with upstream projects is a separate issue.

FIXME {CS SVN is an excellent choice, and extends to the key issue of currency of files. Different projects have different procedures for getting/committing files. Some use SVN, some use CVS, Debian (for example) uses CVS, SVN and email. We need to be able to interact with their procedures, to get and commit current files. How are we going to do that? Until we have currency, Pootle is not really part of the main flow of translation work, but is simply somewhere you can use if you have first, got the current file, then uploaded it, and are willing to download it and commit it manually afterwards. It's more of a bottleneck right now than something that facilitates the translation process. I want this to change.}

The Editor

Pootle includes a translation editor allowing translators to translate applications. Using AJAX we can make Pootle behave more like a desktop application than a web-based tool. Thus you could use Pootle as a replacement for a desktop translation tool, and it saves us having to work on functionality integration.

FIXME {CS This has real potential for Pootle. Otherwise, translators need to swap between translation interfaces. There's less mind-share associated with using only one interface. Extending Pootle to the desktop will make Pootle less work to use, and will improve opportunities for users with bad or unaffordable Net access.}

The translation interface needs to maximise space for the translation while giving useful feedback. FIXME {CS Yes! Don't clutter.} The following are included:

  • Translation widgets
  • Glossary lookup and flagger
  • Translation memory lookup
  • Context (Previous and Next translations)
  • Visual progress indicator
  • Parallel translations

Translation Widgets

Where you actually input the translations. The default would be:

  • Source text area
  • Your translation area
  • Comments area

The source text is uneditable unless you wish to make a change that will be reported as an error to the programmer (you will need to give a comment explaining why it should be changed). If Pootle cannot deliver your error report (unknown contact info) then it will store that information, so that others might benefit, until such time as it can be delivered to the programmers.

Source text is usually in English; however, many translators are not English speakers, so you can also display the source text in another language. If no translation is available in the other language then you can make use of some of the online translation tools, allowing you to see a rough translation of the English. You can also view multiple translations in multiple languages (see the parallel translations section below).

The widget for your language is of course always editable, allowing you to enter your text as needed. If you cannot type your script using your current keyboard then a character selector is available. However, character selectors are tedious, and a pointer to good instructions on how to get your language's input methods working is preferable.

FIXME {CS The introductory page, also available as or with the general information page for that language team, should include information on how to access the most effective input methods, keyboard layouts etc. for that language, for as many OS/distros. as possible. A translator spending what available time s/he has, inputting translations one character at a time, is incredibly frustrating and inefficient and plain unnecessary. It does happen, and we can avoid it by including this information, which would also maintain contact information, mailing lists etc., for discussing any input-related or other problems for that language and for the Pootle interface in general. Translators must be made to feel welcome, accepted and encouraged to ask questions and contribute.}

If you have created a translation that you know looks wrong to the untrained eye, for instance because you have adjusted it so that it is more effective in the running application, then the comment widget can be opened and you can give feedback to future translators. The same comment window can be opened to allow you to make a comment that is available to all translators. To avoid confusion we will separate these functions or create a clear visual cue, as you want global comments to be in English while a comment specific to your language can be in your own language. For instance, if you saw a piece of text such as “Select DOMAIN” and on investigation discovered that DOMAIN is a variable and should not be translated, then you can make a comment that will be shared with others. If possible Pootle will return general comments to the programmer for inclusion, while language comments will be embedded in the translation file; if these are upstreamed then they will be shared with non-Pootle translators.

AJAX is used so that when you select save you see the next translatable string quickly. Thus moving forward and backwards is quick and you do not wait for the page to refresh before you can translate the next item.

Glossary lookup and flagger

When you are translating, and Pootle detects a word that appears in the glossary, it will highlight that word in the source text, and place it and its equivalent in your language in the glossary lookup. So every word that has a glossary entry defined that appears in the source text will be there.

Pootle can reference multiple glossaries; the user or the language team can select which ones to enable and give them a priority order. If glossaries exist for your language these may be uploaded to the glossary server.
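The lookup described above could be sketched roughly as follows. This is only an illustration, not Pootle code: the function name `find_glossary_terms`, the data layout and the placeholder translations are all assumptions, and it only handles single-word terms.

```python
import re

def find_glossary_terms(source, glossaries):
    """Return (term, translation, glossary_name) for each glossary term in source.

    `glossaries` is a list of (name, {term: translation}) pairs in priority
    order; the first glossary that defines a term wins. An empty translation
    marks a term that still needs an equivalent in the target language.
    """
    words = set(re.findall(r"\w+", source.lower()))
    hits = []
    seen = set()
    for name, entries in glossaries:
        for term, translation in entries.items():
            key = term.lower()
            if key in words and key not in seen:
                seen.add(key)
                hits.append((term, translation, name))
    return hits

# Hypothetical glossaries; the "translations" are placeholders, not real terms.
team = ("team", {"file": "FILE_TEAM"})
standard = ("standard", {"file": "FILE_STD", "save": "SAVE_STD", "folder": ""})
matches = find_glossary_terms("Save the file to a folder", [team, standard])
```

Here the team glossary outranks the standard one for "file", and "folder" comes back with a blank translation so that the editor can show it as an entry still to be filled in.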

Using your arrow keys you can select a glossary word and have it copied to your current cursor position in your translation. The user may override which keys are used. By using AJAX we hope to make this seamless.

FIXME {CS This could be the same as the Mark Place feature we need. We must be able to see a list of live links of these different kinds of marks [Mark Place, Mark Unfinished, Mark Query, Mark Needs Review …], and to search by them. They could be categories in the main Mark list.} FIXME {DB Not sure exactly what CS wants here. Perhaps a method to push items into another process. Not sure if this requirement would go beyond the workflow ideas. Good idea though}

If no translation exists in your language, then these entries will also be marked, but in such a way as to show that you still need to add a word for this. Such entries will also be placed in the glossary lookup but as blank entries. In a similar way as you copied the glossary entry to your translation, you can highlight and copy a word from the translation file to the glossary. You may need to add more information for the glossary entry in which case a screen will appear allowing you to add this information. Your contribution to the glossary will still need to be reviewed by the glossary team.

If words or phrases appear in the source text that you feel should appear in the glossary you can highlight them and nominate them for inclusion.

Translation memory lookup

“How did I translate this before?” When you are presented with a piece of text, Pootle will also populate the TM lookup with potential translations. These will be your own, those of your team or those from other applications. You can choose the order in which these appear. For instance, if you trust another translator's work over your own translations, then you can make theirs appear first. The list will also give an indication of how well the text matched (less than 100%, since otherwise the match would have been automatically populated from TM/glossaries) FIXME {DB Should the translator not be required to sign off on a 100% match? Never trust a machine}, and you will probably need to make some editing changes.

By scrolling through the available translations in the lookup widget, you can view them and choose to copy one to your editing widget. You can at any time copy from the TM lookup widget, but this will overwrite your current editing.
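A minimal sketch of such a lookup, assuming the memory is a simple list of (source, target, origin) tuples and using Python's standard `difflib` for the similarity score. The function name `tm_lookup`, the `min_score` threshold and the origin labels are hypothetical; a real TM engine would likely use a different matching algorithm.

```python
from difflib import SequenceMatcher

def tm_lookup(source, memory, origin_priority, min_score=0.6):
    """Rank TM candidates: preferred origins first, then by similarity.

    Exact (100%) matches are left out here because the text above says they
    are populated automatically; only fuzzy matches are listed.
    """
    candidates = []
    for tm_source, target, origin in memory:
        score = SequenceMatcher(None, source.lower(), tm_source.lower()).ratio()
        if min_score <= score < 1.0:
            if origin in origin_priority:
                rank = origin_priority.index(origin)
            else:
                rank = len(origin_priority)  # unknown origins go last
            candidates.append((rank, -score, target, origin))
    candidates.sort()
    return [(target, origin, round(-neg, 2)) for _, neg, target, origin in candidates]

# Hypothetical memory; T1..T4 stand in for real translations.
memory = [
    ("Open the file", "T1", "mine"),
    ("Open a file", "T2", "team"),
    ("Open file", "T3", "team"),
    ("Quit", "T4", "other"),
]
results = tm_lookup("Open file", memory, origin_priority=["team", "mine", "other"])
```

With the team ranked above the translator's own work, the team's fuzzy match is listed first, the exact match is omitted and the unrelated entry falls below the threshold.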

Context

File location context

You need to see the translations before and after the active translation, in the file, so that you can add (if possible) to the context of your current translation. You can have a user specified amount of context (you might want to adjust this if you have a small screen).

Message application context

FIXME {DB This is related to context comments and queries}

Ideally, this will be supplied by the developer comments and translator comments displayed for that string. If not, a query widget could allow you to contact the developers and query the meaning of that string. FIXME {CS I think my favourite, among countless examples, concerns the huge GIMP PO file, which is completely bare of context. I queried the verbose string “H”. (That's it: the entire string.) The developer told me I should have enough experience of the application to know what that meant! (Hot? High? Horrible lack of context?) Getting context is not always straight-forward, so we need to be able to ask for it.}

Visual progress indicator

Across the top of the screen you can see the visual progress indicator. This looks like a ruler and shows you where you are located in the current translation file. It also highlights the blocks that are currently reserved for your translation, those that have been completed by you and those that have been reviewed and accepted. Your current translation speed together with those of your team members is shown. This could be your project team eg OpenOffice.org or your language team eg Zulu. Your translation speed ranking and other stats are displayed. If your file is one of many in a larger project then you will see in context of the larger body. If you are working on an abstract goal then your progress indicator will also indicate that progress.

Other statistics shown will be the ratio of the number of words in the glossary and those defined by your team.

Imagine a multicolour ruler that looks a bit like a slide rule. The slide part indicates your current place in the workload and the breadth of your reserved translations. Your reserved translations will appear in a greyed-out shade of your colour. Once you translate them they appear in a stronger colour, and once they have been reviewed you will see them in the full colour. The other people on your team will appear in other colours, so you can see a patchwork of completed work.

So at a glance you can see how fast you are, how well your team is doing and how the global effort is progressing.

FIXME {CS Is speed of translation really something we should be emphasizing? Surely quality is what we want. Especially for new translators, and for those who have little available time, comparisons of this kind will be intimidating, and discouraging. I strongly suggest that the display be configurable, so the translator can choose how much comparison, what kinds of information, s/he wants displayed. Completing any string as well as possible should be encouraged.}

FIXME {DB I think I now agree with CS on this. We need to look at this and try to create something that doesn't emphasise speed over quality}

Parallel Translations

Very often you are not sure of the exact meaning of a phrase even with context information. However, you might be able to read other languages. In which case it might be helpful to see how other people translated the text. For instance in OpenOffice.org all strings are translated into English and German by a fulltime translation coordinator. Thus the German translation could provide a very good alternate translation since it would have been validated by both the programmer and the coordinator.

The parallel translations appear above the source text but after the context information.

Live quality checking

Variable, tag and escape tracking

Translators often make errors by translating variables or leaving out XML tags. The tracker provides a list of these items and removes them once they are added to the translation. Thus the list should be empty before the translation can be completed.

Live alerts are needed for the above, and also for variables not matching, missing spaces at the beginning or end of the string, missing or extra \n (PO), missing end tags or broken tags (XML, XLIFF) [if any of these aren't covered by the above paragraph]. These are common errors. It saves time to spot them while translating.
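The tracker could work along these lines. This is a sketch under stated assumptions: the pattern below only recognises printf-style `%s`/`%d`, Python-style `%(name)s`, simple XML tags and a literal `\n`; real projects use many more placeholder styles, and the name `missing_items` is hypothetical.

```python
import re

# Assumed placeholder styles: %(name)s, %s, %d, XML tags and a literal "\n".
PLACEHOLDER = re.compile(r"%\([A-Za-z_]\w*\)[sd]|%[sd]|</?\w+[^>]*>|\\n")

def missing_items(source, target):
    """List the items found in source that are not yet present in target."""
    remaining = PLACEHOLDER.findall(source)
    for item in PLACEHOLDER.findall(target):
        if item in remaining:
            remaining.remove(item)
    return remaining

# The target strings used with this are sample text, not vetted translations.
SOURCE = "Delete <b>%s</b> from %(folder)s?\\n"
```

The translation is ready to be completed once `missing_items` returns an empty list. A translated variable name such as `%(gids)s` leaves `%(folder)s` in the list, which is exactly the error the tracker is meant to catch. Extra items that appear only in the target are not reported by this sketch.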

Start and end punctuation

These are also tracked live to ensure that they are correctly inserted. The checks are able to track different forms of punctuation for different languages, dialects and scripts.

Spelling and others

Some checks will not be implemented live. In this case they are checked when the translation is submitted or on request.

FIXME {CS It's important to realize that spelling-checking is not useful for some languages. In Vietnamese, which is a mono-syllabic language based on accented vowels which represent the tones that differentiate meaning in words, there are very few possible combinations of letters and accents which are not valid words. Only a grammar checker, or a configurable checker simply looking for phrases showing common errors, would be of any use.}

Test configuration

Some tests don't work in certain languages (e.g. spell checking in Vietnamese). The user can make configuration changes to switch off certain tests or to configure the operation of the test for that language.

Custom checks

A user can add custom checks (really just simple regex searches) which fail or pass depending on relationships between items found in the source or target string.
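One possible shape for such a check, as a sketch only: the relationship tested here is "the same items must appear in both strings", and the helper name `same_items_check` is an assumption rather than an existing Pootle function.

```python
import re

def same_items_check(pattern):
    """Build a check that passes when the regex finds the same items
    (ignoring order) in the source and in the target."""
    regex = re.compile(pattern)

    def check(source, target):
        return sorted(regex.findall(source)) == sorted(regex.findall(target))

    return check

# Two example custom checks a user might define.
numbers_match = same_items_check(r"\d+")
urls_match = same_items_check(r"https?://\S+")
```

For example, `numbers_match("Page 3 of 10", "Bladsy 3 van 10")` passes, while a target that drops the "10" fails. Other relationships (for instance "if X is in the source then Y must not be in the target") would need a different builder.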

Feedback to translators

Without leaving the interface, a translator can request feedback from another translator. The feedback is only sent if Pootle has the information required to request the feedback. If it does not, then the query is stored until such time as it has that information, and the translator is made aware of this status. The translator's message is sent via any number of media. The translator moves on, and can choose to mark the message as translated or untranslated. S/he will be flagged and taken to the message once the response is obtained.

FIXME {CS This will work very well where teams are concerned, but for a single translator it only reinforces the loneliness and lack of help. If no other translator is available via Pootle, perhaps the translator can input an external contact or mailing list to query instead.}

Themeable

Although themeability is a general requirement for Pootle, so that it can feel at home in anyone's project, in the editor there is a specific need for theming beyond looking good. We need to be able to theme the editor so that it is usable by people with disabilities, although most people using the interface will want some kind of customisability.

Specific themes with high contrast and large font size could cover disabilities.

The ability to customise the following is needed:

  • Colours (background, text)
  • Font sizes

Translators' Console

The console is the first place the translator goes to, and it acts as a hub for all his/her work. From here s/he can:

  • Enter any project and view files
  • See a list of all his/her teams, goals and rôles
  • See statistics for his/her teams and others
  • See his/her personal ranking and team ranks
  • Resume translating on a task
  • See feedback messages from programmers or reviewers, and go to the associated string
  • Join new teams or answer requests to join
  • See team and language messages
  • Manage requests for his/her team leader roles
  • See lists of new projects added to his/her and other Pootle servers

FIXME {CS

  • See a complete list of all files/projects to which s/he has been assigned, with their current status.
  • See very clearly in that list, and be able to display as a sub-set, all the files which have been updated and thus require editing.
  • Mark files with priorities, or rearrange them in a hierarchy (drag up/down a list).
  • Assign estimated time required to edit each file in that sub-set.
  • Choose a file to work on, from that sub-set, and be able to return directly to it at any time from the current file.

Thus the translator can oversee the status of all his/her files, distill a group of files needing current attention, classify them, and work on them one-by-one, while keeping an eye on the goal group.}

The Team Manager

All translators are in some way part of a team. Usually they would be in a team translating OpenOffice.org into Zulu or a team translating Fedora into Khmer. Some translators work alone but their output goes towards helping their language.

In Pootle we will see functionality that makes it much easier to work as a team and much easier to expand the concept of team.

  • User, languages and roles
  • Creating teams
  • Creating goals
  • Setting team goals or work spaces
  • Measuring team goals
  • A helping hand
  • A place to chat
  • Moving up
  • Defining process

Users, language and roles

Each user is required to log in to Pootle and no one is allowed to simply edit text (c.f. bug feedback in which it is allowed). Users specify which language they wish to translate. The language coordinator then authorises them and allows them to work on certain projects. Only once authorised can they contribute to their language on this instance of Pootle.

If they are the first user in their language then by default they are the language coordinator and are either approved by the project coordinator or the Pootle administrator.

Within the group of translators we also have reviewers. A reviewer reviews :). You also need to be authorised into this position. Some projects might set a project bar before you are eligible for reviewership (e.g. translated 1000 strings). Mutual reviewing could be authorised, or enabled as an optional process. Simply “Would you please cast your eye over those strings for me?” is very useful and catches typos and simple errors. This would be between translators of similar experience or translators who work together.

Creating teams

A team consists of more than one person. A team has a name and can have certain goals. There are one or more team leaders and they can change the goals, invite people onto a team or accept membership applications to the team. The team can set minimum entry requirements; this is useful in the case where there are many translators and the team leaders simply want to see a list of who would be eligible to join the team. Really it's not that complicated :) just simple mechanisms.

Each team can have an introductory page. Each existing team has an overall goal, as does each Translation Project. New translators coming to Pootle will read this page. New projects and languages will be asked for their goal. Some general goals could be shown as suggestions if required.

Teams without goals work on… nothing. So we need to create goals.

Creating goals

A goal is some target that you aim for. A goal can have the following criteria:

  • A delivery date
  • Certain files that must be completed to a given %
  • Certain directories that must be completed to a given %
  • They can span projects
  • They can have priorities within themselves (they might be able to contain subgoals)

Goals can be set by:

  • The upstream project maintainer
    • Minimum acceptance of translation eg 80% complete, all 1 & 2 word strings translated
    • Completed by this date
  • A language coordinator or team leader
    • We will do OpenOffice.org before Mozilla
    • Writer before Calc
  • An individual

Goals are by default shared if set by the upstream maintainer. They may be shared if they are set by a language coordinator. They are not shared if they are set by an individual.
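As a rough illustration, a goal with these criteria and sharing rules might be modelled as below. The class and field names are assumptions made for this sketch, not an existing Pootle data model, and the example date and paths are made up.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class Goal:
    name: str
    set_by: str                          # "upstream", "coordinator" or "individual"
    delivery_date: Optional[date] = None
    targets: dict = field(default_factory=dict)   # file or directory -> required %
    subgoals: list = field(default_factory=list)  # nested Goal objects, in priority order
    share: bool = False                  # only consulted for coordinator goals

    def is_shared(self):
        """Upstream goals are always shared, coordinator goals optionally,
        individual goals never."""
        if self.set_by == "upstream":
            return True
        if self.set_by == "coordinator":
            return self.share
        return False

    def is_met(self, completion):
        """`completion` maps file or directory names to the % translated."""
        own = all(completion.get(path, 0) >= pct for path, pct in self.targets.items())
        return own and all(goal.is_met(completion) for goal in self.subgoals)

# Hypothetical goal set by an upstream maintainer.
release = Goal("release", "upstream", date(2007, 1, 31), {"writer/": 80, "calc/": 80})
```

Because targets are just names mapped to percentages, a goal can span projects simply by listing files from more than one of them.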

Setting team goals or workspaces

You have a team and you have a goal or set of goals. You need to associate your team with a goal. Once that is in place then the associated priorities within the goal will be used to determine what work is to be done by your team.

We define a workspace as a work area with no associated goal. So a workspace could be KDE, GNOME, etc., or a custom selection of projects and files. There may be some priorities but generally a translator or Pootle is free to choose what to translate. You could think of a translator who is not part of a team as having a workspace of anything that is on the Pootle server (if they have the correct rights). So a workspace is a goalless set of files. A workspace can thus be associated with a team, i.e. the KDE workspace could belong to the KDE team. Anyone who is part of the KDE team can translate in the KDE workspace. The KDE team itself might set some other goals that need first priority. Once these goals are met the team can work on anything within the workspace. This also has the side effect of protecting KDE workspace files from arbitrary translation. It would also allow meshing with the concept of teams as defined by the KDE process itself.

FIXME {DB needs thought on mutually exclusive idea} Goals or workspaces need not be mutually exclusive. But rights dictate what a person can do to certain files.

Measuring team goals

Goals can be measured. We would like to give feedback to participants so that they are motivated, and we'd like to give feedback to the language manager as to whether the goal is attainable; if it is not, they can adjust their goals.

Typical feedback would be:

  • At your current rate you will complete the task on such a date
  • You will miss the deadline
  • You need N more translators to make your target
  • These are the speeds of the various translators
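The feedback items above amount to simple arithmetic on the remaining work and the translators' rates. A minimal sketch, assuming rates are measured in strings per day and that any new translator would work at the team's current average rate (both assumptions of this example, as is the function name):

```python
import math
from datetime import date, timedelta

def project_goal(remaining, daily_rates, today, deadline):
    """Estimate when a goal will be completed.

    remaining   -- strings still to be translated
    daily_rates -- {translator: strings per day}; the total must be positive
    """
    team_rate = sum(daily_rates.values())
    finish = today + timedelta(days=math.ceil(remaining / team_rate))
    on_time = finish <= deadline
    days_left = (deadline - today).days
    extra = 0 if on_time else None  # None: the deadline can no longer be met
    if not on_time and days_left > 0:
        # Assumption: each new translator works at the current team average.
        average = team_rate / len(daily_rates)
        needed_rate = remaining / days_left
        extra = math.ceil((needed_rate - team_rate) / average)
    return {"finish": finish, "on_time": on_time, "extra_translators": extra}

# Hypothetical team: 1200 strings left, two translators, 20 days to go.
report = project_goal(1200, {"ann": 30, "bob": 10}, date(2007, 3, 1), date(2007, 3, 21))
```

In this made-up case the team would finish ten days late and would need one more translator working at the team's average rate to make the target.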

A helping hand

Similar to Project Gutenberg's Distributed Proofreaders, it is possible to place new translators into a beginners section. This would be a standard named workspace which the language coordinator can move files into. Typically these would be less used files such as games, incidental toys and non-standard apps. The idea is that any translation performed in this area is marked as beginner work. Reviewers are encouraged to give positive feedback and to explain corrections that they have made to the translator's work.

FIXME {CS The more you can follow the example of DP, the better. They have an excellent compound process, a really enthusiastic and supportive community, and produce a huge amount of work from people contributing their bits and pieces of available time. I was with DP before my brain damage got to the point that I couldn't reliably pick errors in English, and I can't think of a better example for Pootle. Encouraging messages, plenty of support information and personal contact is essential to their success. Much of this is based around their forums, but they also have some really ingenious and effective procedures. If we can combine experienced translators and managers with new people as well as they do, Pootle will flourish and not just incidentally, provide an opportunity for people with possibly little experience and available time, to contribute something worthwhile and form an enjoyable community. Contact via Jabber IM, including groupchats, some used as tutorials for specific techniques, was a very successful step for DP.}

A place to chat

Teams can discuss work, place calls for people to join and post news of targets reached. This is not a direct component of Pootle, but the login should be seamless and shared between Pootle and the discussion boards.

FIXME {CS As above. We can use the forums also to provide information, FAQs etc. Tutorials or specific discussions via Jabber or IRC could be scheduled. Forums for specific uses will be useful. Certainly own-language forums should be possible. A good deal of the team communication and translation queries could take place in the forums. Links to the forums, and to specific information, could be on each Team Page, and introductory pages. Encourage people to compete with each other if they want to (this works extremely well for DP), and to form teams which don't necessarily mirror language or project groups. Number of strings translated can be used for fun competitions, and to encourage pride in one's achievements and in one's team. See DP for more details. These processes break down barriers, and encourage a lot of contribution.}

Moving up

By tracking the number of translations performed and by monitoring quality of translation it is possible to automatically create a status measure. This can be used to automatically move people into different classes of translator. The team can choose to make this automatic, semi-automatic (requires human review) or non-automatic (ideal for existing teams with established roles).

FIXME {CS This could be part of the voluntary competition/encouragement mentioned above. Celebrate achievement: build fun and interest around it. Encourage people to try for certain goals, and reinforce their progress.}

Defining Process

The translation process is fixed within reason. There are a few that are defined:

  • Same reviewer (only one translator)
  • Named reviewer (only one person reviews)
  • Team Reviewer (Any of N people can review)
  • Multiple reviews (Translations go through multiple review stages for critical projects)

By keeping the flows fixed we can cover the needs of most teams without having to implement a large workflow system.

However, it should be relatively easy to implement any variation on the theme.

The language coordinator or team leader will choose which process to follow and place people into the different roles.
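Keeping the flows fixed means each process can be described by a couple of parameters rather than a general workflow engine. The sketch below is one possible reading of the four processes listed above; the names, the two-stage count for multiple reviews and the rule that reviewers do not review their own work are all assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReviewProcess:
    name: str
    stages: int                # number of review passes before acceptance
    self_review: bool = False  # the translator reviews his/her own work

PROCESSES = {
    "same": ReviewProcess("Same reviewer", stages=1, self_review=True),
    "named": ReviewProcess("Named reviewer", stages=1),
    "team": ReviewProcess("Team reviewer", stages=1),
    "multiple": ReviewProcess("Multiple reviews", stages=2),  # stage count assumed
}

def can_review(process, user, translator, reviewers):
    """May `user` review a string translated by `translator`?"""
    if process.self_review:
        return user == translator
    return user in reviewers and user != translator

def is_accepted(process, reviews_done):
    """A translation is accepted once it has passed every review stage."""
    return reviews_done >= process.stages
```

In this reading the named and team processes differ only in how many people the coordinator places in the `reviewers` set, which is why a variation on the theme stays cheap to add.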

The Upstream Project's Friend

It is critical that it is easy for upstream projects to add their work to Pootle and to get translations from Pootle.

They need to be able to:

  • Define information about the project, goals, branches, etc
  • Upload work
  • Integrate Pootle into their website
  • Receive notification when targets are reached
  • Retrieve work that meets target
  • Receive feedback for their programmers

Addressing upstream projects' concerns

Upstream projects usually have these concerns over tools like Pootle:

  • currency of files, and
  • quality of work.

The first will be integrated into the Pootle workflow, and the second is shown in our registration and supervision procedures.

Defining the project

Projects have names, descriptions, websites, etc. This data needs to be made available to translators so that they understand the project and can thus decide if they want to translate the software.

Other data would include:

  • Licensing details (Who has copyright for the translations? Is the software all open-source? Where can we check this information?)
  • Project translation procedures, howtos, status display, contacts and mailing lists, and a brief description of how the project works.

The project can be defined in Pootle using a web interface, or using a flat file that is stored in the project's CVS and submitted to Pootle with a simple updating tool. The second option makes it possible to update descriptions as part of a build process.

Also in these files would be descriptions of branches, thus allowing stable and unstable branches for translation. Or allowing legacy branches to remain and be updated.
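The format of the flat file is not specified here. Purely as an illustration, it could be an INI-style file read with Python's standard `configparser`; every section name, key and value below is hypothetical.

```python
import configparser

# Hypothetical project description file, as it might be kept in the project's CVS.
EXAMPLE = """
[project]
name = ExampleApp
description = A hypothetical application used for illustration
website = http://example.org
licence = GPL

[branch:stable]
path = branches/stable

[branch:trunk]
path = trunk
"""

def load_project(text):
    """Parse a project description into a dict with a nested 'branches' dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    project = dict(parser["project"])
    project["branches"] = {
        section.split(":", 1)[1]: dict(parser[section])
        for section in parser.sections()
        if section.startswith("branch:")
    }
    return project

project = load_project(EXAMPLE)
```

A build script could regenerate such a file and pass it to the updating tool, so that the description and the list of branches on Pootle stay in step with the source tree.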

Upload work

At any time the project coordinator can upload new translatable files. These are merged with existing files on the named branch. When merges are performed teams or translators associated with these files are informed of this status change. This is via email or an IM protocol.

Similarly to the information update a script can be used to upload the latest translations. This would allow for instance a nightly build to upload translations to Pootle.

The important idea is that Pootle works with their current strategies, e.g. CVS scripts, to make it easier for the project to adopt Pootle. In some ways they would be using Pootle to help them manage their files, i.e. to enhance what they already do.

Pootle integration into websites

Pootle is themeable. Thus even though the Pootle server is hosted elsewhere it should be possible to theme and template Pootle so that it looks like the upstream project's own server.

Notification of target attainment

When a team meets a target, an email or IM message is sent to the project liaison to inform them that a certain language has met the target requirements. They can then take action or write scripts to take automatic action and import translations.

This is also automatically announced on the appropriate forum, and shown as a newsflash on an entry page, since any goal met is an achievement and the credit creates community.

Retrieve work that meets target (Optional)

Similar to the upload process, it is possible for a project liaison to download all translations for languages that meet a certain target. This could be the publicly stated target or could be a hidden target. It is also possible for them to override this consideration if a certain language specifically requests inclusion of their translations for beta testing.
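The selection itself is a simple filter. A sketch, assuming completion is tracked as a percentage per language code (the function name and the figures are made up):

```python
def languages_to_retrieve(completion, target, overrides=()):
    """Languages whose completion meets the target, plus any that the
    liaison has explicitly asked to include (e.g. for beta testing)."""
    return sorted(
        lang for lang, percent in completion.items()
        if percent >= target or lang in overrides
    )

# Hypothetical completion figures, in percent.
completion = {"zu": 92, "km": 75, "vi": 81}
```

With a target of 80, `languages_to_retrieve(completion, 80)` selects Vietnamese and Zulu; passing `overrides={"km"}` adds Khmer even though it is below the target.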

Submit completed work

In order not to be another cog in the works, Pootle must be able to perform the roles that certain projects require of translators: that is, sending files via email or submitting files through SVN/CVS (to achieve this, we might sync to the machine of the person with CVS access and allow them to then perform the correct submission of completed work). The main aim of this is to make integration of Pootle seem seamless to an upstream project, or to appear as if there is no connection at all, making adoption easier.

The long term aim is to have features in place that allow Pootle to be integrated into the workings of current projects with few or no changes.

Feedback to programmers

Programmers can write cryptic messages and also forget to define what data will be contained in a variable. The feedback mechanism, which operates from the editor, allows Pootle to send requests for clarity to programmers. The programmer either adds this to the code or replies with clarification or does both. This occurs via email or IM. The clarification is attached to the message and forwarded to the requester. This reduces the number of queries and focuses the programmer on providing comments where needed. It also plays a useful role in educating programmers about localisation needs.

The widget can optionally supply a template string, to save time, and also suggests appropriate ways to make or phrase queries.

Pootle will manage unsuccessful queries: address not known, auto-responders.

The Collaborator

Pootle needs to work with existing systems but mostly it needs to work between itself. Pootle needs to share TM and TBX data between instances. When Pootle is run in a standalone mode it needs to be able to reserve and download files for offline localisation and resubmit those entries.

See also: version control

Things that Pootle instances do during this are:

  • Reserving blocks of translations for specific times
  • Transferring TBX and TMX data related to the section of work
  • Transferring all TBX and TMX data if required
  • Doing live queries between Pootle instances for relevant TBX and TMX data
  • Allowing authentication of users who are not registered on the server
  • Instituting a role of master and slave to allow translators to work on one server yet upstream the data to a master server
  • Going online and resynchronizing data with a master or slave

Things that Pootle and upstream are doing are:

  • Checking in new translations
  • Updating and merging translations from upstream
  • Communicating upstream in their preferred method: CVS, SVN, email

Pootle to improve quality

Translation quality relates to continuous improvement of existing translations, as well as improvement and quality assurance of TM and TBX data. By managing changes that increase quality, we can filter improvements back into the existing work.

Measuring Improvement

This is another type of achievement we will emphasize; the number of strings added to glossaries/TMX and assessed as quality translations. Improving the shared resources should be seen as equally or even more important than simply completing translation strings in a file. It's not easy to compare these things, but the more we emphasize and appreciate effort, the more will be contributed.

Users will be able to see graphically how much of the global standard glossary is translated, reviewed and approved. The same would apply to the global language TM.

Changes flag backwards

If an improvement or correction is made to the TM or to the glossary, it must be possible to filter it back into the original texts. Changes to TM would require comments so that translators can understand what has been changed and why. The original text would be marked for review with a note on the change in the TM or glossary. These are then sent straight to the review step in the translation process. Some teams are small or operate very independently, so this step of jumping straight into review should be optional. Marking the item for review will also not take the current translation out of service, just as is normal in the translation process. Optionally, the person improving the TM or glossary could review all the changes themselves, if they have sufficient rights.
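A sketch of how a glossary change might be flagged backwards, assuming translation units are plain dicts with `source`, `target` and `needs_review` keys (a layout invented for this example, as are the function name and the placeholder terms):

```python
def flag_for_review(units, term, old, new, comment):
    """Mark every unit that uses the old rendering of a glossary term.

    The existing target text is left untouched, so the current translation
    stays in service until a reviewer deals with it.
    """
    flagged = []
    for unit in units:
        if term.lower() in unit["source"].lower() and old in unit["target"]:
            unit["needs_review"] = True
            unit["review_note"] = (
                f"Glossary: '{term}' changed from '{old}' to '{new}'. {comment}"
            )
            flagged.append(unit)
    return flagged

# Hypothetical units; OLD_FILE and NEW_FILE stand in for real translations.
units = [
    {"source": "Save the file", "target": "xx OLD_FILE xx", "needs_review": False},
    {"source": "Close window", "target": "yy", "needs_review": False},
]
flagged = flag_for_review(units, "file", "OLD_FILE", "NEW_FILE", "Agreed by the glossary team.")
```

The note carries the comment that explains the change, so a reviewer opening the flagged unit can see what was changed and why.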

Glossary cultivation

In order for these resources to be useful they require continual weeding. The glossaries need to be reviewed to ensure that words are consistent. Changes to glossary terms should flag all instances where that word is used. The glossary review process also allows glossary managers to review terms that translators have suggested be added to the language glossary.

The glossary creation should be able to suggest potential glossary words based on frequency of words within a project/domain. The glossary words that need to be supplied should be only those that occur within the project that is being translated. This process should happen before translation commences.
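A frequency-based suggestion step could look roughly like the following sketch. The stopword list, the minimum count and the function names are assumptions, and only single-word terms are handled; the specification does not prescribe an algorithm.

```python
import re
from collections import Counter

# A tiny, assumed stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "to", "of", "in", "and", "is", "this", "that"}


def tokenize(message):
    """Split a message into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", message.lower())


def suggest_terms(messages, existing_glossary=(), min_count=2):
    """Suggest frequent project words that are not yet in the glossary."""
    counts = Counter()
    for message in messages:
        for word in tokenize(message):
            if len(word) > 2 and word not in STOPWORDS:
                counts[word] += 1
    known = {term.lower() for term in existing_glossary}
    return [(word, n) for word, n in counts.most_common()
            if n >= min_count and word not in known]


def glossary_for_project(messages, glossary):
    """Keep only the glossary entries whose term occurs in the project."""
    project_words = set()
    for message in messages:
        project_words.update(tokenize(message))
    return {term: translation for term, translation in glossary.items()
            if term.lower() in project_words}


messages = ["Open the file", "Save the file", "Close the window",
            "Open a new window", "Print"]
glossary = {"file": "fichier", "printer": "imprimante"}

print(suggest_terms(messages, existing_glossary=glossary))
print(glossary_for_project(messages, glossary))
```

The first function produces candidates for a reviewer to accept or eliminate; the second limits the supplied glossary to words that actually occur in the project.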

The top-level glossary reviewers are also in a position to accept new words into a language or standard glossary. They would, for instance, create or examine the suggested glossary words, which are extracted automatically when a new project is uploaded. The reviewer will eliminate words that should not be in the glossary. They also review words that translators suggest should be in the glossary. These reviewers are often people separate from the translators or language-specific glossary maintainers.

As a glossary is very useful, and we have mechanisms to flag glossary changes within the translations, we make the glossary available at all stages. The user might see that an item has or has not been reviewed, but they always see the complete glossary.

TM management

TM management has many similarities with glossary management. Every time a translator translates something they create new TM data. The TM management allows translators to search for the occurrence of certain words or phrases and mark these for review or correction. Some of this is automated from the glossary management role.

The TM manager can also rate contributions, placing a professional translator's work on a higher footing or downgrading a translator for consistently incorrect translations.

The TM management also allows a translator to monitor contributions as they are made during a translate@thon. They can quickly correct spelling or grammar and provide feedback as needed.

Incorrect translations should be corrected at source. Translations that are sourced from other Pootle servers should be marked as incorrect and that information fed back to the source Pootle server.

The TM manager should be able to track the quality of translations from a certain translator and observe their skill level. This allows differing levels of feedback depending on how green the translator is. TM feedback should go through the normal feedback mechanism and be private, to encourage and not embarrass contributors, and to offer it as a learning experience in the same vein as PGDP.net.

Translate Toolkit

The Translate Toolkit is a set of command-line tools used by Pootle, but also used on their own by many developers. The toolkit will continue in its own right but offer more functionality.

Format Interchangeability

Currently the Gettext PO format is the basis of all localisation in FOSS. However, the XLIFF standard from OASIS is an emerging standard. Therefore the toolkit will be abstracted so that either PO or XLIFF can be used. This allows the tools to continue being useful for localisers while they remain on PO, but also allows migration to the richer XLIFF standard.
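The abstraction could be pictured as a common interface that both formats implement, so that a tool never needs to know which one it is reading. The classes below are hypothetical and are not the toolkit's actual API; the PO parser only handles single-line msgid and msgstr entries, and the XLIFF reader assumes the XLIFF 1.2 namespace.

```python
import re
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class TranslationStore(ABC):
    """Hypothetical common interface for PO and XLIFF files."""

    @abstractmethod
    def units(self):
        """Yield (source, target) pairs."""


class POStore(TranslationStore):
    # Deliberately minimal: single-line msgid/msgstr pairs only.
    _ENTRY = re.compile(r'msgid "(.*)"\nmsgstr "(.*)"')

    def __init__(self, text):
        self.text = text

    def units(self):
        for source, target in self._ENTRY.findall(self.text):
            if source:  # an empty msgid is the PO header, not a message
                yield source, target


class XLIFFStore(TranslationStore):
    NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}  # assumed version

    def __init__(self, text):
        self.root = ET.fromstring(text.strip())

    def units(self):
        for unit in self.root.iterfind(".//x:trans-unit", self.NS):
            source = unit.findtext("x:source", default="", namespaces=self.NS)
            target = unit.findtext("x:target", default="", namespaces=self.NS)
            yield source, target


def untranslated(store):
    """A format-independent tool: list the sources that have no translation."""
    return [source for source, target in store.units() if not target]


PO_TEXT = '''msgid ""
msgstr ""

msgid "Open"
msgstr "Ouvrir"

msgid "Save"
msgstr ""
'''

XLIFF_TEXT = '''
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="demo" source-language="en" target-language="fr" datatype="plaintext">
    <body>
      <trans-unit id="1"><source>Open</source><target>Ouvrir</target></trans-unit>
      <trans-unit id="2"><source>Save</source></trans-unit>
    </body>
  </file>
</xliff>
'''

print(untranslated(POStore(PO_TEXT)))
print(untranslated(XLIFFStore(XLIFF_TEXT)))
```

Because untranslated only relies on units(), the same check runs against either format, which is the property that lets localisers stay on PO while others migrate to XLIFF.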

New formats

There are no plans in the WordForge project to add new formats, but any consideration of which new formats to add would be based on:

  1. Need - does someone need it now
  2. Impact - would this make a lot of new things localisable, and
  3. Effort - how hard would it be to add this format

Converters

The toolkit has specific converters to process files from one domain to another, e.g. Mozilla to PO. The converters are being abstracted in such a way that the formats are pluggable, allowing any format to be converted without having to create a converter helper application. This should make it possible to create a converter by simply creating a class that understands the format, e.g. FrameMaker, with a layer that can supply the data required by the converter. The result is a converter from FrameMaker to PO or XLIFF.
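The pluggable design might be sketched as a registry of format classes plus one generic convert function. Everything here is hypothetical (the FORMATS registry, the register decorator and the read and write method names), and a simple key=value properties reader stands in for a more complex format such as FrameMaker.

```python
# Hypothetical registry: format name -> object that understands the format.
FORMATS = {}


def register(name):
    """Class decorator that plugs a format into the registry."""
    def decorator(cls):
        FORMATS[name] = cls()
        return cls
    return decorator


@register("properties")
class PropertiesFormat:
    """Reads simple key=value files into (location, source) units."""

    def read(self, text):
        units = []
        for line in text.splitlines():
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, value = line.split("=", 1)
                units.append((key.strip(), value.strip()))
        return units


@register("po")
class POFormat:
    """Writes (location, source) units as untranslated PO entries."""

    def write(self, units):
        entries = []
        for location, source in units:
            escaped = source.replace("\\", "\\\\").replace('"', '\\"')
            entries.append(f'#: {location}\nmsgid "{escaped}"\nmsgstr ""\n')
        return "\n".join(entries)


def convert(text, source_format, target_format):
    """Generic converter: no helper application is needed per format pair."""
    units = FORMATS[source_format].read(text)
    return FORMATS[target_format].write(units)


sample = 'open.label=Open\n# a comment\nsave.label=Save "As"\n'
result = convert(sample, "properties", "po")
print(result)
```

Under this design, supporting a new input format means registering one more class with a read method; an XLIFF writer registered alongside the PO writer would make every reader convertible to XLIFF as well.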

Embracing other converters

There are other converters that exist for some formats not covered by the toolkit (e.g. po4a, which can transform a number of FOSS-specific file types to PO). One option is to reuse the code or wrap such tools so that the toolkit can use those formats. One problem is that they might be PO-specific and not be able to do XLIFF. The same list of determinants applies here as it does to adding new formats.

Understanding the complications of new languages

Some comments show that filters can be improved with language-specific information. An example is in this Language Specific Comments page.

Off-line Translation Editor