Translate Toolkit & Pootle

Tools to help you make your software local

Google Summer of Code 2009 - Ideas

The following are project ideas for Google Summer of Code 2009. The project has applied to be accepted as a mentoring organisation, and these are the ideas we have gathered together.

Our main aim has been to identify work that can be completed in 3 months, that is useful to the project, and that is also challenging and interesting.

Our software includes:

  • Pootle - this is an online translation and translation management tool used by many projects including OpenOffice.org and Creative Commons.
  • Virtaal - this is a desktop translation tool built for productivity and ease of use. Although it is new, we've had some very good feedback, and it builds on the powerful API of the Translate Toolkit.
  • Translate Toolkit - this is a set of tools and libraries that convert files into translatable formats and allow those formats to be managed and manipulated. The other tools use it as a library for managing and manipulating translatable formats.

Generic skill requirements:

  • Python - you must be able to code in Python. Experience in another OO language will help, although coming from another language might make the work harder for you.
  • Experience in computational linguistics is useful for some projects, but most have no specific linguistic requirements.
  • Experience in localisation is helpful as you can then understand the needs of a localiser.

Included in each project are:

  • A grading
  • A description of the task
  • Where to poke in the code
  • Further reading
  • Possible mentors

If you want to discuss any of these projects, try us on IRC in #pootle on freenode.net or mail the Translate development mailing list.

If you want to apply as a student, you may also want to check out the official student application guide from Google.

You might also want to take a look at the list of ideas from 2008.

Complete the Mac port of Virtaal

Grade: Medium

Description: Virtaal already runs on the Mac, but there are several areas where we need to improve. Packaging has received some attention but is not finished, several platform-specific bugs still exist, and integration with the platform could be better.

You would need to improve our code that generates the application bundle, and look at platform issues such as file associations and user interface translations, as well as integration issues such as the dock. Some features don't yet work correctly on the Mac, and some dependency issues might need to be considered.
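One small, self-contained piece of the bundle work is declaring file associations. macOS reads these from the CFBundleDocumentTypes key in the bundle's Info.plist. The sketch below builds that fragment with plistlib. The list of document types and extensions is an assumption for illustration, not Virtaal's actual configuration.

```python
import plistlib

# Hypothetical list of formats Virtaal might register for; adjust as needed.
DOCUMENT_TYPES = [
    ("Gettext PO file", ["po", "pot"]),
    ("XLIFF file", ["xlf", "xliff"]),
    ("Qt Linguist file", ["ts"]),
]

def document_types_plist():
    """Return an Info.plist fragment (as bytes) declaring the file associations."""
    entries = [
        {
            "CFBundleTypeName": name,
            "CFBundleTypeExtensions": extensions,
            # "Editor" tells the system the app can open and save these files.
            "CFBundleTypeRole": "Editor",
        }
        for name, extensions in DOCUMENT_TYPES
    ]
    return plistlib.dumps({"CFBundleDocumentTypes": entries})
```

With py2app, a dictionary like this would normally be passed through the `plist` option in setup.py rather than written out by hand.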

You might also want to look at providing Virtaal and the Translate Toolkit through MacPorts or similar distribution systems as a bonus.

Poke the code:

Further reading:

Possible mentors:

Custom GUI for spell checking in Virtaal

Grade: Medium

Description: Virtaal already provides spell checking by means of Gtkspell. Gtkspell has limited support for Windows and OSX, and is not really extensible. We would like to give users richer functionality, for example spell checking only translatable text and ignoring accelerators.

Your task would be to implement a GUI for spell checking that, as a start, offers the same level of functionality as Gtkspell (using enchant, supporting the personal word list, and providing suggestions on right click). You would then add support for ignoring accelerators and for defining the regions to be spell checked.

A successful candidate will probably look into the API in the Translate Toolkit for dealing with placeables to ensure that only translatable text is passed to the spell checker.
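The accelerator part of this work can be sketched independently of the GUI. The function below assumes that the checker is any callable taking a word and returning True if it is spelled correctly, and that the accelerator marker is a single character such as "&" or "_". It removes the marker before checking each word and reports the character ranges to underline. Placeables such as variables are not handled here, and the function name is hypothetical.

```python
import re

def misspelled_regions(text, check, accelerator="&"):
    """Return (start, end, word) for each word that `check` rejects.

    An accelerator marker inside a word (e.g. "&File") is kept as part of the
    match, so the underline covers it, but it is removed before checking.
    """
    word_pattern = re.compile(r"[\w%s]+" % re.escape(accelerator))
    regions = []
    for match in word_pattern.finditer(text):
        word = match.group().replace(accelerator, "")
        if word and not check(word):
            regions.append((match.start(), match.end(), word))
    return regions
```

If pyenchant and an en_US dictionary are installed, `enchant.Dict("en_US").check` could be passed as `check`.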

A magnificent success would be integration with the MS Office spell checker over COM (we have Python code to do that) and/or integration with the platform spell checker on OSX (Enchant has some initial support for this, although without build scripts).

Poke the code:

Further reading:

Possible mentors:

Visual string differences for Virtaal

Grade: Medium

Description: Virtaal has a powerful system for providing suggestions from several translation memory sources. Pootle shows suggestions by indicating how they differ from the current translation. A useful improvement to the Virtaal GUI would be to highlight the differences between the current text and a suggestion, marking insertions, deletions, etc. This way a translator can instantly see which parts of the suggestion need to be adapted before reusing it. The same display would also be useful for strings from other sources.

You would need to teach the TM widget in Virtaal and/or Pootle how to highlight differences using something like difflib. (Pootle already does this for suggestions.) This will involve some coding with GTK+ (Virtaal) and/or HTML (Pootle) to present the differences in a pleasing way to users.
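A minimal sketch of the difflib approach is shown below. It diffs at word level rather than character level, which is an assumption about what reads best to translators. It splits each string into words, whitespace runs, and punctuation, labels the pieces as equal, deleted, or inserted, and renders them as HTML with `<ins>` and `<del>` in the way Pootle might. A GTK+ widget would map the same labels to text tags instead.

```python
import difflib
import html
import re

def tokenize(text):
    """Split into words, runs of whitespace and single punctuation marks."""
    return re.findall(r"\w+|\s+|[^\w\s]", text)

def diff_segments(current, suggestion):
    """Return (tag, text) pairs; tag is 'equal', 'delete' or 'insert'."""
    a, b = tokenize(current), tokenize(suggestion)
    segments = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
        if tag == "equal":
            segments.append(("equal", "".join(a[i1:i2])))
            continue
        if i1 < i2:  # 'delete' or 'replace': text only in the current string
            segments.append(("delete", "".join(a[i1:i2])))
        if j1 < j2:  # 'insert' or 'replace': text only in the suggestion
            segments.append(("insert", "".join(b[j1:j2])))
    return segments

def to_html(segments):
    """Render the segments with <ins>/<del> markup, escaping the text."""
    markup = {"equal": "%s", "delete": "<del>%s</del>", "insert": "<ins>%s</ins>"}
    return "".join(markup[tag] % html.escape(text) for tag, text in segments)
```

For example, comparing "Open the file" with "Open a file" renders as `Open <del>the</del><ins>a</ins> file`.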

Next, you should implement support for showing the previous source text for formats that store it (#| msgid in PO, for example). Support for <alt-trans> in XLIFF should then be easy to add.
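In a PO file, the previous source text of a fuzzy unit is kept in `#| msgid` comment lines, which may continue over several `#| "..."` lines. The hand-rolled reader below is only a sketch of what has to be extracted. In practice the Translate Toolkit's own PO parser would presumably be the place to do this. It assumes that PO string escapes are close enough to Python's for `ast.literal_eval` to decode them, which holds for common escapes such as `\n`, `\t` and `\"`.

```python
import ast

def previous_msgid(entry_lines):
    """Return the previous source text from '#| msgid' comments, or None."""
    parts, collecting = [], False
    for line in entry_lines:
        if line.startswith("#| msgid "):
            collecting = True
            parts.append(line[len("#| msgid "):])
        elif collecting and line.startswith('#| "'):
            # Continuation line of a multi-line previous msgid.
            parts.append(line[len("#| "):])
        else:
            collecting = False
    if not parts:
        return None
    # Each part is a quoted string; decode and concatenate them.
    return "".join(ast.literal_eval(part) for part in parts)
```

The recovered text can then be compared with the current msgid to show the translator what changed in the source.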

To ensure a uniform experience, Pootle and Virtaal should present differences in a similar way, for example with similar colours.

Poke the code:

Further reading:

Possible mentors:

Improved interactivity for Pootle

Grade: Medium

Description: Pootle has proved to be a simple system, popular with many small language teams and several projects. Unfortunately, online translation has always been a bit slow because of network latency, especially in countries with poorer internet connectivity. Adding some clever AJAX to some pages will help make Pootle feel much more interactive, and might even reduce the load on the server a bit.

A start to this project could be to provide Pootle statistics in JSON form, so that AJAX code (and other clients) can obtain them from Pootle easily. Pages showing statistics could then check whether the stats are available when the page is built. If they are not, the page could include an AJAX call that fetches them later, so that the page itself is still sent to the user quickly.
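A statistics response could look something like the output of the sketch below. The JSON field names and the three unit states are assumptions for illustration, not an existing Pootle API. The input is a list of (state, source word count) pairs.

```python
import json
from collections import Counter

STATES = ("translated", "fuzzy", "untranslated")

def stats_json(units):
    """Summarise (state, words) pairs as JSON with unit and word counts."""
    unit_counts, word_counts = Counter(), Counter()
    for state, words in units:
        unit_counts[state] += 1
        word_counts[state] += words
    stats = {
        state: {"units": unit_counts[state], "words": word_counts[state]}
        for state in STATES
    }
    stats["total"] = {
        "units": sum(unit_counts.values()),
        "words": sum(word_counts.values()),
    }
    # sort_keys gives a stable output that clients and caches can rely on.
    return json.dumps(stats, sort_keys=True)
```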

The main part of the project will be to allow continuous editing on Pootle's translate page. AJAX queries would keep the data available by sending submissions asynchronously and by prefetching the data needed to translate the next units. A proper implementation will have to support all features of the translate page, including terminology and translation memory, translator and developer notes, suggestions, etc.
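On the server side, prefetching could be as simple as returning the next few units as JSON after the one being edited. The sketch below assumes that a store is a list of (source, target) pairs and uses hypothetical field names. A real response would also carry notes, suggestions and TM results.

```python
import json

def unit_batch(units, current_index, size=5):
    """Return JSON describing up to `size` units after `current_index`."""
    end = min(current_index + 1 + size, len(units))
    batch = [
        {"index": index, "source": units[index][0], "target": units[index][1]}
        for index in range(current_index + 1, end)
    ]
    # "more" tells the client whether another prefetch request is worthwhile.
    return json.dumps({"units": batch, "more": end < len(units)})
```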

Poke the code:

Further reading:

Possible mentors:

Segmentation for Virtaal

Grade: Medium

Description: Segmentation is the process of taking a block of text and breaking it into segments, such as sentences. While this initially looks simple, problems appear as soon as you start using non-trivial text. Abbreviations in English could confuse a simple method, for example.
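The abbreviation problem is easy to see in a naive splitter. The sketch below breaks after `.`, `!` or `?` when they are followed by whitespace or the end of the text, unless the word before the break is in a small, hypothetical abbreviation list. It is not how posegment works. It also shows the limitation: a sentence that genuinely ends in "etc." would not be split.

```python
import re

# Hypothetical, English-only abbreviation list; real data is per language.
ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "Mr.", "Mrs.", "Dr."}

def split_sentences(text):
    """Split text into sentences, skipping full stops of known abbreviations."""
    segments, start = [], 0
    for match in re.finditer(r"[.!?]+(?=\s|$)", text):
        candidate = text[start:match.end()].strip()
        if candidate.split()[-1] in ABBREVIATIONS:
            continue  # the full stop belongs to an abbreviation
        segments.append(candidate)
        start = match.end()
    rest = text[start:].strip()
    if rest:
        segments.append(rest)
    return segments
```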

The main advantage of segmenting is that it allows us to use translation memory at the sentence level. For example, a block of text might contain 3 sentences, one of which matches 100% while the others match less well and need to be reviewed. Without segmentation, the block as a whole would probably not have matched anything.

The Translate Toolkit already has a simple tool for sentence segmentation, called posegment. This will give you some idea of where to start with segmentation in different languages. For Virtaal, you would have to use this information to indicate the current segment within the current string and allow the user to interact with it (for example with Ctrl+Down and TM lookup).

Your main tasks in this project will be to:

  • Provide a GUI to display the currently active segment
  • Enable some current string level actions to work on a segment level instead and define the user interaction for these cases (like copying source to target, TM lookup and reuse).
  • Allow the user to correct the segmentation where the automatic method went wrong by altering the bounds of the segment as detected by the automatic method.
  • Extend the current TM server and/or API to be more aware of segmentation issues, and probably to store strings in both segmented and unsegmented form.
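The idea of storing strings both segmented and unsegmented can be sketched with a toy in-memory TM. The class below, which is hypothetical and not the Translate Toolkit's TM API, makes two assumptions. First, segments are joined with a single space. Second, source and target segments can be paired one-to-one only when their counts agree. Lookup uses `difflib.SequenceMatcher.ratio()` as a stand-in fuzzy-match score.

```python
import difflib

class SegmentMemory:
    """A toy translation memory that keeps whole units and their segments."""

    def __init__(self):
        self.entries = {}

    def add(self, source_segments, target_segments):
        # Always store the unsegmented unit...
        self.entries[" ".join(source_segments)] = " ".join(target_segments)
        # ...and the individual segments, but only when the counts agree,
        # because this sketch can only pair segments one-to-one.
        if len(source_segments) == len(target_segments):
            for source, target in zip(source_segments, target_segments):
                self.entries[source] = target

    def lookup(self, text, threshold=0.75):
        """Return (source, target, score) for the closest entry, or None."""
        best = None
        for source, target in self.entries.items():
            score = difflib.SequenceMatcher(None, text, source).ratio()
            if score >= threshold and (best is None or score > best[2]):
                best = (source, target, score)
        return best
```

A new block containing "Save it." now gets a 100% match for that sentence, even though the block as a whole was never translated before.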

Optional further tasks:

  • Add proper support for the <seg> tag in TMX and the <sub> tag in XLIFF.
  • Implement the SRX standard that allows segmentation rules to be specified in XML.
  • Use PyICU in the toolkit so that we can use ICU's segmentation rules (or find some similar established segmentation software, or extend the existing segmentation code in the toolkit).
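The core of SRX is an ordered list of rules. Each rule has a "before break" pattern and an "after break" pattern, and the first rule that matches at a position decides whether to break there. The sketch below reads the language rules from a tiny SRX 2.0 document and applies them. The sample rules are illustrative only. It ignores map rules and cascading, and it assumes the ICU regular expressions used by SRX are close enough to Python's `re` syntax, which is not true in general.

```python
import re
import xml.etree.ElementTree as ET

SRX_NS = "{http://www.lisa.org/srx20}"

# Illustrative rules: never break after "Mr.", break after ., ? or ! + space.
SAMPLE_SRX = r"""<srx version="2.0" xmlns="http://www.lisa.org/srx20">
  <body>
    <languagerules>
      <languagerule languagerulename="English">
        <rule break="no"><beforebreak>\bMr\.</beforebreak><afterbreak>\s</afterbreak></rule>
        <rule break="yes"><beforebreak>[.?!]+</beforebreak><afterbreak>\s</afterbreak></rule>
      </languagerule>
    </languagerules>
  </body>
</srx>"""

def load_rules(srx_text, rule_name):
    """Return [(is_break, before_regex, after_regex)] for one language rule."""
    root = ET.fromstring(srx_text)
    for language_rule in root.iter(SRX_NS + "languagerule"):
        if language_rule.get("languagerulename") == rule_name:
            rules = []
            for rule in language_rule.iter(SRX_NS + "rule"):
                before = rule.findtext(SRX_NS + "beforebreak") or ""
                after = rule.findtext(SRX_NS + "afterbreak") or ""
                rules.append((rule.get("break", "yes") == "yes",
                              re.compile("(?:%s)$" % before),
                              re.compile(after)))
            return rules
    raise KeyError(rule_name)

def segment(text, rules):
    """Split text at positions where the first matching rule says 'break'."""
    segments, start = [], 0
    for position in range(1, len(text)):
        for is_break, before, after in rules:
            # endpos=position makes "$" in the before-pattern match here.
            if before.search(text, 0, position) and after.match(text, position):
                if is_break:
                    piece = text[start:position].strip()
                    if piece:
                        segments.append(piece)
                    start = position
                break  # the first matching rule decides
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments
```

This is quadratic in the text length, which is acceptable for a sketch but not for long documents.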

Poke the code:

Further reading:

Possible mentors:

Improved XLIFF features

Grade: Medium

Description: XLIFF is an XML-based standard for localisation. It can store various state information and can be adapted to manage a translation workflow. XLIFF can also contain suggestions in <alt-trans> tags that could be reviewed in an editor and removed as the unit is updated.

By workflow we mean the simple process that moves from untranslated → translated → reviewed → approved. There are also processes for updating existing translations. These can be more complex, for example when the review is 'authoritative' (the reviewer can make changes) rather than 'non-authoritative' (the review comments are simply suggestions to the translator, who then decides whether to apply them).
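The simple workflow and the two review styles can be modelled as a small state machine. In the sketch below, the allowed transitions and the class and method names are assumptions for illustration, not an existing toolkit API. An authoritative review edits the translation and advances the state. A non-authoritative review only records a suggestion and leaves the state unchanged.

```python
from enum import Enum

class State(Enum):
    UNTRANSLATED = "untranslated"
    TRANSLATED = "translated"
    REVIEWED = "reviewed"
    APPROVED = "approved"

# Allowed moves. Going back to TRANSLATED models rework after a review or
# after the source text changes; this choice is an assumption of the sketch.
TRANSITIONS = {
    State.UNTRANSLATED: {State.TRANSLATED},
    State.TRANSLATED: {State.REVIEWED},
    State.REVIEWED: {State.APPROVED, State.TRANSLATED},
    State.APPROVED: {State.TRANSLATED},
}

class Unit:
    def __init__(self, source):
        self.source = source
        self.target = ""
        self.state = State.UNTRANSLATED
        self.suggestions = []

    def move_to(self, state):
        # Staying in the same state is allowed (e.g. re-editing a translation).
        if state != self.state and state not in TRANSITIONS[self.state]:
            raise ValueError("cannot move from %s to %s" % (self.state.value, state.value))
        self.state = state

    def translate(self, text):
        self.target = text
        self.move_to(State.TRANSLATED)

    def review(self, text, authoritative):
        if authoritative:
            # An authoritative reviewer changes the translation directly.
            self.target = text
            self.move_to(State.REVIEWED)
        else:
            # A non-authoritative review only leaves a suggestion; the
            # translator decides later whether to apply it.
            self.suggestions.append(text)
```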

This work would involve defining the possible states for XLIFF and other storage formats, and defining an API that makes it easier for our tools to access and manipulate these states in workflows useful to translators. Whereas most of our tools currently enforce a fuzzy/not-fuzzy way of thinking about units, they should instead work with the list of states applicable to the format being used.

This is not a workflow engine. Our goal is not to make a workflow editor, but to create a set of standard workflows that meet the needs of current translation projects and expose the inherent workflow of each file format.

Your main aim is to stay focused on the basics of unit states for the major formats (PO, XLIFF, TS) and to deliver a solution that allows basic interaction with them in Virtaal and/or Pootle.
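One way to start is to map each format's native markers onto a small set of shared states. The sketch below does this for PO (empty msgstr and the fuzzy flag), XLIFF 1.2 (the `state` attribute on `<target>`) and Qt Linguist TS (`type="unfinished"` on `<translation>`). The generic state names, and the way the XLIFF values are grouped, are judgment calls made for this sketch, not part of any standard.

```python
# Generic states shared by all formats (hypothetical names for this sketch).
UNTRANSLATED = "untranslated"
NEEDS_WORK = "needs-work"
TRANSLATED = "translated"
FINAL = "final"

# Values of the XLIFF 1.2 state attribute; the grouping is an assumption.
XLIFF_STATES = {
    "new": UNTRANSLATED,
    "needs-translation": UNTRANSLATED,
    "needs-adaptation": NEEDS_WORK,
    "needs-l10n": NEEDS_WORK,
    "needs-review-adaptation": NEEDS_WORK,
    "needs-review-l10n": NEEDS_WORK,
    "needs-review-translation": NEEDS_WORK,
    "translated": TRANSLATED,
    "signed-off": FINAL,
    "final": FINAL,
}

def po_state(msgstr, flags):
    """PO: an empty msgstr is untranslated; the fuzzy flag means it needs work."""
    if not msgstr:
        return UNTRANSLATED
    return NEEDS_WORK if "fuzzy" in flags else TRANSLATED

def xliff_state(state, target_text):
    """XLIFF: use the state attribute when present, else look at the target."""
    if state is None:
        return TRANSLATED if target_text else UNTRANSLATED
    # Unknown (e.g. custom "x-") values are treated as needing work.
    return XLIFF_STATES.get(state, NEEDS_WORK)

def ts_state(translation_type, translation_text):
    """TS: type="unfinished" marks translations that are not yet done."""
    if not translation_text:
        return UNTRANSLATED
    return NEEDS_WORK if translation_type == "unfinished" else TRANSLATED
```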

You could also help users of Virtaal and/or Pootle deal with suggestions in <alt-trans> tags by cleaning them up, or by removing them as they are used.
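"Removing them as they are used" could look like the ElementTree sketch below. It copies the chosen `<alt-trans>` target into the unit's `<target>`, creating the element right after `<source>` if needed, and then deletes that `<alt-trans>`. It assumes XLIFF 1.2 namespaces and that the unit has no `<seg-source>`. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"

def q(tag):
    """Qualify a tag name with the XLIFF 1.2 namespace."""
    return "{%s}%s" % (XLIFF_NS, tag)

def accept_suggestion(trans_unit, index):
    """Use the index-th <alt-trans> as the translation and remove it."""
    chosen = trans_unit.findall(q("alt-trans"))[index]
    suggestion = chosen.findtext(q("target"), default="")
    target = trans_unit.find(q("target"))
    if target is None:
        # <target> belongs right after <source> and before any <alt-trans>.
        target = ET.Element(q("target"))
        position = list(trans_unit).index(trans_unit.find(q("source"))) + 1
        trans_unit.insert(position, target)
    target.text = suggestion
    trans_unit.remove(chosen)
    return suggestion

SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="example" source-language="en" target-language="af" datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>Save</source>
        <alt-trans><target>Stoor</target></alt-trans>
        <alt-trans><target>Bewaar</target></alt-trans>
      </trans-unit>
    </body>
  </file>
</xliff>"""
```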

Poke the code:

  • phase - useful for understanding some of the tools used to manage the translation process

Further reading:

Possible mentors:

General Improvements (Feature additions) to Pootle

Grade: Medium

Description: While working with Pootle at OLPC, we have come across a number of feature requests, most (if not all) of which can be implemented within the GSoC timeframe. The highest-priority ones are:

  • Ability for the translation admin to merge the translations with the latest POT for her project.
  • Ability for the Pootle admin to easily set permissions for languages on a global basis (i.e. give user foo admin rights for all Spanish translation projects)
  • Support for validating translated strings on submission (the equivalent of msgfmt --check, but for individual strings)
  • Integration with http://open-tran.eu/ (use the site's XML-RPC interface to generate suggestions)
  • Ability for language administrator to get in touch with members of the translation team
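Per-string validation on submission could start with the printf-format part of what msgfmt checks for c-format strings. The sketch below counts conversions in the source and the translation, ignoring positional indices such as `%1$s` and literal `%%`, and reports any mismatch. The regular expression covers common C directives, not every edge case, and the function names are hypothetical.

```python
import re
from collections import Counter

# printf-style directive: optional position ("1$"), flags, width, precision,
# length modifier and conversion. "%%" is matched so that it can be skipped.
DIRECTIVE = re.compile(
    r"%(?:\d+\$)?[-+ #0]*\d*(?:\.\d+)?(hh|h|ll|l|L|z|j|t)?([diouxXeEfFgGaAcsp%])"
)

def conversions(text):
    """Count conversions such as 'd', 's' or 'ld', ignoring '%%'."""
    counts = Counter()
    for match in DIRECTIVE.finditer(text):
        length, conversion = match.group(1) or "", match.group(2)
        if conversion != "%":
            counts[length + conversion] += 1
    return counts

def check_format(source, translation):
    """Return a list of mismatches between the directives of both strings."""
    src, tgt = conversions(source), conversions(translation)
    problems = []
    for spec in sorted(set(src) | set(tgt)):
        if src[spec] != tgt[spec]:
            problems.append("%%%s: %d in source, %d in translation"
                            % (spec, src[spec], tgt[spec]))
    return problems
```

An empty list means the submission passes, and any messages could be shown to the translator next to the edit box.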

Poke the code:

Further reading:

  • Other reading on the topic that would help

Possible mentors:

  • Sayamindu Dasgupta

Project Ideas Template

Grade: Easy, Medium, Hard

Description:

Poke the code:

Further reading:

  • Other reading on the topic that would help

Possible mentors: