Translate Toolkit & Pootle

Tools to help you make your software local

VCS Revamp

Bug 2255 raises a wider discussion about how we should be dealing with version control systems (VCSs).

We could see two very different directions to support VCSs:

  • Support VCSs natively in the code (Pootle's approach)
  • Support file operations that don't require the software to interact directly with VCSs. (Transifex's approach)
  • Dwayne's approach…

Basically the first approach builds a layer that supports VCS natively, whereas the second approach builds a layer that supports push/pull/update operations from the command-line.

Use cases

For simplicity, these use cases always talk about .po and .pot files.

Getting a project into Pootle the first time

A developer might have an existing project, with existing .po files, and one or more .pot files in VCS.

Status quo: the server admin has to do a checkout/clone, possibly set up authentication with the VCS, and add symlinks. Adding a new project (or rescanning) should then pick up all languages and files.

Wanted: Set up a new project with VCS integration without command-line access to the server (for example: a URL (and subdirectory) entered on the web interface, or an upload with a client application).

Get translation work from Pootle into VCS

The developer wants to have the latest work done on Pootle committed to the .po files in VCS. In the simplest case a new .po file is somehow constructed and committed over the current version.

Status quo:

  • Pootle users with the commit permission are able to commit individual .po files from the web UI when they want to.
  • Can commit multiple files/languages/projects with a management command commit_to_vcs (manually or via cron)

Wanted:

  • Commit all files in one go (on the web)
  • Commit all files of multiple languages in one go (on the web) (?)

Pootle users want/need to get upstream changes

If a developer or other team members commit directly to the .po files in VCS, the changes should at some stage reflect on Pootle for translators/reviewers working there.

Status quo:

  • Pootle users with the commit permission are able to update all files in the translation project from the web UI when they want to.
  • Although the UI provides only the option to update all files at once, the code actually does it file-by-file.
  • For each file update, we get a clean copy from VCS and update it to the most recent version to identify the units that changed in VCS. These units take precedence over any Pootle translations for the same units, and the existing Pootle translations (in case of conflict) become suggestions. This ensures that VCS updates are always conflict-free (in the sense of VCS conflicts).
  • When updating file-by-file in a VCS like git, we actually update the whole repository (possibly including other languages), which makes the update operation for the next file less meaningful. Also, since we are not aware that the VCS is trying to update another file, we might introduce VCS conflicts if the merge was not successful.
  • Can update multiple languages/projects with a management command update_from_vcs (manually or via cron)
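
The "VCS wins, Pootle becomes a suggestion" rule described in the status quo above can be sketched roughly as follows. This is only an illustration and not Pootle's actual code: the `Unit` class and the `merge_vcs_update` function are hypothetical names, and the two dictionaries stand in for the clean VCS copy before and after the update.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    """Hypothetical stand-in for a translation unit stored in Pootle."""
    source: str
    target: str
    suggestions: list = field(default_factory=list)


def merge_vcs_update(pootle_units, old_vcs, new_vcs):
    """Apply a VCS update to Pootle units, letting VCS win on conflict.

    old_vcs and new_vcs map source strings to target strings in the
    clean VCS copy before and after the update (an assumption made
    for this sketch).
    """
    for unit in pootle_units:
        before = old_vcs.get(unit.source)
        after = new_vcs.get(unit.source)
        if after is None or after == before:
            # Unit is absent from, or unchanged in, VCS: keep Pootle's text.
            continue
        if unit.target and unit.target != after:
            # Conflict: demote the Pootle translation to a suggestion.
            unit.suggestions.append(unit.target)
        unit.target = after
    return pootle_units
```

Because only units that actually changed in VCS are touched, translations made in Pootle on other units survive the update untouched.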

Wanted:

  • Faster update
  • Correct updating in case of VCSs like git.

Should we be merging as often or even automatically? (db) Since we see VC as king, it would make sense to update continuously, either through a regular check or through an API that responds to a post-commit hook.

New files added in Pootle

New files added in Pootle should be committed to VCS at some stage.

Status quo: Pootle master commits new files to VCS if they are created by updating to templates.

Wanted: Nothing more?

New files added in VCS

(Bug 2255)

New files added in VCS should reflect in Pootle at some stage (if the language is added).

Status quo: New files need to get onto the file system and we need to rescan.

Wanted: When updating the project/translation project, any new files should automatically show up in Pootle.

Files removed in Pootle

Files removed in Pootle should be removed from VCS at some stage.

Status quo: Someone will have to remove it from VCS, and someone with appropriate privileges needs to remove it from Pootle as well.

Wanted:

  • Some way to remove from VCS when removing from Pootle
  • Confirmation

Files removed from VCS

Files removed from VCS should be removed from Pootle at some stage.

Status quo: Someone will have to remove it from VCS, and someone with appropriate privileges needs to remove it from Pootle as well.

Wanted: When updating the project/translation project, any files found to be removed from VCS should automatically be removed in Pootle. We need to take care not to remove _other_ files (like pootle-terminology).
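
One way to picture this update step is as a comparison of two file listings. The sketch below is an assumption-laden illustration, not existing Pootle code: `plan_sync` and `PROTECTED_STEMS` are made-up names, and both listings are assumed to use the same relative paths.

```python
from pathlib import PurePosixPath

# Assumption: files Pootle manages itself and that never live in VCS.
PROTECTED_STEMS = {"pootle-terminology"}


def plan_sync(vcs_files, pootle_files, protected_stems=PROTECTED_STEMS):
    """Return (to_add, to_remove) lists of relative paths for Pootle."""
    vcs, pootle = set(vcs_files), set(pootle_files)
    # Files that exist in VCS but are not yet known to Pootle.
    to_add = sorted(vcs - pootle)
    # Files gone from VCS, except the ones Pootle owns itself.
    to_remove = sorted(
        path for path in pootle - vcs
        if PurePosixPath(path).stem not in protected_stems
    )
    return to_add, to_remove
```

For example, with `af/new.po` only in VCS and `af/old.po` plus `af/pootle-terminology.po` only in Pootle, the plan adds `af/new.po` and removes `af/old.po`, leaving the terminology file alone.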

(db) our concern might be loss of TM through mistakes, so we might want to retire such strings in the db so that they can form part of any local TM or be used again if the strings return.

Code changes in VCS

Code changes in VCS could add/remove/change strings which would eventually affect the .pot file. Some developers might want these to be reflected in Pootle immediately.

Status quo: We don't really provide much help in such cases. Someone would have to generate the .pot file, upload it to Pootle and update to templates.

Wanted: ??? Lots of things would be nice, but this is probably not priority right now.

See VCS status

A user would like to see the status of their file(s) with regard to the VCS. This would be similar to the output of “git status”. For example, it would show that there are changes that are not checked in, or that there are changes in the VCS that are not yet reflected in Pootle (need to update).

Status quo: We don't provide any help here

Wanted: Some simple (graphical?) representation would be great.
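
For git, the "not checked in" half of such a status view could start from the machine-readable output of `git status --porcelain`, where each line holds a two-character status code, a space and a path. The sketch below only parses that text; `summarize_porcelain` is a hypothetical helper, and the three-way grouping is an arbitrary simplification for illustration.

```python
def summarize_porcelain(output):
    """Group paths from `git status --porcelain` (v1) output."""
    summary = {"modified": [], "untracked": [], "other": []}
    for line in output.splitlines():
        if len(line) < 4:
            # Too short to hold a status code, a space and a path.
            continue
        code, path = line[:2], line[3:]
        if code == "??":
            summary["untracked"].append(path)
        elif "M" in code:
            summary["modified"].append(path)
        else:
            # Deletions, renames, additions and so on.
            summary["other"].append(path)
    return summary
```

The input text could come from something like `subprocess.run(["git", "status", "--porcelain"], cwd=checkout, capture_output=True, text=True).stdout`, where `checkout` is the working copy directory. Detecting changes in the VCS that are not yet reflected in Pootle would need a separate fetch and comparison, which this sketch does not cover.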

Advantages and disadvantages

Pootle

Advantages:

  • Translators commit their own work, so the work is committed only when the translator decides her work is ready to be used.
  • The translator/committer is mentioned in the commit message.
  • Translators can use VCS for specific checkpoints, before a big change, for example.
  • Easier to show source code (in theory)
  • We can easily import all other files, not just the pot/template.
  • We can merge translation files with VCS even when somebody committed to VCS directly bypassing Pootle.
  • Updating might be quick if the changes are small, although there is no way to know ahead of time.

Disadvantages:

  • Not easy to set up new projects: the administrator needs to log into the server and manually clone/checkout.
  • The push/pull procedure is manual for each translation project.
  • Translators have to commit their own work, which can lead to forgetting about the extra step or other security risks.
  • It is necessary to create a separate user for committing purposes.
  • We need to support almost every VCS within the code.
  • Certain repositories can be huge to pull, especially in DVCSs.
  • In DVCSs, project developers might be required to create separate special repositories just to be able to use them in Pootle.
  • Directory layouts have to follow certain conventions/structures.
  • It adds a dependency on the command line tools for the VCS.

Transifex

Advantages:

  • Easy to set up
  • Any source file hosted on any web server on the Internet can be used as a source for translation.
  • The software doesn't need to deal directly with VCSs; the developer deals with that as if she were committing code.
  • Even if the source files are in a DVCS, it is not necessary to create separate repositories for localization files.
  • Developers can use any directory layout for their localization files.
  • Updates to the pot/template might be faster.

Disadvantages:

  • Takes resources as a unit: what about projects with lots of files and directories used as source? (e.g. Mozilla products).
  • Doesn't allow working on the translation of a *language* at both the repository level and the application level at the same time.
  • Still (probably) doesn't provide a way to convert easily between Pootle's file naming convention and the repository's own (possibly) weird naming convention.
  • It requires the developer to get to know another tool (the client application) and might complicate build infrastructure
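
The naming-convention conversion mentioned in the list above could, in principle, be driven by a pair of layout templates. The following is a minimal sketch under that assumption; the functions `layout_regex` and `convert_path` and the `{lang}`/`{name}` placeholders are invented for illustration and do not correspond to anything in Pootle or Transifex.

```python
import re
from string import Formatter


def layout_regex(template):
    """Compile a layout such as '{lang}/{name}.po' into a regex."""
    parts = []
    for literal, field_name, _spec, _conv in Formatter().parse(template):
        parts.append(re.escape(literal))
        if field_name is not None:
            # Each placeholder matches exactly one path component.
            parts.append(f"(?P<{field_name}>[^/]+)")
    return re.compile("".join(parts) + r"\Z")


def convert_path(path, from_layout, to_layout):
    """Rewrite path from one layout to another, or return None."""
    match = layout_regex(from_layout).match(path)
    if match is None:
        return None
    return to_layout.format(**match.groupdict())
```

With a repository layout of `locale/{lang}/LC_MESSAGES/{name}.po` and a target layout of `{lang}/{name}.po`, the path `locale/af/LC_MESSAGES/app.po` would map to `af/app.po`, and the same two templates work in the opposite direction.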

Consequences

Common

  • If we want to react to changes in the POT file, users will depend on a cron job, or we need to use Django substitutes
  • Our version control code needs to learn more about working with directories/clones rather than being completely file based.

Fetching POT from URL

  • We can't easily support any interactions involving anything but .pot files. A commandline client for developers (see below) can address some of these issues.

A commandline client

  • Need network API between client and server
  • Development work on the server side
    • Need authentication mechanism for client to use (something probably exists that we can use)
  • Development work on the client side
    • Need authentication mechanism for client to use (something probably exists that we can use)
    • Updating directories and detecting all changes
  • Have to find a release mechanism. Inside the Translate Toolkit makes sense (it will depend on it anyway). This might force us to release Toolkit more often and document the matching versions well.
  • Some documentation for the client
  • If complex project-wide updates with merges are done over the network, we need to see if time-outs are a problem

Separate checkout

  • Store/use repositories not as part of the podirectory
    • Code / documentation to handle upgrade to the new system
    • Hooks also need to be adapted to work with names relative to the podirectory/checkout directory instead of absolute paths. We need to document this as well.
  • On commit and update, we need to handle the files living in two locations.
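
To make the "two locations" point concrete, the sketch below shows one possible way to derive a name relative to the podirectory and map it onto a separate checkout. The function names and all example directories are hypothetical; the real layout would depend on how the separate checkout is configured.

```python
import os


def relative_name(path, base_dir):
    """Return path relative to base_dir, refusing paths outside it."""
    rel = os.path.relpath(os.path.abspath(path), os.path.abspath(base_dir))
    if rel == os.pardir or rel.startswith(os.pardir + os.sep):
        raise ValueError(f"{path!r} is not inside {base_dir!r}")
    return rel


def checkout_path(po_path, po_dir, checkout_dir):
    """Map a file in the podirectory to its twin in the VCS checkout."""
    return os.path.join(checkout_dir, relative_name(po_path, po_dir))
```

A hook written against `relative_name` would receive something like `proj/af/app.po` regardless of where the podirectory or the checkout lives on disk, which is what makes the upgrade to separate checkouts less disruptive for hook authors.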

Conclusions

Overall, Transifex's approach seems to offer more advantages and less hassle for developers than Pootle's current approach. It simplifies the code and can be scripted and automated on the developer's side.

On the other hand, Pootle's current release schedule is tight if we want to implement a solution similar to Transifex's. Changing Pootle's code to work in that way is neither easy nor straightforward. In addition, we would need to implement a separate command-line client that communicates with the Pootle server. It is a lot of work.

Taking all of this into account, probably the best way to go is to try to fix bug 2255 without extensively changing the current architecture. Afterwards, early in the next release cycle, a change to the whole architecture would make more sense, getting the advantages of the chosen approach in the next release.