Translate Toolkit & Pootle

Tools to help you make your software local


File System

How do we lay out the file system on Pootle?

DB vs File System

We agree that our goal is to store translation information in XLIFF files. XLIFF is our future format, so storing it directly means no conversion is needed to generate the files (except when they are imported into Pootle from the FOSS project).

We have this debate often. Yes, we could put all the translation messages in a database. That would make it faster, right? Not necessarily, as the one thing we want to be able to do quickly is text matching. And not SQL LIKE statements, but proper fuzzy matching.
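
To illustrate the difference between substring matching and fuzzy matching, here is a minimal sketch using Python's standard difflib module. It is not how Pootle performs matching; the function name fuzzy_matches, the 0.7 threshold, and the sample messages are assumptions made for the example.

```python
import difflib


def fuzzy_matches(query, candidates, threshold=0.7):
    """Return (score, candidate) pairs with similarity >= threshold, best first.

    The threshold value is an illustrative assumption, not a Pootle setting.
    """
    scored = []
    for candidate in candidates:
        # ratio() is 1.0 for identical strings and 0.0 for nothing in common.
        score = difflib.SequenceMatcher(None, query, candidate).ratio()
        if score >= threshold:
            scored.append((score, candidate))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Hypothetical messages, used only to show the behaviour.
messages = ["Open the file", "Open a file", "Close the window"]

# A substring test (what SQL LIKE '%Open file%' amounts to) finds nothing...
substring_hits = [m for m in messages if "Open file" in m]

# ...while fuzzy matching still ranks the two near matches.
fuzzy_hits = [candidate for _score, candidate in fuzzy_matches("Open file", messages)]
```

Comparing every message against the query like this is linear in the number of messages, which is one reason an index built for text search is attractive.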

One role of Pootle is to deliver files to users, so why not deliver files from a file system? This decision makes it easier for us to rely on revision control systems to manage versioning, and on other file-based solutions to manage distribution.

We might use a DB to store meta information and information in transition. In fact we do some of that already with PyLucene.

On Thu, 2006-02-09 at 09:06 +0700, Javier SOLA wrote:

I think that this is great basic work. If we get XLIFF and TMX in basic classes, and then dedicate the efforts to the file system for PO files (but always considering that it also needs to manage XLIFF in the future, and that it might be interesting to label a project as PO or XLIFF (with internal DB or through directory structure).

I think we might want to label what the source format is, e.g. PO for GNOME.

I still err on the side of the file system, not a DB. But of course, with the filters you could always add po2sql :)

File System Structure

Translatable files will be stored using the operating system's file system, with the piece of software as the most important level. The levels will be:

  1. Piece of software
  2. Version
  3. Language

Inside the language directories we will find the translatable files, which in many cases are not all in a single directory but are structured in subdirectories. For example, OpenOffice files are normally in one directory that has subfolders for each OpenOffice subproject…

/
|
| -> /Firefox 
| -> /OpenOffice
        |
        |/OpenOffice/2.0
        |/OpenOffice/2.0.2
              |
              | /OpenOffice/2.0.2/lo
              | /OpenOffice/2.0.2/km
                     |
                     | /OpenOffice/2.0.2/km/avmedia
                     | /OpenOffice/2.0.2/km/basctl
                          | /OpenOffice/2.0.2/km/basctl/source
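
The software / version / language ordering shown above can be sketched as a pair of helpers that build and split such paths. This is only an illustration of the layout; the function names translation_dir and split_translation_path are hypothetical and not part of Pootle.

```python
from pathlib import PurePosixPath


def translation_dir(root, project, version, language, *subdirs):
    """Build root/project/version/language[/subdirs...] as a POSIX path."""
    return PurePosixPath(root, project, version, language, *subdirs)


def split_translation_path(root, path):
    """Split a path into (project, version, language, remainder).

    The remainder is whatever lies below the language directory,
    for example the per-subproject folders of OpenOffice.
    """
    parts = PurePosixPath(path).relative_to(root).parts
    if len(parts) < 3:
        raise ValueError("expected at least project/version/language: %s" % path)
    project, version, language = parts[:3]
    return project, version, language, PurePosixPath(*parts[3:])


example = translation_dir("/", "OpenOffice", "2.0.2", "km", "basctl", "source")
print(example)
```

Because the first three levels are fixed, any file's project, version, and language can be recovered from its path alone, without consulting a separate store.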

But Pootle will have to allow users to look at the files from the point of view of a language-centred project, that is, looking at a given language and seeing which projects are being translated into that language on this Pootle server, and then which versions. This will be transparent to the user: Pootle will provide this view using information stored in the Lucene DB.

We also considered laying out the file system with language at the top level, but if we use Lucene to present information in Pootle to the user, this does not really matter.
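
As a rough sketch of how a language-centred view can be derived from a software-centred tree, the function below inverts the directory levels into a language → project → versions mapping. It uses a plain in-memory dictionary as a stand-in for the index; the name language_view is hypothetical, and nothing here reflects how Pootle actually queries Lucene.

```python
from collections import defaultdict
from pathlib import Path


def language_view(root):
    """Map language -> project -> list of versions.

    Assumes directories are laid out as root/project/version/language.
    """
    view = defaultdict(lambda: defaultdict(list))
    for lang_dir in Path(root).glob("*/*/*"):
        if not lang_dir.is_dir():
            continue
        project, version, language = lang_dir.relative_to(root).parts
        view[language][project].append(version)
    # Convert to plain dicts; versions are sorted as strings, not as
    # version numbers, which is good enough for a sketch.
    return {
        language: {project: sorted(versions) for project, versions in projects.items()}
        for language, projects in view.items()
    }
```

Calling language_view on the root of a tree like the one above would give, for km, the projects that have a km directory and the versions in which it appears.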

  • In version 0.9 of Pootle only PO files will be stored in Pootle.
  • After version 0.9, only XLIFF files will be stored but they will be downloadable also in PO format, with Pootle doing the conversion from XLIFF to PO on the fly.
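
To give a feel for what converting XLIFF to PO on the fly involves, here is a deliberately minimal sketch using only the Python standard library. It is not Pootle's or the Translate Toolkit's converter: it ignores the PO header, comments, plurals, and inline markup, and the function names and the sample document (including its French target text) are made up for the example.

```python
import xml.etree.ElementTree as ET


def po_quote(text):
    """Quote a string for a PO file, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def xliff_to_po(xliff_text):
    """Turn each trans-unit's source/target pair into a msgid/msgstr entry."""
    entries = []
    for unit in ET.fromstring(xliff_text).iter():
        if local_name(unit.tag) != "trans-unit":
            continue
        source = target = ""
        for child in unit:
            if local_name(child.tag) == "source":
                source = "".join(child.itertext())
            elif local_name(child.tag) == "target":
                target = "".join(child.itertext())
        entries.append("msgid %s\nmsgstr %s\n" % (po_quote(source), po_quote(target)))
    return "\n".join(entries)


# Hypothetical XLIFF 1.1 document, for illustration only.
SAMPLE = """<xliff version="1.1" xmlns="urn:oasis:names:tc:xliff:document:1.1">
  <file original="example.po" source-language="en" target-language="fr" datatype="po">
    <body>
      <trans-unit id="1">
        <source>Open</source>
        <target>Ouvrir</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""

print(xliff_to_po(SAMPLE))
```

A real conversion has to round-trip far more than this (context, comments, fuzzy state), which is the argument for keeping a single stored format and generating the other on demand.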

Meta Information Storage

How do we store the meta information around projects and files? Currently this is either stored in flat files or in PyLucene.

Alternatives:

  • Database
  • Files (additional or within the PO/XLIFF files)
  • Lucene

Opinions

If we use a database, we have to make sure it's a standalone one that doesn't require another service for Pootle.

Files allow us to use CVS directly to store versioning information.

Lucene allows us to use what we already use for text searching to store our metadata. It's very quick, unstructured, and can be rebuilt quickly from the files. No dump and load, just dump and reload from files. It is slow to start but lightning quick. Lucene is really Files + Quick Indexing.
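
The "dump and reload from files" idea can be sketched as follows: the index holds nothing that cannot be recomputed, so rebuilding it is just a walk over the translation files. The in-memory dictionary here is a stand-in for the real index, and the function name rebuild_index and the fields it records are assumptions made for the example.

```python
import os


def rebuild_index(root):
    """Recompute per-file meta information from the PO files under root.

    Returns a dict keyed by path relative to root (with '/' separators).
    """
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".po"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8") as handle:
                # Counts lines starting with 'msgid ', which includes the
                # PO header entry when the file has one.
                units = sum(1 for line in handle if line.startswith("msgid "))
            relative = os.path.relpath(path, root).replace(os.sep, "/")
            index[relative] = {"units": units, "mtime": os.path.getmtime(path)}
    return index
```

Since the files remain the single source of truth, throwing the index away costs only the time of one full walk, which matches the "slow to start" trade-off described above.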

FIXME {DB I still prefer the Lucene way, although I think we need to use fewer files or hidden files to track our change info. I don't like the DB idea, as it means structures that have to be changed between versions; it's volatile and it requires admin.}