Translate Toolkit & Pootle

Tools to help you make your software local

Doing stuff with a Pootle log file

It is very difficult to track activity in Pootle. No detailed log of activities is kept.

For example, you cannot ask the log when a certain string was translated, or who translated it. You can't ask the log for a list of strings translated by a certain translator. You can't see which translator reviewed which other translator's suggestions.

At least there is the HTTP log. This log contains POSTs and GETs, and much information can be deduced from this log.

This is my attempt to write some procedures for generating certain statistics from a Pootle log, which may be useful for reporting back to a sponsor or funder.

Lines versus strings

The HTTP log does not record the number of words translated, or indeed the type of operation performed by the translator.

However, most lines with POST operations indicate places where a translator pressed the “Submit” button. Therefore, the lines with POST can be taken as a good indication of the activity in a certain project, a certain language or by a certain translator.

Another useful piece of information in the HTTP log is the record of user account activations. Registrations are not logged, but activations are.

Anatomy of a log file line

A typical log file line consists of the following segments:

  • date
  • time
  • a 3rd field
  • username (eg walterl)
  • a 5th and 6th field
  • the word POST or GET
  • the URL, which usually contains the language name, project name and file name
  • one or two more fields
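
The field list above can be turned into a small parser. This is a minimal Python sketch, assuming the fields are separated by whitespace and sit in exactly the positions listed (username 4th, POST/GET 7th, URL 8th). The sample line is invented for illustration, with "f3", "f5" and "f6" standing in for the fields not described here; check a real line from your own log and adjust the positions if they differ.

```python
from typing import NamedTuple, Optional


class LogLine(NamedTuple):
    date: str
    time: str
    username: str
    method: str
    url: str


def parse_line(line: str) -> Optional[LogLine]:
    """Split one log line on whitespace and pick out the fields listed above.

    Assumed positions (1-based): 1 date, 2 time, 4 username, 7 method, 8 URL.
    Returns None for lines too short to fit that layout.
    """
    fields = line.split()
    if len(fields) < 8:
        return None
    return LogLine(fields[0], fields[1], fields[3], fields[6], fields[7])


# Invented sample line, not taken from a real Pootle log.
sample = "2008-03-14 09:15:02 f3 walterl f5 f6 POST /af/pootle/pootle.po 200"
print(parse_line(sample))
```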

Initial data extraction

Get all POSTs

Since POST operations indicate most activity, we first grab all lines that contain the word “POST”:

grep POST pootle.log > posties.txt

Most operations are done on the posties.txt file.
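
If grep is not available, the same filter can be sketched in Python. Like the grep command above, it keeps every line that contains the text "POST" anywhere, so it is a plain substring match rather than a check of the method field. The function name is just an illustration.

```python
from typing import Iterable, List


def post_lines(lines: Iterable[str]) -> List[str]:
    """Keep every line that contains the text "POST", like `grep POST`."""
    return [line for line in lines if "POST" in line]


# Invented sample lines, not taken from a real Pootle log.
sample = [
    "2008-03-14 09:15:02 f3 walterl f5 f6 POST /af/pootle/pootle.po 200\n",
    "2008-03-14 09:15:09 f3 walterl f5 f6 GET /af/pootle/index.html 200\n",
]
print(len(post_lines(sample)))
```

To produce posties.txt from a real log, read pootle.log line by line, pass the lines to post_lines, and write the result out with writelines.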

Day by day activity

One way to get daily activity from the posties.txt file is to create a file for each day, with all of that day's activity in that file. I'm sure clever scripting can do this, but I just use a long BAT file.

per_day_.zip

These daily files can then be processed further. One thing that is nice to know is the number of POSTs on a certain day. This can be counted by counting the number of lines in each date file. I use a simple AutoIt script for that.

numlines.zip

To use the numlines script, create a file named files.txt that contains the full paths of all the files you want to process (one per line). The output is a text file that contains the name of each file and the number of lines in that file.
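
The per-day split and the line count can also be done in one pass, without intermediate files. The following is a hedged Python sketch, not the BAT file or the AutoIt script from the attachments. It assumes the date is the first whitespace-separated field of each line; the sample lines and their date format are invented.

```python
from collections import Counter
from typing import Dict, Iterable


def posts_per_day(lines: Iterable[str]) -> Dict[str, int]:
    """Count lines per date, assuming the date is the first field."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return dict(counts)


# Invented sample lines, not taken from a real Pootle log.
sample = [
    "2008-03-14 09:15:02 f3 walterl f5 f6 POST /af/pootle/pootle.po 200",
    "2008-03-14 09:17:40 f3 walterl f5 f6 POST /af/pootle/pootle.po 200",
    "2008-03-15 11:02:13 f3 someuser f5 f6 POST /zu/pootle/pootle.po 200",
]
for day, count in sorted(posts_per_day(sample).items()):
    print(day, count)
```

For a real report, open posties.txt and pass the file object to posts_per_day in place of the sample list.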

Activity per user

It is also useful to know what each user has done, and when. The start for such information is to create a file per user with all their lines in the file.

First, get a list of all users. The page yourpootleserver/admin/users.html has that information. Next, open a copy of the posties.txt file and do this find/replace:

find: .1[space]-[space]
replace: .1x-x

Now you can use simple grep, in a BAT file, to grab all lines pertaining to specific users. The BAT file should have the following syntax per line:

grep x-xUserName posties.txt > UserName.txt

per_user.zip

And again, use the numlines script to get a report about how many lines were translated per user.
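
The find/replace step exists so that grep only matches a name where the username sits, and not elsewhere in the line (for example in a URL). A script that compares the username field directly avoids that step. Here is a minimal Python sketch, assuming the username is the 4th whitespace-separated field as in the field list earlier; if your log puts it elsewhere, change USERNAME_FIELD. The sample lines and the user "someuser" are invented.

```python
from collections import Counter
from typing import Dict, Iterable

# Zero-based index of the 4th field. This is an assumption; adjust to your log.
USERNAME_FIELD = 3


def posts_per_user(lines: Iterable[str], field: int = USERNAME_FIELD) -> Dict[str, int]:
    """Count lines per username by reading the username field exactly."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) > field:  # ignore lines too short to have the field
            counts[fields[field]] += 1
    return dict(counts)


# Invented sample lines, not taken from a real Pootle log.
sample = [
    "2008-03-14 09:15:02 f3 walterl f5 f6 POST /af/pootle/pootle.po 200",
    "2008-03-14 09:17:40 f3 walterl f5 f6 POST /af/pootle/pootle.po 200",
    "2008-03-15 11:02:13 f3 someuser f5 f6 POST /zu/walterl/notes.po 200",
]
for user, count in sorted(posts_per_user(sample).items()):
    print(user, count)
```

The third sample line has "walterl" in its URL but is still counted for "someuser", which is the case the find/replace step guards against.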

Activity per language

In my case it was useful to split the posties.txt file by language, to see which languages had had little or no activity.

per_language.zip

You can get a list of language codes from your Pootle server home page (view source and remove fluff). Language codes appear in the HTTP log between two forward slashes, eg /af/.

And again, use the numlines script to get a report about how many lines were translated per language.
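
As a rough Python sketch of the same split: given a list of language codes, count the lines that contain each code between two forward slashes. Like a grep for /af/, this is a substring match anywhere in the line. The codes and sample lines below are only examples; use the list from your own server.

```python
from typing import Dict, Iterable, List


def posts_per_language(lines: Iterable[str], codes: List[str]) -> Dict[str, int]:
    """Count, for each language code, the lines containing "/code/"."""
    counts = {code: 0 for code in codes}  # start at 0 so idle languages show up
    for line in lines:
        for code in codes:
            if f"/{code}/" in line:
                counts[code] += 1
    return counts


# Invented sample lines, not taken from a real Pootle log.
sample = [
    "2008-03-14 09:15:02 f3 walterl f5 f6 POST /af/pootle/pootle.po 200",
    "2008-03-14 09:17:40 f3 walterl f5 f6 POST /af/pootle/pootle.po 200",
    "2008-03-15 11:02:13 f3 someuser f5 f6 POST /zu/pootle/pootle.po 200",
]
for code, count in posts_per_language(sample, ["af", "zu", "xh"]).items():
    print(code, count)
```

Because every code starts with a count of zero, languages with no activity appear in the report instead of being silently left out.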

Dates of user activations

In itself, it may seem dull to know when users activated their accounts, but keep in mind that registrations are not logged, so activations are important.

Grab a file with the activations as follows:

grep activate.html posties.txt > activations.txt

Then run a modified numlines script to get them in a cleaner format.
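
What the modified script outputs is not described here, so the following Python sketch only guesses at a "cleaner format": it keeps the date and time of every line that mentions activate.html and drops the rest. It assumes date and time are the first two fields; the sample lines are invented, with "f4" standing in for whatever the username field holds on an activation line.

```python
from typing import Iterable, List, Tuple


def activations(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (date, time) for every line that mentions activate.html."""
    found = []
    for line in lines:
        if "activate.html" in line:
            fields = line.split()
            if len(fields) >= 2:
                found.append((fields[0], fields[1]))
    return found


# Invented sample lines, not taken from a real Pootle log.
sample = [
    "2008-03-13 18:40:55 f3 f4 f5 f6 POST /activate.html 200",
    "2008-03-14 09:15:02 f3 walterl f5 f6 POST /af/pootle/pootle.po 200",
]
for date, time in activations(sample):
    print(date, time)
```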

activations.zip