Translate Toolkit & Pootle

Tools to help you make your software local



Specification: history of changes

From bug 2181:

For communities with several translators, it would be useful to have a history of changes to the translations. This would make the changes people made transparent.

Originally from https://bugzilla.mozilla.org/show_bug.cgi?id=627908

The bug reporter added:

The main intention of this feature for me was to be able to check the changes of strings in a whole project (like SUMO, AMO or Firefox Input) over time. I (as reviewer and uploader) want to see who has made which changes, so I can go through these changes quickly and make corrections without using tools outside of Verbatim. History should help with quality and efficiency.

Angle of attack

At least two views of the problem are possible. They are not necessarily mutually exclusive, but a strong preference for one over the other might lead to differences in design and implementation.

Unit level history

In this view, the problem is modelled as a way to store information about a unit: its history, and all the actions that made it what it is now. It is comparable to the History page of an article on MediaWiki. This helps to answer questions like:

  • When was it changed to this translation/state?
  • Who did this?
  • Is this contentious or subject to disagreements? (lots of changes over a short period of time, for example revert wars)
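The following is a minimal sketch of the unit-level view. It assumes the history is an in-memory list of change records. The `Change` class, its fields and the helper names are hypothetical and are not Pootle code. A unit's history is its records filtered by unit and ordered newest first, and the newest entry answers "when?" and "who?".

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Change:
    # Hypothetical log record; field names are illustrative only.
    unit_id: int
    submitter: str
    creation_time: datetime
    new_target: str


def unit_history(log, unit_id):
    """All changes to one unit, most recent first (like a MediaWiki History page)."""
    entries = [c for c in log if c.unit_id == unit_id]
    return sorted(entries, key=lambda c: c.creation_time, reverse=True)


def last_change(log, unit_id):
    """Answer 'when was it changed to this, and by whom?' from the newest entry."""
    history = unit_history(log, unit_id)
    return (history[0].creation_time, history[0].submitter) if history else None
```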

Detailed user activity

In this view, the problem is modelled as a way to keep track of all the translation activities of a user. This is similar to the User contributions page on MediaWiki. This helps to answer questions like:

  • Is this user active?
  • What is the quality of this user's contributions? (Also, is this a spammer?)
  • Which projects/files is this user contributing to?
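Here is a comparable sketch of the user-activity view. It makes the same assumption: a hypothetical in-memory `Change` log. The 30-day "active" window is an arbitrary illustrative choice. The specification does not define one.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Change:
    # Hypothetical log record; field names are illustrative only.
    submitter: str
    project: str
    creation_time: datetime


def user_activity(log, username, now, active_window=timedelta(days=30)):
    """Summarise one user's contributions, like a MediaWiki 'User contributions' page."""
    mine = [c for c in log if c.submitter == username]
    if not mine:
        return {"active": False, "projects": Counter(), "last_seen": None}
    last_seen = max(c.creation_time for c in mine)
    return {
        # "Is this user active?": any change inside the (arbitrary) window
        "active": now - last_seen <= active_window,
        # "Which projects are they contributing to?": change counts per project
        "projects": Counter(c.project for c in mine),
        "last_seen": last_seen,
    }
```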

Use cases

Reviewer

(refer to the comments of the original reporter above)

  1. see what changed since an event (login, update, commit) in a project to go and review just those changes
  2. see the history of the current (or some) unit to know who slipped up
  3. see the changes performed by a certain user
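Reviewer use cases 1 and 3 both reduce to filtering the change log by project and time, and optionally by submitter. The sketch below assumes a hypothetical `Change` record with a `project` field. In the data model discussed later, the project would come via translation_project.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Change:
    # Hypothetical log record; field names are illustrative only.
    project: str
    unit_id: int
    submitter: str
    creation_time: datetime


def changes_to_review(log, project, since, submitter: Optional[str] = None):
    """Changes in `project` after `since` (e.g. the reviewer's last login or commit),
    optionally limited to one submitter, oldest first so they can be reviewed in order."""
    hits = [
        c for c in log
        if c.project == project
        and c.creation_time > since
        and (submitter is None or c.submitter == submitter)
    ]
    return sorted(hits, key=lambda c: c.creation_time)
```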

Translator

  1. Why is this string the way it is now? Why is this string fuzzy?
  2. I translated this but the translation hasn't been kept. Who made that change?
  3. Show all changes to my translations in some project since I made them
  4. See if something was reverted to learn from it.
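Translator use case 3 needs two passes:

1. Find when the translator last edited each unit.
2. Find other people's changes to those units after that point.

The sketch below runs both passes over a hypothetical in-memory log. The names are illustrative only.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Change:
    # Hypothetical log record; field names are illustrative only.
    unit_id: int
    submitter: str
    creation_time: datetime
    new_target: str


def changes_after_mine(log, username):
    """Later changes by other people to units that `username` translated."""
    # Pass 1: when did this user last touch each unit?
    my_last = {}
    for c in log:
        if c.submitter == username:
            my_last[c.unit_id] = max(my_last.get(c.unit_id, c.creation_time), c.creation_time)
    # Pass 2: other people's changes to those units after that last edit.
    return sorted(
        (c for c in log
         if c.unit_id in my_last
         and c.submitter != username
         and c.creation_time > my_last[c.unit_id]),
        key=lambda c: c.creation_time,
    )
```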

Implementation

We should implement this separately from the main PootleStore table, as a simple log of changes that can be retrieved when needed. That way, none of the existing Pootle pages need to use this (probably big) table during normal operation.

The model pootle_statistics.Submission is already a table with some of these attributes. It is currently used for the top contributor tables at the bottom of several pages, and to indicate the last activity in a project/language. Because it links to translation_project, it can do the joins required by the bug reporter's use case reasonably easily.
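The two existing uses of this table can be illustrated in plain Python. This sketch works over a simplified stand-in for `pootle_statistics.Submission`. For brevity it assumes translation projects are plain strings like `"af/sumo"`. It is not how Pootle actually computes these values, which presumably happens in database queries.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Submission:
    # Simplified stand-in for pootle_statistics.Submission (only the fields used here).
    creation_time: datetime
    translation_project: str
    submitter: str


def top_contributors(submissions, translation_project, n=5):
    """(submitter, count) pairs for one translation project, most active first."""
    counts = Counter(s.submitter for s in submissions
                     if s.translation_project == translation_project)
    return counts.most_common(n)


def last_activity(submissions, translation_project):
    """The most recent submission in a translation project, or None."""
    subs = [s for s in submissions if s.translation_project == translation_project]
    return max(subs, key=lambda s: s.creation_time, default=None)
```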

Questions

  • How do we identify a unit? Unit IDs are unique on a server, but what users consider to be “the same string” might get a new ID from time to time (maybe with an overwrite upload, for example).
    • If the uid disappears after an overwrite, its history should be kept and the pointer to the unit should be updated to reflect the new uid. - JRA
      • I think the only case where the ID would change and we would be aware of it is one where we will actually keep it the same. I think that if the ID changes, we have no way of knowing. --FW
  • What happens if a unit disappears? In the unit-level view, this means its history might need to be removed; in the user-activity view, the user's activity remains as true as ever.
  • UI/interaction: Where does the user find this?
    • The log of the unit might be available by clicking an icon in the unit's toolbar. The history is shown in a fancybox window.
    • User's detailed activity might be shown in the user's profile, by adding a link to the detailed statistics next to the currently shown stats.
  • How much of it is presented, at what time, etc.?
  • Permissions: Who is allowed to view the discussion? Anyone with view permission?
  • Do we want to trim/compact this table at some stage?
    • If a unit is deleted and along with that its history too, this might lead to missing information for the specific contributions made by a user.
  • How do we present this to users in the documentation/marketing? Is this history like in Wikipedia (complete traceability and authorship information), or is it just “some” recent activity? What are the use cases we will present to demonstrate the improvement in announcements, etc.?
    • I consider this more like history in Wikipedia than recent activity. Recent activity is already shown to some extent.
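JRA's suggestion is to repoint history when a unit gets a new uid. Together with the concern about keeping history when a unit disappears, this can be sketched as two small operations on a hypothetical in-memory log. As FW notes, remapping only works when we actually know both the old and the new uid. The deletion case mirrors `on_delete=models.SET_NULL`, which the proposed data model uses for `unit`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    # Hypothetical log record; unit_id becomes None once the unit no longer exists.
    unit_id: Optional[int]
    submitter: str
    new_target: str


def remap_unit(log, old_uid, new_uid):
    """Keep a unit's history when it gets a new ID: repoint its entries."""
    for c in log:
        if c.unit_id == old_uid:
            c.unit_id = new_uid


def detach_deleted_unit(log, uid):
    """On unit deletion, keep the entries but drop the pointer (like SET_NULL),
    so the user's contribution record survives even though the unit view is gone."""
    for c in log:
        if c.unit_id == uid:
            c.unit_id = None
```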

Data model

We could reuse pootle_statistics.Submission. We already pay the cost of creating these for changes made through the web interface. For uploads, we only add a single entry for the sake of “last contributor” data, not one for each unit. We could consider creating one for each unit, but that would make uploads slower, and might only work for merging uploads.
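Creating one submission per changed unit on upload amounts to diffing the stored targets against the uploaded ones. The sketch below assumes the uploaded units have already been matched to existing unit IDs. That matching is the hard part, and the reason this may only be feasible for merging uploads. The `Submission` record and the function name are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Submission:
    # Hypothetical per-unit record; field names are illustrative only.
    creation_time: datetime
    submitter: str
    unit_id: int
    old_value: str
    new_value: str


def submissions_for_upload(current, uploaded, submitter, when):
    """One Submission per unit whose target changed in a merging upload.

    `current` and `uploaded` map unit IDs to target text (assumed already matched).
    Units absent from `current` are skipped: there is nothing to compare them with.
    """
    return [
        Submission(when, submitter, uid, current[uid], new_target)
        for uid, new_target in uploaded.items()
        if uid in current and current[uid] != new_target
    ]
```

Each changed unit costs one extra row, which is where the upload slowdown comes from.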

Current fields in pootle_statistics.Submission:

  from django.db import models

  class Submission(models.Model):
      creation_time       = models.DateTimeField(auto_now_add=True)
      translation_project = models.ForeignKey('pootle_translationproject.TranslationProject')
      submitter           = models.ForeignKey('pootle_profile.PootleProfile')
      from_suggestion     = models.OneToOneField('pootle_app.Suggestion')

This model already allows us to:

  1. sort submissions by date
  2. get the name of the last contributor to a translation project, project or language
  3. filter by profile (submitter)
  4. filter by translation_project with no joining
  5. filter project or language with only a single join
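Points 4 and 5 above can be illustrated with plain Python objects. This sketch assumes a translation project is simply a (language, project) pair. In Django, the one-join filter would presumably be written with a double-underscore lookup such as `translation_project__project`, assuming `TranslationProject` has a `project` foreign key.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TranslationProject:
    # Simplified stand-in: each translation project is one (language, project) pair.
    language: str
    project: str


@dataclass
class Submission:
    creation_time: datetime
    translation_project: TranslationProject
    submitter: str


def by_translation_project(subs, tp):
    # No join: compare the foreign key directly.
    return [s for s in subs if s.translation_project == tp]


def by_project(subs, project):
    # One join: follow translation_project to its project.
    return [s for s in subs if s.translation_project.project == project]


def newest_first(subs):
    return sorted(subs, key=lambda s: s.creation_time, reverse=True)
```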

This model doesn't allow for:

  1. tracking suggestions (from_suggestion is only specified when a suggestion is accepted, in other words, when the real unit is affected)
  2. knowing which unit is affected
  3. seeing which fields changed or their values
  4. multiple related changes, if the creation_time of them is not the same https://code.djangoproject.com/ticket/16745

Comments:

  1. from_suggestion doesn't seem to actually be used anywhere

New model:

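  from django.db import models

  class Submission(models.Model):
      creation_time       = models.DateTimeField(auto_now_add=True)
      translation_project = models.ForeignKey('pootle_translationproject.TranslationProject')
      submitter           = models.ForeignKey('pootle_profile.PootleProfile')
      from_suggestion     = models.OneToOneField('pootle_app.Suggestion')  # remove?
      unit                = models.ForeignKey('pootle_store.Unit', null=True, on_delete=models.SET_NULL)
      field               = models.IntegerField()  # or a field specifier string; see below
      type                = models.IntegerField()  # enumerated submission type; see below
      old_value           = models.TextField(blank=True)  # can be empty
      new_value           = models.TextField(blank=True)  # can be empty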

.field will either be one of a list of enumerated integers corresponding to source, target, comment or state, to record what changed, or a string with some kind of pseudo Django field specifier, like pootle_store.Unit.target or pootle_store.Unit.state.
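If both representations of `.field` are kept, they can be mapped onto each other as long as the set of fields is fixed. This sketch uses hypothetical integer codes. The specifiers `pootle_store.Unit.source` and `pootle_store.Unit.comment` are assumed names that follow the pattern of the two examples above. They are not confirmed Unit field names.

```python
from enum import IntEnum


class SubmissionField(IntEnum):
    # Hypothetical codes; the spec lists which fields exist, not their numbers.
    SOURCE = 1
    TARGET = 2
    COMMENT = 3
    STATE = 4


def to_specifier(field):
    """Map an enumerated field to the pseudo Django field specifier form."""
    return "pootle_store.Unit." + field.name.lower()


def from_specifier(spec):
    """Inverse of to_specifier; raises KeyError for unknown models or fields."""
    model, _, name = spec.rpartition(".")
    if model != "pootle_store.Unit":
        raise KeyError(spec)
    return SubmissionField[name.upper()]
```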

.type will be one of a list of enumerated integers corresponding to the type of submission that this was. Possible values include:

  • None (normal interactive submission or no information available)
  • Revert (someone used the revert functionality in the UI)
  • Upload
  • Merge
  • Update from templates
  • VCS update
  • Batch manipulation
  • Automatic manipulation (bots, autocorrect, etc.)
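One possible encoding of `.type` uses enumerated integers, with labels in the `(value, label)` shape Django accepts for a field's `choices` option. The numbering is hypothetical. The "None" value is named `NORMAL` because `None` is a reserved word in Python.

```python
from enum import IntEnum


class SubmissionType(IntEnum):
    # Hypothetical numbering of the submission types listed above.
    NORMAL = 0        # "None": normal interactive submission, or no information
    REVERT = 1
    UPLOAD = 2
    MERGE = 3
    UPDATE_FROM_TEMPLATES = 4
    VCS_UPDATE = 5
    BATCH = 6
    AUTOMATIC = 7     # bots, autocorrect, etc.


LABELS = {
    SubmissionType.NORMAL: "Normal (interactive or unknown)",
    SubmissionType.REVERT: "Revert",
    SubmissionType.UPLOAD: "Upload",
    SubmissionType.MERGE: "Merge",
    SubmissionType.UPDATE_FROM_TEMPLATES: "Update from templates",
    SubmissionType.VCS_UPDATE: "VCS update",
    SubmissionType.BATCH: "Batch manipulation",
    SubmissionType.AUTOMATIC: "Automatic manipulation",
}

# (value, label) pairs, in definition order, suitable for a `choices` option.
SUBMISSION_TYPE_CHOICES = [(int(t), LABELS[t]) for t in SubmissionType]
```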

Some of these are not possible with the current GUI yet, and for some the proper type might not be set correctly. For example, we don't currently create submission objects for each unit changed in an upload.

This model will allow for:

  1. seeing a history for a specific unit
  2. seeing a list of all activity in a translation project (or language or project or the server)
  3. knowing which unit was changed in a submission
  4. seeing a diff or some other view of exactly what the change was
  5. querying for units with more than X edits in a given period (contentious units)
  6. showing all subsequent changes to a certain user's contributions (requires 2 queries and maybe some extra code to filter)
  7. better stats on what exactly a user contributed (?) - the diff can say what was added, but this is probably not useful.
  8. possibly tracking changes to translation_projects (description, files, etc.), but I'm not sure we should go this route.
  9. distinguishing between reverts and other types of changes
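Point 5 (contentious units) is essentially a time-range filter followed by a count per unit. The sketch below works over `(unit_id, creation_time)` pairs. The function name and signature are hypothetical.

```python
from collections import Counter
from datetime import datetime


def contentious_units(changes, start, end, more_than):
    """Unit IDs edited more than `more_than` times between `start` and `end` (inclusive).

    `changes` is an iterable of (unit_id, creation_time) pairs. In SQL terms this is a
    WHERE on the time range, GROUP BY unit, HAVING COUNT(*) > more_than.
    """
    counts = Counter(uid for uid, when in changes if start <= when <= end)
    return {uid for uid, n in counts.items() if n > more_than}
```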

This model will not allow for:

  1. tracking changes to suggestions
  2. tracking changes to projects or languages
  3. multiple related changes, if the creation_time of them is not the same (see above)

UI

As a start, probably only the history of the target text of a single unit will be implemented.

Single unit history

This should be accessible from the translate page. The link should be somewhere out of the way, since it is unlikely to be used frequently. An alternative is to group it with the “report bugs” link, but this clutters the toolbar of the source text.

A simple view will open in a fancybox containing a single-column table. Each row contains one successive value of the target, with diff highlighting against the previous version. In the future, we should support revert/restore actions from here.
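Diff highlighting between consecutive versions could use Python's standard `difflib`. This sketch diffs at word level. It marks changes with plain-text `[-removed-]` and `{+added+}` markers, which stand in for whatever HTML the fancybox would actually render. The function name is hypothetical.

```python
import difflib


def highlight_diff(previous, current):
    """Word-level diff of two target versions: removals as [-...-], insertions as {+...+}."""
    a, b = previous.split(), current.split()
    out = []
    for op, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
        if op == "equal":
            out.extend(a[i1:i2])
        if op in ("delete", "replace"):
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if op in ("insert", "replace"):
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)
```

For example, `highlight_diff("Save the file", "Save this file")` gives `Save [-the-] {+this+} file`. Splitting on whitespace is a design choice: it normalises spacing and reads well for most languages. A character-level diff may suit languages written without spaces better.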

For interactive editing, the old_value of a target should always correspond to the new_value of the previous submission, so showing both in two columns is unnecessary. In the case of an upload, VCS operation or similar, the change isn't tracked per unit, so we won't have that information anyway. The old_value from a Submission might still be useful: we know where it fits in (right before the editing time), so we can simply give it an extra row in that case (if it isn't equal to the new_value of the previous entry). However, it won't be easy to indicate the effect of uploads/VCS operations. We know when uploads happen, but we don't know what changes they introduce, or even which units were affected at the time.
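The extra-row rule can be sketched as follows. The sketch assumes submissions carry the proposed `old_value`/`new_value` fields and that a unit's target starts out empty. Rows are `(time, submitter, value)` tuples, oldest first. An untracked change gets a row with no known author or time.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Submission:
    # Only the fields needed here; names follow the proposed model.
    creation_time: datetime
    submitter: str
    old_value: str
    new_value: str


def history_rows(submissions):
    """Single-column history, oldest first: one row per submission's new_value.

    If a submission's old_value differs from the previous row's value (e.g. an
    untracked upload or VCS update changed the target in between), an extra row
    holding that old_value is inserted just before it.
    """
    rows = []
    for sub in sorted(submissions, key=lambda s: s.creation_time):
        previous = rows[-1][2] if rows else ""   # assume the target starts empty
        if sub.old_value != previous:
            rows.append((None, None, sub.old_value))   # untracked change
        rows.append((sub.creation_time, sub.submitter, sub.new_value))
    return rows
```

For a newest-first display, the rows can simply be reversed.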

Dwayne suggested the following, which combines several elements associated with the unit, and provides the user with a way of choosing what to see.

Some ideas on IRC from Igor (edited for flow):

I'm not very keen on hiding human suggestions/amaGama suggestions and history behind tabs, because the first two should always be visible, as they are among the main tools for translators. History is something that I'd like to see on demand. If we have tabs, people will have to constantly switch between the two. I would like to see TM and suggestions all the time, too, and the best thing is to see them together, so you can find the perfect source to work with.

He also added: if there is a “history mode”, which might be useful as the default view for queries like “what changes happened to my strings since last week?”, then we could activate that mode by default when entering the translate page from such a link/entry point (with a parameter).

History would be available with a single click on a link somewhere near 'Add comment', or, if a comment already exists, below the comment box area. If we sort history with the most recent edits at the top, it would be logical to place the comment area above the history, so that a newly added comment appears at the top of the history (the same applies to translations).

julen: Take into account that with the work in bug 2180 there might be more than a single comment per unit.

iafan: yeah, we will keep an interleaved history of comments and translations. This probably means that the current comment box might need to go away (i.e. there's no reason to “edit” a comment). You will just add a new one, and the latest one is what will be displayed by default.

Igor's proposed layout is: history (on demand by default), then suggestions, then TM results, where history = translations + comments. I'd display the last comment (if any), the way we do now, but not allow editing it, only adding a new one. So the translation unit always shows the current (latest) translation the way it does now, and one (the latest) comment. Everything else is 'history' and can be expanded on demand. I'd also display, e.g., the last five history entries, with the option to 'show more', like we do for context rows.
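Comments and translations are each naturally stored in time order. The interleaved history can therefore be built by merging the two streams. The "last five, then show more" behaviour comes from taking one entry more than needed. This sketch uses the standard library's `heapq.merge`. It assumes each entry is a hypothetical `(time, kind, text)` tuple and that both inputs are already sorted newest first. It does not hide the current translation and latest comment that the unit already shows.

```python
import heapq
from datetime import datetime
from itertools import islice


def interleaved_history(translations, comments, limit=5):
    """Merge two newest-first lists of (time, kind, text) entries into one history.

    Returns the first `limit` entries and whether a 'show more' link is needed.
    """
    merged = heapq.merge(translations, comments, key=lambda e: e[0], reverse=True)
    head = list(islice(merged, limit + 1))   # one extra tells us if there is more
    return head[:limit], len(head) > limit
```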

Based on these ideas, and further discussion on IRC, the idea was raised to combine more of the elements, with perhaps only the history being shown on demand, or when in a certain review mode (which might be activated by clicking a link that tries to answer some of the questions above about change history). Depending on implementation issues, some of the suggested changes might not be easy or possible, but let's ignore that for now.

So here is a rough mockup of what things could look like if combined (not quite what Igor proposed):

First implementation

The default view

With history expanded

Feedback (Dwayne)
  • Use checkbox buttons, or maybe pills, instead of the link with Hide/Show logic. Make it easy to group things that are on or off.
  • We seem to waste a lot of vertical space. We could regain some if we:
    • Drop the titles, e.g. “History” and “amaGama server”. These could be distinguished using row colour or info to the right of the line. If the row colour matched the button colour, that would also work, though it might look garish.
    • Lose the gap between rows. This wastes quite a bit of space, although it would mean doing something with alternate row colouring.
    • Move the history (and future discussion) on/off toggles into the same row as “Add comment” and special chars.
  • Lose the tooltip for the person who made the change. It requires mouse navigation; I'd prefer the name to be visible all the time. Bottom right makes sense, or to the right on the row with the gravatar.
  • Date/time information is missing from the history.
