OpenOffice.org
OpenOffice.org is the leading cross-platform office suite. It's a large project and a large localisation undertaking, but it is an important component of a localised desktop.
Resource
- KhmerOS: these pages, prepared by Javier SOLA, are by far your best resource for localising OpenOffice.org
What is your language's LCID
Microsoft defines LCIDs (locale identifiers) for various locales. You need to know yours so that OpenOffice.org works well on Windows, and so that documents you create can move seamlessly between MS Word and OpenOffice.org Writer because the language identifier is correct.
There are a number of lists that you can use to identify the LCID. For most languages they will all agree, but in some cases (see 1072/Sutu/Sesotho) it helps to look at all the lists to clarify exactly what Microsoft meant.
- Windows XP/Server 2003 includes XP service pack 2
- National Language Support Constants: a much clearer layout, and it mentions the codepage. It doesn't seem completely up to date, though.
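The lists show LCIDs in decimal in some places and hexadecimal in others, which makes them awkward to compare. The sketch below splits an LCID into the fields of the Windows layout: a 16-bit language identifier (10-bit primary language plus 6-bit sublanguage) with a 4-bit sort identifier above it. The function name split_lcid is our own, chosen for illustration.

```python
def split_lcid(lcid):
    """Split a Windows LCID into its bit fields (illustrative helper)."""
    lang_id = lcid & 0xFFFF                    # low 16 bits: language identifier
    return {
        "hex": f"0x{lcid:04X}",                # form used in most Microsoft tables
        "primary_language": lang_id & 0x3FF,   # low 10 bits of the language identifier
        "sublanguage": lang_id >> 10,          # high 6 bits of the language identifier
        "sort_id": (lcid >> 16) & 0xF,         # bits 16-19: sort order
    }


# 1072 is the LCID mentioned above; this only shows how the number decomposes.
print(split_lcid(1072))
```

Two lists that disagree about a name but give the same primary language and sublanguage values are describing the same locale.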
What to do first
This is a very large application. If you can localise a smaller part of the whole and still have a useful product, that will help. We created this rough targeting guide using OpenOffice.org 1.1.3 and podebug.
Localisation
lots missing here
gsicheck
The OpenOffice.org developers have a tool for checking the GSI file called gsicheck. Of course, you don't want to build the whole of OpenOffice.org simply to get one tool. pofilter will pick up most of the errors that gsicheck does, but it's nice to know that your GSI is good before submitting it. To use gsicheck, first download the required version:
Then install it and use it
tar xvzf gsicheck-1.7.8_2.0m122.tar.gz
cd gsicheck-1.7.8_2.0m122
./gsicheck -c <GSI/SDF file>
Now go and fix the errors that it detected. You should correct these in your PO files.
AutoCorrect
The OpenOffice.org AutoCorrect file is a zip file called, for example, acor_en-US.dat. Søren Thing Pedersen has created csv2acor.py, which generates an AutoCorrect file from CSV sources. The AutoCorrect file contains 3 XML files:
- DocumentList.xml - pairs of mistyped words and their correct spelling
- SentenceExceptList.xml - abbreviations ending in a full stop that should be ignored when determining the end of a sentence
- WordExceptList.xml - words that may legitimately start with two capital letters, e.g. CDs
When using csv2acor.py you need to have 3 files with the same names as above but with a .csv file extension. WordExceptList.csv and SentenceExceptList.csv contain just a list of entries, one per line, each surrounded by double quotes ("). DocumentList.csv is a comma-separated list with the mistyped word in the first column and the correct word in the second column, both also surrounded by double quotes.
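As an illustration of the layout just described, the following sketch produces fully quoted CSV text with Python's csv module. The helper name quoted_csv and the sample entries are invented for the example, and we have not verified which line endings or encoding csv2acor.py expects, so "\n" is an assumption.

```python
import csv
import io


def quoted_csv(rows):
    """Return CSV text in which every field is wrapped in double quotes."""
    buf = io.StringIO()
    # QUOTE_ALL quotes every field; "\n" line endings are an assumption.
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


# One-column lists: each row is a list holding a single entry (sample data).
word_except = quoted_csv([["CDs"], ["PCs"]])
sentence_except = quoted_csv([["e.g."], ["etc."]])
# Two columns: mistyped word first, correct word second (sample data).
document_list = quoted_csv([["teh", "the"], ["adn", "and"]])

print(document_list, end="")
```

Each string can then be saved under the matching file name, for example DocumentList.csv.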
WordExceptList.xml
If you have an existing spell checking wordlist then use the following to extract potential words:
egrep "^[A-Z][A-Z][a-z]" spell-wordlist > WordExceptList.new
This extracts all words that start with two capitals followed by a lower case letter. Extend the [A-Z] and [a-z] character classes to include all the letters valid in your language.
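The ASCII ranges in the egrep pattern miss accented and non-Latin letters. As an alternative sketch, Python's str.isupper() and str.islower() follow Unicode character properties, so no per-language character list is needed. The function names are ours, and the word list is assumed to hold one word per line.

```python
def has_two_leading_capitals(word):
    """True if the word starts with two upper case letters then a lower case one."""
    return (
        len(word) >= 3
        and word[0].isupper()
        and word[1].isupper()
        and word[2].islower()
    )


def candidates(lines):
    """Filter an iterable of lines (one word per line) down to candidate words."""
    stripped = (line.strip() for line in lines)
    return [word for word in stripped if has_two_leading_capitals(word)]


# Sample input; in practice pass an open file such as the spell-wordlist above.
print(candidates(["CDs\n", "Cat\n", "USA\n", "ÉÉa\n"]))
```

The result still needs a manual review, since the filter finds candidates rather than confirmed exceptions.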
SentenceExceptList.xml
If you have an existing spell checking wordlist then use the following to extract potential words:
egrep "\.$" spell-wordlist > SentenceExceptList.new
This extracts all entries that end in a full stop.
DocumentList.xml
If you have an existing DocumentList.xml you can convert it to CSV using the following:
sed "s/<block-list:block block-list:abbreviated-name=\"/\"\\n\"/g;s/\" block-list:name=\"/\",\"/g;s/\"\/>//g" < DocumentList.xml > DocumentList.csv
You'll need to edit DocumentList.csv to remove some of the remaining XML data.
A cleaner method is to use the following XSLT - this way you don't have to clean any XML data (so this is suitable for batch mode):
<?xml version="1.0" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
    xmlns:block-list="http://openoffice.org/2001/block-list">
  <xsl:output method="text" encoding="utf-8"/>
  <!-- Emit one "mistyped","correct" line per block element -->
  <xsl:template match="//block-list:block">
    <xsl:text>"</xsl:text>
    <xsl:value-of select="@block-list:abbreviated-name"/>
    <xsl:text>"</xsl:text>
    <xsl:text>,</xsl:text>
    <xsl:text>"</xsl:text>
    <xsl:value-of select="@block-list:name"/>
    <xsl:text>"</xsl:text>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
Run this stylesheet through any XSLT processor. For example, with Saxon, type:
java -jar saxon8.jar DocumentList.xml <name-of-xslt> >DocumentList.new
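If no XSLT processor is at hand, the same conversion can be sketched with Python's standard library. This assumes the namespace and attribute names used in the stylesheet above. The sample document, including its root element name, is made up for the example, and document_list_to_csv is our own name.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace taken from the stylesheet above.
NS = "http://openoffice.org/2001/block-list"


def document_list_to_csv(xml_text):
    """Convert DocumentList.xml content to fully quoted two-column CSV text."""
    root = ET.fromstring(xml_text)
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for block in root.iter(f"{{{NS}}}block"):
        writer.writerow([
            block.get(f"{{{NS}}}abbreviated-name"),  # mistyped word
            block.get(f"{{{NS}}}name"),              # correct word
        ])
    return buf.getvalue()


# Invented sample; a real DocumentList.xml may be laid out differently.
sample = (
    '<block-list:block-list xmlns:block-list="' + NS + '">'
    '<block-list:block block-list:abbreviated-name="teh" block-list:name="the"/>'
    '<block-list:block block-list:abbreviated-name="adn" block-list:name="and"/>'
    '</block-list:block-list>'
)
print(document_list_to_csv(sample), end="")
```

Unlike the stylesheet, the csv module doubles any double quote that occurs inside an entry, which keeps such entries readable by CSV parsers.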
Generating your new AutoCorrect file
Then run csv2acor.py acor_xx-YY.dat where xx-YY is your language and country code.
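As a quick sanity check on the result, and relying only on the description above of the AutoCorrect file as a zip archive holding three XML files, a sketch like this reports which of them are missing. The helper name missing_members is ours, and the demonstration archive is built in memory rather than by csv2acor.py.

```python
import io
import zipfile

EXPECTED = {"DocumentList.xml", "SentenceExceptList.xml", "WordExceptList.xml"}


def missing_members(path_or_file):
    """Return the expected XML files that are absent from the archive."""
    with zipfile.ZipFile(path_or_file) as archive:
        return sorted(EXPECTED - set(archive.namelist()))


# Demonstration with an in-memory archive that lacks WordExceptList.xml.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("DocumentList.xml", "<placeholder/>")
    archive.writestr("SentenceExceptList.xml", "<placeholder/>")
demo.seek(0)
print(missing_members(demo))
```

For a real file, pass its path instead, for example missing_members("acor_xx-YY.dat"). Any other members in the archive are ignored by this check.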
Spell Checker and Hyphenation in the official build
In order to add your spell checker and hyphenation file to OpenOffice.org CVS you need to do the following:
- Ensure your license is compatible
- Fill in the form at http://external.openoffice.org/
- File an issue assigned to mh, who needs to process the approval for inclusion
Holidays
- wizards/source/schedule/LocalHolidays.xba
Looks like a StarBasic program that allows you to specify holidays, etc.
need to check this more carefully
Child Workspace
OpenOffice developers use what they call child workspaces to make fixes and commit changes. These are usually linked to related bugs in IssueZilla.
Here are some instructions to help you track your changes and see if they have been integrated/fixed:
- log on with your OpenOffice.org account (for example coni@openoffice.org) and password
- click Childworkspaces
- click Search
- enter localisation% in the Name field
- wait….
Now you can see which l10n CWSs have been integrated and which have not. By clicking on a CWS name you see the list of bugs registered to that CWS. Once it has been approved by QA, you'll know exactly in which milestone the CWS has been integrated.