My last two wishes were a generic version control system library based on git and document-level versioning for Calligra (I’m talking about Calligra here but this applies to many applications, it’s just that Calligra is one where this fits very nicely). Those were the building blocks for this wish: in-document translations.

Say you are a company present in several locations and you want to send a document to your customers. You want to please them, let them know they are loved. One effective way to do that is to write to them in their native language.

I don’t know what your workflow is when you have to write a document in several languages, but this is mine:

  1. Create original document, generally written in English for proper review by everybody with a say and a vote. Save it as document_english.odt.
  2. Translate to Catalan and save as document_catalan.odt
  3. Translate to Spanish and save as document_spanish.odt

Now when I want to send the document and translations to someone, I need to send several files.

Drawbacks of my workflow:

  • When attaching several documents to an e-mail, it’s easy to forget to send some translation (oh, I forgot to send you document_urdu.odt!)
  • If the original document (document_english.odt) is updated, I need to remember to manually update document_spanish.odt, document_french.odt, etc.
  • Oh, and each document has to be translated on its own

Enter my imagination: I want better support for translations in Calligra. It would comprise two parts: in-document translations and automated translation.

In-document translations means instead of having document_english.odt, document_french.odt, etc, we would have a single document.odt that would contain all the translations.

How to do that? By means of the versioning method I wished about yesterday and git branches.

The “master” branch in the document would be the original language the document was written in (say, English). Then, when I want to translate the document to Japanese, I would go to Calligra Words, click on “Create new translation” and choose the language I am translating to. This would create a new branch, essentially a “git checkout -b japanese master”.
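As a rough sketch of what the “Create new translation” button could do under the hood: git expects the new branch name first and the starting point second, so the call is “git checkout -b <new-branch> <start-point>”. The function name and the idea of returning the argument list are my own assumptions for illustration, not anything Calligra provides.

```python
def new_translation_command(language_branch, original="master"):
    """Build the git command that starts a translation branch.

    Hypothetical helper: `language_branch` is the branch to create and
    `original` is the branch holding the source-language text.
    """
    # Branch names cannot be empty or contain whitespace.
    if not language_branch or any(ch.isspace() for ch in language_branch):
        raise ValueError(f"not a usable branch name: {language_branch!r}")
    # git checkout -b <new-branch> <start-point>
    return ["git", "checkout", "-b", language_branch, original]
```

The returned list could be handed to subprocess.run() from inside the directory that holds the document’s repository.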

Of course it would be better to use ISO 639 codes for the branch names, so that we can show localized language names, i.e. if I am using Calligra in English, I want it to say the translations available in that document are “Original document, Japanese, French” but if I am using Calligra in Spanish I want it to say the translations available are “Documento original, Japonés, Francés”.
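To make the branch-name idea concrete, here is a minimal sketch of turning ISO 639-1 branch names into labels for the user interface. The lookup tables are hand-written and hypothetical; a real implementation would presumably pull the names from the locale data KDE already ships rather than from a dictionary like this one.

```python
# Hypothetical data: branch name (ISO 639-1 code) -> UI language -> display name.
LANGUAGE_NAMES = {
    "ja": {"en": "Japanese", "es": "Japonés"},
    "fr": {"en": "French", "es": "Francés"},
}

# Label for the branch that holds the original text.
ORIGINAL_LABEL = {"en": "Original document", "es": "Documento original"}


def translation_labels(branches, ui_language, original="master"):
    """Return one display label per branch, localized for `ui_language`."""
    labels = []
    for branch in branches:
        if branch == original:
            # Fall back to English if the UI language has no label.
            labels.append(ORIGINAL_LABEL.get(ui_language, ORIGINAL_LABEL["en"]))
        else:
            names = LANGUAGE_NAMES.get(branch, {})
            # Fall back to English, then to the raw code, for unknown entries.
            labels.append(names.get(ui_language, names.get("en", branch)))
    return labels
```

With this, translation_labels(["master", "ja", "fr"], "es") gives the Spanish list from the example, and an unknown code such as "ur" is simply shown as is.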

By using git versioning, it would also be quite easy to introduce some “marks” to know when the original document has been altered and therefore the translations need updating (“git diff” to the rescue 🙂 )
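One possible way to compute such a mark, as a sketch only: “git diff --quiet A...B” (three dots) compares B against the common ancestor of A and B, and exits with status 1 when there are differences. This assumes a translation branch “catches up” by merging the original branch once it has been updated; the function name and the injectable `run` parameter are my own choices for illustration.

```python
import subprocess


def translation_is_stale(translation, original="master", run=subprocess.run):
    """Return True if `original` changed since `translation` last merged it.

    Assumption: translators merge `original` into their branch after
    updating the translation, so the merge base moves forward.
    """
    # Three-dot form: diff between the merge base and the tip of `original`.
    result = run(["git", "diff", "--quiet", f"{translation}...{original}"])
    if result.returncode not in (0, 1):
        # Anything other than 0 (no changes) or 1 (changes) is a git error.
        raise RuntimeError(f"git diff failed with status {result.returncode}")
    return result.returncode == 1
```

The `run` parameter exists only so the function can be tried without a repository; by default it calls git in the current directory.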

Changing the language of the master branch should also be possible by adding a “make default language” option to Calligra. The default language would be the language in which Calligra opens the document when several translations are available.

Now, let’s go for the second point of my vision: automated translations.

A bit earlier I said that, to create a new translation, I would go to Calligra Words, click on “Create new translation” and a new branch would be created. In addition to that, we could ask the user whether they want to do an initial translation using some automated tool like Apertium (want to translate from Tajik to Persian? Apertium does that), Google Translate, Babelfish, etc.

And now the twist: I’m not a native English speaker. After each BehindKDE I write, I go to Jonathan Riddell and ask him to read the interview, spellcheck, make sure the words not only are correct but are in the proper order, etc (and he kindly does and never complains, thanks Jonathan!).

Would you not like to have a Jonathan Riddell in your Calligra Words for English translations, an Irina Rempt for Dutch translations, etc? I sure would!

So the idea is, after the automated translation, a small notification would pop up and say “hey, I’ve noticed some supersmart bot has translated your document to Bengali. Would you like a human to review the translation and get back to you in 24 hours?”. That would send the document to some professional translator, for instance to Irina or to Prompsit (the company that develops Apertium), which would charge you their fee and Calligra would receive a commission (5%?).

The automated translation has many nitpicks but they all can be worked out and they could even create a business:

  • What services should be in for free translation? That’s not a problem: essentially, anyone that’s good. There should be a default, which provides translation services for the most used languages and provides good translations
  • What services should be in for paid translation? That could be a problem: in the near future, when Calligra overtakes Microsoft Office, every translator will ask to be in 🙂 No, really, we’d need them to offer a proper way to submit documents, notify the user they have received it and progress, etc
  • Privacy. For automated translation, I can either submit the document to the public free translation service, or I can pay a small monthly fee and have a private account on Apertium or Google Translate, so that my document is submitted over SSL and I am assured no one would use it for research or anything.
  • … and more

As a bonus point, a “translation mode” UI could be added to Calligra. It would show the document in the original language and the translated document side-by-side and make editing easy, something like Google Translator Toolkit.

In case you think I’m dreaming: no, I’m not. I have had this in mind for more than a year. Last month I talked about it with Gema Ramirez, the CEO of Prompsit (who has been a friend for I don’t know, 15 years?) and she instantly liked the idea. Maybe this could be material for a shared Apertium-Calligra GSoC?

So here is my third wish: let’s make Calligra the reference tool for users needing translations and for companies providing translation services. Your mission: take my 1,000-word essay and make it real 🙂 I would do it myself but sadly my job and real life leave little spare time for that.

4 Thoughts on “A wish a day 3: document translation in Calligra”

  1. The first part of your idea can be accomplished in another (IMHO, simpler) way: hidden sections and split screen view.
    Writer provides hidden sections and you can hide/unhide them with a simple condition that can also be a simple variable, so this is something Calligra Words needs to copy. Split screen view is not provided by Writer nor Calligra Words (sigh!), so it is something that could be a selling point.
    So the workflow could be to create one section in the document for every translation and set a condition to hide/unhide those sections. Then, a UI switch that provides the user with a split screen showing the original text and the to-be-translated text is everything you need.
    When one translation is ready, you hide it and proceed with another one, and when the document is ready and you need to send it to anyone else you can hide the translations that are not needed. Also, an option to save a copy of the document containing only the selected translation would be great.
    The advantage of this way is that you do not need to change anything in the versioning system (a great versioning system for only one language will also be great for all languages!), nor introduce variants in the file format: just a change in the UI to provide split screen view and filtering options, and a change in the system to provide hidden sections (something that will be useful not only for translations)

  2. Please stop flooding the planet with wishes, kde has for it.

  3. You should have a look at OmegaT. It is released under the GPL, does what it should _very_ well and adheres to the standards used by virtually all free translation programs:

  4. Kevin Brubeck Unhammer on Tuesday 29th March 2011 at 13:22:27 said:

    For professional translators, OmegaT is a must. However, if you don’t know/care/have time to find out what XLIFF or TMX means, but just need something to speed up translation tasks that you don’t perform too often, this sounds like a great idea.

    I see MS Office already integrates some machine translation, but the proposal here looks a lot more useful to me.
