
Bitext2tmx User Guide

Available Documentation


bitext2tmx is a free Computer-Aided Translation (CAT) tool that creates translation memories in TMX format from a bitext, that is, an original text and its translation. The generated TMX document can be used in other CAT tools, which can leverage the aligned segments for matching and reuse.

To start working with bitext2tmx, a user opens two documents in the program: the original version of a text and its corresponding translation. Both documents must be in the same format (currently plain text only), but their file encodings may differ; the encoding is selected when each document is opened. Once both documents are open, bitext2tmx attempts to align them into related segments as best it can. It does this by applying various rules, but there is no guarantee that the resulting alignments will be anywhere near perfect. bitext2tmx helps the user create a correct alignment of parallel texts; in most cases it cannot do this by itself.
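As an illustration of opening the two documents with different encodings, here is a minimal Python sketch. It is not bitext2tmx code; the function names read_document and load_bitext are hypothetical, and the sketch only shows that each file is decoded with its own, separately chosen encoding.

```python
from pathlib import Path


def read_document(path, encoding):
    """Read a plain-text document using the encoding chosen for that file."""
    # A wrong encoding choice raises UnicodeDecodeError instead of
    # silently producing garbled text.
    return Path(path).read_text(encoding=encoding)


def load_bitext(original_path, original_encoding,
                translation_path, translation_encoding):
    """Return (original_text, translation_text), each decoded separately."""
    # Hypothetical helper: both files are plain text, but the encodings
    # are independent of each other.
    original = read_document(original_path, original_encoding)
    translation = read_document(translation_path, translation_encoding)
    return original, translation
```

For example, load_bitext("original.txt", "utf-8", "translation.txt", "latin-1") would decode the first file as UTF-8 and the second as Latin-1; the file names here are placeholders.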

It is also possible to use bitext2tmx to edit TMX documents. In this way existing TMX documents can be realigned to provide better matches. More importantly for bitext2tmx users, it means they can save their work to TMX and read it back in later to continue editing without loss of work or data.


The bitext2tmx main window contains a number of embedded windows (editors, views). It has a docking, desktop-style interface that allows the contained parts to be rearranged to a certain degree, so that users can customize the interface to their own viewing preferences. In the default setup the following are visible:

Bitext Alignments/Segments Table (Alignments)

The Alignments view contains a table of the original and translation segments. Segments in this view can be selected, with the corresponding text shown in its associated segment editor for editing.

Original Segment Editor (Original)

Editor for the original segment.

Translation Segment Editor (Translation)

Editor for the translation segment.

Alignment/Segment Controls (Controls)

This view contains buttons for editing the alignments/segments. The lower left and right side buttons operate on the original and translation segments, respectively. They apply to their particular column of the alignments table. The center buttons work more generally.


bitext2tmx segments the text of the opened documents according to a series of internal rules before displaying it in the alignments table. In general, these rules cannot be altered. The only way to change the behavior is through line breaks: segmentation can be forced to occur on line breaks by choosing Split by Line Break from the Settings menu.
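To make the effect of splitting on line breaks concrete, the following Python sketch treats every non-empty line as one segment and pairs segments by position. This is an assumption-laden illustration only: the names split_by_line_break and initial_alignment are hypothetical, and the position-based pairing stands in for the program's internal alignment rules, which are not described in this guide.

```python
from itertools import zip_longest


def split_by_line_break(text):
    """Treat every non-empty line as one segment (illustrative rule only)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def initial_alignment(original_text, translation_text):
    """Pair segments by position; pad the shorter column with empty cells."""
    # Naive stand-in for an alignment table: row i holds the i-th original
    # segment next to the i-th translation segment.
    originals = split_by_line_break(original_text)
    translations = split_by_line_break(translation_text)
    return list(zip_longest(originals, translations, fillvalue=""))
```

When the two texts have a different number of lines, the padded empty cells show where a user would have to correct the alignment by hand.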

TMX Generation

The Save option generates the TMX (in version 1.1). If you want to use the resulting TMX as input for a particular translation-memory-capable tool, make sure that the source language selected in bitext2tmx is the same as the source language selected in that application. In other words, the srclang attribute in the header of the TMX generated by bitext2tmx must contain the code of the source language, and it must match the source language selected when a project is created in the other tool.
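The role of srclang can be sketched in Python as follows. This is not the output routine of bitext2tmx: it reads "version 1.1" as the TMX format version, the function names build_tmx and source_language are hypothetical, the header values other than srclang are placeholders, and the use of a plain lang attribute on each tuv element is an assumption about TMX 1.1 (later TMX versions use xml:lang).

```python
import xml.etree.ElementTree as ET


def build_tmx(pairs, source_lang, target_lang):
    """Serialize aligned (original, translation) pairs as a small TMX document."""
    tmx = ET.Element("tmx", version="1.1")
    # Header values are placeholders, except srclang, which carries the
    # source language code that the consuming tool must agree with.
    ET.SubElement(tmx, "header", {
        "creationtool": "example-sketch",
        "creationtoolversion": "0",
        "segtype": "sentence",
        "o-tmf": "plain-text",
        "adminlang": "en",
        "srclang": source_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for original, translation in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per aligned row
        for lang, text in ((source_lang, original), (target_lang, translation)):
            tuv = ET.SubElement(tu, "tuv", lang=lang)  # assumed attribute name
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


def source_language(tmx_text):
    """Return the srclang value from the header of a TMX document."""
    return ET.fromstring(tmx_text).find("header").get("srclang")
```

Before importing a TMX into another tool, comparing source_language(tmx_text) with the source language of the target project is a quick way to catch the mismatch described above.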

Warning: closing the documents before saving will result in loss of all alignment changes (to be fixed in a future version).


Original development on bitext2tmx was done by Susana Santos Antón, with help from Sergio Ortiz-Rojas and Mikel L. Forcada (members of the Transducens research group at the Departament de Llenguatges i Sistemes Informàtics, Universitat d'Alacant, Spain).

The program originated in the project "Finite-state translators based on bitexts harvested from the net" (2004–2006), which was funded by the now defunct Ministry of Science and Technology of Spain through grant number TIC2003-08681-C02.

Ongoing development is by Raymond Martin (OmegaT+ project). Contributions have also been made by Sabine Cretella, Valerie Martineau and others from the free software/open source community.