The University of Sheffield
Department of Computer Science

Ariadna Hernandez Plata MSc Dissertation 2014/15

Pivoting for Machine Translation

Supervised by L. Specia

Abstract

Statistical machine translation systems are built from large parallel corpora, which are used to produce automatic translations, and from even larger monolingual corpora, which are used to build language models. The latter favour target strings that are probable given the monolingual data, which promotes fluency. However, data is a scarce resource for many language pairs, and building aligned parallel corpora is a costly and difficult task. Various methods and techniques have been developed to overcome this situation. Here we present one alternative, pivoting (or bridging) for machine translation, which allows the use of a third language with enough parallel corpora to build translation systems. This project is split into two stages. In the first stage, two pivoting methods, triangulation and synthetic, are chosen for experiments. Spanish and German are used as source and target languages respectively, under the assumption that only small corpora are available for this pair. English and French are defined as pivot languages. Results are analysed and evaluated with the BLEU metric. The second stage of the project consists of building quality prediction models with QuEst (a toolkit developed at the University of Sheffield) and comparing them against the scores obtained with the metrics METEOR, TER and a smoothed version of BLEU. Experimental results demonstrate BLEU improvements for triangulated systems over the direct system built with limited corpora. Combining translations from different triangulated systems improves not only the scores but also the human perception of quality.