In rule-based machine translation (RBMT) systems, transfer rules transform source-language structures into their equivalent target-language structures. These rules arise from the grammatical, syntactic, and systematic differences between the two languages. The rules are applied deterministically to the input, left to right, using longest match. In this
thesis we describe experiments using two machine learning methods, maximum entropy and support vector machines, to learn a model that resolves ambiguity in the selection of structural transfer rules in a rule-based machine translation (MT) system. Herein, the
transfer rules function by matching a source language pattern of lexical items and applying a
sequence of actions. There can, however, be more than one potential sequence of actions for each
source language pattern. Our model consists of a set of classifiers, either maximum-entropy (i.e., logistic regression) or support vector machine classifiers, one trained for each source-language pattern, which select the highest-probability sequence of rules for a given sequence of patterns. We perform
experiments on the Kazakh–Turkish language pair — a low-resource pair of morphologically-rich
languages — and compare our model to two reference MT systems, a rule-based system where
transfer rules are applied in a left-to-right, longest-match manner, and a state-of-the-art system based on the neural encoder–decoder architecture. Our system outperforms both of these reference
systems on three widely used machine translation evaluation metrics. We also found that the maximum-entropy model performed better than the support vector machine.
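The per-pattern maximum-entropy classifier described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the pattern name, feature strings, and rule labels are invented for the example, and training uses plain gradient ascent on the conditional log-likelihood rather than any particular toolkit.

```python
import math
from collections import defaultdict

class MaxEntRuleSelector:
    """One maximum-entropy classifier per ambiguous source-language pattern.

    Each classifier chooses among the candidate action sequences (rules)
    that match the same source-language pattern, given context features.
    """
    def __init__(self, lr=0.5, epochs=200):
        self.lr, self.epochs = lr, epochs
        self.models = {}  # pattern -> (label list, weight dict)

    def _scores(self, labels, w, feats):
        # Linear score for each candidate rule: sum of its feature weights.
        return [sum(w[(lab, f)] for f in feats) for lab in labels]

    def _softmax(self, scores):
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        return [e / z for e in exps]

    def train(self, pattern, examples, gold_labels):
        # examples: list of feature lists; gold_labels: chosen rule per example.
        labels = sorted(set(gold_labels))
        w = defaultdict(float)
        for _ in range(self.epochs):
            for feats, gold in zip(examples, gold_labels):
                probs = self._softmax(self._scores(labels, w, feats))
                for lab, p in zip(labels, probs):
                    for f in feats:
                        # Log-likelihood gradient: observed minus expected count.
                        w[(lab, f)] += self.lr * ((lab == gold) - p)
        self.models[pattern] = (labels, w)

    def select(self, pattern, feats):
        # Return the highest-probability rule for this pattern in context.
        labels, w = self.models[pattern]
        probs = self._softmax(self._scores(labels, w, feats))
        best = max(range(len(labels)), key=probs.__getitem__)
        return labels[best], probs[best]

# Toy usage with hypothetical context features and rule names:
sel = MaxEntRuleSelector()
sel.train("adj n",
          [["prev=det"], ["prev=verb"], ["prev=det"]],
          ["rule_a", "rule_b", "rule_a"])
rule, p = sel.select("adj n", ["prev=det"])
```

A support vector machine variant would replace the probabilistic training above with a max-margin objective, but the overall structure, one classifier per ambiguous pattern, stays the same.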