Title: Speeding up target-language driven part-of-speech tagger training for machine translation
Authors: Sánchez Martínez, Felipe
Pérez Ortiz, Juan Antonio
Forcada Zubizarreta, Mikel L.
Date: 2013-03-25
2006-11
Publisher: RUA Docencia
Source:
Type: info:eu-repo/semantics/conferenceObject
Subject: Machine translation
Part-of-speech taggers
Target-language driven
Lenguajes y Sistemas Informáticos
Description: When hidden-Markov-model-based part-of-speech (PoS) taggers used in machine translation systems are trained in an unsupervised manner, the use of target-language information has been shown to give better results than the standard Baum-Welch algorithm. The target-language-driven training algorithm translates into the target language every possible PoS tag sequence resulting from the disambiguation of the words in each source-language text segment, and uses a target-language model to estimate the likelihood of the translation of each possible disambiguation. The main disadvantage of this method is that the number of translations to perform grows exponentially with segment length, and translation is the most time-consuming task. In this paper, we present a method that uses a priori knowledge obtained in an unsupervised manner to prune unlikely disambiguations in each text segment, so that the number of translations to be performed during training is reduced. The experimental results show that this new pruning method drastically reduces the number of translations performed during training (and, consequently, the time complexity of the algorithm) without degrading the tagging accuracy achieved.
Work funded by the Spanish Ministry of Science and Technology through project TIC2003-08681-C02-01, and by the Spanish Ministry of Education and Science and the European Social Fund through research grant BES-2004-4711.
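The description above can be illustrated with a minimal sketch of why the number of disambiguations explodes and how a priori knowledge might prune them. Everything in it is an assumption made for illustration and is not taken from the paper: the function names `enumerate_paths` and `prune_paths`, the representation of a segment as per-word tag priors, the scoring of a path as the product of those priors, and the cumulative probability-mass cutoff `mass`. The translation and target-language-model scoring steps are not modelled.

```python
from itertools import product
from math import prod


def enumerate_paths(segment):
    """Return every PoS tag sequence (disambiguation) for a segment.

    `segment` is a list with one dict per word, mapping each candidate
    tag to an a priori probability (a hypothetical representation).
    The number of paths is the product of the per-word ambiguity
    counts, so it grows exponentially with segment length.
    """
    return list(product(*(list(word) for word in segment)))


def prune_paths(segment, mass=0.9):
    """Keep the most likely paths until `mass` of the a priori
    probability is covered; only these would then be translated.

    Scoring a path as the product of per-word tag priors and cutting
    at a probability-mass threshold are illustrative assumptions.
    """
    paths = enumerate_paths(segment)
    scores = [prod(word[tag] for word, tag in zip(segment, path))
              for path in paths]
    total = sum(scores)
    ranked = sorted(zip(paths, scores), key=lambda ps: ps[1], reverse=True)

    kept, covered = [], 0.0
    for path, score in ranked:
        kept.append(path)
        covered += score / total
        if covered >= mass:
            break
    return kept


# A three-word segment in which two words are ambiguous: 1 * 2 * 2 = 4 paths.
segment = [
    {"DET": 1.0},
    {"NOUN": 0.7, "VERB": 0.3},
    {"NOUN": 0.2, "VERB": 0.8},
]
all_paths = enumerate_paths(segment)
kept_paths = prune_paths(segment, mass=0.75)
print(len(all_paths), "paths before pruning,", len(kept_paths), "after")
```

With this toy segment, a cutoff of 0.75 keeps two of the four paths, so only two translations would be needed instead of four. In longer segments the saving would be correspondingly larger, since the unpruned count multiplies with every ambiguous word.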
Language: English

Similar articles:

Choosing the correct paradigm for unknown words in rule-based machine translation systems, by Sánchez Cartagena, Víctor Manuel; Esplà Gomis, Miquel; Sánchez Martínez, Felipe; Pérez Ortiz, Juan Antonio
Using external sources of bilingual information for on-the-fly word alignment, by Esplà Gomis, Miquel; Sánchez Martínez, Felipe; Forcada Zubizarreta, Mikel L.