Title: Choosing the correct paradigm for unknown words in rule-based machine translation systems
Authors: Sánchez Cartagena, Víctor Manuel
Esplà Gomis, Miquel
Sánchez Martínez, Felipe
Pérez Ortiz, Juan Antonio
Date: 2013-03-26
2012-06
Publisher: RUA Docencia
Source:
Type: info:eu-repo/semantics/conferenceObject
Subject: Machine translation
Rule-based
Unknown words
Lenguajes y Sistemas Informáticos
Description: Previous work on an interactive system aimed at helping non-expert users to enlarge the monolingual dictionaries of rule-based machine translation (MT) systems worked by discarding those inflection paradigms that cannot generate a set of inflected word forms validated by the user. This method, however, cannot deal with the common case where a set of different paradigms generate exactly the same set of inflected word forms, although with different inflection information attached. In this paper, we propose the use of an n-gram-based model of lexical categories and inflection information to select a single paradigm in cases where more than one paradigm generates the same set of word forms. Results obtained with a Spanish monolingual dictionary show that the correct paradigm is chosen for around 75% of the unknown words, thus making the resulting system (available under an open-source license) of valuable help to enlarge the monolingual dictionaries used in MT involving non-expert users without technical linguistic knowledge.
This work has been partially funded by Spanish Ministerio de Ciencia e Innovación through project TIN2009-14009-C02-01, by Generalitat Valenciana through grant ACIF/2010/174 from VALi+d programme, and by Universitat d’Alacant through project GRE11-20.
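Illustrative sketch (not the authors' implementation): the abstract above describes selecting one paradigm among several that generate the same word forms by using an n-gram model over lexical categories and inflection information. The Python snippet below shows one way such a selection could work, assuming a bigram model with add-one smoothing. The tag names, the toy training sequences, the paradigm names and every function name are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def train_bigram(tag_sequences):
    """Count tag contexts and tag bigrams, adding sentence boundary markers."""
    contexts, bigrams = Counter(), Counter()
    for tags in tag_sequences:
        padded = ["<s>"] + list(tags) + ["</s>"]
        contexts.update(padded[:-1])
        bigrams.update(zip(padded, padded[1:]))
    vocab = {t for tags in tag_sequences for t in tags} | {"</s>"}
    return contexts, bigrams, vocab


def log_prob(tags, model):
    """Add-one smoothed bigram log-probability of a tag sequence."""
    contexts, bigrams, vocab = model
    padded = ["<s>"] + list(tags) + ["</s>"]
    size = len(vocab)
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + size))
        for a, b in zip(padded, padded[1:])
    )


def choose_paradigm(left_tags, right_tags, candidates, model):
    """Return the candidate paradigm whose tag best fits the context.

    candidates maps a (hypothetical) paradigm name to the tag that the
    paradigm would attach to the unknown word form.
    """
    return max(
        candidates,
        key=lambda name: log_prob(left_tags + [candidates[name]] + right_tags, model),
    )


# Toy, hand-made tag sequences (assumed data, not taken from the paper).
corpus = [
    ["det.m.sg", "n.m.sg", "vblex.p3.sg"],
    ["det.f.sg", "n.f.sg", "vblex.p3.sg"],
    ["det.m.sg", "n.m.sg", "adj.m.sg"],
]
model = train_bigram(corpus)

# Two paradigms generate the same forms but attach a different gender.
candidates = {"masc_noun": "n.m.sg", "fem_noun": "n.f.sg"}
best = choose_paradigm(["det.m.sg"], ["vblex.p3.sg"], candidates, model)
print(best)
```

In this toy setting the unknown word follows a masculine singular determiner, so the masculine reading scores higher. A realistic model would be trained on a large tagged corpus and would likely combine evidence from several occurrences of the unknown word.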
Language: English

Similar articles:

Using external sources of bilingual information for on-the-fly word alignment, by Esplà Gomis, Miquel; Sánchez Martínez, Felipe; Forcada Zubizarreta, Mikel L.