Title: A combining approach to find all taxon names (FAT)
Authors: Sautter, Guido
Böhm, Klemens
Agosti, Donat
Date: 2006-06-22
Publisher: Biodiversity Informatics
Source:
Type: info:eu-repo/semantics/article
Peer-reviewed Article
info:eu-repo/semantics/publishedVersion
Subject: digital library; systematics; Named Entity Recognition; Taxonomic Name Extraction
Description: Most of the literature on natural history is hidden in millions of pages stacked up in our libraries. Various initiatives now aim to make these publications digitally accessible and searchable by applying XML markup technologies. Unique biological names play a crucial role in linking content related to a particular taxon, so discovering and marking them up is extremely important. Since manual extraction and markup is cumbersome and time-intensive, it needs to be automated. In this paper, we present computational linguistics techniques and evaluate how they can help to extract taxonomic names automatically. We build on an existing approach for the extraction of such names (Koning et al. 2005) and combine it with several other learning techniques. We apply them to the texts sequentially so that each technique can use the results from the preceding ones. In particular, we use structural rules, dynamic lexica with fuzzy lookups, and word-level language recognition. We use legacy documents from different sources and times as a test bed for our evaluation. The experimental results for our combining approach (FAT) show greater than 99% precision and recall. They reveal the potential of computational linguistics techniques for the automated markup of biosystematics publications.
Language: English
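
Illustration: the description mentions techniques applied sequentially, with each stage using the results of the one before. The Python sketch below is only a rough illustration of that idea and is not the authors' implementation. The regular expression, the similarity cutoff, the function names, and the use of a plain common-word filter as a crude stand-in for word-level language recognition are all assumptions made for this example.

```python
import re
from difflib import get_close_matches

# Assumed structural rule: a capitalized word followed by a lowercase word,
# the usual shape of a "Genus species" binomial. Real rules would be richer.
BINOMIAL = re.compile(r"\b([A-Z][a-z]{2,})\s+([a-z]{3,})\b")


def structural_candidates(text):
    """Stage 1: return (genus, epithet) pairs that match the structural rule."""
    return BINOMIAL.findall(text)


def fuzzy_in_lexicon(word, lexicon, cutoff=0.8):
    """Stage 2 helper: fuzzy lookup that tolerates OCR-style misspellings."""
    return bool(get_close_matches(word, lexicon, n=1, cutoff=cutoff))


def extract_names(text, genus_lexicon, common_words):
    """Run the stages in order; each stage only sees what the previous one kept."""
    lexicon = set(genus_lexicon)  # "dynamic": grows as names are confirmed
    names = []
    for genus, epithet in structural_candidates(text):
        # Placeholder for word-level language recognition: drop candidates
        # whose second word is an ordinary word of the document language.
        if epithet in common_words:
            continue
        if fuzzy_in_lexicon(genus, sorted(lexicon)):
            names.append(f"{genus} {epithet}")
            lexicon.add(genus)  # later lookups can match this spelling too
    return names


if __name__ == "__main__":
    sample = "The ant Formica rufa was found. Nearby, Formlca fusca nested in the soil."
    print(extract_names(sample, ["Formica"], {"ant"}))
```

In this toy input, "Formlca" is a deliberately misspelled genus of the kind that scanning errors produce. The fuzzy lookup accepts it because it is close to the lexicon entry "Formica", while "The ant" is rejected by the common-word filter.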

Similar articles:

Global Biodiversity Informatics: setting the scene for a “new world” of ecological forecasting by Canhos, Vanderlei Perez; Centro de Referência em Informação Ambiental, CRIA, Souza, Sidnei de; Centro de Referência em Informação Ambiental, CRIA, Giovanni, Renato De; Centro de Referência em Informação Ambiental, CRIA, Canhos, Dora Ann Lange; Centro de Referência em Informação Ambiental, CRIA
Interpretation of Models of Fundamental Ecological Niches and Species’ Distributional Areas by Soberon, Jorge; CONABIO, Peterson, A. Townsend; Natural History Museum, KU
Place prioritization for biodiversity content using species ecological niche modeling by Sánchez-Cordero, Víctor; Departamento de Zoologia, Instituto de Biologia, Universidad Nacional Autonoma de Mexico., Cirelli, Verónica; Departamento de Zoologia, Instituto de Biologia, Universidad Nacional Autonoma de Mexico., Munguial, Mariana; Departamento de Zoologia, Instituto de Biologia, Universidad Nacional Autonoma de Mexico., Sarkar, Sahotra; Section of Integrative Biology and Department of Philosophy, University of Texas
Environmental Information: Placing Biodiversity Phenomena in an Ecological and Environmental Context by Chapman, Arthur D; Australian Biodiversity information Services, Muñoz, Mauro E.S.; Centro de Referência em Informação Ambiental (CRIA), Koch, Ingrid; Centro de Referência em Informação Ambiental (CRIA)
Resolving taxonomic discrepancies: Role of Electronic Catalogues of Known Organisms by Chavan, Vishwas Shravan; National Chemical Laboratory, Rane, Nilesh Sunil; National Chemical Laboratory, Watve, Aparna; National Chemical Laboratory, Ruggiero, Michael; Integrated Taxonomic Information System, US Geological Survey, Smithsonian Institution, Washington DC, USA
Bioinformatics, the Clearing-House Mechanism and the Convention on Biological Diversity by Silva, Marcos R.; Secretariat of the Convention on Biological Diversity
TaxonGrab: Extracting Taxonomic Names From Text by Koning, Drew; American Museum of Natural History, Sarkar, Indra Neil; American Museum of Natural History, Moritz, Thomas; American Museum of Natural History
Mammals of the World: MaNIS as an example of data integration in a distributed network environment by Stein, Barbara R; Museum of Vertebrate Zoology, Wieczorek, John R.; Museum of Vertebrate Zoology