Title: An inverted index generator for CINDI
Authors: Li, Hudong
Date: 2003
Publisher:
Source: See document
Type: Thesis
NonPeerReviewed
Subject:
Description: Human-maintained search engines are expensive, slow to update, and cannot cover all web pages. Automated search engines that rely on keyword matching usually return too many low-quality results, and most users look only at the first few tens of results. Because search engine development has largely taken place at companies that publish few technical details, developing a search engine is a challenging task. The use of hypertextual information can help to improve search quality. This report addresses the question of how to build an inverted index for a search system that can use the additional information present in hypertext to produce better search results. This report is part of the work on the Concordia INdexing and DIscovery (CINDI) Digital Library System. In this report, we summarize the research work done, discuss some implementation issues for the project, and present the data structures that can be used in indexing web pages. The design decisions were driven by the desire for a reasonably compact data structure and the ability to fetch a record in a few disk seeks during a search. The project has been implemented in C++ on the Linux platform.
Language: Not applicable