Título: Trie methods for text and spatial data on secondary storage
Autores: Shang, Heping
Fecha: 1995
Publicador: McGill University - MCGILL
Fuente:
Tipo: Electronic Thesis or Dissertation
Tema: Computer Science.
Descripción: This thesis presents three trie organizations for various binary tries. The new trie structures have two distinctive features: (1) they store no pointers and require two bits per node in the worst case, and (2) they partition tries into pages and are suitable for secondary storage. We apply trie structures to indexing, storing and querying both text and spatial data on secondary storage. We are interested in practical problems such as storage compactness, I/O efficiency, and large trie construction.
We use our tries to index and search arbitrary substrings of a text. For an index of 100 million keys, our trie is 10%-25% smaller than the best known method. This difference is important since the index size is crucial for trie methods. We provide methods for dynamic tries and allow texts to be changed. We also use our tries to compress and approximately search large dictionaries. Our algorithm can find strings with k mismatches in sublinear time. To our knowledge, no other published sublinear algorithm is known for this problem.
Besides, we use our tries to store and query spatial data such as maps. A trie structure is proposed to permit querying and retrieving spatial data at arbitrary levels of resolution, without reading from secondary storage any more data than is needed for the specified resolution. The trie structure also compresses spatial data substantially. The performance results on map data have confirmed our expectations: the querying cost is linear in the amount of data needed and independent of the data size in practice. We give algorithms for a set of sample queries including geometrical selection, geometrical join and the nearest neighbour. We also show how to control query cost by specifying an acceptable resolution.
Idioma: en