Título: Spatial Data Modeling and Mining using a Graph-based Representation
Autores: Pech Palacio, Manuel Alfredo
Fecha: 2005-12-12
Publicador: CIRIA
Fuente:
Tipo: Electronic Thesis or Dissertation
Tesis
Tema: Ciencias de la Computación
Descripción: Motivation Several approaches have been developed for mining spatial data (i.e., generalization-based methods, clustering, spatial associations, approximation and aggregation, mining in image and raster databases, spatial classification and spatial trend detection). However, we argue that these approaches do not consider all the elements found in a spatial database (spatial data, non-spatial data and spatial relations among the spatial objects) in an extended way. Some of them focus first on spatial data and then on non-spatial data or vice versa, and others consider restricted combinations of these elements. We think that it is possible to enhance the generated results of the data mining task by mining them as a whole and not as separate elements (they are related elements). A graph representation provides the flexibility to describe these elements together and this is the motivation to explore the area of graph-based spatial knowledge discovery. Proposal Our idea is to create a unique graph-based model to represent spatial data, non-spatial data and the spatial relations among spatial objects. We will generate datasets composed of graphs with a set of these three elements. We consider that by mining a dataset with these characteristics a graph-based mining tool can search patterns involving all these elements at the same time improving the results of the spatial analysis task. A significant characteristic of spatial data is that the attributes of the neighbors of an object may have an influence on the object itself. So, we propose to include in the model three relationship types (topological, orientation, and distance relations). In the model the spatial data (i.e., spatial objects), non-spatial data (i.e., non-spatial attributes), and spatial relations are represented as a collection of one or more directed graphs. A directed graph contains a collection of vertices and edges representing all these elements. Vertices represent either spatial objects, spatial relation types between two spatial objects (binary relation), or non-spatial attributes describing the spatial objects. Edges represent a link between two vertices of any type. According to the type of vertices that an edge joins, it can represent either an attribute name or a spatial relation name. The attribute name can refer to a spatial object or a non-spatial entity. We use directed edges to represent directional information of relations among elements (i.e., object x covers object y) and to describe attributes about objects (i.e., object x has attribute z). We propose to adopt the Subdue system, a general graph-based data mining system developed at the University of Texas at Arlington, as our mining tool. Subdue discovers substructures using a graph-based representation of structural databases. The substructures (a connected subgraph within the graphical representation) describe structural concepts in the data (i.e., patterns). The discovery algorithm follows a computationally constrained beam search. The algorithm begins with the substructure matching a single vertex in the graph. On each iteration, the algorithm selects the best substructure and incrementally expands the instances of the substructure. An instance of a substructure in the input graph is a subgraph that matches (graph theoretically) that substructure. A special feature named overlap has a primary role in the substructures discovery process and consequently a direct impact over the generated results. However, it is currently implemented in an orthodox way: all or nothing. If we set overlap to true, Subdue will allow the overlap among all instances sharing at least one vertex. On the other hand, if overlap is set to false, Subdue will not allow the overlap among instances sharing at least one vertex. So, we propose a third approach: limited overlap, which gives the user the capability to set over which vertices the overlap will be allowed (vertices representing remarkable elements that refer, for instance, to a spatial object in a spatial database or to some characteristic defining a particular topic of a dataset). We visualize directly three motivations issues to propose the implementation of the new algorithm: search space reduction, processing time reduction, and pattern oriented search. Contribution The contribution to the discovery knowledge in the spatial data domain, described in this dissertation, is the development of a new approach for spatial data modeling and mining using a graph-based representation. This contribution includes the following results: " A new graph-based data representation for spatial, non-spatial data and spatial relations. " A new algorithm to discover substructures using a limited overlap approach in the Subdue system. " A prototype system implementing the proposed model.
Idioma: Español