Title: Improving the efficiency of algebraic subspace clustering through randomized low-rank matrix approximations
Authors: FABRICIO OTONIEL PEREZ PEREZ
Date: 2013-01
Publisher: INAOE
Source:
Type: info:eu-repo/semantics/masterThesis
info:eu-repo/semantics/acceptedVersion
Subject: info:eu-repo/classification/Análisis de los datos/Data analysis
info:eu-repo/classification/Reducción de datos/Data reduction
info:eu-repo/classification/Algoritmos determinísticos/Deterministic algorithms
info:eu-repo/classification/Algoritmos aleatorios/Randomized algorithms
info:eu-repo/classification/Subespacio agrupación/Subspace clustering
info:eu-repo/classification/Aproximación polinomial/Polynomial approximation
info:eu-repo/classification/Álgebra lineal numérica/Numerical linear algebra
info:eu-repo/classification/Valor singular de descomposición/Singular value decomposition
info:eu-repo/classification/Métodos Monte Carlo/Monte Carlo methods
info:eu-repo/classification/Análisis de componentes principales/Principal component analysis
info:eu-repo/classification/cti/1
info:eu-repo/classification/cti/12
info:eu-repo/classification/cti/1203
Description: In many research areas, such as computer vision, image processing, pattern recognition, and system identification, segmenting heterogeneous high-dimensional data sets is one of the most common and important tasks. Generalized Principal Component Analysis (GPCA) is an algebraic-geometric method, based on the subspace clustering approach, that attempts to perform this task. However, because GPCA requires matrix decompositions whose computational cost is, in the worst case, cubic in the size of the matrix, data segmentation becomes expensive when the matrix is very large. This thesis therefore sets out to support our initial hypothesis: randomized schemes can yield matrix decompositions that not only reduce the computational cost but also maintain the effectiveness of the results. This allows GPCA to handle large, heterogeneous, high-dimensional data sets, and thus to enter domains where its applicability has been partially or totally restricted.
Language: eng
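
Note: the record does not say which randomized scheme the thesis uses, so the following is only a minimal sketch of one common approach, a randomized range finder that approximates a matrix A by Q(QᵀA) using a Gaussian test matrix. All function names and parameters (`randomized_low_rank`, `oversample`, `seed`, and the helpers) are hypothetical and chosen for illustration; it uses plain Python lists rather than a numerical library, and it is not the GPCA pipeline itself.

```python
import random


def matmul(A, B):
    """Dense product of two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def orthonormal_columns(Y, tol=1e-10):
    """Modified Gram-Schmidt on the columns of Y.

    Columns that are numerically dependent on earlier ones are dropped,
    so the result may have fewer columns than Y.
    """
    basis = []
    for col in transpose(Y):
        v = list(col)
        norm0 = sum(x * x for x in v) ** 0.5
        for _ in range(2):  # a second pass improves numerical orthogonality
            for q in basis:
                dot = sum(x * y for x, y in zip(v, q))
                v = [x - dot * y for x, y in zip(v, q)]
        norm = sum(x * x for x in v) ** 0.5
        if norm > tol * norm0:
            basis.append([x / norm for x in v])
    return transpose(basis)


def randomized_low_rank(A, rank, oversample=2, seed=0):
    """Return (Q, B) with A approximately equal to Q @ B.

    Assumption: A has (numerical) rank close to `rank`; `oversample`
    extra random directions make capturing the range more reliable.
    """
    rng = random.Random(seed)
    n = len(A[0])
    k = rank + oversample
    omega = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]
    Y = matmul(A, omega)          # random sketch of the column space of A
    Q = orthonormal_columns(Y)    # orthonormal basis for that sketch
    B = matmul(transpose(Q), A)   # small matrix: at most k rows
    return Q, B


def frobenius_error(A, Q, B):
    """Frobenius norm of A - Q @ B."""
    approx = matmul(Q, B)
    return sum(
        (a - b) ** 2 for ra, rb in zip(A, approx) for a, b in zip(ra, rb)
    ) ** 0.5


if __name__ == "__main__":
    # Exactly rank-2 test matrix built from two outer products.
    u1, u2 = [1, 2, 3, 4, 5, 6], [1, 0, -1, 0, 1, 0]
    v1, v2 = [1, 1, 1, 1, 1], [0, 1, 2, 3, 4]
    A = [[a * c + b * d for c, d in zip(v1, v2)] for a, b in zip(u1, u2)]
    Q, B = randomized_low_rank(A, rank=2)
    print("basis columns:", len(Q[0]))
    print("approximation error:", frobenius_error(A, Q, B))
```

Any further decomposition would then be run on the small matrix B instead of on A, which is where a cost saving of the kind the description hypothesizes would come from in this sketch.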