Title: Data mining with relational database management systems
Authors: Zou, Beibei, 1974-
Date: 2005
Publisher: McGill University - MCGILL
Source:
Type: Electronic Thesis or Dissertation
Subject: Computer Science.
Description: With the growing demand to transform raw data into information and knowledge, data mining has become an important field for discovering useful information and hidden patterns in huge datasets. Both machine learning and database research have made major contributions to the field. However, little effort has yet been made to improve the scalability of the algorithms applied in data mining tasks. Scalability is crucial for data mining algorithms, since they often have to handle large datasets. In this thesis we take a step in this direction by extending a popular machine learning software package, Weka3.4, to handle large datasets that cannot fit into main memory by relying on relational database technology. Weka3.4-DB stores its data in, and accesses it from, DB2, in general using a loose coupling approach. Additionally, a semi-tight coupling is applied to optimize the data manipulation methods by implementing core functionality within the database. Based on the DB2 storage implementation, Weka3.4-DB achieves better scalability while still providing a general interface that lets developers implement new algorithms without needing database or SQL knowledge.
Language: en