Title: Data mining using relational database management system
Authors: Ma, Xuesong, 1975-
Date: 2005
Publisher: McGill University - MCGILL
Source:
Type: Electronic Thesis or Dissertation
Subject: Computer Science.
Description: With the wide availability of huge amounts of data and the pressing demand to transform raw data into useful information and knowledge, data mining has become an important research field in both the database and machine learning areas. Data mining is defined as the process of solving problems by analyzing data already present in a database and discovering knowledge in that data. Database systems provide efficient data storage, fast access structures, and a wide variety of indexing methods to speed up data retrieval. Machine learning provides the theoretical support for most of the popular data mining algorithms. Weka-DB combines properties of these two areas to improve the scalability of Weka, an open source machine learning software package. Weka implements most of its machine learning algorithms using main-memory-based data structures, so it cannot handle datasets that are too large to fit into main memory. Weka-DB stores its data in, and accesses it from, DB2, so it achieves better scalability than Weka. However, Weka-DB is much slower than Weka because secondary storage access is more expensive than main memory access. In this thesis we extend Weka-DB with a buffer management component to improve its performance. We also increase the scalability of Weka-DB further by moving additional data structures into the database and accessing them through a buffer. Finally, we explore another method to improve the speed of the algorithms, one that takes advantage of the data access properties of machine learning algorithms.
Language: en