Title:	Extracting semantic information from Wikipedia using human computation and dimensionality reduction
Author:	West, Robert
Date:	2010
Publisher:	McGill University - MCGILL
Source:
Type:	Electronic Thesis or Dissertation
Subject:	Applied Sciences - Computer Science
Description:	Semantic background knowledge is crucial for many intelligent applications. A classical way to represent such knowledge is through semantic networks. Wikipedia's hyperlink graph can be considered a primitive semantic network, since the links it contains usually correspond to semantic relationships between the articles they connect. However, Wikipedia is rather noisy in this function. We propose Wikispeedia, an online human-computation game that can effectively filter this noise, furnishing data that can be leveraged to define a robust measure of semantic relatedness between concepts. While the resulting measure is very precise, it has the limitation of being sparse, i.e., undefined for many pairs of concepts. Therefore, we develop algorithms based on principal component analysis to increase coverage to the set of all pairs of Wikipedia concepts. These methods can also be generalized to other sparse measures of semantic relatedness, which we demonstrate by applying our approach to the Wikipedia adjacency matrix. Building on the same techniques, we finally propose an algorithm for finding missing hyperlinks in Wikipedia, which results in increased human usability.
Language:	en

1 Investigations on the form-genera Beauveria and Tritirachium, by MacLeod, Donald Murdock
2 Seismic sensitivity of tall guyed telecommunication towers, by Ghodrati Amiri, Gholamreza
3 Exploring the Relationship Between Assets and Family Stress Among Low-Income Families, by Rothwell, David W.; Han, Chang-Keun
4 The case for asset-based interventions with indigenous peoples: Evidence from Hawai‘i, by Rothwell, David W.
5 Second Thoughts: Who Almost Participates in an IDA Program?, by Rothwell, David W.; Han, Chang-Keun
6 Treatment and recovery in first-episode psychosis: a qualitative analysis of client experiences, by Windell, Deborah L.
7 Geology of the Mutton Bay Intrusion and surrounding area, North Shore, Gulf of St. Lawrence, Quebec, by Davies, Raymond
8 Geology of the Mutton Bay Intrusion and surrounding area, North Shore, Gulf of St. Lawrence, Quebec, by Davies, Raymond
9 Geology of the Mutton Bay Intrusion and surrounding area, North Shore, Gulf of St. Lawrence, Quebec, by Davies, Raymond
10 Recent contributions to the phenomenology of musical time: a critical survey, by Beaudreau, Pierre