Title: Exposing Instruction Level Parallelism in the Presence of Loops
Authors: de Alba, Marcos R; Kaeli, David
Date: 2004-09-01
Publisher: SCIELO
Source:
Type: journal article
Subject: Not applicable
Description: In this thesis we explore how to use a loop cache to relieve the unnecessary pressure that loops place on the trace cache. Because loops have high temporal locality, they should be cached. We have observed that when loop bodies contain control flow instructions, it is better to collect traces in a dedicated loop cache than to use trace cache space. The traces of instructions within loops tend to exhibit predictable patterns that can be detected and exploited at run time. We propose to capture dynamic traces of loop bodies in a loop cache. The novelty of this loop cache lies in dynamically capturing loop iterations that contain conditional branches and correlating them to unique loops. Once loop iterations are cached, the loop cache can supply their bodies without polluting the trace cache and without any instruction cache accesses. The proposed loop cache includes hardware capable of dynamically unrolling loops, so that large traces of instructions are fetched in a single loop cache access. We evaluate our loop cache and compare it against a baseline machine with a larger first-level instruction cache. We also consider how the loop cache can complement a trace cache by filtering out loop traces that needlessly dominate the trace cache space. We quantify the benefits provided by a fetch engine equipped with the proposed loop cache and unrolling hardware. In our experiments we explore the design space of a loop cache and its associated unrolling hardware, and we evaluate how effectively it detects independent loop iterations in SPECint2000, MediaBench and MiBench applications. We show that trace cache efficiency and ILP can be significantly improved using our loop caching scheme. This improvement translates into a performance speedup of up to 38% when a machine with a loop cache and no trace cache is compared to a baseline machine with no loop cache. Further experiments show a speedup of up to 16% on a hybrid machine with both a loop cache and a trace cache, compared to a machine with a larger L1 cache and a trace cache.
Language: English
