Abstract: As technology feature sizes continue to shrink, two challenging problems arise in designing computer systems. One is hardware unreliability, due to the increasing likelihood of transient hardware faults caused by high-energy particles. The other is variability in the semiconductor manufacturing process, which ultimately affects a chip's frequency and leakage power dissipation.
In the first part, we study the problem of handling I/O in memory-based checkpointing systems. The increasing demand for reliable computers has led to proposals for hardware-assisted rollback of memory state. Such an approach promises major reductions in Mean Time To Repair (MTTR). Unfortunately, adoption of these proposals is hindered by the lack of efficient mechanisms for I/O recovery.
We present and evaluate ReViveI/O, a scheme for I/O undo and redo that is compatible with mechanisms for hardware-assisted rollback of memory state. We have implemented a Linux-based prototype that shows that low-overhead, low-MTTR recovery of I/O is feasible. With 20 to 120 ms between checkpoints, a throughput-oriented workload incurs negligible overhead and negligible recovery time.
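To make the idea of I/O undo and redo around checkpoints concrete, the following is a minimal toy sketch and not the ReViveI/O design, whose mechanisms are not described in this abstract. It assumes one generic approach: outputs are held back until the checkpoint that covers them commits (so a rollback can undo them by discarding), and inputs consumed since the last checkpoint are logged so they can be re-delivered (redone) after a rollback. The class `CheckpointedIO` and all of its method names are hypothetical.

```python
from collections import deque


class CheckpointedIO:
    """Toy model (hypothetical): outputs escape only when a checkpoint commits."""

    def __init__(self, device):
        self.device = device      # a list standing in for the outside world
        self.pending_out = []     # outputs produced since the last checkpoint
        self.input_log = []       # inputs consumed since the last checkpoint
        self.replay = deque()     # logged inputs to re-deliver after a rollback

    def write(self, data):
        # Hold the output back; it is not yet visible outside the system.
        self.pending_out.append(data)

    def read(self, source):
        # After a rollback, re-deliver logged inputs before asking the source.
        data = self.replay.popleft() if self.replay else source()
        self.input_log.append(data)
        return data

    def checkpoint(self):
        # Commit: release held outputs and forget the inputs they depended on.
        self.device.extend(self.pending_out)
        self.pending_out.clear()
        self.input_log.clear()

    def rollback(self):
        # Undo: held outputs never escaped, so discarding them is sufficient.
        self.pending_out.clear()
        # Redo: inputs consumed since the checkpoint are queued for re-delivery,
        # ahead of any inputs still waiting from an earlier rollback.
        self.replay = deque(self.input_log + list(self.replay))
        self.input_log.clear()
```

In this sketch the cost of reliability is output latency of up to one checkpoint interval, which is one reason short intervals matter for throughput-oriented workloads.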
In the second part, we study architecture-aware fine-grain body biasing to improve the frequency and leakage power dissipation of processors. As VLSI technology continues to scale, parameter variation is about to pose a major challenge to high-performance processor design. In particular, the within-die variation of threshold voltage is directly detrimental to the chip's frequency and leakage power. One proposed technique to address such variation is Fine-Grain Body Biasing (FGBB), where different sections of the chip are each given their own voltage bias, which modifies the threshold voltage of that section.
We show that FGBB should be applied in an architecture-aware manner, following the shapes of architectural modules. The reason is that architectural functionality affects the body bias (BB) needed, through both temperature and the type of critical path. To support this idea, we develop a model of threshold voltage variation and apply it to simulated batches of chips. We show that architecture-aware FGBB enables 35% of the chips to work at the highest frequency, compared to 18% with conventional FGBB, potentially increasing each chip's value by 50%. It also reduces the leakage of the chips by 40%, compared to 25% with conventional FGBB.
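The kind of experiment described above, applying a threshold voltage variation model to a simulated batch of chips, can be illustrated with a small Monte Carlo sketch. This is not the thesis model: it assumes independent Gaussian threshold voltages per module, the textbook alpha-power law for frequency, and an exponential subthreshold leakage dependence, and it ignores spatial correlation, temperature, critical path type, and the architecture-aware partitioning itself. Every constant and function name is an assumption chosen for illustration, and the numbers it produces are not the results reported here.

```python
import math
import random

# All constants below are illustrative assumptions, not values from the thesis.
VDD = 1.0        # supply voltage (V)
VTH_NOM = 0.30   # nominal threshold voltage (V)
SIGMA = 0.03     # std. dev. of per-module threshold voltage (V)
ALPHA = 1.3      # alpha-power-law exponent
N_VT = 0.039     # subthreshold slope factor times thermal voltage (V)
MAX_BIAS = 0.05  # largest threshold shift that body bias can apply (V)


def module_freq(vth):
    # Alpha-power law: speed grows with the gate overdrive (VDD - vth).
    return (VDD - vth) ** ALPHA / VDD


def module_leak(vth):
    # Subthreshold leakage falls exponentially as vth rises.
    return math.exp(-vth / N_VT)


# A chip "makes the bin" if it reaches 98% of the nominal frequency.
F_TARGET = 0.98 * module_freq(VTH_NOM)


def sample_chip(rng, n_modules=8):
    # One independent threshold voltage per module (no spatial correlation).
    return [rng.gauss(VTH_NOM, SIGMA) for _ in range(n_modules)]


def apply_bias(vths):
    # Shift each module's vth toward nominal, limited to +/- MAX_BIAS.
    return [v + max(-MAX_BIAS, min(MAX_BIAS, VTH_NOM - v)) for v in vths]


def chip_metrics(vths):
    freq = min(module_freq(v) for v in vths)  # slowest module sets the clock
    leak = sum(module_leak(v) for v in vths)  # leakage adds up over modules
    return freq, leak


def simulate(n_chips=1000, seed=0):
    rng = random.Random(seed)
    passed = {"unbiased": 0, "biased": 0}
    leak = {"unbiased": 0.0, "biased": 0.0}
    for _ in range(n_chips):
        vths = sample_chip(rng)
        for name, chip in (("unbiased", vths), ("biased", apply_bias(vths))):
            f, l = chip_metrics(chip)
            passed[name] += f >= F_TARGET
            leak[name] += l
    return {
        name: {"yield": passed[name] / n_chips,
               "mean_leak": leak[name] / n_chips}
        for name in passed
    }
```

Calling `simulate()` returns the fraction of chips that reach the target frequency and the mean relative leakage, with and without per-module biasing. An architecture-aware variant would group modules by function and choose each bias with that module's temperature and critical-path type in mind, which this sketch does not attempt.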