Computer Architecture and Systems

Cache Architectures for Low Supply Voltage and Low Power Consumption

Caches are a roadblock to lowering the supply voltage (Vcc), mainly because a low Vcc causes process-variation-induced bit errors in the large SRAM arrays of a cache. Existing approaches to low-Vcc caches suffer from low performance due to reduced effective capacity, long error-correction latency, and additional misses caused by accesses to faulty words. In our work, we propose a word-level sub-block-disable method that increases the utilization of the available cache capacity. Our key idea is to minimize accesses to faulty words. To do so, we use access behavior history when allocating cache resources that contain faulty words. In addition, we remap cache words within a cache line to better match the access and error patterns. Experimental results show that the proposed method gives an average performance improvement of 21.77% (up to 34.02%) with a small area overhead in L1 and L2 caches. Refer to [DAC11] for more details.
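The remapping idea can be illustrated with a small sketch. It is not the algorithm of [DAC11]; it assumes a per-line fault bitmap and per-word access counts are available (both hypothetical inputs, as are the function names) and simply places the most frequently accessed logical words in fault-free physical slots.

```python
def remap_words(access_counts, faulty):
    """Return mapping[logical_word] = physical_slot for one cache line.

    access_counts[w]: how often logical word w was accessed (assumed history).
    faulty[s]: True if physical slot s has a bit error at low Vcc (assumed map).
    """
    n = len(access_counts)
    assert len(faulty) == n
    # Logical words, hottest first (ties keep their original order).
    logical = sorted(range(n), key=lambda w: -access_counts[w])
    # Physical slots, fault-free first (False sorts before True).
    physical = sorted(range(n), key=lambda s: faulty[s])
    mapping = [0] * n
    for w, s in zip(logical, physical):
        mapping[w] = s
    return mapping


def faulty_accesses(access_counts, faulty, mapping):
    """Count accesses that would land on a faulty slot under a mapping."""
    return sum(c for w, c in enumerate(access_counts) if faulty[mapping[w]])


counts = [10, 0, 5, 1]
faults = [True, False, False, True]
identity = list(range(len(counts)))
remapped = remap_words(counts, faults)
print(faulty_accesses(counts, faults, identity))  # accesses hitting faulty slots without remapping
print(faulty_accesses(counts, faults, remapped))  # the same count after remapping
```

In this toy line, the rarely used words end up in the two faulty slots, so almost all accesses go to fault-free words.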

Tag comparisons account for a significant portion of cache power consumption in highly associative caches such as L2 caches. In our work, we propose a novel tag access scheme that applies a partial-tag-enhanced Bloom filter to reduce tag comparisons by detecting per-way cache misses. The proposed scheme also classifies cache data into hot and cold data and compares the tags of hot data before those of cold data, exploiting the fact that most cache hits go to hot data. In addition, the power consumption of each tag comparison is further reduced by dividing it into two micro-steps: a partial tag comparison is performed first, and the remaining tag bits are compared only if it yields a partial hit. We applied the proposed scheme to an L2 cache with 10 programs from SPEC2000 and SPEC2006. Experimental results show average reductions of 23.69% and 8.58% in cache energy consumption compared with conventional serial tag-data access and other existing methods, respectively. Refer to [DATE11] for more details.
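The two-micro-step comparison can be sketched as follows. This is a minimal software model, not the hardware design of [DATE11]: the partial tag width, the choice of low-order bits as the partial tag, and the function name are assumptions, and the Bloom filter and the hot/cold ordering are left out.

```python
PARTIAL_BITS = 4  # assumed width of the partial tag


def lookup(tags, addr_tag, partial_bits=PARTIAL_BITS):
    """Look up addr_tag in one cache set.

    tags: stored tag per way (None marks an invalid way).
    Returns (hit_way or None, number of full comparisons performed).
    """
    mask = (1 << partial_bits) - 1
    full_compares = 0
    for way, stored in enumerate(tags):
        if stored is None:
            continue
        # Step 1: cheap comparison of the partial tag only.
        if (stored & mask) != (addr_tag & mask):
            continue  # partial miss, the remaining bits are never read
        # Step 2: compare the remaining bits only after a partial hit.
        full_compares += 1
        if (stored >> partial_bits) == (addr_tag >> partial_bits):
            return way, full_compares
    return None, full_compares


set_tags = [0x1A3, 0x2B3, 0x1A7, None]
print(lookup(set_tags, 0x1A7))  # hit after a single full comparison
print(lookup(set_tags, 0x555))  # miss without any full comparison
```

The energy saving in this model comes from the ways that are rejected in step 1, since their remaining tag bits are not compared at all.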


[DAC11] Y. Choi, S. Yoo, S. Lee, "Matching Cache Access Behavior and Bit Error Pattern for High Performance Low Vcc L1 Cache," Proc. Design Automation Conference (DAC), June 2011.
[DATE11] H. Park, S. Yoo, S. Lee, "A Novel Tag Access Scheme for Low Power L2 Cache," Proc. Design Automation and Test in Europe (DATE), March 2011.

Phase Change RAM (PRAM)-based Main Memory Subsystems

Hybrid main memory consisting of DRAM and non-volatile memory is attractive because the non-volatile memory offers low standby power while DRAM provides high performance and better active power. In this work, we address the power management of a hybrid main memory consisting of DRAM and phase-change RAM (PRAM). To reduce DRAM refresh energy, which accounts for a significant portion of total memory energy, we present a runtime-adaptive method of DRAM decay. In addition, we present two methods, DRAM bypass and dirty data keeping, for further reduction in refresh energy and memory access latency, respectively. Experiments show that reducing DRAM refreshes yields a 23.5% to 94.7% reduction in energy consumption with negligible performance overhead compared with a conventional DRAM-only main memory. Refer to [DAC11] for more details.
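A toy model may help make the decay idea concrete. It reflects one plausible reading only and is not the method of [DAC11]: it assumes DRAM rows hold copies of PRAM data, that a row left idle for more than a fixed number of refresh intervals stops being refreshed, and that dirty rows are written back to PRAM before they decay. The threshold is fixed here rather than runtime-adaptive, and all names are hypothetical.

```python
class Row:
    """State of one DRAM row in the toy model."""

    def __init__(self):
        self.valid = False  # row currently holds live data
        self.dirty = False  # data differs from the PRAM copy
        self.idle = 0       # refresh intervals since the last access


def access(row, is_write):
    """Touch a row: it becomes live again and its idle counter resets."""
    row.valid = True
    row.idle = 0
    row.dirty = row.dirty or is_write


def refresh_tick(rows, decay_threshold):
    """Run once per refresh interval. Returns (rows refreshed, rows written back)."""
    refreshed = written_back = 0
    for row in rows:
        if not row.valid:
            continue  # decayed rows cost no refresh energy
        row.idle += 1
        if row.idle > decay_threshold:
            if row.dirty:
                written_back += 1  # save the data to PRAM before decay
                row.dirty = False
            row.valid = False      # stop refreshing this row
        else:
            refreshed += 1
    return refreshed, written_back


rows = [Row() for _ in range(3)]
access(rows[0], is_write=True)
access(rows[1], is_write=False)
print(refresh_tick(rows, decay_threshold=1))  # both live rows are still refreshed
print(refresh_tick(rows, decay_threshold=1))  # both rows decay, one is written back
```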


[DAC11] H. Park, S. Yoo, and S. Lee, "Power Management of DRAM/PRAM Hybrid Main Memory Subsystem," Proc. Design Automation Conference (DAC), June 2011.
[KCS11] S. Lee, S. Yoo, S. Lee, and C. Park, "PRAMsim: A Phase-Change RAM Subsystem Simulator," Proc. Korean Conference on Semiconductor (KCS), Feb. 2011.
[EMT10] A. Tran, S. Yoo, S. Lee, and C. Park, "Memory-Mapped Invert Coding for PRAM Main Memory," presented at EMT (emerging memory technology) workshop co-located with ISCA, June 2010.

Memory Controller Architectures for 3D-stacked DRAM and Many-Core

3D-stacked DRAM improves peak memory performance. However, its effective performance is often limited by timing constraints such as the row-to-row activation delay (tRRD) and the four-bank activation window (tFAW). In our work, we present a quantitative analysis of the performance impact of such constraints. To resolve the problem, we propose balancing the budget of DRAM row activations across DRAM channels. In the proposed method, an inter-memory-controller coordinator receives the current row activation demand from the memory controllers and redistributes the budget among them to improve DRAM performance. Experimental results show that sharing the row activation budget between memory channels gives an average 4.72% improvement in the utilization of 3D-stacked DRAM. Refer to [DATE11] for more details.
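The coordinator's redistribution step might look like the sketch below. It is an illustration under stated assumptions, not the policy of [DATE11]: it assumes a fixed total number of row activations per time window shared by all channels, and it splits that budget in proportion to each memory controller's reported demand. The function name and the proportional rule are hypothetical.

```python
def redistribute_budget(demands, total_budget):
    """Split total_budget row activations across channels by reported demand.

    demands[i]: pending row activations reported by memory controller i.
    Returns a list of per-channel budgets that sums to total_budget.
    """
    n = len(demands)
    total_demand = sum(demands)
    if total_demand == 0:
        # No demand anywhere: fall back to an even split.
        base, extra = divmod(total_budget, n)
        return [base + (1 if i < extra else 0) for i in range(n)]
    # Proportional share, rounded down.
    shares = [total_budget * d // total_demand for d in demands]
    # Hand the rounding leftover to the channels with the highest demand.
    leftover = total_budget - sum(shares)
    by_demand = sorted(range(n), key=lambda i: -demands[i])
    for i in by_demand[:leftover]:
        shares[i] += 1
    return shares


# Two busy channels and two idle ones sharing 16 activations per window.
print(redistribute_budget([6, 2, 0, 0], 16))
```

With a static even split each channel would get 4 activations per window; in this sketch the idle channels give up their share to the busy ones.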


[DATE11] D. Kim, S. Yoo, S. Lee, J. Ahn, and H. Jung, "A Quantitative Analysis of Performance Benefits of 3D Die Stacking on Mobile and Embedded Systems," Proc. Design Automation and Test in Europe (DATE), March 2011.
[NOCS10] D. Kim, S. Yoo, S. Lee, "A Network Congestion-Aware Memory Controller," Proc. NOCS, May 2010.

Solid State Disk

Solid state disk (SSD) design faces a significant challenge: reliability. Multi-level cell NAND Flash memory suffers from more bit errors as the number of bits per cell increases to 3 and 4 bits per cell. The write endurance of a state-of-the-art multi-level cell NAND Flash memory cell is on the order of 1,000 program/erase cycles. To address this low reliability, stronger error correction schemes are being adopted, incurring significant cost in terms of price (dollar/bit), power consumption, and performance. In our work, we pursue cost-effective solutions to enhance SSD reliability. In the near term, we study adaptive solutions to reliability issues; in the longer term, we will apply PRAM (phase-change RAM) to SSDs to enhance their reliability and performance.


[KCS11] C. Kim, S. Yoo, S. Lee, and C. Park, "A Reliability Model of NAND Flash Memory for Solid State Disks," Proc. Korean Conference on Semiconductor (KCS), Feb. 2011.

Many-core SoC Architecture for Recognition and Consciousness

The compute and memory density (GOPS/cm³ and GB/cm³) of semiconductors keeps increasing through aggressive scaling, 3D die stacking, and new materials and devices. This growing compute and memory capacity will make many-core SoCs the only commercially viable solution for future applications that aim at human-like recognition [ISSCC08] and consciousness [Koch]. In such many-core SoC designs, managing embarrassingly parallel applications and architectures, which will be our research focus, will become a critical issue.


[ISSCC08] K. Kim, et al., "A 125GOPS 583mW Network-on-Chip Based Parallel Processor with Bio-inspired Visual Attention Engine," Proc. ISSCC, 2008.
[Koch] C. Koch, Quest for Consciousness, Roberts & Company Publishers, 2004.

A wireless sensor network (WSN) is a group of specialized transducers with a communications infrastructure intended to monitor and record conditions at diverse locations. Commonly monitored parameters are temperature, humidity, pressure, wind direction and speed, illumination intensity, vibration intensity, sound intensity, power-line voltage, chemical concentrations, pollutant levels and vital body functions.

A WSN consists of multiple detection stations called sensor nodes, each of which is small, lightweight and portable. Every sensor node is equipped with a transducer, microcomputer, transceiver and power source. The transducer generates electrical signals based on sensed physical effects and phenomena. The microcomputer processes and stores the sensor output. The transceiver receives commands from a central computer and transmits data to that computer. The power for each sensor node is derived from the electric utility or from a battery.

A WSN is an integration of several different technologies. Examples include sensors that convert various sensed signals (temperature, humidity, pressure, sound, magnetism, light) into digital values; network management (link creation, routing, multicasting) for ad hoc networks built on wireless links; and application programs that combine the raw data provided by distributed sensors into useful information.

WSN Research Issues

There are many interesting research issues related to WSNs, including sensor node coverage, ensuring communication connectivity, time synchronization of WSN nodes, determining the 2-D or 3-D positions of WSN nodes (localization), coordinating sleep/awake cycles while maintaining coverage and connectivity (energy conservation), security, communication issues (protocols, routing, broadcasting), and channel assignment and scheduling.


The development of wireless sensor networks was originally motivated by military applications such as battlefield surveillance. However, wireless sensor networks are now used in many industrial and civilian application areas, including industrial process monitoring and control, machine health monitoring, environment and habitat monitoring, healthcare applications, home automation, smart spaces, and traffic control.

Devices (equipment on hand)

Micaz2 Mote: ATmega128L MPU, Chipcon CC2420 radio chip, 512 KB flash memory
MTS310 Sensor Board: Temperature, Light (photoresistor), Accelerometer, Microphone, Sounder, Magnetometer
MIB510 Programming Board: ATmega16L for programming, MICAz, Mote JTAG, MICA2DOT connector, RS-232 serial port
CFR-PP100: ATmega128L MPU, Chipcon CC2420 radio chip