ASP-DAC 2016 Technical Program

The 21st Asia and South Pacific Design Automation Conference

Session 3C Emerging Devices for Energy Efficient Computing
Time: 15:50 - 17:30 Tuesday, January 26, 2016
Location: TF4204
Chairs: Danghui Wang (Northwestern Polytechnical University, China), Jingtong Hu (Oklahoma State University, U.S.A.)

3C-1 (Time: 15:50 - 16:15)

Title	Thermal Optimization for Memristor-Based Hybrid Neuromorphic Computing Systems
Author	Chi-Ruo Wu (National Cheng Kung University, Taiwan), Wei Wen (University of Pittsburgh, U.S.A.), *Tsung-Yi Ho (National Tsing Hua University, Taiwan), Yiran Chen (University of Pittsburgh, U.S.A.)
Page	pp. 274 - 279
Keyword	Neuromorphic, Memristor, Thermal
Abstract	Neuromorphic computing is used to accelerate the computation of neural networks, which are composed of neurons and synapses and can simulate an animal brain. However, running neuromorphic computing on a traditional computer architecture leads to a serious von Neumann bottleneck because of the gap between high-frequency CPU computation and memory access. The emerging memristor is an innovative technology for future VLSI circuits: it can potentially act as both a data storage unit and a computing unit, transforming the computer architecture. Furthermore, the characteristics of memristors, including low programming energy, parallel processing, small footprint, and non-volatility, have attracted significant research on neuromorphic computing. However, issues such as thermal damage affect the reliability of memristors, and high memristor temperature is a critical issue that impacts the reliability of the whole system. To estimate the temperature of the memristor, we formulate the thermal problem as a power consumption problem. In this paper, a thermal optimization algorithm for memristor-based hybrid neuromorphic computing systems is proposed to solve the reliability issue by means of incremental cluster network flow. Our results show that the maximum power consumption can be reduced by about 31%.

3C-2 (Time: 16:15 - 16:40)

Title	An Energy-efficient Matrix Multiplication Accelerator by Distributed In-memory Computing on Binary RRAM Crossbar
Author	*Leibin Ni, Yuhao Wang, Hao Yu (Nanyang Technological University, Singapore), Wei Yang, Chuliang Weng, Junfeng Zhao (Shannon Laboratory, Huawei Technologies Co., Ltd, China)
Page	pp. 280 - 285
Keyword	RRAM, In-memory architecture
Abstract	Emerging resistive random-access memory (RRAM) can provide not only non-volatile memory storage but also intrinsic logic for matrix-vector multiplication, which makes it ideal for low-power, high-throughput data analytics accelerators that operate in memory. However, existing RRAM-based computing devices mainly assume multi-level analog computing, whose results are sensitive to process non-uniformity and which incurs additional A/D-conversion and I/O overhead. This paper explores a data analytics accelerator on a binary RRAM crossbar. Accordingly, a distributed in-memory computing architecture is proposed, together with the design of the corresponding components and control protocol. Both the memory array and the logic accelerator can be implemented purely in binary by the RRAM crossbar, and logic-memory pairs can be distributed using a control-bus protocol. Numerical results for fingerprint matching mapped onto the proposed RRAM crossbar show that the proposed architecture achieves 2.86x faster speed, 154x better energy efficiency, and 100x smaller area compared with the same design implemented as a CMOS-based ASIC.
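The abstract does not detail how multi-bit operands are mapped onto a purely binary crossbar. As an illustration only, a common generic approach is bit slicing: each binary crossbar read produces, per output line, a count of positions where both the stored bit and the applied input bit are 1 (AND followed by popcount), and multi-bit results are recovered by shift-and-add over bit planes. The sketch below models that arithmetic in software, assuming unsigned integer operands; the function names `binary_mvm` and `bit_sliced_mvm` are hypothetical and this is not the authors' architecture or control protocol.

```python
from typing import List

Matrix = List[List[int]]


def binary_mvm(w_bits: Matrix, x_bits: List[int]) -> List[int]:
    """Model one binary crossbar read (hypothetical helper).

    Each row stands for one output line: it counts the positions where both
    the stored cell and the applied input are logic 1 (AND + popcount).
    """
    return [sum(w & x for w, x in zip(row, x_bits)) for row in w_bits]


def bit_sliced_mvm(w: Matrix, x: List[int], w_width: int, x_width: int) -> List[int]:
    """Unsigned integer matrix-vector product built only from binary reads.

    Assumes every entry of `w` fits in `w_width` bits and every entry of
    `x` fits in `x_width` bits.
    """
    y = [0] * len(w)
    for p in range(w_width):
        # Bit plane p of the stored matrix (what one binary array would hold).
        w_plane = [[(v >> p) & 1 for v in row] for row in w]
        for q in range(x_width):
            # Bit plane q of the input vector (applied as binary voltages).
            x_plane = [(v >> q) & 1 for v in x]
            partial = binary_mvm(w_plane, x_plane)
            # Weight each partial count by 2^(p+q) and accumulate.
            y = [acc + (part << (p + q)) for acc, part in zip(y, partial)]
    return y
```

For example, `bit_sliced_mvm([[3, 1], [2, 0]], [1, 2], 2, 2)` returns `[5, 2]`, the same as the direct product. The design choice here is that every crossbar access stays strictly binary, trading extra reads and digital shift-add work for immunity to multi-level analog sensing.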

3C-3 (Time: 16:40 - 17:05)

Title	A Racetrack Memory Based In-memory Booth Multiplier for Cryptography Application
Author	*Tao Luo (Nanyang Technological University, Singapore), Wei Zhang (Hong Kong University of Science and Technology, Hong Kong), Bingsheng He, Douglas Maskell (Nanyang Technological University, Singapore)
Page	pp. 286 - 291
Keyword	Racetrack memory, RSA, Multiplier, Adder
Abstract	Security is an important concern in cloud computing nowadays. RSA is one of the most popular asymmetric encryption algorithms and is widely used in Internet-based applications because of the advantages of its public-key strategy over symmetric encryption algorithms. However, the RSA encryption algorithm is very compute-intensive, which can affect the speed and power efficiency of the applications that use it. Racetrack Memory (RM) is a newly introduced, promising technology for future storage and memory systems, and it is well suited to memory-intensive scenarios because of its high data density. However, novel designs are needed to exploit the advantages of RM while avoiding the adverse impact of its sequential access mechanism. In this paper, we present an in-memory Booth multiplier based on racetrack memory to alleviate this problem. As the building block of our multiplier, a racetrack memory based adder is proposed, which saves 56.3% power compared with the state-of-the-art magnetic adder. Integrated with the storage element, our proposed multiplier shows high efficiency in area, power, and scalability.
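For readers unfamiliar with Booth multiplication, the sketch below shows the textbook radix-2 Booth recoding that such a multiplier builds on: the multiplier's bits are scanned from least significant upward, and the multiplicand is subtracted at the start of each run of 1s and added at its end, so long runs of 1s cost only two add/subtract operations. This is a plain software model assuming two's-complement operands of a chosen width; the function name `booth_multiply` is hypothetical, and the racetrack-memory adder, storage integration, and RSA-sized operands from the paper are not modeled.

```python
def booth_multiply(multiplicand: int, multiplier: int, bits: int) -> int:
    """Radix-2 Booth multiplication (hypothetical software model).

    `multiplier` must fit in a `bits`-wide two's-complement word;
    `multiplicand` may be any Python int.
    """
    if not -(1 << (bits - 1)) <= multiplier < (1 << (bits - 1)):
        raise ValueError("multiplier does not fit in the given width")
    q = multiplier & ((1 << bits) - 1)  # two's-complement bit pattern
    prev = 0  # implicit bit q_{-1} = 0
    product = 0
    for i in range(bits):
        cur = (q >> i) & 1
        if (cur, prev) == (1, 0):    # start of a run of 1s: subtract M * 2^i
            product -= multiplicand << i
        elif (cur, prev) == (0, 1):  # end of a run of 1s: add M * 2^i
            product += multiplicand << i
        prev = cur
    return product
```

For example, `booth_multiply(7, -3, 4)` scans the pattern `1101` and returns `-21`. In hardware, each add or subtract step is where an adder (such as the racetrack-based adder proposed above) would be invoked.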

3C-4 (Time: 17:05 - 17:30)

Title	Look-ahead Schemes for Nearest Neighbor Optimization of 1D and 2D Quantum Circuits
Author	*Robert Wille (Johannes Kepler University Linz, Austria), Oliver Keszocze (DFKI GmbH, Germany), Marcel Walter, Patrick Rohrs (University of Bremen, Germany), Anupam Chattopadhyay (Nanyang Technological University, Singapore), Rolf Drechsler (University of Bremen, Germany)
Page	pp. 292 - 297
Keyword	quantum circuits, nearest neighbor, technology mapping
Abstract	Ensuring nearest neighbor compliance of quantum circuits by inserting SWAP gates has been studied extensively in the past. Here, quantum gates that operate on non-adjacent qubits are considered, and SWAP gates are applied in order to “move” these qubits onto adjacent positions. However, the decision of how exactly the qubits are “moved” has mainly been made without considering the effect that this “movement” may have on the remaining circuit. In this work, we propose a methodology for nearest neighbor optimization that addresses this problem by means of a look-ahead scheme. To this end, two representative implementations are presented and discussed in detail. Experimental evaluations show that, in the best case, the number of SWAP gates can be reduced by 56% compared with state-of-the-art methods when following the proposed methodology.
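To make the look-ahead idea concrete, here is a simplified sketch for a 1D qubit line only. For a gate whose qubits sit at positions `lo < hi`, every way of meeting in the middle costs the same `hi - lo - 1` SWAPs, so a naive mapper just picks one; a look-ahead mapper instead picks the placement that leaves the next few gates closest to adjacent. The cost estimate, the `window` parameter, and the helper names `move`, `lookahead_cost`, and `nn_swap_count` are all hypothetical illustrations; this is not either of the two implementations from the paper, and 2D layouts are not modeled.

```python
from typing import List, Sequence, Tuple

Gate = Tuple[int, int]  # logical qubits acted on by a two-qubit gate


def move(layout: List[int], frm: int, to: int) -> List[int]:
    """Copy of `layout` with the qubit at position `frm` shifted to `to`
    using |to - frm| adjacent SWAPs."""
    layout = layout[:]
    step = 1 if to > frm else -1
    for p in range(frm, to, step):
        layout[p], layout[p + step] = layout[p + step], layout[p]
    return layout


def lookahead_cost(layout: List[int], upcoming: Sequence[Gate]) -> int:
    """Hypothetical heuristic: SWAPs the upcoming gates would need if the
    layout stayed fixed."""
    pos = {q: p for p, q in enumerate(layout)}
    return sum(max(0, abs(pos[a] - pos[b]) - 1) for a, b in upcoming)


def nn_swap_count(circuit: Sequence[Gate], n_qubits: int, window: int) -> int:
    """Count SWAPs inserted to make every gate nearest-neighbor on a line.

    window = 0 gives a naive choice; window > 0 breaks ties between equally
    cheap placements by looking at the next `window` gates.
    """
    layout = list(range(n_qubits))  # layout[position] = logical qubit
    swaps = 0
    for idx, (a, b) in enumerate(circuit):
        pos = {q: p for p, q in enumerate(layout)}
        lo, hi = sorted((pos[a], pos[b]))
        if hi - lo <= 1:
            continue  # already adjacent, no SWAP needed
        # Shift the left qubit k steps right and the right qubit the rest
        # of the way left; each option costs hi - lo - 1 SWAPs.
        candidates = []
        for k in range(hi - lo):
            cand = move(layout, lo, lo + k)
            candidates.append(move(cand, hi, lo + k + 1))
        upcoming = circuit[idx + 1: idx + 1 + window]
        layout = min(candidates, key=lambda c: lookahead_cost(c, upcoming))
        swaps += hi - lo - 1
    return swaps
```

On the two-gate circuit `[(0, 2), (2, 3)]` with four qubits, the naive choice (`window=0`) needs 2 SWAPs, while looking one gate ahead (`window=1`) moves qubit 0 instead of qubit 2 and needs only 1.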