
The 21st Asia and South Pacific Design Automation Conference
Technical Program

Remark: The presenter of each paper is marked with "*".

Session Schedule


Tuesday, January 26, 2016

Rooms: TF4303 | TF4203 | TF4304 | TF4204
1K  Opening & Keynote I (TF Theatre)
8:30 - 10:00
1S  University Design Contest
10:20 - 12:00
1A  The Optimization of Memory Architecture and Management
10:20 - 12:00
1B  Secure Embedded Systems & IoT
10:20 - 12:00
1C  Design for Directed Self-Assembly
10:20 - 12:00
2S  (Special Session) Designing with Spintronics: Recent Developments and Upcoming Challenges
13:50 - 15:30
2A  Advances in Verification
13:50 - 15:30
2B  System Simulation and Testing
13:50 - 15:30
2C  Advanced Issues in Floorplanning and Placement
13:50 - 15:30
3S  (Special Session) High-Level Synthesis – Now, the Future, and the "Dark Secrets"
15:50 - 17:30
3A  Robust Timing Analysis and Optimization
15:50 - 17:30
3B  Low Power in Deep Sub-Micron: From Architecture to Physical Design
15:50 - 17:30
3C  Emerging Devices for Energy Efficient Computing
15:50 - 17:30



Wednesday, January 27, 2016

Rooms: TF4303 | TF4203 | TF4304 | TF4204
2K  Keynote II (TF Theatre)
9:00 - 10:00
4S  (Special Session) Design Challenges for Energy-Efficient IoT Edge Devices
10:20 - 12:00
4A  Taking Advantage of Uncertainty in System Optimization
10:20 - 12:00
4B  Security and Reliability in Emerging Devices
10:20 - 12:00
4C  Routing
10:20 - 12:00
5S  (Special Session) Cross-Layer Resilience: Snapshots from the Frontier of Design
13:50 - 15:55
5A  (Special Session) Design Automation of Energy-Efficient Smart Buildings and Smart Cars
13:50 - 15:55
5B  Advanced Embedded Software Techniques: Sensing, Computation, and Storage
13:50 - 15:55
5C  Advances in Logic Synthesis
13:50 - 15:55



Thursday, January 28, 2016

Rooms: TF4303 | TF4203 | TF4304 | TF4204
3K  Keynote III (TF Theatre)
8:30 - 10:00
6S  (Special Session) Cyber-Physical Systems and Security
10:20 - 12:00
6A  Testing, Modeling and Optimization Techniques for Analog Circuits
10:20 - 12:00
6B  Energy-Efficient & Customized Computing
10:20 - 12:00
6C  Design Methodologies for Microfluidic Biochips
10:20 - 12:00
7S  (Special Session) New Frontiers of Physical Design
13:50 - 15:30
7A  System-Level Design for Energy-Efficiency and Reliability
13:50 - 15:30
7B  Design for Trustworthy IC
13:50 - 15:30
7C  Design for Reliability
13:50 - 15:30
8S  (Special Session) Reliability, Adaptability and Flexibility in Timing
15:50 - 17:30
8A  Emerging Networks-on-Chip Designs
15:50 - 17:30
8B  Test and Debug
15:50 - 17:30
8C  Emerging Devices and Systems for Cyber-Physical Applications
15:50 - 17:30


List of papers


Tuesday, January 26, 2016

Session 1K  Opening & Keynote I
Time: 8:30 - 10:00 Tuesday, January 26, 2016
Location: TF Theatre
Chair: Rui Martins (University of Macau, Macau)

1K-1 (Time: 9:00 - 10:00)
Title: (Keynote Address) The Next Decade
Author: *Alessandro Cremonesi (STMicroelectronics, Italy)
Keyword: Keynote
Abstract: In his speech, Alessandro Cremonesi will give his perspective on the major trends in electronics for the next decade. New services and applications will be fueled by the evolution of electronic systems and of cloud technologies. Together, they will bring new ways of managing our lives, our work, our social interactions and our interaction with the environment. We will face many challenges, but we will also have new tools and methods to handle them.


Session 1S  University Design Contest
Time: 10:20 - 12:00 Tuesday, January 26, 2016
Location: TF4303
Chairs: Man-Kay Law (University of Macau, Macau), Yan Lu (University of Macau, Macau)

1S-1 (Time: 10:20 - 10:28)
Title: An Automatic Place-and-Routed Two-Stage Fractional-N Injection-locked PLL Using Soft Injection
Author: *Dongsheng Yang, Wei Deng, Aravind Tharayil Narayanan, Kengo Nakata, Teerachot Siriburanon, Kenichi Okada, Akira Matsuzawa (Tokyo Institute of Technology, Japan)
Page: pp. 1 - 2
Keyword: Automatic Place-and-Routed, Synthesizable, Fractional-N, Soft Injection, DPLL
Abstract: This paper presents an automatic place-and-routed two-stage fractional-N injection-locked PLL (IL-PLL) using a soft injection technique for on-chip clock generation. Fabricated in a 65nm CMOS process, the prototype achieves 3.6-ps integrated jitter at 1.5222 GHz and consumes 3mW, leading to an FoM of -224.6 dB, while occupying an area of only 0.048 mm2. It is the first fully synthesized fractional-N injection-locked PLL reported to date.

1S-2 (Time: 10:28 - 10:36)
Title: Time-Domain I/Q-LOFT Compensator Using a Simple Envelope Detector for a Sub-GHz IEEE 802.11af WLAN Transmitter
Author: *Chak-Fong Cheang, Ka-Fai Un, Pui-In Mak, Rui Paulo da Silva Martins (University of Macau, Macau)
Page: pp. 3 - 4
Keyword: envelope detector, I/Q imbalance, LO feedthrough, wideband
Abstract: This paper proposes a hardware-efficient time-domain scheme to digitally compensate the I/Q imbalance and LO feedthrough (LOFT) of a sub-GHz wideband transmitter for IEEE 802.11af WLAN. A simple envelope detector is the only analog component. The parameters are updated by the least-mean-square (LMS) algorithm and estimated efficiently in the time domain using a COordinate Rotation DIgital Computer (CORDIC), saving training time and power consumption. The measured wideband image-rejection ratio (IRR) and LO-leakage-rejection ratio (LRR) are improved from 18.9 to 41.3 dB and from 20.4 to 37.9 dB, respectively.

1S-3 (Time: 10:36 - 10:44)
Title: A Noise Reduction Technique for Divider-Less Fractional-N Frequency Synthesizer using Phase-Interpolation Technique
Author: *Aravind Tharayil Narayanan, Makihiko Katsuragi, Kengo Nakata, Yuki Terashima, Kenichi Okada, Akira Matsuzawa (Tokyo Institute of Technology, Japan)
Page: pp. 5 - 6
Keyword: PLL, Fractional, Sub-sampling, phase interpolator, DTC
Abstract: This paper proposes a noise reduction technique for a divider-less fractional-N frequency synthesizer using a phase-interpolation technique. The phase interpolator helps reduce the jitter introduced by the multi-phase generation mechanism used for fractional operation. The proposed frequency synthesizer is fabricated in a 65nm CMOS process and operates at frequencies from 4.3GHz to 4.9GHz. The measured close-in phase noise is -113dBc/Hz at a 200kHz offset from the carrier with 3.3mW power consumption, which results in an FoM of -246dB.

1S-4 (Time: 10:44 - 10:52)
Title: A 2.2 uW 15b Incremental Delta-Sigma ADC with Output-Driven Input Segmentation
Author: *Bo Wang (Hong Kong University of Science and Technology, Hong Kong), Man-Kay Law (Macau University, Macau), Saqib Mohamad (Hong Kong University of Science and Technology, Hong Kong), Amine Bermak (Hamad Bin Khalifa University, Qatar)
Page: pp. 7 - 8
Keyword: incremental delta-sigma ADC, integrator multiplexing, low power ADC
Abstract: A micro-power incremental delta-sigma ADC is presented. This ADC uses its decimation filter's output to estimate the input signal level and dynamically adjusts the modulator feedback voltage, thereby reducing the integrator input range and power. For further power saving, integrator time-multiplexing is also employed. Fabricated in 0.18um CMOS, the 0.12 mm2 ADC consumes 2.16uW at a conversion speed of 85S/s, with 15.3b resolution and -2/1.5LSB INL.
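The basic conversion principle of an incremental delta-sigma ADC can be sketched as an ideal first-order loop: the integrator is reset, the comparator bit selects a +Vref or -Vref feedback for a fixed number of cycles, and a counter acting as the decimation filter turns the bit density into the result. This is a textbook model with ideal components, assumed for illustration; it does not reproduce the paper's output-driven input segmentation or integrator multiplexing, and the function name and parameters are hypothetical.

```python
def incremental_adc(vin: float, vref: float = 1.0, cycles: int = 1024) -> float:
    """Ideal first-order incremental delta-sigma conversion of vin in [-vref, vref]."""
    if abs(vin) > vref:
        raise ValueError("input outside the converter's range")
    integrator = 0.0  # integrator is reset at the start of each conversion
    ones = 0          # counter acting as a first-order decimation filter
    for _ in range(cycles):
        bit = 1 if integrator >= 0 else 0
        feedback = vref if bit else -vref
        integrator += vin - feedback
        ones += bit
    # Map the bit density back to a voltage; the residual error is at most 2*vref/cycles,
    # because the integrator state stays within [-2*vref, 2*vref).
    return vref * (2 * ones / cycles - 1)


print(f"{incremental_adc(0.3):.4f}")
```

In this first-order model, doubling the number of cycles halves the worst-case error bound, which is why incremental converters trade conversion rate for resolution.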

1S-5 (Time: 10:52 - 11:00)
Title: A 200-MHz 4-Phase Fully Integrated Voltage Regulator With Local Ground Sensing Dual Loop ZDS Hysteretic Control Using 6.5nH Package Bondwire Inductors on 65nm Bulk CMOS
Author: Min Kyu Song, Joseph Sankman, Jayeol Lee, *Dongsheng Ma (The University of Texas at Dallas, U.S.A.)
Page: pp. 9 - 10
Keyword: integrated voltage regulator, fast transient response, multiple-phase operation, dual-loop voltage regulation
Abstract: This paper presents a 200MHz 4-phase fully integrated voltage regulator (FIVR) with 6.5nH package bondwire inductors. With an on-chip delay-locked loop (DLL) for phase synchronization, the proposed FIVR employs a cost-effective local ground sensing feedforward control loop for high-speed load transient sensing and a ZDS hysteretic feedback control loop for accurate voltage regulation, achieving dual-loop compensated operation independently within each sub-converter. Implemented in a standard 65nm bulk CMOS process, the FIVR delivers a peak efficiency of 84.2% at 256mW, with a maximum power density of 670mW/mm2. In response to a 280mA/120ps load step, the FIVR settles within 11ns with a 78mV droop. To the best of our knowledge, this is 2.7 times faster than the best prior design despite a 1.8 times larger load step, while using a 1.3 times smaller on-chip capacitor.

1S-6 (Time: 11:00 - 11:08)
Title: An AC Powered Converter-Free LED Driver with Low Flicker
Author: *Yuan Gao, Lisong Li, Philip K.T. Mok (The Hong Kong University of Science and Technology, Hong Kong)
Page: pp. 11 - 12
Keyword: LED driver, Flicker
Abstract: A 5.5W mains-powered converter-free LED driver for general lighting applications is presented in this summary for the university design contest. The driver is superior to its switching-converter-based counterparts because it does not require any bulky and expensive magnetics or electrolytic capacitors. In addition, the driver significantly reduces the flicker in the light output with a quasi-constant power control scheme. Measurement results show that the prototype driver achieves 88.2% efficiency and a 0.92 power factor with only 18% flicker at 110V AC 60Hz input.

1S-7 (Time: 11:08 - 11:16)
Title: A Variable-Voltage Low-Power Technique for Digital Circuit System
Author: *An-Tai Xiao, Yung-Siang Miao (Department of Electronics Engineering, National Chiao Tung University, Taiwan), Ching-Hwa Cheng (Department of Electronics Engineering, Feng Chia University, Taiwan), Jiun-In Guo (Department of Electronics Engineering, National Chiao Tung University, Taiwan)
Page: pp. 13 - 14
Keyword: Low-Power, Variable-Voltage
Abstract: A swing variable voltage technique (CK-Vdd) is proposed to reduce the power consumption of generic digital circuit systems. Instead of the conventional constant supply voltage (Vdd), CK-Vdd supplies the digital circuit with a swinging variable voltage. The swinging voltage is produced by Voltage Frequency Adjustor (VFA) and Frequency Duty-Cycle Adjustor (FDCA) circuits: the rising and falling clock edges are fed into the FDCA, which generates an adjustable high-low signal that controls the VFA to produce a high-low cycling swing voltage. When the clock is high, a generic positive-edge-triggered digital circuit draws a large operating current, so CK-Vdd supplies a high voltage during this phase. When the clock goes low, CK-Vdd supplies a low voltage; reducing the supply current during the low clock phase reduces the power consumption of the digital circuit. We implemented the CK-Vdd technique in an H.264 video decoder test chip in a TSMC 90 nm CMOS process. The results show that with a CK-Vdd voltage of 0.7 V to 0.9 V, the technique saves 32% of the power consumption on average and up to 45% at most.

1S-8 (Time: 11:16 - 11:24)
Title: Sub-threshold VLSI Logic Family Exploiting Unbalanced Pull-up/down Network, Logical Effort and Inverse-Narrow-Width Techniques
Author: *Ming-Zhong Li, Chio-In Ieong, Man-Kay Law, Pui-In Mak, Mang-I Vai, Sio-Hang Pun, Rui P. Martins (University of Macau, Macau)
Page: pp. 15 - 16
Keyword: CMOS, device sizing, inverse narrow width (INW), logical effort, ultralow energy
Abstract: This paper presents a complete energy-optimized sub-threshold standard cell library exploiting unbalanced pull-up/down (PU/PD) network, logical effort and inverse-narrow-width (INW) techniques. Each logic cell is optimized for ultra-low-energy applications with low-to-moderate speed requirements. Three 14-tap 8-bit FIR filters were fabricated in a 0.18-μm CMOS technology; one of them achieves the minimum energy/tap (0.0234 pJ) and a Figure-of-Merit (FoM) of 0.365 at 100 kHz and 0.31 V, which compare well with the state of the art.

1S-9 (Time: 11:24 - 11:32)
Title: A Testable and Debuggable Dual-Core System with Thermal-Aware Dynamic Voltage and Frequency Scaling
Author: Liang-Ying Lu, Ching-Yao Chang, Zhao-Hong Chen, Bo-Ting Yeh, Tai-Hua Lu, Peng-Yu Chen, *Pin-Hao Tang, Kuen-Jong Lee, Lih-Yih Chiou, Soon-Jyh Chang, Chien-Hung Tsai, Chung-Ho Chen, Jai-Ming Lin (Department of Electrical Engineering, National Cheng Kung University, Taiwan)
Page: pp. 17 - 18
Keyword: dynamic voltage and frequency scaling (DVFS), test and debug platform
Abstract: A sophisticated SoC that incorporates many design modules, including two ARM-like CPUs, a dynamic voltage and frequency scaling (DVFS) design, a master/slave temperature sensing system, and an on-chip test/debug platform, is developed and implemented in TSMC 90 nm technology. Measurement results validate the functionality and efficiency of the whole chip.

1S-10 (Time: 11:32 - 11:40)
Title: Rapid Prototyping of Multi-Mode QC-LDPC Decoder for 802.11n/ac Standard
Author: *Qing Lu, Bruce C. W. Sham, Francis C. M. Lau (The Hong Kong Polytechnic University, Hong Kong)
Page: pp. 19 - 20
Keyword: 802.11n/802.11ac, LDPC, multi-mode, FPGA
Abstract: A multi-mode QC-LDPC decoder is proposed to satisfy the 802.11n/802.11ac WiFi standards. Using a code-specific design technique, the overall performance of the decoder is enhanced while on-the-fly reconfigurability is ensured. The proposed architecture has been synthesized on an FPGA for measurements.
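A QC-LDPC code is usually specified by a base (exponent) matrix that is expanded with a lifting factor Z: each entry of -1 becomes a Z×Z all-zero block, and each entry s ≥ 0 becomes the Z×Z identity matrix cyclically shifted by s. A multi-mode decoder has to support several such base matrices and Z values (802.11n uses Z = 27, 54 and 81). The sketch below shows the expansion and a GF(2) syndrome check on a tiny, made-up base matrix, not one taken from the standard; the helper names are hypothetical.

```python
def expand_base_matrix(base: list[list[int]], z: int) -> list[list[int]]:
    """Expand a QC-LDPC exponent matrix into its binary parity-check matrix H."""
    rows, cols = len(base), len(base[0])
    h = [[0] * (cols * z) for _ in range(rows * z)]
    for i, row in enumerate(base):
        for j, shift in enumerate(row):
            if shift < 0:
                continue  # -1 marks an all-zero Z x Z block
            for r in range(z):
                # Row r of the block has its single 1 in column (r + shift) mod Z.
                h[i * z + r][j * z + (r + shift) % z] = 1
    return h


def syndrome(h: list[list[int]], codeword: list[int]) -> list[int]:
    """Compute H * c^T over GF(2); all zeros means every parity check is satisfied."""
    return [sum(a & b for a, b in zip(row, codeword)) % 2 for row in h]


toy_base = [[0, 1, -1],
            [2, -1, 0]]
H = expand_base_matrix(toy_base, z=3)
for row in H:
    print("".join(map(str, row)))
```

A reconfigurable decoder can store only the small exponent matrices and derive the circulant connections on the fly, instead of storing each expanded H.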

1S-11 (Time: 11:40 - 11:48)
Title: Sub-µW QRS Detection Processor Using Quadratic Spline Wavelet Transform and Maxima Modulus Pair Recognition for Power-Efficient Wireless Arrhythmia Monitoring
Author: *Chio-In Ieong, Pui-In Mak, Mang-I Vai, Rui P. Martins (University of Macau, Macau)
Page: pp. 21 - 22
Keyword: ASIC Design, Electrocardiogram, Local Signal Processor, Low Power Sensor Signal Processing, System-on-Chip
Abstract: This paper describes a power-efficient processor for extracting the timing of the QRS complex from a digitized ECG, based on a hardware-efficient architecture of quadratic spline wavelet transform (QSWT) and maxima modulus pair recognition (MMPR). The processor reduces the power of the wireless system by 6×.

1S-12 (Time: 11:48 - 11:56)
Title: Design of an Energy-Autonomous, Disposable, Supply-Sensing Biosensor Using Bio Fuel Cell and 0.23-V 0.25-µm Zero-Vth All-Digital CMOS Supply-Controlled Ring Oscillator with Inductive Transmitter
Author: *Kiichi Niitsu, Atsuki Kobayashi (Nagoya University, Japan), Yudai Ogawa, Matsuhiko Nishizawa (Tohoku University, Japan), Kazuo Nakazato (Nagoya University, Japan)
Page: pp. 23 - 24
Keyword: energy-autonomous, biosensor, CMOS, all-digital, bio fuel cells
Abstract: An energy-autonomous, disposable supply-sensing biosensor based on bio fuel cells and a 0.23-V 0.25-um zero-Vth all-digital CMOS supply-controlled ring oscillator with a current-driven pulse-interval-modulated inductive-coupling transmitter was demonstrated. The all-digital, current-driven architecture using zero-Vth transistors enables low-voltage operation and a small footprint in cost-competitive legacy CMOS. Measurements of a 0.25-um CMOS test chip demonstrated operation under a 0.23-V supply, the lowest supply voltage among reported proximity transmitters. Energy-autonomous biosensing operation using organic bio fuel cells was also demonstrated.


Session 1A  The Optimization of Memory Architecture and Management
Time: 10:20 - 12:00 Tuesday, January 26, 2016
Location: TF4203
Chairs: Yun Liang (Peking University, China), Swathi Gurumani (Advanced Digital Sciences Center, Singapore (UIUC-ASTAR center), Singapore)

1A-1 (Time: 10:20 - 10:45)
Title: Performance-centric Register File Design for GPUs using Racetrack Memory
Author: *Shuo Wang, Yun Liang, Chao Zhang, Xiaolong Xie, Guangyu Sun (Peking University, China), Yongpan Liu, Yu Wang (Tsinghua University, China), Xiuhong Li (Peking University, China)
Page: pp. 25 - 30
Keyword: GPU, Performance, Register File, Racetrack Memory, Compiler
Abstract: In this paper, we explore racetrack memory for designing a high-performance register file for GPU architectures. The high storage density of racetrack memory helps improve thread-level parallelism, but its lengthy shift operations may significantly degrade performance. To mitigate the shift operation overhead, we develop a compile-time register mapping algorithm that optimizes the mapping of registers to physical addresses in the register file. Experimental results demonstrate that our technique improves performance by up to 24% (19% on average) for a variety of GPU applications.
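Why register placement matters can be illustrated with a deliberately simple cost model, which is our assumption and not the paper's formulation: registers occupy slots along one racetrack with a single access port, and each access shifts the track by the distance between consecutively accessed registers. The exhaustive search below is only practical for a handful of registers; a compile-time algorithm for real kernels would need a heuristic or a more structured formulation. All names are hypothetical.

```python
from itertools import permutations


def shift_cost(trace: list[str], position: dict[str, int]) -> int:
    """Total shift distance for an access trace on a single-port racetrack (toy model)."""
    return sum(abs(position[cur] - position[prev]) for prev, cur in zip(trace, trace[1:]))


def best_placement(trace: list[str]) -> tuple[int, dict[str, int]]:
    """Try every register-to-slot assignment; exponential, for illustration only."""
    regs = sorted(set(trace))
    best_cost, best_pos = None, {}
    for slots in permutations(range(len(regs))):
        position = dict(zip(regs, slots))
        cost = shift_cost(trace, position)
        if best_cost is None or cost < best_cost:
            best_cost, best_pos = cost, position
    return best_cost, best_pos


trace = ["r0", "r2", "r0", "r2", "r1"]
naive = {"r0": 0, "r1": 1, "r2": 2}
print(shift_cost(trace, naive), best_placement(trace))
```

Placing the frequently alternating registers r0 and r2 in adjacent slots is what lowers the cost here; the same intuition (co-locate registers that are accessed back to back) motivates compile-time mapping.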

1A-2 (Time: 10:45 - 11:10)
Title: Improving Read Performance of STT-MRAM based Main Memories through Smash Read and Flexible Read
Author: Lei Jiang (Advanced Micro Devices, U.S.A.), Wujie Wen (Florida International University, U.S.A.), *Danghui Wang (Northwestern Polytechnical University, China), Lide Duan (University of Texas at San Antonio, U.S.A.)
Page: pp. 31 - 36
Keyword: STT-MRAM, read disturbance, main memory, read scheme, LPDDR3
Abstract: Spin Transfer Torque Magnetoresistive RAM (STT-MRAM) has recently been deemed a promising main memory alternative for high-end mobile processors. With process technology scaling, the amplitude of the write current approaches that of the read current in deep sub-micrometer STT-MRAM arrays. As a result, read disturbance errors (RDEs) emerge. Both high current restore required (HCRR) reads and low current long latency (LCLL) reads can guarantee read reliability and completely remove RDEs. However, both degrade system performance, because of extra restores or a longer read latency, and neither consistently achieves better performance across a wide variety of applications. In this paper, we present two architectural techniques to boost read performance for STT-MRAM based main memories in the presence of RDEs. We first propose Smash Read (S-RD) to shorten the latency of HCRR reads by injecting a larger read current. We further introduce Flexible Read (F-RD) to dynamically switch between read schemes, S-RD and LCLL, to maximize main memory system performance. On average, our techniques improve system performance by 9-13% and reduce total energy by 4-8% over all existing read schemes, including HCRR and LCLL.

1A-3 (Time: 11:10 - 11:35)
Title: STLAC: A Spatial and Temporal Locality-Aware Cache and Network-on-Chip Codesign for Tiled Many-core Systems
Author: *Mingyu Wang (Institute of Microelectronics, Tsinghua University, China), Zhaolin Li (Research Institute of Information Technology, Tsinghua University, China)
Page: pp. 37 - 42
Keyword: Many-core, Adaptive Cache, Network-on-chip
Abstract: The spatial and temporal locality of workloads is what allows cache designs to overcome the memory wall problem. However, few existing state-of-the-art designs for tiled many-core systems exploit both locality features to optimize the memory hierarchy, missing opportunities for further performance improvement. To address this problem, an adaptive spatial and temporal locality-aware cache and network-on-chip (NoC) codesign (STLAC) is proposed, which dynamically partitions the last level cache (LLC) into a data prefetch buffer or a victim cache according to locality prediction and exploits a hybrid burst-support NoC for fast data prefetch. The data prefetch buffer speculatively fetches data blocks at subsequent addresses to exploit spatial locality, while the victim cache collects data blocks evicted from the upper memory hierarchy to exploit temporal locality. By combining the proposed adaptive cache partitioning with the hybrid burst-support NoC, off-chip misses and on-chip network usage are greatly reduced. Experimental results demonstrate that the proposed STLAC reduces off-chip misses by up to 43% and improves performance by 15% on average compared with the traditional shared LLC design.
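The temporal-locality half of this idea, a victim cache that holds blocks recently evicted from the level above it, can be sketched as a fully associative LRU cache backed by a small FIFO victim buffer. This is a generic textbook victim cache under simplifying assumptions (fully associative, block addresses only, no data, prefetching or NoC modeling); it does not model STLAC's adaptive LLC partitioning, and the class name is hypothetical.

```python
from collections import OrderedDict


class VictimCachedLRU:
    """Fully associative LRU cache with a small FIFO victim buffer (toy model)."""

    def __init__(self, main_blocks: int, victim_blocks: int) -> None:
        self.main_blocks = main_blocks
        self.victim_blocks = victim_blocks
        self.main = OrderedDict()    # block -> None, least recently used first
        self.victim = OrderedDict()  # block -> None, oldest eviction first

    def access(self, block: int) -> str:
        if block in self.main:
            self.main.move_to_end(block)  # refresh recency on a hit
            return "hit"
        if block in self.victim:
            del self.victim[block]        # swap the block back into the main cache
            self._fill(block)
            return "victim_hit"
        self._fill(block)                 # would be an off-chip fetch in a real system
        return "miss"

    def _fill(self, block: int) -> None:
        if len(self.main) >= self.main_blocks:
            evicted, _ = self.main.popitem(last=False)  # evict the LRU block...
            self.victim[evicted] = None                 # ...into the victim buffer
            if len(self.victim) > self.victim_blocks:
                self.victim.popitem(last=False)         # drop the oldest victim
        self.main[block] = None


cache = VictimCachedLRU(main_blocks=2, victim_blocks=1)
print([cache.access(b) for b in [0xA, 0xB, 0xC, 0xA, 0xB]])
```

With only two main-cache blocks, the cyclic re-references to 0xA and 0xB are served from the victim buffer instead of going off-chip, which is the kind of saving the victim-cache partition targets.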

1A-4 (Time: 11:35 - 12:00)
Title: A Lightweight OpenMP4 Run-time for Embedded Systems
Author: Roberto E. Vargas, Sara Royuela, *Maria A. Serrano, Xavi Martorell, Eduardo Quiñones (Barcelona Supercomputing Center, Spain)
Page: pp. 43 - 49
Keyword: OpenMP4, Parallel programming Models, Many-core embedded processors, Compiler Analysis, Task Dependency Graph
Abstract: OpenMP is increasingly being adopted by current many-core embedded processors to exploit their parallel computation capabilities. Unfortunately, current run-time implementations of the latest specification (v4.0) are not suitable for processors relying on small and fast on-chip memories, due to their memory consumption. This paper proposes an OpenMP4 run-time that reduces memory consumption while providing the same performance. Our run-time relies on a new compiler pass capable of generating the task dependency graph of OpenMP programs, which is then efficiently stored in memory.
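Once a compiler pass has extracted the task dependency graph (for example, from the depend clauses of OpenMP task constructs), a run-time only needs, per task, a count of unfinished predecessors and a list of successors to release tasks in a valid order. The sketch below shows that bookkeeping with Kahn's algorithm on a hypothetical four-task graph; it is a sequential illustration of the idea, not the paper's run-time or its memory layout.

```python
from collections import deque


def release_order(num_tasks: int, edges: list[tuple[int, int]]) -> list[int]:
    """Return an order in which tasks may run, given (producer, consumer) edges."""
    pending = [0] * num_tasks                   # unfinished predecessors per task
    successors = [[] for _ in range(num_tasks)]
    for producer, consumer in edges:
        successors[producer].append(consumer)
        pending[consumer] += 1
    ready = deque(t for t in range(num_tasks) if pending[t] == 0)
    order = []
    while ready:
        task = ready.popleft()                  # a worker thread would execute it here
        order.append(task)
        for nxt in successors[task]:
            pending[nxt] -= 1
            if pending[nxt] == 0:
                ready.append(nxt)               # all inputs produced: release the task
    if len(order) != num_tasks:
        raise ValueError("dependency graph contains a cycle")
    return order


# Task 0 writes x; tasks 1 and 2 read x and write y and z; task 3 reads y and z.
print(release_order(4, [(0, 1), (0, 2), (1, 3), (2, 3)]))
```

Because the graph is known before execution, the per-task state reduces to two small arrays, which hints at why precomputing it can keep the run-time's memory footprint small.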


Session 1B  Secure Embedded Systems & IoT
Time: 10:20 - 12:00 Tuesday, January 26, 2016
Location: TF4304
Chairs: Qiaoyan Yu (University of New Hampshire, U.S.A.), Swaroop Ghosh (University of South Florida, U.S.A.)

1B-1 (Time: 10:20 - 10:45)
Title: Improving Tag Generation for Memory Data Authentication in Embedded Processor Systems
Author: Tao Liu, *Hui Guo, Sri Parameswaran (The University of New South Wales, Australia), X. Sharon Hu (University of Notre Dame, U.S.A.)
Page: pp. 50 - 55
Keyword: Tag Design, Memory Data Integrity Protection, Low Cost Embedded Systems
Abstract: Data integrity is important. One way to protect data integrity is to attach an identifying tag to each piece of data; the authenticity of the data can then be checked against its tag. If the data is altered by an adversary, the related tag becomes invalid and the attack is detected. This paper studies an existing tag design (CETD) for memory data in embedded processor systems, where data stored in memory or transferred over the bus can be tampered with and must be authenticated before use. Compared to other designs, this design offers the flexibility to trade off implementation cost against tag size (and hence the level of security); it is cost-effective and can counter data integrity attacks with random values, namely attacks in which the fake values used to replace valid data are random. However, we find that the design is vulnerable when the fake data is not randomly selected. For some data, the tags are not distributed over the full tag value space but are limited to a reduced set of values. If those values are chosen as the fake values, the data alteration will likely go undetected. In this paper, we analytically investigate this problem and propose a low-cost enhancement that ensures a full-range distribution of tag values for each data item, thereby removing the vulnerability of the original design.
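The general tag-then-verify pattern can be illustrated with a truncated keyed hash. This is a generic HMAC-SHA-256 construction from Python's standard library, not CETD. Binding the memory address into the tag is our assumption (it stops a valid block from being replayed at another address), and the tag length stands in for the cost/security trade-off mentioned above: with a t-byte tag, a randomly chosen fake value passes verification with probability of about 2^(-8t).

```python
import hashlib
import hmac


def make_tag(key: bytes, address: int, data: bytes, tag_bytes: int = 4) -> bytes:
    """Tag a memory block with a truncated HMAC-SHA-256 over (address || data)."""
    message = address.to_bytes(8, "little") + data
    return hmac.new(key, message, hashlib.sha256).digest()[:tag_bytes]


def verify(key: bytes, address: int, data: bytes, tag: bytes) -> bool:
    """Recompute the tag when the block is read back and compare in constant time."""
    expected = make_tag(key, address, data, len(tag))
    return hmac.compare_digest(expected, tag)


key = bytes(range(16))  # illustrative key only, not a real secret
block = b"\x01\x02\x03\x04"
tag = make_tag(key, 0x1000, block, tag_bytes=8)
print(verify(key, 0x1000, block, tag))                # untouched block
print(verify(key, 0x1000, b"\x01\x02\x03\x05", tag))  # tampered block
```

A keyed hash of this kind spreads tags over the full tag space for every input, which is the property the abstract finds missing in CETD for some data values; its hardware cost, however, is far higher than a lightweight embedded tag design.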

1B-2 (Time: 10:45 - 11:10)
Title: JTAG-Based Robust PCB Authentication for Protection Against Counterfeiting Attacks
Author: Andrew Hennessy, Yu Zheng (Case Western Reserve University, U.S.A.), *Swarup Bhunia (University of Florida, U.S.A.)
Page: pp. 56 - 61
Keyword: JTAG, PUF, Security
Abstract: A Printed Circuit Board (PCB) provides the backbone for interconnecting diverse electronic components into an electronic system. Unfortunately, the long and distributed supply chain of a PCB makes it vulnerable to a variety of integrity violation attacks, primarily different forms of counterfeiting, including cloning and recycling. In this paper, we propose a novel low-overhead and robust method to authenticate PCBs that uses an existing industry-standard test infrastructure, IEEE 1149.1 (JTAG), to extract high-quality signatures with high entropy. Measurement results with 30 custom-fabricated test boards are promising in terms of signature uniqueness and robustness.
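Uniqueness and robustness of such signatures are commonly quantified with Hamming distances: the average pairwise distance between signatures of different boards (ideally 50% of the bits) and the average distance between repeated readouts of the same board (ideally 0%). The sketch below computes these two textbook PUF metrics on made-up bit strings; the paper's exact evaluation methodology may differ.

```python
from itertools import combinations


def hamming_fraction(a: str, b: str) -> float:
    """Fraction of differing bits between two equal-length bit strings."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def uniqueness(signatures: list[str]) -> float:
    """Mean inter-board Hamming distance; about 0.5 is ideal."""
    pairs = list(combinations(signatures, 2))
    return sum(hamming_fraction(a, b) for a, b in pairs) / len(pairs)


def reliability(reference: str, rereads: list[str]) -> float:
    """1 minus the mean intra-board distance to the enrolled signature; 1.0 is ideal."""
    return 1 - sum(hamming_fraction(reference, r) for r in rereads) / len(rereads)


boards = ["0000", "1111", "0011"]  # toy signatures, one per board
print(uniqueness(boards), reliability("1010", ["1010", "1011"]))
```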

1B-3 (Time: 11:10 - 11:35)
Title: Maximizing Level of Confidence for Non-Equidistant Checkpointing
Author: *Dimitar Nikolov, Erik Larsson (Lund University, Sweden)
Page: pp. 62 - 68
Keyword: soft errors, reliability analysis, real-time systems, checkpointing
Abstract: Employing fault tolerance often introduces a time overhead, which may cause a deadline violation in real-time systems (RTS). Therefore, for RTS it is important to optimize fault tolerance techniques such that the probability of meeting the deadlines, i.e. the Level of Confidence (LoC), is maximized. Previous studies have focused on evaluating the LoC for equidistant checkpointing, but no studies have addressed the problem of evaluating the LoC for non-equidistant checkpointing. In this work, we provide an expression to evaluate the LoC for non-equidistant checkpointing and propose the Clustered Checkpointing method, which distributes a given number of checkpoints with the goal of maximizing the LoC. The results show that the LoC can be improved when non-equidistant checkpointing is used.
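To make the Level of Confidence concrete, the sketch below estimates it by Monte Carlo simulation for an arbitrary split of a task into checkpointed segments, so equidistant and non-equidistant placements can be compared under the same model. The model is our assumption, not the paper's analytical expression: soft errors arrive as a Poisson process with a given rate, an error anywhere in a segment (including its checkpoint overhead) is detected at the end of that segment, and the segment is then re-executed from the last checkpoint, with rollback cost folded into the re-execution.

```python
import math
import random


def level_of_confidence(segments: list[float], overhead: float, rate: float,
                        deadline: float, trials: int = 50_000, seed: int = 1) -> float:
    """Estimate P(task finishes by the deadline) under rollback recovery (toy model)."""
    rng = random.Random(seed)
    met = 0
    for _ in range(trials):
        elapsed = 0.0
        for length in segments:
            attempt = length + overhead          # execution plus checkpoint overhead
            p_clean = math.exp(-rate * attempt)  # P(no error during one attempt)
            elapsed += attempt
            # Error detected at the checkpoint: re-execute until an attempt is clean.
            while elapsed <= deadline and rng.random() >= p_clean:
                elapsed += attempt
            if elapsed > deadline:
                break                            # deadline already missed
        if elapsed <= deadline:
            met += 1
    return met / trials


# Same total work and number of checkpoints, two different placements.
print(level_of_confidence([10, 10, 10], overhead=1, rate=0.01, deadline=45))
print(level_of_confidence([14, 10, 6], overhead=1, rate=0.01, deadline=45))
```

A simulation like this is useful for sanity-checking an analytical LoC expression, but it converges slowly when the LoC is very close to 1, which is where a closed-form evaluation pays off.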

1B-4 (Time: 11:35 - 12:00)
Title: A Mutual Auditing Framework to Protect IoT against Hardware Trojans
Author: Chen Liu, Patrick Cronin, *Chengmo Yang (University of Delaware, U.S.A.)
Page: pp. 69 - 74
Keyword: hardware Trojan, cryptography, IoT security
Abstract: In the Internet of Things (IoT), hardware Trojans (malicious modifications to a circuit) implanted in individual nodes may exploit the wireless connection to leak confidential information or to collude with each other. To defend against this threat, we develop a lightweight framework that detects Trojans with affordable performance and energy overhead. We propose to exploit message encryption and vendor diversity among the nodes to build a distributed mutual auditing framework in which nodes monitor the trustworthiness of their neighbors.


Session 1C  Design for Directed Self-Assembly
Time: 10:20 - 12:00 Tuesday, January 26, 2016
Location: TF4204
Chairs: Bei Yu (Chinese University of Hong Kong, Hong Kong), Tetsuaki Matsunawa (Toshiba, Japan)

1C-1 (Time: 10:20 - 10:45)
Title: Simultaneous Template Optimization and Mask Assignment for DSA with Multiple Patterning
Author: *Jian Kuang, Junjie Ye, Evangeline F.Y. Young (The Chinese University of Hong Kong, Hong Kong)
Page: pp. 75 - 82
Keyword: DSA, Multiple Patterning Lithography
Abstract: Block Copolymer Directed Self-Assembly (DSA) is a promising technique to print contacts/vias for the 10nm technology node and beyond. In hybrid lithography that combines DSA with multiple patterning, multiple masks are used to print the DSA templates, which then guide the self-assembly of the block copolymer. In this paper, we propose approaches to solve the simultaneous template optimization and mask assignment problem for DSA with multiple patterning. Experiments show that our approaches significantly outperform the state-of-the-art work in reducing manufacturing cost.

1C-2 (Time: 10:45 - 11:10)
Title: Mask Optimization for Directed Self-Assembly Lithography: Inverse DSA and Inverse Lithography
Author: *Seongbo Shim, Youngsoo Shin (KAIST, Republic of Korea)
Page: pp. 83 - 88
Keyword: DSAL, mask optimization, inverse DSA, inverse lithography
Abstract: In directed self-assembly lithography (DSAL), a mask contains the images of guide patterns (GPs), which are patterned on a wafer through optical lithography; the wafer then goes through the DSA process to pattern contacts. Mask design for DSAL, which runs these processes in reverse, consists of two key steps, inverse DSA and inverse lithography, which we address in this paper.

1C-3 (Time: 11:10 - 11:35)
Title: Cut Redistribution with Directed Self-Assembly Templates for Advanced 1-D Gridded Layouts
Author: *Zhi-Wen Lin, Yao-Wen Chang (National Taiwan University, Taiwan)
Page: pp. 89 - 94
Keyword: Directed self-assembly technology, 1-D layout, Design for manufacturability and reliability, Algorithm
Abstract: Directed self-assembly (DSA) technology is a promising candidate for cut printing in sub-10nm 1-D gridded designs, where cuts may need to be redistributed so that they can be patterned by DSA guiding templates. In this paper, we first propose a linear-time optimal dynamic-programming-based algorithm for a special case of the template-guided cut redistribution problem, where there is at most one dummy wire segment on a track. We then extend our algorithm to general cases by applying a bipartite matching algorithm to decompose a general problem into a set of subproblems conforming to the special case (so that each can be solved optimally). The resulting algorithm achieves a provably good performance bound, with the cost of a template distribution only linear in the problem size. Experimental results on a set of common benchmarks show that our algorithm resolves all spacing rule violations with smaller running times than previous works.

1C-4 (Time: 11:35 - 12:00)
Title: Contact Layer Decomposition To Enable DSA With Multi-patterning Technique For Standard Cell Based Layout
Author: Zigang Xiao, Chun-Xun Lin, *Martin D.F. Wong (University of Illinois at Urbana-Champaign, U.S.A.), Hongbo Zhang (Synopsys Inc., U.S.A.)
Page: pp. 95 - 102
Keyword: Design for Manufacturability, Directed Self-Assembly, Complementary Lithography, Layout Decomposition, Hybrid Lithography
Abstract: Multiple patterning lithography has been widely adopted in today's circuit manufacturing. However, increasing the number of masks makes the manufacturing process more expensive. More importantly, toward the 7 nm technology node, the accumulated overlay in multiple patterning will cause unacceptable edge placement error (EPE). Recently, directed self-assembly (DSA) has been shown to be an effective lithography technology that can pattern contacts/vias/cuts with high throughput and low cost. DSA currently targets 7 nm technology, where guiding template generation needs either a double patterning EUV or a multiple patterning DUV process. By incorporating DSA into the multiple patterning process, it is possible to reduce the number of masks and achieve a cost-effective solution. In this paper, we study the decomposition problem for the contact layer in row-based standard cell layouts with DSA-MP complementary lithography. We explore several heuristic-based approaches and propose an algorithm that decomposes a standard cell row optimally in polynomial time. Our experiments show that our algorithm is guaranteed to find a minimum-cost solution if one exists, whereas the heuristics either fail or find only sub-optimal solutions. Our results show that the DSA-MP complementary approach is very promising for future advanced nodes.


Session 2S  (Special Session) Designing with Spintronics: Recent Developments and Upcoming Challenges
Time: 13:50 - 15:30 Tuesday, January 26, 2016
Location: TF4303
Organizer/Chair: Sachin S. Sapatnekar (University of Minnesota, U.S.A.)

2S-1 (Time: 13:50 - 14:20)
Title: (Invited Paper) Logic and Memory Design using Spin-based Circuits
Author: *Zhaoxin Liang, Meghna Mankalale, Brandon Del Bel, Sachin S. Sapatnekar (University of Minnesota, U.S.A.)
Page: pp. 103 - 108
Keyword: Spintronics, All-spin logic, ASL, MTJ, error correction
Abstract: The design of logic and memory circuits in emerging spintronics technology offers fertile ground for new ideas and innovations. We first describe methods for optimizing spintronic logic circuits at the physical design level, including systematic approaches for building standard cell libraries that enable the design of large circuits. Next, we examine issues in the design of spintronic memories and present methods that trade off volatility against error correction to build dense memory arrays.

2S-2 (Time: 14:20 - 14:50)
Title: (Invited Paper) Architecture Design with STT-RAM: Opportunities and Challenges
Author: Ping Chi, Shuangchen Li, Yuanqing Cheng (University of California at Santa Barbara, U.S.A.), Yu Lu, Seung H. Kang (Qualcomm Incorporated, U.S.A.), *Yuan Xie (University of California at Santa Barbara, U.S.A.)
Page: pp. 109 - 114
Keyword: STT-RAM, cache design, memory design
Abstract: The emerging STT-RAM has attracted much interest from both academia and industry in recent years. Thanks to its many advantages, it is considered a promising replacement for SRAM and DRAM in cache and memory system design. However, the disadvantages of STT-RAM also bring design challenges. This paper introduces state-of-the-art architectural approaches that adopt STT-RAM in cache and memory system design by exploiting the opportunities it brings while overcoming its challenges.

2S-3 (Time: 14:50 - 15:20)
Title: (Invited Paper) Prospects of Efficient Neural Computing with Arrays of Magneto-metallic Neurons and Synapses
Author: Abhronil Sengupta, Karthik Yogendra, Deliang Fan, *Kaushik Roy (Purdue University, U.S.A.)
Page: pp. 115 - 120
Keyword: Neuromorphic Computing, Spintronics
Abstract: Non-von Neumann computing models, like Artificial and Spiking Neural Networks, inspired by the functionalities of the human brain, would require devices that can offer a direct mapping to the underlying neuroscience mechanisms for energy-efficient and compact hardware implementation. To that effect, spin-transfer torque phenomena in devices based on lateral spin valves, domain wall motion in magnets, and magnetic tunnel junctions can potentially pave the way for spintronic neural computing systems, where spintronic neurons interfaced with spintronic synapses can directly mimic biological neural and synaptic functionalities. We explore various device structures suitable for such non-Boolean functionalities and demonstrate the potential benefits of neural computing based on arrays of magneto-metallic neurons and synapses.


Session 2A  Advances in Verification
Time: 13:50 - 15:30 Tuesday, January 26, 2016
Location: TF4203
Chairs: Jason C. Verley (Sandia National Laboratories, U.S.A.), Zuochang Ye (Tsinghua University, China)

2A-1 (Time: 13:50 - 14:15)
Title: Automatic Abstraction Refinement of TR for PDR
Author: Kuan Fan, *Ming-Jen Yang, Chung-Yang (Ric) Huang (Graduate Institution of Electronic Engineering, National Taiwan University, Taiwan)
Page: pp. 121 - 126
Keyword: PDR, Abstraction
Abstract: Localization abstraction is a powerful technique that has long been used to address the scalability problem of hardware model checking. However, computation resources are often consumed inefficiently during the repeated trial and error between abstraction refinement engines and proof engines. To this end, many efforts have been made in recent years to combine the two independent techniques for better efficiency. In this paper, we present a novel model checking method that combines PDR (aka IC3) with a gate-level, hybrid abstraction technique to further enhance the scalability and performance of PDR. We implemented our work in ABC and evaluated it on the HWMCC13 and HWMCC14 benchmark suites. The results show that our method substantially outperforms PDR as implemented in ABC and complements it on a large number of benchmark instances.

2A-2 (Time: 14:15 - 14:40)
Title: A Complete Approach to Unreachable State Diagnosability via Property Directed Reachability
Author: *Ryan Berryhill, Andreas Veneris (University of Toronto, Canada)
Page: pp. 127 - 132
Keyword: diagnosis, debugging, reachability, pdr, ic3
Abstract: In modern hardware design, substantial manual effort is required to fix a design when verification finds that a state is unreachable. This paper addresses this growing pain: given an unreachable target state, it presents a methodology that returns all design locations where a change can be implemented to make the target state reachable. In contrast to previous state reachability rectification techniques that use bounded model checking, our approach uses unbounded model checking. It first enhances the circuit transition relation by inserting a novel error model construction at each suspect location. An unbounded model checking algorithm is then applied to the enhanced transition relation to find which of the suspect locations can be changed to make the target state reachable. The use of unbounded model checking allows it to identify the complete solution set of the problem. As an added benefit, it also returns a proof, in the form of an inductive invariant, that no further solutions exist. Empirical results on industrial designs confirm the theoretical and practical gains of this approach.

2A-3 (Time: 14:40 - 15:05)
Title: Formally Analyzing Fault Tolerance in Datapath Designs using Equivalence Checking
Author: Payman Behnam (University of Tehran, Iran), Bijan Alizadeh (University of Tehran, and IPM, Iran), Sajjad Taheri (University of Tehran, Iran), *Masahiro Fujita (University of Tokyo, Japan)
Page: pp. 133 - 138
Keyword: Formal Verification, Equivalence checking, Fault tolerance, Decision Diagrams
Abstract: In this paper, we present an efficient formal approach to check the equivalence of a synthesized Register Transfer Level (RTL) design against its high-level specification in the presence of pipelining transformations. With the proposed equivalence checking method, the fault tolerance of designs in which faults occur can be formally analyzed: equivalence checking against the specification can reason about how quickly the design returns to normal operation after faults, including soft errors, occur. To increase the scalability of our method, we dynamically divide the designs into several smaller parts, called segments, by introducing dynamic cut-points. We then employ the Modular Horner Expansion Diagram (M-HED) to check whether the specification and the implementation are equivalent. Our method enables us to handle the equivalence checking problem for behaviorally synthesized designs even in the presence of pipelines for nested loops. The empirical results demonstrate the efficiency and scalability of our method in terms of run time and memory usage for several large designs synthesized by a commercial behavioral synthesis tool. Compared with SMT- and SAT-based equivalence checking, the average improvements in memory usage and run time are 16.7× and 111.9×, respectively.

2A-4 (Time: 15:05 - 15:30)
Title: Coupling Reverse Engineering and SAT to Tackle NP-Complete Arithmetic Circuitry Verification in ~O(# of gates)
Author: *Yi Diao, Xing Wei (Easy-Logic Technology Limited, Hong Kong), Tak.Kei Lam (The Chinese University of Hong Kong, Hong Kong), Yu.Liang Wu (Easy-Logic Technology Limited, Hong Kong)
Page: pp. 139 - 146
Keyword: SAT, Multiplier, Arithmetic logic, Macro, Formal verification
Abstract: There are situations (e.g., reverse engineering or formal verification) in which circuit designers need to extract complicated arithmetic circuitry deeply embedded inside a fully synthesized (or manually touched) million-gate flattened netlist without knowing the module boundaries and IO positions. Besides the unknown IOs and boundaries, a formal verification task such as comparing two netlists implementing (4A+3B)×C and 4A×C+3B×C, respectively, is quite challenging because it is also an NP-complete Circuit-SAT problem. To tackle this problem, we propose a novel Complementary Greedy Coupling (CGC) approach that couples reverse engineering and SAT techniques, since each of them performs well only at proving equality or inequality, respectively. The scheme can handle commonly implemented arithmetic modules (ripple/CLA adders, MUXes, various multipliers, and their combinations) with runtime complexity nearly linear in the number of circuit gates. For example, our scheme can verify two 32-bit multipliers (Wallace vs. modified Booth) within 5 seconds (regardless of their equality or inequality), while running SAT alone might take 10^10 centuries. We compared our tool Easy-LEC with two commercial tools on the market using the 182 open benchmarks posted for the ICCAD CAD Contest 2014. Besides running at least 400 to 1400 times faster, our scheme also solves 32% to 45% more cases (93% vs. 61% or 48%).
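Why the (4A+3B)×C vs. 4A×C+3B×C comparison is hard for enumeration can be made concrete with a minimal Python sketch. This is not the CGC method described above; it simply compares the two expressions, truncated to a fixed bit width, by exhaustive enumeration. All function names, and the faulty variant `netlist_buggy`, are hypothetical. The input space has 2^(3n) combinations for n-bit operands, so this only works for tiny widths (at n = 32 it would be 2^96 cases).

```python
from itertools import product


def netlist_a(a, b, c, width):
    """(4A + 3B) x C on a `width`-bit datapath (result truncated to `width` bits)."""
    return ((4 * a + 3 * b) * c) & ((1 << width) - 1)


def netlist_b(a, b, c, width):
    """4A x C + 3B x C on the same datapath."""
    return (4 * a * c + 3 * b * c) & ((1 << width) - 1)


def netlist_buggy(a, b, c, width):
    """Hypothetical faulty variant (3B replaced by 2B), used to show a counterexample."""
    return (4 * a * c + 2 * b * c) & ((1 << width) - 1)


def brute_force_compare(f, g, width):
    """Return the first (a, b, c) on which f and g differ, or None if they agree
    on all 2**(3 * width) input combinations."""
    values = range(1 << width)
    for a, b, c in product(values, repeat=3):
        if f(a, b, c, width) != g(a, b, c, width):
            return (a, b, c)
    return None


# The number of cases grows by 8x for every extra bit of operand width.
for w in (2, 3, 4):
    cases = (1 << w) ** 3
    print(w, cases, brute_force_compare(netlist_a, netlist_b, w))
```

Equivalent netlists force the enumeration to visit every case, while an inequivalent pair may terminate early on a counterexample; that asymmetry mirrors the abstract's point that different techniques suit proving equality and inequality.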


Session 2B  System Simulation and Testing
Time: 13:50 - 15:30 Tuesday, January 26, 2016
Location: TF4304
Chairs: Liang Shi (Chongqing University, China), Qiang Xu (The Chinese University of Hong Kong, Hong Kong)

2B-1 (Time: 13:50 - 14:15)
Title: NVPsim: A Simulator for Architecture Explorations of Nonvolatile Processors
Author: Yizi Gu, *Yongpan Liu, Yiqun Wang, Hehe Li, Huazhong Yang (Tsinghua University, China)
Page: pp. 147 - 152
Keyword: nonvolatile processor, simulator, architecture exploration
Abstract: Nonvolatile processors (NVPs) preserve run-time information when a power failure occurs by utilizing nonvolatile memory technologies. This feature enables NVPs to make continuous forward progress under an intermittent power supply in energy harvesting systems. This paper builds a gem5-based NVP simulator named NVPsim, which is validated against measured results of a fabricated prototype with a reasonable error rate. Furthermore, to demonstrate the capability of NVPsim for architecture exploration, we evaluate the performance and energy consumption of different NVP designs that vary in the choice of nonvolatile memory for on-chip caches, the backup strategy, and the energy buffer size. Experimental results indicate that nvSRAM outperforms other types of nonvolatile memory as the on-chip cache for energy harvesting systems.

2B-2 (Time: 14:15 - 14:40)
Title: MCSSim: A Memory Channel Storage Simulator
Author: *Renhai Chen, Zili Shao (The Hong Kong Polytechnic University, Hong Kong), Chia-Lin Yang (National Taiwan University, Taiwan), Tao Li (University of Florida, U.S.A.)
Page: pp. 153 - 158
Keyword: MCSSim, NVDIMM, Memory Channel Storage
Abstract: Recently, NVDIMM (Non-Volatile Dual In-line Memory Module) has been widely supported by leading hardware design companies such as IBM. Nevertheless, existing efforts largely focus on NVDIMM specification and fabrication issues, and the potential performance gains brought by NVDIMM have not been fully investigated. In this paper, we present an NVDIMM-based simulator called MCSSim to help study memory channel storage techniques. MCSSim is a cycle-accurate simulator designed with consideration of the differences between the memory channel interface and NAND flash memory features. MCSSim is also built with the DRAMSim2 [30] simulator, enabling the simulation of a variety of hybrid memory systems that combine DRAM DIMMs and NVDIMMs. Experimental results obtained with MCSSim show the effectiveness of the proposed simulator.

2B-3 (Time: 14:40 - 15:05)
Title: Trace-Based Context-Sensitive Timing Simulation Considering Execution Path Variations
Author: *Sebastian Ottlik, Jan Micha Borrmann, Sadik Asbach, Alexander Viehl (FZI Research Center for Information Technology, Germany), Wolfgang Rosenstiel, Oliver Bringmann (University of Tübingen, Germany)
Page: pp. 159 - 165
Keyword: Software Timing Simulation, Instruction Set Simulation, Software Performance Analysis
Abstract: We present a fast and accurate timing simulation of binary code execution on complex embedded processors. Underlying block timings are extracted from a preceding hardware execution and differentiated by execution context. Thereby, complex factors such as caches can be reflected accurately without explicit modelling. Based on the timings observed in one hardware execution, the timing of numerous other executions for different inputs can be simulated with an average error below 5% for complex applications on an ARM Cortex-A9 processor.
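The idea of block timings "differentiated by execution context" can be sketched in a few lines of Python. This is a minimal illustration under a hypothetical assumption: a block's context is simply its predecessor block (the abstract does not say how the authors define context). Timings learned from one reference trace are reused to estimate a different execution path, falling back to a context-insensitive per-block average when a (predecessor, block) pair was never observed.

```python
from collections import defaultdict


def learn_timings(trace):
    """Build block timings from one reference hardware trace.

    trace: list of (block_id, cycles) in execution order.
    Hypothetical assumption: a block's context is its predecessor block.
    Returns (contextual, fallback): mean cycles per (predecessor, block) pair
    and mean cycles per block regardless of context.
    """
    ctx_sum, ctx_cnt = defaultdict(float), defaultdict(int)
    blk_sum, blk_cnt = defaultdict(float), defaultdict(int)
    prev = None
    for block, cycles in trace:
        ctx_sum[(prev, block)] += cycles
        ctx_cnt[(prev, block)] += 1
        blk_sum[block] += cycles
        blk_cnt[block] += 1
        prev = block
    contextual = {key: ctx_sum[key] / ctx_cnt[key] for key in ctx_sum}
    fallback = {key: blk_sum[key] / blk_cnt[key] for key in blk_sum}
    return contextual, fallback


def simulate(path, contextual, fallback):
    """Estimate total cycles for a new block sequence (every block must occur in
    the reference trace), preferring context-specific timings when available."""
    total, prev = 0.0, None
    for block in path:
        total += contextual.get((prev, block), fallback[block])
        prev = block
    return total


# Reference run: the first loop iteration is slow (e.g. cold caches), later ones fast.
trace = [("entry", 5), ("loop", 40), ("loop", 10), ("loop", 10), ("exit", 3)]
contextual, fallback = learn_timings(trace)
# A different input that runs one more loop iteration.
estimate = simulate(["entry", "loop", "loop", "loop", "loop", "exit"], contextual, fallback)
print(estimate)
```

Keeping the cold first iteration apart from the warm ones gives 78 cycles for the longer path, whereas a single per-block average (20 cycles per iteration) would give 88. This shows, in a toy setting, why context separation can capture cache effects without modelling the cache.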

2B-4 (Time: 15:05 - 15:30)
Title: Generating High Coverage Tests for SystemC Designs Using Symbolic Execution
Author: *Bin Lin, Zhenkun Yang, Kai Cong, Fei Xie (Portland State University, U.S.A.)
Page: pp. 166 - 171
Keyword: SystemC, Test Generation, Symbolic Execution, Coverage
Abstract: In this research, we have developed an approach to generating high coverage tests for SystemC designs using symbolic execution. We have applied this approach to a representative set of SystemC designs. The results show that our approach is able to generate tests that provide high code coverage of the designs with modest time and memory usage and to scale to designs of practical sizes.


Session 2C  Advanced Issues in Floorplanning and Placement
Time: 13:50 - 15:30 Tuesday, January 26, 2016
Location: TF4204
Chairs: Sheqin Dong (Tsinghua University, China), Yukihide Kohira (The University of Aizu, Japan)

2C-1 (Time: 13:50 - 14:15)
Title: Circular-Contour-Based Obstacle-Aware Macro Placement
Author: *Chien-Hsiung Chiou, Chin-Hao Chang, Szu-To Chen, Yao-Wen Chang (National Taiwan University, Taiwan)
Page: pp. 172 - 177
Keyword: VLSI, Physical Design, Macro Placement, Obstacle
Abstract: We present an obstacle-aware macro placement algorithm that places macros to simultaneously optimize wirelength and routability. We propose a circular contour to characterize the region formed by all obstacles. With the circular contour, we can effectively avoid overlaps between movable macros and obstacles, and simultaneously optimize the shape and area of the region for standard-cell placement. Experimental results show that our algorithm achieves the best quality compared with manual designs provided by industry and with leading academic mixed-size placers.

2C-2 (Time: 14:15 - 14:40)
Title: Learning-Based Prediction of Embedded Memory Timing Failures During Initial Floorplan Design
Author: Wei-Ting J. Chan (UC San Diego, U.S.A.), Kun Young Chung (Samsung Electronics Co. Ltd., Republic of Korea), Andrew B. Kahng (UC San Diego, U.S.A.), Nancy D. MacDonald (ClariPhy Communications, U.S.A.), *Siddhartha Nath (UC San Diego, U.S.A.)
Page: pp. 178 - 185
Keyword: Floorplan, multiphysics, machine learning, Boosting, timing
Abstract: Embedded memories are critical in SoC designs, as they pose timing-correctness challenges in advanced technology nodes. We propose a learning-based methodology to perform early prediction of timing failure risk given only the netlist, timing constraints, and floorplan context. Early prediction saves the long runtimes of P&R tools. Our methodology identifies which memories are at “risk” and provides guidance for floorplan changes to reduce the predicted “risk”. We can predict slack to within 200 ps with only floorplan information.

2C-3 (Time: 14:40 - 15:05)
Title: Stitch Aware Detailed Placement for Multiple E-Beam Lithography
Author: Yibo Lin (University of Texas at Austin, U.S.A.), *Bei Yu (Chinese University of Hong Kong, Hong Kong), Yi Zou (University of Texas at Austin, U.S.A.), Zhuo Li, Charles J. Alpert (Cadence Design Systems, Inc., U.S.A.), David Z. Pan (University of Texas at Austin, U.S.A.)
Page: pp. 186 - 191
Keyword: Multiple Electron Beam Lithography, Stitch Error, Detailed Placement, Dynamic Programming
Abstract: As a promising candidate for next-generation lithography, multiple e-beam lithography (MEBL) is able to improve manufacturing throughput using parallel beam printing. In MEBL, a layout is split into stripes, the layout patterns are cut by the stripe boundaries, and all the stripes are then printed in parallel. If a via pattern or a long vertical wire overlaps a stitch, it may suffer from poor printing quality due to the so-called stitch error, which may degrade circuit performance. In this paper, we present a comprehensive study of stitch-aware detailed placement that simultaneously minimizes the stitch error and optimizes other traditional objectives, e.g., wirelength and density. Experimental results on modified ICCAD 2014 benchmarks show that our algorithms are very effective: zero stitch error is guaranteed, while the scaled half-perimeter wirelength is comparable to that of a state-of-the-art detailed placer.
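The stitch-error condition described above can be sketched as a simple geometric check in Python. This is a minimal illustration under hypothetical assumptions: the stripes are vertical and of equal width, so the stitch lines fall at multiples of `stripe_width`, and a via counts as affected when a stitch line passes strictly through its horizontal extent. It only counts violations; it is not the authors' detailed placement algorithm.

```python
def stitch_lines(chip_width, stripe_width):
    """x-coordinates of the stitch lines (stripe boundaries), assuming equal-width
    vertical stripes starting at x = 0 (hypothetical layout model)."""
    return list(range(stripe_width, chip_width, stripe_width))


def count_stitch_violations(vias, stitches):
    """vias: list of (x_left, x_right) horizontal extents.
    A via is flagged if any stitch line lies strictly inside its extent;
    a via whose edge merely touches a stitch line is treated as safe here."""
    count = 0
    for left, right in vias:
        if any(left < s < right for s in stitches):
            count += 1
    return count


stitches = stitch_lines(100, 25)                 # stripe boundaries at 25, 50, 75
vias = [(20, 30), (30, 40), (49, 51), (75, 80)]
print(count_stitch_violations(vias, stitches))  # (20, 30) and (49, 51) are cut
```

A stitch-aware detailed placer would use a count like this (or a weighted version) as one term of its cost, alongside wirelength and density, when deciding where to shift cells.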

2C-4 (Time: 15:05 - 15:30)
Title: Minimum Implant Area-Aware Placement and Threshold Voltage Refinement
Author: Seong-I Lei, *Wai Kei Mak (National Tsing Hua University, Taiwan), Chris Chu (Iowa State University, U.S.A.)
Page: pp. 192 - 197
Keyword: Detailed placement, Threshold Voltage Assignment, Implant area
Abstract: Threshold voltage assignment is a very effective technique to reduce leakage power consumption in modern integrated circuit (IC) design. As feature size continues to decrease, the layout constraints (called MinIA constraints) on the implant area, which determines the threshold voltage of a device, are becoming increasingly difficult to satisfy. It is therefore necessary to take these constraints into consideration during the layout stage. In this paper, we propose to resolve MinIA constraint violations by a simultaneous detailed placement and threshold voltage refinement approach. We present an optimal and efficient mixed-integer linear programming (MILP)-based algorithm that is guaranteed to fix all MinIA constraint violations. Experimental results demonstrate that our algorithm perturbs the original placement and threshold voltage assignment solutions only minimally to eliminate all violations, and that it is fast in practice.
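To make the MinIA constraint concrete, the following Python sketch detects violations under a deliberately simplified, hypothetical model: the cells in one row abut, each maximal run of same-Vt cells forms one implant region, and a region's area is its total width times the row height. It only detects violations; it is not the MILP-based placement and Vt refinement described above.

```python
from itertools import groupby


def implant_regions(row):
    """row: list of (cell_width, vt) in left-to-right order (cells assumed to abut).
    Returns (vt, total_width) for each maximal run of same-Vt cells."""
    regions = []
    for vt, cells in groupby(row, key=lambda cell: cell[1]):
        regions.append((vt, sum(width for width, _ in cells)))
    return regions


def min_ia_violations(row, row_height, min_area):
    """Implant regions whose area (width x row height) is below the minimum."""
    return [(vt, width) for vt, width in implant_regions(row)
            if width * row_height < min_area]


row = [(2, "LVT"), (3, "LVT"), (1, "HVT"), (4, "SVT"), (2, "SVT")]
print(implant_regions(row))                              # one region per same-Vt run
print(min_ia_violations(row, row_height=1, min_area=2))  # the narrow HVT island
```

In this toy model, a narrow HVT cell squeezed between other Vt types is the typical violation. It can be fixed either by moving cells so that same-Vt cells cluster (placement) or by changing that cell's Vt (refinement), which is why the abstract treats the two together.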


Session 3S  (Special Session) High-Level Synthesis – Now, the Future, and the "Dark Secrets"
Time: 15:50 - 17:30 Tuesday, January 26, 2016
Location: TF4303
Organizer: Deming Chen (UIUC, U.S.A.), Chair: Eric Yun Liang (Peking University, China)

3S-1 (Time: 15:50 - 16:15)
Title: (Invited Paper) Design and Verification Using High-Level Synthesis
Author: *Andres Takach (Mentor Graphics, U.S.A.)
Page: pp. 198 - 203
Keyword: high level synthesis, verification, ECO
Abstract: The adoption of HLS has been driven by the need to tackle growing verification costs in traditional RTL design flows. This paper presents an overview of design, optimization and verification using HLS. It also outlines some of the requirements for HLS design to fit into existing design and verification flows and ways in which such flows might be adapted as HLS is more widely deployed.

3S-2 (Time: 16:15 - 16:40)
Title: (Invited Paper) High-Level Synthesis of Accelerators in Embedded Scalable Platforms
Author: Paolo Mantovani, Giuseppe Di Guglielmo, *Luca P. Carloni (Columbia University, U.S.A.)
Page: pp. 204 - 211
Keyword: SoC, system-level design, high-level synthesis, accelerators, embedded scalable platforms
Abstract: Embedded scalable platforms combine a flexible socketed architecture for heterogeneous system-on-chip (SoC) design and a companion system-level design methodology. The architecture supports the rapid integration of processor cores with many specialized hardware accelerators. The methodology simplifies the design, integration, and programming of the heterogeneous components in the SoC. In particular, it raises the level of abstraction in the design process and guides designers in the application of high-level synthesis (HLS) tools. HLS enables a more efficient design of accelerators with a focus on their algorithmic properties, a broader exploration of their design space, and more productive reuse across many different SoC projects.

3S-3 (Time: 16:40 - 17:05)
Title: (Invited Paper) High Quality IP Design using High-Level Synthesis Design Flow
Author: *Qiang Zhu (Cadence Design Systems, Japan), Masato Tatsuoka (Socionext Inc., Japan)
Page: pp. 212 - 217
Keyword: High Level Synthesis, IP designs, Physically Aware
Abstract: In this paper, we describe practical experience with the use of high-level synthesis technologies to achieve higher performance, higher quality, and lower power for IP designs compared with traditional RTL design. We demonstrate how three key techniques, namely interface-based design, architectural exploration, and congestion-aware high-level synthesis, were used to achieve higher-quality IP designs. On real applications, we show significantly better QoR (Quality of Results) with high-level synthesis than with the traditional RTL design flow by utilizing these three key techniques.

3S-4 (Time: 17:05 - 17:30)
Title: (Invited Paper) Designing High-Quality Hardware on a Development Effort Budget: A Study of the Current State of High-Level Synthesis
Author: Zelei Sun, Keith Campbell, Wei Zuo (UIUC, U.S.A.), Kyle Rupnow, Swathi Gurumani (ADSC, Singapore), Frederic Doucet (Qualcomm, U.S.A.), *Deming Chen (UIUC, U.S.A.)
Page: pp. 218 - 225
Keyword: High-level synthesis, evaluation, coding guidelines, optimization, hardware design
Abstract: High-level synthesis (HLS) promises high-quality hardware with minimal development effort. In this paper, we evaluate the current state of the art in HLS and design techniques based on software references and architecture references. We present a software reference study that develops a JPEG encoder from pre-existing software, and an architecture reference study that develops an AES block encryption module from scratch in SystemC and SystemVerilog based on a desired architecture. Additionally, we develop micro-benchmarks to demonstrate best practices in C coding styles that produce high-quality hardware with minimal development effort. Finally, we suggest language, tool, and methodology improvements to advance the current state of the art in HLS.


Session 3A  Robust Timing Analysis and Optimization
Time: 15:50 - 17:30 Tuesday, January 26, 2016
Location: TF4203
Chairs: Ngai Wong (The University of Hong Kong, China), Hao Yu (Nanyang Technological University, Singapore)

3A-1 (Time: 15:50 - 16:15)
Title: Clock Buffer Polarity Assignment Utilizing Useful Clock Skews for Power Noise Reduction
Author: *Deokjin Joo, Taewhan Kim (Seoul National University, Republic of Korea)
Page: pp. 226 - 231
Keyword: clock, scheduling, power, noise
Abstract: Clock trees are among the most active components on a chip, which makes them one of the dominant sources of noise. While many clock polarity assignment (PA) techniques have been proposed to mitigate clock noise, no attention has been paid to PA under useful skew constraints. In this work, we show that the PA problem under useful skew constraints is intractable and propose a scalable clique-search-based algorithm to solve the problem effectively.

3A-2 (Time: 16:15 - 16:40)
Title: Buffer Insertion to Remove Hold Violations at Multiple Process Corners
Author: *Inhak Han, Daijoon Hyun, Youngsoo Shin (KAIST, Republic of Korea)
Page: pp. 232 - 237
Keyword: Buffer insertion, Hold fix, Process corner
Abstract: Buffer insertion to remove hold violations at multiple process corners is addressed for the first time. The problem is formulated as an integer linear program (ILP), which is combined with circuit partitioning so that larger circuits can also be handled. A heuristic algorithm is then proposed and compared with the ILP; it requires only slightly more buffers (2.4% on average). Two additional intuitive methods are implemented to demonstrate why the new heuristic algorithm is needed.
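The tension that makes multi-corner hold fixing nontrivial can be sketched for a single path in Python. The model is hypothetical and much simpler than the paper's ILP: every inserted buffer has the same corner-dependent delay, adding it raises the hold slack and lowers the setup slack by that delay at each corner. Under this model the smallest buffer count that fixes hold at every corner is also the best choice for setup, so if it breaks setup at some corner, no count works for this path alone.

```python
import math


def buffers_needed(corners):
    """corners: dict corner_name -> (hold_slack, setup_slack, buffer_delay), e.g. in ps.

    Returns the minimum number of identical buffers that removes the hold
    violation at every corner without causing a setup violation at any corner,
    or None if no buffer count works (single-path toy model).
    """
    n = 0
    for hold, _, delay in corners.values():
        if hold < 0:
            n = max(n, math.ceil(-hold / delay))
    for _, setup, delay in corners.values():
        if setup - n * delay < 0:
            return None  # more buffers would only make setup worse
    return n


ok = {"fast": (-30, 200, 10), "slow": (-10, 100, 25)}
conflict = {"fast": (-30, 200, 10), "slow": (-10, 40, 25)}
print(buffers_needed(ok), buffers_needed(conflict))
```

In the second case the fast corner needs three buffers, but three buffers at the slow corner consume 75 ps against only 40 ps of setup slack. Resolving such conflicts requires choosing where, across many paths, buffers are inserted, which is the problem the ILP and heuristic address.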

3A-3 (Time: 16:40 - 17:05)
Title: Speed Binning With High-Quality Structural Patterns From Functional Timing Analysis (FTA)
Author: *Louis Y.-Z. Lin, Charles H.-P. Wen (Dept. of Elec. Comp. Engr., National Chiao Tung University, Taiwan)
Page: pp. 238 - 243
Keyword: speed-binning, FTA
Abstract: The operating speed of a chip determines its price in the nanometer era. Thus, design companies require high-quality speed binning to maximize their profits. They usually rely on legacy (i.e., structural) tests for speed binning, since functional tests are too expensive to derive. Besides legacy and functional tests, recent studies have tried to apply the notion of delay testing to derive speed-binning patterns; nevertheless, none of them can determine the number of patterns while also taking process variation into consideration. Therefore, in this paper, we propose a speed-binning pattern generation (SBPG) method to deterministically generate a high-quality pattern set for speed binning. SBPG mainly consists of two core techniques, (1) empirical variation sampling (EVS) and (2) functional timing analysis (FTA), which efficiently derive a few high-quality patterns from a small number of learning samples. Experimental results show that SBPG achieves satisfactory accuracy (> 99% on average) for five benchmark circuits under various conditions of process variation, making it an efficient solution for speed binning.

3A-4 (Time: 17:05 - 17:30)
Title: Electromigration Recovery Modeling and Analysis under Time-Dependent Current and Temperature Stressing
Author: Xin Huang (University of California, Riverside, U.S.A.), Valeriy Sukharev (Mentor Graphics Corporation, U.S.A.), Taeyoung Kim (University of California, Riverside, U.S.A.), Haibao Chen (Shanghai Jiao Tong University, China), *Sheldon X.-D. Tan (University of California, Riverside, U.S.A.)
Page: pp. 244 - 249
Keyword: EM, reliability, recovery, analytical model
Abstract: Electromigration (EM) is considered the major reliability issue for current and future VLSI technologies. Current EM reliability analysis relies on over-conservative and oversimplified EM models. In particular, the transient recovery effect in the EM-induced stress evolution kinetics has not been treated properly in any existing analytical EM model. In this article, we propose a new physics-based dynamic compact EM model, which, for the first time, can accurately predict the transient hydrostatic stress recovery effect in a confined metal wire. The new dynamic EM model is based on the direct analytical solution of the one-dimensional Korhonen equation with loads driven by arbitrary unipolar or bipolar current waveforms under varying temperature. We show that the EM recovery effect can be quite significant even under unidirectional current loads. This healing process is sensitive to temperature: higher temperatures lead to faster and more complete recovery. This effect can be further exploited to significantly extend the lifetime of interconnect wires if the chip current or power can be properly regulated and managed. As a result, the new dynamic EM model can be incorporated into existing dynamic thermal/power/reliability management and optimization approaches devoted to reliability-aware optimization at multiple system levels (chip/server/rack/data centers). Presented results show that the proposed EM model agrees very well with numerical analysis results under arbitrary time-varying current density and temperature profiles.
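The recovery effect can be illustrated numerically with the one-dimensional Korhonen equation in its commonly cited form, ∂σ/∂t = ∂/∂x[κ(∂σ/∂x + G)], with zero atomic flux, κ(∂σ/∂x + G) = 0, at both blocked ends of the wire. Here σ is the hydrostatic stress, κ is a temperature-dependent diffusivity-like coefficient, and G is the EM driving term proportional to current density. The Python sketch below is a generic explicit finite-volume solver in normalized units (κ = G = L = 1 are arbitrary assumptions). It is not the analytical model proposed in the paper. Stress builds up while current flows and relaxes back toward zero once the current is switched off.

```python
def simulate_em_stress(n_cells=20, steps=4000, dt=1e-3,
                       current_on=lambda step: step < 2000,
                       kappa=1.0, G=1.0, length=1.0):
    """Explicit finite-volume solution of d(sigma)/dt = d/dx[kappa (d(sigma)/dx + G)]
    with blocking (zero-flux) boundaries at both wire ends, in normalized units.

    Stability of the explicit scheme requires kappa * dt / dx**2 <= 0.5.
    Returns the final stress profile and the peak |stress| after every step.
    """
    dx = length / n_cells
    assert kappa * dt / dx ** 2 <= 0.5, "time step too large for explicit scheme"
    sigma = [0.0] * n_cells
    history = []
    for step in range(steps):
        g = G if current_on(step) else 0.0
        # flux[i] is the flux across the face between cells i-1 and i;
        # faces 0 and n_cells are the blocked wire ends, so their flux stays 0.
        flux = [0.0] * (n_cells + 1)
        for i in range(1, n_cells):
            flux[i] = kappa * ((sigma[i] - sigma[i - 1]) / dx + g)
        sigma = [sigma[i] + dt / dx * (flux[i + 1] - flux[i]) for i in range(n_cells)]
        history.append(max(abs(s) for s in sigma))
    return sigma, history


sigma, history = simulate_em_stress()
print("peak stress with current on:", history[1999])
print("peak stress after recovery:", history[-1])
```

Because the boundaries block atomic flux, the total stress is conserved, and the profile under current approaches a linear gradient. In this normalized model, the recovery rate after the current is removed scales with κ. Since κ grows with temperature, this is consistent with the abstract's observation that higher temperatures speed up recovery.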


Session 3B  Low Power in Deep Sub-Micro: From Architecture to Physical Design
Time: 15:50 - 17:30 Tuesday, January 26, 2016
Location: TF4304
Chairs: Takashi Sato (Kyoto University, Japan), Pingqiang Zhou (ShanghaiTech University, China)

3B-1 (Time: 15:50 - 16:15)
Title: A Novel Low-Cost Dynamic Logic Reconfigurable Structure Strategy for Low Power Optimization
Author: *Yu-Guang Chen, Wan-Yu Wen, Yun-Ting Wang, You-Luen Lee, Shih-Chieh Chang (National Tsing Hua University, Taiwan)
Page: pp. 250 - 255
Keyword: DVFS, Low Power Design, Dynamic Logic Reconfigurable Structure
Abstract: Low power design techniques have been extensively applied in modern IC designs to avoid the negative side effects of high power density. Unlike Dynamic Voltage and/or Frequency Scaling (DVFS) approaches, which are applied only to a “fixed” design, we propose a dynamic logic reconfigurable structure strategy that allows dynamic switching from a high-speed/high-power logic structure to a low-speed/low-power logic structure. A design with such a reconfigurable structure is called a Dynamic Logic Reconfigurable Structure (DLRS). Unlike approximate computing, which trades off computation accuracy against power, our DLRS designs maintain data integrity. In this paper, we propose novel low-cost DLRS adders and multipliers, and a comprehensive framework for low power designs. We further integrate DLRS with DVFS, which creates more flexibility in trading off performance against power consumption. Experimental results show that with DLRS adders and multipliers in three in-house designs, the proposed method can achieve up to 60.05% power reduction compared with the traditional DVFS scheme, with only 6.55% area overhead.

3B-2 (Time: 16:15 - 16:40)
Title: An Energy-Efficient Random Number Generator for Stochastic Circuits
Author: *Kyounghoon Kim (Seoul National University, Republic of Korea), Jongeun Lee (UNIST, Republic of Korea), Kiyoung Choi (Seoul National University, Republic of Korea)
Page: pp. 256 - 261
Keyword: Stochastic computing, stochastic number generator, energy-efficient design, approximate computing
Abstract: Stochastic circuits provide very high efficiency in terms of gate area and power consumption compared with conventional binary logic. However, they require random bit streams generated by stochastic number generators (SNGs), which account for a significant portion of area and energy, offsetting their merits. In this paper, we propose a new SNG that significantly reduces area and energy while improving accuracy in progressive precision. Experimental results show that the proposed SNG reduces energy by more than 72% compared with state-of-the-art designs.
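For context on what an SNG does, the Python sketch below models the conventional comparator-based scheme: a pseudo-random source (here an 8-bit LFSR) feeds a comparator that outputs 1 whenever the random value is below the encoded threshold. It is not the new SNG proposed in the paper. The LFSR is written in Galois form as repeated multiplication by x modulo the primitive polynomial 0x11D (x^8 + x^4 + x^3 + x^2 + 1), so any nonzero seed cycles through all 255 nonzero byte values. The function names are illustrative.

```python
def lfsr_states(seed, poly=0x11D, count=255):
    """Yield `count` successive states of an 8-bit Galois LFSR.

    Each step multiplies the state by x modulo `poly`; with the primitive
    polynomial 0x11D every nonzero seed visits all 255 nonzero values once
    per period.
    """
    state = seed
    for _ in range(count):
        yield state
        state <<= 1
        if state & 0x100:  # reduce modulo the degree-8 polynomial
            state ^= poly


def sng(k, seed, poly=0x11D, length=255):
    """Comparator-based SNG: emit 1 whenever the LFSR state is below k (0..256),
    so the stream's fraction of 1s approximates k / 256."""
    return [1 if s < k else 0 for s in lfsr_states(seed, poly, length)]


def to_value(stream):
    """Decode a unipolar stochastic stream back to a probability."""
    return sum(stream) / len(stream)


def stochastic_multiply(stream_a, stream_b):
    """Unipolar stochastic multiplication is a bitwise AND, which is accurate
    only when the two streams are (close to) uncorrelated."""
    return [a & b for a, b in zip(stream_a, stream_b)]


half = sng(128, seed=1)
print(to_value(half))  # 127 ones out of 255 bits, roughly 0.5
```

Every bit of every stream costs one LFSR step plus one comparison. In addition, the two operands of a multiplication need decorrelated random sources, so a naive design multiplies the number of LFSRs. This is why SNGs can dominate the area and energy of a stochastic circuit, as the abstract notes.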

3B-3 (Time: 16:40 - 17:05)
Title: Design of an All-Digital Temperature Sensor in 28 nm CMOS Using Temperature-Sensitive Delay Cells and Adaptive-1P Calibration for Error Reduction
Author: Shang-Yi Li, *Pei-Yuan Chou, Jinn-Shyan Wang (Chung-Cheng University, Taiwan)
Page: pp. 262 - 267
Keyword: temperature sensor, all digital, calibration, zero temperature coefficient, process variation
Abstract: We describe the design techniques, calibration method, and measurement results of an all-digital temperature sensor in 28 nm CMOS. To deal with Vcc being near the zero-temperature-coefficient point, a new delay cell with much improved temperature sensitivity is proposed. Adaptive 1-point (1P) calibration is proposed to reduce the serious impact of process variations without increasing the calibration cost. Measurement results show that, compared with conventional 1P calibration, the new method achieves a 32% error reduction.
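The weakness of conventional 1-point calibration, which the adaptive scheme above improves on, can be seen with a Python sketch under a hypothetical, idealized sensor model: the digital output is linear in temperature (count = slope × T + offset). One measurement at a known reference temperature fixes the offset, but the slope is assumed to equal its nominal value. When process variation shifts the actual slope, readings drift away from the reference point. This is only the conventional scheme; the paper's adaptive 1P method is not modelled.

```python
def sensor_count(temp, slope, offset):
    """Idealized linear sensor model (hypothetical): count = slope * temp + offset."""
    return slope * temp + offset


def one_point_calibrate(count_at_ref, ref_temp, nominal_slope):
    """Conventional 1-point calibration: fit only the offset from one reading at a
    known reference temperature and assume the slope equals its nominal value.
    Returns a function that converts sensor counts to temperature."""
    offset = count_at_ref - nominal_slope * ref_temp

    def to_temperature(count):
        return (count - offset) / nominal_slope

    return to_temperature


NOMINAL_SLOPE = 2.0  # counts per degree C (illustrative value)

# Chip whose slope matches the nominal value: exact at every temperature.
typical = one_point_calibrate(sensor_count(25, 2.0, 100), 25, NOMINAL_SLOPE)
print(typical(sensor_count(85, 2.0, 100)))

# Chip whose slope is 10% higher due to process variation: the error grows
# with the distance from the 25 C calibration point.
skewed = one_point_calibrate(sensor_count(25, 2.2, 100), 25, NOMINAL_SLOPE)
print(skewed(sensor_count(85, 2.2, 100)))
```

In this model the skewed chip reads about 91 °C at an actual 85 °C. Fixing such slope error with a second calibration point would raise test cost, which is the trade-off the adaptive 1P approach is described as avoiding.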

3B-4 (Time: 17:05 - 17:30)
Title: Design and Allocation of Loosely Coupled Multi-bit Flip-flops for Power Reduction in Post-Placement Optimization
Author: *Hyoungseok Moon, Taewhan Kim (Seoul National University, Republic of Korea)
Page: pp. 268 - 273
Keyword: Flip-flop allocation, clock power, Post-placement
Abstract: Recently, allocating multi-bit flip-flops (MBFFs) instead of 1-bit flip-flops has been recognized as an effective design optimization technique to reduce clock power. This work tries to eliminate the timing and area constraints so that the full benefit of multi-bit flip-flops can be reaped. Specifically, rather than using the conventional structure of multi-bit flip-flops, we introduce a new style of multi-bit flip-flop, called the loosely coupled multi-bit flip-flop (LC-MBFF). Utilizing LC-MBFFs, we propose a routability- and clock-tree-driven multi-bit flip-flop allocation algorithm, which fully explores the diverse allocations of LC-MBFF structures to maximally reduce clock power consumption.


Session 3C  Emerging Devices for Energy Efficient Computing
Time: 15:50 - 17:30 Tuesday, January 26, 2016
Location: TF4204
Chairs: Danghui Wang (Northwestern Polytechnical University, China), Jingtong Hu (Oklahoma State University, U.S.A.)

3C-1 (Time: 15:50 - 16:15)
Title: Thermal Optimization for Memristor-Based Hybrid Neuromorphic Computing Systems
Author: Chi-Ruo Wu (National Cheng Kung University, Taiwan), Wei Wen (University of Pittsburgh, U.S.A.), *Tsung-Yi Ho (National Tsing Hua University, Taiwan), Yiran Chen (University of Pittsburgh, U.S.A.)
Page: pp. 274 - 279
Keyword: Neuromorphic, Memristor, Thermal
Abstract: Neuromorphic computing is used to accelerate the computation of neural networks, which are composed of neurons and synapses and can simulate the animal brain. However, neuromorphic computing on the traditional computer architecture suffers from a serious von Neumann bottleneck because of the gap between high-frequency CPU computation and memory access. The emerging memristor is an innovative technology for future VLSI circuits that can potentially act as both a data storage and a computing unit, transforming the computer architecture. Furthermore, the characteristics of memristors include low programming energy, parallel processing, small footprint, and non-volatility, which have attracted significant research on neuromorphic computing. However, issues such as thermal damage degrade the reliability of memristors; high memristor temperature is a critical issue that impacts the reliability of the systems. To estimate memristor temperature, we formulate the thermal problem as a power consumption problem. In this paper, a thermal optimization algorithm for memristor-based hybrid neuromorphic computing systems is proposed to address the reliability issue using incremental cluster network flow. Our results show that the maximum power consumption can be reduced by about 31%.

3C-2 (Time: 16:15 - 16:40)
TitleAn Energy-efficient Matrix Multiplication Accelerator by Distributed In-memory Computing on Binary RRAM Crossbar
Author*Leibin Ni, Yuhao Wang, Hao Yu (Nanyang Technological University, Singapore), Wei Yang, Chuliang Weng, Junfeng Zhao (Shannon Laboratory, Huawei Technologies Co., Ltd, China)
Pagepp. 280 - 285
KeywordRRAM, In-memory architecture
AbstractEmerging resistive random-access memory (RRAM) can provide not only non-volatile memory storage but also intrinsic logic for matrix-vector multiplication, which is ideal for low-power, high-throughput data analytics accelerators performed in memory. However, existing RRAM-based computing devices mainly assume multi-level analog computing, whose results are sensitive to process non-uniformity and which incurs additional A/D conversion and I/O overhead. This paper explores a data analytics accelerator on a binary RRAM crossbar. Accordingly, a distributed in-memory computing architecture is proposed, together with the design of its components and control protocol. Both the memory array and the logic accelerator can be implemented purely in binary by RRAM crossbars, where logic-memory pairs can be distributed using a control-bus protocol. Based on numerical results for fingerprint matching mapped onto the proposed RRAM crossbar, the proposed architecture shows 2.86x faster speed, 154x better energy efficiency, and 100x smaller area when compared to the same design implemented as a CMOS-based ASIC.

3C-3 (Time: 16:40 - 17:05)
TitleA Racetrack Memory Based In-memory Booth Multiplier for Cryptography Application
Author*Tao Luo (Nanyang Technological University, Singapore), Wei Zhang (Hong Kong University of Science and Technology, Hong Kong), Bingsheng He, Douglas Maskell (Nanyang Technological University, Singapore)
Pagepp. 286 - 291
KeywordRacetrack memory, RSA, Multiplier, Adder
AbstractSecurity is an important concern in cloud computing nowadays. RSA is one of the most popular asymmetric encryption algorithms and is widely used in Internet-based applications because of the advantages of its public-key strategy over symmetric encryption algorithms. However, the RSA encryption algorithm is very compute-intensive, which affects the speed and power efficiency of the applications that use it. Racetrack Memory (RM) is a newly introduced, promising technology for future storage and memory systems, and it is well suited to memory-intensive scenarios because of its high data density. However, novel designs are needed to exploit the advantages of RM while avoiding the adverse impact of its sequential access mechanism. In this paper, we present an in-memory Booth multiplier based on racetrack memory to address this problem. As the building block of our multiplier, a racetrack memory based adder is proposed, which saves 56.3% power compared with the state-of-the-art magnetic adder. Integrated with the storage element, our proposed multiplier shows great efficiency in area, power and scalability.

3C-4 (Time: 17:05 - 17:30)
TitleLook-ahead Schemes for Nearest Neighbor Optimization of 1D and 2D Quantum Circuits
Author*Robert Wille (Johannes Kepler University Linz, Austria), Oliver Keszocze (DFKI GmbH, Germany), Marcel Walter, Patrick Rohrs (University of Bremen, Germany), Anupam Chattopadhyay (Nanyang Technological University, Singapore), Rolf Drechsler (University of Bremen, Germany)
Pagepp. 292 - 297
Keywordquantum circuits, nearest neighbor, technology mapping
AbstractEnsuring nearest neighbor compliance of quantum circuits by inserting SWAP gates has been studied extensively in the past. Here, the concern is quantum gates that act on non-adjacent qubits. SWAP gates are applied in order to “move” these qubits onto adjacent positions. However, the decision on how exactly the SWAPs are applied has mainly been made without considering the effect a “movement” of qubits may have on the remaining circuit. In this work, we propose a methodology for nearest neighbor optimization that addresses this problem by means of a look-ahead scheme. To this end, two representative implementations are presented and discussed in detail. Experimental evaluations show that, in the best case, the number of SWAP gates can be reduced by 56% (compared to state-of-the-art methods) by following the proposed methodology.



Wednesday, January 27, 2016

Session 2K  Keynote II
Time: 9:00 - 10:00 Wednesday, January 27, 2016
Location: TF Theatre
Chair: Pui-In Mak (University of Macau, Macau)

2K-1 (Time: 9:00 - 10:00)
Title(Keynote Address) Systems of Systems - The Next Frontier of Semiconductor
Author*Qi Wang (Cadence Design Systems, Inc., U.S.A.)
KeywordKeynote
AbstractMost of today’s exciting new electronic products are not single-function, standalone devices, but rather multi-function system devices, composed of subsystems and connected into even larger systems. Being at the core of any electronic system, semiconductor technology is going through a sea change in which tackling the traditional semiconductor issues such as timing, power, and performance is no longer sufficient. Additional challenges include time-to-market, functional partitioning, communications protocols, IP selection, hardware-software verification, reliability, safety, and many others. In this presentation, the presenter will summarize the design challenges and highlight some solutions for system design enablement in this increasingly complex environment.


Session 4S  (Special Session) Design Challenges for Energy-Efficient IoT Edge Devices
Time: 10:20 - 12:00 Wednesday, January 27, 2016
Location: TF4303
Organizers/Chairs: Saibal Mukhopadhyay (Georgia Tech, U.S.A.), Vijay Raghunathan (Purdue University, U.S.A.)

4S-1 (Time: 10:20 - 10:45)
Title(Invited Paper) Energy-Efficient System Design for IoT Devices
AuthorHrishikesh Jayakumar, Arnab Raha, Younghyun Kim, Soubhagya Sutar, Woo Suk Lee, *Vijay Raghunathan (Purdue University, U.S.A.)
Pagepp. 298 - 301
KeywordInternet of Things, IoT, low power design, energy efficient design
AbstractIt is projected that, within the coming decade, there will be more than 50 billion smart objects connected to the Internet of Things (IoT). These smart objects, which connect the physical world with the world of computing infrastructure, are expected to pervade all aspects of our daily lives and revolutionize a number of application domains such as healthcare, energy conservation, transportation, etc. In this paper, we present an overview of the challenges involved in designing energy-efficient IoT edge devices and describe recent research that has proposed promising solutions to address these challenges. First, we outline the challenges involved in efficiently supplying power to an IoT device. Next, we discuss the role of emerging memory technologies in making IoT devices energy-efficient. Finally, we discuss the potential impact that approximate computing can have in increasing the energy-efficiency of wearables and other compute-intensive IoT devices.

4S-2 (Time: 10:45 - 11:10)
Title(Invited Paper) Energy Delivery for Self-Powered IoT Devices
AuthorKhondker Z. Ahmed, Monodeep Kar, *Saibal Mukhopadhyay (Georgia Tech, U.S.A.)
Pagepp. 302 - 307
KeywordIoT power delivery, bias gating, high conversion ratio, boost, buck
AbstractDistributed small-scale electronics for IoT applications are on the rise. Power delivery for such electronics requires innovative design techniques to improve energy efficiency. This paper summarizes energy delivery challenges for IoT devices and discusses several design techniques for efficient power delivery units. Such design solutions cover challenges like energy harvesting from very low input voltages, maximized energy harvesting, energy delivery with multiple voltage domains, and designs that use low-voltage devices to sustain voltages higher than their breakdown voltage.

4S-3 (Time: 11:10 - 11:35)
Title(Invited Paper) Efficient Embedded Learning for IoT Devices
AuthorSwagath Venkataramani, Kaushik Roy, *Anand Raghunathan (Purdue University, U.S.A.)
Pagepp. 308 - 311
KeywordInternet of things, Machine learning, Accelerators, Approximate computing, Spintronic Devices
AbstractThe pervasiveness of IoT devices will usher in unprecedented growth in the amount of digital data produced and consumed. Realizing the rich class of applications enabled by IoT devices requires large-scale machine learning systems to make sense of the raw data and derive meaningful, actionable information. State-of-the-art machine learning algorithms are highly compute- and data-intensive, posing significant computational challenges across the spectrum of computing devices, from low-power client devices to the cloud. As the benefits of semiconductor technology scaling diminish, addressing the computational gap requires identifying new sources of computing efficiency. In this paper, we highlight three approaches, namely machine learning accelerators, approximate computing, and post-CMOS technologies, that show significant promise in bridging the efficiency gap.

4S-4 (Time: 11:35 - 12:00)
Title(Invited Paper) Computing with Coupled Spin Torque Nano Oscillators
AuthorKarthik Yogendra (Purdue University, U.S.A.), Deliang Fan (University of Central Florida, U.S.A.), Yong Shim, Minsuk Koo, *Kaushik Roy (Purdue University, U.S.A.)
Pagepp. 312 - 317
Keywordcoupled oscillators, frequency locking, non-Boolean computation, spin torque nano oscillators, spin transfer torque
AbstractThis paper gives an overview of coupled oscillators and how such oscillators can be used efficiently to perform computations that are unsuitable for, or inefficient in, von Neumann computing models. The “unconventional computing” ability of coupled oscillatory systems is demonstrated through Spin Torque Nano Oscillators (STNOs). Recent experiments on STNOs have demonstrated oscillation frequencies in the range of a few tens of gigahertz at low input currents. These attractive features, together with the ability to obtain frequency locking using a variety of techniques, make STNOs an attractive candidate for non-Boolean computing. We discuss coupled STNO systems for applications such as edge detection of an image, associative computing, determination of the L2 norm for distance calculation, and pattern recognition.


Session 4A  Taking Advantages of Uncertainty in System Optimization
Time: 10:20 - 12:00 Wednesday, January 27, 2016
Location: TF4203
Chairs: Ing-Jer Huang (National Sun Yat-sen University, Taiwan), Chun Jason Xue (City University of Hong Kong, Hong Kong)

4A-1 (Time: 10:20 - 10:45)
TitleApproxMap: On Task Allocation and Scheduling for Resilient Applications
Author*Juan Yi (Chongqing University, China), Qian Zhang, Ye Tian, Ting Wang (The Chinese University of Hong Kong, China), Weichen Liu, Edwin H.-M. Sha (Chongqing University, China), Qiang Xu (The Chinese University of Hong Kong, China)
Pagepp. 318 - 323
Keywordmultiprocessors, resilient application, scheduling, energy efficiency
AbstractMany emerging applications are inherently error-resilient and hence do not require exact computation. In this paper, we consider the task allocation and scheduling problem for mapping such applications to voltage-scalable multiprocessor systems. The proposed solution, namely ApproxMap, judiciously determines the mapping and execution sequence of resilient tasks to minimize the energy consumption of the application while meeting its target quality requirements and timing constraints. Specifically, ApproxMap generates an energy-efficient yet flexible task schedule at design time, and performs lightweight online adjustment according to runtime dynamics for further energy-efficiency improvement. Experimental results on various task graphs demonstrate the efficacy of ApproxMap.

4A-2 (Time: 10:45 - 11:10)
TitleEnergy Optimization of Stochastic Applications with Statistical Guarantees of Deadline and Reliability
Author*Xiong Pan, Wei Jiang (University of Electronic Science and Technology of China, China), Ke Jiang (Linköping University, Sweden), Liang Wen, Qi Dong (University of Electronic Science and Technology of China, China)
Pagepp. 324 - 329
KeywordEnergy, Reliability, Soft Real-time, Stochastic task, System-level design
AbstractIn this paper, we target the energy-efficient design of soft real-time and reliable applications on uniprocessor embedded systems. We consider soft real-time tasks whose stochastic execution times follow a given distribution. Instead of guaranteeing a hard real-time constraint, the application may finish after its deadline with a certain probability. We utilize Dynamic Voltage and Frequency Scaling (DVFS) to save energy, and also take into account the impact of DVFS on reliability. Our objective is to minimize the expected energy consumption of the system subject to statistical reliability and deadline constraints. Due to the huge complexity of solving the problem exactly, we develop a fast bi-search approach based on dynamic programming, which can find a near-optimal solution with an energy cost of at most (1+β) times the optimal energy and has polynomial time complexity. Extensive experiments and a real-life application were used to evaluate the efficiency of the proposed techniques.

4A-3 (Time: 11:10 - 11:35)
TitleSMoSi: A Framework for the Derivation of Sleep Mode Traces from RTL Simulations
Author*Dustin Peterson, Oliver Bringmann (University of Tübingen, Germany)
Pagepp. 330 - 335
Keywordsleep mode trace, design partitioning
AbstractWe propose a methodology for the generation of sleep mode traces. Sleep mode traces identify idle times of components in a design and are used in state-of-the-art power optimization approaches. While designers are currently forced to generate them manually, our graph-based method enables full automation of this process. We implemented our methodology in a framework that we call SMoSi. Experiments show that SMoSi generates sleep mode traces in reasonable time for a given design.

4A-4 (Time: 11:35 - 12:00)
TitleOptimization of Behavioral IPs in Multi-Processor System-on-Chips
AuthorYidi Liu, *Benjamin Carrion Schafer (The Hong Kong Polytechnic University, Hong Kong)
Pagepp. 336 - 341
Keywordhigh level synthesis, multi-processor SoC, behavioral IP, MPSoC
AbstractThis work shows that behavioral IPs (BIPs) are often over-designed when used in heterogeneous Multi-Processor SoCs (MPSoCs), mainly because they are designed and optimized separately. When inserted in the MPSoC, these IPs often have to wait for data from the master and for access to the bus to return the results. Behavioral IPs have the advantage over traditional RTL-based IPs that they can be re-synthesized with different constraints, which allows the generation of micro-architectures with unique area vs. performance trade-offs. This work leverages this and introduces a method to automatically identify the workload of each behavioral IP mapped as a slave on an MPSoC system and re-synthesize it to maximize its efficiency, i.e. reduce its area and minimize its idle time, without affecting the overall performance. We show that the area can be reduced by up to 26.1%, and by 13.21% on average, compared to the fastest implementation without any performance degradation. Compared to an exhaustive search, our method is on average only 5% worse while being on average 16x faster.


Session 4B  Security and Reliability in Emerging Devices
Time: 10:20 - 12:00 Wednesday, January 27, 2016
Location: TF4304
Chairs: Yier Jin (University of Central Florida, U.S.A.), Swaroop Ghosh (University of South Florida, U.S.A.)

4B-1 (Time: 10:20 - 10:45)
TitleA Novel PUF based on Cell Error Rate Distribution of STT-RAM
Author*Xian Zhang, Guangyu Sun (Peking University, China), Yaojun Zhang, Yiran Chen, Hai Li (University of Pittsburgh, U.S.A.), Wujie Wen (Florida International University, U.S.A.), Jia Di (University of Arkansas, U.S.A.)
Pagepp. 342 - 347
KeywordPUF, STTRAM, Spintronic
AbstractPhysical Unclonable Functions (PUFs) have been widely proposed as security primitives to provide device identification and authentication. Recently, PUFs based on Non-volatile Memory (NVM) have been widely proposed owing to the promise of NVMs’ wide application. In addition, NVM-based PUFs are considered to be more immune to invasive attacks and simulation attacks than CMOS-based PUFs. However, existing NVM-based PUFs either are unreliable under environmental variations or need extra modifications to the IC manufacturing process. In this work, we propose err-PUF, a novel PUF design based on the cell error rate distribution of STT-RAM. Instead of using the distribution directly, we generate a stable fingerprint based on a novel concept called Error-rate Differential Pair (EDP) without modifications to the read/write circuits. Comprehensive results demonstrate that err-PUF can achieve sufficient reliability under environmental variations, which can significantly impact the cell error rates. Moreover, compared with existing approaches, err-PUF has higher speed and lower power consumption with negligible overhead.

4B-2 (Time: 10:45 - 11:10)
TitleData Privacy in Non-Volatile Cache: Challenges, Attack Models and Solutions
AuthorNitin Rathi, *Swaroop Ghosh, Anirudh Iyengar (University of South Florida, U.S.A.), Helia Naeimi (Intel Labs, U.S.A.)
Pagepp. 348 - 353
KeywordNonvolatile cache memory, Data Privacy, Attack model, Architecture
AbstractNon-volatile memories (NVMs) have drawn significant attention due to the complete elimination of bitcell leakage. Among the NVMs, Spin-Transfer-Torque RAM (STTRAM) is considered to be a strong candidate for last level cache (LLC). Although promising, STTRAM LLC brings new security challenges that were absent in conventional volatile memories such as Static RAM (SRAM). The root causes are data persistence and the fundamental dependence of the memory technology on ambient parameters such as magnetic field and temperature, which can be exploited to compromise the data. We provide a qualitative analysis of the data privacy issues in the emerging nonvolatile cache. We also propose new attack models to compromise the sensitive data in LLC. The encryption techniques used to secure the data in main memory and hard disk may not be useful for LLC due to their latency overhead. We propose two low-overhead techniques to ensure data privacy in LLC: (a) implementing semi-nonvolatile memory (SNVM); and (b) data erasure at power OFF. Erasing could be energy intensive and may require a dedicated battery to work under power failure attacks. To address this concern, we reuse the energy stored in the power rail after power OFF to erase the bits, using a canary circuit to track MTJ write time. The simulation results show 0.6% IPC loss and 1.2% energy overhead during normal operation due to the added circuitry.

4B-3 (Time: 11:10 - 11:35)
TitlePin Tumbler Lock: A Shift based Encryption Mechanism for Racetrack Memory
Author*Hongbin Zhang (Tsinghua University, China), Chao Zhang, Xian Zhang, Guangyu Sun (Peking University, China), Jiwu Shu (Tsinghua University, China)
Pagepp. 354 - 359
KeywordRacetrack Memory, Encryption, NVM
AbstractAs various non-volatile memory (NVM) technologies have been adopted in different levels of the memory hierarchy, protecting information retained in NVM after power-off has become a new security challenge, which has resulted in extensive research on data encryption for NVM. Previous encryption approaches, however, have some limitations, such as high design complexity and non-trivial timing and energy overhead. Recently, an emerging NVM called racetrack memory (RM) has been widely investigated because of its advantages of ultra-high storage density and fast read/write speed. Besides these well-known advantages, we observe that the tape-like structure of the RM cell and its unique shift operation can also be leveraged to facilitate NVM data encryption. Based on this observation, we propose an efficient shift-based mechanism, named Pin Tumbler Lock (PTL), which completes encryption and decryption by shifting racetracks within several nanoseconds. Experimental results demonstrate that our design can achieve the same security strength as AES-128 with 3.1% performance overhead, 3.7% energy overhead, 1.56% storage cost, and 1.6% area cost.

4B-4 (Time: 11:35 - 12:00)
TitleRouting Path Reuse Maximization for Efficient NV-FPGA Reconfiguration
AuthorYuan Xue, Patrick Cronin, *Chengmo Yang (University of Delaware, U.S.A.), Jingtong Hu (Oklahoma State University, U.S.A.)
Pagepp. 360 - 365
KeywordNVM-based FPGA, reuse-aware routing, switch-box reconfiguration
AbstractNon-volatile memory-based FPGAs (NV-FPGAs) are expected to replace traditional SRAM-based FPGAs to achieve higher scalability and lower power consumption. Yet the slow write performance of NVMs challenges FPGA (re)configuration speed and overhead. To efficiently configure switch boxes, this paper proposes a routing path reuse technique. Technical contributions include a mathematical reconfiguration cost model of routing resources, a reuse-aware routing scheme, and the incorporation of the proposed scheme into the standard VTR CAD tool.


Session 4C  Routing
Time: 10:20 - 12:00 Wednesday, January 27, 2016
Location: TF4204
Chairs: Iris Hui-Ru Jiang (National Chiao Tung University, Taiwan), Yao-Wen Chang (National Taiwan University, Taiwan)

4C-1 (Time: 10:20 - 10:45)
TitleMCMM Clock Tree Optimization based on Slack Redistribution Using a Reduced Slack Graph
Author*Rickard Ewetz, Cheng-Kok Koh (Purdue University, U.S.A.)
Pagepp. 366 - 371
KeywordCTO, clock, skew
AbstractModern clock networks are required to operate in multiple corners and in multiple modes (MCMM). An initially constructed clock tree may contain different timing violations in different mode and corner combinations. Clock tree optimization (CTO) is employed to remove these timing violations. We propose a CTO framework based on slack redistribution using a reduced slack graph. The main idea is to reduce the MCMM problem to an equivalent single-corner single-mode (SCSM) problem using delay adjustment linearization. Using the equivalent SCSM problem, a linear program is solved to determine a set of delay adjustments to remove the timing violations. Next, the delay adjustments are realized using feasible delay adjustment ranges. The experimental results show that the proposed framework obtains average reductions of 84% and 83% in the total negative slack and the worst negative slack, respectively, at the expense of a 4% capacitive overhead.

4C-2 (Time: 10:45 - 11:10)
TitleDynamic Planning of Local Congestion from Varying-Size Vias for Global Routing Layer Assignment
AuthorDaohang Shi, Edward Tashjian, *Azadeh Davoodi (University of Wisconsin-Madison, U.S.A.)
Pagepp. 372 - 377
Keywordglobal routing, layer assignment, local congestion, detailed routing, via modeling
AbstractThis work is the first to present global routing models for capturing the impact of local congestion caused by varying-size vias. The models are then incorporated to dynamically drive a proposed layer assignment algorithm. This is also the first work to actually evaluate the impact of global routing solutions using a commercial detailed router. In our experiments, we report fewer DRC violations by only changing the layer assignment at global routing, with detailed routing performed using Mentor Graphics’ Olympus-SoC.

4C-3 (Time: 11:10 - 11:35)
TitleNegotiation-Based Track Assignment Considering Local Nets
Author*Man-Pan Wong (National Tsing Hua University, Taiwan), Wen-Hao Liu (Cadence Design Systems Inc., U.S.A.), Ting-Chi Wang (National Tsing Hua University, Taiwan)
Pagepp. 378 - 383
KeywordRoutability, congestion, track assignment
AbstractRoutability has become a very challenging issue in the modern VLSI design flow. Many works use global routing to estimate routability in early design stages. However, global routing cannot accurately capture local congestion, so it is hard to detect detailed routability issues. To estimate detailed-routing routability more accurately, this paper presents a track-assignment-based routability estimator. In this work, wire segments called iroutes are extracted from a global routing result, and then the proposed negotiation-based algorithm assigns these iroutes to proper tracks and minimizes the overlaps between them. Based on the assignment result, we can judge which regions may have critical routability issues by observing where more overlaps reside.

4C-4 (Time: 11:35 - 12:00)
TitleOrdered Escape Routing for Grid Pin Array Based on Min-cost Multi-commodity Flow
Author*Fengxian Jiao, Sheqin Dong (Tsinghua University, China)
Pagepp. 384 - 389
KeywordPCB Routing, Ordered Escape Routing, Min-cost Multi-commodity flow
AbstractOrdered escape routing is a critical issue in high-speed PCB routing. In this paper, for the first time, a Min-cost Multi-commodity Flow (MMCF) approach is proposed to solve ordered escape routing. The characteristics of the grid pin array are analyzed, and a basic network model is then used to convert ordered escape routing to an MMCF model. To satisfy the constraints of ordered escape routing, three novel transformations, namely the non-crossing transformation, the ordering transformation, and the capacity transformation, are used to convert the basic network model into the final correct MMCF model. Experimental results show that our method achieves 100% routability for all the test cases. The method can obtain both a feasible solution and an optimal solution. Compared to published approaches, our method remarkably improves both wire length and CPU time.


Session 5S  (Special Session) Cross-Layer Resilience: Snapshots from the Frontier of Design
Time: 13:50 - 15:55 Wednesday, January 27, 2016
Location: TF4303
Organizer: Ulf Schlichtmann (TUM, Germany), Chair: Jörg Henkel (KIT, Germany)

5S-1 (Time: 13:50 - 14:15)
Title(Invited Paper) Efficient Reliability Management in SoCs – An Approximate DRAM Perspective
AuthorMatthias Jung, Deepak M. Mathew, Christian Weis, *Norbert Wehn (University of Kaiserslautern, Germany)
Pagepp. 390 - 394
KeywordApproximate, DRAM, Reliability, Refresh, Memory
AbstractIn today's computing systems, Dynamic Random Access Memories (DRAMs) have a large influence on performance and contribute significantly to the total power consumption. Thus, recent research activities bring the idea of approximate DRAM into focus to save power and improve performance by lowering the refresh rate or disabling refresh completely. Hence, fast and accurate models are required for a thorough exploration of approximate DRAM for error-resilient applications. In this paper, we present a holistic simulation environment for investigations on approximate DRAM and show its impact on error-resilient applications.

5S-2 (Time: 14:15 - 14:40)
Title(Invited Paper) Cross-layer Virtual/Physical Sensing and Actuation for Resilient Heterogeneous Many-core SoCs
Author*Santanu Sarma, Tiago Mück, Majid Shoushtari, Abbas BanaiyanMofrad, Nikil Dutt (UC Irvine, U.S.A.)
Pagepp. 395 - 402
Keywordcross-layer, virtual sensor, SoC, CPSoC, MPSoC
AbstractWe introduce the concepts of cross-layer virtual/physical sensing and actuation to achieve resiliency for the emerging class of heterogeneous many-core Systems-on-Chip (SoCs). Using the CyberPhysical System-on-Chip (CPSoC) concept as an exemplar sensor-rich many-core heterogeneous computing platform, we illustrate how to intrinsically couple on-chip and cross-layer physical and virtual sensing and actuation applied across different layers of the hardware/software system stack to adaptively achieve desired objectives and Quality-of-Service (QoS). We present two sample use cases that exemplify the cross-layer virtual/physical sensing and actuation approach. First, we present SmartBalance, a cross-layer sensing-driven Linux load balancer for energy efficient task execution on heterogeneous MPSoCs. Second, we present “Partially Forgetful Memories”, a software/hardware approach that achieves dynamic memory guard-banding for memory resilience and its application for approximate computing.

5S-3 (Time: 14:40 - 15:05)
Title(Invited Paper) On-chip Monitoring and Compensation Scheme with Fine-grain Body Biasing for Robust and Energy-Efficient Operations
AuthorA.K.M. Mahfuzul Islam (University of Tokyo, Japan), *Hidetoshi Onodera (Kyoto University, Japan)
Pagepp. 403 - 409
KeywordEnergy Optimization, Compensation, Monitor, Body Biasing, Scaling
AbstractAggressive technology scaling and the strong demand for lowering supply voltage impose a serious challenge in achieving robust and energy-efficient circuit operation. This paper first gives an overview of device-circuit interactions that enable cross-layer resiliency and energy optimization. We show that the ability to monitor and control device and circuit characteristics not only increases energy efficiency by more than 20% but also relaxes the severe design constraints that were required because of the uncertainties of variability. We then demonstrate two proof-of-concept circuits in a 65 nm process to show variability resiliency and energy optimization with local body biasing.

5S-4 (Time: 15:05 - 15:30)
Title(Invited Paper) Embedded Software Reliability Testing by Unit-Level Fault Injection
AuthorPetra R. Maier, Daniel Mueller-Gritschneder, Ulf Schlichtmann (TU Munich, Germany), *Veit B. Kleeberger (Infineon Technologies, Germany)
Pagepp. 410 - 416
KeywordReliability, Embedded Software, ISO 26262
AbstractDecreasing device sizes in integrated circuits lead to increasing vulnerability of hardware to errors resulting from radiation, crosstalk or power-supply disturbances. Especially in the automotive domain, many tasks of electronics are safety relevant, so solid error detection and correction is imperative. However, completely safe hardware is too expensive for the cost-sensitive automotive market. Hence, software safety mechanisms must deal with errors originating from hardware to ensure safe system behavior. To verify safe system behavior under the influence of hardware errors, fault injection is currently done at the integration level, but software redesign at this design stage should be avoided due to its high cost. To detect code vulnerable to hardware errors early, we propose fault injection at the unit level. Thanks to short simulation scenarios and good parallelization capability, even exhaustive fault injection is possible for multiple representative workloads. Using the results from the fault-injection campaigns, the software designer is able to consider reliability during the implementation phase and avoid costly redesigns.


Session 5A  (Special Session) Design Automation of Energy-Efficient Smart Buildings and Smart Cars
Time: 13:50 - 15:55 Wednesday, January 27, 2016
Location: TF4203
Organizer: Naehyuck Chang (KAIST, Republic of Korea), Chair: Tohru Ishihara (Kyoto University, Japan)

5A-1 (Time: 13:50 - 14:15)
Title(Invited Paper) Thermal Modeling for Energy-Efficient Smart Building With Advanced Overfitting Mitigation Technique
AuthorWandi Liu, Hai Wang (University of Electronic Science and Technology of China, China), Hengyang Zhao, Shujuan Wang (University of California at Riverside, U.S.A.), Haibao Chen, Yuzhuo Fu (Shanghai Jiaotong University, China), Jian Ma (University of Electronic Science and Technology of China, China), Xin Li (Carnegie Mellon University, U.S.A.), *Sheldon X.-D. Tan (University of California at Riverside, U.S.A.)
Pagepp. 417 - 422
KeywordThermal Modeling, Smart Building
AbstractBuilding energy accounts for a large share of total energy consumption, and smart building energy control leads to high energy efficiency and significant energy savings. A compact and accurate building thermal model is important for designing an efficient energy control system. In this paper, we propose an accurate thermal behavior modeling technique for general and complicated buildings. This new modeling technique builds a compact thermal model by system identification using temperature and power data obtained from the EnergyPlus software, which can provide realistic temperature, weather and power data for buildings. In order to make the best use of data from EnergyPlus and avoid the overfitting problem associated with the system identification method, a cross-validation technique is employed to generate multiple thermal models to find the optimal model order. The final model is then generated by performing a regular system identification using the previously selected order. Experimental results from a case study of a 5-zone building show that the proposed method is able to find the optimal model order, and the building models built by the proposed method achieve 1-3% average errors and maximum errors below 10-18% when estimating zone temperatures over a period of about one year.

5A-2 (Time: 14:15 - 14:40)
Title(Invited Paper) Modeling, Analysis, and Optimization of Electric Vehicle HVAC Systems
Author*Mohammad Abdullah Al Faruque, Korosh Vatanparvar (UC Irvine, U.S.A.)
Pagepp. 423 - 428
KeywordElectric Vehicle, Battery, HVAC, Climate Control
AbstractMajor challenges of driving range and battery lifetime in Electric Vehicles (EVs) have been addressed by designing more efficient power electronics, advanced embedded hardware, and sophisticated embedded software. Besides the electric motor, Heating, Ventilation, and Air Conditioning (HVAC) has been seen as a significant contributor to EV power consumption. The main responsibility of automotive climate controls has been to control the HVAC system in order to maintain the passengers’ thermal comfort. However, the HVAC power consumption and its dynamic behavior may significantly influence battery lifetime and driving range. Therefore, modeling and analyzing the HVAC system and its thermodynamic behavior may help control designers integrate HVAC control and optimization into Battery Management Systems (BMS) for better battery lifetime and driving range. In this paper, the EV architecture, HVAC system dynamic behavior, and battery characteristics are explained and modeled. Automotive climate controls (e.g., battery lifetime-aware automotive climate control) and the benefits gained by system modeling and estimation under different conditions, in terms of battery lifetime and driving range, are illustrated. Moreover, present and future challenges regarding HVAC system and control design are explained.

5A-3 (Time: 14:40 - 15:05)
Title(Invited Paper) Distributed Reconfigurable Battery System Management Architectures
Author*Sebastian Steinhorst (TUM CREATE Ltd., Singapore), Zili Shao (The Hong Kong Polytechnic University, Hong Kong), Samarjit Chakraborty (TU Munich, Germany), Matthias Kauer (TUM CREATE Ltd., Singapore), Shuai Li (The Hong Kong Polytechnic University, Hong Kong), Martin Lukasiewycz, Swaminathan Narayanaswamy (TUM CREATE Ltd., Singapore), Muhammad Usman Rafique, Qixin Wang (The Hong Kong Polytechnic University, Hong Kong)
Pagepp. 429 - 434
KeywordBattery System Management Architectures (BSMAs), Lithium-Ion Batteries, Smart Cells, Battery Management, Reconfigurability
AbstractThis paper presents an overview of recent trends in Battery System Management Architectures (BSMAs). After introducing the main characteristics of large battery packs, the state of the art in BSMAs is discussed. This contribution focuses on two emerging concepts. On the one hand, there is a development from centralized battery management architectures with a single control entity towards decentralized management, where the computational resources are distributed across the battery pack and, hence, move closer to the individual battery cells. This enables a more scalable and modular battery system architecture, while at the same time posing challenges for hardware and management algorithm design. On the other hand, the static setup of the series- and parallel-connected cells forming the battery pack may be developed into a reconfigurable architecture in which the electrical topology of the pack can be changed adaptively. Such reconfigurability could increase the reliability of battery packs and reduce management efforts such as cell balancing. At the same time, the limited energy efficiency of the additional hardware poses a challenge. We give an outlook on how these two trends could be combined into distributed reconfigurable BSMAs. This introduces a set of challenges that have to be solved in order to benefit from the increased scalability, reliability, and safety such designs could offer.

5A-4 (Time: 15:05 - 15:30)
Title(Invited Paper) Minimum-Energy Driving Speed Profiles for Low-Speed Electric Vehicles
AuthorDonkyu Baek, Joonki Hong, *Naehyuck Chang (KAIST, Republic of Korea)
Pagep. 435
KeywordDriving optimization, Electric vehicles, Speed profile
AbstractElectric vehicles (EVs) are rapidly entering the market previously held by internal combustion engine vehicles (ICEVs), offering not only environmental friendliness and higher efficiency but also better ride quality, comfort, and performance. However, EVs still fall short of ICEVs in some respects, such as a limited driving range per full charge relative to vehicle cost, due to the low energy density of batteries compared with petroleum fuel. We formulate an optimization problem that minimizes the total energy consumption for a given route that consists of arbitrary slope variations.


Session 5B  Advanced Embedded Software Techniques: Sensing, Computation, and Storage
Time: 13:50 - 15:55 Wednesday, January 27, 2016
Location: TF4304
Chairs: Zili Shao (Hong Kong Polytechnic University, China), Duo Liu (Chongqing University, China)

5B-1 (Time: 13:50 - 14:15)
TitleMulti-version Checkpointing for Flash File Systems
Author*Shih-Chun Chou (Department of Computer Science and Information Engineering, National Taiwan University, Taiwan), Yuan-Hao Chang, Yuan-Hung Kuan (Institute of Information Science, Academia Sinica, Taiwan), Po-Chun Huang (Department of Computer Science and Engineering, Yuan Ze University, Taiwan), Che-Wei Tsao (Department of Computer Science and Information Engineering, National Taiwan University, Taiwan)
Pagepp. 436 - 443
Keywordfile system, multi-version checkpointing, reliability
AbstractReliability has become a critical design issue in flash storage systems because low-cost, high-error-rate flash chips are being adopted to meet the needs of fast-growing storage capacity. In this paper, a multi-version checkpointing strategy is proposed to resolve the reliability issue of flash storage systems from the perspective of flash file systems. The proposed strategy can efficiently and effectively use file-system checkpoints to guarantee the integrity and consistency of flash file systems after files or flash pages are corrupted. By exploiting the fact that multiple versions of the same data coexist in flash memory, a control/recovery mechanism is presented to maintain checkpoints and to recover file systems with minimized management and recovery time overheads. A series of experiments was conducted based on realistic traces collected from benchmarks running over flash file systems in Linux operating systems. The results illustrate that the proposed strategy can significantly improve the reliability of flash file systems compared with other existing designs.

5B-2 (Time: 14:15 - 14:40)
TitleRelay-based Key Management to Support Secure Deletion for Resource-Constrained Flash-Memory Storage Devices
AuthorWei-Lin Wang (Department of Computer Science, National Tsing Hua University, Taiwan), Yuan-Hao Chang (Institute of Information Science, Academia Sinica, Taiwan), *Po-Chun Huang (Department of Computer Science and Engineering, Yuan Ze University, Taiwan), Chia-Heng Tu (Smart Network System Institute, Institute for Information Industry, Taiwan), Hsin-Wen Wei (Department of Electrical Engineering, Tamkang University, Taiwan), Wei-Kuan Shih (Department of Computer Science, National Tsing Hua University, Taiwan)
Pagepp. 444 - 449
Keywordreliability, flash memory, key management, secure deletion
AbstractSecure deletion during file-system formatting ensures that, once a file system is formatted, none of its file content can be recovered. Due to fast-growing storage capacity, the performance of secure deletion for file systems on resource-constrained flash storage devices has become a critical issue. In contrast to existing works that spend a long time overwriting or resetting all the file contents of a file system, we propose an efficient secure deletion scheme that securely deletes all the contents of a file system without rewriting file contents. Thus, secure deletion of file systems can be achieved efficiently and independently of the device capacity and the file system. A series of experiments was conducted with realistic workloads to evaluate the capability of the proposed scheme. The results show that the proposed scheme achieves secure deletion with limited performance overheads in most cases.

5B-3 (Time: 14:40 - 15:05)
TitlePeak-to-average Pumping Efficiency Improvement for Charge Pump in Phase Change Memories
AuthorHuizhang Luo (Chongqing University, China), Jingtong Hu (Oklahoma State University, U.S.A.), *Liang Shi (Chongqing University, China), Chun Jason Xue (City University of Hong Kong, Hong Kong), Qingfeng Zhuge (Chongqing University, China)
Pagepp. 450 - 455
KeywordPCM, Charge Pump
AbstractThe pumping efficiency of a PCM chip is a concave function of the write current. Based on the characteristics of the concave function, the overall pumping efficiency can be improved if the write current is uniform. In this paper, we propose the peak-to-average (PTA) write scheme, which smooths the write current fluctuation by regrouping write units. An off-line optimal Integer Programming (IP) formulation and an efficient online algorithm are proposed to achieve this goal. Experimental results show that PTA can improve the charge pump efficiency greatly with little overhead.
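Why a uniform write current helps follows from Jensen's inequality: for a concave efficiency curve, the average of the efficiencies at fluctuating currents is at most the efficiency at the average current. A minimal sketch, assuming a purely hypothetical quadratic efficiency curve `pump_efficiency` (not the paper's measured PCM model) and time-averaged efficiency as the figure of merit:

```python
# Hypothetical concave pumping-efficiency curve (illustrative only; not measured PCM data).
def pump_efficiency(current: float) -> float:
    """Efficiency as a concave function of normalized write current in [0, 1]."""
    return 0.9 - 0.5 * (current - 0.6) ** 2


def average_efficiency(schedule: list[float]) -> float:
    """Time-averaged efficiency over a sequence of per-slot write currents."""
    return sum(pump_efficiency(i) for i in schedule) / len(schedule)


# Two schedules delivering the same total current over four slots.
fluctuating = [0.2, 1.0, 0.2, 1.0]  # peaky demand
uniform = [0.6, 0.6, 0.6, 0.6]      # same average current, smoothed

print(f"fluctuating: {average_efficiency(fluctuating):.3f}")
print(f"uniform:     {average_efficiency(uniform):.3f}")
```

With these made-up numbers the smoothed schedule averages 0.90 versus 0.82 for the peaky one; regrouping write units, as PTA does, aims to move real write traffic toward the smoothed case.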

5B-4 (Time: 15:05 - 15:30)
TitleExploiting Parallelism of Imperfect Nested Loops with Sibling Inner Loops on Coarse-Grained Reconfigurable Architectures
Author*Xinhan Lin, Shouyi Yin, Leibo Liu, Shaojun Wei (Tsinghua University, China)
Pagepp. 456 - 461
KeywordCGRA, software pipelining, imperfect nested loop, outer-level pipelining, kernel compression
AbstractCoarse-grained reconfigurable architecture (CGRA) is a promising platform for loop acceleration, but existing software pipelining methods cannot achieve satisfactory performance on a fair number of imperfect nested loops, especially those with sibling inner loops. To tackle this problem, this paper makes two contributions: 1) a 2-level pipelining method with an effective initiation interval (II) optimization strategy for imperfect loops with sibling inner loops; 2) a novel kernel compression method to reduce oversized kernels. Experimental results show that our approach achieves much higher performance than state-of-the-art approaches at acceptable costs.

5B-5 (Time: 15:30 - 15:55)
TitleSlowMo – Enhancing Mobile Gesture-Based Authentication Schemes via Sampling Rate Optimization
Author*Kent W. Nixon, Xiang Chen, Zhi-Hong Mao, Yiran Chen (University of Pittsburgh, U.S.A.)
Pagepp. 462 - 467
Keywordgesture, security, sample, rate
AbstractIn the era of network services, user authentication has become more indispensable but also more vulnerable. Traditional user verification approaches such as PINs or pattern locks are susceptible to hacking and replication. A promising approach to continuous user verification on mobile devices is gesture-based security. Compared to traditional authentication, gesture-based security uses the way the user interacts with the device as a dynamic authentication pattern in real time, which offers higher complexity and better reliability. However, the effect of data sampling and preprocessing techniques on classification accuracy has not been sufficiently studied. In this work, we develop SlowMo, a novel gesture security technique, and use it for user classification in low-sampling-rate environments. The proposed algorithm achieves maximum classification accuracy at a sampling rate of 4 Hz with extremely low power consumption, suggesting better adaptability to the security environment.


Session 5C  Advances in Logic Synthesis
Time: 13:50 - 15:55 Wednesday, January 27, 2016
Location: TF4204
Chairs: Benjamin Carrion Schafer (The Hong Kong Polytechnic University, Hong Kong), Kai-Chiang Wu (National Chiao Tung University, Taiwan)

5C-1 (Time: 13:50 - 14:15)
TitleLattice-Based Boolean Diagrams: Canonical, Order-Independent Graphical Representations of Boolean Functions
AuthorAhmed Nassar, *Fadi J. Kurdahi (University of California, Irvine, U.S.A.)
Pagepp. 468 - 473
KeywordBoolean functions, Graph representations, decision diagrams
AbstractThis paper presents lattice-based Boolean diagrams (LBBDs), a graphical representation of Boolean functions that is not derived from binary decision diagrams (BDDs), together with symbolic manipulation algorithms for them. It also identifies a class of Boolean functions for which LBBDs are demonstrably more efficient to construct and reason with than BDDs. The case studies include ITC99 and MCNC benchmarks, randomly generated cube covers or sum-of-products (SOP) formulas, and multi-level Boolean formulas. Finally, LBBDs proved instrumental in the efficient runtime verification of software over distributed multiprocessor systems.

5C-2 (Time: 14:15 - 14:40)
TitleBDD Minimization for Approximate Computing
Author*Mathias Soeken, Daniel Große, Arun Chandrasekharan, Rolf Drechsler (University of Bremen, Germany)
Pagepp. 474 - 479
KeywordBDDs, Approximate computing, Algorithms, Optimization
AbstractWe present Approximate BDD Minimization (ABM) as a problem that has application in approximate computing. Given a BDD representation of a multi-output Boolean function, ABM asks whether there exists another function that has a smaller BDD representation but meets a threshold with respect to an error metric. We present operators to derive approximated functions and present algorithms to exactly compute the error metrics directly on the BDD representation. An experimental evaluation demonstrates the applicability of the proposed approaches.
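The abstract computes error metrics directly on the BDD. As a BDD-free illustration of what two common metrics measure, the sketch below enumerates the truth table exhaustively instead (feasible only for tiny functions). It assumes error rate (fraction of inputs with a wrong output word) and worst-case error (largest absolute difference with outputs read as integers) as the metrics, and uses a hypothetical approximate 2-bit adder that drops the carry out of bit 0; neither the adder nor the enumeration is the paper's method.

```python
from itertools import product


def exact_add(a: int, b: int) -> int:
    """Exact 2-bit adder with a 3-bit result."""
    return a + b


def approx_add(a: int, b: int) -> int:
    """Hypothetical approximation: ignore the carry out of the least significant bit."""
    low = (a ^ b) & 1           # sum bit 0 without generating a carry
    high = (a >> 1) + (b >> 1)  # add the upper bits with no carry-in
    return (high << 1) | low


def error_metrics(f, g, width: int) -> tuple[float, int]:
    """Return (error rate, worst-case error) of g against f over all input pairs."""
    inputs = list(product(range(2 ** width), repeat=2))
    diffs = [abs(f(a, b) - g(a, b)) for a, b in inputs]
    error_rate = sum(d != 0 for d in diffs) / len(inputs)
    return error_rate, max(diffs)


rate, worst = error_metrics(exact_add, approx_add, width=2)
print(rate, worst)
```

Only the four input pairs with both operands odd are wrong, each by 2, so the sketch reports an error rate of 0.25 and a worst-case error of 2.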

5C-3 (Time: 14:40 - 15:05)
TitleMajorSat: A SAT Solver to Majority Logic
AuthorYu-Min Chou (National Tsing Hua University, Taiwan), Yung-Chih Chen (Yuan Ze University, Taiwan), Chun-Yao Wang, *Ching-Yi Huang (National Tsing Hua University, Taiwan)
Pagepp. 480 - 485
KeywordSatisfiability, Majority
AbstractA majority function can be represented in sum-of-products (SOP) or product-of-sums (POS) form. However, a Boolean expression including majority functions can be more compact than its SOP or POS form. Hence, majority logic provides a new viewpoint for manipulating Boolean logic. Recently, majority logic has attracted increasing attention, and several synthesis algorithms and an axiomatic system for majority logic have been proposed. On the other hand, solvers for the satisfiability (SAT) problem have made tremendous progress in the past decades. The input format of SAT solvers is Conjunctive Normal Form (CNF). Instances that are not expressed in CNF have to be transformed into CNF before running the SAT-solving process. However, for instances including majority functions, this transformation may not scale and can be time-consuming due to the exponential growth in the number of clauses in the resulting CNF. As a result, this paper presents a new SAT solver, MajorSat, which solves SAT instances containing majority functions without any transformation. Some techniques for speeding up the solver are also proposed. We also propose a transformation method that can generate the characteristic function of a majority logic gate. The experimental results show that MajorSat can efficiently solve random instances containing majority functions that CNF SAT solvers, such as MiniSat or Lingeling, cannot.
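To see where the clause growth comes from: an n-input majority function (n odd) is true exactly when at most (n-1)/2 inputs are false, so a CNF without auxiliary variables needs one positive clause for every subset of (n+1)/2 inputs, that is C(n, (n+1)/2) clauses. The sketch below builds that direct encoding, checks it by brute force for n = 5, and prints clause counts. It only illustrates the auxiliary-free case; encodings that introduce auxiliary variables can avoid this growth, and the function names are illustrative.

```python
from itertools import combinations, product
from math import comb


def majority_cnf(n: int) -> list[tuple[int, ...]]:
    """Direct CNF for n-input majority (n odd) over variables 0..n-1, no auxiliary variables.

    Each clause is an OR of positive literals: every subset of (n + 1) // 2
    variables must contain at least one true variable.
    """
    return list(combinations(range(n), (n + 1) // 2))


def eval_cnf(clauses: list[tuple[int, ...]], assignment: tuple[bool, ...]) -> bool:
    """True if every clause has at least one true variable."""
    return all(any(assignment[v] for v in clause) for clause in clauses)


# Brute-force check that the encoding is exactly majority for n = 5.
n = 5
clauses = majority_cnf(n)
for bits in product((False, True), repeat=n):
    assert eval_cnf(clauses, bits) == (sum(bits) > n // 2)

# Clause count C(n, (n + 1) // 2) grows roughly like 2**n / sqrt(n).
for m in (3, 5, 9, 15, 21):
    print(m, comb(m, (m + 1) // 2))
```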

5C-4 (Time: 15:05 - 15:30)
TitleFast Synthesis of Threshold Logic Networks with Optimization
Author*Yung-Chih Chen, Runyi Wang, Yan-Ping Chang (Yuan Ze University, Taiwan)
Pagepp. 486 - 491
KeywordThreshold logic, logic synthesis, logic optimization
AbstractThreshold logic, a more compact Boolean representation than the conventional logic gate representation, has regained substantial attention from researchers due to advances in threshold logic implementations with novel nanoscale devices. For this compact representation to be practical, a fast and effective method for transforming a conventional Boolean logic network into a threshold logic network is necessary. This paper presents such a synthesis method for threshold logic based on logic optimization. First, a Boolean logic network is mapped into a threshold logic network by one-to-one mapping. Then, the threshold logic network is optimized with eight transformations that reduce gate count. Unlike previous methods, the proposed method does not require threshold function identification, and is thus much more efficient. The experimental results show that the proposed method is three orders of magnitude faster than a widely used synthesis method. Additionally, the proposed method achieves better synthesis quality, with an average saving of 28% in threshold gates.
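A threshold gate outputs 1 when the weighted sum of its inputs reaches its threshold T. The sketch below (function names are illustrative, not from the paper) shows the one-to-one mapping of AND and OR gates to threshold gates, plus a single gate with weights (2, 1, 1) and T = 3 that implements x1 AND (x2 OR x3), which would otherwise take two conventional gates. It illustrates the compactness claim only, not the paper's eight transformations.

```python
from itertools import product


def threshold_gate(weights: tuple[int, ...], threshold: int, inputs: tuple[int, ...]) -> int:
    """Return 1 if sum(w_i * x_i) >= threshold, else 0."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)


def and_gate(n: int):
    """n-input AND as a threshold gate: unit weights, threshold n."""
    return (1,) * n, n


def or_gate(n: int):
    """n-input OR as a threshold gate: unit weights, threshold 1."""
    return (1,) * n, 1


# One threshold gate replacing the two-gate network x1 AND (x2 OR x3).
merged = ((2, 1, 1), 3)

for x in product((0, 1), repeat=3):
    x1, x2, x3 = x
    assert threshold_gate(*and_gate(3), x) == (x1 & x2 & x3)
    assert threshold_gate(*or_gate(3), x) == (x1 | x2 | x3)
    assert threshold_gate(*merged, x) == (x1 & (x2 | x3))
print("all threshold mappings agree with the Boolean definitions")
```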

5C-5 (Time: 15:30 - 15:55)
TitlePolysynchronous Stochastic Circuits
Author*M. Hassan Najafi, David J. Lilja, Marc Riedel, Kia Bazargan (University of Minnesota, U.S.A.)
Pagepp. 492 - 498
KeywordPolysynchronous circuits, asynchronous circuits, multi-clock circuits, stochastic computing, clock distribution network
AbstractClock distribution networks (CDNs) are costly in high-performance ASICs. This paper proposes a new approach: splitting clock domains at a very fine level, down to the level of a handful of gates. Each domain is synchronized with an inexpensive clock signal, generated locally. This is possible by adopting the paradigm of stochastic computation, where signal values are encoded as random bit streams. The design method is illustrated with the synthesis of circuits for applications in signal and image processing.
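In stochastic computing, a value p in [0, 1] is encoded as a bit stream whose bits are 1 with probability p, so a single AND gate multiplies two independent streams. A minimal sketch of that encoding, assuming Python's pseudo-random generator stands in for the hardware random sources; it does not model the locally generated clocks of the polysynchronous design, and the helper names are illustrative.

```python
import random


def encode(p: float, length: int, rng: random.Random) -> list[int]:
    """Encode probability p as a random bit stream of the given length."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def decode(stream: list[int]) -> float:
    """Recover the encoded value as the fraction of 1s in the stream."""
    return sum(stream) / len(stream)


def sc_multiply(a: list[int], b: list[int]) -> list[int]:
    """Bitwise AND of two independent streams encodes the product of their values."""
    return [x & y for x, y in zip(a, b)]


rng = random.Random(2016)  # fixed seed for reproducibility
n = 100_000
a = encode(0.5, n, rng)
b = encode(0.4, n, rng)
print(decode(sc_multiply(a, b)))  # close to 0.5 * 0.4 = 0.2
```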



Thursday, January 28, 2016

Session 3K  Keynote III
Time: 8:30 - 10:00 Thursday, January 28, 2016
Location: TF Theatre
Chair: David Pan (University of Texas at Austin, U.S.A.)

3K-1 (Time: 8:30 - 9:00)
Title(Keynote Address) Majority-based Synthesis for Nanotechnologies
AuthorLuca Amaru, Pierre-Emmanuel Gaillardon, *Giovanni De Micheli (Integrated Systems Laboratory, EPFL, Switzerland)
Pagepp. 499 - 502
KeywordLogic Synthesis, Majority Logic, Nanotechnology
AbstractWe study the logic synthesis of emerging nanotechnologies whose elementary device abstraction is a majority voter. We argue that synthesis tools natively supporting the majority logic abstraction are the technology enablers, because they allow designers to validate majority-based nanotechnologies on large-scale benchmarks. We describe models and data structures for logic design with majority-based nanotechnologies and show results of applying new synthesis algorithms and tools. We conclude that new logic synthesis methods are required to achieve a fair assessment of emerging nanotechnologies.
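Majority logic subsumes conventional logic: with one input tied to a constant, a three-input majority gate M(x, y, z) reduces to AND (z = 0) or OR (z = 1). The sketch below checks these identities, together with a distributivity identity of the kind used in majority-inverter graph (MIG) algebra, M(x, y, M(u, v, z)) = M(M(x, y, u), M(x, y, v), z), by exhaustive enumeration. It illustrates the abstraction only and is not the authors' synthesis tool.

```python
from itertools import product


def maj(x: int, y: int, z: int) -> int:
    """Three-input majority: 1 when at least two inputs are 1."""
    return int(x + y + z >= 2)


for x, y in product((0, 1), repeat=2):
    assert maj(x, y, 0) == (x & y)  # constant 0 turns MAJ into AND
    assert maj(x, y, 1) == (x | y)  # constant 1 turns MAJ into OR

for x, y, u, v, z in product((0, 1), repeat=5):
    # Distributivity: M(x, y, M(u, v, z)) == M(M(x, y, u), M(x, y, v), z)
    assert maj(x, y, maj(u, v, z)) == maj(maj(x, y, u), maj(x, y, v), z)
print("majority identities hold for all inputs")
```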

3K-2 (Time: 9:00 - 9:30)
Title(Keynote Address) A Scalable Communication-Aware Compilation Flow for Programmable Accelerators
Author*Jason Cong, Hui Huang, Mohammad Ali Ghodrat (UCLA, U.S.A.)
Pagepp. 503 - 510
KeywordKeynote
AbstractProgrammable accelerators (PAs) are receiving increased attention in domain-specific architecture designs as a way to provide more general support for customization. In a PA-rich system, computational kernels are compiled into predefined PA templates and dynamically mapped to real PAs at runtime. This poses a demanding challenge for the compiler: how to generate high-quality PA mapping code. Another important concern is the communication cost among PAs: if not handled properly at compile time, data transfers among tens or hundreds of accelerators in a PA-rich system will limit the overall performance gain. In this paper we present an efficient PA compilation flow that is scalable for mapping large computation kernels onto PA-rich architectures. Communication overhead is modeled and optimized in the proposed flow to reduce runtime data transfers among accelerators. Experimental results show that for 12 computation-intensive standard benchmarks, the proposed approach significantly improves compilation scalability, mapping quality, and overall communication cost compared to state-of-the-art PA compilation approaches. We also evaluate the proposed flow on a recently developed PA-rich platform [1]; the final performance gain is improved by 49.5% on average.

3K-3 (Time: 9:30 - 10:00)
Title(Keynote Address) Software and System Co-optimization in the era of Heterogeneous Computing
Author*Michael Gschwind (IBM Thomas J Watson Research Center, U.S.A.)
KeywordKeynote
AbstractThe escalating costs of semiconductor technology, and its performance lagging behind historic trends, are motivating acceleration and specialization as more impactful means to increase system value. Targeted specialization is increasingly being pursued as an important way to achieve dramatic improvements in workload acceleration. This requires a broad understanding of workloads, system structures, and algorithms to determine what to accelerate or specialize, and how (in software, in hardware, or in both), which presents many choices and necessitates co-optimization of SW and HW. In this talk, we will focus on an application-driven approach to software and system co-optimization, based on inventing new software algorithms that have a strong affinity to hardware acceleration. A high-level design methodology needed to enable targeted specialization in hardware will also be described.


Session 6S  (Special Session) Cyber-Physical Systems and Security
Time: 10:20 - 12:00 Thursday, January 28, 2016
Location: TF4303
Organizer/Chair: Jeyavijayan Rajendran (University of Texas at Dallas, U.S.A.), Farinaz Koushanfar (Rice University, U.S.A.)

6S-1 (Time: 10:20 - 10:45)
Title(Invited Paper) Enabling Multi-Layer Cyber-Security Assessment of Industrial Control Systems through Hardware-in-the-Loop Testbeds
AuthorAnastasis Keliris, Charalambos Konstantinou, Nektarios Georgios Tsoutsos (New York University, U.S.A.), Raghad Baiad, *Michail Maniatakos (New York University Abu Dhabi, United Arab Emirates)
Pagepp. 511 - 518
Keywordsecurity, industrial control systems, testbed, firmware
AbstractIndustrial Control Systems (ICS) are being modernized toward increased efficiency, reliability, and controllability. Despite the numerous benefits of interconnecting ICS components, the wide adoption of Information Technologies (IT) has introduced new security challenges and vulnerabilities to industrial processes, which were previously obscured by the systems' custom designs. To secure the backbone of critical infrastructure, selecting the proper environment for performing cyber-security assessments is crucial. In this paper, we present a layered analysis of vulnerabilities and threats in ICS components that identifies the need to include real hardware components in the assessment environment. Moreover, we advocate the suitability of Hardware-In-The-Loop testbeds for ICS cyber-security assessment and present their advantages over other assessment environments.

6S-2 (Time: 10:45 - 11:10)
Title(Invited Paper) Security Analysis on Consumer and Industrial IoT Devices
AuthorJacob Wurm, Khoa Hoang, Orlando Arias (University of Central Florida, U.S.A.), Ahmad-Reza Sadeghi (TU Darmstadt, Germany), *Yier Jin (University of Central Florida, U.S.A.)
Pagepp. 519 - 524
KeywordIoT Security, Hardware Security, IoT Devices
AbstractThe fast development of the Internet of Things (IoT) and cyber-physical systems (CPS) has triggered a large demand for smart devices, which are loaded with sensors that collect information from their surroundings, process it, and relay it to remote locations for further analysis. The wide deployment of IoT devices and time-to-market pressure in device development have raised security and privacy concerns. To help better understand the security vulnerabilities of existing IoT devices and promote the development of low-cost IoT security methods, in this paper we use both commercial and industrial IoT devices as examples, analyze the security of their hardware, software, and networks, and identify backdoors. A detailed security analysis procedure is elaborated on a home automation system and a smart meter, showing that security vulnerabilities are a common problem for most devices. Security solutions and mitigation methods are also discussed to help IoT manufacturers secure their products.

6S-3 (Time: 11:10 - 11:35)
Title(Invited Paper) Covert Channels Using Mobile Device’s Magnetic Field Sensors
AuthorNikolay Matyunin (Technische Universität Darmstadt, Germany), *Jakub Szefer (Yale University, U.S.A.), Sebastian Biedermann, Stefan Katzenbeisser (Technische Universität Darmstadt, Germany)
Pagepp. 525 - 532
KeywordHardware Security, Side-Channel, Covert-Channel, Magnetic
AbstractThis paper presents a new covert channel using smartphone magnetic sensors. We show that modern smartphones are capable of detecting the magnetic field changes induced by different computer components during I/O operations. In particular, we are able to create a covert channel between a laptop and a mobile device without any additional equipment, firmware modifications, or privileged access on either device. We present two encoding schemes for the covert channel communication and evaluate their effectiveness.

6S-4 (Time: 11:35 - 12:00)
Title(Invited Paper) Multi-valued Arbiters for Quality Enhancement of PUF Responses on FPGA Implementation
Author*Siarhei S. Zalivaka (Nanyang Technological University, Singapore), Alexander V. Puchkov, Vladimir P. Klybik, Alexander A. Ivaniuk (Belarusian State University of Informatics and Radioelectronics, Belarus), Chip-Hong Chang (Nanyang Technological University, Singapore)
Pagepp. 533 - 538
KeywordPhysical Unclonable Function, Arbiter, Hardware security
AbstractOne main problem encountered in the FPGA implementation of the Arbiter-based Physical Unclonable Function (A-PUF) is response instability caused by the metastability of the delay flip-flop. This paper presents a new multi-arbiter approach that extracts more entropy to extend the number of response bits per challenge. New multi-arbiter schemes, based on inserting either a four-flip-flop arbiter or an SR latch arbiter after each pair of multiplexers in the configurable paths, are proposed to detect the metastable state when two copies of the test pulse arrive at the arbiter inputs almost simultaneously. The detected metastable states are distinguishable by the encoded multi-valued outputs of the arbiter. The codes corresponding to the metastable states collectively form a deterministic ternary state that can be recoded to one of the stable states to improve the uniqueness and reliability of the PUF. Our analysis shows that the proposed design can generate robust and reliable challenge-response pairs with a uniqueness of 0.4982 and a reliability of 0.9985 at the expense of a relatively small FPGA resource overhead.
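Uniqueness and reliability figures like those above are commonly computed from fractional Hamming distances: uniqueness as the average pairwise inter-chip distance between responses to the same challenges (ideal 0.5), and reliability as one minus the average intra-chip distance between a reference response and repeated measurements (ideal 1.0). A minimal sketch of those common definitions, using toy response bits invented for illustration; the paper's exact evaluation procedure may differ.

```python
from itertools import combinations


def frac_hd(r1: list[int], r2: list[int]) -> float:
    """Fractional Hamming distance between two equal-length response vectors."""
    return sum(a != b for a, b in zip(r1, r2)) / len(r1)


def uniqueness(responses_per_chip: list[list[int]]) -> float:
    """Average fractional inter-chip Hamming distance over all chip pairs."""
    pairs = list(combinations(responses_per_chip, 2))
    return sum(frac_hd(a, b) for a, b in pairs) / len(pairs)


def reliability(reference: list[int], remeasured: list[list[int]]) -> float:
    """One minus the average fractional intra-chip Hamming distance."""
    return 1.0 - sum(frac_hd(reference, r) for r in remeasured) / len(remeasured)


# Toy data (invented for illustration): responses of three chips to the same 4 challenges.
chips = [[0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
# Two repeated measurements of the first chip, one with a flipped bit.
repeats = [[0, 1, 1, 0], [0, 1, 0, 0]]

print(f"uniqueness  = {uniqueness(chips):.4f}")
print(f"reliability = {reliability(chips[0], repeats):.4f}")
```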


Session 6A  Testing, Modeling and Optimization Techniques for Analog Circuits
Time: 10:20 - 12:00 Thursday, January 28, 2016
Location: TF4203
Chairs: Sheldon Tan (University of California, Riverside, U.S.A.), Mark Po-Hung LIN (National Chung Cheng University, Taiwan)

6A-1 (Time: 10:20 - 10:45)
TitleEvery Test Makes a Difference: Compressing Analog Tests to Decrease Production Costs
AuthorSeyed Nematollah Ahmadyan (University of Illinois at Urbana-Champaign, U.S.A.), Suriyaprakash Natarajan (Intel, U.S.A.), *Shobha Vasudevan (University of Illinois at Urbana-Champaign, U.S.A.)
Pagepp. 539 - 544
KeywordStress Test, Compression, Random tree, Optimization
AbstractWe introduce a methodology for automated test compression during electrical stress testing of analog and mixed-signal circuits. This methodology optimally extracts only those portions of a functional test that electrically stress the nets and devices of an analog circuit. We model test compression as the problem of optimizing a functional of the transient response. We present a random-tree-based approach to find optimal solutions for these computationally hard integrals. We demonstrate with an op-amp, a VCO, and a CMOS inverter that the method consistently reduces the length of each test by an average of 93%.

6A-2 (Time: 10:45 - 11:10)
TitleRe-thinking Polynomial Optimization: Efficient Programming of Reconfigurable Radio Frequency (RF) Systems by Convexification
AuthorFa Wang, Shihui Yin, Minhee Jun, *Xin Li, Tamal Mukherjee, Rohit Negi, Larry Pileggi (Carnegie Mellon University, U.S.A.)
Pagepp. 545 - 550
KeywordPolynomial Optimization, Sequential Semidefinite Programming
AbstractReconfigurable radio frequency (RF) systems have emerged as a promising avenue to achieve high communication performance while adapting to versatile commercial wireless environments. In this paper, we propose a novel technique to optimally program a reconfigurable RF system in order to achieve maximum performance and/or minimum power. Our key idea is to adopt an equation-based optimization method that relies on general-purpose, non-convex polynomial performance models to determine the optimal configurations of all tunable circuit blocks. Most importantly, our proposed approach is guaranteed to find the globally optimal solution of the non-convex polynomial programming problem by solving a sequence of convex semidefinite programming (SDP) problems based on convexification. A reconfigurable RF front-end example designed for WLAN 802.11g demonstrates that the proposed method successfully finds the globally optimal configuration, while other traditional techniques often converge to local optima.

6A-3 (Time: 11:10 - 11:35)
TitleAn Efficient Trajectory-based Algorithm for Model Order Reduction of Nonlinear Systems via Localized Projection and Global Interpolation
AuthorChenjie Yang, *Fan Yang, Xuan Zeng (Fudan University, China), Dian Zhou (Fudan University, University of Texas at Dallas, China)
Pagepp. 551 - 556
KeywordTrajectory, Model Order Reduction
AbstractIn this paper, we propose a new, efficient trajectory-based model order reduction algorithm for nonlinear systems via localized projection and global interpolation. We employ an efficient procedure to transform the smaller localized reduced-order models into a set of equivalent reduced-order models with consistent global coordinates. The reduced-order model of the nonlinear system is then obtained by globally interpolating the much smaller localized reduced-order models.

6A-4 (Time: 11:35 - 12:00)
TitleSTORM: A Nonlinear Model Order Reduction Method via Symmetric Tensor Decomposition
AuthorJian Deng, Haotian Liu, Kim Batselier, Yu-Kwong Kwok, *Ngai Wong (The University of Hong Kong, Hong Kong)
Pagepp. 557 - 562
Keywordcircuit modeling, nonlinear circuit, model order reduction, symmetric tensor decomposition, polynomial system
AbstractIn this paper, a novel symmetric tensor-based order-reduction method (STORM) is presented for simulating large-scale nonlinear systems. Compared to the recent tensor-based nonlinear model order reduction (TNMOR) algorithm, STORM shows advantages in two aspects. First, STORM avoids the assumption of the existence of a low-rank tensor approximation. Second, with the use of the symmetric tensor decomposition, STORM allows significantly faster computation and less storage complexity than TNMOR. Numerical experiments demonstrate the superior computational efficiency and accuracy of STORM.


Session 6B  Energy-Efficient & Customized Computing
Time: 10:20 - 12:00 Thursday, January 28, 2016
Location: TF4304
Chairs: Weichen Liu (Chongqing University, China), Yaoyao Ye (Dept. of Micro/Nano-Electronics, Shanghai Jiao Tong University, China)

6B-1 (Time: 10:20 - 10:45)
TitleFootfall – GPS Polling Scheduler for Power Saving on Wearable Devices
Author*Kent W. Nixon, Xiang Chen, Yiran Chen (University of Pittsburgh, U.S.A.)
Pagepp. 563 - 568
KeywordGPS, scheduler, map-matching, wearable
AbstractWrist-worn wearable fitness devices, such as FitBit and Apple Watch, have become popular in recent years. Runners can use the GPS embedded in these wearable devices to log the route taken during their exercise, providing vital feedback on pace and distance traveled. Unfortunately, continuous polling for GPS data significantly degrades device battery life; e.g., many flagship wearables must be charged as often as every two days or even more frequently. In this work, we propose Footfall, an intelligent GPS scheduler that can use data from alternative sensors on a device to greatly reduce GPS utilization while still maintaining minimum location accuracy. Compared to existing implementations, the Footfall system achieves on average a 75% reduction in total power consumption while inducing only a 5% discrepancy in location accuracy, which is sufficient for the targeted applications.

6B-2 (Time: 10:45 - 11:10)
TitleCP-FPGA: Computation Data-Aware Software/Hardware Co-design for Nonvolatile FPGAs based on Checkpointing Techniques
Author*Zhe Yuan, Yongpan Liu, Hehe Li, Huazhong Yang (Tsinghua University, China)
Pagepp. 569 - 574
KeywordFPGA, nonvolatile, checkpoint, IOT
AbstractWith the booming trend of the Internet of Things (IoT), reconfigurable devices such as FPGAs have drawn much attention due to their flexibility and high performance. However, commercial FPGAs suffer from high leakage power consumption, which makes zero-leakage nonvolatile FPGAs (nvFPGAs) promising. This paper proposes a hardware/software co-design based nvFPGA with an efficient checkpointing strategy. With nonvolatile checkpointing BRAM (CBRAM), it preserves both computation data and configuration at power-off to avoid expensive rollbacks due to data loss. A checkpointing location-aware technique is used to balance computation rollback overheads and backup energy. Experimental results show that the proposed checkpointing strategy can reduce the backup data of nvFPGA by 45.8% when system-level power gating happens.

6B-3 (Time: 11:10 - 11:35)
TitleDesign Space Exploration of FPGA-Based Deep Convolutional Neural Networks
AuthorMohammad Motamedi, *Philipp Gysel, Venkatesh Akella, Soheil Ghiasi (University of California, Davis, U.S.A.)
Pagepp. 575 - 580
KeywordDCNN, Accelerator, Design Space Exploration, Deep Convolutional Neural Network
AbstractDeep Convolutional Neural Networks (DCNNs) have proven to be very effective in many pattern recognition applications. To meet performance and energy-efficiency constraints, various hardware accelerators have been developed. In this paper, we propose an FPGA-based accelerator that can handle convolutional layers with large hyperparameters. We present a design space exploration algorithm to find the optimal architecture that leverages all sources of parallelism. To the best of our knowledge, we improve the state of the art for AlexNet on a large FPGA by 1.9X.
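The abstract does not describe the exploration algorithm itself. As a minimal sketch of what a design space exploration over parallelism can look like, the code below enumerates hypothetical unroll factors `tm` (output feature maps) and `tn` (input feature maps) for one convolutional layer under an assumed multiplier budget, and picks the configuration with the fewest compute cycles. The cycle model ignores memory bandwidth and is an assumption, not the authors' model; `explore` is a hypothetical name.

```python
from math import ceil


def explore(M, N, R, C, K, dsp_budget):
    """Pick unroll factors (tm, tn) minimizing compute cycles for one conv layer.

    M, N: output/input feature maps; R, C: output rows/cols; K: kernel size.
    Assumes a tm x tn array of multipliers, one MAC per multiplier per cycle.
    """
    best = None
    for tm in range(1, M + 1):
        for tn in range(1, N + 1):
            if tm * tn > dsp_budget:
                continue  # configuration does not fit the assumed multiplier budget
            cycles = ceil(M / tm) * ceil(N / tn) * R * C * K * K
            # Tie-break: fewer multipliers, then smaller tm.
            cand = (cycles, tm * tn, tm, tn)
            if best is None or cand < best:
                best = cand
    cycles, _, tm, tn = best
    return tm, tn, cycles
```

For instance, with `M=6, N=2` and a budget of 6 multipliers, the search returns `(3, 2, 2)`: fully unrolling the two input maps and splitting the six output maps into two passes.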

6B-4 (Time: 11:35 - 12:00)
TitleLRADNN: High-Throughput and Energy-Efficient Deep Neural Network Accelerator using Low Rank Approximation
Author*Jingyang Zhu (Hong Kong University of Science and Technology, Hong Kong), Zhiliang Qian (Shanghai Jiao Tong University, China), Chi-Ying Tsui (Hong Kong University of Science and Technology, Hong Kong)
Pagepp. 581 - 586
KeywordDeep Neural Network, Low Rank Approximation, VLSI architecture, Energy Efficiency
AbstractIn this work, we propose an energy-efficient hardware accelerator for Deep Neural Network (DNN) using Low Rank Approximation (LRADNN). Using this scheme, inactive neurons in each layer of the DNN are dynamically identified and the corresponding computations are then bypassed. Accordingly, both the memory accesses and the arithmetic operations associated with these inactive neurons can be saved. Therefore, compared to the architectures using the direct feed-forward algorithm, LRADNN can achieve a higher throughput as well as a lower energy consumption with negligible prediction accuracy loss (within 0.1%). We implement and synthesize the proposed accelerator using TSMC 65nm technology. From the experimental results, an energy reduction ranging from 31% to 53% together with an increase in the throughput from 22% to 43% can be achieved.
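The idea of bypassing inactive neurons can be sketched in a few lines, assuming ReLU activations (a neuron is inactive when its pre-activation is non-positive) and a precomputed rank-r factorization W ≈ UV that predicts the sign cheaply. This is an illustrative software model only; the names `predict_active` and `lra_layer` are hypothetical, and the abstract does not describe the accelerator's datapath.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def predict_active(U, V, x):
    """Predict active neurons with the low-rank factors W ~ U @ V.

    V is r x n and U is m x r, so the prediction costs r*(n + m) MACs
    instead of m*n for the full layer.
    """
    t = [dot(row, x) for row in V]
    return [dot(row, t) > 0 for row in U]


def lra_layer(W, U, V, x):
    """ReLU layer that skips the full dot product for predicted-inactive neurons."""
    mask = predict_active(U, V, x)
    out = []
    for row, active in zip(W, mask):
        # A bypassed neuron needs no reads of its weight row and no full MACs.
        out.append(max(0.0, dot(row, x)) if active else 0.0)
    return out, mask
```

If the prediction is wrong for a neuron whose true output is positive, the sketch outputs 0 for it; this is the source of the small accuracy loss the abstract reports.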


Session 6C  Design Methodologies for Microfluidic Biochips
Time: 10:20 - 12:00 Thursday, January 28, 2016
Location: TF4204
Chairs: Hailong Yao (Tsinghua University, China), Tohru Ishihara (Kyoto University, Japan)

6C-1 (Time: 10:20 - 10:45)
TitleSequence-Pair-Based Placement and Routing for Flow-Based Microfluidic Biochips
Author*Qin Wang, Yizhong Ru, Hailong Yao (Tsinghua University, China), Tsung-Yi Ho (National Tsing Hua University, Taiwan), Yici Cai (Tsinghua University, China)
Pagepp. 587 - 592
KeywordFlow-based microfluidic biochips, Sequence-pair-based placement, Flow-channel crossings avoidance, Placement adjustment
AbstractFlow-based microfluidic biochips are attracting increasing attention with successful applications in lab-on-a-chip experiments and point-of-care diagnosis. Physical design for flow-based biochips determines the number of flow-channel intersections, and thus affects the number of microvalves. Because reducing microvalves significantly improves overall design quality and reliability, physical design is of great importance. Typically, physical design consists of two major stages, i.e., component placement and routing. Existing works follow a step-by-step scheme that performs placement and routing separately. The lack of interaction between the two design stages results in degraded designs with a large number of unfavorable channel intersections and microvalves. This paper presents a novel placement and routing method based on the sequence-pair representation, which seamlessly integrates the placement and routing stages and allows iterative placement adjustment based on routing feedback. Experimental results show that, compared with the existing work, the proposed method obtains average improvements of 54.10% in flow-channel crossings, 42.15% in total chip area, and 23.43% in total channel length.
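For readers unfamiliar with the sequence-pair representation, the sketch below shows the classic O(n²) evaluation that turns a sequence pair (Γ+, Γ-) into component coordinates: if a precedes b in both sequences, a is left of b; if a follows b in Γ+ but precedes it in Γ-, a is below b. This is the textbook packing step only, not the paper's integrated placement-and-routing flow; the function name `sequence_pair_coords` is hypothetical.

```python
def sequence_pair_coords(gamma_pos, gamma_neg, width, height):
    """Compute lower-left (x, y) of each block from a sequence pair."""
    pos_p = {b: i for i, b in enumerate(gamma_pos)}
    x, y = {}, {}
    # Visiting blocks in gamma_neg order guarantees every already-placed
    # block precedes the current one in gamma_neg.
    for b in gamma_neg:
        # a before b in both sequences -> a is left of b.
        x[b] = max((x[a] + width[a] for a in x if pos_p[a] < pos_p[b]), default=0)
        # a after b in gamma_pos (and before it in gamma_neg) -> a is below b.
        y[b] = max((y[a] + height[a] for a in y if pos_p[a] > pos_p[b]), default=0)
    return x, y
```

For example, with Γ+ = (a, b, c) and Γ- = (b, a, c), block b sits below a and c sits to the right of both. A placer explores the design space by perturbing the two sequences, which is what makes iterative adjustment from routing feedback convenient.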

6C-2 (Time: 10:45 - 11:10)
TitleCongestion- and Timing-Driven Droplet Routing for Pin-Constrained Paper-Based Microfluidic Biochips
Author*Jain-De Li, Sying-Jyan Wang (National Chung Hsing University, Taiwan), Katherine Shu-Min Li (National Sun Yat-sen University, Taiwan), Tsung-Yi Ho (National Tsing Hua University, Taiwan)
Pagepp. 593 - 598
KeywordPaper-based Digital Microfluidic (PB-DMF), Digital Microfluidic, Global Routing, Escape Routing
AbstractPaper-based microfluidic chips provide a novel way to carry out microfluidic analysis. Such chips achieve “lab-on-paper” instead of traditional “lab-on-chips”. The paper substrate is attractive because it is cost-effective, easy to use, and disposable. The routing problem of paper-based digital microfluidic (PB-DMF) biochips is to realize bio-chemical operations on paper with inkjet printing techniques. In this paper, we propose a routing scheme targeting multiple preprogrammed droplet paths such that both routability and wire-length are optimized in a paper layer. Compared with conventional digital microfluidic (DMF) biochips, which require separate control and signal layers, the proposed paper-based DMF needs only one integrated paper layer. Experimental results on a set of paper chip applications show the effectiveness, efficiency, and scalability of the proposed algorithm.

6C-3 (Time: 11:10 - 11:35)
TitleChain-Based Pin Count Minimization for General-Purpose Digital Microfluidic Biochips
Author*Yung-Chun Lei, Chen-Shing Hsu, Juinn-Dar Huang (National Chiao Tung University, Taiwan), Jing-Yang Jou (National Central University, Taiwan)
Pagepp. 599 - 604
KeywordDigital microfluidic biochip, pin count minimization, Lab on a chip
AbstractMinimizing the number of external control pins is one of the most important optimization objectives in digital microfluidic biochip (DMFB) design, especially as chip sizes grow. So far, only a few works have addressed this issue for general-purpose DMFBs. In this paper, we present a pin count minimization algorithm based on sophisticated electrode chaining on regular or irregular electrode arrays. The key idea of the proposed method is that actuation information can be implied from earlier neighboring electrodes to later ones throughout the chain. Experimental results show that the pin count reduction can approach 50% in large DMFBs.

6C-4 (Time: 11:35 - 12:00)
TitleA Routability-Driven Flow Routing Algorithm for Programmable Microfluidic Devices
AuthorYi-Siang Su (National Taiwan University, Taiwan), *Tsung-Yi Ho (National Tsing Hua University, Taiwan), Der-Tsai Lee (National Taiwan University, Taiwan)
Pagepp. 605 - 610
KeywordBiochip, Microfluidics, Routing
AbstractBiochips made with Micro Electro Mechanical Systems (MEMS) have attracted wide attention in recent years. Their advantages are high accuracy and a fast reaction rate while consuming only small volumes of samples and reagents. Among various types of biochips, flow-based microfluidic biochips have received much attention recently, especially the programmable microfluidic device (PMD). PMDs are capable of performing a multitude of functions on one platform without requiring any hardware modifications. As chip sizes increase, flow routing becomes more complicated. The traditional method of manually controlling multiple flows is inefficient and may not achieve a feasible assay completion time. Fortunately, PMDs have high potential to route flows with pure software programs, overcoming the drawbacks of traditional methods. However, a naive software program that simply minimizes assay completion time may cause flow-congestion problems and unexpected mixing between different assays. To conduct a viable experiment, a feasible program should not only minimize assay completion time but also consider congestion problems and fluidic constraints. Therefore, we formulate the flow routing problem and propose a routability-driven flow routing algorithm that considers the fluidic constraints and minimizes the assay completion time on PMDs.


Session 7S  (Special Session) New Frontiers of Physical Design
Time: 13:50 - 15:30 Thursday, January 28, 2016
Location: TF4303
Organizer: Evangeline F.Y. Young (Chinese University of Hong Kong, Hong Kong), Chair: Bei Yu (Chinese University of Hong Kong, Hong Kong)

7S-1 (Time: 13:50 - 14:20)
Title(Invited Paper) Advanced Multi-Patterning and Hybrid Lithography Techniques
Author*Fedor G. Pikus, Andres Torres (Mentor Graphics, U.S.A.)
Pagepp. 611 - 616
KeywordDFM, MP, DSA, semiconductor, manufacturing
AbstractWe present an overview of several techniques that are used when the layout pitch and feature size become significantly smaller than the minimum resolution of the lithographic process. We consider several multi-patterning (MP) techniques, in which a single layer is decomposed into two or more masks and printed in multiple stages. We also introduce directed self-assembly (DSA) technology, in which features several times smaller than the minimum lithographic resolution form spontaneously through the self-assembling, or self-organizing, behavior of block copolymers.
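Layout decomposition for double patterning is commonly formulated as two-coloring a conflict graph: features closer than the minimum same-mask spacing are joined by an edge and must go on different masks, and an odd cycle means no legal two-mask decomposition exists without a stitch or layout change. The sketch below shows this generic textbook formulation with a breadth-first bipartiteness check; it is not the specific method of the invited paper, and `assign_masks` is a hypothetical name.

```python
from collections import deque


def assign_masks(num_features, conflicts):
    """Two-color the conflict graph.

    Returns a list of mask ids (0 or 1) per feature, or None if an odd
    conflict cycle makes a two-mask decomposition impossible.
    """
    adj = [[] for _ in range(num_features)]
    for u, v in conflicts:
        adj[u].append(v)
        adj[v].append(u)
    mask = [-1] * num_features
    for start in range(num_features):
        if mask[start] != -1:
            continue  # already colored as part of an earlier component
        mask[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if mask[v] == -1:
                    mask[v] = 1 - mask[u]  # conflicting neighbor goes on the other mask
                    queue.append(v)
                elif mask[v] == mask[u]:
                    return None  # odd cycle: native conflict
    return mask
```

A chain of three closely spaced lines alternates masks (`[0, 1, 0]`), while three mutually conflicting features form a triangle and return `None`, which is one reason triple patterning or DSA-based approaches are considered.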

7S-2 (Time: 14:20 - 14:50)
Title(Invited Paper) Recent Research Development and New Challenges in Analog Layout Synthesis
Author*Mark Po-Hung Lin (National Chung Cheng University, Taiwan), Yao-Wen Chang (National Taiwan University, Taiwan), Chih-Ming Hung (MediaTek, Taiwan)
Pagepp. 617 - 622
KeywordAnalog layout, placement, routing, migration, knowledge mining
AbstractAnalog and mixed-signal integrated circuits play an important role in many modern emerging system-on-chip (SoC) design applications. With the expansion of the markets for those applications, the demand for analog/mixed-signal ICs has increased dramatically. Although analog/mixed-signal ICs have gained increasing importance in modern SoC applications, the development of analog electronic design automation (EDA) tools still lags far behind that of digital EDA tools. As a result, analog/mixed-signal IC design, especially analog layout design, is still a manual, time-consuming, and error-prone task. To speed up modern SoC design for a large variety of emerging applications, it is necessary to develop novel analog/mixed-signal IC design methodologies and algorithms, as well as new analog EDA tools. The purpose of this paper is to summarize recent research progress during the past decade, address new analog layout design challenges in advanced technology nodes, and facilitate more research activities in analog layout synthesis.

7S-3 (Time: 14:50 - 15:20)
Title(Invited Paper) To Detect, Locate, and Mask Hardware Trojans in Digital Circuits by Reverse Engineering and Functional ECO
Author*Xing Wei, Yi Diao, Yu-Liang Wu (Easy-Logic Technology Ltd., Hong Kong)
Pagepp. 623 - 630
KeywordHardware Trojan, Formal verification, Reverse Engineer, Functional ECO, Logic Rewiring
AbstractDuring the design phase, a specification may be tampered with directly by dishonest engineers (or industrial spies), or indirectly through the use of malicious modules from third-party Intellectual Property (3PIP) vendors. During integration and fabrication, the chips may also be tampered with by an untrusted system integrator. Particularly for high-end commercial or classified military chips, Hardware Trojan (HT) Detect-Locate-and-Mask (DL&M) is crucial to make sure that a design is produced exactly as the original specification. Our objectives are (1) to detect any functional difference that might be caused by bugs or HTs, (2) to locate/output the differing circuitry in order to correct the bugs or to investigate the intention or purpose of the tampering, and (3) to kill (mask) the HTs by restoring the chip’s functionality to the golden one with a minimum circuitry change. Besides blocking the plotted damage at an early stage and pinpointing the spy source by revealing the HT intention, the masking circuit revision must also be minimized to avoid degrading chip performance (timing) too much. In this paper, we propose a scheme that integrates reverse engineering, formal verification, functional ECO, and logic rewiring to detect, locate, and mask Hardware Trojans at minimized cost. This formal-verification-based scheme can guarantee catching 100% of hidden combinational-circuit HTs and can handle multiple HTs (with no limit on their number) automatically in one run. Some techniques within our scheme won first place in the CAD Contests at ICCAD 2012, 2013, and 2014.


Session 7A  System-Level Design for Energy-Efficiency and Reliability
Time: 13:50 - 15:30 Thursday, January 28, 2016
Location: TF4203
Chairs: Guihai Yan (Institute of Computing Technology, Chinese Academy of Sciences, China), Donghwa Shin (Department of Computer Engineering, Yeungnam University, Republic of Korea)

7A-1 (Time: 13:50 - 14:15)
TitleAging-aware High-level Physical Planning for Reconfigurable Systems
AuthorZana Ghaderi, *Eli Bozorgzadeh (University of California, Irvine, U.S.A.)
Pagepp. 631 - 636
KeywordAging mitigation, Reconfigurable systems, Physical planning, Performance degradation, Aging analysis
AbstractWith advanced silicon technology, reconfigurable system-on-chip devices such as FPGAs are becoming increasingly sensitive to aging effects. This paper presents a high-level physical planning approach with a reconfiguration strategy to mitigate aging-induced delay degradation in FPGA resources. The proposed solution is an offline framework composed of an aging-aware floorplanner coupled with a proactive aging-aware reconfiguration policy that generates checkpoints aperiodically for runtime reconfiguration. We consider BTI and HCI aging mechanisms and account for BTI-based aging recovery during idle periods using an aging history map. The experiments demonstrate that the sequence of configurations generated by this scheme reduces the aging rate by 53.2% on average for FPGA resources and by 17.5% for application performance.

7A-2 (Time: 14:15 - 14:40)
TitleHardware Reliability Margining for the Dark Silicon Era
AuthorLiangzhen Lai, *Puneet Gupta (UCLA, U.S.A.)
Pagepp. 637 - 642
Keyworddark silicon, reliability, margining
AbstractHardware reliability margins should be derived from the worst-case aging scenario, which typically occurs when the circuits are operating at the peak performance state with the highest operating voltage and frequency. However, as integrated circuits enter the "dark silicon" era, it is impossible to power up all circuits throughout the entire lifetime. Reliability margining in the absence of architecture-level power/thermal constraints can be overly pessimistic. In this work, we propose a margining scheme that employs the power/thermal contexts and system management policies to derive the actual worst-case workload pattern for different reliability phenomena. Our experimental results show that at a 60% dark ratio, the conventional margining approach can overestimate the aging degradation due to EM and BTI by up to 3-7X and 18%, respectively. Our margining method is able to eliminate this over-pessimism and yields about 20% delay margin reduction and 40%-60% metal width margin reduction.

7A-3 (Time: 14:40 - 15:05)
TitleACR: Enabling Computation Reuse for Approximate Computing
Author*Xin He (Chinese Academy of Sciences, China), Guihai Yan, Yinhe Han, Xiaowei Li (Institute of Computing Technology, Chinese Academy of Sciences, China)
Pagepp. 643 - 648
KeywordApproximate computing, Computation reuse
AbstractApproximate computing, which trades off computation quality (e.g., accuracy) against computation effort, has become a promising technique to improve performance for many mission-non-critical and error-tolerant applications. The computations in such applications usually exhibit strong value locality, i.e., computations performed by a function or code region are very likely to reproduce "similar" results. Reusing the similar results can bypass redundant computations, as long as "exact" results are not mandatory. However, conventional computation reuse techniques are less effective in the approximate computing paradigm: the input values of two computation instances have to be identical to reuse one for the other, so they are "exact" in nature. We propose ACR, an approximate computation reuse framework, to enable computation reuse for approximate computing. ACR relaxes exact input matching to an extent regulated by "similarity" quantification, thereby shifting the exact computation reuse paradigm to its approximate counterpart. We further propose an input significance-aware similarity quantification scheme based on statistical approaches. Experimental results show that ACR can effectively exploit the potential of computation reuse for approximate computing and reduces computations by 47.6% on average for a set of approximate applications.
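To make the contrast between exact and approximate reuse concrete, the sketch below memoizes a function on quantized inputs, so that inputs falling in the same tolerance bucket reuse a stored result. The per-input tolerances stand in for input significance (more significant inputs get tighter tolerances); ACR derives its similarity quantification statistically, which this sketch does not attempt. The class `ApproxReuseCache` is hypothetical.

```python
class ApproxReuseCache:
    """Memoize fn on inputs quantized by per-argument tolerances (illustrative only)."""

    def __init__(self, fn, tolerances):
        self.fn = fn
        self.tolerances = tolerances  # smaller tolerance = more significant input
        self.table = {}
        self.hits = 0
        self.misses = 0

    def _key(self, args):
        # Inputs in the same tolerance bucket are treated as "similar".
        return tuple(round(a / t) for a, t in zip(args, self.tolerances))

    def __call__(self, *args):
        key = self._key(args)
        if key in self.table:
            self.hits += 1  # reuse an approximate result, skipping the computation
            return self.table[key]
        self.misses += 1
        result = self.fn(*args)
        self.table[key] = result
        return result
```

With exact matching, `f(1.00, 5.0)` and `f(1.02, 5.3)` would both be computed; with tolerances `(0.1, 1.0)` the second call reuses the first result. The tolerances directly control the quality/effort trade-off.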

7A-4 (Time: 15:05 - 15:30)
TitleWork hard, sleep well - Avoid irreversible IC wearout with proactive rejuvenation
Author*Xinfei Guo, Mircea R. Stan (University of Virginia, U.S.A.)
Pagepp. 649 - 654
Keywordirreversible wearout, boundary, sleep-when-getting-tired, negative turbo boost
AbstractVarious wearout mechanisms have both a reversible and an irreversible (permanent) part, with some, like BTI and EM, having a significant reversible part, while others, like HCI, are mostly irreversible. In this paper we make two contributions. First, we show that the boundary between the reversible and irreversible parts of wearout is not fixed, with the irreversible part becoming at least partially reversible under the right conditions of active accelerated recovery and stress/recovery scheduling. Second, we show that there are certain stress/recovery schedules that can (almost) completely eliminate irreversible wearout, thus allowing significant reductions in necessary design margins. The experiments were done on commercial FPGAs fabricated in a 40nm technology. To fully repair and avoid irreversible wearout, we propose a biology-inspired sleep-when-getting-tired strategy. The strategy can achieve >60x design margin reduction and ~9% average performance improvement within a 10-year lifetime constraint compared to the no-recovery case. Potential system-level implementations (a negative “turbo-boost”-like strategy) in multicore and NoC systems are also presented.


Session 7B  Design for Trustworthy IC
Time: 13:50 - 15:30 Thursday, January 28, 2016
Location: TF4304
Chairs: Yu Wang (Tsinghua University, China), Jeyavijayan Rajendran (University of Texas at Dallas, U.S.A.)

7B-1 (Time: 13:50 - 14:15)
TitleNetlist Reverse Engineering for High-Level Functionality Reconstruction
Author*Travis Meade, Shaojie Zhang, Yier Jin (University of Central Florida, U.S.A.)
Pagepp. 655 - 660
KeywordReverse Engineering, IP Security, Netlist Analysis
AbstractIn a modern IC design flow, from specification development to chip fabrication, various security threats are emerging. Of particular concern are modifications made to third-party IP cores and commercial off-the-shelf (COTS) chips for which no golden models are available for comparison. Toward this direction, we develop a tool, named Reverse Engineering Finite State Machine (REFSM), that helps end-users reconstruct a high-level description of the control logic from a flattened netlist. We demonstrate that REFSM effectively recovers circuit control logic from netlists with varying degrees of complexity. Experimental results also show that the developed tool can easily identify malicious logic in a flattened (or even obfuscated) netlist. If combined with chip-level reverse engineering techniques, the developed REFSM tool can help detect the insertion of hardware Trojans in fabricated circuits.

7B-2 (Time: 14:15 - 14:40)
TitleAssessing CPA Resistance of AES with Different Fault Tolerance Mechanisms
AuthorHoda Pahlevanzadeh, Jaya Dofe, *Qiaoyan Yu (University of New Hampshire, U.S.A.)
Pagepp. 661 - 666
KeywordAES, correlation power analysis, fault tolerance, partial guessing entropy, FPGA
AbstractCountermeasures for the Advanced Encryption Standard (AES) against side-channel attacks and fault attacks are typically investigated separately. There is a lack of thorough investigation into how a countermeasure designed for one attack affects the efficiency of another attack. In this work, we consider three different fault detection (FD) methods: double modular redundancy (DMR), inverse function (inverse), and even parity check code (parity). We perform FPGA-based systematic analysis to investigate the impact of FD schemes on the correlation power analysis (CPA) resistance of a complete AES implementation. Moreover, the power model used in the existing work is Hamming weight rather than the more powerful Hamming distance model. Our experimental results show that, in some scenarios, the use of fault detection mechanisms in AES improves resistance against CPA. For instance, applying a parity FD to the AES S-Box makes it harder to retrieve the key than in the case without any FD protection.
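The mechanics of CPA can be illustrated on a toy target. The sketch below uses the 4-bit PRESENT S-box as a stand-in for the 8-bit AES S-box and noise-free simulated leakage equal to the Hamming weight of the S-box output; for each key guess it correlates the predicted leakage with the "traces" and picks the guess with the highest absolute Pearson correlation. A Hamming distance model would instead predict `hd(previous_value, new_value)` for a register transition. The function names and the simulated traces are assumptions for illustration, not the paper's FPGA setup.

```python
# 4-bit PRESENT S-box, used here as a small stand-in for the AES S-box.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hw(v):
    """Hamming weight power model: number of set bits."""
    return bin(v).count("1")


def hd(a, b):
    """Hamming distance power model: bits toggled when a register goes from a to b."""
    return hw(a ^ b)


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def cpa_hw(plaintexts, traces):
    """Rank all 16 key guesses by |correlation| under the Hamming weight model."""
    scores = {}
    for k in range(16):
        model = [hw(SBOX[p ^ k]) for p in plaintexts]
        scores[k] = abs(pearson(model, traces))
    return max(scores, key=scores.get), scores
```

With simulated traces `[hw(SBOX[p ^ 0x9]) for p in range(16)]`, the correct guess `0x9` correlates perfectly and is recovered. Real measurements add noise, so many traces are needed, and a fault detection scheme that changes the switching activity can shift these correlations either way.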

7B-3 (Time: 14:40 - 15:05)
TitleSPARTA: A Scheduling Policy for Thwarting Differential Power Analysis Attacks
Author*Ke Jiang, Petru Eles, Zebo Peng, Sudipta Chattopadhyay (Linköping University, Sweden), Lejla Batina (Radboud University, Netherlands)
Pagepp. 667 - 672
KeywordReal-time systems, Security, Countermeasure, DPA attacks
AbstractEmbedded systems (ESs) have been widely used in various application domains. It is very important to design ESs that guarantee functional correctness of the system under strict timing constraints. Such systems are known as real-time embedded systems (RTESs). More recently, RTESs have started to be used in safety- and reliability-critical areas, which has made previously overlooked security issues, especially the confidentiality of communication, a serious problem. Differential power analysis attacks (DPAs) pose serious threats to confidentiality protection mechanisms, i.e., implementations of cryptographic algorithms, on embedded platforms. In this work, we present a scheduling policy, SPARTA, that thwarts DPAs. Theoretical guarantees and preliminary experimental results are presented to demonstrate the efficiency of the SPARTA scheduler.

7B-4 (Time: 15:05 - 15:30)
TitleAnalysis and Vulnerability Exploration of Current Secure Scan Designs
AuthorYanhui Luo, *Aijiao Cui (Harbin Institute of Technology Shenzhen Graduate School, China), Huawei Li (Chinese Academy of Sciences, China), Gang Qu (University of Maryland College Park, U.S.A.)
Pagepp. 673 - 678
Keywordsecure scan design, scan-based side-channel attack, obfuscating scan chain
AbstractScan design has become another side channel for leaking confidential information inside crypto chips. Methods based on obfuscating the scan chain order have been proposed as effective countermeasures. In this paper, we analyze the existing secure scan designs from the perspective of whether attacks against them need a complete chain state and rely on any specific scan chain order. We show that none of the existing attacks relies on a specific scan chain order. As an example, for the recently proposed ROS countermeasure, we demonstrate how an attacker can access the complete state of the scan chain and hence defeat the countermeasure.


Session 7C  Design for Reliability
Time: 13:50 - 15:30 Thursday, January 28, 2016
Location: TF4204
Chairs: Martin Wong (University of Illinois at Urbana-Champaign, U.S.A.), Evangeline F.Y. Young (Chinese University of Hong Kong, Hong Kong)

7C-1 (Time: 13:50 - 14:15)
TitleLaplacian Eigenmaps and Bayesian Clustering Based Layout Pattern Sampling and Its Applications to Hotspot Detection and OPC
Author*Tetsuaki Matsunawa (Toshiba Corp., Japan), Bei Yu (Chinese University of Hong Kong, Hong Kong), David Z. Pan (University of Texas at Austin, U.S.A.)
Pagepp. 679 - 684
KeywordPattern Sampling, Clustering, OPC, Hotspot Detection
AbstractEffective layout pattern sampling is a fundamental component for lithography process optimization, hotspot detection, and model calibration. Existing pattern sampling algorithms rely on either vector quantization or heuristic approaches. However, these methods are difficult to manage due to their heavy demands for prior knowledge, such as high-dimensional layout features and manually tuned hypothetical model parameters. In this paper, we present a self-contained layout pattern sampling framework in which no manual parameter tuning is needed. To handle high dimensionality and diverse layout feature types, we propose a nonlinear dimensionality reduction technique with kernel parameter optimization. Furthermore, we develop Bayesian model-based clustering, through which automatic sampling is realized without arbitrary setting of model parameters. The effectiveness of our framework is verified through a sampling benchmark suite and two applications, lithography hotspot detection and optical proximity correction.

7C-2 (Time: 14:15 - 14:40)
TitleBalancing Lifetime and Soft-Error Reliability to Improve System Availability
Author*Junlong Zhou (University of Notre Dame, East China Normal University, U.S.A.), X. Sharon Hu, Yue Ma (University of Notre Dame, U.S.A.), Tongquan Wei (East China Normal University, China)
Pagepp. 685 - 690
KeywordLifetime, Soft-Error, Reliability, Availability, MTTF
AbstractCMOS scaling has greatly increased concerns for lifetime reliability due to permanent faults and soft-error reliability due to transient faults. Most existing works focus on only one of the two reliability concerns, but techniques used to increase one type of reliability often adversely impact the other type. A few efforts do consider both types of reliability together, using two different metrics to quantify them. However, for many systems, the user's concern is to maximize system availability by improving the mean time to failure (MTTF), regardless of whether a failure is caused by permanent or transient faults. Addressing this concern requires a uniform metric to measure the effect of both types of faults. In this paper, we derive a novel analytical expression for calculating the MTTF due to transient faults. Using this new formula and an existing method to evaluate system MTTF, we formulate and solve the problem of maximizing system availability with consideration of permanent faults, transient faults, and a throughput constraint. Extensive simulations of synthetic task sets and benchmarks based on real-world applications were performed to validate our algorithm.

7C-3 (Time: 14:40 - 15:05)
TitleA Closed-Form Stability Model for Cross-Coupled Inverters Operating in Sub-Threshold Voltage Region
Author*Tatsuya Kamakari, Jun Shiomi, Tohru Ishihara, Hidetoshi Onodera (Kyoto University, Japan)
Pagepp. 691 - 696
KeywordCross-coupled inverter, Yield, Stability, Analytical Model, Sub-threshold Voltage
AbstractA cross-coupled inverter, which is an essential element of on-chip memory subsystems, plays an important role in synchronous LSI circuits. In this paper, an analytical stability model for a cross-coupled inverter operating in the sub-threshold voltage region is proposed. The proposed model analytically shows that the minimum operating voltage of the cross-coupled inverter is normally distributed in the high-sigma region if the distribution of the threshold voltage is Gaussian. The minimum supply voltage at which the yield of the cross-coupled inverter reaches a specific value can be accurately derived by a simple calculation using the model. Monte-Carlo simulation assuming a commercial 28nm process technology demonstrates the accuracy and validity of the proposed model. Based on the model, this paper presents strategies for variation-tolerant memory design.
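The "simple calculation" enabled by a normally distributed minimum operating voltage can be sketched as an inverse-CDF lookup. The code below assumes that a cell's minimum voltage follows N(mu_v, sigma_v) and, as an additional assumption, that an array of `n_cells` independent cells works only if every cell works, so the per-cell yield must be the target yield raised to the power 1/n_cells. The mean and sigma values in the usage note are hypothetical, not the paper's 28nm numbers.

```python
from statistics import NormalDist


def vdd_for_yield(mu_v, sigma_v, target_yield, n_cells=1):
    """Minimum supply voltage reaching target_yield, assuming Vmin ~ N(mu_v, sigma_v).

    Cells are assumed independent; the array works only if all n_cells work.
    """
    per_cell_yield = target_yield ** (1.0 / n_cells)
    # A cell works at supply V when its Vmin <= V, so invert the CDF.
    return NormalDist(mu_v, sigma_v).inv_cdf(per_cell_yield)
```

For a hypothetical cell with mu_v = 0.30 V and sigma_v = 0.02 V, a 50% single-cell yield needs 0.30 V, whereas 99% yield over a large array pushes the required supply several sigma higher; this is why accuracy of the model in the high-sigma tail matters.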

7C-4 (Time: 15:05 - 15:30)
TitleDelay Uncertainty and Signal Criticality Driven Routing Channel Optimization for Advanced DRAM Products
AuthorSamyoung Bang (Samsung Electronics, Republic of Korea), Kwangsoo Han, Andrew B. Kahng, *Mulong Luo (University of California at San Diego, U.S.A.)
Pagepp. 697 - 704
Keywordinterconnection, crosstalk, DRAM, timing
AbstractCrosstalk-induced delay change is a critical challenge to the physical design of long interconnect channels in DRAM products at the 2x and 1x technology nodes. Due to severe cost challenges in a high-volume, commodity market, layout resources including channel width, buffers, and the number of metal routing layers are extremely scarce. We describe a new channel optimizer that reduces crosstalk-induced delay uncertainty, weighted by signal criticality and aware of signal activity correlations (e.g., to reduce delay uncertainty by mutual shielding). Instead of the typical signal net permutation strategy, we apply (pessimistic) timing-driven swizzling to minimize the delay uncertainty cost function. Contributions of this work include (1) an accurate and efficient analytical crosstalk delay calculator, (2) scalability up to hundreds of signals and tracks in the routing channel through the use of greedy and decomposition strategies as well as a pair-swapping heuristic, and (3) experimental studies that demonstrate up to 24% reduction of the worst-case criticality-weighted delay uncertainty (or 34ps of absolute delay uncertainty reduction) compared with the typical signal permutation approach.
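A pair-swapping heuristic for track assignment can be sketched as follows. The cost function here is a deliberately simplified, hypothetical stand-in: each pair of adjacent tracks contributes the product of the two signals' criticalities, so highly critical signals are pushed apart. The paper's optimizer uses an analytical crosstalk delay calculator and activity correlations, neither of which is modeled here; `channel_cost` and `pair_swap` are hypothetical names.

```python
from itertools import combinations


def channel_cost(order, crit):
    """Hypothetical cost: criticality product summed over adjacent track pairs."""
    return sum(crit[a] * crit[b] for a, b in zip(order, order[1:]))


def pair_swap(order, crit):
    """Swap pairs of signals while any swap lowers the cost (first-improvement)."""
    order = list(order)
    best = channel_cost(order, crit)
    improved = True
    while improved:
        improved = False
        for i, j in combinations(range(len(order)), 2):
            order[i], order[j] = order[j], order[i]
            cost = channel_cost(order, crit)
            if cost < best:
                best = cost
                improved = True
            else:
                order[i], order[j] = order[j], order[i]  # undo a non-improving swap
    return order, best
```

Starting from tracks `A B C D` with criticalities 10, 9, 1, 1, the heuristic moves the two critical signals to the channel edges, separated by the two non-critical ones, reducing this toy cost from 100 to 20. Being a local search, it is not guaranteed to reach the optimum in general.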


Session 8S  (Special Session) Reliability, Adaptability and Flexibility in Timing
Time: 15:50 - 17:30 Thursday, January 28, 2016
Location: TF4303
Organizer: Bing Li (Technische Universität München, Germany), Chair: Hidetoshi Onodera (Kyoto University, Japan)

8S-1 (Time: 15:50 - 17:30)
Title(Invited Paper) Reliability, Adaptability and Flexibility in Timing: Buy a Life Insurance for Your Circuits
Author*Ulf Schlichtmann (TU Munich, Germany), Masanori Hashimoto (Osaka University, Japan), Iris Hui-Ru Jiang (National Chiao Tung University, Taiwan), Bing Li (TU Munich, Germany)
Pagepp. 705 - 711
KeywordAging Analysis, Timing Adaptation, Criticality-dependency-aware Timing
AbstractAt nanometer manufacturing technology nodes, process variations affect circuit performance significantly. In addition, performance deterioration of circuits due to aging effects is also increasing. Consequently, a large timing margin is required to maintain yield. To combat the pessimism and the resulting overdesign, aging analysis with high-level models, on-chip timing margin monitoring and tuning, and flexible delay models of flip-flops can be deployed. This paper gives an overview of the state of the art of applying these techniques to improve the health of circuits.


Session 8A  Emerging Networks-on-Chip Designs
Time: 15:50 - 17:30 Thursday, January 28, 2016
Location: TF4203
Chairs: Mehdi Tahoori (Karlsruhe Institute of Technology, Germany), Chun-Yi Lee (National Tsing Hua University, Taiwan)

8A-1 (Time: 15:50 - 16:15)
TitleA High Performance Reliable NoC Router
Author*Lu Wang, Sheng Ma, Zhiying Wang (College of Computer, National University of Defense Technology, China)
Pagepp. 712 - 718
KeywordReliability, Network-on-chip, High performance, Router design
AbstractAggressive scaling of CMOS process technology allows the fabrication of highly integrated chips and enables the design of multiprocessor systems-on-chip connected by a network-on-chip (NoC). However, it also brings widespread reliability challenges. Aiming to tackle permanent faults in router components, we propose a high-performance, high-reliability, and low-cost router design based on a generic 2-stage router. Four fault-tolerant strategies are added to our reliable router: a double routing strategy for routing computation (RC) failures, a default winner strategy for virtual channel allocation (VA) failures, a runtime arbiter selection strategy for switch allocation (SA) failures, and a double bypass bus strategy for crossbar failures. Unlike previous reliable routers, our design leverages pipeline optimization and the routing algorithm to maintain performance under faults, especially under heavy network loads. Moreover, the proposed router provides higher reliability with lower hardware overhead than previous reliable router designs.

8A-2 (Time: 16:15 - 16:40)
TitleDynamic Admission Control for Real-Time Networks-On-Chips
Author*Adam Kostrzewa, Selma Saidi, Leonardo Ecco, Rolf Ernst (TU Braunschweig, Germany)
Pagepp. 719 - 724
Keywordreal-time, safety, overlay network
AbstractNetworks-on-Chip (NoCs) for real-time systems require solutions for safe and predictable sharing of network resources. In this work, we present a mechanism for global and dynamic admission control in NoCs designed for real-time systems. It introduces an overlay network to synchronize transmissions using arbitration units called Resource Managers (RMs), which allows global and work-conserving scheduling. We present a formal worst-case timing analysis for the proposed mechanism and demonstrate that this solution not only achieves higher performance in simulation but, even more importantly, consistently reaches smaller formally guaranteed worst-case latencies than TDM for realistic levels of system utilization. Our mechanism does not require modification of routers and can therefore be used together with any architecture utilizing non-blocking routers.

8A-3 (Time: 16:40 - 17:05)
TitleFoToNoC: A Hierarchical Management Strategy Based on Folded Torus-Like Network-on-Chip for Dark Silicon Many-Core Systems
Author*Lei Yang, Weichen Liu, Weiwen Jiang, Mengquan Li, Juan Yi, Edwin Hsing-Mean Sha (Chongqing University, China)
Pagepp. 725 - 730
KeywordDark silicon, System performance, Network-on-Chip, Temperature
AbstractIn this dark silicon era, techniques have been developed to selectively activate physically nonadjacent cores to maintain a safe temperature and an allowable power budget on a many-core chip. This results in an unexpected increase in communication overhead due to the longer average distance between active cores in a typical mesh-based Network-on-Chip (NoC), which in turn reduces system performance and energy efficiency. In this paper, we present FoToNoC, a Folded Torus-like NoC, and a hierarchical management strategy on top of it, to address this tradeoff for heterogeneous many-core systems. Optimizations of chip temperature, inter-core communication, application performance, and system energy consumption are well isolated in FoToNoC and addressed in different design phases and aspects. A cluster-based hierarchical strategy is proposed to manage the system adaptively at several control levels. Compared with mesh-based systems on a set of synthetic and real benchmarks, FoToNoC achieves on average a 39.4% performance improvement while maintaining similar temperature conditions, and the proposed strategy can further reduce the total energy consumption by up to 42.0%.

8A-4 (Time: 17:05 - 17:30)
TitleAnalytical ThruChip Inductive Coupling Channel Design Optimization
Author*Li-Chung Hsu, Junichiro Kadomoto, So Hasegawa, Atsutake Kosuge, Yasuhiro Take, Tadahiro Kuroda (Keio University, Japan)
Pagepp. 731 - 736
KeywordTCI, 3-D IC, ThruChip, Inductive Coupling
AbstractThruChip interface (TCI) is an emerging 3-D integrated circuit stacking technology. TCI utilizes on-chip inductors to build vertical communication channels at near-field distance and has been shown to be competitive with through-silicon vias (TSVs) in data rate, power, and reliability. Moreover, it is cost-effective to manufacture due to its wireless nature. In this paper, an analytical method is proposed to find a near-optimal TCI inductive coupling channel solution. Experimental results show an average 16.8% reduction in transmitting current and a reduction in design time from days to a few minutes.


Session 8B  Test and Debug
Time: 15:50 - 17:30 Thursday, January 28, 2016
Location: TF4304
Chair: Shi-Yu Huang (National Tsing Hua University, Taiwan)

8B-1 (Time: 15:50 - 16:15)
TitleExtending Trace History Through Tapered Summaries in Post-silicon Validation
Author*Sandeep Chandran, Preeti Ranjan Panda, Smruti R. Sarangi (Indian Institute of Technology Delhi, India), Deepak Chauhan, Sharad Kumar (Freescale Semiconductors India Pvt Ltd, India)
Pagepp. 737 - 742
KeywordPost-silicon Validation Methodology, Online filtering, Tapered Summaries
AbstractOn-chip trace buffers are increasingly being used for at-speed debug during post-silicon validation. However, the activity history captured by these buffers is short due to their limited size. We propose a novel scheme that extends the captured trace history (by up to 162%) by using a portion of the trace buffer to also store summaries of trace messages. We describe an Overlapped architecture that uses a reduced number of ports to capture tapered summaries. We demonstrate the usefulness of the proposed methodology for debugging various classes of bugs encountered during post-silicon validation.

8B-2 (Time: 16:15 - 16:40)
TitleNovel Applications of Deep Learning Hidden Features for Adaptive Testing
Author*Bingjun Xiao (University of California, Los Angeles, U.S.A.), Jinjun Xiong (IBM Research, U.S.A.), Yiyu Shi (University of Notre Dame, U.S.A.)
Pagepp. 743 - 748
KeywordAdaptive testing, DNN, Big data
AbstractAdaptive test of integrated circuits (ICs) promises to increase product quality and yield while reducing manufacturing test cost compared with traditional static test flows. Building on recent progress in machine learning, this paper proposes a novel deep learning based method for adaptive test. We start from a trained deep neural network (DNN) that achieves much higher accuracy than the conventional test flow for pass/fail prediction. We further develop two novel applications that leverage the features learned by the DNN: the first enables partial testing, i.e., making pass/fail decisions without finishing the entire test flow, and the second enables dynamic test ordering, i.e., changing the sequence of tests adaptively. Experimental results show significant improvements in accuracy and effectiveness with the proposed method.

8B-3 (Time: 16:40 - 17:05)
TitleMixed 01X-RSL-Encoding for Fast and Accurate ATPG with Unknowns
Author*Dominik Erb, Karsten Scheibler (University of Freiburg, Germany), Michael A. Kochte (University of Stuttgart, Germany), Matthias Sauer (University of Freiburg, Germany), Hans-Joachim Wunderlich (University of Stuttgart, Germany), Bernd Becker (University of Freiburg, Germany)
Pagepp. 749 - 754
KeywordUnknown values, test generation, Restricted symbolic logic, SAT, stuck-at faults
AbstractUnknown (X) values in a design introduce pessimism in conventional test generation algorithms, which results in a loss of fault coverage. This pessimism can be reduced by more accurate modeling and analysis. Unfortunately, accurate analysis techniques greatly increase runtime and limit scalability. One promising technique that avoids high runtimes while still providing high accuracy is restricted symbolic logic (RSL). However, even pure RSL-based algorithms reach their limits as soon as million-gate circuits need to be processed. In this paper, we propose new ATPG techniques to overcome such limitations. An efficient hybrid encoding combines the accuracy of RSL-based modeling with the compactness of conventional three-valued encoding. A low-cost two-valued SAT-based untestability check classifies most untestable faults with low runtime. An incremental, event-based accurate fault simulator is introduced to reduce fault simulation effort. The experiments demonstrate the effectiveness of the proposed techniques: over 97% of the faults are accurately classified, and both the number of aborts and the total runtime are significantly reduced compared with the state-of-the-art pure RSL-based algorithm. For circuits of up to a million gates, the fault coverage could be increased considerably compared with a state-of-the-art commercial tool, with very competitive runtimes.
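The pessimism of conventional three-valued (01X) analysis mentioned above can be seen on a toy reconvergent circuit. The Python sketch below is illustrative only: the circuit `y = a AND (NOT a)` and all function names are hypothetical, and the "exact" reference simply enumerates both binary values of the X source. It is not RSL and not the paper's hybrid encoding, which exist precisely to avoid such brute-force enumeration on large circuits.

```python
X = "X"  # the unknown value

def and3(a, b):
    """Three-valued AND over {0, 1, X}."""
    if a == 0 or b == 0:
        return 0
    if a == 1 and b == 1:
        return 1
    return X

def not3(a):
    """Three-valued NOT over {0, 1, X}."""
    return X if a == X else 1 - a

def reconvergent(a, and_op, not_op):
    """Toy circuit y = a AND (NOT a): the X source reconverges at the AND gate."""
    return and_op(a, not_op(a))

def three_valued_eval():
    # Conventional 01X simulation: the X source is propagated as an opaque X.
    return reconvergent(X, and3, not3)

def exact_eval():
    # Exact reference: try both binary values the X source could take.
    outcomes = {reconvergent(v, lambda p, q: p & q, lambda p: 1 - p) for v in (0, 1)}
    return outcomes.pop() if len(outcomes) == 1 else X

print(three_valued_eval())  # X  (pessimistic)
print(exact_eval())         # 0  (the output is actually known)
```

Because 01X simulation loses the correlation between `a` and `NOT a`, it reports X where the output is in fact a constant 0; a fault observable only at such a node would be misclassified, which is the coverage loss the abstract refers to.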

8B-4 (Time: 17:05 - 17:30)
TitleTest and Diagnosis Pattern Generation for Dynamic Bridging Faults and Transition Delay Faults
Author*Cheng-Hung Wu, Saint James Lee, Kuen-Jong Lee (National Cheng Kung Univ., Taiwan)
Pagepp. 755 - 760
KeywordFault diagnosis, transition fault, dynamic bridging fault, diagnosis pattern generation, test compaction
AbstractA dynamic bridging fault (DBF) induces a transition delay on a circuit node and hence has effects similar to those of a transition delay fault (TDF). However, the causes of these two types of faults are quite different: a DBF is due to the bridging effect between two circuit nodes, while a TDF is due to the node itself or the logic connected to it. Thus, in addition to detecting these two types of faults, it is also important to distinguish them so that the exact sources of defects can be identified during the yield ramping process. In this paper, we present an efficient test and diagnosis pattern generation procedure to detect DBFs and TDFs as well as to distinguish them. We first analyze the dominance relation between a DBF and its corresponding TDF. A novel circuit model called the inverse DBF (IDBF) model is then developed, which transforms the problem of distinguishing a DBF from its corresponding TDF into the problem of detecting the inverse DBF. Pattern generation can then be performed using an ATPG tool for dynamic bridging faults. We believe this is the first work to distinguish TDFs and DBFs. A complete procedure is then presented to generate both test and diagnosis patterns that detect all testable TDFs and DBFs and also distinguish them. In this flow, all TDFs and DBFs, as well as all fault pairs between the two types of faults, can be modeled in a single circuit and handled in a few ATPG runs. Thus, the pattern generation process is quite efficient, and very compact pattern sets can be obtained by utilizing the test pattern compaction feature of the ATPG tool. Experimental results on ISCAS89 benchmarks show that our procedure detects all detectable TDFs and DBFs and achieves up to 99.96% diagnosis resolution for these faults.


Session 8C  Emerging Devices and Systems for Cyber-Physical Applications
Time: 15:50 - 17:30 Thursday, January 28, 2016
Location: TF4204
Chairs: Chengmo Yang (University of Delaware, U.S.A.), Duo Liu (Chongqing University, China)

8C-1 (Time: 15:50 - 16:15)
TitleFlexible Transition Metal Dichalcogenide Field-Effect Transistors: A Circuit-Level Simulation Study of Delay and Power under Bending, Process Variation, and Scaling
AuthorYing-Yu Chen (University of Illinois at Urbana-Champaign, U.S.A.), *Morteza Gholipour (Babol University of Technology, Iran), Deming Chen (University of Illinois at Urbana-Champaign, U.S.A.)
Pagepp. 761 - 768
Keywordtransition metal dichalcogenide, TMDFET, flexible electronics, modeling, simulation
AbstractIn this paper, a new and efficient SPICE model of flexible transition metal dichalcogenide field-effect transistors (TMDFETs) is developed for different types of materials, considering the effects of scaling the transistor size down to the 16-nm technology node. Extensive circuit-level simulations are performed using this model, and the delay and power performance of TMDFET circuits under different amounts of bending is reported. Simulation results indicate that delay and power can be traded off in TMDFET circuits via bending. Effects of process variation are also evaluated via circuit simulations. Finally, our cross-technology and scaling studies show that while TMDFETs outperform Si-based transistors in terms of energy-delay product (EDP) at the 180-nm and 90-nm technology nodes (the best achieving 12.7% and 40.7% of the EDP of Si-based transistors, respectively), their EDPs are worse than those of Si-based transistors (at least 4.9x that of the best-performing Si-based transistor) at the 16-nm technology node. Such a compact model enables SPICE-level circuit simulation for early assessment, design, and evaluation of future TMDFET-based flexible circuits targeting advanced technology nodes.

8C-2 (Time: 16:15 - 16:40)
TitleNon-Volatile Non-Shadow Flip-Flop using Spin Orbit Torque for Efficient Normally-off Computing
Author*Rajendra Bishnoi, Fabian Oboril, Mehdi B. Tahoori (Karlsruhe Institute of Technology, Germany)
Pagepp. 769 - 774
KeywordSpin orbit torque, low power, flip-flop, write avoidance, power gate
AbstractWith technology downscaling, static power has become very challenging to deal with. Conventional CMOS and non-volatile flip-flops cannot provide an effective solution to this problem. We propose a novel Non-Volatile Non-Shadow flip-flop (NVNS-FF) using Spin Orbit Torque (SOT) based MTJ cells. In this design, we exploit the high speed, low energy, and high reliability of SOT devices to employ them as active components of the flip-flop. This enables efficient normally-off computing by allowing very aggressive power gating for both short and long standby periods. Experimental results show that the NVNS-FF has energy and timing characteristics similar to those of conventional CMOS-based flip-flops in active mode, while reducing static power by 5X compared with backup flip-flops.

8C-3 (Time: 16:40 - 17:05)
TitleOptimal Co-Scheduling of HVAC Control and Battery Management for Energy-Efficient Buildings Considering State-of-Health Degradation
Author*Tiansong Cui, Shuang Chen (University of Southern California, U.S.A.), Yanzhi Wang (Syracuse University, U.S.A.), Qi Zhu (University of California, Riverside, U.S.A.), Shahin Nazarian, Massoud Pedram (University of Southern California, U.S.A.)
Pagepp. 775 - 780
KeywordHVAC Control, Battery, Smart Building
AbstractThe heating, ventilation and air conditioning (HVAC) system accounts for half of the energy consumption of a typical building. Additionally, the demand for HVAC changes over hours and days, as does the electric energy price. The comfort level of building occupants is, however, a primary concern that tends to override pricing. Dynamic HVAC control under a dynamic energy pricing model while meeting an acceptable level of occupant comfort is thus critical to achieving energy efficiency in buildings in a sustainable manner. Moreover, the building may be equipped with a renewable power source such as rooftop solar panels. The presence of a battery energy storage system in a target building enables peak power shaving through a suitable charge and discharge schedule for the battery, while simultaneously meeting building energy efficiency and user satisfaction goals. Achieving this requires detailed information (or predictions) about the amount of local power generation from the renewable source and the power consumption load of the building. This paper addresses the co-scheduling problem of HVAC control and battery management for energy-efficient buildings, while also accounting for the degradation of the battery state-of-health during charging and discharging operations (which in turn determines the amortized cost of owning and utilizing a battery storage system). A time-of-use dynamic pricing scenario is assumed, and various energy loss components are considered, including power dissipation in the power conversion circuitry and the rate capacity effect in the battery.
A global optimization framework targeting the entire billing cycle is presented, and an adaptive co-scheduling algorithm is provided to dynamically update the optimal HVAC air flow control and the battery charging/discharging decision in each time slot of the billing cycle, mitigating the prediction error of unknown parameters. Experimental results show that the proposed algorithm achieves up to a 15% reduction in total electric utility cost compared with baseline methods.

8C-4 (Time: 17:05 - 17:30)
TitleAccurate Remaining Range Estimation for Electric Vehicles
Author*Joonki Hong, Sangjun Park, Naehyuck Chang (Korea Advanced Institute of Science and Technology, Republic of Korea)
Pagepp. 781 - 786
KeywordElectric vehicle, Range estimation, Modeling methodology, EV power model, Regression
AbstractEV drivers experience range anxiety because of the short driving range of EVs. In this paper, we emphasize that accurate remaining range estimation can effectively mitigate the range anxiety of EV drivers. Analogous to power estimation for digital circuits, remaining range estimation consists of two consecutive steps: driving profile estimation and power consumption estimation. We propose a hybrid modeling methodology that reduces the estimation error to 2.52%.
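The two-step structure described above (estimate a driving profile, then estimate power along it) can be sketched in Python as follows. This is a hypothetical illustration, not the authors' hybrid model: it swaps in a standard longitudinal road-load model (inertia, rolling resistance, aerodynamic drag) for step 2, represents the step-1 driving profile as a list of `(speed, acceleration)` samples, and uses made-up vehicle parameters in the hypothetical `Vehicle` class.

```python
from dataclasses import dataclass

RHO_AIR = 1.2  # air density, kg/m^3
G = 9.81       # gravitational acceleration, m/s^2

@dataclass
class Vehicle:
    # All values are illustrative assumptions, not data from the paper.
    mass_kg: float = 1500.0
    c_rr: float = 0.01       # rolling-resistance coefficient
    cd_a_m2: float = 0.6     # drag coefficient times frontal area
    eta_drive: float = 0.9   # battery-to-wheel efficiency when driving
    eta_regen: float = 0.6   # fraction of braking power recovered
    aux_w: float = 500.0     # auxiliary load (HVAC, electronics)

def battery_power_w(v, a, car):
    """Step 2 (power estimation): battery power in W at speed v (m/s), accel a (m/s^2)."""
    force = car.mass_kg * a + car.mass_kg * G * car.c_rr + 0.5 * RHO_AIR * car.cd_a_m2 * v * v
    wheel_w = force * v
    if wheel_w >= 0:
        return wheel_w / car.eta_drive + car.aux_w
    return wheel_w * car.eta_regen + car.aux_w  # regenerative braking returns energy

def remaining_range_km(energy_wh, profile, car, dt_s=1.0):
    """Replay a predicted driving profile (step 1: (v, a) samples every dt_s seconds)
    until the usable battery energy runs out; return the distance covered in km."""
    energy_j = energy_wh * 3600.0
    per_pass_j = sum(battery_power_w(v, a, car) * dt_s for v, a in profile)
    if per_pass_j <= 0:
        raise ValueError("profile never drains the battery")
    distance_m = 0.0
    while True:
        for v, a in profile:
            step_j = battery_power_w(v, a, car) * dt_s
            if step_j > energy_j:
                return distance_m / 1000.0
            energy_j -= step_j
            distance_m += v * dt_s

print(remaining_range_km(50_000, [(20.0, 0.0)], Vehicle()))  # 516.48
```

With 50 kWh usable and a constant 20 m/s cruise, the model draws about 6.97 kW, giving roughly 516 km. The split mirrors the abstract's point: errors in either the predicted profile or the power model propagate directly into the range estimate.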