Title | An Automatic Place-and-Routed Two-Stage Fractional-N Injection-locked PLL Using Soft Injection |
Author | *Dongsheng Yang, Wei Deng, Aravind Tharayil Narayanan, Kengo Nakata, Teerachot Siriburanon, Kenichi Okada, Akira Matsuzawa (Tokyo Institute of Technology, Japan) |
Page | pp. 1 - 2 |
Keyword | Automatic Place-and-Routed, Synthesizable, Fractional-N, Soft Injection, DPLL |
Abstract | This paper presents an automatic place-and-routed two-stage fractional-N injection-locked PLL (IL-PLL) using a soft injection technique for on-chip clock generation. Fabricated in a 65nm CMOS process, the prototype demonstrates a 3.6-ps integrated jitter at 1.5222 GHz and consumes 3mW, leading to an FoM of -224.6 dB while occupying an area of only 0.048 mm². It is the first fully synthesized fractional-N injection-locked PLL to date. |
Title | Time-Domain I/Q-LOFT Compensator Using a Simple Envelope Detector for a Sub-GHz IEEE 802.11af WLAN Transmitter |
Author | *Chak-Fong Cheang, Ka-Fai Un, Pui-In Mak, Rui Paulo da Silva Martins (University of Macau, Macau) |
Page | pp. 3 - 4 |
Keyword | envelope detector, I/Q imbalance, LO feedthrough, wideband |
Abstract | This paper proposes a hardware-efficient time-domain scheme to digitally compensate the I/Q imbalance and LO feedthrough (LOFT) of a sub-GHz wideband transmitter for the IEEE 802.11af WLAN. A simple envelope detector is the only analog part. The parameters are updated by Least-Mean-Square and estimated efficiently in the time domain using a COordinate Rotation DIgital Computer (CORDIC), saving training time and power consumption. The measured wideband image-rejection ratio (IRR) and LO-leakage-rejection ratio (LRR) are improved from 18.9 to 41.3 dB and from 20.4 to 37.9 dB, respectively. |
Title | A Noise Reduction Technique for Divider-Less Fractional-N Frequency Synthesizer using Phase-Interpolation Technique |
Author | *Aravind Tharayil Narayanan, Makihiko Katsuragi, Kengo Nakata, Yuki Terashima, Kenichi Okada, Akira Matsuzawa (Tokyo Institute of Technology, Japan) |
Page | pp. 5 - 6 |
Keyword | PLL, Fractional, Sub-sampling, phase interpolator, DTC |
Abstract | This paper proposes a noise reduction technique for a divider-less fractional-N frequency synthesizer using a phase-interpolation technique. The phase interpolator helps reduce the jitter introduced into the system by the multi-phase generation mechanism used for the fractional operation. The proposed frequency synthesizer is fabricated in a 65nm CMOS process and is capable of working at frequencies ranging from 4.3GHz to 4.9GHz. The measured close-in phase noise is -113dBc/Hz at an offset of 200kHz from the carrier with 3.3mW power consumption, which results in an FoM of -246dB. |
Title | A 2.2 uW 15b Incremental Delta-Sigma ADC with Output-Driven Input Segmentation |
Author | *Bo Wang (Hong Kong University of Science and Technology, Hong Kong), Man-Kay Law (Macau University, Macau), Saqib Mohamad (Hong Kong University of Science and Technology, Hong Kong), Amine Bermak (Hamad Bin Khalifa University, Qatar) |
Page | pp. 7 - 8 |
Keyword | incremental delta-sigma ADC, integrator multiplexing, low power ADC |
Abstract | A micro-power incremental delta-sigma ADC is presented. This ADC uses its decimation filter’s output to estimate the input signal level and dynamically adjusts the modulator feedback voltage, thereby reducing the integrator input range and power. For further power saving, integrator time-multiplexing is also employed. Fabricated in 0.18um CMOS, the 0.12 mm² ADC consumes 2.16uW at a conversion speed of 85S/s, 15.3b resolution and -2/1.5LSB INL. |
Title | A 200-MHz 4-Phase Fully Integrated Voltage Regulator With Local Ground Sensing Dual Loop ZDS Hysteretic Control Using 6.5nH Package Bondwire Inductors on 65nm Bulk CMOS |
Author | Min Kyu Song, Joseph Sankman, Jayeol Lee, *Dongsheng Ma (The University of Texas at Dallas, U.S.A.) |
Page | pp. 9 - 10 |
Keyword | integrated voltage regulator, fast transient response, multiple-phase operation, dual-loop voltage regulation |
Abstract | This paper presents a 200MHz 4-phase fully integrated voltage regulator (FIVR) with 6.5nH package bondwire inductors. With an on-chip delay-locked loop (DLL) for phase synchronization, the proposed FIVR employs a cost-effective local ground sensing feedforward control loop for high-speed load transient sensing and a ZDS hysteretic feedback control loop for accurate voltage regulation, independently achieving a dual-loop compensated operation within each sub-converter. Implemented in a standard 65nm bulk CMOS process, the FIVR delivers a peak efficiency of 84.2% at 256mW, with a maximum power density of 670mW/mm². In response to a 280mA/120ps load step, the FIVR settles within 11ns with 78mV droop. To the best of our knowledge, this is 2.7 times faster than the best design despite a 1.8 times larger load step, while facilitating the use of a 1.3 times smaller on-chip capacitor. |
Title | A Variable-Voltage Low-Power Technique for Digital Circuit System |
Author | *An-Tai Xiao, Yung-Siang Miao (Department of Electronics Engineering, National Chiao Tung University, Taiwan), Ching-Hwa Cheng (Department of Electronics Engineering, Feng Chia University, Taiwan), Jiun-In Guo (Department of Electronics Engineering, National Chiao Tung University, Taiwan) |
Page | pp. 13 - 14 |
Keyword | Low-Power, Variable-Voltage |
Abstract | A swing variable voltage technique (CK-Vdd) is proposed to reduce the power consumption of generic digital circuit systems. The proposed CK-Vdd supplies a swing variable voltage to the digital circuit, in contrast to the conventional constant voltage (Vdd). The swing voltage is produced by Voltage Frequency Adjustor (VFA) and Frequency Duty-Cycle Adjustor (FDCA) circuits. The clock rising and falling signals are fed into the FDCA to generate an adjustable high-low signal, which controls the VFA to generate a high-low cycling swing voltage. When the clock is at the high level, a generic positive-edge digital circuit needs a large operating current, so CK-Vdd supplies a high voltage to the digital circuit at this time. When the clock signal transitions to the low level, CK-Vdd supplies a low voltage instead. Reducing the supply current to the digital circuit while the clock is low reduces the power consumption of the circuit. We implement the CK-Vdd technique in an H.264 video decoder test chip based on a TSMC 90 nm CMOS process. The results show that with a CK-Vdd voltage of 0.7 V to 0.9 V, the decoder chip saves 32% of its power consumption on average and up to 45% at maximum. |
Title | Sub-threshold VLSI Logic Family Exploiting Unbalanced Pull-up/down Network, Logical Effort and Inverse-Narrow-Width Techniques |
Author | *Ming-Zhong Li, Chio-In Ieong, Man-Kay Law, Pui-In Mak, Mang-I Vai, Sio-Hang Pun, Rui P. Martins (University of Macau, Macau) |
Page | pp. 15 - 16 |
Keyword | CMOS, device sizing, inverse narrow width (INW), logical effort, ultralow energy |
Abstract | This paper presents a complete energy-optimized sub-threshold standard cell library exploiting unbalanced pull-up/down (PU/PD) network, logical effort and inverse-narrow-width (INW) techniques. Each logic cell is optimized for ultra-low-energy applications with low-to-moderate speed requirements. Three 14-tap 8-bit FIR filters are fabricated in a 0.18-μm CMOS technology; one of them achieves the minimum energy/tap (0.0234 pJ) and a Figure-of-Merit (FoM) of 0.365 at 100 kHz, 0.31 V, which compare well with the state of the art. |
Title | A Testable and Debuggable Dual-Core System with Thermal-Aware Dynamic Voltage and Frequency Scaling |
Author | Liang-Ying Lu, Ching-Yao Chang, Zhao-Hong Chen, Bo-Ting Yeh, Tai-Hua Lu, Peng-Yu Chen, *Pin-Hao Tang, Kuen-Jong Lee, Lih-Yih Chiou, Soon-Jyh Chang, Chien-Hung Tsai, Chung-Ho Chen, Jai-Ming Lin (Department of Electrical Engineering, National Cheng Kung University, Taiwan) |
Page | pp. 17 - 18 |
Keyword | dynamic voltage and frequency scaling (DVFS), test and debug platform |
Abstract | A sophisticated SoC chip that incorporates many design modules including 2 ARM-like CPUs, a dynamic voltage and frequency scaling (DVFS) design, a master/slave temperature sensing system, and an on-chip test/debug platform is developed and implemented with TSMC 90 nm technology. Measurement results validate the functions and efficiencies of the whole chip. |
Title | Rapid Prototyping of Multi-Mode QC-LDPC Decoder for 802.11n/ac Standard |
Author | *Qing Lu, Bruce C. W. Sham, Francis C. M. Lau (The Hong Kong Polytechnic University, Hong Kong) |
Page | pp. 19 - 20 |
Keyword | 802.11n/802.11ac, LDPC, multi-mode, FPGA |
Abstract | A multi-mode QC-LDPC decoder is proposed to satisfy the 802.11n/802.11ac WiFi standard. With a code-specific design technique, the overall performance of the decoder is enhanced by ensuring on-the-fly reconfigurability. The proposed architecture has been synthesized on an FPGA for measurements. |
Title | Sub-µW QRS Detection Processor Using Quadratic Spline Wavelet Transform and Maxima Modulus Pair Recognition for Power-Efficient Wireless Arrhythmia Monitoring |
Author | *Chio-In Ieong, Pui-In Mak, Mang-I Vai, Rui P. Martins (University of Macau, Macau) |
Page | pp. 21 - 22 |
Keyword | ASIC Design, Electrocardiogram, Local Signal Processor, Low Power Sensor Signal Processing, System-on-Chip |
Abstract | This paper describes a power-efficient processor for extracting the timing of the QRS complex from a digitized ECG, based on a hardware-efficient architecture of quadratic spline wavelet transform (QSWT) and maxima modulus pair recognition (MMPR). The processor reduces the wireless system’s power by 6×. |
Title | Design of an Energy-Autonomous, Disposable, Supply-Sensing Biosensor Using Bio Fuel Cell and 0.23-V 0.25-µm Zero-Vth All-Digital CMOS Supply-Controlled Ring Oscillator with Inductive Transmitter |
Author | *Kiichi Niitsu, Atsuki Kobayashi (Nagoya University, Japan), Yudai Ogawa, Matsuhiko Nishizawa (Tohoku University, Japan), Kazuo Nakazato (Nagoya University, Japan) |
Page | pp. 23 - 24 |
Keyword | energy-autonomous, biosensor, CMOS, all-digital, bio fuel cells |
Abstract | An energy-autonomous, disposable supply-sensing biosensor based on bio fuel cells and a 0.23-V 0.25-µm zero-Vth all-digital CMOS supply-controlled ring oscillator with a current-driven pulse-interval-modulated inductive-coupling transmitter was demonstrated. The all-digital, current-driven architecture using zero-Vth transistors enables low-voltage operation and a small footprint in cost-competitive legacy CMOS. Measured results with a 0.25-µm CMOS test chip demonstrated operation under a 0.23-V supply, which is the lowest supply voltage among reported proximity transmitters. An energy-autonomous biosensing operation using organic bio fuel cells was also demonstrated. |
Title | Performance-centric Register File Design for GPUs using Racetrack Memory |
Author | *Shuo Wang, Yun Liang, Chao Zhang, Xiaolong Xie, Guangyu Sun (Peking University, China), Yongpan Liu, Yu Wang (Tsinghua University, China), Xiuhong Li (Peking University, China) |
Page | pp. 25 - 30 |
Keyword | GPU, Performance, Register File, Racetrack Memory, Compiler |
Abstract | In this paper, we explore racetrack memory for designing a high-performance register file for GPU architectures. High-storage-density racetrack memory helps to improve thread-level parallelism, but the lengthy shift operation may significantly degrade performance. To mitigate the shift operation overhead, we develop a compile-time managed register mapping algorithm. Our algorithm optimizes the mapping of registers to physical addresses in the register file. Experimental results demonstrate that our technique achieves up to 24% (19% on average) improvement in performance for a variety of GPU applications. |
Title | Improving Read Performance of STT-MRAM based Main Memories through Smash Read and Flexible Read |
Author | Lei Jiang (Advanced Micro Devices, U.S.A.), Wujie Wen (Florida International University, U.S.A.), *Danghui Wang (Northwestern Polytechnical University, China), Lide Duan (University of Texas at San Antonio, U.S.A.) |
Page | pp. 31 - 36 |
Keyword | STT-MRAM, read disturbance, main memory, read scheme, LPDDR3 |
Abstract | Spin Transfer Torque Magnetoresistive RAM (STT-MRAM) has recently been deemed a promising main memory alternative for high-end mobile processors. With process technology scaling, the amplitude of write current approaches that of read current in deep sub-micrometer STT-MRAM arrays. As a result, read disturbance errors (RDEs) emerge. Both high current restore required (HCRR) reads and low current long latency (LCLL) reads can guarantee read reliability and completely remove RDEs. However, both of them degrade system performance because of extra restores or a longer read latency, and neither of them always achieves the better performance across a wide variety of applications. In this paper, we present two architectural techniques to boost read performance for STT-MRAM based main memories in the presence of RDEs. We first propose Smash Read (S-RD) to shorten the latency of HCRR reads by injecting a larger read current. We further introduce Flexible Read (F-RD) to dynamically adopt different types of read schemes, S-RD and LCLL, to maximize main memory system performance. On average, our techniques improve system performance by 9~13% and reduce total energy by 4~8% over all existing read schemes including HCRR and LCLL. |
Title | STLAC: A Spatial and Temporal Locality-Aware Cache and Network-on-Chip Codesign for Tiled Many-core Systems |
Author | *Mingyu Wang (Institute of Microelectronics, Tsinghua University, China), Zhaolin Li (Research Institute of Information Technology, Tsinghua University, China) |
Page | pp. 37 - 42 |
Keyword | Many-core, Adaptive Cache, Network-on-chip |
Abstract | The spatial and temporal locality of workloads is the fundamental reason that cache designs can overcome the memory wall problem. However, few existing state-of-the-art designs exploit both locality features to optimize the memory hierarchy in tiled many-core systems, which loses opportunities for further performance improvement. To address this problem, an adaptive spatial and temporal locality-aware cache and network-on-chip (NoC) codesign (STLAC) is proposed, which dynamically partitions the last level cache (LLC) as a data prefetch buffer or a victim cache for locality prediction and exploits a hybrid burst-support NoC for fast data prefetch. The data prefetch buffer speculates on the data blocks in subsequent addresses to exploit spatial locality, while the victim cache collects the evicted data blocks from the upper memory hierarchy to exploit temporal locality. By combining the proposed adaptive cache partition with the hybrid burst-support NoC, the off-chip misses and on-chip network usage are greatly reduced. Experimental results demonstrate that the proposed STLAC reduces off-chip misses by up to 43% and improves performance by 15% on average compared with the traditional shared LLC design. |
Title | A Lightweight OpenMP4 Run-time for Embedded Systems |
Author | Roberto E. Vargas, Sara Royuela, *Maria A. Serrano, Xavi Martorell, Eduardo Quiñones (Barcelona Supercomputing Center, Spain) |
Page | pp. 43 - 49 |
Keyword | OpenMP4, Parallel programming Models, Many-core embedded processors, Compiler Analysis, Task Dependency Graph |
Abstract | OpenMP is increasingly being adopted by current many-core embedded processors to exploit their parallel computation capabilities. Unfortunately, current run-time implementations of the latest specification (v4.0) are not suitable for processors relying on small and fast on-chip memories, due to their memory consumption. This paper proposes an OpenMP4 run-time that reduces the memory consumption while providing the same performance. Our run-time relies on a new compiler pass capable of generating the task dependency graph of OpenMP programs, which is then efficiently stored in memory. |
Title | Improving Tag Generation for Memory Data Authentication in Embedded Processor Systems |
Author | Tao Liu, *Hui Guo, Sri Parameswaran (The University of New South Wales, Australia), X. Sharon Hu (University of Notre Dame, U.S.A.) |
Page | pp. 50 - 55 |
Keyword | Tag Design, Memory Data Integrity Protection, Low Cost Embedded Systems |
Abstract | Data integrity is important. One way to protect data integrity is to attach an identifying tag to each data item. The authenticity of the data can then be checked against its tag. If the data is altered by an adversary, the related tag becomes invalid and the attack will be detected. This paper studies an existing tag design (CETD) for memory data in embedded processor systems, where data that are stored in memory or transferred over the bus can be tampered with and need to be authenticated before use. Compared to other designs, this design offers a flexible tradeoff between implementation cost and tag size (hence the level of security); the design is cost effective and can counter data integrity attacks with random values, namely attacks in which the fake values used to replace the valid data are random. However, we find that the design is vulnerable when the fake data is not randomly selected. For some data, the tags are not distributed over the full tag value space but rather limited to a reduced set of values. When one of those values is chosen as the fake value, the data alteration is likely to go undetected. In this paper, we analytically investigate this problem and propose a low cost enhancement to ensure the full-range distribution of tag values for each data item, hence effectively removing the vulnerability of the original design. |
Title | Maximizing Level of Confidence for Non-Equidistant Checkpointing |
Author | *Dimitar Nikolov, Erik Larsson (Lund University, Sweden) |
Page | pp. 62 - 68 |
Keyword | soft errors, reliability analysis, real-time systems, checkpointing |
Abstract | Employing fault tolerance often introduces a time overhead, which may cause a deadline violation in real-time systems (RTS). Therefore, for RTS it is important to optimize the fault tolerance techniques such that the probability of meeting the deadlines, i.e. the Level of Confidence (LoC), is maximized. Previous studies have focused on evaluating the LoC for equidistant checkpointing. However, no studies have addressed the problem of evaluating the LoC for non-equidistant checkpointing. In this work, we provide an expression to evaluate the LoC for non-equidistant checkpointing, and propose the Clustered Checkpointing method, which distributes a given number of checkpoints with the goal of maximizing the LoC. The results show that the LoC can be improved when non-equidistant checkpointing is used. |
Title | A Mutual Auditing Framework to Protect IoT against Hardware Trojans |
Author | Chen Liu, Patrick Cronin, *Chengmo Yang (University of Delaware, U.S.A.) |
Page | pp. 69 - 74 |
Keyword | hardware Trojan, cryptography, IoT security |
Abstract | In the Internet of Things (IoT), hardware Trojans, which are malicious modifications to a circuit, may be implanted in individual nodes and use the wireless connection facility to leak confidential information or to collude with each other. To defend against this threat, we develop a lightweight framework to detect Trojans with affordable performance and energy overhead. We propose to exploit message encryption and vendor diversity among the nodes to build a distributed mutual auditing framework wherein nodes monitor the trustworthiness of their neighbors. |
Title | Mask Optimization for Directed Self-Assembly Lithography: Inverse DSA and Inverse Lithography |
Author | *Seongbo Shim, Youngsoo Shin (KAIST, Republic of Korea) |
Page | pp. 83 - 88 |
Keyword | DSAL, mask optimization, inverse DSA, inverse lithography |
Abstract | In directed self-assembly lithography (DSAL), a mask contains the images of guide patterns (GPs), which are patterned on a wafer through optical lithography; the wafer then goes through a DSA process to pattern contacts. Mask design for DSAL, which reverses the above processes, consists of two key steps, inverse DSA and inverse lithography, which we address in this paper. |
Title | Cut Redistribution with Directed Self-Assembly Templates for Advanced 1-D Gridded Layouts |
Author | *Zhi-Wen Lin, Yao-Wen Chang (National Taiwan University, Taiwan) |
Page | pp. 89 - 94 |
Keyword | Directed self-assembly technology, 1-D layout, Design for manufacturability and reliability, Algorithm |
Abstract | Directed self-assembly (DSA) technology is a promising candidate for cut printing in sub-10nm 1-D gridded designs, where cuts might need to be redistributed such that they can be patterned by DSA guiding templates. In this paper, we first propose a linear-time optimal dynamic-programming-based algorithm for a special case of the template guided cut redistribution problem, where there is at most one dummy wire segment on a track. We then extend our algorithm to general cases by applying a bipartite matching algorithm to decompose a general problem into a set of subproblems conforming to the special case (thus each of them can be solved optimally). Our resulting algorithm can achieve a provably good performance bound, with the cost of a template distribution only linear in the problem size. Experimental results show that our algorithm can resolve all spacing rule violations, with smaller running times than previous works on a set of common benchmarks. |
Title | Contact Layer Decomposition To Enable DSA With Multi-patterning Technique For Standard Cell Based Layout |
Author | Zigang Xiao, Chun-Xun Lin, *Martin D.F. Wong (University of Illinois at Urbana-Champaign, U.S.A.), Hongbo Zhang (Synopsys Inc., U.S.A.) |
Page | pp. 95 - 102 |
Keyword | Design for Manufacturability, Directed Self-Assembly, Complementary Lithography, Layout Decomposition, Hybrid Lithography |
Abstract | Multiple patterning lithography has been widely adopted for today's circuit manufacturing. However, increasing the number of masks makes the manufacturing process more expensive. More importantly, toward the 7 nm technology node, the accumulated overlay in multiple patterning will cause unacceptable edge placement error (EPE). Recently, directed self-assembly (DSA) has been shown to be an effective lithography technology that can pattern contacts/vias/cuts with high throughput and low cost. DSA is currently aiming at 7 nm technology, where the guiding template generation needs either a double patterning EUV or a multiple patterning DUV process. By incorporating DSA into the multiple patterning process, it is possible to reduce the number of masks and achieve a cost-effective solution. In this paper, we study the decomposition problem for the contact layer in row-based standard cell layouts with DSA-MP complementary lithography. We explore several heuristic-based approaches, and propose an algorithm that decomposes a standard cell row optimally in polynomial time. Our experiments show that our algorithm is guaranteed to find a minimum cost solution if one exists, while the heuristics either fail or find only a sub-optimal solution. Our results show that the DSA-MP complementary approach is very promising for future advanced nodes. |
Title | (Invited Paper) Logic and Memory Design using Spin-based Circuits |
Author | *Zhaoxin Liang, Meghna Mankalale, Brandon Del Bel, Sachin S. Sapatnekar (University of Minnesota, U.S.A.) |
Page | pp. 103 - 108 |
Keyword | Spintronics, All-spin logic, ASL, MTJ, error correction |
Abstract | The design of logic and memory circuits in emerging spintronics technology offers fertile ground for new ideas and innovations. We first describe methods for optimizing spintronic logic circuits at the level of physical design, including systematic approaches for building standard cell libraries to enable the design of large circuits. Next, we examine issues in the design of spintronic memories and present methods that trade off volatility with error correction to build dense memory arrays. |
Title | (Invited Paper) Architecture Design with STT-RAM: Opportunities and Challenges |
Author | Ping Chi, Shuangchen Li, Yuanqing Cheng (University of California at Santa Barbara, U.S.A.), Yu Lu, Seung H. Kang (Qualcomm Incorporated, U.S.A.), *Yuan Xie (University of California at Santa Barbara, U.S.A.) |
Page | pp. 109 - 114 |
Keyword | STT-RAM, cache design, memory design |
Abstract | The emerging STT-RAM has attracted a lot of interest from both academia and industry in recent years. It is considered a promising replacement for SRAM and DRAM in cache and memory system design thanks to its many advantages. However, the disadvantages of STT-RAM also bring design challenges. This paper introduces state-of-the-art architectural approaches to adopting STT-RAM in cache and memory system design that take advantage of the opportunities brought by STT-RAM while overcoming the challenges. |
Title | (Invited Paper) Prospects of Efficient Neural Computing with Arrays of Magneto-metallic Neurons and Synapses |
Author | Abhronil Sengupta, Karthik Yogendra, Deliang Fan, *Kaushik Roy (Purdue University, U.S.A.) |
Page | pp. 115 - 120 |
Keyword | Neuromorphic Computing, Spintronics |
Abstract | Non-von Neumann computing models, like Artificial and Spiking Neural Networks, inspired by the functionalities of the human brain, require devices that can offer a direct mapping to the underlying neuroscience mechanisms for energy-efficient and compact hardware implementation. To that effect, spin-transfer torque phenomena in devices based on lateral spin valves, domain wall motion in magnets and magnetic tunnel junctions can potentially pave the way for spintronic neural computing systems, where spintronic neurons interfaced with spintronic synapses can directly mimic biological neural and synaptic functionalities. We explore various device structures suitable for such non-Boolean functionalities and demonstrate the potential benefits of such neural computing based on arrays of magneto-metallic neurons and synapses. |
Title | A Complete Approach to Unreachable State Diagnosability via Property Directed Reachability |
Author | *Ryan Berryhill, Andreas Veneris (University of Toronto, Canada) |
Page | pp. 127 - 132 |
Keyword | diagnosis, debugging, reachability, pdr, ic3 |
Abstract | In modern hardware design, substantial manual effort is required to fix a design when verification finds a state to be unreachable. This paper addresses this growing pain: given an unreachable target state, it presents a methodology that returns all design locations where a change can be implemented to make the target state reachable. In contrast to previous state reachability rectification techniques that use bounded model checking, our approach addresses the issue using unbounded model checking. It first enhances the circuit transition relation by inserting a novel error model construction at each suspect location. An unbounded model checking algorithm is then applied to the enhanced transition relation to find which of the suspect locations can be changed to make the target state reachable. The use of unbounded model checking allows it to identify the complete solution set of the problem. As an added benefit, it also returns a proof, in the form of an inductive invariant, that no further solutions exist. Empirical results on industrial designs confirm the theoretical and practical gains of this approach. |
Title | Formally Analyzing Fault Tolerance in Datapath Designs using Equivalence Checking |
Author | Payman Behnam (University of Tehran, Iran), Bijan Alizadeh (University of Tehran, and IPM, Iran), Sajjad Taheri (University of Tehran, Iran), *Masahiro Fujita (University of Tokyo, Japan) |
Page | pp. 133 - 138 |
Keyword | Formal Verification, Equivalence checking, Fault tolerance, Decision Diagrams |
Abstract | In this paper, we present an efficient formal approach to check the equivalence of synthesized Register Transfer Level (RTL) against the high level specification in the presence of pipelining transformations. With the proposed equivalence checking method, fault tolerance issues when some faults happen in the designs can be formally analyzed. Equivalence checking with the specification can reason about how quickly the design can come back to normal operations when some faults including soft errors happen. To increase the scalability of our proposed method, we dynamically divide the designs into several smaller parts called segments by introducing dynamic cut-points. Then we employ Modular Horner Expansion Diagram (M-HED) to check whether the specification and the implementation are equivalent or not. Our proposed method enables us to deal with the equivalence checking problem for behaviorally synthesized designs even in the presence of pipelines for nested loops. The empirical results demonstrate the efficiency and scalability of our proposed method in terms of run-time and memory usage for several large designs synthesized by a commercial behavioral synthesis tool. Average improvements in terms of the memory usage and run time in comparison with SMT- and SAT-based equivalence checking are 16.7× and 111.9×, respectively. |
Title | Coupling Reverse Engineering and SAT to Tackle NP-Complete Arithmetic Circuitry Verification in ~O(# of gates) |
Author | *Yi Diao, Xing Wei (Easy-Logic Technology Limited, Hong Kong), Tak-Kei Lam (The Chinese University of Hong Kong, Hong Kong), Yu-Liang Wu (Easy-Logic Technology Limited, Hong Kong) |
Page | pp. 139 - 146 |
Keyword | SAT, Multiplier, Arithmetic logic, Macro, Formal verification |
Abstract | There are situations (e.g. in reverse engineering or formal verification) in which circuit designers need to extract complicated arithmetic circuitry deeply embedded inside a fully synthesized (or manually touched) million-gate flattened netlist without knowledge of module boundaries and I/O positions. Besides the unknown I/O and boundaries, a formal verification task like comparing two netlists implementing (4A+3B)×C and 4A×C+3B×C respectively is quite challenging because it is also an NP-Complete Circuit-SAT problem. To tackle this problem, we propose a novel Complementary Greedy Coupling (CGC) approach that couples reverse engineering and SAT techniques, because each of them performs well only at proving equality or inequality, respectively. The scheme is quite powerful, being able to handle commonly implemented arithmetic modules (Ripple/CLA adders, MUX, various multipliers and their combinations) with runtime complexity nearly linear in the number of circuit gates. For example, our scheme can verify two 32-bit multipliers (Wallace vs Modified-Booth) within 5 seconds (regardless of their equality or inequality), while running SAT alone might take 10^10 centuries. We compared our tool Easy-LEC with two commercial tools on the market using the 182 open benchmarks posted for the ICCAD CAD Contest 2014. Besides running at least 400 to 1400 times faster, our scheme also solves 32% to 45% more cases (93% vs 61% or 48%). |
Title | NVPsim: A Simulator for Architecture Explorations of Nonvolatile Processors |
Author | Yizi Gu, *Yongpan Liu, Yiqun Wang, Hehe Li, Huazhong Yang (Tsinghua University, China) |
Page | pp. 147 - 152 |
Keyword | nonvolatile processor, simulator, architecture exploration |
Abstract | Nonvolatile processors (NVPs) preserve run-time information when a power failure occurs by utilizing nonvolatile memory technologies. This feature enables NVPs to make forward progress continuously under an intermittent power supply in energy harvesting systems. This paper builds a gem5-based NVP simulator named NVPsim, which is validated against measured results of a fabricated prototype with a reasonable error rate. Furthermore, to demonstrate the capability of NVPsim for architecture exploration, we evaluate the performance and energy consumption of different NVP designs varying in the choice of nonvolatile memory for on-chip caches, the backup strategy and the energy buffer size. Experimental results indicate that nvSRAM outperforms other types of nonvolatile memory as the on-chip cache for energy harvesting systems. |
Title | MCSSim: A Memory Channel Storage Simulator |
Author | *Renhai Chen, Zili Shao (The Hong Kong Polytechnic University, Hong Kong), Chia-Lin Yang (National Taiwan University, Taiwan), Tao Li (University of Florida, U.S.A.) |
Page | pp. 153 - 158 |
Keyword | MCSSim, NVDIMM, Memory Channel Storage |
Abstract | Recently, NVDIMM (Non-Volatile Dual In-line Memory Module) has become widely supported by leading hardware design companies, such as IBM. Nevertheless, existing efforts largely focus on NVDIMM specification and fabrication issues, and the potential performance gains brought by NVDIMM have not been fully investigated. In this paper, we present an NVDIMM-based simulator called MCSSim to help study memory channel storage techniques. MCSSim is a cycle-accurate simulator developed with consideration of the differences between the memory channel interface and NAND flash memory features. MCSSim is also implemented with the DRAMSim2 [30] simulator, thus enabling the simulation of a variety of hybrid memory systems that combine DRAM DIMM and NVDIMM. We have conducted experiments with MCSSim, and the experimental results show the effectiveness of the proposed simulator. |
Title | Trace-Based Context-Sensitive Timing Simulation Considering Execution Path Variations |
Author | *Sebastian Ottlik, Jan Micha Borrmann, Sadik Asbach, Alexander Viehl (FZI Research Center for Information Technology, Germany), Wolfgang Rosenstiel, Oliver Bringmann (University of Tübingen, Germany) |
Page | pp. 159 - 165 |
Keyword | Software Timing Simulation, Instruction Set Simulation, Software Performance Analysis |
Abstract | We present a fast and accurate timing simulation of binary code execution on complex embedded processors.
Underlying block timings are extracted from a preceding hardware execution and differentiated by execution context.
Thereby, complex factors, such as caches, can be reflected accurately without explicit modelling.
Based on timings observed in one hardware execution, timing of numerous other executions for different inputs can be simulated at an average error below 5% for complex applications on an ARM Cortex-A9 processor. |
Title | Generating High Coverage Tests for SystemC Designs Using Symbolic Execution |
Author | *Bin Lin, Zhenkun Yang, Kai Cong, Fei Xie (Portland State University, U.S.A.) |
Page | pp. 166 - 171 |
Keyword | SystemC, Test Generation, Symbolic Execution, Coverage |
Abstract | In this research, we have developed an approach to generating high coverage tests for SystemC designs using symbolic execution. We have applied this approach to a representative set of SystemC designs. The results show that our approach is able to generate tests that provide high code coverage of the designs with modest time and memory usage and to scale to designs of practical sizes. |
Title | Circular-Contour-Based Obstacle-Aware Macro Placement |
Author | *Chien-Hsiung Chiou, Chin-Hao Chang, Szu-To Chen, Yao-Wen Chang (National Taiwan University, Taiwan) |
Page | pp. 172 - 177 |
Keyword | VLSI, Physical Design, Macro Placement, Obstacle |
Abstract | We present an obstacle-aware macro placement algorithm which locates macros to simultaneously optimize wirelength and routability. We propose a circular contour to characterize the region formed by all obstacles. With the circular contour, we can effectively avoid the overlap between movable macros and obstacles, and simultaneously optimize the shape and area of the region for standard-cell placement.
Experimental results show that our algorithm can achieve the best quality, compared to manual designs provided by industry and leading academic mixed-size placers. |
Title | Learning-Based Prediction of Embedded Memory Timing Failures During Initial Floorplan Design |
Author | Wei-Ting J. Chan (UC San Diego, U.S.A.), Kun Young Chung (Samsung Electronics Co. Ltd., Republic of Korea), Andrew B. Kahng (UC San Diego, U.S.A.), Nancy D. MacDonald (ClariPhy Communications, U.S.A.), *Siddhartha Nath (UC San Diego, U.S.A.) |
Page | pp. 178 - 185 |
Keyword | Floorplan, multiphysics, machine learning, Boosting, timing |
Abstract | Embedded memories are critical in SoC designs as they pose challenges in timing-correctness in advanced technology nodes. We propose a learning-based methodology to perform early prediction of timing failure risk given only the netlist, timing constraints and floorplan context.
We save long runtimes of P&R tools with early prediction. Our methodology identifies which memories are at “risk”, and provides guidance for floorplan changes to reduce predicted “risk”. We can predict slack to within 200ps with only floorplan information. |
Title | Stitch Aware Detailed Placement for Multiple E-Beam Lithography |
Author | Yibo Lin (University of Texas at Austin, U.S.A.), *Bei Yu (Chinese University of Hong Kong, Hong Kong), Yi Zou (University of Texas at Austin, U.S.A.), Zhuo Li, Charles J. Alpert (Cadence Design Systems, Inc., U.S.A.), David Z. Pan (University of Texas at Austin, U.S.A.) |
Page | pp. 186 - 191 |
Keyword | Multiple Electron Beam Lithography, Stitch Error, Detailed Placement, Dynamic Programming |
Abstract | As a promising candidate for next generation lithography, multiple e-beam lithography (MEBL) is able to improve manufacturing throughput using parallel beam printing. In MEBL, a layout is split into stripes and the layout patterns are cut by stripe boundaries, then all the stripes are printed in parallel.
If a via pattern or a vertical long wire overlaps with a stitch, it may suffer from poor printing quality due to the so-called stitch error, and the circuit performance may be degraded. In this paper, we present a comprehensive study of stitch-aware detailed placement to simultaneously minimize the stitch error and optimize other traditional objectives, e.g., wirelength and density. Experimental results show that our algorithms are very effective on modified ICCAD 2014 benchmarks: zero stitch error is guaranteed, while the scaled half-perimeter wirelength is comparable to that of a state-of-the-art detailed placer. |
Title | Minimum Implant Area-Aware Placement and Threshold Voltage Refinement |
Author | Seong-I Lei, *Wai Kei Mak (National Tsing Hua University, Taiwan), Chris Chu (Iowa State University, U.S.A.) |
Page | pp. 192 - 197 |
Keyword | Detailed placement, Threshold Voltage Assignment, Implant area |
Abstract | Threshold voltage assignment is a very effective technique to reduce leakage power consumption in modern integrated circuit (IC) design. As feature size continues to decrease, the layout constraints (called MinIA constraints) on the implant area, which determines the threshold voltage of a device, are becoming increasingly difficult to satisfy. It is necessary to take these constraints into consideration during the layout stage. In this paper, we propose to resolve the MinIA constraint violations by a simultaneous detailed placement and threshold voltage refinement approach. We present an optimal and efficient mixed integer-linear programming (MILP)-based algorithm which guarantees to fix all MinIA constraint violations. Experimental results demonstrate that our algorithm only perturbs the original placement and threshold voltage assignment solutions minimally to eliminate all violations and is fast in practice. |
Title | (Invited Paper) High-Level Synthesis of Accelerators in Embedded Scalable Platforms |
Author | Paolo Mantovani, Giuseppe Di Guglielmo, *Luca P. Carloni (Columbia University, U.S.A.) |
Page | pp. 204 - 211 |
Keyword | SoC, system-level design, high-level synthesis, accelerators, embedded scalable platforms |
Abstract | Embedded scalable platforms combine a flexible socketed architecture for heterogeneous system-on-chip (SoC) design and a companion system-level design methodology. The architecture supports the rapid integration of processor cores with many specialized hardware accelerators. The methodology simplifies the design, integration, and programming of the heterogeneous components in the SoC. In particular, it raises the level of abstraction in the design process and guides designers in the application of high-level synthesis (HLS) tools. HLS enables a more efficient design of accelerators with a focus on their algorithmic properties, a broader exploration of their design space, and a more productive reuse across many different SoC projects. |
Title | (Invited Paper) High Quality IP Design using High-Level Synthesis Design Flow |
Author | *Qiang Zhu (Cadence Design Systems, Japan), Masato Tatsuoka (Socionext Inc., Japan) |
Page | pp. 212 - 217 |
Keyword | High Level Synthesis, IP designs, Physically Aware |
Abstract | In this paper we will describe practical experiences about the use of high-level synthesis technologies to achieve higher performance, higher quality, and lower power for IP designs as compared to traditional RTL design. We will demonstrate how the introduction of three key techniques, interface-based design, architectural exploration and congestion-aware high-level synthesis, were utilized to achieve higher quality IP designs. In real application results, we will show significantly better QoR (Quality-of-Results) using high-level synthesis than the traditional RTL design flow by utilizing the above three key technologies. |
Title | (Invited Paper) Designing High-Quality Hardware on a Development Effort Budget: A Study of the Current State of High-Level Synthesis |
Author | Zelei Sun, Keith Campbell, Wei Zuo (UIUC, U.S.A.), Kyle Rupnow, Swathi Gurumani (ADSC, Singapore), Frederic Doucet (Qualcomm, U.S.A.), *Deming Chen (UIUC, U.S.A.) |
Page | pp. 218 - 225 |
Keyword | High-level synthesis, evaluation, coding guidances, optimization, hardware design |
Abstract | High-level synthesis (HLS) promises high-quality hardware with minimal development effort. In this paper, we evaluate the current state-of-the-art in HLS and design techniques based on software references and architecture references. We present a software reference study developing a JPEG encoder from pre-existing software, and an architecture reference study developing an AES block encryption module from scratch in SystemC and SystemVerilog based on a desired architecture. Additionally, we develop micro-benchmarks to demonstrate best-practices in C coding styles that produce high-quality hardware with minimal development effort. Finally, we suggest language, tool, and methodology improvements to improve upon the current state-of-the-art in HLS. |
Title | Speed Binning With High-Quality Structural Patterns From Functional Timing Analysis (FTA) |
Author | *Louis Y.-Z. Lin, Charles H.-P. Wen (Dept. of Elec. Comp. Engr., National Chiao Tung University, Taiwan) |
Page | pp. 238 - 243 |
Keyword | speed-binning, FTA |
Abstract | The operating speed of a chip decides its price in the nanometer era. Thus, design companies require high-quality speed binning to maximize their profits. They usually rely on legacy (i.e., structural) tests for speed binning since functional tests are too expensive to derive. Besides legacy and functional tests, recent studies tried to apply the notion of delay testing for deriving speed-binning patterns; nevertheless, none of them could determine the number of patterns while taking process variation into consideration. Therefore, in this paper, we propose a speed-binning pattern generation (SBPG) method to deterministically generate a high-quality pattern set for speed binning. SBPG mainly consists of two core techniques: (1) empirical variation sampling (EVS) and (2) functional timing analysis (FTA), which efficiently derive a few high-quality patterns from a small number of learning samples. Finally, in experimental results, SBPG achieves a satisfactory accuracy (> 99% on average) for five benchmark circuits under various conditions of process variation, and is shown to be an efficient solution for speed binning. |
Title | Electromigration Recovery Modeling and Analysis under Time-Dependent Current and Temperature Stressing |
Author | Xin Huang (University of California, Riverside, U.S.A.), Valeriy Sukharev (Mentor Graphics Corporation, U.S.A.), Taeyoung Kim (University of California, Riverside, U.S.A.), Haibao Chen (Shanghai Jiao Tong University, China), *Sheldon X.-D. Tan (University of California, Riverside, U.S.A.) |
Page | pp. 244 - 249 |
Keyword | EM, reliability, recovery, analytical model |
Abstract | Electromigration (EM) has been considered to be the major reliability issue for current and future VLSI technologies. Current EM reliability analysis is overloaded by over-conservative and simplified EM models. Particularly the transient recovery effect in the EM-induced stress evolution kinetics has never been treated properly in all the existing analytical EM models. In this article, we propose a new physics-based dynamic compact EM model, which for the first time, can accurately predict the transient hydrostatic stress recovery effect in a confined metal wire. The new dynamic EM model is based on the direct analytical solution of one-dimensional Korhonen’s equation with load driven by any unipolar or bipolar current waveforms under varying temperature. We show that the EM recovery effect can be quite significant even under unidirectional current loads. This healing process is sensitive to temperature, and higher temperatures lead to faster and more complete recovery. Such effect can be further exploited to significantly extend the lifetime of the interconnect wires if the chip current or power can be properly regulated and managed. As a result, the new dynamic EM model can be incorporated with existing dynamic thermal/power/reliability management and optimization approaches, devoted to reliability-aware optimization at multiple system levels (chip/server/rack/data centers). Presented results show that the proposed EM model agrees very well with the numerical analysis results under any time-varying current density and temperature profiles. |
Title | A Novel Low-Cost Dynamic Logic Reconfigurable Structure Strategy for Low Power Optimization |
Author | *Yu-Guang Chen, Wan-Yu Wen, Yun-Ting Wang, You-Luen Lee, Shih-Chieh Chang (National Tsing Hua University, Taiwan) |
Page | pp. 250 - 255 |
Keyword | DVFS, Low Power Design, Dynamic Logic Reconfigurable Structure |
Abstract | Low power design techniques have been extensively applied in modern IC designs to avoid negative side effects from high power density. Unlike Dynamic Voltage and/or Frequency Scaling (DVFS) approaches only applied on a “fixed” design, we propose a dynamic logic reconfigurable structure strategy which allows dynamic switching from a high speed/power logic structure to a low speed/power logic structure. A design with such configurable structure is called Dynamic Logic Reconfigurable Structure (DLRS). Different from approximate computing which trades off between computation accuracy and power, our DLRS designs maintain data integrity. In this paper, we propose novel low-cost DLRS adders and multipliers, and a comprehensive framework for low power designs. We further integrate DLRS with DVFS, which creates more flexibility to trade-off between performance and power consumption. Experimental results show that with DLRS adders and multipliers in three indoor designs, the proposed method can achieve up to 60.05% power reduction compared with traditional DVFS scheme with only 6.55% area overhead. |
Title | An Energy-Efficient Random Number Generator for Stochastic Circuits |
Author | *Kyounghoon Kim (Seoul National University, Republic of Korea), Jongeun Lee (UNIST, Republic of Korea), Kiyoung Choi (Seoul National University, Republic of Korea) |
Page | pp. 256 - 261 |
Keyword | Stochastic computing, stochastic number generator, energy-efficient design, approximate computing |
Abstract | Stochastic circuits provide very high efficiency in terms of gate area and power consumption compared with conventional binary logic. However, they require random bit streams generated by stochastic number generators (SNGs), which account for a significant portion of area and energy offsetting their merits. In this paper, we propose a new SNG that significantly reduces area and energy while improving accuracy in progressive precision. Experimental results show that the proposed SNG reduces energy by more than 72% compared to the state-of-the-art designs. |
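To make the role of a stochastic number generator more concrete, here is a minimal software sketch of conventional unipolar stochastic computing, in which a value in [0, 1] is encoded as the fraction of 1s in a bit stream and a single AND gate multiplies two independent streams. It models a generic comparator-style SNG with a software pseudo-random source and is not the SNG proposed in the paper. The function names, stream length, and seed are assumptions.

```python
import random


def to_stream(p, n, rng):
    """Model of a comparator-style SNG: emit 1 when a random draw falls below p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def stream_value(bits):
    """Decode a unipolar stream: the value is the fraction of 1s."""
    return sum(bits) / len(bits)


def sc_multiply(x, y, n=4096, seed=0):
    """Multiply x and y (both in [0, 1]) by AND-ing two independent bit streams."""
    rng = random.Random(seed)
    sx = to_stream(x, n, rng)
    sy = to_stream(y, n, rng)
    return stream_value([a & b for a, b in zip(sx, sy)])
```

The result is only an estimate whose error shrinks as the stream length `n` grows, which is the sense in which longer streams buy precision. In hardware, the two streams need separate random sources, which is one reason SNGs can dominate area and energy.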
Title | Design of an All-Digital Temperature Sensor in 28 nm CMOS Using Temperature-Sensitive Delay Cells and Adaptive-1P Calibration for Error Reduction |
Author | Shang-Yi Li, *Pei-Yuan Chou, Jinn-Shyan Wang (Chung-Cheng University, Taiwan) |
Page | pp. 262 - 267 |
Keyword | temperature sensor, all digital, calibration, zero temperature coefficient, process variation |
Abstract | We describe design techniques, calibration method, and measurement results of an all-digital temperature sensor in 28 nm CMOS. To deal with the issue of Vcc being near the zero-temperature-coefficient point, a new delay cell with much improved temperature sensitivity is proposed. Adaptive 1-point (1P) calibration is proposed to reduce the serious impact of process variations without increasing the calibration cost. Measurement results show that, compared to the conventional 1P calibration, the new method achieves a 32% error reduction. |
Title | Design and Allocation of Loosely Coupled Multi-bit Flip-flops for Power Reduction in Post-Placement Optimization |
Author | *Hyoungseok Moon, Taewhan Kim (Seoul National University, Republic of Korea) |
Page | pp. 268 - 273 |
Keyword | Flip-flop allocation, clock power, Post-placement |
Abstract | Recently, allocating multi-bit flip-flops (MBFFs) as opposed to 1-bit flip-flops has been recognized as one of the effective design optimization techniques to reduce clock power. This work tries to eliminate timing and area constraints so that the full benefit of multi-bit flip-flops can be reaped. Precisely, rather than using the conventional structure of multi-bit flip-flops, we introduce a new style of multi-bit flip-flop, called loosely coupled multi-bit flip-flop (LC-MBFF). Utilizing LC-MBFFs, we propose a routability and clock-tree driven multi-bit flip-flop allocation algorithm, which fully explores the diverse allocation of LC-MBFF structures to maximally reduce clock power consumption. |
Title | Thermal Optimization for Memristor-Based Hybrid Neuromorphic Computing Systems |
Author | Chi-Ruo Wu (National Cheng Kung University, Taiwan), Wei Wen (University of Pittsburgh, U.S.A.), *Tsung-Yi Ho (National Tsing Hua University, Taiwan), Yiran Chen (University of Pittsburgh, U.S.A.) |
Page | pp. 274 - 279 |
Keyword | Neuromorphic, Memristor, Thermal |
Abstract | Neuromorphic computing is used to accelerate the computation of neural networks, which simulate the animal brain and are composed of neurons and synapses. However, neuromorphic computing on a traditional computer architecture leads to a serious von Neumann bottleneck because of the gap between high-frequency CPU computation and memory access. The emerging memristor is an innovative technology for future VLSI circuits that can potentially act as both a data storage and a computing unit to transform the computer architecture. Furthermore, the characteristics of memristors include low programming energy, parallel processing, small footprint, non-volatility, etc., which have attracted significant research on neuromorphic computing. However, some important issues such as thermal damage affect the reliability of memristors. High temperature in memristors is a critical issue that impacts the reliability of the systems. To estimate the temperature of a memristor, we formulate the thermal problem as a power consumption problem. In this paper, a thermal optimization algorithm for memristor-based hybrid neuromorphic computing systems is proposed to solve the reliability issue by incremental cluster network flow. Our results show that the maximum power consumption can be reduced by about 31%. |
Title | An Energy-efficient Matrix Multiplication Accelerator by Distributed In-memory Computing on Binary RRAM Crossbar |
Author | *Leibin Ni, Yuhao Wang, Hao Yu (Nanyang Technological University, Singapore), Wei Yang, Chuliang Weng, Junfeng Zhao (Shannon Laboratory, Huawei Technologies Co., Ltd, China) |
Page | pp. 280 - 285 |
Keyword | RRAM, In-memory architecture |
Abstract | Emerging resistive random-access memory (RRAM) can provide not only non-volatile memory storage but also intrinsic logic for matrix-vector multiplication, which is ideal for a low-power and high-throughput data analytics accelerator performed in memory. However, existing RRAM-based computing devices mainly assume multi-level analog computing, whose result is sensitive to process non-uniformity and which incurs additional AD-conversion and I/O overhead. This paper explores a data analytics accelerator on a binary RRAM-crossbar. Accordingly, a distributed in-memory computing architecture is proposed, together with the design of the corresponding components and control protocol. Both the memory array and the logic accelerator can be implemented by RRAM-crossbar purely in binary, where logic-memory pairs can be distributed with a control-bus protocol. Based on numerical results for fingerprint matching mapped on the proposed RRAM-crossbar, the proposed architecture has shown 2.86x faster speed, 154x better energy efficiency, and 100x smaller area when compared to the same design implemented as a CMOS-based ASIC. |
Title | A Racetrack Memory Based In-memory Booth Multiplier for Cryptography Application |
Author | *Tao Luo (Nanyang Technological University, Singapore), Wei Zhang (Hong Kong University of Science and Technology, Hong Kong), Bingsheng He, Douglas Maskell (Nanyang Technological University, Singapore) |
Page | pp. 286 - 291 |
Keyword | Racetrack memory, RSA, Multiplier, Adder |
Abstract | Security is an important concern in cloud computing nowadays. RSA is one of the most popular asymmetric encryption algorithms and is widely used in internet-based applications for the advantage of its public-key strategy over symmetric encryption algorithms. However, the RSA encryption algorithm is very compute-intensive, which affects the speed and power efficiency of the applications that use it. Racetrack Memory (RM) is a newly introduced, promising technology for future storage and memory systems, which is well suited to memory-intensive scenarios because of its high data density. However, novel designs should be applied to exploit the advantages of RM while avoiding the adverse impact of its sequential access mechanism. In this paper, we present an in-memory Booth multiplier based on racetrack memory to alleviate this problem. As the building block of our multiplier, a racetrack memory based adder is proposed, which saves 56.3% power compared with the state-of-the-art magnetic adder. Integrated with the storage element, our proposed multiplier shows great efficiency in area, power and scalability. |
Title | Look-ahead Schemes for Nearest Neighbor Optimization of 1D and 2D Quantum Circuits |
Author | *Robert Wille (Johannes Kepler University Linz, Austria), Oliver Keszocze (DFKI GmbH, Germany), Marcel Walter, Patrick Rohrs (University of Bremen, Germany), Anupam Chattopadhyay (Nanyang Technological University, Singapore), Rolf Drechsler (University of Bremen, Germany) |
Page | pp. 292 - 297 |
Keyword | quantum circuits, nearest neighbor, technology mapping |
Abstract | Ensuring nearest neighbor compliance of quantum circuits by inserting SWAP gates has heavily been considered in the past. Here, quantum gates are considered which work on non-adjacent qubits. SWAP gates are applied in order to “move” these qubits onto adjacent positions. However, a decision how exactly the SWAPs are “moved” has mainly been made without considering the effect a “movement” of qubits may have on the remaining circuit. In this work, we propose a methodology for nearest neighbor optimization which addresses this problem by means of a look-ahead scheme. To this end, two representative implementations are presented and discussed in detail. Experimental evaluations show that, in the best case, reductions in the number of SWAP gates of 56% (compared to the state-of-the-art methods) can be achieved following the proposed methodology. |
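For readers unfamiliar with the problem, the sketch below counts the SWAP gates that a naive, gate-by-gate strategy would insert on a one-dimensional qubit line: whenever a two-qubit gate acts on non-adjacent qubits, the control is swapped step by step toward the target, and the resulting permutation is simply kept for later gates. This is a baseline without any look-ahead, not the methodology proposed in the paper. The function name and the circuit representation (a list of control/target pairs) are assumptions.

```python
def naive_swap_count(gates, num_qubits):
    """Count SWAPs inserted by a greedy scheme with no look-ahead on a 1D line.

    gates: list of (control, target) logical qubit pairs, in circuit order.
    """
    # position[q] is the current location of logical qubit q on the line.
    position = list(range(num_qubits))
    swaps = 0
    for control, target in gates:
        while abs(position[control] - position[target]) > 1:
            # Move the control one step toward the target.
            step = 1 if position[target] > position[control] else -1
            new_pos = position[control] + step
            neighbor = position.index(new_pos)  # qubit currently at new_pos
            position[neighbor], position[control] = position[control], new_pos
            swaps += 1
    return swaps
```

For the circuit `[(0, 3), (0, 1)]` on four qubits, the first gate needs two SWAPs and leaves qubit 0 two positions away from qubit 1, so the second gate costs one more. A look-ahead scheme would weigh such downstream effects before choosing which qubit to move.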
Title | (Keynote Address) Systems of Systems - The Next Frontier of Semiconductor |
Author | *Qi Wang (Cadence Design Systems, Inc., U.S.A.) |
Keyword | Keynote |
Abstract | Most of today’s most exciting new electronic products are not single-function, standalone devices, but rather are multi-function system devices, composed of subsystems, and connected into even larger systems. Being at the core of any electronic system, the semiconductor technology is going through a sea change where tackling the traditional semiconductor issues such as timing, power, and performance becomes insufficient. Additional challenges include time-to-market, functional partitioning, communications protocols, IP selection, hardware-software verification, reliability, safety, and many others. In this presentation, the presenter will summarize the design challenges and highlight some solutions for system design enablement in this increasingly complex environment. |
Title | (Invited Paper) Energy-Efficient System Design for IoT Devices |
Author | Hrishikesh Jayakumar, Arnab Raha, Younghyun Kim, Soubhagya Sutar, Woo Suk Lee, *Vijay Raghunathan (Purdue University, U.S.A.) |
Page | pp. 298 - 301 |
Keyword | Internet of Things, IoT, low power design, energy efficient design |
Abstract | It is projected that, within the coming decade, there will be more than 50 billion smart objects connected to the Internet of Things (IoT). These smart objects, which connect the physical world with the world of computing infrastructure, are expected to pervade all aspects of our daily lives and revolutionize a number of application domains such as healthcare, energy conservation, transportation, etc. In this paper, we present an overview of the challenges involved in designing energy-efficient IoT edge devices and describe recent research that has proposed promising solutions to address these challenges. First, we outline the challenges involved in efficiently supplying power to an IoT device. Next, we discuss the role of emerging memory technologies in making IoT devices energy-efficient. Finally, we discuss the potential impact that approximate computing can have in increasing the energy-efficiency of wearables and other compute-intensive IoT devices. |
Title | (Invited Paper) Energy Delivery for Self-Powered IoT Devices |
Author | Khondker Z. Ahmed, Monodeep Kar, *Saibal Mukhopadhyay (Georgia Tech, U.S.A.) |
Page | pp. 302 - 307 |
Keyword | IoT power delivery, bias gating, high conversion ratio, boost, buck |
Abstract | Distributed small-scale electronics for IoT applications are on the rise. Power delivery for such electronics requires innovative design techniques to improve energy efficiency. This paper summarizes energy delivery challenges for IoT devices and discusses several design techniques for efficient power delivery units. Such design solutions cover challenges like energy harvesting from very low input voltage, maximized energy harvesting, energy delivery with multiple voltage domains and design using low voltage devices to sustain higher than breakdown voltages. |
Title | (Invited Paper) Efficient Embedded Learning for IoT Devices |
Author | Swagath Venkataramani, Kaushik Roy, *Anand Raghunathan (Purdue University, U.S.A.) |
Page | pp. 308 - 311 |
Keyword | Internet of things, Machine learning, Accelerators, Approximate computing, Spintronic Devices |
Abstract | The pervasiveness of IoT devices will usher in an unprecedented growth in the amount of digital data produced and consumed. Realizing the rich class of applications enabled by IoT devices requires large-scale machine learning systems to make sense of the raw data and derive meaningful, actionable information. State-of-the-art machine learning algorithms are highly compute and data intensive, posing significant computational challenges across the spectrum of computing devices, from low-power client devices to the cloud. As benefits due to semiconductor technology scaling diminish, addressing the computational gap requires identifying new sources of computing efficiency. In this paper, we highlight three approaches, viz. machine learning accelerators, approximate computing and post-CMOS technologies, that demonstrate significant promise in bridging the efficiency gap. |
Title | (Invited Paper) Computing with Coupled Spin Torque Nano Oscillators |
Author | Karthik Yogendra (Purdue University, U.S.A.), Deliang Fan (University of Central Florida, U.S.A.), Yong Shim, Minsuk Koo, *Kaushik Roy (Purdue University, U.S.A.) |
Page | pp. 312 - 317 |
Keyword | coupled oscillators, frequency locking, non-Boolean computation, spin torque nano oscillators, spin transfer torque |
Abstract | This paper gives an overview of coupled oscillators and how such oscillators can be efficiently used to perform computations that are unsuitable or inefficient in von-Neumann computing models. The “unconventional computing” ability of coupled oscillatory system is demonstrated through Spin Torque Nano Oscillators (STNOs). Recent experiments on STNOs have demonstrated their frequency of oscillation in few tens of gigahertz range, operating at low input currents. These attractive features and the ability to obtain frequency locking using a variety of techniques, make STNOs an attractive candidate for non-Boolean computing. We discuss coupled STNO systems for applications such as edge detection of an image, associative computing, determination of L2 norm for distance calculation, and pattern recognition. |
Title | ApproxMap: On Task Allocation and Scheduling for Resilient Applications |
Author | *Juan Yi (Chongqing University, China), Qian Zhang, Ye Tian, Ting Wang (The Chinese University of Hong Kong, China), Weichen Liu, Edwin H.-M. Sha (Chongqing University, China), Qiang Xu (The Chinese University of Hong Kong, China) |
Page | pp. 318 - 323 |
Keyword | multiprocessors, resilient application, scheduling, energy efficiency |
Abstract | Many emerging applications are inherently error-resilient and hence do not require exact computation. In this paper, we consider the task allocation and scheduling problem for mapping such applications to voltage-scalable multiprocessor systems. The proposed solution, namely ApproxMap, judiciously determines the mapping and execution sequence of resilient tasks to minimize the energy consumption of the application while meeting their target quality requirements and timing constraints. To be specific, ApproxMap generates an energy-efficient yet flexible task schedule at design-time, and conducts lightweight online adjustment according to runtime dynamics for further energy-efficiency improvement. Experimental results on various task graphs demonstrate the efficacy of ApproxMap. |
Title | Energy Optimization of Stochastic Applications with Statistical Guarantees of Deadline and Reliability |
Author | *Xiong Pan, Wei Jiang (University of Electronic Science and Technology of China, China), Ke Jiang (Linköping University, Sweden), Liang Wen, Qi Dong (University of Electronic Science and Technology of China, China) |
Page | pp. 324 - 329 |
Keyword | Energy, Reliability, Soft Real-time, Stochastic task, System-level design |
Abstract | In this paper, we target the energy-efficient design of soft real-time and reliable applications on uniprocessor embedded systems. We consider soft real-time tasks with stochastic execution times following a given distribution. Instead of guaranteeing a hard real-time constraint, the application may finish after its deadline with a certain probability. We utilize Dynamic Voltage and Frequency Scaling (DVFS) to save energy, and also take into account the impact of DVFS on reliability. Our objective is to minimize the expected energy consumption of the system subject to statistical reliability and deadline constraints. Due to the huge complexity of solving the problem exactly, we develop a fast bi-search approach based on dynamic programming, which can find a near-optimal solution with energy cost at most (1+β) times the optimal energy and has polynomial time complexity. Extensive experiments and a real-life application were conducted to evaluate the efficiency of the proposed techniques. |
Title | Optimization of Behavioral IPs in Multi-Processor System-on-Chips |
Author | Yidi Liu, *Benjamin Carrion Schafer (The Hong Kong Polytechnic University, Hong Kong) |
Page | pp. 336 - 341 |
Keyword | high level synthesis, multi-processor SoC, behavioral IP, MPSoC |
Abstract | This work shows that behavioral IPs (BIPs) are often over-designed when used in heterogeneous Multi-Processor SoCs (MPSoCs) mainly because they are designed and optimized separately. When inserted in the MPSoC, these IPs often have to wait for data from the master and also access to the bus to return the results. Behavioral IPs have the advantage over traditional RTL-based IPs that they can be re-synthesized with different constraints, which allows the generation of micro-architectures with unique area vs. performance trade-off. This work leverages this and introduces a method to automatically identify the workload of each behavioral IP mapped as a slave on an MPSoC system and re-synthesizes it to maximize its efficiency, i.e. reduce its area and minimize its idle time, without affecting the overall performance. We show the area can be reduced by up to 26.1% compared to the fastest implementation without any performance degradation and on average by 13.21%. Compared to an exhaustive search our method is only on average 5% worse while on average 16x faster. |
Title | A Novel PUF based on Cell Error Rate Distribution of STT-RAM |
Author | *Xian Zhang, Guangyu Sun (Peking University, China), Yaojun Zhang, Yiran Chen, Hai Li (University of Pittsburgh, U.S.A.), Wujie Wen (Florida International University, U.S.A.), Jia Di (University of Arkansas, U.S.A.) |
Page | pp. 342 - 347 |
Keyword | PUF, STTRAM, Spintronic |
Abstract | Physical Unclonable Functions (PUFs) have been widely proposed as security primitives to provide device identification and authentication. Recently, PUFs based on Non-volatile Memory (NVM) have been widely proposed given the promise of NVMs’ wide application. In addition, NVM-based PUFs are considered to be more immune to invasive attack and simulation attack than CMOS-based PUFs. However, the existing NVM-based PUFs either show unreliability under environmental variations or need extra modifications to the IC manufacturing process. In this work, we propose err-PUF, a novel PUF design based on the cell error rate distribution of STT-RAM. Instead of using the distribution directly, we generate a stable fingerprint based on a novel concept called Error-rate Differential Pair (EDP) without modifications to the read/write circuits. Comprehensive results demonstrate that err-PUF can achieve sufficient reliability under environmental variations, which can significantly impact the cell error rates. Moreover, compared with existing approaches, err-PUF has a higher speed and lower power consumption with negligible overhead. |
Title | Data Privacy in Non-Volatile Cache: Challenges, Attack Models and Solutions |
Author | Nitin Rathi, *Swaroop Ghosh, Anirudh Iyengar (University of South Florida, U.S.A.), Helia Naeimi (Intel Labs, U.S.A.) |
Page | pp. 348 - 353 |
Keyword | Nonvolatile cache memory, Data Privacy, Attack model, Architecture |
Abstract | Non-volatile memories (NVMs) have drawn significant attention due to complete elimination of bitcell leakage. Among the NVMs, Spin-Transfer-Torque RAM (STTRAM) is considered to be a strong candidate for last level cache (LLC). Although promising, STTRAM LLC brings new security challenges that were absent in conventional volatile memories such as Static RAM (SRAM). The root cause is persistent data and the fundamental dependency of the memory technology on ambient parameters such as magnetic field and temperature that can be exploited to compromise the data. We provide a qualitative analysis of the data privacy issues in the emerging nonvolatile cache. We also propose new attack models to compromise the sensitive data in LLC. The encryption technique used to secure the data in main memory and hard disk may not be useful for LLC due to latency overhead. We propose two low-overhead techniques to ensure data privacy in LLC: (a) implementing semi nonvolatile memory (SNVM); and (b) data erasure at power OFF. Erasing could be energy intensive and may require a dedicated battery to work under power failure attacks. To address this concern, we reuse the energy stored in the power rail after power OFF to erase the bits, using a canary circuit to track MTJ write time. The simulation results show 0.6% IPC loss and 1.2% energy overhead during normal operation due to added circuitry. |
Title | Pin Tumbler Lock: A Shift based Encryption Mechanism for Racetrack Memory |
Author | *Hongbin Zhang (Tsinghua University, China), Chao Zhang, Xian Zhang, Guangyu Sun (Peking University, China), Jiwu Shu (Tsinghua University, China) |
Page | pp. 354 - 359 |
Keyword | Racetrack Memory, Encryption, NVM |
Abstract | As various non-volatile memory (NVM) technologies have been adopted in different levels of memory hierarchy, the security issue of protecting information retained in NVM after power-off has become a new challenge, which results in extensive research on data encryption for NVM. Previous encryption approaches, however, have some limitations, such as high design complexity and non-trivial timing and energy overhead. Recently, an emerging NVM called racetrack memory (RM) has been widely investigated because of its advantages of ultra-high storage density and fast read/write speed. Besides these well-known advantages, we observe that the tape-like structure of the RM cell and its unique shift operation can also be leveraged to facilitate NVM data encryption. Based on this observation, we propose an efficient shift based mechanism, named Pin Tumbler Lock (PTL), which completes encryption and decryption by shifting racetracks in several nanoseconds. Experimental results demonstrate that our design can achieve the same security strength as AES-128 with 3.1% performance overhead, 3.7% energy overhead, 1.56% storage cost, and 1.6% area cost. |
Title | Routing Path Reuse Maximization for Efficient NV-FPGA Reconfiguration |
Author | Yuan Xue, Patrick Cronin, *Chengmo Yang (University of Delaware, U.S.A.), Jingtong Hu (Oklahoma State University, U.S.A.) |
Page | pp. 360 - 365 |
Keyword | NVM-based FPGA, reuse-aware routing, switch-box reconfiguration |
Abstract | Non-volatile memory-based FPGAs (NV-FPGAs) are expected to replace traditional SRAM-based FPGAs to achieve higher scalability and lower power consumption. Yet the slow write performance of NVMs challenges FPGA (re)configuration speed and overhead. To efficiently configure switch boxes, this paper proposes a routing path reuse technique. Technical contributions include a mathematical reconfiguration cost model of routing resources, a reuse-aware routing scheme, as well as the incorporation of the proposed scheme into the standard VTR CAD tool. |
Title | Dynamic Planning of Local Congestion from Varying-Size Vias for Global Routing Layer Assignment |
Author | Daohang Shi, Edward Tashjian, *Azadeh Davoodi (University of Wisconsin-Madison, U.S.A.) |
Page | pp. 372 - 377 |
Keyword | global routing, layer assignment, local congestion, detailed routing, via modeling |
Abstract | This work is the first to present global routing models for capturing the impact of local congestion caused by varying-size vias. The models are then incorporated to dynamically drive a proposed layer assignment algorithm. This is also the first work to actually evaluate the impact of global routing solutions using a commercial detailed router. In our experiments we report fewer DRC violations by only changing the layer assignment at global routing and then performing detailed routing using Olympus-SoC from Mentor Graphics. |
Title | Negotiation-Based Track Assignment Considering Local Nets |
Author | *Man-Pan Wong (National Tsing Hua University, Taiwan), Wen-Hao Liu (Cadence Design Systems Inc., U.S.A.), Ting-Chi Wang (National Tsing Hua University, Taiwan) |
Page | pp. 378 - 383 |
Keyword | Routability, congestion, track assignment |
Abstract | Routability has become a very challenging issue in a modern VLSI design flow. Many works use global routing to estimate the routability in early design stages. However, global routing cannot accurately capture local congestion, so it is hard to detect the detailed routability issue. To more accurately estimate the detailed-routing routability, this paper presents a track-assignment-based routability estimator. In this work, wire segments called iroutes are extracted from a global routing result, and then the proposed negotiation-based algorithm assigns these iroutes to proper tracks and minimizes the overlaps between the iroutes. Based on the assignment result, we can judge which regions may have critical routability issues by seeing where more overlaps reside. |
Title | Ordered Escape Routing for Grid Pin Array Based on Min-cost Multi-commodity Flow |
Author | *Fengxian Jiao, Sheqin Dong (Tsinghua University, China) |
Page | pp. 384 - 389 |
Keyword | PCB Routing, Ordered Escape Routing, Min-cost Multi-commodity flow |
Abstract | Ordered escape routing is a critical issue in high-speed PCB routing. In this paper, for the first time, a Min-cost Multi-commodity Flow (MMCF) approach is proposed to solve the ordered escape routing problem. The characteristics of the grid pin array are analyzed and then a basic network model is used to convert ordered escape routing to an MMCF model. To satisfy the constraints of ordered escape routing, three novel transformations (non-crossing transformation, ordering transformation and capacity transformation) are used to convert the basic network model to the final correct MMCF model. Experimental results show that our method achieves 100% routability for all the test cases. The method can get both a feasible solution and an optimal solution. Compared to published approaches, our method remarkably improves both wire length and CPU time. |
Title | (Invited Paper) Efficient Reliability Management in SoCs – An Approximate DRAM Perspective |
Author | Matthias Jung, Deepak M. Mathew, Christian Weis, *Norbert Wehn (University of Kaiserslautern, Germany) |
Page | pp. 390 - 394 |
Keyword | Approximate, DRAM, Reliability, Refresh, Memory |
Abstract | In today's computing systems Dynamic Random Access Memories (DRAMs) have a large influence on performance and contribute significantly to the total power consumption. Thus, recent research activities bring the idea of approximate DRAM into focus to save power and improve performance by lowering the refresh rate or disabling refresh completely. Hence, fast and accurate models are required for a thorough exploration of approximate DRAM for error resilient applications. In this paper we present a holistic simulation environment for investigations on approximate DRAM and show the impact on error resilient applications. |
Title | (Invited Paper) Cross-layer Virtual/Physical Sensing and Actuation for Resilient Heterogeneous Many-core SoCs |
Author | *Santanu Sarma, Tiago Mück, Majid Shoushtari, Abbas BanaiyanMofrad, Nikil Dutt (UC Irvine, U.S.A.) |
Page | pp. 395 - 402 |
Keyword | cross-layer, virtual sensor, SoC, CPSoC, MPSoC |
Abstract | We introduce the concepts of cross-layer virtual/physical sensing and actuation to achieve resiliency for the emerging class of heterogeneous many-core Systems-on-Chip (SoCs). Using the CyberPhysical System-on-Chip (CPSoC) concept as an exemplar sensor-rich many-core heterogeneous computing platform, we illustrate how to intrinsically couple on-chip and cross-layer physical and virtual sensing and actuation applied across different layers of the hardware/software system stack to adaptively achieve desired objectives and Quality-of-Service (QoS). We present two sample use cases that exemplify the cross-layer virtual/physical sensing and actuation approach. First, we present SmartBalance, a cross-layer sensing-driven Linux load balancer for energy efficient task execution on heterogeneous MPSoCs. Second, we present “Partially Forgetful Memories”, a software/hardware approach that achieves dynamic memory guard-banding for memory resilience and its application for approximate computing. |
Title | (Invited Paper) On-chip Monitoring and Compensation Scheme with Fine-grain Body Biasing for Robust and Energy-Efficient Operations |
Author | A.K.M. Mahfuzul Islam (University of Tokyo, Japan), *Hidetoshi Onodera (Kyoto University, Japan) |
Page | pp. 403 - 409 |
Keyword | Energy Optimization, Compensation, Monitor, Body Biasing, Scaling |
Abstract | Aggressive technology scaling and strong demand for lowering supply voltage impose a serious challenge in achieving robust and energy-efficient circuit operation. This paper first gives an overview of device-circuit interactions to enable cross-layer resiliency and energy optimization. We show that the ability to monitor and control device and circuit characteristics not only increases energy efficiency by more than 20% but also relaxes the severe design constraints, which were required because of the uncertainties of variability. We then demonstrate two proof-of-concept circuits in a 65 nm process to show variability resiliency and energy optimization with local body biasing. |
Title | (Invited Paper) Embedded Software Reliability Testing by Unit-Level Fault Injection |
Author | Petra R. Maier, Daniel Mueller-Gritschneder, Ulf Schlichtmann (TU Munich, Germany), *Veit B. Kleeberger (Infineon Technologies, Germany) |
Page | pp. 410 - 416 |
Keyword | Reliability, Embedded Software, ISO 26262 |
Abstract | Decreasing device sizes in integrated circuits lead to increasing vulnerability of hardware to errors resulting from radiation, crosstalk or power-supply disturbances. Especially in the automotive domain many tasks of electronics are safety relevant, so that solid error detection and correction is imperative. However, completely safe hardware is too expensive for the cost sensitive automotive market. Hence, software safety mechanisms must deal with errors originating from hardware to ensure safe system behavior. To verify safe system behavior under the influence of hardware errors, fault injection is currently done at integration level, but software redesign at this design stage should be avoided due to high costs. To detect code vulnerable to hardware errors early, we propose fault injection at unit level. Thanks to short simulation scenarios and good parallelization capability, even exhaustive fault injection is possible for multiple representative workloads. Using the results from the fault-injection campaigns, the software designer is able to consider reliability during the implementation phase and avoid costly redesigns. |
Title | (Invited Paper) Thermal Modeling for Energy-Efficient Smart Building With Advanced Overfitting Mitigation Technique |
Author | Wandi Liu, Hai Wang (University of Electronic Science and Technology of China, China), Hengyang Zhao, Shujuan Wang (University of California at Riverside, U.S.A.), Haibao Chen, Yuzhuo Fu (Shanghai Jiaotong University, China), Jian Ma (University of Electronic Science and Technology of China, China), Xin Li (Carnegie Mellon University, U.S.A.), *Sheldon X.-D. Tan (University of California at Riverside, U.S.A.) |
Page | pp. 417 - 422 |
Keyword | Thermal Modeling, Smart Building |
Abstract | Building energy accounts for a large amount of the total energy consumption, and smart building energy control leads to high energy efficiency and significant energy savings. A compact and accurate building thermal model is important for designing an efficient energy control system. In this paper, we propose an accurate thermal behavior modeling technique for general and complicated buildings. This new modeling technique builds a compact thermal model by system identification using temperature and power data obtained from EnergyPlus software, which can provide realistic temperature, weather and power data for buildings. In order to make the best use of data from EnergyPlus and avoid the overfitting problem associated with the system identification method, a cross-validation technique is employed to generate multiple thermal models to find the optimal model order. The final model is then generated by performing a regular system identification using the previously selected order. Experimental results from a case study of a 5-zone building have shown that the proposed method is able to find the optimal model order, and the building models built by the proposed method can achieve 1-3% average errors and less than 10-18% maximum errors for the estimation of zone temperatures for about a one year period. |
Title | (Invited Paper) Modeling, Analysis, and Optimization of Electric Vehicle HVAC Systems |
Author | *Mohammad Abdullah Al Faruque, Korosh Vatanparvar (UC Irvine, U.S.A.) |
Page | pp. 423 - 428 |
Keyword | Electric Vehicle, Battery, HVAC, Climate Control |
Abstract | Major challenges of driving range and battery lifetime in Electric Vehicles (EVs) have been addressed by designing more efficient power electronics, advanced embedded hardware, and sophisticated embedded software. Besides the electric motor in EVs, Heating, Ventilation, and Air Conditioning (HVAC) has been seen as a significant contributor to the EV power consumption. The main responsibility of automotive climate controls has been to control the HVAC system in order to maintain the passengers’ thermal comfort. However, the HVAC power consumption and its dynamic behavior may influence the battery lifetime and driving range significantly. Therefore, modeling and analyzing the HVAC system and its thermodynamic behavior may help control designers integrate the HVAC control and optimization into Battery Management Systems (BMS) for better battery lifetime and driving range. In this paper, the EV architecture, HVAC system dynamic behavior, and battery characteristics are explained and modeled. Automotive climate controls (e.g. battery lifetime-aware automotive climate control) and the benefits gained by system modeling and estimation for different conditions in terms of battery lifetime and driving range are illustrated. Moreover, present and future challenges regarding the HVAC system and control design are explained. |
Title | (Invited Paper) Distributed Reconfigurable Battery System Management Architectures |
Author | *Sebastian Steinhorst (TUM CREATE Ltd., Singapore), Zili Shao (The Hong Kong Polytechnic University, Hong Kong), Samarjit Chakraborty (TU Munich, Germany), Matthias Kauer (TUM CREATE Ltd., Singapore), Shuai Li (The Hong Kong Polytechnic University, Hong Kong), Martin Lukasiewycz, Swaminathan Narayanaswamy (TUM CREATE Ltd., Singapore), Muhammad Usman Rafique, Qixin Wang (The Hong Kong Polytechnic University, Hong Kong) |
Page | pp. 429 - 434 |
Keyword | Battery System Management Architectures (BSMAs), Lithium-Ion Batteries, Smart Cells, Battery Management, Reconfigurability |
Abstract | This paper presents an overview of recent trends in Battery System Management Architectures (BSMAs). After introducing the main characteristics of large battery packs, the state of the art in BSMAs is discussed. Two emerging concepts are in the focus of this contribution. On the one hand, there is a development from centralized battery management architectures with a single control entity towards decentralized management where the computational resources are distributed across the battery pack and, hence, move closer to the individual battery cells. This enables a more scalable and modular battery system architecture, while, at the same time, posing challenges regarding hardware and management algorithm design. On the other hand, the static setup of the series- and parallel-connected cells forming the battery pack may be developed towards a reconfigurable architecture such that the electrical topology of the pack can be adaptively changed. Such reconfigurability could increase the reliability of battery packs and reduce management efforts such as cell balancing. At the same time, limited energy efficiency of the additional hardware poses a challenge. We give an outlook how these two trends could be combined into distributed reconfigurable BSMAs. This introduces a set of challenges which have to be solved in order to benefit from the increased scalability, reliability and safety such designs could offer. |
Title | (Invited Paper) Minimum-Energy Driving Speed Profiles for Low-Speed Electric Vehicles |
Author | Donkyu Baek, Joonki Hong, *Naehyuck Chang (KAIST, Republic of Korea) |
Page | p. 435 |
Keyword | Driving optimization, Electric vehicles, Speed profile |
Abstract | Electric vehicles (EVs) are rapidly entering the market previously held by internal combustion engine vehicles (ICEVs), offering not only environmental friendliness and higher efficiency but also better ride quality, comfort, and performance. However, some factors still keep EVs from matching ICEVs, such as a limited fully charged driving range per vehicle cost due to the low energy density of batteries compared with petroleum fuel. We formulate an optimization problem that minimizes the total energy consumption for a given route that consists of arbitrary slope variations. |
Title | Multi-version Checkpointing for Flash File Systems |
Author | *Shih-Chun Chou (Department of Computer Science and Information Engineering, National Taiwan University, Taiwan), Yuan-Hao Chang, Yuan-Hung Kuan (Institute of Information Science, Academia Sinica, Taiwan), Po-Chun Huang (Department of Computer Science and Engineering, Yuan Ze University, Taiwan), Che-Wei Tsao (Department of Computer Science and Information Engineering, National Taiwan University, Taiwan) |
Page | pp. 436 - 443 |
Keyword | file system, multi-version checkpointing, reliability |
Abstract | Reliability has become a critical design issue in flash storage systems, because of the adoption of the low-cost, high-error-rate flash chips to fulfill the needs of the fast-growing storage capacity. In this paper, a multi-version checkpointing strategy is proposed to resolve the reliability issue of flash storage systems from the perspective of flash file systems. The proposed strategy can efficiently and effectively utilize checkpoints of file systems to guarantee the integrity and consistency of flash file systems after files or flash pages are corrupted. By utilizing the coexistence fact of multiple versions of the same data in flash memory, a control/recovery mechanism is presented to maintain checkpoints and to recover file systems with minimized management and recovery time overheads. A series of experiments was conducted based on realistic traces that were collected from benchmarks running over flash file systems in Linux operating systems. The results illustrate that the proposed strategy can significantly improve the reliability of flash file systems, as compared with other existing designs. |
Title | Relay-based Key Management to Support Secure Deletion for Resource-Constrained Flash-Memory Storage Devices |
Author | Wei-Lin Wang (Department of Computer Science, National Tsing Hua University, Taiwan), Yuan-Hao Chang (Institute of Information Science, Academia Sinica, Taiwan), *Po-Chun Huang (Department of Computer Science and Engineering, Yuan Ze University, Taiwan), Chia-Heng Tu (Smart Network System Institute, Institute for Information Industry, Taiwan), Hsin-Wen Wei (Department of Electrical Engineering, Tamkang University, Taiwan), Wei-Kuan Shih (Department of Computer Science, National Tsing Hua University, Taiwan) |
Page | pp. 444 - 449 |
Keyword | reliability, flash memory, key management, secure deletion |
Abstract | The support of secure deletion on formatting a file system is to make sure that when a file system is formatted, there is no way to get any file content back again. Due to the fast-growing storage capacity, the performance of secure deletion to file systems on resource-constrained flash storage devices has become a critical issue. In contrast to the existing works that take a long time on overwriting/resetting all the file contents of a file system, we propose an efficient secure deletion scheme to securely delete all the contents of a file system without rewriting file contents. Thus, secure deletion to file systems can be efficiently achieved and can be independent of the device capacity and file systems. A series of experiments was conducted with realistic workloads to evaluate the capability of the proposed scheme. The results show that the proposed scheme achieves secure deletion with limited performance overheads in most cases. |
Title | Peak-to-average Pumping Efficiency Improvement for Charge Pump in Phase Change Memories |
Author | Huizhang Luo (Chongqing University, China), Jingtong Hu (Oklahoma State University, U.S.A.), *Liang Shi (Chongqing University, China), Chun Jason Xue (City University of Hong Kong, Hong Kong), Qingfeng Zhuge (Chongqing University, China) |
Page | pp. 450 - 455 |
Keyword | PCM, Charge Pump |
Abstract | The pumping efficiency of a PCM chip is a concave function of the write current. Based on the characteristics of the concave function, the overall pumping efficiency can be improved if the write current is uniform. In this paper, we propose the peak-to-average (PTA) write scheme, which smooths the write current fluctuation by regrouping write units. An off-line optimal Integer Programming (IP) formulation and an efficient online algorithm are proposed to achieve this goal. Experimental results show that PTA can improve the charge pump efficiency greatly with little overhead. |
Title | Exploiting Parallelism of Imperfect Nested Loops with Sibling Inner Loops on Coarse-Grained Reconfigurable Architectures |
Author | *Xinhan Lin, Shouyi Yin, Leibo Liu, Shaojun Wei (Tsinghua University, China) |
Page | pp. 456 - 461 |
Keyword | CGRA, software pipelining, imperfect nested loop, outer-level pipelining, kernel compression |
Abstract | Coarse-grained reconfigurable architecture (CGRA) is a promising platform for loop acceleration, but existing software pipelining methods cannot achieve satisfactory performance on a fair number of imperfect nested loops, especially those with sibling inner loops. To tackle this problem, this paper makes 2 contributions: 1) a 2-level pipelining method with an effective II optimization strategy for imperfect loops with sibling inner loops; 2) a novel kernel compression method to reduce oversized kernels. Experimental results show that our approach can achieve much higher performance than the state-of-the-art approaches at acceptable costs. |
Title | SlowMo – Enhancing Mobile Gesture-Based Authentication Schemes via Sampling Rate Optimization |
Author | *Kent W. Nixon, Xiang Chen, Zhi-Hong Mao, Yiran Chen (University of Pittsburgh, U.S.A.) |
Page | pp. 462 - 467 |
Keyword | gesture, security, sample, rate |
Abstract | In the era of network services, user authentication has become more indispensable but also more vulnerable. Traditional user verification approaches such as PIN or pattern lock are easy to hack and replicate. A promising approach to continuous user verification on mobile devices is gesture-based security. Compared to traditional authentication, gesture-based security uses the user's interactions with the device as a dynamic authentication pattern in real time. It has high complexity and better reliability. However, the effect of data sampling and preprocessing techniques on classification accuracy has not been sufficiently studied. In this work, we develop SlowMo, a novel gesture security technique, and utilize it for user classification in low sampling-rate environments. The proposed algorithm provides maximum classification accuracy at a sampling rate of 4Hz with extremely low power consumption, suggesting a more capable adaptation to the security environment. |
Title | Lattice-Based Boolean Diagrams: Canonical, Order-Independent Graphical Representations of Boolean Functions |
Author | Ahmed Nassar, *Fadi J. Kurdahi (University of California, Irvine, U.S.A.) |
Page | pp. 468 - 473 |
Keyword | Boolean functions, Graph representations, decision diagrams |
Abstract | This paper presents lattice-based Boolean diagrams (LBBDs), a graphical representation of Boolean functions that is not derived from binary decision diagrams (BDDs), as well as symbolic manipulation algorithms. It also identifies a class of Boolean functions where LBBDs are demonstrably more efficient to construct, and reason with, when compared to BDDs. The case studies include ITC99 and MCNC benchmarks, randomly generated cube covers or sum-of-products (SOP) formulas as well as multi-level Boolean formulas. Finally, LBBDs proved to be instrumental to the efficient runtime verification of software over distributed multiprocessor systems. |
Title | BDD Minimization for Approximate Computing |
Author | *Mathias Soeken, Daniel Große, Arun Chandrasekharan, Rolf Drechsler (University of Bremen, Germany) |
Page | pp. 474 - 479 |
Keyword | BDDs, Approximate computing, Algorithms, Optimization |
Abstract | We present Approximate BDD Minimization (ABM) as a problem that has application in approximate computing. Given a BDD representation of a multi-output Boolean function, ABM asks whether there exists another function that has a smaller BDD representation but meets a threshold with respect to an error metric. We present operators to derive approximated functions and present algorithms to exactly compute the error metrics directly on the BDD representation. An experimental evaluation demonstrates the applicability of the proposed approaches. |
Title | MajorSat: A SAT Solver to Majority Logic |
Author | Yu-Min Chou (National Tsing Hua University, Taiwan), Yung-Chih Chen (Yuan Ze University, Taiwan), Chun-Yao Wang, *Ching-Yi Huang (National Tsing Hua University, Taiwan) |
Page | pp. 480 - 485 |
Keyword | Satisfiability, Majority |
Abstract | A majority function can be represented in sum-of-product (SOP) form or product-of-sum (POS) form. However, a Boolean expression including majority functions could be more compact compared to SOP or POS forms. Hence, majority logic provides a new viewpoint for manipulating Boolean logic. Recently, majority logic has attracted more attention than before, and some synthesis algorithms and an axiomatic system for majority logic have been proposed. On the other hand, solvers for the satisfiability (SAT) problem have made tremendous progress in the past decades. The format of instances for the SAT solvers is the Conjunctive Normal Form (CNF). For the instances that are not expressed as CNF, we have to transform them into CNF before running the SAT-solving process. However, for the instances including majority functions, this transformation might not be scalable and might be time-consuming due to the exponential growth in the number of clauses in the resultant CNF. As a result, this paper presents a new SAT solver, MajorSat, which solves a SAT instance containing majority functions without any transformation. Some techniques for speeding up the solver are also proposed. Besides, we also propose a transformation method that can generate the characteristic function of a majority logic gate. The experimental results show that the MajorSat solver can efficiently solve random instances containing majority functions that CNF SAT solvers, like MiniSat or Lingeling, cannot. |
Title | Fast Synthesis of Threshold Logic Networks with Optimization |
Author | *Yung-Chih Chen, Runyi Wang, Yan-Ping Chang (Yuan Ze University, Taiwan) |
Page | pp. 486 - 491 |
Keyword | Threshold logic, logic synthesis, logic optimization |
Abstract | Threshold logic, a more compact Boolean representation compared to conventional logic gate representation, re-attracted substantial attention from researchers due to the advances of threshold logic implementations with novel nanoscale devices. For the compact representation to be promising, a fast and effective method for transforming a conventional Boolean logic network into a threshold logic network is necessary. This paper presents such a synthesis method for threshold logic based on logic optimization. First, a Boolean logic network is mapped into a threshold logic network by one-to-one mapping. Then, a method is used to optimize the threshold logic network based on eight transformations for reducing gate count. Unlike the previous methods, the proposed method does not require threshold function identification, and thus is much more efficient. The experimental results show that the proposed method is three orders of magnitude faster than a widely used synthesis method. Additionally, the proposed method has a better synthesis quality with an average saving of 28% threshold gates. |
Title | Polysynchronous Stochastic Circuits |
Author | *M. Hassan Najafi, David J. Lilja, Marc Riedel, Kia Bazargan (University of Minnesota, U.S.A.) |
Page | pp. 492 - 498 |
Keyword | Polysynchronous circuits, asynchronous circuits, multi-clock circuits, stochastic computing, clock distribution network |
Abstract | Clock distribution networks (CDNs) are costly in high-performance ASICs. This paper proposes a new approach: splitting clock domains at a very fine level, down to the level of a handful of gates. Each domain is synchronized with an inexpensive clock signal, generated locally. This is possible by adopting the paradigm of stochastic computation, where signal values are encoded as random bit streams. The design method is illustrated with the synthesis of circuits for applications in signal and image processing. |
Title | (Keynote Address) Majority-based Synthesis for Nanotechnologies |
Author | Luca Amaru, Pierre-Emmanuel Gaillardon, *Giovanni De Micheli (Integrated Systems Laboratory, EPFL, Switzerland) |
Page | pp. 499 - 502 |
Keyword | Logic Synthesis, Majority Logic, Nanotechnology |
Abstract | We study the logic synthesis of emerging nanotechnologies whose elementary devices abstraction is a majority voter. We argue that synthesis tools, natively supporting the majority logic abstraction, are the technology enablers. This is because they allow designers to validate majority-based nanotechnologies on large-scale benchmarks. We describe models and data-structures for logic design with majority-based nanotechnologies and we show results of applying new synthesis algorithms and tools. We conclude that new logic synthesis methods are required to achieve a fair assessment on emerging nanotechnologies. |
Title | (Keynote Address) Software and System Co-optimization in the era of Heterogeneous Computing |
Author | *Michael Gschwind (IBM Thomas J Watson Research Center, U.S.A.) |
Keyword | Keynote |
Abstract | Escalating costs of semiconductor technology and its lagging performance relative to historic trends is motivating acceleration and specialization as more impactful means to increase system value. Targeted specialization is being increasingly pursued as an important way to achieve dramatic improvements in workload acceleration. This requires a broad understanding of workloads, system structures, and algorithms to determine what to accelerate / specialize, and how, i.e., via SW?; via HW?; or via SW+HW? which presents many choices, necessitating co-optimization of SW and HW. In this talk, we will focus on an application driven approach to software and system co-optimization, based on inventing new software algorithms, that have strong affinity to hardware acceleration. A High Level design methodology that is needed to enable targeted specialization in hardware will also be described. |
Title | (Invited Paper) Enabling Multi-Layer Cyber-Security Assessment of Industrial Control Systems through Hardware-in-the-Loop Testbeds |
Author | Anastasis Keliris, Charalambos Konstantinou, Nektarios Georgios Tsoutsos (New York University, U.S.A.), Raghad Baiad, *Michail Maniatakos (New York University Abu Dhabi, United Arab Emirates) |
Page | pp. 511 - 518 |
Keyword | security, industrial control systems, testbed, firmware |
Abstract | Industrial Control Systems (ICS) are under modernization towards increasing efficiency, reliability, and controllability. Despite the numerous benefits of interconnecting ICS components, the wide adoption of Information Technologies (IT) has introduced new security challenges and vulnerabilities to industrial processes, previously obscured by the systems' custom designs. Towards securing the backbone of critical infrastructure, selection of the proper assessment environment for performing cyber-security assessments is crucial. In this paper, we present a layered analysis of vulnerabilities and threats in ICS components, that identifies the need for including real hardware components in the assessment environment. Moreover, we advocate the suitability of Hardware-In-The-Loop testbeds for ICS cyber-security assessment and present their advantages over other assessment environments. |
Title | (Invited Paper) Security Analysis on Consumer and Industrial IoT Devices |
Author | Jacob Wurm, Khoa Hoang, Orlando Arias (University of Central Florida, U.S.A.), Ahmad-Reza Sadeghi (TU Darmstadt, Germany), *Yier Jin (University of Central Florida, U.S.A.) |
Page | pp. 519 - 524 |
Keyword | IoT Security, Hardware Security, IoT Devices |
Abstract | The fast development of Internet of Things (IoT) and cyber-physical systems (CPS) has triggered a large demand of smart devices which are loaded with sensors collecting information from their surroundings, processing it and relaying it to remote locations for further analysis. The wide deployment of IoT devices and the pressure of time to market of device development have raised security and privacy concerns. In order to help better understand the security vulnerabilities of existing IoT devices and promote the development of low-cost IoT security methods, in this paper, we use both commercial and industrial IoT devices as examples from which the security of hardware, software, and networks are analyzed and backdoors are identified. A detailed security analysis procedure will be elaborated on a home automation system and a smart meter proving that security vulnerabilities are a common problem for most devices. Security solutions and mitigation methods will also be discussed to help IoT manufacturers secure their products. |
Title | (Invited Paper) Covert Channels Using Mobile Device’s Magnetic Field Sensors |
Author | Nikolay Matyunin (Technische Universität Darmstadt, Germany), *Jakub Szefer (Yale University, U.S.A.), Sebastian Biedermann, Stefan Katzenbeisser (Technische Universität Darmstadt, Germany) |
Page | pp. 525 - 532 |
Keyword | Hardware Security, Side-Channel, Covert-Channel, Magnetic |
Abstract | This paper presents a new covert channel using smartphone magnetic sensors. We show that modern smartphones are capable of detecting the magnetic field changes induced by different computer components during I/O operations. In particular, we are able to create a covert channel between a laptop and a mobile device without any additional equipment, firmware modifications or privileged access on either of the devices. We present two encoding schemes for the covert channel communication and evaluate their effectiveness. |
Title | (Invited Paper) Multi-valued Arbiters for Quality Enhancement of PUF Responses on FPGA Implementation |
Author | *Siarhei S. Zalivaka (Nanyang Technological University, Singapore), Alexander V. Puchkov, Vladimir P. Klybik, Alexander A. Ivaniuk (Belarusian State University of Informatics and Radioelectronics, Belarus), Chip-Hong Chang (Nanyang Technological University, Singapore) |
Page | pp. 533 - 538 |
Keyword | Physical Unclonable Function, Arbiter, Hardware security |
Abstract | One main problem encountered in the FPGA implementation of Arbiter based Physical Unclonable Function (A-PUF) is the response instability caused by the metastability of delay flip-flop. This paper presents a new multi-arbiter approach to extract more entropy to extend the number of response bits to a single challenge. New multi-arbiter schemes based on the insertion of either a four-flip-flop arbiter or SR latch arbiter after each pair of multiplexers in the configurable paths are proposed to detect the metastable state when two copies of test pulse arrive at the arbiter inputs almost simultaneously. The detected metastable states are distinguishable by the encoded multiple valued outputs of the arbiter. The codes corresponding to the metastable states collectively form a deterministic ternary state that can be recoded to one of the stable states to improve the uniqueness and reliability of the PUF. Our analysis shows that the proposed design can generate robust and reliable challenge-response pairs with a uniqueness of 0.4982 and a reliability of 0.9985 at the expense of a relatively small FPGA resource overhead. |
Title | Every Test Makes a Difference: Compressing Analog Tests to Decrease Production Costs |
Author | Seyed Nematollah Ahmadyan (University of Illinois at Urbana-Champaign, U.S.A.), Suriyaprakash Natarajan (Intel, U.S.A.), *Shobha Vasudevan (University of Illinois at Urbana-Champaign, U.S.A.) |
Page | pp. 539 - 544 |
Keyword | Stress Test, Compression, Random tree, Optimization |
Abstract | We introduce a methodology for automated test compression during electrical stress testing of analog and mixed signal circuits. This methodology optimally extracts only portions of a functional test that electrically stress the nets and devices of an analog circuit. We model test compression as a problem of optimizing a functional of the transient response. We present a random tree based approach to find optimal solutions for these computationally hard integrals. We demonstrate with an op-amp, VCO and CMOS inverter that the method consistently reduces the length of each test by an average of 93%. |
Title | Re-thinking Polynomial Optimization: Efficient Programming of Reconfigurable Radio Frequency (RF) Systems by Convexification |
Author | Fa Wang, Shihui Yin, Minhee Jun, *Xin Li, Tamal Mukherjee, Rohit Negi, Larry Pileggi (Carnegie Mellon University, U.S.A.) |
Page | pp. 545 - 550 |
Keyword | Polynomial Optimization, Sequential Semidefinite Programming |
Abstract | Reconfigurable radio frequency (RF) system has emerged as a promising avenue to achieve high communication performance while adapting to versatile commercial wireless environment. In this paper, we propose a novel technique to optimally program a reconfigurable RF system in order to achieve maximum performance and/or minimum power. Our key idea is to adopt an equation-based optimization method that relies on general purpose, non-convex polynomial performance models to determine the optimal configurations of all tunable circuit blocks. Most importantly, our proposed approach guarantees to find the globally optimal solution of the non-convex polynomial programming problem by solving a sequence of convex semidefinite programming (SDP) problems based on convexification. A reconfigurable RF front-end example designed for WLAN 802.11g demonstrates that the proposed method successfully finds the globally optimal configuration, while other traditional techniques often converge to local optima. |
Title | An Efficient Trajectory-based Algorithm for Model Order Reduction of Nonlinear Systems via Localized Projection and Global Interpolation |
Author | Chenjie Yang, *Fan Yang, Xuan Zeng (Fudan University, China), Dian Zhou (Fudan University, University of Texas at Dallas, China) |
Page | pp. 551 - 556 |
Keyword | Trajectory, Model Order Reduction |
Abstract | In this paper, we propose a new, efficient trajectory-based model order reduction algorithm for nonlinear systems via localized projection and global interpolation. We employ an efficient procedure to transform the smaller localized reduced-order models into a set of equivalent reduced-order models with consistent global coordinate. The reduced-orders for the nonlinear systems are then obtained by globally interpolating the much smaller localized reduced-order models. |
Title | STORM: A Nonlinear Model Order Reduction Method via Symmetric Tensor Decomposition |
Author | Jian Deng, Haotian Liu, Kim Batselier, Yu-Kwong Kwok, *Ngai Wong (The University of Hong Kong, Hong Kong) |
Page | pp. 557 - 562 |
Keyword | circuit modeling, nonlinear circuit, model order reduction, symmetric tensor decomposition, polynomial system |
Abstract | In this paper, a novel symmetric tensor-based order-reduction method (STORM) is presented for simulating large-scale nonlinear systems. Compared to the recent tensor-based nonlinear model order reduction (TNMOR) algorithm, STORM shows advantages in two aspects. First, STORM avoids the assumption of the existence of a low-rank tensor approximation. Second, with the use of the symmetric tensor decomposition, STORM allows significantly faster computation and less storage complexity than TNMOR. Numerical experiments demonstrate the superior computational efficiency and accuracy of STORM. |
Title | Footfall – GPS Polling Scheduler for Power Saving on Wearable Devices |
Author | *Kent W. Nixon, Xiang Chen, Yiran Chen (University of Pittsburgh, U.S.A.) |
Page | pp. 563 - 568 |
Keyword | GPS, scheduler, map-matching, wearable |
Abstract | Wrist-worn wearable fitness devices, such as FitBit and Apple Watch, have become popular in recent years. Runners can use the GPS embedded in these wearable devices to log the route taken during their exercise, providing vital feedback on pace and distance traveled. Unfortunately, continuous polling for GPS data results in a significant adverse impact on device battery life, e.g., many flagship wearables need to be charged as frequently as every two days or even more often. In this work, we propose Footfall – an intelligent GPS scheduler that can utilize data from alternative sensors on a device to greatly reduce GPS utilization while still maintaining minimum location accuracy. Compared to existing implementations, the Footfall system can achieve on average a 75% reduction in total power consumption, while only inducing a 5% discrepancy in location accuracy, which is sufficient for the targeted applications. |
Title | CP-FPGA: Computation Data-Aware Software/Hardware Co-design for Nonvolatile FPGAs based on Checkpointing Techniques |
Author | *Zhe Yuan, Yongpan Liu, Hehe Li, Huazhong Yang (Tsinghua University, China) |
Page | pp. 569 - 574 |
Keyword | FPGA, nonvolatile, checkpoint, IOT |
Abstract | With the booming trend of internet of things (IoT), reconfigurable devices, such as FPGAs, have drawn a lot of attention due to their flexible and high-performance capability. However, commercial FPGAs suffer from high leakage power consumption, which makes zero-leakage nonvolatile FPGA (nvFPGA) promising. This paper proposes a hardware/software co-design based nvFPGA with an efficient checkpointing strategy. With nonvolatile checkpointing BRAM (CBRAM), it maintains both computation data and configuration when powered off to avoid expensive rollbacks due to data loss. A checkpointing location-aware technique is used to balance computation rollback overheads and backup energy. Experimental results show that the proposed checkpointing strategy can reduce the backup data of nvFPGA by 45.8% when system-level power gating happens. |
Title | Design Space Exploration of FPGA-Based Deep Convolutional Neural Networks |
Author | Mohammad Motamedi, *Philipp Gysel, Venkatesh Akella, Soheil Ghiasi (University of California, Davis, U.S.A.) |
Page | pp. 575 - 580 |
Keyword | DCNN, Accelerator, Design Space Exploration, Deep Convolutional Neural Network |
Abstract | Deep Convolutional Neural Networks (DCNNs) have proven to be very effective in many pattern recognition applications. To meet performance and energy-efficiency constraints, various hardware accelerators have been developed. In this paper, we propose an FPGA-based accelerator which can handle convolutional layers with large hyperparameters. We present a design space exploration algorithm to find the optimal architecture that leverages all sources of parallelism. To the best of our knowledge, we improve the state-of-art for AlexNet on a large FPGA by 1.9X. |
Title | LRADNN: High-Throughput and Energy-Efficient Deep Neural Network Accelerator using Low Rank Approximation |
Author | *Jingyang Zhu (Hong Kong University of Science and Technology, Hong Kong), Zhiliang Qian (Shanghai Jiao Tong University, China), Chi-Ying Tsui (Hong Kong University of Science and Technology, Hong Kong) |
Page | pp. 581 - 586 |
Keyword | Deep Neural Network, Low Rank Approximation, VLSI architecture, Energy Efficiency |
Abstract | In this work, we propose an energy-efficient hardware accelerator for Deep Neural Network (DNN) using Low Rank Approximation (LRADNN). Using this scheme, inactive neurons in each layer of the DNN are dynamically identified and the corresponding computations are then bypassed. Accordingly, both the memory accesses and the arithmetic operations associated with these inactive neurons can be saved. Therefore, compared to the architectures using the direct feed-forward algorithm, LRADNN can achieve a higher throughput as well as a lower energy consumption with negligible prediction accuracy loss (within 0.1%). We implement and synthesize the proposed accelerator using TSMC 65nm technology. From the experimental results, an energy reduction ranging from 31% to 53% together with an increase in the throughput from 22% to 43% can be achieved. |
Title | Sequence-Pair-Based Placement and Routing for Flow-Based Microfluidic Biochips |
Author | *Qin Wang, Yizhong Ru, Hailong Yao (Tsinghua University, China), Tsung-Yi Ho (National Tsing Hua University, Taiwan), Yici Cai (Tsinghua University, China) |
Page | pp. 587 - 592 |
Keyword | Flow-based microfluidic biochips, Sequence-pair-based placement, Flow-channel crossings avoidance, Placement adjustment |
Abstract | Flow-based microfluidic biochips are attracting increasing attention with successful applications in lab-on-a-chip experiments and point-of-care diagnosis. Physical design for flow-based biochips determines the number of flow-channel intersections, and thus affects the number of microvalves. As reducing microvalves will significantly improve the overall design quality and reliability, physical design is of great importance. Typically, physical design consists of two major stages, i.e., component placement and routing. Existing works follow the step-by-step scheme, which perform placement and routing separately. The lack of interactions between the two design stages results in degraded design with large number of unfavorable channel intersections and microvalves. This paper presents a novel placement and routing method based on the sequence-pair representation, which seamlessly integrates placement and routing stages and allows iterative placement adjustment upon routing feedbacks. Experimental results show that compared with the existing work, the proposed method obtains average 54.10% improvement in flow-channel crossings, 42.15% improvement in total chip area, and 23.43% improvement in total channel length. |
Title | Congestion- and Timing-Driven Droplet Routing for Pin-Constrained Paper-Based Microfluidic Biochips |
Author | *Jain-De Li, Sying-Jyan Wang (National Chung Hsing University, Taiwan), Katherine Shu-Min Li (National Sun Yat-sen University, Taiwan), Tsung-Yi Ho (National Tsing Hua University, Taiwan) |
Page | pp. 593 - 598 |
Keyword | Paper-based Digital Microfluidic (PB-DMF), Digital Microfluidic, Global Routing, Escape Routing |
Abstract | Paper-based microfluidic chips provide a novel way to carry out microfluidic analysis. Such chips achieve “lab-on-paper” instead of traditional “lab-on-chips”. The paper substrate is attractive because it is cost-effective, easy to use and disposable. The routing problem of paper-based digital microfluidic (PB-DMF) biochips is to realize bio-chemical operations on paper with inkjet printing techniques. In this paper, we propose a routing scheme targeting multiple preprogrammed droplet paths such that both routability and wire-length are optimized in a paper layer. Compared with previous digital microfluidic (DMF), the proposed paper-based DMF needs only one integrated paper layer instead of two layers of control and signal layers in the traditional DMF. Experimental results on a set of paper chip applications show the effectiveness, efficiency and scalability of the proposed algorithm. |
Title | Chain-Based Pin Count Minimization for General-Purpose Digital Microfluidic Biochips |
Author | *Yung-Chun Lei, Chen-Shing Hsu, Juinn-Dar Huang (National Chiao Tung University, Taiwan), Jing-Yang Jou (National Central University, Taiwan) |
Page | pp. 599 - 604 |
Keyword | Digital microfluidic biochip, pin count minimization, Lab on a chip |
Abstract | Minimizing the number of external control pins is one of the most important optimization objectives in digital microfluidic biochip (DMFB) design especially as the chip size gets even bigger. So far, only few works focus on this issue for general-purpose DMFBs. In this paper, we present a pin count minimization algorithm based on sophisticated electrode chaining on regular or irregular electrode arrays. The key idea of the proposed method is that actuation information can be implied from previous neighborhood electrodes to later ones throughout the chain. Experimental results show that the pin count reduction can be near 50% in large DMFBs. |
Title | A Routability-Driven Flow Routing Algorithm for Programmable Microfluidic Devices |
Author | Yi-Siang Su (National Taiwan University, Taiwan), *Tsung-Yi Ho (National Tsing Hua University, Taiwan), Der-Tsai Lee (National Taiwan University, Taiwan) |
Page | pp. 605 - 610 |
Keyword | Biochip, Microfluidics, Routing |
Abstract | Biochips that are made of Micro Electro Mechanical Systems (MEMS) have drawn wide attention in recent years. The advantages of biochips are high accuracy and fast reaction rate with only a small volume consumption of samples and reagents. Among various types of biochips, flow-based microfluidic biochips have received much attention recently, especially the programmable microfluidic device (PMD). PMDs are capable of performing a multitude of functions in one platform without requiring any hardware modifications. As the size of chips increases, flow routing becomes more complicated. The traditional method of manually controlling multiple flows is inefficient and may not yield a feasible assay completion time. Fortunately, PMDs have high potential to route flows with pure software programs to overcome the drawbacks of traditional methods. However, a naive software program that simply minimizes assay completion time may cause flow-congestion problems and unexpected mixing between different assays. To conduct a viable experiment, a feasible program should not only minimize assay completion time but also consider congestion problems and the fluidic constraint. Therefore, we formulate the flow routing problem and propose a routability-driven flow routing algorithm which considers the fluidic constraint and minimizes the assay completion time on PMDs. |
Title | (Invited Paper) Advanced Multi-Patterning and Hybrid Lithography Techniques |
Author | *Fedor G. Pikus, Andres Torres (Mentor Graphics, U.S.A.) |
Page | pp. 611 - 616 |
Keyword | DFM, MP, DSA, semiconductor, manufacturing |
Abstract | We present an overview of several techniques that are used when the layout pitch and feature size become significantly smaller than the minimum resolution of the lithographic process. We consider several multi-patterning (MP) techniques, in which a single layer is decomposed into two or more masks and printed in multiple stages. We also introduce the directed self-assembly (DSA) technology, where features several times smaller than the minimum lithographic resolution form spontaneously due to self-assembling, or self-organizing, formation of block copolymers. |
Title | (Invited Paper) Recent Research Development and New Challenges in Analog Layout Synthesis |
Author | *Mark Po-Hung Lin (National Chung Cheng University, Taiwan), Yao-Wen Chang (National Taiwan University, Taiwan), Chih-Ming Hung (MediaTek, Taiwan) |
Page | pp. 617 - 622 |
Keyword | Analog layout, placement, routing, migration, knowledge mining |
Abstract | Analog and mixed-signal integrated circuits play an important role in many modern emerging system-on-chip (SoC) design applications. With the expansion of the markets of those applications, the demand for analog/mixed-signal ICs has increased dramatically. Although analog/mixed-signal ICs have gained more and more importance and demand in modern SoC applications, the development of analog electronic design automation (EDA) tools is still far behind that of digital EDA tools. As a result, analog/mixed-signal IC design, especially the analog layout design, is still a manual, time-consuming, and error-prone task. In order to speed up modern SoC design for large varieties of emerging applications, it is required to develop novel analog/mixed-signal IC design methodologies and algorithms, as well as new analog EDA tools. The purpose of this paper is to summarize recent research progress during the past decade, address new analog layout design challenges in advanced technology nodes, and facilitate more research activities in analog layout synthesis. |
Title | (Invited Paper) To Detect, Locate, and Mask Hardware Trojans in Digital Circuits by Reverse Engineering and Functional ECO |
Author | *Xing Wei, Yi Diao, Yu-Liang Wu (Easy-Logic Technology Ltd., Hong Kong) |
Page | pp. 623 - 630 |
Keyword | Hardware Trojan, Formal verification, Reverse Engineer, Functional ECO, Logic Rewiring |
Abstract | During the design phase, a specification may be tampered with directly by dishonest engineers (or industrial spies), or indirectly through the use of malicious modules from third-party Intellectual Property (3PIP) block vendors. During integration and fabrication, the chips may also be tampered with by an untrusted system integrator. Particularly for high-end commercial or classified military chips, Hardware Trojan (HT) Detect-Locate-and-Mask (DL&M) is crucial to make sure a design is produced exactly as the original specification. Our objectives are (1) to detect any functionality difference which might be caused by bugs or HTs, (2) to locate/output the difference circuitry to correct the bugs or to investigate the tampering intention or purpose, and (3) to kill (mask) the HTs by restoring the chip’s functionality back to golden with a minimum circuitry change. Besides blocking the plotted damage at an early stage and pointing to the spy source by revealing the HT intention, the masking circuit revision must also be minimized to avoid affecting the chip performance (timing) too much. In this paper, we propose a scheme that integrates reverse engineering, formal verification, functional ECO, and logic rewiring to detect, locate and mask Hardware Trojans with minimized cost. This formal verification based scheme can guarantee catching 100% of the hidden combinational circuit HTs and can handle multiple HTs (no number limit) automatically in one run. Some techniques within our scheme won first place in the CAD Contests at ICCAD 2012, 2013, and 2014. |
Title | Aging-aware High-level Physical Planning for Reconfigurable Systems |
Author | Zana Ghaderi, *Eli Bozorgzadeh (University of California, Irvine, U.S.A.) |
Page | pp. 631 - 636 |
Keyword | Aging mitigation, Reconfigurable systems, Physical planning, Performance degradation, Aging analysis |
Abstract | Due to advanced silicon technology, reconfigurable system-on-chip devices such as FPGAs are increasingly becoming sensitive to aging effects. This paper presents a high-level physical planning with reconfiguration strategy in order to mitigate the aging-induced delay degradation in FPGA resources. The proposed solution is an offline framework composed of an aging-aware floorplanner coupled with a proactive aging-aware reconfiguration policy which generates checkpoints aperiodically for runtime reconfiguration. We consider BTI and HCI aging mechanisms and consider the BTI-based aging recovery during idle periods using aging history map. The experiments demonstrate that sequence of configurations generated by this scheme provides average aging-rate reduction on FPGA resources and application performance by 53.2% and 17.5%, respectively. |
Title | Hardware Reliability Margining for the Dark Silicon Era |
Author | Liangzhen Lai, *Puneet Gupta (UCLA, U.S.A.) |
Page | pp. 637 - 642 |
Keyword | dark silicon, reliability, margining |
Abstract | Hardware reliability margin should be derived from the worst-case aging scenario, which typically occurs when the circuits are operating at peak performance state with the highest operating voltage and frequency. However, as integrated circuits enter the "dark silicon" era, it is impossible to power up all circuits throughout the entire lifetime. Reliability margining in absence of architecture-level power/thermal constraints can be overly pessimistic. In this work, we propose a margining scheme that employs the power/thermal contexts and system management policies to derive the actual worst-case workload pattern for different reliability phenomena. Our experimental results show that at 60% dark ratio, the conventional margining approach can overestimate the aging degradation due to EM and BTI by up to 3-7X and 18% respectively. Our margining method is able to eliminate this over-pessimism and results in about 20% delay margin and 40%-60% metal width margin reduction. |
Title | ACR: Enabling Computation Reuse for Approximate Computing |
Author | *Xin He (Chinese Academy of Sciences, China), Guihai Yan, Yinhe Han, Xiaowei Li (Institute of Computing Technology, Chinese Academy of Sciences, China) |
Page | pp. 643 - 648 |
Keyword | Approximate computing, Computation reuse |
Abstract | Approximate computing, which trades off computation quality (e.g., accuracy) and computation efforts, has become a promising technique to improve performance for many mission-non-critical and error-tolerant applications. The computations in such applications usually exhibit superior value locality, i.e., computations performed by a function or code region are very likely to reproduce "similar" results. Reusing the similar results can bypass redundant computations, as long as "exact" results are not mandatory. However, conventional computation reuse techniques are less effective in the approximate computing paradigm. The input values of two computation instances have to be identical to reuse one for another, hence "exact" in nature. We propose ACR, an approximate computation reuse framework, to enable computation reuse for approximate computing. ACR relaxes the exact matching in inputs to some extent regulated by "similarity" quantification, thereby shifting the exact computation reuse paradigm to its approximate counterpart. We furthermore propose an input significance-aware similarity quantification scheme through statistical approaches. Experimental results show that ACR could effectively exploit the potential of computation reuse for approximate computing and reduce computations by 47.6% on average for a set of approximate applications. |
Title | Work hard, sleep well - Avoid irreversible IC wearout with proactive rejuvenation |
Author | *Xinfei Guo, Mircea R. Stan (University of Virginia, U.S.A.) |
Page | pp. 649 - 654 |
Keyword | irreversible wearout, boundary, sleep-when-getting-tired, negative turbo boost |
Abstract | Various wearout mechanisms have both a reversible and an irreversible (permanent) part, with some, like BTI and EM, having a significant reversible part, while others, like HCI, are mostly irreversible. In this paper we make two contributions. First, we show that the boundary between the reversible and irreversible parts of wearout is not fixed, with the irreversible part becoming at least partially reversible under the right conditions of active accelerated recovery and stress/recovery scheduling. Second, we show that there are certain stress/recovery schedules that can (almost) completely eliminate irreversible wearout, thus allowing significant reductions in necessary design margins. The experiments were done on commercial FPGAs fabricated in a 40nm technology. To fully repair and avoid the irreversible wearout, we propose a biology-inspired sleep-when-getting-tired strategy. The strategy can achieve >60x design margin reduction and ~9% average performance improvement within a 10-year lifetime constraint compared to the no-recovery case. Potential system-level implementations (a negative “turbo-boost”-like strategy) in multicore and NoC systems are also presented. |
Title | Netlist Reverse Engineering for High-Level Functionality Reconstruction |
Author | *Travis Meade, Shaojie Zhang, Yier Jin (University of Central Florida, U.S.A.) |
Page | pp. 655 - 660 |
Keyword | Reverse Engineering, IP Security, Netlist Analysis |
Abstract | In a modern IC design flow, from specification development to chip fabrication, various security threats emerge. Of particular concern are modifications made to third-party IP cores and commercial off-the-shelf (COTS) chips where no golden models are available for comparison. To address this concern, we develop a tool, named Reverse Engineering Finite State Machine (REFSM), that helps end-users reconstruct a high-level description of the control logic from a flattened netlist. We demonstrate that REFSM effectively recovers circuit control logic from netlists with varying degrees of complexity. Experimental results also show that the developed tool can easily identify malicious logic from a flattened (or even obfuscated) netlist. If combined with chip-level reverse engineering techniques, the developed REFSM tool can help detect the insertion of hardware Trojans in fabricated circuits. |
Title | Assessing CPA Resistance of AES with Different Fault Tolerance Mechanisms |
Author | Hoda Pahlevanzadeh, Jaya Dofe, *Qiaoyan Yu (University of New Hampshire, U.S.A.) |
Page | pp. 661 - 666 |
Keyword | AES, correlation power analysis, fault tolerance, partial guessing entropy, FPGA |
Abstract | Countermeasures for the Advanced Encryption Standard (AES) to thwart side-channel attacks and fault attacks are typically investigated separately. There is a lack of thorough investigation into how a countermeasure designed for one attack affects the efficiency of another attack. In this work, we consider three different fault detection (FD) methods: double modular redundancy (DMR), inverse function (inverse), and even parity check code (parity). We perform FPGA-based systematic analysis to investigate the impact of FD schemes on the correlation power analysis (CPA) resistance of a complete AES implementation. Moreover, the power model used in the existing work is Hamming weight rather than the more powerful Hamming distance model. Our experimental results show that, in some scenarios, the use of fault detection mechanisms in AES improves the resistance against CPA. For instance, applying a parity FD to the AES S-Box makes it harder to retrieve the key than in the case without any FD protection. |
Title | SPARTA: A Scheduling Policy for Thwarting Differential Power Analysis Attacks |
Author | *Ke Jiang, Petru Eles, Zebo Peng, Sudipta Chattopadhyay (Linköping University, Sweden), Lejla Batina (Radboud University, Netherlands) |
Page | pp. 667 - 672 |
Keyword | Real-time systems, Security, Countermeasure, DPA attacks |
Abstract | Embedded systems (ESs) have been widely used in various application domains. It is very important to design ESs that guarantee functional correctness of the system under strict timing constraints. Such systems are known as the real-time embedded systems (RTESs). More recently, RTESs started to be utilized in safety and reliability critical areas, which made the overlooked security issues, especially confidentiality of the communication, a serious problem. Differential power analysis attacks (DPAs) pose serious threats to confidentiality protection mechanisms, i.e., implementations of cryptographic algorithms, on embedded platforms. In this work, we present a scheduling policy, SPARTA, that thwarts DPAs. Theoretical guarantees and preliminary experimental results are presented to demonstrate the efficiency of the SPARTA scheduler. |
Title | Analysis and Vulnerability Exploration of Current Secure Scan Designs |
Author | Yanhui Luo, *Aijiao Cui (Harbin Institute of Technology Shenzhen Graduate School, China), Huawei Li (Chinese Academy of Sciences, China), Gang Qu (University of Maryland College Park, U.S.A.) |
Page | pp. 673 - 678 |
Keyword | secure scan design, scan-based side-channel attack, obfuscating scan chain |
Abstract | Scan design has become another side channel for leaking confidential information inside crypto chips. Methods based on obfuscating scan chain order have been proposed as effective countermeasures. In this paper, we analyze the existing secure scan designs from the angle of whether they need a complete chain state and rely on any specific scan chain order. We show that all existing attacks do not rely on a specific scan chain order. As an example, for the recently proposed ROS countermeasure, we demonstrate how an attacker can access the complete state of the scan chain and hence defeat the countermeasure. |
Title | Laplacian Eigenmaps and Bayesian Clustering Based Layout Pattern Sampling and Its Applications to Hotspot Detection and OPC |
Author | *Tetsuaki Matsunawa (Toshiba Corp., Japan), Bei Yu (Chinese University of Hong Kong, Hong Kong), David Z. Pan (University of Texas at Austin, U.S.A.) |
Page | pp. 679 - 684 |
Keyword | Pattern Sampling, Clustering, OPC, Hotspot Detection |
Abstract | Effective layout pattern sampling is a fundamental component for lithography process optimization, hotspot detection, and model calibration. Existing pattern sampling algorithms rely on either vector quantization or heuristic approaches. However, it is difficult to manage these methods due to their heavy demands for prior knowledge, such as high-dimensional layout features and manually tuned hypothetical model parameters. In this paper we present a self-contained layout pattern sampling framework, where no manual parameter tuning is needed. To handle high dimensionality and diverse layout feature types, we propose a nonlinear dimensionality reduction technique with kernel parameter optimization. Furthermore, we develop a Bayesian model based clustering, through which automatic sampling is realized without arbitrary setting of model parameters. The effectiveness of our framework is verified through a sampling benchmark suite and two applications, lithography hotspot detection and optical proximity correction. |
Title | Balancing Lifetime and Soft-Error Reliability to Improve System Availability |
Author | *Junlong Zhou (University of Notre Dame, East China Normal University, U.S.A.), X. Sharon Hu, Yue Ma (University of Notre Dame, U.S.A.), Tongquan Wei (East China Normal University, China) |
Page | pp. 685 - 690 |
Keyword | Lifetime, Soft-Error, Reliability, Availability, MTTF |
Abstract | CMOS scaling has greatly increased concerns for lifetime reliability due to permanent faults and soft-error reliability due to transient faults. Most existing works only focus on one of the two reliability concerns, but often times techniques used to increase one type of reliability may adversely impact the other type. A few efforts do consider both types of reliability together and use two different metrics to quantify the two types of reliability. However, for many systems, the concern of the user is to maximize system availability by improving the mean time to failure (MTTF), regardless of whether the failure is caused by permanent faults or transient faults. Addressing this concern requires a uniform metric to measure the effect due to both types of faults. In this paper, we derive a novel analytical expression for calculating the MTTF due to transient faults. Using this new formula and an existing method to evaluate system MTTF, we formulate and solve the problem of maximizing system availability with consideration of permanent faults, transient faults, and throughput constraint. Extensive simulations of synthetic task sets and benchmarks based on real-world applications were performed to validate our algorithm. |
Title | A Closed-Form Stability Model for Cross-Coupled Inverters Operating in Sub-Threshold Voltage Region |
Author | *Tatsuya Kamakari, Jun Shiomi, Tohru Ishihara, Hidetoshi Onodera (Kyoto University, Japan) |
Page | pp. 691 - 696 |
Keyword | Cross-coupled inverter, Yield, Stability, Analytical Model, Sub-threshold Voltage |
Abstract | A cross-coupled inverter, which is an essential element of on-chip memory subsystems, plays an important role in synchronous LSI circuits. In this paper, an analytical stability model for a cross-coupled inverter operating in a sub-threshold voltage region is proposed. The proposed model analytically shows that the minimum operating voltage of the cross-coupled inverter is normally distributed in a high-sigma region if the distribution of the threshold voltage is Gaussian. The minimum supply voltage at which the yield of the cross-coupled inverter reaches a specific value can be accurately derived by a simple calculation using the model. Monte-Carlo simulation assuming a commercial 28nm process technology demonstrates the accuracy and the validity of the proposed model. Based on the model, this paper shows strategies for variation-tolerant memory design. |
Title | Delay Uncertainty and Signal Criticality Driven Routing Channel Optimization for Advanced DRAM Products |
Author | Samyoung Bang (Samsung Electronics, Republic of Korea), Kwangsoo Han, Andrew B. Kahng, *Mulong Luo (University of California at San Diego, U.S.A.) |
Page | pp. 697 - 704 |
Keyword | interconnection, crosstalk, DRAM, timing |
Abstract | Crosstalk-induced delay change is a critical challenge to physical design of long interconnect channels in DRAM products at the 2x and 1x technology nodes. Due to severe cost challenges in a high-volume, commodity market, layout resources including channel width, buffers, and number of metal routing layers are extremely scarce. We describe a new channel optimizer that reduces crosstalk-induced delay uncertainty, weighted by signal criticality and aware of signal activity correlations (e.g., to reduce delay uncertainty by mutual shielding). Instead of the typical signal net permutation strategy, we apply (pessimistic) timing-driven swizzling to minimize the delay uncertainty cost function. Contributions of this work include (1) an accurate and efficient analytical crosstalk delay calculator, (2) scalability up to hundreds of signals and tracks in the routing channel through use of greedy and decomposition strategies as well as a pair-swapping heuristic, and (3) experimental studies that demonstrate up to 24% reduction of the worst-case criticality-weighted delay uncertainty (or, 34ps of absolute delay uncertainty reduction) compared with the typical signal permutation approach. |
Title | (Invited Paper) Reliability, Adaptability and Flexibility in Timing: Buy a Life Insurance for Your Circuits |
Author | *Ulf Schlichtmann (TU Munich, Germany), Masanori Hashimoto (Osaka University, Japan), Iris Hui-Ru Jiang (National Chiao Tung University, Taiwan), Bing Li (TU Munich, Germany) |
Page | pp. 705 - 711 |
Keyword | Aging Analysis, Timing Adaptation, Criticality-dependency-aware Timing |
Abstract | At nanometer manufacturing technology nodes, process variations affect circuit performance significantly. In addition, performance deterioration of circuits due to aging effects is also increasing. Consequently, a large timing margin is required to maintain yield. To combat the pessimism and the resulting overdesign, aging analysis with high-level models, on-chip timing margin monitoring and tuning, and flexible delay models of flip-flops can be deployed. This paper gives an overview of the state of the art of applying these techniques to improve the health of circuits. |
Title | A High Performance Reliable NoC Router |
Author | *Lu Wang, Sheng Ma, Zhiying Wang (College of Computer, National University of Defense Technology, China) |
Page | pp. 712 - 718 |
Keyword | Reliability, Network-on-chip, High performance, Router design |
Abstract | Aggressive scaling of CMOS process technology allows the fabrication of highly integrated chips, and enables the design of multiprocessor systems-on-chip connected by a network-on-chip (NoC). However, it brings about widespread reliability challenges. Aiming to tackle permanent faults in the router components, we propose a high-performance, high-reliability, and low-cost router design based on a generic 2-stage router. Four fault-tolerant strategies are added in our reliable router. We exploit a double routing strategy for the routing computation (RC) failure, a default winner strategy for the virtual channel allocation (VA) failure, a runtime arbiter selection strategy for the switch allocation (SA) failure, and a double bypass bus strategy for the crossbar failure. Different from previous reliable routers, our design leverages the features of pipeline optimization and the routing algorithm to maintain performance during fault tolerance, especially under heavy network loads. Besides, our proposed router provides higher reliability with lower hardware consumption than previous reliable router designs. |
Title | Dynamic Admission Control for Real-Time Networks-On-Chips |
Author | *Adam Kostrzewa, Selma Saidi, Leonardo Ecco, Rolf Ernst (TU Braunschweig, Germany) |
Page | pp. 719 - 724 |
Keyword | real-time, safety, overlay network |
Abstract | Networks-on-Chip (NoCs) for real-time systems require solutions for safe and predictable sharing of network resources. In this work, we present a mechanism for global and dynamic admission control in NoCs designed for real-time systems. It introduces an overlay network to synchronize transmissions using arbitration units called Resource Managers (RMs), which allows global and work-conserving scheduling. We present a formal worst-case timing analysis for the proposed mechanism and demonstrate that this solution not only exhibits higher performance in simulation but, even more importantly, consistently reaches smaller formally guaranteed worst-case latencies than TDM for realistic levels of system utilization. Our mechanism does not require modification of routers and therefore can be used together with any architecture utilizing non-blocking routers. |
Title | FoToNoC: A Hierarchical Management Strategy Based on Folded Torus-Like Network-on-Chip for Dark Silicon Many-Core Systems |
Author | *Lei Yang, Weichen Liu, Weiwen Jiang, Mengquan Li, Juan Yi, Edwin Hsing-Mean Sha (Chongqing University, China) |
Page | pp. 725 - 730 |
Keyword | Dark silicon, System performance, Network-on-Chip, Temperature |
Abstract | In this dark silicon era, techniques have been developed to selectively activate physically nonadjacent cores to maintain a safe temperature and stay within the allowable power budget on a many-core chip. This results in an unexpected increase in communication overhead due to the longer average distance between active cores in a typical mesh-based Network-on-Chip (NoC), which in turn reduces system performance and energy efficiency. In this paper, we present FoToNoC, a Folded Torus-like NoC, and a hierarchical management strategy on top of it, to address this tradeoff problem for heterogeneous many-core systems. Optimizations of chip temperature, inter-core communication, application performance, and system energy consumption are well isolated in FoToNoC, and addressed in different design phases and aspects. A cluster-based hierarchical strategy is proposed to manage the system adaptively at several different control levels. Compared with mesh-based systems on a set of synthetic and real benchmarks, FoToNoC can achieve on average 39.4% performance improvement when similar temperature conditions are maintained, and the proposed strategy can further reduce the total energy consumption by up to 42.0%. |
Title | Analytical ThruChip Inductive Coupling Channel Design Optimization |
Author | *Li-Chung Hsu, Junichiro Kadomoto, So Hasegawa, Atsutake Kosuge, Yasuhiro Take, Tadahiro Kuroda (Keio University, Japan) |
Page | pp. 731 - 736 |
Keyword | TCI, 3-D IC, ThruChip, Inductive Coupling |
Abstract | ThruChip interface (TCI) is an emerging 3-D integrated circuit stacking technology. TCI utilizes on-chip inductors to build a vertical communication channel at near-field distance and has been proved to stand comparison with through-silicon-via (TSV) in data rate, power, and reliability. Moreover, it is also cost-effective in manufacturing due to its wireless nature. In this paper, an analytical method is proposed to find a near-optimal TCI inductive coupling channel solution. The experimental results show an average 16.8% reduction in transmitting current, and design time shrinks from days to a few minutes. |
Title | Extending Trace History Through Tapered Summaries in Post-silicon Validation |
Author | *Sandeep Chandran, Preeti Ranjan Panda, Smruti R. Sarangi (Indian Institute of Technology Delhi, India), Deepak Chauhan, Sharad Kumar (Freescale Semiconductors India Pvt Ltd, India) |
Page | pp. 737 - 742 |
Keyword | Post-silicon Validation Methodology, Online filtering, Tapered Summaries |
Abstract | On-chip trace buffers are increasingly being used for at-speed debug during post-silicon validation. However, the activity history captured by these buffers is small due to their limited size. We propose a novel scheme that extends the captured trace history (by up to 162%) by using a portion of the trace buffer to also store summaries of trace messages. We describe an Overlapped architecture that uses a reduced number of ports to capture tapered summaries. We demonstrate the usefulness of the proposed methodology for debugging various classes of bugs encountered during post-silicon validation. |
Title | Novel Applications of Deep Learning Hidden Features for Adaptive Testing |
Author | *Bingjun Xiao (University of California, Los Angeles, U.S.A.), Jinjun Xiong (IBM Research, U.S.A.), Yiyu Shi (University of Notre Dame, U.S.A.) |
Page | pp. 743 - 748 |
Keyword | Adaptive testing, DNN, Big data |
Abstract | Adaptive test of integrated circuits (IC) promises to increase the quality and yield of products with reduced manufacturing test cost compared to traditional static test flows. Based on recent progress in machine learning, this paper proposes a novel deep-learning-based method for adaptive test. We start from a trained deep neural network (DNN) with a much higher accuracy than the conventional test flow for pass and fail prediction. We further develop two novel applications by leveraging the features learned by the DNN: one to enable partial testing, i.e., making decisions on pass and fail without finishing the entire test flow, and another to enable dynamic test ordering, i.e., changing the sequence of tests adaptively. Experimental results show significant improvement in the accuracy and effectiveness of our proposed method. |
Title | Mixed 01X-RSL-Encoding for Fast and Accurate ATPG with Unknowns |
Author | *Dominik Erb, Karsten Scheibler (University of Freiburg, Germany), Michael A. Kochte (University of Stuttgart, Germany), Matthias Sauer (University of Freiburg, Germany), Hans-Joachim Wunderlich (University of Stuttgart, Germany), Bernd Becker (University of Freiburg, Germany) |
Page | pp. 749 - 754 |
Keyword | Unknown values, test generation, Restricted symbolic logic, SAT, stuck-at faults |
Abstract | Unknown (X) values in a design introduce pessimism in conventional test generation algorithms, which results in a loss of fault coverage. This pessimism is reduced by more accurate modeling and analysis. Unfortunately, accurate analysis techniques greatly increase runtime and limit scalability. One promising technique to prevent high runtimes while still providing high accuracy is the use of restricted symbolic logic (RSL). However, even pure RSL-based algorithms reach their limits as soon as million-gate circuits need to be processed. In this paper, we propose new ATPG techniques to overcome such limitations. An efficient hybrid encoding combines the accuracy of RSL-based modeling with the compactness of conventional three-valued encoding. A low-cost two-valued SAT-based untestability check is able to classify most untestable faults with low runtime. An incremental and event-based accurate fault simulator is introduced to reduce fault simulation effort. The experiments demonstrate the effectiveness of the proposed techniques. Over 97% of the faults are accurately classified. Both the number of aborts and the total runtime are significantly reduced compared to the state-of-the-art pure RSL-based algorithm. For circuits up to a million gates, the fault coverage could be increased considerably compared to a state-of-the-art commercial tool with very competitive runtimes. |
Title | Test and Diagnosis Pattern Generation for Dynamic Bridging Faults and Transition Delay Faults |
Author | *Cheng-Hung Wu, Saint James Lee, Kuen-Jong Lee (National Cheng Kung Univ., Taiwan) |
Page | pp. 755 - 760 |
Keyword | Fault diagnosis, transition fault, dynamic bridging fault, diagnosis pattern generation, test compaction |
Abstract | A dynamic bridging fault (DBF) induces a transition delay on a circuit node and hence has effects similar to those of a transition delay fault (TDF). However, the causes of these two types of faults are quite different: a DBF is due to the bridging effects between two circuit nodes, while a TDF is due to a node itself or the logic connected to the node. Thus, in addition to detecting these two types of faults, it is also important to distinguish them such that the exact sources of defects can be identified during the yield ramping process. In this paper we present an efficient test and diagnosis pattern generation procedure to detect DBFs and TDFs as well as to distinguish them. We first analyze the dominance relation between a DBF and its corresponding TDF. A novel circuit model called the inverse DBF (IDBF) model is then developed which can transform the problem of distinguishing a pair of a DBF and a TDF into the problem of detecting the inverse DBF. The pattern generation process can then be done by using an ATPG tool for dynamic bridging faults. We believe this is the first work to distinguish TDFs and DBFs. A complete procedure to generate both test and diagnosis patterns to detect all testable TDFs and DBFs in addition to distinguishing them is then presented. In this flow, all TDFs and DBFs as well as all fault pairs between the two types of faults can be modeled in a single circuit and dealt with in a few ATPG runs. Thus, the pattern generation process is quite efficient, and very compact pattern sets can be obtained by utilizing the test pattern compaction feature of the ATPG tool. Experimental results on ISCAS89 benchmarks show that our procedure can detect all detectable TDFs and DBFs and achieves up to 99.96% diagnosis resolution for these faults. |
Title | Flexible Transition Metal Dichalcogenide Field-Effect Transistors: A Circuit-Level Simulation Study of Delay and Power under Bending, Process Variation, and Scaling |
Author | Ying-Yu Chen (University of Illinois at Urbana-Champaign, U.S.A.), *Morteza Gholipour (Babol University of Technology, Iran), Deming Chen (University of Illinois at Urbana-Champaign, U.S.A.) |
Page | pp. 761 - 768 |
Keyword | transition metal dichalcogenide, TMDFET, flexible electronics, modeling, simulation |
Abstract | In this paper, a new and efficient SPICE model of flexible transition metal dichalcogenide field-effect transistors (TMDFETs) is developed for different types of materials, considering effects when scaling the transistor size down to the 16-nm technology node. Extensive circuit-level simulations are performed using this model, and the delay and power performance of TMDFET circuits with different amounts of bending are reported. Simulation results indicate that delay and power trade-off can be done in TMDFET circuits via bending. Effects from process variation are also evaluated via circuit simulations. Finally, our cross-technology and scaling studies show that while TMDFETs perform better than Si-based transistors in terms of energy-delay product (EDP) at 180-nm and 90-nm technology nodes (the best being 12.7% and 40.7% of that of Si-based transistors, respectively), their EDPs are worse than Si-based transistors (at least 4.9x of that of the best performing Si-based transistor) on the 16-nm technology node. Such a compact model would enable SPICE-level circuit simulation for early assessment, design, and evaluation of futuristic TMDFET-based flexible circuits targeting advanced technology nodes. |
Title | Non-Volatile Non-Shadow Flip-Flop using Spin Orbit Torque for Efficient Normally-off Computing |
Author | *Rajendra Bishnoi, Fabian Oboril, Mehdi B. Tahoori (Karlsruhe Institute of Technology, Germany) |
Page | pp. 769 - 774 |
Keyword | Spin orbit torque, low power, flip-flop, write avoidance, power gate |
Abstract | With technology downscaling, it is very challenging to deal with static power. Conventional CMOS and Non-Volatile flip-flops cannot provide an effective solution to this problem. We propose a novel Non-Volatile Non-Shadow flip-flop (NVNS-FF) using Spin Orbit Torque (SOT) based MTJ cells. In this design, we exploit the high speed, low energy, and high reliability features of SOT devices to employ them as active components of the flip-flop. This enables efficient normally-off computing by allowing very aggressive power gating for both short and long standby periods. Experimental results show that the NVNS-FF has energy and timing characteristics similar to those of conventional CMOS-based flip-flops in active mode, and at the same time it reduces the static power by 5X compared to backup flip-flops. |
Title | Optimal Co-Scheduling of HVAC Control and Battery Management for Energy-Efficient Buildings Considering State-of-Health Degradation |
Author | *Tiansong Cui, Shuang Chen (University of Southern California, U.S.A.), Yanzhi Wang (Syracuse University, U.S.A.), Qi Zhu (University of California, Riverside, U.S.A.), Shahin Nazarian, Massoud Pedram (University of Southern California, U.S.A.) |
Page | pp. 775 - 780 |
Keyword | HVAC Control, Battery, Smart Building |
Abstract | The heating, ventilation and air conditioning (HVAC) system accounts for half of the energy consumption of a typical building. Additionally, the need for HVAC changes over hours and days as does the electric energy price. Level of comfort of the building occupants is, however, a primary concern, which tends to overwrite pricing. Dynamic HVAC control under a dynamic energy pricing model while meeting an acceptable level of occupants' comfort is thus critical to achieving energy efficiency in buildings in a sustainable manner. Finally, there is the possibility that the building is equipped with some renewable source of power such as solar panels mounted on the rooftop. The presence of a battery energy storage system in a target building would enable peak power shaving by adopting a suitable charge and discharge schedule for the battery, while simultaneously meeting building energy efficiency and user satisfaction. Achieving this goal requires detailed information (or predictions) about the amount of local power generation from the renewable source plus the power consumption load of the building. This paper addresses the co-scheduling problem of HVAC control and battery management to achieve energy-efficient buildings, while also accounting for the degradation of the battery state-of-health during charging and discharging operations (which in turn determines the amortized cost of owning and utilizing a battery storage system). A time-of-use dynamic pricing scenario is assumed and various energy loss components are considered including power dissipation in the power conversion circuitry as well as the rate capacity effect in the battery. 
A global optimization framework targeting the entire billing cycle is presented and an adaptive co-scheduling algorithm is provided to dynamically update the optimal HVAC air flow control and the battery charging/discharging decision in each time slot during the billing cycle to mitigate the prediction error of unknown parameters. Experimental results show that the proposed algorithm achieves up to a 15% reduction in total electric utility cost compared with some baseline methods. |
Title | Accurate Remaining Range Estimation for Electric Vehicles |
Author | *Joonki Hong, Sangjun Park, Naehyuck Chang (Korea Advanced Institute of Science and Technology, Republic of Korea) |
Page | pp. 781 - 786 |
Keyword | Electric vehicle, Range estimation, Modeling methodology, EV power model, Regression |
Abstract | EV drivers have range anxiety because of the short driving range of EVs. In this paper, we emphasize that accurate remaining range estimation can efficiently mitigate the range anxiety of EV drivers. Just like the analogous concepts used in the power estimation of digital circuits, remaining range estimation consists of two consecutive steps: driving profile estimation and power consumption estimation. We propose a hybrid modeling methodology that decreases the estimation error to 2.52%. |