Title | Ultra-Low Power Design - the Road to Disappearing Electronics |
Author(s) | *Jan M. Rabaey (University of California, Berkeley) |
Page(s) | pp. 7 - 12 |
Abstract | Progress in semiconductor technology scaling, combined with advances in wireless technology, enables the emergence of a third wave of computing. For this technology, often called ambient intelligence, to really break through and to realize its lofty ambitions, a dramatic reduction in power dissipation of all its components (RF, mixed signal, digital, clock, power generation, interfaces) is required. As the size of a node is directly proportional to its power generation, storage and consumption patterns, only ultra low-power design will lead to truly disappearing electronics. |
Title | Energy-Aware Dynamic Task Scheduling Applied to a Real-Time Multimedia Application on an Xscale Board |
Author(s) | Chantal Ykman-Couvreur, Francky Catthoor, Johan Vounckx, Andy Folens, Filip Louagie, *Rudy Lauwereins (IMEC) |
Page(s) | pp. 47 - 54 |
Keywords | dynamic task scheduling, system-level energy minimization, dynamic voltage scaling, real design |
Abstract | Energy consumption is a major issue when running real-time dynamic applications on portable devices. This energy is mainly dissipated on the processors, and it can be reduced by Dynamic Voltage Scaling (DVS). However, for applications with dynamic behavior and task creation, this is not feasible at design time. To avoid overhead, existing run-time schedulers only have a local view inside active tasks. Our approach combines a design-time scheduling exploration with a low-complexity run-time scheduling. In this paper, a 3D rendering application with important load variations, running on an Xscale processor that allows two voltages, is used as a case study to demonstrate the effectiveness of our scheduling. For this case study, our average energy gain compared with state-of-the-art intra-task DVS is up to 40%. |
Title | LSI Power Network Analysis with On-chip Wire Inductance |
Author(s) | *Atsushi Muramatsu, Masanori Hashimoto, Hidetoshi Onodera (Kyoto University) |
Page(s) | pp. 55 - 60 |
Keywords | power distribution networks, power grids, inductance |
Abstract | On-chip power/ground wires have been modeled with resistance and capacitance. However, with the increase of clock frequency, on-chip wire inductance plays an important role in power/ground distribution analysis. So far, power networks have usually been analyzed with the inductance of package and bonding, but without on-chip wire inductance. In this paper, we examine the behavior of an LSI power network from the viewpoint of transmission line theory, and demonstrate that voltage fluctuation propagates as an electromagnetic wave. We evaluate the relation between decoupling capacitance position and noise suppression effect, and reveal that placing decoupling capacitance close to the current load is necessary for noise reduction. We also show that the impact of on-chip inductance becomes small when on-chip decoupling capacitance is well placed according to local power consumption. |
Title | Power Supply Noise Reduction with Design for Manufacturability |
Author(s) | *Hiroyuki Tsujikawa, Kenji Shimazaki, Shozo Hirano, Kazuhiro Sato, Masanori Hirofuji, Junichi Shimada, Mitsumi Ito, Kiyohito Mukai (Matsushita Electric Industrial Co., Ltd.) |
Page(s) | pp. 61 - 65 |
Keywords | Power, Voltage, Noise, DFM, CMP |
Abstract | In this paper, a solution for reducing power-supply voltage noise in LSI microchips is presented. The proposed design methodology also considers design for manufacturability (DFM) together with power integrity. The method was successfully applied to the design of a system-on-chip (SOC), achieving a 12.9~14.6% noise reduction in power-supply voltage and uniformity of pattern density for chemical mechanical polishing (CMP). |
Title | An IR-Drop Minimization by Optimizing Number and Location of Power Supply Pads |
Author(s) | *Takashi Sato (Kyoto University), Masanori Hashimoto (Osaka University), Hidetoshi Onodera (Kyoto University) |
Page(s) | pp. 66 - 72 |
Keywords | IR-drop, pad assignment, optimization, circuit reduction |
Abstract | An efficient pad assignment algorithm to minimize
voltage drop on a power distribution network is proposed. Combination
of the successive pad assignment (SPA) algorithm and the incremental
matrix inversion (IMI) provides an efficient optimization for both
location and number of power supply pads. The SPA algorithm creates an
equivalent resistance matrix for current sink points and candidate pads
only. It preserves both pad candidates and power consumption points as
external ports so that the topological modification due to connecting or
disconnecting voltage sources to or from candidate pads is suitably
represented without re-generating circuit matrices. Using a sub-matrix of
the resistance matrix, SPA greedily searches for the next pad location that
minimizes the worst voltage drop. Each time the next pad candidate to add
is tested, IMI reduces computational complexity from O(n^3) to
O(n^2). Experimental results show that the proposed algorithm
efficiently enumerates pad assignment order in practical time. |
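The drop from O(n^3) to O(n^2) per candidate test can be illustrated with a rank-one update of a known inverse. The sketch below uses the Sherman-Morrison identity, which is one standard way to obtain such an update; the listing does not state that the paper's IMI uses exactly this formula, and the function name and example matrix are hypothetical.

```python
def sherman_morrison_update(a_inv, u, v):
    """Return inverse of (A + u v^T) given a_inv = A^-1, in O(n^2) time.

    Assumption: this is a generic rank-one update, shown only to illustrate
    why re-inverting from scratch (O(n^3)) can be avoided.
    """
    n = len(a_inv)
    # A^-1 u (column vector) and v^T A^-1 (row vector), each O(n^2).
    a_inv_u = [sum(a_inv[i][k] * u[k] for k in range(n)) for i in range(n)]
    vt_a_inv = [sum(v[k] * a_inv[k][j] for k in range(n)) for j in range(n)]
    denom = 1.0 + sum(v[k] * a_inv_u[k] for k in range(n))
    if abs(denom) < 1e-12:
        raise ValueError("updated matrix is singular")
    return [
        [a_inv[i][j] - a_inv_u[i] * vt_a_inv[j] / denom for j in range(n)]
        for i in range(n)
    ]


# Hypothetical 2-node conductance matrix A = [[2, 1], [1, 2]] and its inverse.
A_INV = [[2 / 3, -1 / 3], [-1 / 3, 2 / 3]]
# Adding one unit of conductance at node 0 is the rank-one change u v^T.
UPDATED = sherman_morrison_update(A_INV, [1.0, 0.0], [1.0, 0.0])
```

Here `UPDATED` is the inverse of [[3, 1], [1, 2]], obtained without a full re-inversion.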
Title | Topology-Oriented Design of Current Mirrors Using Evolutionary Graph Generation System |
Author(s) | *Masanori Natsui, Naofumi Homma, Takafumi Aoki (Graduate School of Information Sciences, Tohoku University, Japan), Tatsuo Higuchi (Faculty of Engineering, Tohoku Institute of Technology, Japan) |
Page(s) | pp. 78 - 84 |
Keywords | evolutionary computation, genetic algorithms, circuit synthesis, analog circuits |
Abstract | This paper presents an efficient graph-based evolutionary
optimization technique called Evolutionary Graph Generation
(EGG) for automated circuit synthesis, and its application
to the topology-oriented design of analog circuit structures.
Key features of EGG are to employ a graph-based
representation of individuals and to manipulate the graph
structures directly instead of encoding the
structures into indirect representations, such as bit
strings and trees.
The potential capability of EGG has been investigated
through the topology-oriented design of nMOS current mirrors
which are widely used as the fundamental building blocks of
many analog circuits. |
Title | MOSFET Layout Design for Electrical Performance Improvement |
Author(s) | *Philip Beow Yew Tan (Silterra Malaysia Sdn. Bhd. & University Science Malaysia), Albert Victor Kordesch (Silterra Malaysia Sdn. Bhd.), Othman Sidek (University Science Malaysia) |
Page(s) | pp. 85 - 89 |
Keywords | MOSFET, XMOS, Cross MOS, cross poly gate, ring oscillator |
Abstract | In this paper, we discuss a new method of designing the MOSFET (MOS field effect transistor) layouts to improve their electrical performance. A new poly gate layout design has been introduced. Instead of using a straight poly gate, we used a cross poly gate. By adding an extra poly line across the existing poly line at 90 degrees, we have formed the Cross MOS (XMOS) transistor. The XMOS transistor has higher drain current (Id) and switching speed compared to the conventional MOS transistor. |
Title | A Thermal-aware Sigma-Delta Modulator for CMOS Monolithic Temperature Sensors |
Author(s) | *Suhow Wu, Herming Chiueh (National Chiao Tung University) |
Page(s) | pp. 90 - 94 |
Keywords | ADC, Thermal Management, system-on-chip, temperature sensors, sigma-delta modulation |
Abstract | In this research, a low-power, high-accuracy analog-to-digital converter (ADC) with a higher and wider temperature range is designed. The proposed design plays an important role in modern system-on-chip (SoC) thermal management systems, acting as the interface circuitry between monolithic temperature sensors and the thermal management unit. Several challenges, including temperature range, power dissipation requirements and process technology compatibility, make the design of such an ADC difficult. This paper examines a practical design of an oversampling ADC based on sigma-delta modulation with thermal considerations. The prototype is fabricated in a TSMC 0.25µm standard CMOS process. The experimental result achieves seven-bit resolution and dissipates only a few mW, which fulfills the system requirements of the target thermal management design. Besides these requirements, the process scaling of the proposed ADC is considered in order to be compatible with the target SoC fabrication technology. The integration of the monolithic temperature sensor, the proposed ADC and the thermal management unit sustains the progress of modern VLSI and high-density circuits without thermal problems. |
Title | Real-Time Segmentation of Large-Scale Images by Pipeline Processing with Small-Size Cell-Network |
Author(s) | *Hidekazu Adachi, Takashi Morimoto, Osamu Kiriyama, Zhaomin Zhu, Tetsushi Koide, Hans Juergen Mattausch (Research Center for Nanodevices and Systems, Hiroshima University) |
Page(s) | pp. 95 - 102 |
Keywords | Image-Segmentation, Real-Time, Cell-Network, Pipeline Processing, Large-Scale Image |
Abstract | This paper presents an image-segmentation architecture, which is based on pipeline processing with a small-size cell network, and is called Subdivided-Image-Approach (SIA). The proposed SIA divides an input image into small tiles. These tiles are then segmented in sequential order and the total image-segmentation result is obtained by putting together the partial segmentation results of all tiles. The presented architecture can complete the segmentation of QVGA-size (320x240 pixels) to SXGA-size (1280x1024 pixels) images within 10msec at operating frequencies from 2MHz to 35MHz, respectively. This performance is confirmed with a 94mm^2 cell network for 41x33 pixels fabricated in 350nm CMOS technology and operating with a power dissipation between 19mW and 329mW in the required frequency range. Consequently, the requirements of mobile applications, namely low power dissipation and compact integration, can be satisfied with the proposed SIA architecture. |
Title | A Parameterized On-Chip-Bus-Compliant FDWT/IDWT Accelerator IP Generator |
Author(s) | Chih-Chun Chang, *Youn-Long Lin (National Tsing Hua University, Hsin-Chu, Taiwan 300) |
Page(s) | pp. 113 - 120 |
Keywords | DWT, JPEG2000, IP Generator, SOC |
Abstract | We propose a software tool for automatic generation of hardware accelerators for performing Discrete Wavelet Transform (DWT) with user-specified coefficient parameters. In addition to the (5, 3) and (9, 7) DWT filters adopted by the next generation JPEG2000 image compression standard, other useful filters such as (9, 3), (6, 10), and (2, 2) can also be generated. The generated hardware IPs can perform both forward and inverse transforms (FDWT and IDWT). We analyze variable lifetime for register allocation with low power consumption and apply register retiming technology to improve circuit performance. Our tool also produces an on-chip-bus interface circuit compliant with the AMBA protocol, together with an associated device driver, so that the generated IPs are ready for SOC integration. We verify the proposed approach by integrating generated IPs into an SOC platform running JPEG2000 application software. Experimental results demonstrate that the proposed approach is indeed effective in enhancing the productivity of hardware accelerator IP design. |
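As a software illustration of the forward/inverse pair mentioned above, the following is a minimal sketch of a one-level, one-dimensional reversible (5, 3) transform in lifting form. It assumes an even-length integer signal and symmetric boundary extension; the function names are hypothetical and nothing here reflects the architecture of the generated hardware IPs.

```python
def fdwt53(x):
    """One-level forward (5, 3) lifting transform on an even-length int list.

    Returns (s, d): low-pass and high-pass coefficients.
    Assumption: symmetric extension at both signal boundaries.
    """
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("expected an even-length signal with at least 2 samples")
    half = n // 2

    def ext(i):
        # Mirror the right boundary: x[n] is read as x[n - 2].
        return x[i] if i < n else x[2 * (n - 1) - i]

    # Predict step: odd samples minus the floor-average of even neighbours.
    d = [x[2 * k + 1] - (x[2 * k] + ext(2 * k + 2)) // 2 for k in range(half)]
    # Update step: even samples plus a rounded quarter of neighbouring details.
    s = [x[2 * k] + (d[max(k - 1, 0)] + d[k] + 2) // 4 for k in range(half)]
    return s, d


def idwt53(s, d):
    """Inverse of fdwt53: undo the update step, then the predict step."""
    half = len(s)
    even = [s[k] - (d[max(k - 1, 0)] + d[k] + 2) // 4 for k in range(half)]
    x = []
    for k in range(half):
        right = even[k + 1] if k + 1 < half else even[k]  # same mirroring
        x.append(even[k])
        x.append(d[k] + (even[k] + right) // 2)
    return x


SAMPLE = [3, 7, 1, 8, 2, 9, 4, 6]
LOW, HIGH = fdwt53(SAMPLE)
RESTORED = idwt53(LOW, HIGH)
```

Because each lifting step is undone exactly by its mirror image, `RESTORED` equals `SAMPLE` for any integer input.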
Title | VLSI Implementation of a 3D Sound Movement System |
Author(s) | *Nobuyuki Iwanaga, Takao Onoye (Dept. Information Systems Eng., Osaka University), Wataru Kobayashi, Kazuhiko Furuya (Arnis Sound Technologies, Co., Ltd.), Isao Shirakawa (Graduate School of Applied Informatics, University of Hyogo) |
Page(s) | pp. 121 - 125 |
Keywords | 3D sound localization, head-related transfer function, sound localization, VLSI implementation, sound movement control |
Abstract | This paper describes a VLSI implementation of a sound movement system, which is to be used in mobile applications. A low-cost 3D sound localization and coefficient interpolation algorithm is exploited in order to realize sound movement at all points in 3D space. The proposed system is synthesized from a Verilog-HDL description and requires only 30,000 gates. To demonstrate the effectiveness of the proposed 3D sound movement, a prototype system is implemented on an FPGA board and subjective tests are performed. |
Title | An Application Specific Network-on-Chip (ASNOC) Design with Binary Tree Architecture |
Author(s) | *Yuan-Long Jeang, Win-Hsien Huang, Wei-Feng Fang, Jain-Zhou Huang, Nan-Long Tsai, Chien-Cheng Ou (National Kaohsiung University of Applied Sciences, Department of Electronic Engineering, Kaohsiung, Taiwan, R.O.C) |
Page(s) | pp. 137 - 142 |
Keywords | System-on-a-chip, network on chip, globally asynchronous network, locally synchronous bus, wormhole routing |
Abstract | A mixed-mode network-on-chip (NOC) interconnection architecture is proposed in this paper. The proposed architecture makes use of a globally asynchronous communication network and a locally synchronous bus. First, a local bus is given for a group of cores so that all communications within this local bus are exclusive in time. In order to represent the ratio of communications between local buses, the user has to provide a communication ratio (CR) for each pair of local bus groups. After that, the two local buses with the highest CR are grouped to form the first switching point for the globally asynchronous network. Then, one can regard the two groups joined by a switching point as a new group. The new CR can hence be determined between the new group and each other local bus group. A similar process is performed to form the next switching point. Finally, a binary tree (BT) is built by making each internal tree node a switching point and each leaf a local bus. In addition, the switching circuit cost can be decreased while the performance is increased. The simulation results show that the proposed NOC architecture is better than the general-purpose SPIN architecture [4]. |
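The grouping procedure above can be sketched as a greedy pairwise merge. The abstract does not say how the CR of a merged group is derived, so this sketch assumes it is the sum of the CRs of its two members; that rule, the function name and the example ratios are all hypothetical.

```python
def build_binary_tree(buses, cr):
    """Greedily merge the two groups with the highest communication ratio.

    buses: list of unique local-bus names (the leaves).
    cr:    dict mapping frozenset({a, b}) -> communication ratio.
    Returns a nested tuple; each tuple is a switching point.
    Assumption: CR(merged, g) = CR(a, g) + CR(b, g).
    """
    groups = list(buses)
    ratio = dict(cr)

    def get(a, b):
        return ratio.get(frozenset((a, b)), 0.0)

    while len(groups) > 1:
        pairs = [(a, b) for i, a in enumerate(groups) for b in groups[i + 1:]]
        a, b = max(pairs, key=lambda p: get(*p))
        merged = (a, b)  # new internal node, i.e. a switching point
        rest = [g for g in groups if g not in (a, b)]
        for g in rest:
            ratio[frozenset((merged, g))] = get(a, g) + get(b, g)
        groups = rest + [merged]
    return groups[0]


# Hypothetical ratios between four local buses.
EXAMPLE_CR = {
    frozenset(("A", "B")): 0.5,
    frozenset(("C", "D")): 0.3,
    frozenset(("A", "C")): 0.1,
}
TREE = build_binary_tree(["A", "B", "C", "D"], EXAMPLE_CR)
```

With these ratios, A and B are joined first, then C and D, and the root switching point connects the two pairs.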
Title | The Design and Implementation of a Multimedia Coprocessor for ARM7 Microprocessors |
Author(s) | *Tse-Chen Yeh, Ing-Jer Huang (Dept. of Computer Science and Engin., National Sun Yat-Sen University) |
Page(s) | pp. 143 - 147 |
Keywords | coprocessor, multimedia, acceleration |
Abstract | To satisfy the requirements of multimedia
computation, we propose a multimedia coprocessor (MMCOP)
for ARM7 microprocessors. MMCOP can accelerate typical
multimedia calculations by SIMD execution. To suit different
multimedia applications, we implement adaptive long transfer
instructions in MMCOP. In this paper, we explain the design and
verification of the ARM7 coprocessor. We also provide a
comparison of synthesis reports for coprocessor integration and instruction set extension of the ARM microprocessor. |
Title | A Simulation Environment for Asynchronous Codesign |
Author(s) | *Satoshi Tsutsumi, Hideharu Amano (Keio University) |
Page(s) | pp. 155 - 161 |
Keywords | codesign, simulation, asynchronous, SystemC |
Abstract | Recent advanced semiconductor processes enable the
integration of numerous and complex systems on a single chip.
The modularity derived from asynchronous design becomes
favorable for system performance or design productivity when
designing large and complex systems in an advanced process.
In this paper, a HW/SW codesign system for large-scale
asynchronous systems is discussed. As the first step,
a simulation library for asynchronous systems
is developed for verifying functionality and estimating
performance in the early stage of design.
Here, a simulator for SoC (System on a Chip) design using
the library, which includes an asynchronous CPU and dedicated
hardware, is developed, and the simulation speed is evaluated. |
Title | Extracting Structural and Communication Information of SystemC Descriptions |
Author(s) | *Fábio Prudente, Edna Barros (Centro de Informatica, Universidade Federal de Pernambuco) |
Page(s) | pp. 162 - 168 |
Keywords | system level design, systemc descriptions |
Abstract | System-level design languages are needed, in the System-on-a-Chip (SoC) scenario, in order to model complex and heterogeneous systems at distinct abstraction levels, and to allow design space exploration and validation of such systems. SystemC is becoming a de facto standard design language addressing these needs. Although SystemC is ANSI C++ compliant, standard C++ tools are not suited for system modeling. A standard C++ tool is not able to recognize some SystemC constructs. Instead, only classes, their fields and methods, from the underlying C++, will be seen. Without understanding the SystemC semantics, EDA tools are not able to perform any specific analysis of the modeled system. In this work, we present an approach for recognizing and extracting SystemC constructs from an Abstract Semantic Graph (ASG) generated by a gcc-based front-end tool. The resulting graph is about a thousand times smaller than the ASG obtained by the gcc-based extraction tool. |
Title | Software Execution Time Back-annotation Method for High Speed Hardware-Software Co-simulation |
Author(s) | *Michiaki Muraoka, Noriyoshi Itoh, Rafael K. Morizawa, Hiroyuki Yamashita, Takao Shinsha (STARC) |
Page(s) | pp. 169 - 175 |
Keywords | System level design, Hardware-software co-simulation, Architecture simulation, Transaction level model, Time back-annotation |
Abstract | This paper describes a method for accelerating software simulation by back-annotating the execution time of the software on a target embedded CPU for hardware-software co-simulation. This time back-annotation algorithm consists of three major steps: the basic block extraction from the software, the execution time calculation of the basic blocks, and the time annotation into the software source code. The simulation performance of the resulting time back-annotated software source code, using a C-based hardware-software co-simulator, is more than ten times faster than the conventional ISS when the execution time on a target embedded CPU is taken into consideration. This means the simulation speed will be sufficient for system-level verification, architecture-level performance evaluation and software verification of SoCs. |
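The third step, time annotation, can be pictured as inserting a time-accumulation call at the start of every basic block, so that the software runs natively on the host while still tracking target time. The sketch below is a toy illustration under that reading; the `SimClock` class, the block names and the cycle counts are hypothetical and are not taken from the paper.

```python
class SimClock:
    """Accumulates the estimated target-CPU cycles of executed basic blocks."""

    def __init__(self):
        self.cycles = 0

    def consume(self, cycles):
        self.cycles += cycles


# Hypothetical per-basic-block cycle counts, as the execution time
# calculation step might produce them for a target CPU.
BB_CYCLES = {"entry": 4, "loop_body": 7, "exit": 2}


def annotated_sum(values, clock):
    """Original software (a simple sum) with back-annotated timing calls."""
    clock.consume(BB_CYCLES["entry"])          # basic block: function entry
    total = 0
    for v in values:
        clock.consume(BB_CYCLES["loop_body"])  # basic block: loop body
        total += v
    clock.consume(BB_CYCLES["exit"])           # basic block: function exit
    return total


CLOCK = SimClock()
RESULT = annotated_sum([1, 2, 3], CLOCK)
```

After the call, `CLOCK.cycles` holds the entry cost, three loop-body costs and the exit cost, while `RESULT` is the ordinary functional result.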
Title | Wire Length Distribution of SoC considering Macro Block Shapes |
Author(s) | *Takanori Kyogoku, Hidenari Nakashima, Junpei Inoue, Naohiro Takagi, Hiyouko Shinoki, Kenichi Okada, Kazuya Masu (Precision and Intelligence Laboratory, Tokyo Institute of Technology) |
Page(s) | pp. 176 - 180 |
Keywords | Wire length distribution, SoC, circuit complexity |
Abstract | A system on chip (SoC) contains many macro blocks. In this paper, we propose a wire length distribution model for global interconnect between macro blocks. The proposed model can consider macro cell shapes, placement, and interaction between logic complexity and inter-macro wire length. The wire length distribution of SoC is derived from the number of input terminals of each macro cell. |
Title | Interconnect Synthesis for Lithography and Manufacturability in Deep Submicron Design |
Author(s) | Sameer Pujari (SUNY Binghamton ECE), Ryon M. Smey (SUNY Binghamton CSD/InternetCAD), Tan Yan (University of Kitakyushu), Hannibal H. Madden (AVS), *Patrick H. Madden (SUNY Binghamton CSD/University of Kitakyushu) |
Page(s) | pp. 181 - 188 |
Keywords | lithography, manufacturability, interconnect, routing |
Abstract | Effective design for large VLSI systems requires abstraction; problems
are simply too complex to be addressed directly. To obtain good
results, it is necessary that the abstractions still capture the basic
nature of the problem.
Deep submicron lithography has placed new constraints on circuit
layout; these constraints are frequently counterintuitive, and are
hard to model with current design rules. When design tools ignore the
constraints, there is a need for a great deal of "back end" work to
fix violations--the difficulty of these fixes has resulted in a push
towards very restrictive design rules.
To enable aggressive design without major violations, better
abstractions are needed. In this paper, we focus on circuit
interconnect, and develop a "straw man"
approach to considering the lithography and manufacturability
challenges of deep submicron design.
We propose Mead-and-Conway style rules, and the concept of a "virtual
layer" to separate mask-based constraints from silicon-based
constraints. We also discuss a prototype routing tool that uses the
virtual layer approach. |
Title | Compact Modeling and Experimental Verification of Substrate Resistance in Lightly Doped Substrates |
Author(s) | Hai Lan, Tze Wee Chen, Chi On Chui, *Robert W. Dutton (Stanford University) |
Page(s) | pp. 189 - 195 |
Keywords | substrate noise, compact model, mixed-signal |
Abstract | This paper presents a synthesized compact modeling methodology for substrate noise coupling in lightly doped silicon process. Rigorous 3-D device simulations reveal the distinctive decaying trend of noise coupling in far field and near field regions. A new compact, scalable model is proposed to accommodate both far field and near field effects as well as contact sizes, perimeters, and separations. A test chip in a lightly doped P-type substrate, consisting of various combinations of substrate noise coupling configurations, is fabricated and tested. The proposed compact model is validated by the measurement data from the test chip. |
Title | Realization of Digital Noise Emulator for Characterization of Systems Exposed to Substrate Noise |
Author(s) | Yi-Chang Lu, Jae Wook Kim (Stanford University), *Nobuhiko Nakano (Keio University), David Colleran (Stanford University), Patrick Yue (Carnegie Mellon University), Robert W. Dutton (Stanford University) |
Page(s) | pp. 196 - 203 |
Keywords | substrate noise |
Abstract | Frequency and timing of digital clocks, digital switching activities, and number of transistors in digital blocks are the key parameters to model switching noise generated by complicated digital systems. In this paper, a Digital Noise Emulator (DNE) is implemented to study how these parameters impact the performance of a ring-typed-VCO-based PLL. In addition, the proposed DNE can be used for noise cancellation to improve PLL performance in the presence of deterministic noise. |
Title | Passive-Assured Rational Function Approach for Compact Modeling of On-chip Passive Components |
Author(s) | *Zuochang Ye, Zhiping Yu (Institute of Microelectronics, Tsinghua University) |
Page(s) | pp. 204 - 208 |
Keywords | compact modeling, rational function, passivity |
Abstract | A linear passive network can be modelled to be terminal-equivalent by approximating its admittance matrix Y using rational functions in the frequency domain. The physical behavior of the original system mandates that the generated model should not release more energy than it absorbs, i.e., the passivity must be guaranteed. Previous modeling approaches using rational function can only assure passivity at the sampling frequencies. In this paper, a new method is proposed, which is based on the linearization of a so-called discriminant polynomial and guarantees the model passivity in the entire frequency range. The method uses the linear-constrained least-squares programming and is applicable to general N-port networks. The detailed algorithm and results from the application of the method to an on-chip spiral inductor in RF CMOS design are presented. |
Title | Statistical Analysis of Clock Skew Variation |
Author(s) | *Masanori Hashimoto, Tomonori Yamamoto, Hidetoshi Onodera (Kyoto University) |
Page(s) | pp. 214 - 219 |
Keywords | clock skew, statistical analysis, manufacturing variability, power supply fluctuation, temperature gradient |
Abstract | This paper discusses clock skew due to manufacturing variability and
environmental change. In clock tree design, transition time constraint is
an important design parameter that controls clock skew and power dissipation.
In this paper, we evaluate clock skew under several variability models,
and demonstrate the relationship among clock skew, transition time
constraint and power dissipation. |
Title | Statistical Timing Analysis with Global Variations and Path Reconvergence |
Author(s) | Lizheng Zhang, Yuhen Hu (University of Wisconsin-Madison), *Charlie Chung-Ping Chen (National Taiwan University) |
Page(s) | pp. 220 - 227 |
Keywords | statistical timing analysis, correlation |
Abstract | Block based statistical timing analysis (STA) tools often yield
less accurate results when timing variables become correlated
due to global sources of variation and path reconvergence.
To the best of our knowledge, no good solution is available for handling
both types of correlations simultaneously.
In this paper, we present a novel statistical timing algorithm, AMECT
(Asymptotic MAX/MIN approximation & Extended Canonical Timing model),
that produces accurate timing estimation by handling both types of
correlations simultaneously.
Firstly, a linear mixing operator is used to approximate the nonlinear
MAX/MIN operator by moment matching. Secondly, an extended
canonical timing model is developed to evaluate and decompose correlations
between arbitrary timing variables. Finally, an intelligent pruning method
is designed to enable a trade-off between runtime and accuracy.
Tested with ISCAS benchmark suites, AMECT shows both high accuracy
and high performance compared with Monte Carlo simulation results:
a distribution estimation error of <1.5% with around a 350X speedup on
a circuit with 5355 gates. |
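To make the idea of approximating MAX by moment matching concrete, the sketch below computes the mean and standard deviation of the maximum of two correlated Gaussian arrival times using Clark's classical formulas. This is a well-known textbook technique and is not claimed to be the AMECT operator itself; the function name and the example values are hypothetical.

```python
import math


def _pdf(z):
    """Standard normal probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def clark_max(mu1, sigma1, mu2, sigma2, rho=0.0):
    """Return (mean, std) of max(X, Y) for jointly Gaussian X and Y.

    The result matches the first two moments of the true maximum, so the
    maximum can be re-approximated as a Gaussian in later propagation.
    """
    theta_sq = sigma1 ** 2 + sigma2 ** 2 - 2.0 * rho * sigma1 * sigma2
    if theta_sq <= 1e-18:
        # X - Y is deterministic: the larger-mean variable always wins.
        return (mu1, sigma1) if mu1 >= mu2 else (mu2, sigma2)
    theta = math.sqrt(theta_sq)
    alpha = (mu1 - mu2) / theta
    t = _cdf(alpha)  # probability that X is the larger one
    mean = mu1 * t + mu2 * (1.0 - t) + theta * _pdf(alpha)
    second = (
        (mu1 ** 2 + sigma1 ** 2) * t
        + (mu2 ** 2 + sigma2 ** 2) * (1.0 - t)
        + (mu1 + mu2) * theta * _pdf(alpha)
    )
    return mean, math.sqrt(max(second - mean ** 2, 0.0))


# Two independent arrival times, both with mean 0 and standard deviation 1.
MEAN, STD = clark_max(0.0, 1.0, 0.0, 1.0)
```

For this symmetric case the mean of the maximum is 1/sqrt(pi), which shows how MAX shifts the distribution upward even when both inputs have the same mean.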
Title | SIRE/M: A Homogeneous Intermediate Format for VHDL-AMS Mixed Signal Simulation |
Author(s) | *Hamid Reza Ghasemi, Zainalabedin Navabi (CADLAB, ECE Department, University of Tehran) |
Page(s) | pp. 228 - 234 |
Keywords | VHDL-AMS, Mixed Signal Simulation, SIRE/M, simulation |
Abstract | VHDL-AMS is a recent modeling language that is suitable for modeling mixed signal circuits. A straightforward implementation of a VHDL-AMS simulator uses two separate analog and digital simulation engines. This approach results in an easy implementation, but the performance of the resulting simulator is degraded because of some basic overheads. We present an alternative approach that eliminates the overheads using intermediate formats to build mixed engines. In this paper we introduce an extensible object oriented intermediate format (SIRE/M) that improves the performance by eliminating overheads based on this approach. Then we describe the implementation of a mixed signal simulation engine based on SIRE/M and justify the improvements by several mixed signal examples. |
Title | A Layout-Aware Circuit Sizing Model Using Parametric Analysis |
Author(s) | *I-Lun Tseng, Adam Postula (School of ITEE, The University of Queensland, Australia) |
Page(s) | pp. 235 - 240 |
Keywords | GBLD, parameterized layout, parametric analysis, circuit sizing |
Abstract | We propose a circuit sizing model that takes layout parasitics into account. The circuit and layout parameters are stored in a parameterized layout description format, GBLD. The layout parasitics are stored as closed form expressions. Layout optimization tools can modify the layout and recalculate parasitics on the fly. If the results of sensitivity analysis are passed to those tools, optimization for performance can be achieved with relatively few iterations involving time consuming circuit simulations. |
Title | LARTTE: A Posynomial-Based Lagrangian Relaxation Tuning Tool for Fast and Effective Gate-Sizing and Multiple Vt Assignment |
Author(s) | Hsinwei Chou (University of Wisconsin-Madison), Yu-Hao Wang (Incentia Design Systems), *Charlie Chung-Ping Chen (National Taiwan University) |
Page(s) | pp. 241 - 248 |
Keywords | Circuit Tuning, Gate Sizing, Multiple Vt, Timing Optimization, Power Reduction |
Abstract | In this paper, we propose a novel method for fast and effective gate-sizing and multiple Vt assignment using Lagrangian Relaxation (LR) and posynomial modeling. Our algorithm optimizes a circuit's delay and power consumption subject to slew rate constraints, and can readily take process variation into account. We first use SPICE to generate accurate delay and power models in posynomial form for standard cells, then formulate a large-scale, convex optimization problem based on these models. Finally, we perform LR to solve for the globally-optimal (optimality is with respect to the posynomial approximation-based optimization problem, without discretization.) set of transistor sizes and Vts (with discretization) for each gate. Our key contribution is that we show for the first time that using posynomial models, LR-based circuit tuning can be carried out in a |
Title | Zero Overhead Loop Techniques for Application Specific Instruction-set Processors |
Author(s) | *Shinsuke Kobayashi (University of Tokyo), Kentaro Mita, Yoshinori Takeuchi, Masaharu Imai (Osaka University) |
Page(s) | pp. 255 - 261 |
Keywords | ASIP, Zero Overhead Loop, Processor Synthesis, Compiler Generation |
Abstract | We propose new retargetable Zero
Overhead Loop (ZOL) techniques for Application Specific
Instruction-set Processors. The techniques consist
of configurable hardware and retargetable software development
tools including compiler and assembler. By
defining the parameters for configuration including loop
block indication, nesting levels, and the number of iterations,
the hardware can be customized. Moreover, the
method can generate the target compiler and assembler
that can deal with ZOL instructions, considering the
parameters of retargetable hardware. In experiments,
various ZOL type processors were designed. It took
only about 4 hours to design 19 processors. The results
show that designers using our method can explore the
ASIP design space including the ZOL efficiently. |
Title | Behavioral Model Construction for Formal Verification of Advanced On-chip Bus Protocols |
Author(s) | *Yosuke Kakiuchi (Osaka University), Akira Kitajima (Osaka Electro-Communication University), Kiyoharu Hamaguchi, Toshinobu Kashiwabara (Osaka University) |
Page(s) | pp. 282 - 289 |
Keywords | verification, bus protocol, behavioral model, AMBA, CWL |
Abstract | This paper addresses verification of interactions between IPs, that is, bus protocols, with formal techniques.
Usual models for verification of bus protocols are composed only of finite state machines, and as a result, it is difficult to describe complex protocols. In this paper, we propose an extended behavioral model so that we can handle split transactions or pipelined transactions more naturally, and show how to construct models from formal specifications written in some language. We also report experimental results on verification of the AMBA protocol specification with a symbolic model checker.
This shows that, even with rather complex behavioral models, formal verification of bus protocols with advanced features can be done in a reasonable time. |
Title | On Debugging Assistance in Assertion-Based Verification |
Author(s) | *Bao-Ren Huang, Tzung-Jr Tsai, Chien-Nan Liu (Dept. of EE, National Central University) |
Page(s) | pp. 290 - 295 |
Keywords | assertion, debugging, HDL |
Abstract | In the verification process, debugging is a hard and time-consuming task that is often done by the designers themselves. Because most design errors occur in the early design stages, some approaches have been proposed for debugging HDL designs. The authors in [7] proposed a method to rank each error candidate. In this way, the debugging effort can be reduced because designers only have to trace several items at the front of the list. However, due to the lack of internal information about the circuit, the estimation of error possibility may still not be very accurate. In this paper, we propose a method that uses the extra observability provided by assertions to better estimate error possibility. Using our approach, the error ranking can be more accurate than that of the previous approach, so that the debugging effort can be further reduced. The effectiveness of our improvements is shown in the experiments. |
Title | Matlab based Environment for Designing DSP Systems using IP Blocks |
Author(s) | Nacer-Eddine Zergainoh, Katalin Popovici, *Ahmed Amine Jerraya (TIMA Laboratory), Pascal Urard (STMicroelectronics) |
Page(s) | pp. 296 - 302 |
Keywords | system-level design, DSP, IP-based design, Matlab, SystemC |
Abstract | In this paper, we propose an efficient IP block based design environment for high throughput VLSI systems. The flow generates a SystemC Register Transfer Level (RTL) architecture, starting from a Matlab functional model described as a netlist of functional IP. The refinement model automatically inserts control structures to manage delays induced by the use of RTL IPs. It also inserts a control structure to coordinate the execution of parallel clocked IP. The delays may be managed by registers or by counters included in the control structure. Experiments show that the approach can produce an efficient RTL architecture and allows for large time savings. |
Title | Arithmetic Description Language and Its Application to Parallel Multiplier Design |
Author(s) | *Naofumi Homma, Kazuya Ishida, Takafumi Aoki (Tohoku University), Tatsuo Higuchi (Tohoku Institute of Technology) |
Page(s) | pp. 319 - 326 |
Keywords | computer arithmetic, hardware description language, formal verification, hardware algorithms, number system |
Abstract | This paper presents the basic concept of an arithmetic description language called ARITH. The use of ARITH makes possible (i) formal description of arithmetic algorithms including those using unconventional number systems, (ii) formal verification of described arithmetic algorithms, and (iii) translation of arithmetic algorithms to equivalent HDL codes. In this paper, we demonstrate the potential of ARITH through an experimental design of parallel multipliers using radix-4 modified Booth's algorithm and (4;2) compressor tree architecture. |
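As an illustration of the radix-4 modified Booth recoding mentioned in this abstract, the following is a minimal Python sketch. It is not ARITH code and is not taken from the paper; the function names `booth_radix4_digits` and `booth_multiply` are hypothetical, and the multiplier is assumed to be a two's-complement value of even bit width.

```python
def booth_radix4_digits(y: int, bits: int) -> list[int]:
    """Recode a two's-complement multiplier y of even width `bits`
    into radix-4 Booth digits in {-2, -1, 0, 1, 2}, least significant first."""
    if bits <= 0 or bits % 2:
        raise ValueError("bits must be a positive even number")
    if not -(1 << (bits - 1)) <= y < (1 << (bits - 1)):
        raise ValueError("y does not fit in the given width")

    def bit(i: int) -> int:
        # Bit -1 is defined as 0; Python's >> sign-extends negative ints.
        return 0 if i < 0 else (y >> i) & 1

    # Digit j looks at the overlapping bit triple (2j+1, 2j, 2j-1).
    return [bit(i - 1) + bit(i) - 2 * bit(i + 1) for i in range(0, bits, 2)]


def booth_multiply(x: int, y: int, bits: int) -> int:
    """Multiply x by y by summing the recoded partial products d_j * x * 4^j."""
    return sum((d * x) << (2 * j)
               for j, d in enumerate(booth_radix4_digits(y, bits)))
```

Each digit selects one of the partial products 0, ±x, or ±2x, so an n-bit multiplier yields n/2 partial products; in a parallel multiplier these would then be summed by a compressor tree.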
Title | Area efficient Wave-Pipelined Adder Using Redundant Binary Encoding |
Author(s) | *Tatsuya Yamamoto, Kiyofumi Tanaka (Japan Advanced Institute of Science and Technology) |
Page(s) | pp. 327 - 332 |
Keywords | wave-pipeline, redundant arithmetic unit, circuit design |
Abstract | With the popularization of multimedia applications, there is a demand for high-speed data processing. Wave-pipelining is one of the techniques for raising the throughput of the arithmetic units used in these applications. We replace the conventional arithmetic unit with a Redundant Arithmetic Unit (RAU) in order to prevent the area from increasing when wave-pipelining is applied. In an RAU, the difference between the maximum and minimum propagation delays is almost the same for all bits. Therefore, very few delay elements are needed when designing a wave-pipelined RAU. Also, according to our experimental results on adders, the area of a wave-pipelined RAU is much smaller than that of a wave-pipelined ripple adder or carry-lookahead adder. |
Title | Error Diagnosis Technique Based on Boolean Resubstitution |
Author(s) | Toshifumi Sugane, *Ryosuke Arai, Takayuki Iida, Hiroshi Inoue, Nobutaka Kuroki, Masahiro Numa (Kobe University) |
Page(s) | pp. 339 - 344 |
Keywords | error diagnosis, design error, ECO, incremental synthesis, Boolean resubstitution |
Abstract | In an LSI design process, Engineering Change Orders (ECO’s) are often given due to logic design errors, changes of specification, and timing issues. In these cases, it is impossible to assume the type of correction. Therefore, a rectification technique for various kinds of error models is needed. This paper presents an error diagnosis technique with insertion of additional circuits to be applicable to various kinds of error models. By this technique, after locating the insertion part and computing the function of the additional circuit, rectification is performed by re-synthesis of the additional circuit. Moreover, the size of the additional circuit is reduced by reusing the existing circuit based on Boolean re-substitution. |
Title | Extraction of Subcircuits for Incremental Synthesis Based on Error Diagnosis |
Author(s) | *Takayuki Iida, Toshifumi Sugane, Takahiro Iwasaki, Hiroshi Inoue, Nobutaka Kuroki, Masahiro Numa, Keisuke Yamamoto (Kobe University) |
Page(s) | pp. 345 - 350 |
Keywords | ECO, incremental synthesis, error diagnosis, design error |
Abstract | We present an approach to extract subcircuits that may contain design errors from a larger circuit for incremental synthesis based on an error diagnosis technique. This approach employs not only structural matching between two circuits, but also functional matching of the equivalent points based on BDDs. Experimental results have shown that the proposed subcircuit extraction technique is effective in enlarging the circuit size that can be diagnosed and in shortening the processing time for logic diagnosis. |
Title | A Patchwork-like Partitioning Method for Engineering Change Orders in Redesign of High Performance LSIs |
Author(s) | *Yuichi Nakamura, Ko Yoshikawa (NEC Corp.), Takeshi Yoshimura (Waseda Univ.) |
Page(s) | pp. 351 - 356 |
Keywords | ECO, Logic Design, Partitioning |
Abstract | This paper presents a novel design method for large-scale, high performance LSIs based on a patchwork-like partitioning technique for ECOs (engineering change orders). Even if only small design changes are made after the place and route process, applying a re-layout can be very time consuming. The proposed method partitions the design into several parts before the first place and route. When design changes occur in HDL, several parts relating to the changes are simply exchanged. The netlist for the changed design remains almost the same as the original one. For the partitioning, we use multiple-fan-out-points as partition borders. Experimental evaluation of our method showed that when a small change was made in the RTL description, the revised part of the circuit had only about 87 gates in the average case. This greatly reduces the re-layout time required to implement an ECO. |
Title | Analog Topological Placement with Symmetry Constraints Using a O(n loglog n) Evaluation Algorithm |
Author(s) | Karthik Krishnamoorthy, Sarat C. Maruvada, *Florin Balasa (University of Illinois at Chicago) |
Page(s) | pp. 379 - 386 |
Keywords | placement, analog layout, topological representations, B-trees |
Abstract | This paper presents a novel algorithm for device-level analog placement with symmetry constraints. Based on the exploration of symmetric-feasible binary tree representations of the layout, the novel approach employs an efficient model of priority queue introduced by Johnson and brought to the attention of the CAD community by Tang and Wong. The use of this data structure entails a worst-case complexity of O(n loglog n) for each code evaluation, which is better than that of any other existing topological placement algorithm supporting symmetry. |
Title | Floorplan Design for 3-D ICs |
Author(s) | Lei Cheng, Liang Deng, *Martin D.F. Wong (University of Illinois at Urbana-Champaign) |
Page(s) | pp. 395 - 401 |
Keywords | floorplan, slicing, constraint, 3D |
Abstract | In this paper we present a floorplanning algorithm for 3-D ICs. The problem can be formulated as that of packing a given set of 3-D rectangular blocks while minimizing a suitable cost function. Our algorithm is based on a generalization of the classical 2-D slicing floorplans to 3-D slicing floorplans. A new encoding scheme of slicing floorplans (2-D/3-D) and its associated set of moves form the basis of the new simulated annealing based algorithm. The best known algorithm for packing 3-D rectangular blocks is based on simulated annealing using the sequence-triple floorplan representation. Experimental results show that our algorithm produces packing results on average 3% better than the sequence-triple based algorithm under the same annealing parameters, and our algorithm runs much faster (17 times for problems containing 100 blocks) than the sequence-triple based algorithm. Moreover, our algorithm can be extended to handle various types of placement constraints, while the sequence-triple based algorithm does not have such capabilities. Finally, when specialized to 2-D problems, our algorithm is a new 2-D slicing floorplanning algorithm. We are excited to report the surprising results that our new 2-D floorplanner has produced slicing floorplans for the two largest MCNC benchmarks, ami33 and ami49, which have the smallest areas (among all slicing/non-slicing floorplans) ever reported in the literature. |
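To make the idea of a 3-D slicing floorplan concrete, here is a minimal Python sketch that evaluates the bounding box of a slicing structure given in postfix form. This is not the paper's new encoding scheme; it assumes a straightforward extension of the classical 2-D Polish-expression style, with hypothetical cut operators `X`, `Y`, and `Z` that place two sub-floorplans side by side along the named axis, and it assumes fixed block orientations.

```python
AXES = {"X": 0, "Y": 1, "Z": 2}  # hypothetical cut operators, one per axis


def slicing_bbox(expr, dims):
    """Return the (x, y, z) bounding box of a postfix slicing expression.

    expr: sequence of block names and cut operators "X", "Y", "Z".
    dims: mapping from block name to its (x, y, z) size.
    """
    stack = []
    for tok in expr:
        if tok in AXES:
            if len(stack) < 2:
                raise ValueError("operator is missing an operand")
            b = stack.pop()
            a = stack.pop()
            axis = AXES[tok]
            # Sizes add along the cut axis and take the maximum elsewhere.
            stack.append(tuple(
                a[k] + b[k] if k == axis else max(a[k], b[k])
                for k in range(3)
            ))
        else:
            stack.append(tuple(dims[tok]))
    if len(stack) != 1:
        raise ValueError("malformed slicing expression")
    return stack[0]


def volume(box):
    """Volume of a bounding box, a typical term in a packing cost function."""
    x, y, z = box
    return x * y * z
```

In a simulated annealing loop, a move would perturb `expr` and the resulting bounding-box volume would feed the cost function; that loop is omitted here.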
Title | Minimization in the Number of Empty Rooms on Floorplan by Dissection Line Merge |
Author(s) | *Chikaaki Kodama, Kunihiro Fujiyoshi (Tokyo University of Agriculture and Technology) |
Page(s) | pp. 402 - 407 |
Keywords | Sequence-pair, empty room, floorplan, dissection line, wiring channel |
Abstract | This paper discusses how to minimize the number of empty rooms on a floorplan corresponding to a placement of $n$ modules. In a floorplan, a room is said to be empty if no module is assigned to it. Since a floorplan obtained from a given module placement may have redundant empty rooms, redundant wiring channels and wire bends may also be generated. Hence, in order to reduce redundant channels and wire bends, removal of empty rooms is required. For this purpose, we formulate a problem of removing as many empty rooms as possible by merging dissection lines on a floorplan converted from a sequence-pair extracted by FAST-gridding. Then, we propose a method of obtaining the optimal solution of the above problem in $O(n)$ time. The number of empty rooms in the resultant floorplan is reduced to $n-\lfloor \sqrt{4n-1} \rfloor$ or less. |
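As a quick numeric illustration of the bound $n-\lfloor \sqrt{4n-1} \rfloor$ stated in this abstract, the short Python sketch below evaluates it with exact integer arithmetic. The function name `max_empty_rooms` is hypothetical; only the formula itself comes from the abstract.

```python
import math


def max_empty_rooms(n: int) -> int:
    """Upper bound n - floor(sqrt(4n - 1)) on the number of empty rooms
    for a placement of n modules, as stated in the abstract."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # math.isqrt returns the exact integer floor of the square root.
    return n - math.isqrt(4 * n - 1)
```

For example, the bound evaluates to 1 for $n = 4$ and to 81 for $n = 100$.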
Title | Area-Array I/O Clustering in Design Cost and Performance Optimization |
Author(s) | *Hung-Ming Chen (Department of Electronics Engineering, National Chiao Tung University, Hsinchu, Taiwan), I-Min Liu (Cadence Design Systems), Martin D.F. Wong (ECE Department, University of Illinois at Urbana-Champaign, Urbana, IL 61801) |
Page(s) | pp. 408 - 413 |
Keywords | Area-Array I/O, Optimization, Physical Design |
Abstract | I/O placement has always been a concern in modern IC design. Due to flip-chip technology, I/O can be placed throughout the whole chip without long wires from the periphery of the chip. However, because of I/O placement constraints in design cost and performance, I/O buffer planning becomes a pressing problem. During the early stages of circuit and packaging co-design, I/O layout should be evaluated to optimize design cost and to avoid product failures. In this paper, our objective is to improve an existing/initial standard cell placement by I/O clustering, considering design cost reduction and signal integrity preservation. We formulate it as a minimum cost flow problem minimizing $\alpha W + \beta D$, where $W$ is the I/O wirelength of the placement and $D$ is the total voltage drop in the power network. The experimental results on the MCNC benchmarks show that our method achieves better timing performance and, on average, over 30% design cost reduction when compared with the conventional design rule of thumb popularly used by circuit designers. |
Title | Application of LUT Cascades to Numerical Function Generators |
Author(s) | Tsutomu Sasao (Kyushu Institute of Technology), *Jon T. Butler (Naval Postgraduate School), Marc D. Riedel (California Institute of Technology) |
Page(s) | pp. 422 - 429 |
Keywords | LUT cascade, numeric circuit, piecewise linear approximation |
Abstract | The availability of large, inexpensive memory has made it possible to realize numerical functions, such as the reciprocal, square root, and trigonometric functions, using a look-up table. This is much faster than by software. However, a naive look-up method requires unreasonably large memory. In this paper, we show the use of a look-up table (LUT) cascade to realize a piecewise linear approximation to the given function. Our approach yields memory of reasonable size and is significantly more accurate than any existing method. |
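The piecewise linear approximation described in this abstract can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it uses uniform segments and a single flat table of (slope, intercept) pairs, whereas the paper uses an LUT cascade whose segmentation scheme is not given in the abstract. The names `build_table` and `evaluate` are hypothetical.

```python
import math


def build_table(f, lo, hi, segments):
    """Precompute one (slope, intercept) pair per uniform segment of [lo, hi)."""
    h = (hi - lo) / segments
    table = []
    for i in range(segments):
        x0 = lo + i * h
        x1 = x0 + h
        slope = (f(x1) - f(x0)) / h           # chord through the segment endpoints
        table.append((slope, f(x0) - slope * x0))
    return table


def evaluate(table, lo, hi, x):
    """Approximate f(x) with one table look-up, one multiply, and one add."""
    segments = len(table)
    i = min(int((x - lo) / (hi - lo) * segments), segments - 1)
    slope, intercept = table[i]
    return slope * x + intercept


# Example: square root on [1, 2) with 64 segments.
SQRT_TABLE = build_table(math.sqrt, 1.0, 2.0, 64)
```

The memory cost is two coefficients per segment rather than one entry per input value, which is the reason a piecewise linear table is far smaller than a naive look-up table of the same accuracy.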
Title | A Design Algorithm for Sequential Circuits using LUT Rings |
Author(s) | *Hiroki Nakahara, Tsutomu Sasao, Munehiro Matsuura (Department of Computer Science and Electronics Kyushu Institute of Technology) |
Page(s) | pp. 430 - 437 |
Keywords | Programmable logic device, Reconfigurable logic, LUT cascade, Logic synthesis, BDD-for-CF |
Abstract | This paper shows a design method for a sequential circuit using a Look-Up Table (LUT) ring. An LUT ring consists of memories, a programmable interconnection network, a feed-back register, an output register, and a control circuit. It sequentially emulates an LUT cascade that represents the state transition functions and the output functions. We present two algorithms for synthesizing a sequential circuit by an LUT ring: the first one partitions the outputs into groups and realizes them by LUT cascades. The second one reduces the evaluation time by using unused memories. We also compare the LUT ring with other methods to realize sequential circuits. |
Title | Cost Functions for the Design of Dynamically Reconfigurable Processor Architectures |
Author(s) | *Tobias Oppold, Thomas Schweizer, Tommy Kuhn, Wolfgang Rosenstiel (University of Tuebingen) |
Page(s) | pp. 443 - 450 |
Keywords | reconfigurable, cost functions |
Abstract | There are a growing number of reconfigurable architectures that combine the advantages of a hardwired implementation (performance, power consumption) with the advantages of a software solution (flexibility, time to market). Today, there are devices on the market that can be dynamically reconfigured at run-time within one clock cycle. But the benefits of these architectures can only be utilized if applications can be mapped efficiently. In this paper we describe a design approach for reconfigurable architectures that takes into account three aspects: architecture, compiler, and applications. To realize the proposed design flow we developed a synthesizable architecture model. From this model we obtain estimations for speed, area, and power that are used to provide the compiler with the necessary timing information and to optimize the architecture. |
Title | Dynamic Reconfigurable RF Circuit Design |
Author(s) | *Kenichi Okada, Yoshiaki Yoshihara, Hirotaka Sugawara, Kazuya Masu (Tokyo Institute of Technology) |
Page(s) | pp. 451 - 457 |
Keywords | Reconfigurable, RF, VCO, process fluctuation |
Abstract | This paper proposes a dynamic reconfigurable architecture for analog RF circuits. The architecture consists of RF circuits and a control circuit. The RF circuits can be reconfigured through the bias voltages of transistors and through variable passive components, and they can also be switched at the block level. The proposed architecture can realize a multi-band/mode RF circuit in a single chip for Software Defined Radio, which achieves a considerable reduction of circuit area and power consumption. In addition, dynamic reconfiguration makes the RF circuits robust against process fluctuation, dynamic changes of temperature, etc. |
Title | Asynchronous Dynamically Reconfigurable Logic LSIs Suitable For Technology Scaling |
Author(s) | *Hideyuki Ito, Ryusuke Konishi, Hiroshi Nakada, Yuichi Okuyama, Akira Nagoya (NTT Corporation), Tomonori Izumi, Yukihiro Nakamura (Kyoto University) |
Page(s) | pp. 458 - 465 |
Keywords | Reconfigurable Hardware, Asynchronous Circuit, Technology Scaling, Dynamic Reconfiguration |
Abstract | Results from the development of a couple of dynamically reconfigurable logic LSIs based on asynchronous circuit design are described. The LSIs, PCA-1 and PCA-2, were developed to realize the concept of Plastic Cell Architecture (PCA), which is oriented toward general purpose computing based on autonomous reconfigurability of hardware. In addition to a structural feature of PCA as the homogeneous cell array, we introduced asynchronous circuit design as an efficient realization of the original PCA concept. Consequently, the developed LSIs acquired an ultimate local connectivity. A comparison between the circuit performances of PCA-1 and PCA-2 shows suitability for miniaturization. |
Title | An Optimization Method in Floating-point to Fixed-point Conversion using Positive and Negative Error Analysis and Sharing of Operations |
Author(s) | *Nobuhiro Doi (Waseda University), Takashi Horiyama (Kyoto University), Masaki Nakanishi (Nara Institute of Science and Technology), Shinji Kimura (Waseda University) |
Page(s) | pp. 466 - 471 |
Keywords | HDL, compiler, high-level synthesis, fixed-point optimization, range estimation |
Abstract | In application specific hardware design, designers usually use fixed-point operations because of their area and speed advantages. So if the reference algorithms include floating-point operations, designers must perform the tedious job of converting floating-point operations to fixed-point operations. This paper shows an automatic conversion method using a nonlinear programming technique. For the conversion, we use positive and negative error analysis on the control data flow graph. We also consider the sharing of registers and operation units in the conversion. The effect of the proposed method is shown on several examples. |
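To illustrate why positive and negative errors are worth tracking separately in a floating-point to fixed-point conversion, the following Python sketch quantizes a value by truncation and reports the signed error. It is not the paper's method (which uses nonlinear programming on a control data flow graph); the helper names are hypothetical and the format is assumed to be a plain signed fixed-point number with `frac_bits` fractional bits.

```python
import math


def to_fixed(x: float, frac_bits: int) -> int:
    """Quantize x by truncation toward minus infinity to frac_bits fractional bits."""
    return math.floor(x * (1 << frac_bits))


def from_fixed(q: int, frac_bits: int) -> float:
    """Convert the integer fixed-point code back to a real value."""
    return q / (1 << frac_bits)


def truncation_error(x: float, frac_bits: int) -> float:
    """Signed error (fixed-point value minus x) introduced by truncation."""
    return from_fixed(to_fixed(x, frac_bits), frac_bits) - x
```

With truncation the error always lies in the half-open interval from -2^(-frac_bits), exclusive, to 0, inclusive, so it is one-sided. An analysis that keeps separate positive and negative bounds can exploit this, whereas a single symmetric bound would be pessimistic.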
Title | Dynamic Voltage and Frequency Scaling Techniques for Heterogeneous Multi-Processor Architecture in Future Nanometer Technologies |
Author(s) | *Yutetsu Takatsukasa, Kazutoshi Kobayashi, Hidetoshi Onodera (Department of Communications and Computer Engineering, Kyoto University) |
Page(s) | pp. 477 - 482 |
Keywords | DVS, multi-processor, leakage, heterogeneous |
Abstract | In this paper, we evaluate the power consumption of a heterogeneous multi-processor architecture in a dynamic voltage and frequency scaling (DVFS) environment. The heterogeneous architecture is able to reduce total power, including active and leakage power, in future leaky nanometer processes, since it can frequently turn off unused processors. The energy consumption of a processor depends on its load and the amount of leakage current. In the nanometer era, in which leakage current is inevitable, a heterogeneous multi-processor consumes less energy than a homogeneous multi-processor or a single processor. On a multi-processor architecture, load imbalance among processors reduces the efficiency of DVFS. However, thread relocation can relieve the load imbalance. A simple thread relocation reduces energy consumption by 20%; hence, the heterogeneous multi-processor system can cope with a wide load range. |
Title | A Sub-Operation Parallelism Optimization Algorithm in HW/SW Partitioning for SIMD Processor Cores |
Author(s) | *Hideki Kawazu, Jumpei Uchida, Yuichiro Miyaoka (Dept. of Computer Science, Waseda University), Nozomu Togawa (Dept. of Information and Media Science, The University of Kitakyushu), Masao Yanagisawa, Tatsuo Ohtsuki (Dept. of Computer Science, Waseda University) |
Page(s) | pp. 483 - 490 |
Keywords | processor synthesis, SIMD type instruction, hardware/software partitioning, hardware/software cosynthesis, sub-operation parallelism |
Abstract | A b-bit SIMD functional unit has n k-bit sub-functional units in itself, where b = k * n. It can execute n-parallel k-bit operations. However, not all the b-bit functional units in a processor core necessarily execute n-parallel operations. Depending on the application program, some of them just execute n/2-parallel operations or even n/4-parallel operations. This means that we can modify a b-bit SIMD functional unit so that it has n/2 k-bit sub-functional units or n/4 k-bit sub-functional units. The number of k-bit sub-functional units in a SIMD functional unit is called sub-operation parallelism. We incorporate a sub-operation parallelism optimization algorithm into SIMD functional unit optimization. Our proposed algorithm gradually reduces the sub-operation parallelism of a SIMD functional unit as long as the timing constraint on execution time is satisfied. Thereby, we can finally find a processor core with small area under the given timing constraint. We expect to obtain processor core configurations of smaller area under the same timing constraint than a conventional system. Promising experimental results are also shown. |
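The relation b = k * n and the notion of sub-operation parallelism can be illustrated with a small software model of a packed addition. This Python sketch is an assumption-laden illustration, not the paper's hardware or algorithm: `simd_add` is a hypothetical name, lanes wrap around modulo 2^k, and a unit with reduced parallelism is modeled simply by processing fewer lanes.

```python
def simd_add(a: int, b: int, k: int, lanes: int) -> int:
    """Add `lanes` independent k-bit sub-words packed into a and b.

    Each lane wraps around modulo 2**k and no carry crosses a lane boundary.
    Lanes above `lanes` are ignored, modeling a unit with fewer sub-units.
    """
    mask = (1 << k) - 1
    result = 0
    for lane in range(lanes):
        shift = lane * k
        s = ((a >> shift) & mask) + ((b >> shift) & mask)
        result |= (s & mask) << shift
    return result
```

With k = 8, a 32-bit unit has n = 4 lanes; calling the function with `lanes=2` models the n/2-parallel variant, which needs only half of the k-bit sub-functional units and therefore less area.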
Title | Highly Efficient Switch Architecture Based on Banked Memory with Multiple Ports |
Author(s) | *Takayuki Fujii (Hiroshima University), Kazuhiko Kobayashi (Hiroshima City University), Tetsushi Koide, Hans Juergen Mattausch (Hiroshima University), Tetsuo Hironaka (Hiroshima City University) |
Page(s) | pp. 491 - 498 |
Keywords | network switch, multi-port memory, bank structure |
Abstract | The rapid increase in Internet traffic requires a drastic performance improvement of the network switches which serve as the connection nodes of the network. However, the required improvement can hardly be expected from the existing switch structures. In this paper, we propose a network switch which uses a bank-structured multi-port memory as the switch-fabric solution for high switching performance. Furthermore, an efficient packet-scheduling algorithm is developed to maximize throughput. The simulation results verify a 120% increase in throughput when compared with a conventional crossbar structure with equal memory storage capacity. |
Title | Life at the End of CMOS Scaling |
Author(s) | Organizer & Moderator: Rob Rutenbar (Carnegie Mellon University), Panelists: Sani Nassif (IBM Austin Research Labs), Jan Rabaey (University of California, Berkeley), H.-S. Philip Wong (Stanford University), Kazuo Yano (Hitachi Ltd.) |
Page(s) | p. 501 |
Abstract | It is obvious that CMOS will continue to scale aggressively for at least one or two more decades. Devices have been demonstrated working at roughly 10nm. However, there is a large gap between "working devices and design methodology". Manufacturing variations continue to increase at all length scales - from neighborhoods with a few devices to entire chips and wafers. The devices themselves continue to behave less well, e.g., leakage is an enormous problem now and continues to worsen. How will we cope with these problems as we continue toward the end of the CMOS semiconductor roadmap? And what happens after CMOS? Is there a most promising alternative emerging from the set of competing post-CMOS options? The panelists will discuss both problems and solutions, ranging from devices to circuits to systems. |