The 12th Workshop on Synthesis And System Integration of Mixed Information technologies
Final Technical Program

Remark: The presenter of each paper is marked with "*".

Opening
Session Type: Lecture
Time: Monday October 18, 2004, 9:15 - 9:30
Location: Kaga

Keynote
Session Type: Lecture
Time: Monday October 18, 2004, 9:30 - 10:45
Location: Kaga
Chairperson(s): Hidetoshi Onodera (Kyoto Univ.)

Title	Future Directions of Silicon Devices
Author(s)	*Chenming Hu (University of California, Berkeley)
Page(s)	pp. 3 - 4
Abstract	CMOS device scaling is facing formidable challenges in meeting the expectation of high performance, low power, and high density. It is also poised to seize some new opportunities such as mobility scaling to overcome the speed/power barrier, new gate-stack materials and/or new device structures to overcome the gate-length/leakage barrier, and nonvolatile memory and universal memory to enlarge the market.

Invited Talk 1
Session Type: Lecture
Time: Monday October 18, 2004, 11:00 - 11:45
Location: Kaga
Chairperson(s): Tadahiro Kuroda (Keio Univ.)

Title	Ultra-Low Power Design - the Road to Disappearing Electronics
Author(s)	*Jan M. Rabaey (University of California, Berkeley)
Page(s)	pp. 7 - 12
Abstract	Progress in semiconductor technology scaling combined with advances in wireless technology enables the emergence of a third wave of computing. For this technology, often called ambient intelligence, to really break through and to realize its lofty ambitions, a dramatic reduction in power dissipation of all its components (RF, mixed signal, digital, clock, power generation, interfaces) is required. As the size of a node is directly proportional to its power generation, storage and consumption patterns, only ultra low-power design will lead to truly disappearing electronics.

Invited Talk 2
Session Type: Lecture
Time: Monday October 18, 2004, 13:15 - 14:00
Location: Kaga
Chairperson(s): Nobuyuki Nishiguchi (STARC)

Title	Model to Hardware Closure for nm Generation Technologies
Author(s)	*Sani R. Nassif (IBM Austin Research Laboratory)
Page(s)	pp. 15 - 20
Abstract	In this paper, we review the various sources of variability that impact circuit performance, with a special emphasis on timing and on power. We then propose the notion of Model to Hardware correlation, defined as the set of activities that are implemented to characterize, model and simulate the behavior of a design in order to ensure predictability of manufacturing results.

Poster Session 1: SoC Design / Low Power
Session Type: Poster
Time: Monday October 18, 2004, 14:00 - 15:45
Location: Oral/Discussion in Kaga
Chairperson(s): Nozomu Togawa (Univ. of Kitakyushu), Akihisa Yamada (Sharp Co.)

1-1

Title	Leakage Power Considerations for Processor Array-Based Vision Systems
Author(s)	*Jason Schlessman, Wayne Wolf (Princeton University)
Page(s)	pp. 23 - 26
Keywords	Computer Vision
Abstract	This paper presents work pertaining to power and energy considerations for low-level computer vision systems, specifically those employing processor arrays. Motivation for considering these systems as well as an overview of their implementation is given. A common method of reducing area within these systems, through utilization of an array of smaller dimension, is analyzed for leakage power efficiency. This analysis is discussed for an edge detection system utilizing a Sobel operator. Results are given for changes in leakage power with respect to both process technology sizing and array dimension.

1-2

Title	A ROBDD-Based Generalized Nodal Control Scheme for Standby Leakage Power Reduction
Author(s)	Hsinwei Chou (University of Wisconsin-Madison), *Charlie Chung-Ping Chen (National Taiwan University)
Page(s)	pp. 27 - 33
Keywords	Leakage Power Reduction, ROBDD
Abstract	We present in this paper the method of Generalized Nodal Control for standby leakage power reduction. First, a sparse set of input and internal nodes to control are identified using a novel algorithm based on Reduced Ordered Binary Decision Diagrams (ROBDDs) and Fiduccia-Mattheyses Partitioning. Then, customized control gates with

1-3

Title	Low Power and Fault Tolerant Encoding Methods for On-Chip Data Transfer
Author(s)	*Satoshi Komatsu, Masahiro Fujita (University of Tokyo)
Page(s)	pp. 34 - 40
Keywords	low power, fault tolerance, bus encoding
Abstract	Energy consumption is one of the most critical constraints in current VLSI system designs. In addition, fault tolerance of VLSI systems is one of the most important requirements in current scaled-down VLSI technologies. This paper presents the impact of low power encoding on fault tolerant data encoding methods in on-chip data transfer schemes. Experiments using SPEC2000 benchmark programs show that the proposed methods can reduce signal transitions by up to 33% on the bus with fault tolerance.

1-4

Title	An Analytical Power Model for Synthesized Register Files Considering Address Dependencies
Author(s)	*Akihiko Higuchi, Kazutoshi Kobayashi, Hidetoshi Onodera (Kyoto University)
Page(s)	pp. 41 - 46
Keywords	System-Level, Power Estimation, ISS, Synthesizable Processor
Abstract	This paper proposes an analytical power model for synthesized register files using access patterns. Register files and execution units in a processor core consume much power. Therefore, power reduction in these units is effective. Our research intends to reduce the power of register files on synthesizable soft-core processors. The proposed model, which considers address dependency, estimates power more precisely than the conventional power model, which considers only data dependency, and more rapidly than time-consuming gate-level logic simulations.

1-5

Title	Energy-Aware Dynamic Task Scheduling Applied to a Real-Time Multimedia Application on an Xscale Board
Author(s)	Chantal Ykman-Couvreur, Francky Catthoor, Johan Vounckx, Andy Folens, Filip Louagie, *Rudy Lauwereins (IMEC)
Page(s)	pp. 47 - 54
Keywords	dynamic task scheduling, system-level energy minimization, dynamic voltage scaling, real design
Abstract	Energy consumption is a major issue when running real-time dynamic applications on portable devices. This energy is mainly dissipated on the processors, and it can be reduced by Dynamic Voltage Scaling (DVS). However, for applications with dynamic behavior and task creation, this is not feasible at design time. To avoid overhead, existing run-time schedulers only have a local view inside active tasks. Our approach combines a design-time scheduling exploration with a low-complexity run-time scheduling. In this paper, a 3D rendering application with important load variations, running on an Xscale processor that allows two voltages, is used as a case study to demonstrate the effectiveness of our scheduling. For this case study, our average energy gain compared with state-of-the-art intra-task DVS is up to 40%.

1-6

Title	LSI Power Network Analysis with On-chip Wire Inductance
Author(s)	*Atsushi Muramatsu, Masanori Hashimoto, Hidetoshi Onodera (Kyoto University)
Page(s)	pp. 55 - 60
Keywords	power distribution networks, power grids, inductance
Abstract	On-chip power/ground wires have been modeled with resistance and capacitance. However, with the increase of clock frequency, on-chip wire inductance plays an important role in power/ground distribution analysis. So far, power networks have usually been analyzed with the inductance of package and bonding, but without on-chip wire inductance. In this paper, we examine the behavior of LSI power networks from the viewpoint of transmission line theory, and demonstrate that voltage fluctuation propagates as an electromagnetic wave. We evaluate the relation between decoupling capacitance position and noise suppression effect, and reveal that placing decoupling capacitance close to the current load is necessary for noise reduction. We also show that the impact of on-chip inductance becomes small when on-chip decoupling capacitance is well placed according to local power consumption.

1-7

Title	Power Supply Noise Reduction with Design for Manufacturability
Author(s)	*Hiroyuki Tsujikawa, Kenji Shimazaki, Shozo Hirano, Kazuhiro Sato, Masanori Hirofuji, Junichi Shimada, Mitsumi Ito, Kiyohito Mukai (Matsushita Electric Industrial Co., Ltd.)
Page(s)	pp. 61 - 65
Keywords	Power, Voltage, Noise, DFM, CMP
Abstract	In this paper, a solution for reducing power-supply voltage noise in LSI microchips is presented. The proposed design methodology also considers design for manufacturability (DFM) together with power integrity. The method was successfully applied to the design of a system-on-chip (SOC), achieving a 12.9-14.6% noise reduction in power-supply voltage and uniformity of pattern density for chemical mechanical polish (CMP).

1-8

Title	An IR-Drop Minimization by Optimizing Number and Location of Power Supply Pads
Author(s)	*Takashi Sato (Kyoto University), Masanori Hashimoto (Osaka University), Hidetoshi Onodera (Kyoto University)
Page(s)	pp. 66 - 72
Keywords	IR-drop, pad assignment, optimization, circuit reduction
Abstract	An efficient pad assignment algorithm to minimize voltage drop on a power distribution network is proposed. Combination of the successive pad assignment (SPA) algorithm and the incremental matrix inversion (IMI) provides an efficient optimization for both location and number of power supply pads. The SPA algorithm creates an equivalent resistance matrix for current sink points and candidate pads only. It preserves both pad candidates and power consumption points as external ports so that the topological modification due to connecting or disconnecting voltage sources to or from candidate pads are suitably represented without re-generating circuit matrices. Using sub-matrix of the resistance matrix, SPA greedily searches next pad location that minimizes worst voltage drop. Each time the next pad candidate to add is tested, IMI reduces computational complexity from O(n³) to O(n²). Experimental results show that the proposed algorithm efficiently enumerates pad assignment order in practical time.

1-9

Title	High Speed and Low Energy Lateral BJT-CMOS Inverter
Author(s)	*Toshiro Akino (Kinki University)
Page(s)	pp. 73 - 77
Keywords	Lateral BJT, SOI
Abstract	A new operation mode for a partially depleted CMOS inverter on SOI is proposed, and a hybrid lateral BJT-CMOS inverter circuit is designed and simulated. The scheme utilizes the gated lateral npn or pnp BJT inherent in n- or p-channel MOSFETs. Forward current is applied to the base terminal of the channel MOSFETs, with a normal pull-up or pull-down MOSFET as a current source, where each drain terminal is connected to the corresponding base terminal of the inverter. A logic scheme is also proposed to control the gates of the pull-up or pull-down MOSFETs in switching states using output signals made from two CMOS inverters with different resistance ratios. In circuit simulation using 0.35 um BSIM3v3 model parameters for MOSFETs and a current gain of βF = 100 for BJTs, the lateral BJT CMOS inverter with control logic (LB-CMOS+CL) is shown to be 3.1 times faster than a 3-stage CMOS inverter designed on the basis of logical effort for driving a load capacitance of 0.3542 pF at Vdd = 1.2 V. The energy consumption of the LB-CMOS+CL is also approximately 17% lower than that of the 3-stage CMOS inverter.

1-10

Title	Topology-Oriented Design of Current Mirrors Using Evolutionary Graph Generation System
Author(s)	*Masanori Natsui, Naofumi Homma, Takafumi Aoki (Graduate School of Information Sciences, Tohoku University, Japan), Tatsuo Higuchi (Faculty of Engineering, Tohoku Institute of Technology, Japan)
Page(s)	pp. 78 - 84
Keywords	evolutionary computation, genetic algorithms, circuit synthesis, analog circuits
Abstract	This paper presents an efficient graph-based evolutionary optimization technique called Evolutionary Graph Generation (EGG) for automated circuit synthesis, and its application to the topology-oriented design of analog circuit structures. Key features of EGG are to employ a graph-based representation of individuals and to manipulate the graph structures directly instead of encoding the structures into indirect representations, such as bit strings and trees. The potential capability of EGG has been investigated through the topology-oriented design of nMOS current mirrors which are widely used as the fundamental building blocks of many analog circuits.

1-11

Title	MOSFET Layout Design for Electrical Performance Improvement
Author(s)	*Philip Beow Yew Tan (Silterra Malaysia Sdn. Bhd. & University Science Malaysia), Albert Victor Kordesch (Silterra Malaysia Sdn. Bhd.), Othman Sidek (University Science Malaysia)
Page(s)	pp. 85 - 89
Keywords	MOSFET, XMOS, Cross MOS, cross poly gate, ring oscillator
Abstract	In this paper, we discuss a new method of designing the MOSFET (MOS field effect transistor) layouts to improve their electrical performance. A new poly gate layout design has been introduced. Instead of using a straight poly gate, we used a cross poly gate. By adding an extra poly line across the existing poly line at 90 degrees, we have formed the Cross MOS (XMOS) transistor. The XMOS transistor has higher drain current (Id) and switching speed compared to the conventional MOS transistor.

1-12

Title	A Thermal-aware Sigma-Delta Modulator for CMOS Monolithic Temperature Sensors
Author(s)	*Suhow Wu, Herming Chiueh (National Chiao Tung University)
Page(s)	pp. 90 - 94
Keywords	ADC, Thermal Management, system-on-chip, temperature sensors, sigma-delta modulation
Abstract	In this research, a low-power, high-accuracy analog-to-digital converter (ADC) with a higher and wider temperature range is designed. The proposed design plays an important role in modern system-on-chip (SoC) thermal management systems, acting as the interface circuitry between monolithic temperature sensors and the thermal management unit. Several challenges, including temperature range, power dissipation requirements and processing technology compatibility, make the design of such an ADC difficult. This paper examines a practical design of an oversampling ADC based on sigma-delta modulation with thermal considerations. The prototype is fabricated in a TSMC 0.25 µm standard CMOS process. The experimental result achieves seven-bit resolution and dissipates only a few mW, which fulfills the system requirements of the target thermal management design. Besides these requirements, the process scaling of the proposed ADC is considered in order to be compatible with the target SoC fabrication technology. The integration of the monolithic temperature sensor, the proposed ADC and the thermal management unit keeps up with the progressing trend of modern VLSI and high-density circuits without thermal problems.

1-13

Title	Real-Time Segmentation of Large-Scale Images by Pipeline Processing with Small-Size Cell-Network
Author(s)	*Hidekazu Adachi, Takashi Morimoto, Osamu Kiriyama, Zhaomin Zhu, Tetsushi Koide, Hans Juergen Mattausch (Research Center for Nanodevices and Systems, Hiroshima University)
Page(s)	pp. 95 - 102
Keywords	Image-Segmentation, Real-Time, Cell-Network, Pipeline Processing, Large-Scale Image
Abstract	This paper presents an image-segmentation architecture, which is based on pipeline processing with a small-size cell network, and is called Subdivided-Image-Approach (SIA). The proposed SIA divides an input image into small tiles. These tiles are then segmented in sequential order and the total image-segmentation result is obtained by putting together the partial segmentation results of all tiles. The presented architecture can complete the segmentation of QVGA-size (320x240 pixels) to SXGA-size (1280x1024 pixels) images within 10 msec at operating frequencies from 2 MHz to 35 MHz, respectively. This performance is confirmed with a 94 mm² cell network for 41x33 pixels fabricated in 350 nm CMOS technology and operating with a power dissipation between 19 mW and 329 mW in the required frequency range. Consequently, the requirements of mobile applications, namely low power dissipation and compact integration, can be satisfied with the proposed SIA architecture.

1-14

Title	System on Programmable Chip Platform Based Design of JPEG-2000 Entropy Coder
Author(s)	Riad Benmouhoub, Imed Aouadi, *Omar Hammami (ENSTA)
Page(s)	pp. 103 - 106
Keywords	fpga, xilinx, jpeg-2000
Abstract	The JPEG-2000 image compression standard is increasingly gaining widespread importance. The rich variety of features makes it highly suitable for a large spectrum of applications, but at the same time its associated complexity makes it hard to optimize for particular implementations. One of the key steps during the processing is entropy coding, which takes about 70% of the execution time. We propose in this paper an analysis and hardware design of this entropy coder in the system framework of the Xilinx Virtex-II Pro chips.

1-15

Title	Efficient Hardware Architecture of a New Simple Public-Key Cryptosystem for Real-Time Data Processing
Author(s)	*Chengnan Jin, Nobuhiro Doi (Waseda University), Hatsukazu Tanaka (Kobe University), Shigeki Imai (Sharp Corporation), Shinji Kimura (Waseda University)
Page(s)	pp. 107 - 112
Abstract	This paper proposes an efficient architecture for deciphering of a new simple public-key cryptosystem to obtain real-time performance in motion picture processing. In the new simple public-key cryptosystem, the deciphering process just consists of two multiplications and one addition with 1024-bit length. This is simpler than that of RSA, the most widely used public-key cryptosystem. The multiplication is implemented based on Montgomery Modular Multiplication. For performing fast long bit addition, we have exploited the Carry Save Adder based method and Redundant Binary Adder based method. By using 4-adder cascade, we have obtained the real-time performance with 80MHz clock.
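
The abstract above names Montgomery Modular Multiplication as the basis of its multipliers. As a rough software illustration only (not the paper's hardware architecture), the textbook reduction can be sketched in Python as below. The function names montgomery_setup, mont_mul and mod_mul are hypothetical, the 1024-bit default merely mirrors the operand length quoted in the abstract, and pow(n, -1, r) assumes Python 3.8 or later.

```python
def montgomery_setup(n, bits):
    """Precompute constants for an odd modulus n and R = 2**bits > n."""
    if n % 2 == 0 or n >= (1 << bits):
        raise ValueError("n must be odd and smaller than R = 2**bits")
    r = 1 << bits
    n_prime = (-pow(n, -1, r)) % r  # satisfies n * n_prime = -1 (mod R)
    r2 = (r * r) % n                # R^2 mod n, used to enter Montgomery form
    return r, n_prime, r2


def mont_mul(a, b, n, bits, n_prime):
    """Return a * b * R^-1 mod n for 0 <= a, b < n (Montgomery reduction)."""
    mask = (1 << bits) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask  # chosen so that t + m * n is divisible by R
    u = (t + m * n) >> bits            # exact division by R, done as a shift
    return u - n if u >= n else u      # a single conditional subtraction suffices


def mod_mul(a, b, n, bits=1024):
    """Compute (a * b) mod n through Montgomery form (illustrative only)."""
    _, n_prime, r2 = montgomery_setup(n, bits)
    a_m = mont_mul(a % n, r2, n, bits, n_prime)  # a * R mod n
    b_m = mont_mul(b % n, r2, n, bits, n_prime)  # b * R mod n
    p_m = mont_mul(a_m, b_m, n, bits, n_prime)   # a * b * R mod n
    return mont_mul(p_m, 1, n, bits, n_prime)    # leave Montgomery form
```

The appeal for hardware is that the reduction needs only multiplications, masks and shifts by a power of two, with no trial division by the modulus.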

1-16

Title	A Parameterized On-Chip-Bus-Compliant FDWT/IDWT Accelerator IP Generator
Author(s)	Chih-Chun Chang, *Youn-Long Lin (National Tsing Hua University, Hsin-Chu, Taiwan 300)
Page(s)	pp. 113 - 120
Keywords	DWT, JPEG2000, IP Generator, SOC
Abstract	We propose a software tool for automatic generation of hardware accelerators for performing Discrete Wavelet Transform (DWT) with user-specified coefficient parameters. In addition to (5, 3) and (9, 7) DWT filters adopted by the next generation JPEG2000 image compression standard, other useful filters such as (9, 3), (6, 10), and (2, 2) can also be generated. The generated hardware IPs can perform both forward and inverse transform (FDWT and IDWT). We analyze variable life time for register allocation with low power consumption and apply register retiming technology to improve circuit performance. Our tool also produces an on-chip-bus interface circuit compliant with the AMBA protocol together with an associated device driver so that the generated IPs are ready for SOC integration. We verify the proposed approach by integrating generated IPs into an SOC platform running JPEG2000 application software. Experimental results demonstrated that the proposed approach is indeed effective in enhancing the productivity of hardware accelerator IP design.
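
For readers unfamiliar with the (5, 3) filter mentioned above, the following Python sketch shows one level of the integer lifting formulation commonly used for the reversible (5, 3) transform, with a matching inverse. It is an illustration under stated assumptions, not the datapath produced by the authors' generator: the names fdwt53 and idwt53 are hypothetical, the input is assumed to be an even-length list of integers, and the borders use mirrored samples.

```python
def fdwt53(x):
    """One level of forward (5, 3) integer lifting on an even-length list."""
    if len(x) < 2 or len(x) % 2:
        raise ValueError("this sketch expects an even length of at least 2")
    half = len(x) // 2
    even, odd = x[0::2], x[1::2]
    # Predict step: high-pass = odd sample minus the floored mean of its two
    # even neighbours. At the right border the missing neighbour is mirrored.
    high = [odd[i] - ((even[i] + even[min(i + 1, half - 1)]) >> 1)
            for i in range(half)]
    # Update step: low-pass = even sample plus a rounded quarter of the two
    # adjacent high-pass values. At the left border high[-1] mirrors high[0].
    low = [even[i] + ((high[max(i - 1, 0)] + high[i] + 2) >> 2)
           for i in range(half)]
    return low, high


def idwt53(low, high):
    """Inverse of fdwt53: undo the update step, then undo the predict step."""
    half = len(low)
    even = [low[i] - ((high[max(i - 1, 0)] + high[i] + 2) >> 2)
            for i in range(half)]
    odd = [high[i] + ((even[i] + even[min(i + 1, half - 1)]) >> 1)
           for i in range(half)]
    x = [0] * (2 * half)
    x[0::2] = even
    x[1::2] = odd
    return x
```

Because the inverse subtracts exactly the integer terms that the forward pass added, the round trip is lossless, which is why one structure can serve both FDWT and IDWT.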

1-17

Title	VLSI Implementation of a 3D Sound Movement System
Author(s)	*Nobuyuki Iwanaga, Takao Onoye (Dept. Information Systems Eng., Osaka University), Wataru Kobayashi, Kazuhiko Furuya (Arnis Sound Technologies, Co., Ltd.), Isao Shirakawa (Graduate School of Applied Informatics, University of Hyogo)
Page(s)	pp. 121 - 125
Keywords	3D sound localization, head-related transfer function, sound localization, VLSI implementation, sound movement control
Abstract	This paper describes VLSI implementation of a sound movement system, which is to be used on mobile applications. A low cost 3D sound localization and coefficient interpolation algorithm is exploited in order to realize the sound movement at all points in 3D space. The proposed system is synthesized from Verilog-HDL description, which requires only 30,000 gates. To demonstrate the effectiveness of the proposed 3D sound movement, a prototype system is implemented on FPGA board and subjective tests are performed.

Invited Talk 3
Session Type: Lecture
Time: Monday October 18, 2004, 15:45 - 16:30
Location: Kaga
Chairperson(s): Takafumi Aoki (Tohoku Univ.)

Title	Nanodevices Beyond Silicon: Device and Circuit Implications
Author(s)	*H.-S. Philip Wong (Center for Integrated Systems, Stanford University, Stanford)
Page(s)	pp. 129 - 133
Abstract	In this paper, we survey the salient characteristics of nanodevices beyond "silicon CMOS". Carbon nanotubes, semiconductor nanowires, and devices made of molecules are discussed as examples. New devices may have characteristics different from conventional Si CMOS. We discuss the circuit implications of these new devices. Opportunities offered by new fabrication techniques such as self-assembly may enable device fabrication beyond the lithographic limit. An example of new circuit architecture based on sublithographic nanodevice array is described.

Poster Session 2: System Level Design / Physical Design
Session Type: Poster
Time: Monday October 18, 2004, 16:30 - 18:15
Location: Oral/Discussion in Kaga
Chairperson(s): Tomonori Izumi (Kyoto Univ.), Hiroshi Murata (Univ. of Kitakyushu)

2-1

Title	An Application Specific Network-on-Chip (ASNOC) Design with Binary Tree Architecture
Author(s)	*Yuan-Long Jeang, Win-Hsien Huang, Wei-Feng Fang, Jain-Zhou Huang, Nan-Long Tsai, Chien-Cheng Ou (National Kaohsiung University of Applied Sciences, Department of Electronic Engineering, Kaohsiung, Taiwan, R.O.C.)
Page(s)	pp. 137 - 142
Keywords	System-on-a-chip, network on chip, globally asynchronous network, locally synchronous bus, wormhole routing
Abstract	A mixed-mode network-on-chip (NOC) interconnection architecture is proposed in this paper. The proposed architecture makes use of a globally asynchronous communication network and a locally synchronous bus. First, a local bus is given for a group of cores so that all communications within this local bus are exclusive in time. In order to represent the ratio of communications between local buses, the user has to provide a communication ratio (CR) for each pair of local bus groups. After that, the two local buses with the highest CR are grouped to form the first switching point of the globally asynchronous network. Then, the two groups joined by a switching point are regarded as a new group, and the new CR between the new group and each other local bus group is determined. A similar process is performed to form the next switching point. Finally, a binary tree (BT) is built by making each internal tree node a switching point and each leaf a local bus. In addition, the switching circuit cost can be decreased while the performance is increased. The simulation results show that the proposed NOC architecture is better than the general-purpose SPIN architecture [4].
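
The grouping procedure in the abstract above resembles a greedy pairwise merge, which can be sketched in Python as follows. This is a minimal sketch, not the authors' tool: the function name build_switch_tree and the data layout are hypothetical, and since the abstract does not say how the CR of a merged group is recomputed, the sketch assumes it is the sum of the CRs of the two merged groups.

```python
from itertools import combinations


def build_switch_tree(groups, cr):
    """Greedily merge local-bus groups into a binary tree of switching points.

    groups: list of unique leaf names, one per local bus.
    cr: dict mapping frozenset({a, b}) to the communication ratio of a and b;
        missing pairs are treated as 0.
    Returns nested 2-tuples: each tuple is a switching point, each name a leaf.
    """
    nodes = list(groups)
    ratio = dict(cr)
    while len(nodes) > 1:
        # Choose the pair of current groups with the highest CR.
        a, b = max(combinations(nodes, 2),
                   key=lambda pair: ratio.get(frozenset(pair), 0.0))
        merged = (a, b)
        rest = [n for n in nodes if n != a and n != b]
        # ASSUMPTION: the CR between the merged group and any other group is
        # the sum of its children's CRs; the abstract leaves this rule open.
        for n in rest:
            ratio[frozenset((merged, n))] = (
                ratio.get(frozenset((a, n)), 0.0)
                + ratio.get(frozenset((b, n)), 0.0))
        nodes = rest + [merged]
    return nodes[0]
```

With four local buses A, B, C, D where the pairs A-B and C-D communicate most, this sketch returns (("A", "B"), ("C", "D")): two lower switching points joined by one root switching point.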

2-2

Title	The Design and Implementation of a Multimedia Coprocessor for ARM7 Microprocessors
Author(s)	*Tse-Chen Yeh, Ing-Jer Huang (Dept. of Computer Science and Engin., National Sun Yat-Sen University)
Page(s)	pp. 143 - 147
Keywords	coprocessor, multimedia, acceleration
Abstract	To satisfy the requirements of multimedia computation, we propose a multimedia coprocessor (MMCOP) for ARM7 microprocessors. MMCOP can accelerate typical multimedia calculations by SIMD execution. To suit different multimedia applications, we implement adaptive long transfer instructions in MMCOP. In this paper, we explain the design and verification of the ARM7 coprocessor. We also compare synthesis reports for coprocessor integration and for an instruction set extension of the ARM microprocessor.

2-3

Title	ParSyC: An Efficient SystemC Parser
Author(s)	*Görschwin Fey, Daniel Große, Tim Cassens, Christian Genz, Tim Warode, Rolf Drechsler (University of Bremen)
Page(s)	pp. 148 - 154
Keywords	SystemC, Parsing
Abstract	Due to the ever increasing complexity of circuits and systems, new methodologies for system design are mandatory. Languages that enable modeling at higher levels of abstraction but also allow for concise hardware descriptions offer a promising way in this direction. One such language is SystemC. The research community faces a huge overhead when dealing with SystemC: no public domain software to retrieve a formal model from a SystemC description is available. In this paper we propose a SystemC parser that retrieves a description of SystemC in the form of an Abstract Syntax Tree. Moreover, we apply the parser to the synthesis of register transfer level descriptions. In this way a formal model can be retrieved from a SystemC description. Parsing the high-level description and building a formal model are necessary to enable further research in the area of new design methodologies.

2-4

Title	A Simulation Environment for Asynchronous Codesign
Author(s)	*Satoshi Tsutsumi, Hideharu Amano (Keio University)
Page(s)	pp. 155 - 161
Keywords	codesign, simulation, asynchronous, SystemC
Abstract	Recent advanced semiconductor processes make it possible to integrate numerous complex systems on a single chip. The modularity derived from asynchronous design is favorable for system performance or design productivity when designing large and complex systems in advanced processes. In this paper, a HW/SW codesign system for large-scale asynchronous systems is discussed. As the first step, a simulation library for asynchronous systems is developed for verifying functionality and estimating performance in the early stage of design. Here, a simulator for SoC (System on a Chip) design using the library, which includes an asynchronous CPU and dedicated hardware, is developed, and the simulation speed is evaluated.

2-5

Title	Extracting Structural and Communication Information of SystemC Descriptions
Author(s)	*Fábio Prudente, Edna Barros (Centro de Informatica, Universidade Federal de Pernambuco)
Page(s)	pp. 162 - 168
Keywords	system level design, systemc descriptions
Abstract	System-level design languages are needed, in the System-on-a-Chip (SoC) scenario, in order to model complex and heterogeneous systems at distinct abstraction levels, and to allow design space exploration and validation of such systems. SystemC is becoming a de facto standard design language addressing these needs. Although SystemC is ANSI C++ compliant, standard C++ tools are not suited for system modeling. A standard C++ tool is not able to recognize some SystemC constructs. Instead, only classes, with their fields and methods, from the underlying C++ will be seen. Without understanding the SystemC semantics, EDA tools are not able to perform any specific analysis of the modeled system. In this work, we present an approach for recognizing and extracting SystemC constructs from an Abstract Semantic Graph (ASG) generated by a gcc-based front-end tool. The resulting graph is about a thousand times smaller than the ASG obtained by the gcc-based extraction tool.

2-6

Title	Software Execution Time Back-annotation Method for High Speed Hardware-Software Co-simulation
Author(s)	*Michiaki Muraoka, Noriyoshi Itoh, Rafael K. Morizawa, Hiroyuki Yamashita, Takao Shinsha (STARC)
Page(s)	pp. 169 - 175
Keywords	System level design, Hardware-software co-simulation, Architecture simulation, Transaction level model, Time back-annotation
Abstract	This paper describes an acceleration method of the software simulation speed by back-annotating the execution time of the software on a target embedded CPU for a hardware-software co-simulation. This time back-annotation algorithm consists of three major steps: The basic block extraction from the software, the execution time calculation of the basic blocks and the time annotation into the software source codes. The simulation performance of the resulting time back-annotated software source codes, using a C-based hardware-software co-simulator, is more than ten times faster than the conventional ISS when the execution time on a target embedded CPU is taken into consideration. This means the simulation speed will be sufficient for a system-level verification, architecture-level performance evaluation and software verification of SoCs.

2-7

Title	Wire Length Distribution of SoC considering Macro Block Shapes
Author(s)	*Takanori Kyogoku, Hidenari Nakashima, Junpei Inoue, Naohiro Takagi, Hiyouko Shinoki, Kenichi Okada, Kazuya Masu (Precision and Intelligence Laboratory, Tokyo Institute of Technology)
Page(s)	pp. 176 - 180
Keywords	Wire length distribution, SoC, circuit complexity
Abstract	A system on chip (SoC) contains many macro blocks. In this paper, we propose a wire length distribution model for global interconnect between macro blocks. The proposed model can consider macro cell shapes, placement, and interaction between logic complexity and inter-macro wire length. The wire length distribution of SoC is derived from the number of input terminals of each macro cell.

2-8

Title	Interconnect Synthesis for Lithography and Manufacturability in Deep Submicron Design
Author(s)	Sameer Pujari (SUNY Binghamton ECE), Ryon M. Smey (SUNY Binghamton CSD/InternetCAD), Tan Yan (University of Kitakyushu), Hannibal H. Madden (AVS), *Patrick H. Madden (SUNY Binghamton CSD/University of Kitakyushu)
Page(s)	pp. 181 - 188
Keywords	lithography, manufacturability, interconnect, routing
Abstract	Effective design for large VLSI systems requires abstraction; problems are simply too complex to be addressed directly. To obtain good results, it is necessary that the abstractions still capture the basic nature of the problem. Deep submicron lithography has placed new constraints on circuit layout; these constraints are frequently counterintuitive, and are hard to model with current design rules. When design tools ignore the constraints, there is a need for a great deal of "back end" work to fix violations; the difficulty of these fixes has resulted in a push towards very restrictive design rules. To enable aggressive design without major violations, better abstractions are needed. In this paper, we focus on circuit interconnect, and develop a "straw man" approach to considering the lithography and manufacturability challenges of deep submicron design. We propose Mead-and-Conway style rules, and the concept of a "virtual layer" to separate mask-based constraints from silicon-based constraints. We also discuss a prototype routing tool that uses the virtual layer approach.

2-9

Title	Compact Modeling and Experimental Verification of Substrate Resistance in Lightly Doped Substrates
Author(s)	Hai Lan, Tze Wee Chen, Chi On Chui, *Robert W. Dutton (Stanford University)
Page(s)	pp. 189 - 195
Keywords	substrate noise, compact model, mixed-signal
Abstract	This paper presents a synthesized compact modeling methodology for substrate noise coupling in lightly doped silicon process. Rigorous 3-D device simulations reveal the distinctive decaying trend of noise coupling in far field and near field regions. A new compact, scalable model is proposed to accommodate both far field and near field effects as well as contact sizes, perimeters, and separations. A test chip in a lightly doped P-type substrate, consisting of various combinations of substrate noise coupling configurations, is fabricated and tested. The proposed compact model is validated by the measurement data from the test chip.

2-10

Title	Realization of Digital Noise Emulator for Characterization of Systems Exposed to Substrate Noise
Author(s)	Yi-Chang Lu, Jae Wook Kim (Stanford University), *Nobuhiko Nakano (Keio University), David Colleran (Stanford University), Patrick Yue (Carnegie Mellon University), Robert W. Dutton (Stanford University)
Page(s)	pp. 196 - 203
Keywords	substrate noise
Abstract	Frequency and timing of digital clocks, digital switching activities, and number of transistors in digital blocks are the key parameters to model switching noise generated by complicated digital systems. In this paper, a Digital Noise Emulator (DNE) is implemented to study how these parameters impact the performance of a ring-typed-VCO-based PLL. In addition, the proposed DNE can be used for noise cancellation to improve PLL performance in the presence of deterministic noise.

2-11

Title	Passive-Assured Rational Function Approach for Compact Modeling of On-chip Passive Components
Author(s)	*Zuochang Ye, Zhiping Yu (Institute of Microelectronics, Tsinghua University)
Page(s)	pp. 204 - 208
Keywords	compact modeling, rational function, passivity
Abstract	A linear passive network can be modelled to be terminal-equivalent by approximating its admittance matrix Y using rational functions in the frequency domain. The physical behavior of the original system mandates that the generated model should not release more energy than it absorbs, i.e., the passivity must be guaranteed. Previous modeling approaches using rational function can only assure passivity at the sampling frequencies. In this paper, a new method is proposed, which is based on the linearization of a so-called discriminant polynomial and guarantees the model passivity in the entire frequency range. The method uses the linear-constrained least-squares programming and is applicable to general N-port networks. The detailed algorithm and results from the application of the method to an on-chip spiral inductor in RF CMOS design are presented.

2-12

Title	An Analytical Phase Response Model for 3-stage Ring Oscillators
Author(s)	*Jaijeet Roychowdhury (University of Minnesota)
Page(s)	pp. 209 - 213
Keywords	oscillator, macromodelling
Abstract	We present a simple analytical model for capturing phase errors in 3-stage ring oscillators. The model, based on a simple but useful idealization of the ring oscillator, is provably exact for small noise perturbations. Requiring knowledge only of the amplitude and frequency of the oscillator, the model is ideally suited for early design exploration at the system and circuit levels. Despite its simplicity and purely analytical form, our model correctly captures the time-dependent sensitivity of oscillator phase to external perturbations. It is thus well suited for estimating both qualitative and quantitative features of ring oscillator phase response to internal noises, as well as to power, ground and substrate interference. The nonlinear nature of the model makes it suitable for predicting injection locking as well. Comparisons of the new model with existing phase models are provided.

2-13

Title	Statistical Analysis of Clock Skew Variation
Author(s)	*Masanori Hashimoto, Tomonori Yamamoto, Hidetoshi Onodera (Kyoto University)
Page(s)	pp. 214 - 219
Keywords	clock skew, statistical analysis, manufacturing variability, power supply fluctuation, temperature gradient
Abstract	This paper discusses clock skew due to manufacturing variability and environmental change. In clock tree design, transition time constraint is an important design parameter that controls clock skew and power dissipation. In this paper, we evaluate clock skew under several variability models, and demonstrate the relationship among clock skew, transition time constraint and power dissipation.

2-14

Title	Statistical Timing Analysis with Global Variations and Path Reconvergence
Author(s)	Lizheng Zhang, Yuhen Hu (University of Wisconsin-Madison), *Charlie Chung-Ping Chen (National Taiwan University)
Page(s)	pp. 220 - 227
Keywords	statistical timing analysis, correlation
Abstract	Block-based statistical timing analysis (STA) tools often yield less accurate results when timing variables become correlated due to global sources of variation and path reconvergence. To the best of our knowledge, no good solution is available for handling both types of correlation simultaneously. In this paper, we present a novel statistical timing algorithm, AMECT (Asymptotic MAX/MIN approximation & Extended Canonical Timing model), that produces accurate timing estimates by handling both types of correlation simultaneously. First, a linear mixing operator is used to approximate the nonlinear MAX/MIN operator by moment matching. Second, an extended canonical timing model is developed to evaluate and decompose correlations between arbitrary timing variables. Finally, an intelligent pruning method is designed to enable trading off runtime against accuracy. Tested with the ISCAS benchmark suites, AMECT shows both high accuracy and high performance compared with Monte Carlo simulation: the distribution estimation error is below 1.5%, with a speedup of around 350X on a circuit with 5355 gates.
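As a rough illustration of approximating MAX by moment matching, the sketch below uses Clark's classical formulas for the first two moments of the maximum of two jointly Gaussian variables. This is a stand-in chosen for illustration and is not taken from the paper; the function name `clark_max` and the returned mixing weight `t` are assumptions, and AMECT's own linear mixing operator may differ.

```python
import math


def _pdf(x):
    """Standard normal probability density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def clark_max(mu_a, sigma_a, mu_b, sigma_b, rho):
    """Return (mean, std, t) for max(A, B) with A, B jointly Gaussian.

    rho is the correlation of A and B; t = P(A > B) can serve as the
    weight in a linear approximation max(A, B) ~ t*A + (1 - t)*B + const.
    Hypothetical helper for illustration only.
    """
    var_diff = sigma_a ** 2 + sigma_b ** 2 - 2.0 * rho * sigma_a * sigma_b
    theta = math.sqrt(max(var_diff, 0.0))
    if theta == 0.0:
        # A - B is deterministic, so the larger mean always wins.
        if mu_a >= mu_b:
            return mu_a, sigma_a, 1.0
        return mu_b, sigma_b, 0.0
    alpha = (mu_a - mu_b) / theta
    t = _cdf(alpha)
    phi = _pdf(alpha)
    mean = mu_a * t + mu_b * (1.0 - t) + theta * phi
    second = ((mu_a ** 2 + sigma_a ** 2) * t
              + (mu_b ** 2 + sigma_b ** 2) * (1.0 - t)
              + (mu_a + mu_b) * theta * phi)
    var = max(second - mean ** 2, 0.0)  # clamp rounding noise
    return mean, math.sqrt(var), t
```

A Gaussian with the returned mean and standard deviation matches the first two moments of the true maximum, which is one common way to keep arrival times in a closed form during block-based propagation.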

2-15

Title	SIRE/M: A Homogeneous Intermediate Format for VHDL-AMS Mixed Signal Simulation
Author(s)	*Hamid Reza Ghasemi, Zainalabedin Navabi (CADLAB, ECE Department, University of Tehran)
Page(s)	pp. 228 - 234
Keywords	VHDL-AMS, Mixed Signal Simulation, SIRE/M, simulation
Abstract	VHDL-AMS is a recent modeling language that is suitable for modeling mixed signal circuits. A straightforward implementation of a VHDL-AMS simulator uses two separate analog and digital simulation engines. This approach results in an easy implementation, but the performance of the resulting simulator is degraded because of some basic overheads. We present an alternative approach that eliminates the overheads by using intermediate formats to build mixed engines. In this paper we introduce an extensible object-oriented intermediate format (SIRE/M) that improves performance by eliminating overheads based on this approach. Then we describe the implementation of a mixed signal simulation engine based on SIRE/M and justify the improvements with several mixed signal examples.

2-16

Title	A Layout-Aware Circuit Sizing Model Using Parametric Analysis
Author(s)	*I-Lun Tseng, Adam Postula (School of ITEE, The University of Queensland, Australia)
Page(s)	pp. 235 - 240
Keywords	GBLD, parameterized layout, parametric analysis, circuit sizing
Abstract	We propose a circuit sizing model that takes layout parasitics into account. The circuit and layout parameters are stored in a parameterized layout description format, GBLD. The layout parasitics are stored as closed form expressions. Layout optimization tools can modify the layout and recalculate parasitics on the fly. If the results of sensitivity analysis are passed to those tools, optimization for performance can be achieved with relatively few iterations involving time consuming circuit simulations.

2-17

Title	LARTTE: A Posynomial-Based Lagrangian Relaxation Tuning Tool for Fast and Effective Gate-Sizing and Multiple Vt Assignment
Author(s)	Hsinwei Chou (University of Wisconsin-Madison), Yu-Hao Wang (Incentia Design Systems), *Charlie Chung-Ping Chen (National Taiwan University)
Page(s)	pp. 241 - 248
Keywords	Circuit Tuning, Gate Sizing, Multiple Vt, Timing Optimization, Power Reduction
Abstract	In this paper, we propose a novel method for fast and effective gate-sizing and multiple Vt assignment using Lagrangian Relaxation (LR) and posynomial modeling. Our algorithm optimizes a circuit's delay and power consumption subject to slew rate constraints, and can readily take process variation into account. We first use SPICE to generate accurate delay and power models in posynomial form for standard cells, then formulate a large-scale, convex optimization problem based on these models. Finally, we perform LR to solve for the globally-optimal (optimality is with respect to the posynomial approximation-based optimization problem, without discretization.) set of transistor sizes and Vts (with discretization) for each gate. Our key contribution is that we show for the first time that using posynomial models, LR-based circuit tuning can be carried out in a

Invited Talk 4
Session Type: Lecture
Time: Tuesday October 19, 2004, 8:45 - 9:30
Location: Kaga
Chairperson(s): Katsuhiko Ueda (Matsushita Electric Ind.)

Title	A DSP Core for Portable Multimedia Application
Author(s)	*Chein-Wei Jen (SoC Technology Center, ITRI, Taiwan)
Page(s)	pp. 251 - 252

Poster Session 3: System Level Design / Logic Design
Session Type: Poster
Time: Tuesday October 19, 2004, 9:30 - 11:15
Location: Oral/Discussion in Kaga
Chairperson(s): Shin-ichi Minato (Hokkaido Univ.), Hiroyuki Tomiyama (Nagoya Univ.)

3-1

Title	Zero Overhead Loop Techniques for Application Specific Instruction-set Processors
Author(s)	*Shinsuke Kobayashi (University of Tokyo), Kentaro Mita, Yoshinori Takeuchi, Masaharu Imai (Osaka University)
Page(s)	pp. 255 - 261
Keywords	ASIP, Zero Overhead Loop, Processor Synthesis, Compiler Generation
Abstract	We propose new retargetable Zero Overhead Loop (ZOL) techniques for Application Specific Instruction-set Processors. The techniques consist of configurable hardware and retargetable software development tools including a compiler and an assembler. By defining the parameters for configuration, including loop block indication, nesting levels, and the number of iterations, the hardware can be customized. Moreover, the method can generate the target compiler and assembler that can deal with ZOL instructions, considering the parameters of the retargetable hardware. In experiments, various ZOL type processors were designed. It took only about 4 hours to design 19 processors. The results show that designers using our method can explore the ASIP design space including the ZOL efficiently.

3-2

Title	Buffering Hardware Nested Loop of Parameterized and Embedded DSP Core
Author(s)	*Ya-Lan Tsao, Wei-Hao Chen, Shyh-Jye Jou (National Central Univ.)
Page(s)	pp. 262 - 265
Keywords	DSP, parameterized, instruction buffer
Abstract	In this paper, a hardware nested looping structure with an instruction buffer memory is proposed for the parameterized and embedded DSP core. An optional buffer memory for the instructions in the loop is used to save much of the memory access power consumed during program memory fetches. The size of the instruction buffer memory and the nested loop depth are parameters. Design examples show that the hardware overhead for the nested hardware looping scheme is only 4.20% and that the scheme saves 12% power consumption.

3-3

Title	A Methodology for Automated Test Generation for LISA Processor Models
Author(s)	*Olaf Lüthje (CoWare, Inc.)
Page(s)	pp. 266 - 273
Keywords	LISA, verification, test, processor
Abstract	A methodology is presented for automatic generation of test vectors for LISA processor models. The methodology has been implemented as a prototype. The processor model is analyzed with a new analysis approach and conditions are extracted that have to be fulfilled by a set of test cases in order to achieve 100% code coverage of the processor model. The conditions are formulated as constraint satisfaction problems (CSP). The tests are generated by finding solutions for these CSP. As proof of concept, the presented methodology has been applied to two processor models: a very simple example processor and an instruction accurate model of the ARM7.

3-4

Title	Design Understanding by Automatic Property Generation
Author(s)	Rolf Drechsler, *Görschwin Fey (University of Bremen)
Page(s)	pp. 274 - 281
Keywords	property checking, design methodology
Abstract	Only a concise synthesis and verification flow makes it possible to cope with complex circuits and systems consisting of several million components. In the meantime, verification has become the dominating factor, causing up to 80% of the overall design costs. Still, verification can only be applied if a formal model exists. For this reason, the initial translation of the specification - given as a workbook in natural language - to a formal description on register-transfer level (RTL) is usually not checked. As a result, current verification approaches do not provide any design understanding. In this paper we propose a new approach that allows automatic generation of properties for a given design. These properties are formally verified using model checking. The resulting properties are translated into a description that is easy for the designer to read and understand, and the designer can add this description to the set of properties or a testbench. The methodology - independently of the designer and verification engineer - provides design understanding and thereby significantly contributes to the quality of the process.

3-5

Title	Behavioral Model Construction for Formal Verification of Advanced On-chip Bus Protocols
Author(s)	*Yosuke Kakiuchi (Osaka University), Akira Kitajima (Osaka Electro-Communication University), Kiyoharu Hamaguchi, Toshinobu Kashiwabara (Osaka University)
Page(s)	pp. 282 - 289
Keywords	verification, bus protocol, behavioral model, AMBA, CWL
Abstract	This paper addresses verification of interactions between IPs, that is, bus protocols, with formal techniques. Usual models for verification of bus protocols are composed only of finite state machines, and as a result, it is difficult to describe complex protocols. In this paper, we propose an extended behavior model so that we can handle split transactions or pipeline transactions more naturally, and show how to construct models from formal specifications written in some language. We also report experimental results on verification of the AMBA protocol specification with a symbolic model checker. This shows that, even with rather complex behavioral models, formal verification of bus protocols with advanced features can be done in a reasonable time.

3-6

Title	On Debugging Assistance in Assertion-Based Verification
Author(s)	*Bao-Ren Huang, Tzung-Jr Tsai, Chien-Nan Liu (Dept. of EE, National Central University)
Page(s)	pp. 290 - 295
Keywords	assertion, debugging, HDL
Abstract	In the verification process, debugging is also a hard and time-consuming process and is often done by designers themselves. Because most design errors occur in the early design stages, there are also some approaches proposed for debugging HDL designs. The authors in [7] proposed a method to give a rank to each error candidate. In this way, the debugging effort can be reduced because designers only have to trace the first several items at the front of the list. However, due to the lack of internal information about the circuit, the estimation of error possibility may still not be very accurate. In this paper, we propose a method that uses the extra observability provided by assertions to make a better estimation of error possibility. Using our approach, the error ranking can be more accurate than in the previous approach, so that the debugging effort can be further reduced. The effectiveness of our improvements is shown in the experiments.

3-7

Title	Matlab based Environment for Designing DSP Systems using IP Blocks
Author(s)	Nacer-Eddine Zergainoh, Katalin Popovici, *Ahmed Amine Jerraya (TIMA Laboratory), Pascal Urard (STMicroelectronics)
Page(s)	pp. 296 - 302
Keywords	system-level design, DSP, IP-based design, Matlab, SystemC
Abstract	In this paper, we propose an efficient IP block based design environment for high throughput VLSI systems. The flow generates a SystemC Register Transfer Level (RTL) architecture, starting from a Matlab functional model described as a netlist of functional IP. The refinement model automatically inserts control structures to manage delays induced by the use of RTL IPs. It also inserts a control structure to coordinate the execution of parallel clocked IP. The delays may be managed by registers or by counters included in the control structure. Experiments show that the approach can produce an efficient RTL architecture and allows for huge savings of time.

3-8

Title	Asynchronous Datapath Synthesis Enhancing Graceful Degradation for Delay Faults
Author(s)	*Koji Ohashi, Mineo Kaneko (Japan Advanced Institute of Science and Technology)
Page(s)	pp. 303 - 309
Keywords	Asynchronous systems, Schedule, Datapath synthesis
Abstract	This paper treats scheduling and datapath synthesis for asynchronous systems. In particular, we introduce scheduling under a specified assignment of operations to functional units and data to registers for asynchronous systems, and incorporate it into a Simulated Annealing exploration of the assignment solution space. Our method allows more aggressive register sharing than conventional methods do. We present a new objective function for designing graceful degradation systems. It is demonstrated that asynchronous datapaths having the graceful degradation property are successfully synthesized.

3-9

Title	A Behavioral Synthesis Method Considering Complex Operations
Author(s)	*Tsuyoshi Sadakata, Yusuke Matsunaga (Graduate School of Information Science and Electrical Engineering, Kyushu University)
Page(s)	pp. 310 - 314
Keywords	behavioral synthesis
Abstract	Current behavioral synthesis tools often use a one-to-one mapping to assign functional units to operators (e.g., additions). However, it is possible to create complex functional units from operations in series and make circuits faster by assigning them to operations. This paper proposes a novel method to minimize the number of control steps under timing constraints using complex functional units. In a preliminary experiment, the proposed method doubles the speed with a 40% increase in area.

3-10

Title	Customizable Framework for Arithmetic Synthesis
Author(s)	*Taeko Matsunaga (Fukuoka Industry, Science & Technology Foundation), Yusuke Matsunaga (Kyushu University)
Page(s)	pp. 315 - 318
Keywords	arithmetic synthesis
Abstract	Design of arithmetic units has been an important issue that can dominate the performance of the whole circuit. Recent logic synthesis tools can implement arithmetic units from RTL descriptions by utilizing parameterized design components predefined in arithmetic libraries. This paper reviews the current status of arithmetic synthesis and points out some issues, especially on customizability, to be tackled for better performance. Further work is still needed in arithmetic synthesis, and it is helpful to have a framework that eases the integration of new approaches. Requirements for such a framework are discussed and a possible synthesis flow within this framework is shown.

3-11

Title	Arithmetic Description Language and Its Application to Parallel Multiplier Design
Author(s)	*Naofumi Homma, Kazuya Ishida, Takafumi Aoki (Tohoku University), Tatsuo Higuchi (Tohoku Institute of Technology)
Page(s)	pp. 319 - 326
Keywords	computer arithmetic, hardware description language, formal verification, hardware algorithms, number system
Abstract	This paper presents the basic concept of an arithmetic description language called ARITH. The use of ARITH makes possible (i) formal description of arithmetic algorithms including those using unconventional number systems, (ii) formal verification of described arithmetic algorithms, and (iii) translation of arithmetic algorithms to equivalent HDL code. In this paper, we demonstrate the potential of ARITH through an experimental design of parallel multipliers using the radix-4 modified Booth's algorithm and a (4;2) compressor tree architecture.
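For readers unfamiliar with the radix-4 modified Booth's algorithm mentioned in the abstract, the following is a minimal software sketch of the recoding step, under the assumption of a two's-complement multiplier with an even bit width. It is not ARITH code and not the authors' design; the function names are hypothetical, and the final sum stands in for the compressor tree that a hardware multiplier would use.

```python
def booth_radix4_digits(y, bits):
    """Recode two's-complement y (bits wide, bits even) into radix-4
    Booth digits in {-2, -1, 0, 1, 2}, least significant digit first."""
    if bits <= 0 or bits % 2 != 0:
        raise ValueError("bits must be a positive even number")
    if not -(1 << (bits - 1)) <= y < (1 << (bits - 1)):
        raise ValueError("y does not fit in the given width")
    pattern = y & ((1 << bits) - 1)  # two's-complement bit pattern

    def bit(i):
        # Bit i of the pattern; the position below the LSB reads as 0.
        return 0 if i < 0 else (pattern >> i) & 1

    # Each digit looks at three overlapping bits: y[i+1], y[i], y[i-1].
    return [bit(i - 1) + bit(i) - 2 * bit(i + 1) for i in range(0, bits, 2)]


def booth_multiply(x, y, bits):
    """Multiply x by y by summing the shifted partial products d * x * 4**k."""
    digits = booth_radix4_digits(y, bits)
    return sum((d * x) << (2 * k) for k, d in enumerate(digits))
```

The recoding halves the number of partial products compared with a bit-by-bit multiplier, since each digit covers two multiplier bits.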

3-12

Title	Area efficient Wave-Pipelined Adder Using Redundant Binary Encoding
Author(s)	*Tatsuya Yamamoto, Kiyofumi Tanaka (Japan Advanced Institute of Science and Technology)
Page(s)	pp. 327 - 332
Keywords	wave-pipeline, redundant arithmetic unit, circuit design
Abstract	With the popularization of multimedia applications, there is a demand for high speed data processing. Wave-pipelining is one of the techniques for raising the throughput of arithmetic units used in these applications. We replace the conventional arithmetic unit with a Redundant Arithmetic Unit (RAU) in order to prevent the increase of the area before and after wave-pipelining. For all bits, the difference between maximum and minimum propagation delays is almost the same in an RAU. Therefore, very few delay elements are needed when designing a wave-pipelined RAU. Also, according to our experimental results on adders, the area of a wave-pipelined RAU is much smaller than that of a wave-pipelined ripple adder or carry-lookahead adder.

3-13

Title	Reduction on the Usage of Intermediate Registers for Pipelined Circuits
Author(s)	*Bakhtiar Affendi Rosdi, Atsushi Takahashi (Tokyo Institute of Technology)
Page(s)	pp. 333 - 338
Keywords	pipeline, semi-synchronous circuit, multi-clock cycle path
Abstract	Conventional pipelining consumes a lot of area because intermediate registers need to be inserted between the stages. Wave-pipelining, in contrast, is a circuit design method that implements pipelining in logic without the use of intermediate latches or registers. However, to achieve the highest possible wave-pipelining frequency, delay balancing is required, which increases the area of a circuit. In this paper, we propose a new pipeline synthesis method that reduces the usage of intermediate registers by making use of the multi-clock cycle path and the semi-synchronous circuit technique. A multi-clock cycle path is introduced if an intermediate register can be removed without excessive delay balancing. We show the constraints that need to be satisfied by a multi-clock cycle path in the given clock period range. We also propose an algorithm to reduce the usage of intermediate registers for pipelined circuits.

3-14

Title	Error Diagnosis Technique Based on Boolean Resubstitution
Author(s)	Toshifumi Sugane, *Ryosuke Arai, Takayuki Iida, Hiroshi Inoue, Nobutaka Kuroki, Masahiro Numa (Kobe University)
Page(s)	pp. 339 - 344
Keywords	error diagnosis, design error, ECO, incremental synthesis, Boolean resubstitution
Abstract	In an LSI design process, Engineering Change Orders (ECO’s) are often given due to logic design errors, changes of specification, and timing issues. In these cases, it is impossible to assume the type of correction. Therefore, a rectification technique for various kinds of error models is needed. This paper presents an error diagnosis technique with insertion of additional circuits to be applicable to various kinds of error models. By this technique, after locating the insertion part and computing the function of the additional circuit, rectification is performed by re-synthesis of the additional circuit. Moreover, the size of the additional circuit is reduced by reusing the existing circuit based on Boolean re-substitution.

3-15

Title	Extraction of Subcircuits for Incremental Synthesis Based on Error Diagnosis
Author(s)	*Takayuki Iida, Toshifumi Sugane, Takahiro Iwasaki, Hiroshi Inoue, Nobutaka Kuroki, Masahiro Numa, Keisuke Yamamoto (Kobe University)
Page(s)	pp. 345 - 350
Keywords	ECO, incremental synthesis, error diagnosis, design error
Abstract	We present an approach to extract subcircuits that may contain design errors from a larger circuit for incremental synthesis based on an error diagnosis technique. This approach employs not only structural matching between two circuits, but also functional matching of the equivalent points based on BDD's. Experimental results have shown that the proposed subcircuit extraction technique is effective in enlarging the circuit size that can be diagnosed and in shortening the processing time for logic diagnosis.

3-16

Title	A Patchwork-like Partitioning Method for Engineering Change Orders in Redesign of High Performance LSIs
Author(s)	*Yuichi Nakamura, Ko Yoshikawa (NEC Corp.), Takeshi Yoshimura (Waseda Univ.)
Page(s)	pp. 351 - 356
Keywords	ECO, Logic Design, Partitioning
Abstract	This paper presents a novel design method for large-scale, high performance LSIs based on a patchwork-like partitioning technique for ECOs (engineering change orders). Even if only small design changes are made after the place and route process, applying a re-layout can be very time consuming. The proposed method partitions the design into several parts before the first place and route. When design changes occur in HDL, several parts relating to the changes are simply exchanged. The netlist for the changed design remains almost the same as the original one. For the partitioning, we use multiple-fan-out-points as partition borders. Experimental evaluation of our method showed that when a small change was made in the RTL description, the revised part of the circuit had only about 87 gates in the average case. This greatly reduces the re-layout time required to implement an ECO.

3-17

Title	Logic Optimization Method after Technology Mapping
Author(s)	*Ko Yoshikawa, Yuichi Nakamura (NEC Corp.), Kyo Akashi (NEC Informatec Systems, Ltd.), Takeshi Yoshimura (Waseda University)
Page(s)	pp. 357 - 362
Keywords	logic synthesis
Abstract	This paper describes an algorithm for logic optimization after technology mapping. The advantage of logic optimization after technology mapping is that estimation of cost (area and delay) is more accurate than that of technology independent logic optimization. Our logic optimization method is based on the Transduction method. We extend it to use the NPN-Equivalent function of the gates in the cell library as an internal representation for a node. We also extend the algorithm for fast processing. In experiments, the proposed techniques achieve an average 10% area reduction compared with the input circuits, which have already been optimized by conventional technology independent logic optimization algorithms.

Invited Talk 5
Session Type: Lecture
Time: Tuesday October 19, 2004, 11:15 - 12:00
Location: Kaga
Chairperson(s): Youn-Long Lin (National Tsing Hua Univ.)

Title	A Feed-Forward Dynamic Voltage Frequency Management by Workload Prediction for a Low Power Motion Video Compression
Author(s)	*Masahiko Yoshimoto (Kobe University), Kentaro Kawakami (Kanazawa University)
Page(s)	pp. 365 - 370
Abstract	This paper proposes a feed-forward dynamic voltage and frequency management (FFDM) method to minimize the total power of software-based video compression processing. This method cooperatively controls operating voltage/frequency and body bias voltage according to the workload predicted by a forward analysis to reduce both dynamic power and leakage power. Simulation results indicate that the FFDM method can reduce power dissipation of MPEG4 encoding by 65% to 80%, depending on sequence activities.

Invited Talk 6
Session Type: Lecture
Time: Tuesday October 19, 2004, 13:30 - 14:15
Location: Kaga
Chairperson(s): Kazutoshi Wakabayashi (NEC)

Title	A Vision towards an Ambient Intelligent Environment and the Associated System Level Design Challenges
Author(s)	*Rudy Lauwereins (IMEC)
Page(s)	pp. 373 - 375
Abstract	The advent of the intelligent environment or "ambient intelligence" is a serious challenge for the systems designer. The systems of the future are small, complex, flexible and consume little energy. These conflicting requirements require new ways of designing that differ radically from conventional methods. A real software washing machine will solve the restrictions of today's methods in the near future.

Poster Session 4: System Architecture
Session Type: Poster
Time: Tuesday October 19, 2004, 14:15 - 16:00
Location: Oral/Discussion in Kaga
Chairperson(s): Kouhei Nadehara (NEC), Toshinori Sato (Kyushu Inst. of Tech.)

4-1

Title	Analog Topological Placement with Symmetry Constraints Using a O(n loglog n) Evaluation Algorithm
Author(s)	Karthik Krishnamoorthy, Sarat C. Maruvada, *Florin Balasa (University of Illinois at Chicago)
Page(s)	pp. 379 - 386
Keywords	placement, analog layout, topological representations, B-trees
Abstract	This paper presents a novel algorithm for device-level analog placement with symmetry constraints. Based on the exploration of symmetric-feasible binary tree representations of the layout, the novel approach employs an efficient model of priority queue introduced by Johnson and brought to the attention of the CAD community by Tang and Wong. The use of this data structure entails a worst-case complexity of O(n loglog n) for each code evaluation, which is better than that of any other existing topological placement algorithm supporting symmetry.

4-2

Title	A Detailed Placement Method for Structured Devices with Multiple Resources
Author(s)	*Yoshihiro Ono, Takumi Okamoto (NEC)
Page(s)	pp. 387 - 394
Keywords	placement
Abstract	This paper describes a detailed placement method for structured architectures with multiple resources. Placement tools required for such architectures must control the density of cells in each type of resource. Specifically, resource capacity is set for each local region, and our method determines if there has been density violation by calculating the total amount of resources used beyond the capacity. Our goal was to optimize detailed placement to reduce density violation without sacrificing other objectives such as the wire length and timing. We describe a new method we developed by incorporating two improvements into the conventional simulated annealing based detailed placement method. Experimental results show that our method reduces density violation by 8.2% compared with the conventional method and that it does so without sacrificing the wire length and timing.

4-3

Title	Floorplan Design for 3-D ICs
Author(s)	Lei Cheng, Liang Deng, *Martin D.F. Wong (University of Illinois at Urbana-Champaign)
Page(s)	pp. 395 - 401
Keywords	floorplan, slicing, constraint, 3D
Abstract	In this paper we present a floorplanning algorithm for 3-D ICs. The problem can be formulated as that of packing a given set of 3-D rectangular blocks while minimizing a suitable cost function. Our algorithm is based on a generalization of the classical 2-D slicing floorplans to 3-D slicing floorplans. A new encoding scheme of slicing floorplans (2-D/3-D) and its associated set of moves form the basis of the new simulated annealing based algorithm. The best known algorithm for packing 3-D rectangular blocks is based on simulated annealing using sequence-triple floorplan representation. Experimental results show that our algorithm produces packing results on average 3% better than the sequence-triple based algorithm under the same annealing parameters, and our algorithm runs much faster (17 times for problems containing 100 blocks) than the sequence-triple. Moreover, our algorithm can be extended to handle various types of placement constraints while the sequence-triple based algorithm does not have such capabilities. Finally, when specializing to 2-D problems, our algorithm is a new 2-D slicing floorplanning algorithm. We are excited to report the surprising results that our new 2-D floorplanner has produced slicing floorplans for the two largest MCNC benchmarks ami33 and ami49 which have the smallest areas (among all slicing/non-slicing floorplans) ever reported in the literature.

4-4

Title	Minimization in the Number of Empty Rooms on Floorplan by Dissection Line Merge
Author(s)	*Chikaaki Kodama, Kunihiro Fujiyoshi (Tokyo University of Agriculture and Technology)
Page(s)	pp. 402 - 407
Keywords	Sequence-pair, empty room, floorplan, dissection line, wiring channel
Abstract	This paper discusses how to minimize the number of empty rooms on a floorplan corresponding to a placement of $n$ modules. In a floorplan, a room is said to be empty if no module is assigned to it. Since a floorplan obtained from a given module placement may have redundant empty rooms, redundant wiring channels and wire bends may be also generated. Hence, in order to reduce redundant channels and wire bends, removal of empty rooms is required. For this purpose, we formulate a problem of removing as many empty rooms as possible by merging dissection lines on a floorplan converted from sequence-pair extracted by FAST-gridding. Then, we propose a method of obtaining the optimal solution of the above problem in $O(n)$ time. The number of empty rooms in the resultant floorplan is reduced to $n-\lfloor \sqrt{4n-1} \rfloor$ or less.
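To give a feel for the bound $n-\lfloor \sqrt{4n-1} \rfloor$ stated in the abstract, the short sketch below evaluates it for a few module counts. The function name is hypothetical, and the code only evaluates the formula; it does not implement the dissection line merging method itself. It assumes Python 3.8 or later for `math.isqrt`.

```python
import math


def empty_room_bound(n):
    """Evaluate n - floor(sqrt(4n - 1)) for n >= 1 modules.

    math.isqrt gives the exact integer floor of the square root,
    which avoids floating-point rounding for large n.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return n - math.isqrt(4 * n - 1)


if __name__ == "__main__":
    for n in (1, 4, 9, 100):
        print(n, empty_room_bound(n))
```

For a perfect square such as n = 100, the floor term is 19, so the bound leaves at most 81 empty rooms.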

4-5

Title	Area-Array I/O Clustering in Design Cost and Performance Optimization
Author(s)	*Hung-Ming Chen (Department of Electronics Engineering, National Chiao Tung University, Hsinchu, Taiwan), I-Min Liu (Cadence Design Systems), Martin D.F. Wong (ECE Department, University of Illinois at Urbana-Champaign, Urbana, IL 61801)
Page(s)	pp. 408 - 413
Keywords	Area-Array I/O, Optimization, Physical Design
Abstract	I/O placement has always been a concern in modern IC design. Due to flip-chip technology, I/O can be placed throughout the whole chip without long wires from the periphery of the chip. However, because of I/O placement constraints in design cost and performance, I/O buffer planning becomes a pressing problem. During the early stages of circuit and packaging co-design, I/O layout should be evaluated to optimize design cost and to avoid product failures. In this paper, our objective is to improve an existing/initial standard cell placement by I/O clustering, considering design cost reduction and signal integrity preservation. We formulate it as a minimum cost flow problem minimizing $\alpha W + \beta D$, where $W$ is the I/O wirelength of the placement and $D$ is the total voltage drop in the power network. The experimental results on the MCNC benchmarks show that our method achieves better timing performance and, on average, over 30% design cost reduction when compared with the conventional design rule of thumb popularly used by circuit designers.

4-6

Title	An Integrated Approach of Variable Ordering and Logic Mapping into LUT-Array-Based PLD
Author(s)	*Tomonori Izumi, Sinichi Kouyama (Kyoto University), Hideyuki Ito (NTT Corporation), Yukihiro Nakamura (Kyoto University)
Page(s)	pp. 414 - 421
Keywords	Programmable Logic Device, Logic Mapping, Variable Ordering
Abstract	For variable ordering and logic mapping into LUT-Array-Based PLD, we propose a directed acyclic graph which expresses the mapping of the sum of cascaded LUTs and the input variable order based on the multi-valued decision diagram, and an integrated approach for variable reordering and SGCT generation.

4-7

Title	Application of LUT Cascades to Numerical Function Generators
Author(s)	Tsutomu Sasao (Kyushu Institute of Technology), *Jon T. Butler (Naval Postgraduate School), Marc D. Riedel (California Institute of Technology)
Page(s)	pp. 422 - 429
Keywords	LUT cascade, numeric circuit, piecewise linear approximation
Abstract	The availability of large, inexpensive memory has made it possible to realize numerical functions, such as the reciprocal, square root, and trigonometric functions, using a look-up table. This is much faster than computing them in software. However, a naive look-up method requires unreasonably large memory. In this paper, we show the use of a look-up table (LUT) cascade to realize a piecewise linear approximation to the given function. Our approach yields memory of reasonable size and is significantly more accurate than any existing method.
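
The idea of a table-driven piecewise linear approximation can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's LUT cascade: it uses uniform segments (an assumption made here for simplicity), and the names `build_table` and `approx` are hypothetical.

```python
def build_table(f, lo, hi, segments):
    """Store one (slope, intercept) pair per uniform segment of [lo, hi].

    Assumption: uniform segmentation. Each segment is the straight line
    through the function values at its two endpoints.
    """
    width = (hi - lo) / segments
    table = []
    for i in range(segments):
        x0 = lo + i * width
        x1 = x0 + width
        slope = (f(x1) - f(x0)) / width
        table.append((slope, f(x0) - slope * x0))
    return table


def approx(table, lo, hi, x):
    """Look up the segment containing x and evaluate its line."""
    segments = len(table)
    index = min(int((x - lo) / (hi - lo) * segments), segments - 1)
    slope, intercept = table[index]
    return slope * x + intercept


if __name__ == "__main__":
    reciprocal = lambda x: 1.0 / (1.0 + x)
    table = build_table(reciprocal, 0.0, 1.0, 64)
    print(approx(table, 0.0, 1.0, 0.3), reciprocal(0.3))
```

Only 64 coefficient pairs are stored here, whereas a naive table would need one entry per input value; the price is one multiplication and one addition per evaluation.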

4-8

Title	A Design Algorithm for Sequential Circuits using LUT Rings
Author(s)	*Hiroki Nakahara, Tsutomu Sasao, Munehiro Matsuura (Department of Computer Science and Electronics Kyushu Institute of Technology)
Page(s)	pp. 430 - 437
Keywords	Programmable logic device, Reconfigurable logic, LUT cascade, Logic synthesis, BDD-for-CF
Abstract	This paper shows a design method for a sequential circuit by using a Look-Up Table (LUT) ring. An LUT ring consists of memories, a programmable interconnection network, a feed-back register, an output register, and a control circuit. It sequentially emulates an LUT cascade that represents the state transition functions and the output functions. We present two algorithms for synthesizing a sequential circuit by an LUT ring: the first one partitions the outputs into groups, and realizes them by LUT cascades. The second one reduces the evaluation time by using unused memories. We also compare the LUT ring with other methods to realize sequential circuits.

4-9

Title	Scannable Flip-Flops for Test of 3D Asynchronous Finite State Machine
Author(s)	*Soo-Hyun Kim (Gwangju Institute of Science and Technology), Ho-Yong Choi (Chungbuk National University), Kiseon Kim (Gwangju Institute of Science and Technology)
Page(s)	pp. 438 - 442
Keywords	Design-for-Testability, scannable flip-flop, 3D AFSM
Abstract	In this paper, we propose a testing method for 3D asynchronous finite state machine (AFSM) using new scannable flip-flops to ease the controllability and the observability. The scannable flip-flops are designed to have low area overhead and no additional delay in the critical data path. We tested a 3D AFSM through three phases and obtained 100% fault coverage.

4-10

Title	Cost Functions for the Design of Dynamically Reconfigurable Processor Architectures
Author(s)	*Tobias Oppold, Thomas Schweizer, Tommy Kuhn, Wolfgang Rosenstiel (University of Tuebingen)
Page(s)	pp. 443 - 450
Keywords	reconfigurable, cost functions
Abstract	There are a growing number of reconfigurable architectures that combine the advantages of a hardwired implementation (performance, power consumption) with the advantages of a software solution (flexibility, time to market). Today, there are devices on the market that can be dynamically reconfigured at run-time within one clock cycle. But the benefits of these architectures can only be utilized if applications can be mapped efficiently. In this paper we describe a design approach for reconfigurable architectures that takes into account three aspects: architecture, compiler, and applications. To realize the proposed design flow we developed a synthesizable architecture model. From this model we obtain estimations for speed, area, and power that are used to provide the compiler with the necessary timing information and to optimize the architecture.

4-11

Title	Dynamic Reconfigurable RF Circuit Design
Author(s)	*Kenichi Okada, Yoshiaki Yoshihara, Hirotaka Sugawara, Kazuya Masu (Tokyo Institute of Technology)
Page(s)	pp. 451 - 457
Keywords	Reconfigurable, RF, VCO, process fluctuation
Abstract	This paper proposes a dynamic reconfigurable architecture for analog RF circuits. The architecture consists of RF circuits and a control circuit. The RF circuits can be reconfigured by bias voltages of transistors and variable passive components, and they can also be switched at the block level. The proposed architecture can realize the multi-band/mode RF circuit on a single chip for the Software Defined Radio, which achieves considerable reduction of circuit area and power consumption. In addition, dynamic reconfiguration yields RF circuits that are robust against process fluctuation, dynamic changes of temperature, etc.

4-12

Title	Asynchronous Dynamically Reconfigurable Logic LSIs Suitable For Technology Scaling
Author(s)	*Hideyuki Ito, Ryusuke Konishi, Hiroshi Nakada, Yuichi Okuyama, Akira Nagoya (NTT Corporation), Tomonori Izumi, Yukihiro Nakamura (Kyoto University)
Page(s)	pp. 458 - 465
Keywords	Reconfigurable Hardware, Asynchronous Circuit, Technology Scaling, Dynamic Reconfiguration
Abstract	Results from the development of a couple of dynamically reconfigurable logic LSIs based on the asynchronous circuit design are described. The LSIs, PCA-1 and PCA-2, are developed to realize the concept of Plastic Cell Architecture (PCA), which is oriented toward general purpose computing based on autonomous reconfigurability of hardware. In addition to a structural feature of PCA as the homogeneous cell array, we introduced the asynchronous circuit design as an efficient realization of the original PCA concept. Consequently, the developed LSIs acquired an ultimate local connectivity. Comparison between circuit performances of PCA-1 and PCA-2 shows suitability for miniaturization.

4-13

Title	An Optimization Method in Floating-point to Fixed-point Conversion using Positive and Negative Error Analysis and Sharing of Operations
Author(s)	*Nobuhiro Doi (Waseda University), Takashi Horiyama (Kyoto University), Masaki Nakanishi (Nara Institute of Science and Technology), Shinji Kimura (Waseda University)
Page(s)	pp. 466 - 471
Keywords	HDL, compiler, high-level synthesis, fixed-point optimization, range estimation
Abstract	In application-specific hardware design, designers usually use fixed-point operations because of area and speed. If the reference algorithms include floating-point operations, designers must do the tedious job of converting floating-point operations to fixed-point operations. The paper shows an automatic conversion method using a non-linear programming technique. For the conversion, we use positive and negative error analysis on the control data flow graph. We also consider the sharing of registers and operation units in the conversion. The effect of the proposed method is shown on several examples.
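
To give a feel for what tracking positive and negative errors separately can mean, the sketch below quantizes values by truncation and propagates a signed error interval through an addition. It is an illustrative assumption only: the function names are hypothetical, truncation is chosen here as the quantization mode, and none of this is taken from the paper's non-linear programming formulation.

```python
import math


def truncate(x, frac_bits):
    """Quantize x to frac_bits fractional bits by truncation (floor)."""
    scale = 1 << frac_bits
    return math.floor(x * scale) / scale


def truncation_error_bounds(frac_bits):
    """Signed error interval (lowest, highest) of one truncation.

    Truncation never increases a value, so the positive bound is zero
    and the negative bound is one unit in the last place.
    """
    return (-(2.0 ** -frac_bits), 0.0)


def add_bounds(a, b):
    """Error interval of a sum: the operand intervals add component-wise."""
    return (a[0] + b[0], a[1] + b[1])


if __name__ == "__main__":
    bits = 4
    total = truncate(0.7, bits) + truncate(0.2, bits)
    bounds = add_bounds(truncation_error_bounds(bits),
                        truncation_error_bounds(bits))
    print(total, total - (0.7 + 0.2), bounds)
```

Keeping the two bounds separate gives a tighter interval than a symmetric worst-case magnitude, which is one reason a signed analysis can justify fewer fractional bits.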

4-14

Title	Reusing Cache for Real-Time Memory Address Trace Compression
Author(s)	Ing-Jer Huang, *Chung-Fu Kao (Dept. of Computer Science and Engin., National Sun Yat-Sen University)
Page(s)	pp. 472 - 476
Keywords	trace, cache, filter, microprocessor
Abstract	Instruction traces can help designers debug the system architecture and understand program behavior. However, one of the major problems of tracing is the high cost of storing the traces. How to reduce the trace information or compress the trace volume is an important issue when debugging a system. The cache is one of the basic components in modern system-on-chip (SoC) design. In this paper, we present a technique that reuses the system cache for memory address trace compression during system debugging.

4-15

Title	Dynamic Voltage and Frequency Scaling Techniques for Heterogeneous Multi-Processor Architecture in Future Nanometer Technologies
Author(s)	*Yutetsu Takatsukasa, Kazutoshi Kobayashi, Hidetoshi Onodera (Department of Communications and Computer Engineering, Kyoto University)
Page(s)	pp. 477 - 482
Keywords	DVS, multi-processor, leakage, heterogeneous
Abstract	In this paper, we evaluate power consumption of a heterogeneous multi-processor architecture in a dynamic voltage and frequency scaling (DVFS) environment. The heterogeneous architecture has a capability to reduce total power including active and leakage power in the future leaky nanometer process, since it can frequently turn off unused processors. The energy consumption of a processor depends on its load and the amount of leakage current. In the nanometer era in which the leakage current is inevitable, a heterogeneous multi-processor consumes less energy compared to a homogeneous multiprocessor or a single processor. On a multi-processor architecture, the load imbalance among processors reduces efficiency of DVFS. However, a thread relocation can relieve the load imbalance. A simple thread relocation reduces energy consumption by 20%; hence, the heterogeneous multi-processor system can cope with a wide load range.

4-16

Title	A Sub-Operation Parallelism Optimization Algorithm in HW/SW Partitioning for SIMD Processor Cores
Author(s)	*Hideki Kawazu, Jumpei Uchida, Yuichiro Miyaoka (Dept. of Computer Science, Waseda University), Nozomu Togawa (Dept. of Information and Media Science, The University of Kitakyushu), Masao Yanagisawa, Tatsuo Ohtsuki (Dept. of Computer Science, Waseda University)
Page(s)	pp. 483 - 490
Keywords	processor synthesis, SIMD type instruction, hardware/software partitioning, hardware/software cosynthesis, sub-operation parallelism
Abstract	A b-bit SIMD functional unit has n k-bit sub-functional units in itself, where b = k * n. It can execute n-parallel k-bit operations. However, not all the b-bit functional units in a processor core necessarily execute n-parallel operations. Depending on an application program, some of them just execute n/2-parallel operations or even n/4-parallel operations. This means that we can modify a b-bit SIMD functional unit so that it has n/2 k-bit sub-functional units or n/4 k-bit sub-functional units. The number of k-bit sub-functional units in a SIMD functional unit is called sub-operation parallelism. We incorporate a sub-operation parallelism optimization algorithm into SIMD functional unit optimization. Our proposed algorithm gradually reduces sub-operation parallelism of a SIMD functional unit while the timing constraint on execution time is satisfied. Thereby, we can finally find a processor core with small area under the given timing constraint. We expect that we can obtain processor core configurations of smaller area under the same timing constraint than with a conventional system. The promising experimental results are also shown.
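
The gradual reduction described above can be sketched as a simple loop. This is a toy model under loudly stated assumptions, not the authors' algorithm: the cycle estimate `ceil(k_bit_ops / parallelism)` and the names `reduce_parallelism`, `k_bit_ops`, and `max_cycles` are all hypothetical.

```python
import math


def reduce_parallelism(n, k_bit_ops, max_cycles):
    """Halve sub-operation parallelism while the timing constraint holds.

    Assumed cost model (hypothetical): a unit with p sub-functional units
    needs ceil(k_bit_ops / p) cycles to execute k_bit_ops operations.
    """
    def cycles(p):
        return math.ceil(k_bit_ops / p)

    if cycles(n) > max_cycles:
        raise ValueError("timing constraint not met even at full parallelism")
    # Step n -> n/2 -> n/4 ... as long as the reduced unit is still fast enough.
    while n > 1 and cycles(n // 2) <= max_cycles:
        n //= 2
    return n


if __name__ == "__main__":
    # 8 sub-units, 100 k-bit operations, at most 30 cycles allowed.
    print(reduce_parallelism(8, 100, 30))
```

Fewer sub-functional units mean a smaller functional unit, so the loop stops at the smallest configuration that this toy timing model still accepts.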

4-17

Title	Highly Efficient Switch Architecture Based on Banked Memory with Multiple Ports
Author(s)	*Takayuki Fujii (Hiroshima University), Kazuhiko Kobayashi (Hiroshima City University), Tetsushi Koide, Hans Juergen Mattausch (Hiroshima University), Tetsuo Hironaka (Hiroshima City University)
Page(s)	pp. 491 - 498
Keywords	network switch, multi-port memory, bank structure
Abstract	The rapid increase in Internet traffic requires a drastic performance improvement of the network switches which serve as the connection nodes of the network. However, the required improvement can hardly be expected with the existing switch structures. In this paper, we propose a network switch which uses a bank-structured multi-port memory as a switch-fabric solution for high switching performance. Furthermore, an efficient packet-scheduling algorithm is developed for maximized throughput. The simulation results verify a 120% increase in throughput when compared with a conventional crossbar structure with equal memory storage capacity.

Panel Session
Session Type: Lecture
Time: Tuesday October 19, 2004, 16:00 - 17:30
Location: Kaga

Title	Life at the End of CMOS Scaling
Author(s)	Organizer & Moderator: Rob Rutenbar (Carnegie Mellon University), Panelists: Sani Nassif (IBM Austin Research Labs), Jan Rabaey (University of California, Berkeley), H.-S. Philip Wong (Stanford University), Kazuo Yano (Hitachi Ltd.)
Page(s)	p. 501
Abstract	It is obvious that CMOS will continue to scale aggressively for at least one or two more decades. Devices have been demonstrated working at roughly 10nm. However, there is a large gap between "working devices and design methodology". Manufacturing variations continue to increase at all length scales - from neighborhoods with a few devices to entire chips and wafers. The devices themselves continue to behave less well, e.g., leakage is an enormous problem now and continues to worsen. How will we cope with these problems as we continue toward the end of the CMOS semiconductor roadmap? And what happens after CMOS? Is there a most promising alternative emerging from the set of competing post-CMOS options? The panelists will discuss both problems and solutions, ranging from devices to circuits to systems.

Closing
Session Type: Lecture
Time: Tuesday October 19, 2004, 17:30 - 17:45
Location: Kaga
