SASIMI 2025
The 26th Workshop on Synthesis And System Integration of Mixed Information Technologies
Technical Program

Remark: The presenter of each paper is marked with "*".

Session Schedule

Thursday, October 9, 2025

Registration
8:30 -

Opening
9:00 - 9:20

K1 Keynote Speech I
9:20 - 10:30

R1 Regular Poster Session I
10:30 - 12:00

Lunch Break (with exhibitors' presentation)
12:00 - 13:30

I1 Invited Talk I
13:30 - 14:30

R2 Regular Poster Session II
14:30 - 16:00

D Panel Discussion
16:00 - 17:30

Banquet
18:00 - 20:00

Friday, October 10, 2025

Registration
9:00 -

K2 Keynote Speech II
9:20 - 10:30

R3 Regular Poster Session III
10:30 - 12:00

Lunch Break
12:00 - 13:30

I2 Invited Talk II
13:30 - 14:30

R4 Regular Poster Session IV
14:30 - 16:00

Closing
16:00 -

List of papers

Thursday, October 9, 2025

Keynote Speech I
Time: 9:20 - 10:30, Thursday, October 9, 2025
Chair: Shin-ichi Minato (Kyoto University, Japan)

K1-1 (Time: 9:20 - 10:30)

Title	(Keynote Speech) Verification Tools Should Certify Their Results
Author	Randal E. Bryant (Carnegie Mellon University, USA)
Page	p. 1
Abstract	Verification tools check that a system meets a specified set of properties. For example, a verifier may check whether two digital circuits have identical functionality or that a software device driver adheres to a set of interface requirements. Such tools have become critical components in the design and validation of both hardware and software. One shortcoming of these tools is that bugs in their algorithms or their implementations may cause them to produce incorrect results. They may even state that the verification conditions are satisfied, even though the system being verified contains some critical flaw. Indeed, internal bugs are inevitable, given the complexity of these tools and the extensive engineering to maximize their performance. This is particularly vexing for formal verification tools, which claim to provide rigorous results. The reliability of a verification tool can be greatly improved by requiring it to be self-certifying, generating a certificate demonstrating that the provided answer is correct. This has become a common capability for Boolean satisfiability (SAT) solvers. When they determine that the formula is satisfiable, they generate a satisfying solution. For an unsatisfiable formula, they generate a checkable proof that the formula is indeed unsatisfiable. This proof can be checked by a simple program that itself has been formally verified. Such a capability means that the answers provided by the tool can be fully trusted, even if the tool itself is buggy. This talk describes recent work on certifying the results from a variety of tools, including SAT solvers, SMT solvers, knowledge compilers, and model checkers. These capabilities not only provide valuable assurance to end users but also help developers create more reliable tools.
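
The abstract notes that a satisfying assignment is itself a certificate that a simple program can check. The sketch below illustrates only that satisfiable case, assuming clauses are given in DIMACS-style integer-literal form (variable v written as v, its negation as -v). The function name `check_model` is hypothetical, and this is not one of the checkers described in the talk; checking unsatisfiability proofs (for example in the DRAT format) requires considerably more machinery and is not shown.

```python
from typing import Iterable, Sequence, Set

def check_model(clauses: Iterable[Sequence[int]], model: Iterable[int]) -> bool:
    """Return True if every clause contains at least one literal made true by `model`.

    `model` lists the literals that are true under a complete assignment,
    e.g. [1, -2, 3] means x1 = True, x2 = False, x3 = True.
    """
    true_lits: Set[int] = set(model)
    # Reject inconsistent assignments that set a variable both true and false.
    if any(-lit in true_lits for lit in true_lits):
        return False
    # Each clause is a disjunction, so one true literal satisfies it.
    return all(any(lit in true_lits for lit in clause) for clause in clauses)

# (x1 OR NOT x2) AND (x2 OR x3) AND (NOT x1 OR NOT x3)
cnf = [[1, -2], [2, 3], [-1, -3]]
print(check_model(cnf, [1, 2, -3]))  # True
print(check_model(cnf, [1, -2, 3]))  # False: third clause is violated
```

The checker is deliberately much simpler than a solver: it never searches, it only evaluates each clause once, which is what makes it easy to trust.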

Regular Poster Session I
Time: 10:30 - 12:00, Thursday, October 9, 2025
Chairs: Hajime Takayama (Kyoto Institute of Technology, Japan), Ankur Gupta (Netaji Subhas University of Technology, India)

Best Paper Award
R1-1

Title	Modeling of Dynamic Input Capacitance in Trench-Gate SiC MOSFETs via Voltage-Dependent Gate Oxide Capacitance Partitioning
Author	*Taiki Nishioka, Kazuki Matsumoto, Hajime Takayama (Kyoto Institute of Technology, Japan), Jun Furuta (Okayama Prefectural University, Japan), Kazutoshi Kobayashi, Michihiro Shintani (Kyoto Institute of Technology, Japan)
Page	pp. 2 - 7
Keyword	SiC, Trench, Gate Capacitance, Model, MOSFET
Abstract	This paper presents a dynamic input capacitance model for trench-gate silicon carbide (SiC) MOSFETs that accounts for voltage-dependent partitioning between the gate-to-source (Cgs) and gate-to-drain (Cgd) capacitances. Unlike conventional models, which assume static boundaries derived from planar device assumptions, the proposed model introduces a dynamically varying boundary governed by instantaneous changes in gate-source voltage Vgs and gate-drain voltage Vgd. The model is developed based on transient electric field behavior extracted from technology computer-aided design (TCAD) simulations, which reveal time-dependent shifts in the internal gate oxide field boundary during switching transitions. Experimental validation is conducted using gate charge measurements obtained from a double pulse test (DPT), with model fitting performed across a two-dimensional voltage space. Compared to the conventional static-boundary model, the proposed model achieves improved accuracy, reducing the relative error against the measurements. Furthermore, it successfully captures the voltage dependence of Cgs, thereby offering enhanced physical fidelity for circuit simulation and the design of trench-gate SiC MOSFETs.

R1-2

Title	A Detailed Analysis of LLM Execution on IMAX3 and Initial Evaluation of IMAX4 Prototype for Server Environment
Author	*Takuto Ando, Yu Eto, Yasuhiko Nakashima (Nara Institute of Science and Technology, Japan)
Page	pp. 8 - 13
Keyword	LLM, IMAX, CGRA, FPGA
Abstract	In this paper, we present a detailed analysis of the IMAX3 CGRA-based accelerator for energy efficiency and the IMAX4 prototype with an upgraded host CPU. To address the host CPU bottlenecks of its predecessor, IMAX3, IMAX4 incorporates a server-oriented Intel Xeon processor and PCIe Gen5 connectivity, improving the scalability of IMAX. We implemented and evaluated the IMAX4 prototype using microbenchmarks and the LLaMA3 8B quantized model. The results demonstrate significantly reduced host-side overheads and improved data transfer compared to IMAX3. While performance characteristics vary by quantization, IMAX4 shows promise, shifting bottlenecks from the host to data pathways and core performance, indicating potential for CGRA in server-based LLM acceleration.

R1-3

Title	GStreamer-integrated HLS-based JPEG Encoder for Edge FPGA SoCs
Author	*Yuri Guimaraes Pereira Primo da Silva, Shinya Honda (Nagoya University, Japan), Sugako Otani (Renesas Electronics Corporation, Japan), Masato Edahiro (Nagoya University, Japan), Abraham Monrroy Cano (MapIV, Japan)
Page	pp. 14 - 19
Keyword	Hardware Acceleration, JPEG Encoding, Vitis HLS, GStreamer, Kria KV260
Abstract	With the rise in ADAS (Advanced Driver Assistance System) and autonomous driving, there is a demand for lightweight image compression on edge devices. Although many image compression FPGA accelerators have been designed to provide high performance and low power consumption for data centers, they often do not meet the resource constraints of edge devices. This paper presents a hardware-accelerated JPEG encoder implemented using Vitis HLS on an AMD KV260 board for edge processing. We integrate the encoder with GStreamer, a multimedia framework that allows for easy integration with existing Linux tools. The system captures and encodes 1080p NV12 images from a MIPI camera at 19.5 fps and shows low variation compared to CPU-based encoding, even under heavy load, making it suitable for low-latency applications.

R1-4

Title	AutoPre-ACM: Autoencoder Based Precision-Enhanced Anomaly Detection For Cyber-Physical Attacks in MEDA Biochips
Author	*Purrnima Singh, Yash Gupta (Netaji Subhas University of Technology, New Delhi, India), Syed Rameem Zahra (Sher-e-Kashmir University of Agricultural Sciences and Technology, Kashmir, J&K, India), Ankur Gupta (Netaji Subhas University of Technology, New Delhi, India), Shigeru Yamashita (Ritsumeikan University, Japan)
Page	pp. 20 - 25
Keyword	Deep Learning, Lab-on-a-chip Security, Actuation Tampering Attacks, Cyber Physical System, MEDA biochip
Abstract	This paper presents an autoencoder-based anomaly detection framework that addresses critical cyber-physical security concerns in MEDA biochips. While actuation tampering attacks pose significant threats, existing detection methods often struggle with operational noise, resulting in false alarms. Our approach leverages a 1D CNN autoencoder to learn complex normal patterns, enabling robust differentiation between benign system variations and malicious attacks. The approach demonstrates superior precision and accuracy, significantly enhancing the reliability and trustworthiness of digital microfluidic biochip operations.

R1-5

Title	Binary Synthesis from ARM Machine Code Using a General-Purpose High-Level Synthesis System
Author	*Yuga Sugimoto, Nagisa Ishiura (Kwansei Gakuin University, Japan)
Page	pp. 26 - 31
Keyword	Binary Synthesis, High-Level Synthesis, ARMv6, Thumb
Abstract	This paper presents the implementation of a binary synthesis system that utilizes a general-purpose high-level synthesis (HLS) system as a backend, starting from ARM machine code. Binary synthesis is a technique that generates a hardware design description functionally equivalent to the CPU executing the original machine code program. However, implementing such a system for each instruction set architecture incurs a high development cost. Nakado has proposed a method for easily implementing binary synthesis systems by converting machine code programs into C programs, which are then synthesized into hardware using a general-purpose HLS system. Based on this approach, we implement a binary synthesis system that generates hardware design descriptions from machine code programs using the ARMv6 instruction set. ARMv6 includes features such as program counter-relative addressing for immediate load instructions and conditional execution based on flags. Additionally, division and modulo operations are typically handled by runtime libraries, which increase circuit size and execution time; to address this, we convert runtime library calls directly into division/modulo instructions. The proposed binary synthesis system is implemented in Python. Experimental results using machine code programs obtained from several C programs show that, compared to hardware generated directly from the original C programs, the circuit size increased by a factor of approximately 1.3 to 5.5, while the delay remained nearly the same.

R1-6

Title	Quantification of Design Difficulty of Analog Circuits Based on Volume of Effectual Design Space
Author	Riku Anan, *Akira Tsuchiya, Toshiyuki Inoue, Keiji Kishine (The University of Shiga Prefecture, Japan)
Page	pp. 32 - 37
Keyword	analog circuit, design automation
Abstract	This paper proposes a method to quantify the design difficulty of analog circuits. Analog circuits have many design parameters and performance measures, so it is difficult both to set an adequate target specification and to design the circuit. However, there has been little discussion of how to evaluate design difficulty quantitatively. We focus on the volume of the design space in which the performance meets the target specification. The proposed method calculates this volume by using linear classifiers and slack variables. Numerical experiments show that the effective volume of the design space can be used as a measure of design difficulty.

R1-7

Title	Flow-based Augmented Droplet Routing Algorithm for MEDA-Based DMFB
Author	*Emuun Purevdagva, Masayuki Shimoda, Satoshi Tayu, Atsushi Takahashi (Institute of Science Tokyo, Japan)
Page	pp. 38 - 43
Keyword	MEDA biochip, Droplet routing, Flow network problem
Abstract	The Micro Electrode Dot Array-based Digital Microfluidic Biochip (MEDA) is a promising platform for biological and medical applications, offering high flexibility in droplet-based operations such as disease diagnosis, DNA analysis, and PCR testing. To fully exploit its capabilities, fast and accurate droplet routing is essential. In this work, we formulate droplet transportation in MEDA as a flow network problem and propose a high-speed algorithm that efficiently finds a minimum-time routing with low computation cost. The algorithm first determines the shortest routing time using maximum flow search, and then refines the solution using minimum-cost flow optimization to reduce redundant movements. The proposed algorithm efficiently handles large-scale problems, contributing to improved experimental efficiency of MEDA biochips.

R1-8

Title	Enhancing Hardware Trojan Detection via ATPG-Based Transition Delay Fault Testing with Split Fault Lists
Author	*Asuka Koike, Yutaka Masuda, Tohru Ishihara (Graduate School of Informatics, Nagoya University, Japan)
Page	pp. 44 - 49
Keyword	hardware trojan, automatic test pattern generation, transition delay fault
Abstract	Recent advances in LSI design have shifted development from vertically integrated models to horizontally specialized models involving multiple companies. Although cost-effective, this approach increases the risk of Hardware Trojan (HT) insertion by untrusted third parties. HTs are malicious circuit modifications that trigger under rare conditions, causing abnormal behavior or information leakage. This work presents an HT detection method based on post-silicon power analysis. The method tackles two key challenges: (1) the difficulty of activating HTs, and (2) the relatively small power consumption of HT circuits, which makes it difficult to distinguish their effects from power variations due to process variation. To address the first challenge, the proposed method employs automatic test pattern generation (ATPG) to comprehensively activate transition delay faults (TDFs) in a clean netlist, thereby increasing the likelihood of HT activation. To tackle the second challenge, our method partitions the TDF list to reduce the simultaneous activation of non-HT logic gates and better highlight the power dissipation associated with the target faults. Experimental evaluations using the Trust-Hub AES-T2500 benchmark demonstrate that the proposed method can effectively reveal relative power differences caused by HTs.

R1-9

Title	Voltage and Frequency Dependence of Single Event Transient Induced by Alpha-Particle
Author	*Arata Matsumoto, Haruto Sugisaki, Ryuichi Nakajima (Kyoto Institute of Technology, Japan), Jun Furuta (Okayama Prefectural University, Japan), Kazutoshi Kobayashi (Kyoto Institute of Technology, Japan)
Page	pp. 50 - 51
Keyword	Soft error, Single Event Transients, Single Event Upsets, Frequency Dependence, Dynamic Irradiation
Abstract	A soft error on a semiconductor chip is a temporary malfunction in a latch or flip-flop (FF). Soft errors caused by Single Event Transients (SETs) become more significant at high clock frequencies in the GHz range. This study investigated the frequency dependence of SETs using a test circuit that eliminates the influence of Single Event Upsets (SEUs). Soft error rates (SERs) were measured by irradiating the circuit with alpha particles while the clock was running. A linear increase in SERs was observed below 700 MHz.

R1-10

Title	Evaluating Signal Integrity in InFO Package via Learning-based Methods
Author	*Yi-Hua Yeh, Haoru Chang, Hung-Ming Chen, Chien-Nan Liu (NYCU, Taiwan)
Page	pp. 52 - 57
Keyword	SI, InFO package, Learning-based methods
Abstract	This work further investigates signal integrity issues in the InFO package, focusing on high-bandwidth memory. Ensuring signal integrity in layout design is crucial for increasingly complex modern circuit designs, particularly for high-speed signal transmission. This research proposes regression prediction models based on CatBoost, XGBoost, and LightGBM to predict signal quality indicators such as eye height and eye width, providing a methodology to assess signal integrity. Using a systematic method of multi-layer modeling and angle simulation, we quantitatively characterize and optimize signal integrity in the design. The framework demonstrates high accuracy in evaluating and optimizing the signal integrity of InFO packages through rapid evaluation.

R1-11

Title	A High Precision Heuristic for Motif Extraction Using Random Forests
Author	*Jigen Murata, Masato Inagi, Martin Lukac, Shin'ichi Wakabayashi, Shinobu Nagayama (Hiroshima City University, Japan)
Page	pp. 58 - 63
Keyword	Motif extraction problem, heuristic method, Gibbs sampling method, Random Forest, string classification
Abstract	Motif extraction is the problem of finding similar substrings, called motifs, in many DNA sequences. Gibbs sampling is the best-known heuristic for this problem, but it tends to fall into a local optimum because its greedy search limits the solution space it explores. This paper proposes a heuristic that uses Random Forests to classify substrings and searches a wide range of the solution space. Since the proposed heuristic significantly improves the precision of solutions, this paper also proposes a hybrid method that combines Gibbs sampling with the Random Forest-based method to reduce computation time while maintaining the precision of solutions.

R1-12

Title	Sizing Transformation Technique for Analog Design Migration Across Different Technologies
Author	Yao-Cheng Wu, King-Ho Wong, *Chien-Nan Jimmy Liu (Institute of Electronics, National Yang Ming Chiao Tung University, Taiwan), Chia-Tseng Chiang (Richtek Technology Corporation, Taiwan)
Page	pp. 64 - 65
Keyword	Technology Node Transfer, Sample Transformation, Initial Sizing, Analog Circuit Design Automation, Transistor Sizing
Abstract	When migrating to a new technology, analog circuits often require a complete redesign due to different circuit parameters, which is a time-consuming process. This work proposes a quick sizing transformation technique that obtains the corresponding sizing of each device in the new technology without simulation. Assuming that the node voltages and the operation region of each transistor remain the same after migration, the proposed approximation method derives the new circuit from a prior comparison of a unit-sized transistor across the two technologies. Although this approach cannot guarantee an optimized solution, it can bypass the global exploration in evolutionary optimization and significantly reduces the runtime, by up to 65% in the experimental results.

R1-13

Title	Equalizing QAM Waveform Distortion with Linear SVM Classifier and its Machine Learning Dataset Generation
Author	*Yiwei Liu, Yukina Haruta, Yutaka Masuda, Tohru Ishihara (Nagoya University, Japan)
Page	pp. 66 - 71
Keyword	QAM, Waveform distortion, Machine learning, SVM classifier, Dataset generation
Abstract	Multi-level modulation formats such as quadrature amplitude modulation (QAM) have been widely used in modern wideband communications. QAM waveforms are distorted by various influences during communication. To learn the distortion trends in the QAM waveform accurately, this paper first focuses on generating datasets for QAM communications. The paper then proposes a method for equalizing QAM waveform distortion with a support vector machine (SVM) classifier which identifies the received QAM code based on pre-learned distortion trends. Simulation results obtained for our SVM classifier designed as a dedicated digital circuit demonstrate that the proposed method reduces the computational cost by 80% compared to existing methods that achieve similar classification accuracy. We also confirmed that the classification accuracy was improved from 96.4% to 99.0% at 1.4 times the computational cost compared to one of the simplest existing SVM classifiers.

R1-14

Title	A Hardware Design Environment for ROS2 Node to FPGA-Integrated SoC
Author	Xingze Li (Nagoya University, Japan), *Ryota Yamamoto (National Institute of Technology, Tomakomai College, Japan), Shinya Honda (Nagoya University, Japan)
Page	pp. 72 - 77
Keyword	High Level Synthesis, ROS2, Co-design
Abstract	As robotic applications increasingly demand both high performance and energy efficiency, FPGA-integrated system-on-chip (SoC) platforms have emerged as a promising solution for offloading intensive workloads. Despite their potential, integrating hardware (HW) accelerators with ROS2-based applications remains a significant challenge, often requiring deep domain expertise. In this paper, we present CWB2ROS, a development framework that enables seamless integration between ROS2 nodes and HW modules synthesized via high-level synthesis (HLS) using Cyber Work Bench (CWB). CWB2ROS supports core ROS2 communication models, including Publish/Subscribe, Service and Parameters, and automates the generation of necessary software wrappers. Experimental results on the Kria KR260 platform demonstrate that CWB2ROS achieves competitive communication latency and offers greater flexibility compared to existing solutions.

Invited Talk I
Time: 13:30 - 14:30, Thursday, October 9, 2025
Chair: Shinichi Nishizawa (Hiroshima University, Japan)

I1-1

Title	(Invited Talk) Towards Emerging Device Computing for the Post-Moore Era
Author	Koji Inoue (Kyushu University, Japan)
Page	p. 78
Abstract	Moore’s Law, the doubling of the number of transistors on a chip every two years, has so far contributed to the evolution of computer systems. Unfortunately, we can no longer expect transistors to keep shrinking sustainably, which marks the beginning of the so-called post-Moore era. In such a generation, we need to realize a paradigm shift from a quantitative approach that increases the number of transistors to a qualitative approach based on creating and utilizing emerging (or novel) devices. The fundamental question is “how can we fully exploit the significant potential of such new devices?” To answer this question, we are conducting research in computer architecture, focusing on superconducting technology, silicon photonics technology, quantum technology, and nanowire transistor technology. This talk shares our state-of-the-art research activities and then discusses how to open a new door to next-generation ultra-low-power, high-performance computing by accelerating cross-layer interaction across materials, devices, circuits, architecture, software, and algorithms, up to applications.

Regular Poster Session II
Time: 14:30 - 16:00, Thursday, October 9, 2025
Chairs: Hiroshi Saito (The University of Aizu, Japan), Chih-Tsun Huang (National Tsing Hua University, Taiwan)

Outstanding Paper Award
R2-1

Title	Optimal Golomb Coding via Dynamic Programming and Its Application on Large Language Model Compression
Author	Jen-Hung Yang, Zhi-Kai Xu, *Juinn-Dar Huang (National Yang Ming Chiao Tung University, Taiwan)
Page	pp. 79 - 84
Keyword	Golomb coding, Large language models, model compression, machine learning, AI
Abstract	Large language models (LLMs) pose computational and storage challenges for resource-constrained devices. Lossless compression is preferred over lossy methods such as quantization. Huffman coding, though optimal, is hardware-inefficient. We propose optimal Golomb coding, a lossless scheme that enhances exp-Golomb’s hardware-friendly prefix structure with distribution-dependent suffix optimization. It improves compression by 12.8% over exp-Golomb, coming within 0.21% of the theoretical limit, and compresses the INT8 weights of LLaMA3-70B by 56.08%, enabling practical LLM deployment on edge devices.
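
For reference, the exp-Golomb baseline that this work builds on can be sketched as follows: a prefix of zeros encodes the length of the binary part, and an order-k code carries k extra low-order bits. This is the standard order-k exponential-Golomb code, not the authors' optimal Golomb coding, whose distribution-dependent suffix optimization is not reproduced here; the function names are hypothetical.

```python
from typing import List, Tuple

def exp_golomb_encode(n: int, k: int = 0) -> str:
    """Return the order-k exp-Golomb codeword of a non-negative integer n as a bit string."""
    if n < 0 or k < 0:
        raise ValueError("n and k must be non-negative")
    x = n + (1 << k)                     # offset so x >= 2**k, i.e. at least k+1 bits
    prefix_len = x.bit_length() - k - 1  # number of leading zeros in the codeword
    return "0" * prefix_len + format(x, "b")

def exp_golomb_decode(bits: str, k: int = 0) -> Tuple[int, str]:
    """Decode one codeword from the front of `bits`; return (value, remaining bits)."""
    zeros = len(bits) - len(bits.lstrip("0"))
    width = zeros + k + 1                # length of the binary part that follows
    if len(bits) < zeros + width:
        raise ValueError("truncated codeword")
    x = int(bits[zeros:zeros + width], 2)
    return x - (1 << k), bits[zeros + width:]

def decode_all(bits: str, k: int = 0) -> List[int]:
    """Decode a concatenation of codewords back into the list of values."""
    values = []
    while bits:
        value, bits = exp_golomb_decode(bits, k)
        values.append(value)
    return values

stream = "".join(exp_golomb_encode(v) for v in [0, 3, 1])
print(stream)              # 100100010
print(decode_all(stream))  # [0, 3, 1]
```

The fixed, data-independent prefix rule is what makes exp-Golomb cheap to decode in hardware; small values get short codewords, so it suits weight distributions concentrated near zero.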

R2-2

Title	On Ratio of Embedded RECON Spare Cell Types for Technology Remapping
Author	*Yasuaki Nabetani, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan)
Page	pp. 85 - 90
Keyword	ECO
Abstract	This paper presents an approach to determine the ratio of RECON spare cell types embedded in a circuit, so as to improve the feasibility of technology remapping with RECON cells for handling post-mask ECOs through metal fixes in the LSI design process. The ratio of 2T/4T/6T-RECON spare cell types is determined using the pre-calculated average number of RECON cells required for each possible modification pattern. These averages are weighted by the frequency of each pattern, based on the number of cell inputs in the netlist. Experimental results show that the proposed approach improves the ECO success ratio by 1.3 to 2.0 pt, at the cost of reducing the initial slack by 7 to 31%.

R2-3

Title	A Seamless Hardware/Software Switching Technique for Embedded Systems Using HDLRuby
Author	*Lovic Gauthier, Sachi Yoshigai (National Institute of Technology, Ariake College, Japan)
Page	pp. 91 - 96
Keyword	Hardware Description Language, Hardware/Software Co-Design, RTL Simulation, Co-Simulation
Abstract	This paper presents a new hardware/software (HW/SW) co-design technique for HDLRuby, a Register Transfer Level (RTL) Hardware Description Language. This technique enables automatic switching between HW and SW implementations for HDLRuby processes that describe finite state machines. It is designed to facilitate smooth HW/SW exploration and accelerated HW simulation. Experimental results show that, with this technique, simulation can be orders of magnitude faster than standard RTL simulation.

R2-4

Title	Speeding Up a Routing Method Considering Droplet Splitting on MEDA Biochips by Dijkstra’s Method
Author	*Issei Nakamura, Shigeru Yamashita, Hiroyuki Tomiyama (Ritsumeikan University, Japan), Ankur Gupta (Netaji Subhas University of Technology, India)
Page	pp. 97 - 102
Keyword	MEDA, Optimal Routing, Droplet Split, Dijkstra’s Algorithm
Abstract	Recently, a biochip called MEDA (Micro Electrode Dot Array) has been studied in the field of biochemistry. In experiments using MEDAs, droplets must be moved to a sink point, and the time required for this can be reduced by selecting an optimal path. This paper proposes a method that finds both the optimal path and the timing of droplet splitting by applying Dijkstra’s method to weighted graphs.
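
A minimal sketch of Dijkstra's method on a weighted directed graph, assuming an adjacency-list representation with non-negative edge weights. The abstract does not describe how droplet positions and splitting decisions are encoded as nodes and weights, so the graph in the example is a hypothetical toy, and `dijkstra` is an illustrative name rather than the authors' code.

```python
import heapq
from itertools import count
from typing import Dict, Hashable, List, Tuple

Graph = Dict[Hashable, List[Tuple[Hashable, float]]]

def dijkstra(graph: Graph, source: Hashable, target: Hashable) -> Tuple[float, List[Hashable]]:
    """Return (cost, path) of a minimum-weight path, or (inf, []) if target is unreachable.

    Edge weights must be non-negative.
    """
    dist = {source: 0.0}
    prev: Dict[Hashable, Hashable] = {}
    tie = count()                     # tie-breaker so nodes themselves are never compared
    heap = [(0.0, next(tie), source)]
    done = set()
    while heap:
        d, _, u = heapq.heappop(heap)
        if u in done:
            continue                  # stale queue entry for an already-settled node
        done.add(u)
        if u == target:
            break
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, next(tie), v))
    if target not in done:
        return float("inf"), []
    path = [target]
    while path[-1] != source:         # walk predecessor links back to the source
        path.append(prev[path[-1]])
    return dist[target], path[::-1]

# Hypothetical toy graph: nodes might stand for droplet states, weights for move times.
g = {"A": [("B", 1), ("C", 4)], "B": [("C", 1), ("D", 5)], "C": [("D", 1)], "D": []}
print(dijkstra(g, "A", "D"))  # (3.0, ['A', 'B', 'C', 'D'])
```

Stopping as soon as the target is settled is safe because, with non-negative weights, a settled node's distance can no longer decrease.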

R2-5

Title	EMESN: an Extended MOSFET Reservoir Computing Architecture for Echo State Networks with Hardware-Software Co-Optimization
Author	*Haoyuan Li (Xi'an Jiaotong University/Kyoto University, China), Masami Utsunomiya, Ryuto Seki, Takashi Sato (Kyoto University, Japan), Feng Liang (Xi'an Jiaotong University, China)
Page	pp. 103 - 108
Keyword	Reservoir Computing, Echo State Network, MOSFET-based Hardware, Genetic Algorithm Optimization, Time-Series Classification
Abstract	This paper presents EMESN, an extended MOSFET hardware reservoir-computing architecture for time-series tasks. A pulse-based crossbar exploits intrinsic threshold-voltage variation to realize fixed random weights, while multi-mask mapping, ADC-range tuning, and genetic optimization jointly enhance inference accuracy. Evaluations on eight public datasets demonstrate accuracy gains of up to 10.4 points and a fivefold reduction in the standard deviation of accuracy, all with markedly lower static power than previous MOS-ESN designs.

R2-6

Title	Development of Tsugaru Dialect Translation System Using Transparent Display
Author	*Haruto Saito, Masashi Imai (Hirosaki University, Japan)
Page	pp. 109 - 114
Keyword	Translation system, speech recognition, Artificial Intelligence, Tsugaru dialect, Docker
Abstract	The Tsugaru region of Aomori prefecture possesses a distinctive regional dialect known as Tsugaru-ben. The dialect can lead to significant communication issues between local residents and people from outside the region. This paper presents the prototype construction of a Tsugaru dialect translation system utilizing a transparent display, together with its system architecture. The system consists of three main components: an automatic speech recognition module, a translation module from the Tsugaru dialect to standard Japanese, and a user interface module. Evaluation results demonstrate a complete processing pipeline that handles end-to-end translation in an average of 1.16 seconds, constituting the first real-time Tsugaru dialect translation system.

R2-7

Title	Accelerated Behavioral Simulation for Optimizing MOSFET-Based Echo State Networks
Author	*Ryuto Seki, Masami Utsunomiya (Kyoto University, Japan), Haoyuan Li (Xi'an Jiaotong University, China), Hiromitsu Awano, Takashi Sato (Kyoto University, Japan)
Page	pp. 115 - 120
Keyword	Echo State Network, MOSFET, Behavioral Simulation
Abstract	The Leakage-based MOSFET Echo State Network (LMESN) is a hardware reservoir computing architecture that achieves low power consumption by leveraging the subthreshold leakage current of MOSFETs, making it a promising candidate for edge-oriented AI applications. In this paper, we propose a fast behavioral simulation method to accurately capture the transistor-level characteristics of LMESN. Using this method, we investigate how the inference accuracy of LMESN is influenced by two key factors: the bit width of the feedback signal and the device temperature.

R2-8

Title	A Design Method for Single-Rail LUT Cascades
Author	*Tsutomu Sasao (Meiji University, Japan)
Page	pp. 121 - 126
Keyword	FPGA, cascade, LUT, decomposition
Abstract	This paper presents a method to realize logic functions by single-rail LUT cascades. Main results include: 1) Any 2m-variable function can be realized by a single-rail cascade of (m+1)-LUTs, using at most 2^(m+1)-1 LUTs, whereas there exists a (2m+1)-variable function that cannot be realized by a single-rail cascade of (m+1)-LUTs. 2) When a 2m-variable function has a functional decomposition f(X1, X2), where X1 and X2 each have m variables, and the column multiplicity of the decomposition is µ, the number of LUTs can be reduced to 2µ-1. 3) Any n-variable function can be realized by a single-rail cascade of seven (n-1)-LUTs. 4) Ad-hoc methods are given to realize an n-variable function using two or three (n-1)-LUTs.
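
Result 2) depends on the column multiplicity µ of the decomposition f(X1, X2). As a sketch, assuming the usual decomposition chart in which columns are indexed by assignments to X1 and rows by assignments to X2, µ is the number of distinct column patterns. The helper `column_multiplicity` below is a hypothetical name; it enumerates the full truth table, which is practical only for small numbers of variables.

```python
from itertools import product
from typing import Callable, Tuple

Bits = Tuple[int, ...]

def column_multiplicity(f: Callable[[Bits, Bits], int], n1: int, n2: int) -> int:
    """Count distinct columns of the decomposition chart of f(X1, X2).

    Each column corresponds to one assignment of the n1 variables in X1 and
    lists the values of f over all 2**n2 assignments of X2 in a fixed order.
    """
    rows = list(product((0, 1), repeat=n2))
    columns = {
        tuple(f(a, b) for b in rows)
        for a in product((0, 1), repeat=n1)
    }
    return len(columns)

# f = (x1 XOR x2) AND (x3 OR x4), with X1 = (x1, x2) and X2 = (x3, x4)
f = lambda a, b: (a[0] ^ a[1]) & (b[0] | b[1])
print(column_multiplicity(f, 2, 2))  # 2
```

Here the chart has only two distinct columns (all zeros, and the pattern of x3 OR x4), so µ = 2; with m = 2, result 2) of the abstract would give 2µ-1 = 3 LUTs. Since X1 has m variables, µ can never exceed 2^m, which is consistent with the bound 2^(m+1)-1 in result 1).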

R2-9

Title	Implementation of Interrupt Handlers in Full Hardware Implementation of RTOS-Based Systems
Author	*Yuki Nakatani, Nagisa Ishiura (Kwansei Gakuin University, Japan), Hiroyuki Tomiyama, Hiroyuki Kanbara (Ritsumeikan University, Japan)
Page	pp. 127 - 132
Keyword	RTOS, Handler, Full Hardware
Abstract	This paper proposes a method for implementing interrupt handling in a fully hardware-implemented RTOS-based system. To enhance the responsiveness of real-time systems, previous research by Oosako et al. has explored the complete hardware implementation of RTOS functionalities and tasks. Ando has proposed an architecture that implements tasks in hardware using general-purpose high-level synthesis; however, this approach does not address interrupt handling. In this paper, we present a hardware implementation of alarm handlers, cyclic handlers, and interrupt handlers. Each handler is associated with a dedicated timer that counts down and triggers the handler when the counter reaches zero, thereby simplifying the control logic. The interrupt handler is designed to accelerate activation by evaluating invocation conditions in parallel. Service calls related to handlers are implemented by updating or referencing specific status registers. Experimental results demonstrate that, in the proposed RTOS hardware, alarm and interrupt handlers can be triggered within 1 and 2 cycles, respectively, after their conditions are satisfied. Furthermore, all service calls related to handlers can be executed within 5 cycles.
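
The per-handler countdown timer described above can be modeled cycle by cycle in software. The class below is a hypothetical behavioral sketch, not the authors' hardware: `start` is the initial counter value, a `period` of `None` models a one-shot alarm handler, and an integer `period` models a cyclic handler that reloads its counter after firing. Both names are assumptions made for illustration.

```python
from typing import List, Optional

class HandlerTimer:
    """Behavioral model of a per-handler down-counter (hypothetical sketch)."""

    def __init__(self, start: int, period: Optional[int] = None) -> None:
        if start < 1 or (period is not None and period < 1):
            raise ValueError("start and period must be at least 1")
        self.count = start
        self.period = period          # None: one-shot alarm; int: cyclic reload value
        self.active = True

    def tick(self) -> bool:
        """Advance one clock cycle; return True in the cycle the handler is triggered."""
        if not self.active:
            return False
        self.count -= 1
        if self.count == 0:
            if self.period is None:
                self.active = False       # alarm handlers fire once
            else:
                self.count = self.period  # cyclic handlers reload and keep counting
            return True
        return False

def firing_cycles(timer: HandlerTimer, cycles: int) -> List[int]:
    """Cycle numbers (1-based) in which the timer triggers its handler."""
    return [c for c in range(1, cycles + 1) if timer.tick()]

print(firing_cycles(HandlerTimer(start=3, period=3), 9))  # [3, 6, 9]
print(firing_cycles(HandlerTimer(start=2), 9))            # [2]
```

Because each timer only decrements and compares against zero, the trigger decision needs no shared scheduler logic, which is the simplification the abstract attributes to dedicated timers.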

R2-10

Title	An Error Diagnosis Technique Applicable to Single Line Errors Based on Location Variable Simulation
Author	*Kazuki Sakamoto, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan)
Page	pp. 133 - 138
Keyword	Error diagnosis, ECO
Abstract	This paper presents an error diagnosis technique applicable to single line errors based on the location variable (LV) simulation and the truth variable (TV) simulation. The LV-simulation, originally proposed for diagnosing functional errors, has been extended to diagnose signal line errors to efficiently diagnose both functional and single line errors. Experimental results have shown that the proposed technique reduces the processing time by 99.7% for the C7552 benchmark circuit and by 73.7% for the C5315 benchmark circuit compared to the conventional error diagnosis technique applicable to single line errors.

R2-11

Title	Multi-Objective Optimization of RESURF Structure in SiC MOSFET for I-V and C-V Characteristics
Author	*Sota Oyama (Hirosaki University, Japan), Ichirota Takazawa (Jedat Inc., Japan), Satoru Honda, Toshiki Kanamoto (Hirosaki University, Japan)
Page	pp. 139 - 144
Keyword	Power, SiC, MOS, Optimization, AI
Abstract	This paper proposes an optimization method for the vertical RESURF structure implemented in SiC trench MOSFETs, which are among the most energy-efficient devices used in automotive power modules. From the standpoint of energy efficiency, the RESURF structure is introduced to enable operation at higher voltages. The most critical parameter of the RESURF structure is the thickness of the P-type vertical RESURF region surrounding the N-drift layer near the drain. We leverage artificial intelligence to optimize this thickness for the desired electrical characteristics. Our previous work searched for the thickness that meets the desired current-voltage (I-V) characteristics by applying simulated annealing. In this paper, we further enhance the method to also optimize for capacitance-voltage (C-V) characteristics. First, we model the relationship between the output characteristics and the layer thicknesses using machine learning. Based on the resulting model, we then employ simulated annealing to identify the optimal thicknesses that satisfy the desired I-V as well as C-V characteristics. Experimental results demonstrate that the proposed method successfully attains the Pareto-optimal front with respect to both I-V and C-V characteristics, achieving an adjusted coefficient of determination greater than 0.95 in the regression analysis.

R2-12

Title	A Systematic Hardware Solution for GDPR Compliance
Author	Yi-Chun Yang, *Ren-Song Tsay (National Tsing Hua University, Taiwan)
Page	pp. 145 - 147
Keyword	Access control, accountability, data ownership, GDPR, hardware security
Abstract	The increasing deployment of Internet of Things (IoT) devices presents significant challenges for compliance with the General Data Protection Regulation (GDPR) due to their inherent privacy concerns and often opaque operational nature. This paper introduces GDPR-Guard, a novel and systematic hardware-based solution embedded within IoT devices to ensure GDPR compliance. By shifting control from enterprises to users through a transparent "glass box" approach, GDPR-Guard enhances accountability and transparency by auditing the entire device lifecycle from manufacturing. This paper details the architecture and functionality of the GDPR-Guard hardware component, its integration into the device manufacturing process under supervisory authority (SA) oversight, and its mechanisms for enforcing consent-based access control and generating tamper-proof audit records. A proof-of-concept implementation and security/performance evaluations demonstrate the feasibility and effectiveness of GDPR-Guard as a systematic hardware solution for achieving GDPR compliance in IoT networks.

R2-13

Title	Reducing Registers in Convolution Operation for Binarized Neural Networks with Register-Bridge LSI Architecture
Author	*Jun Masuda, Kazuhito Ito (Saitama University, Japan)
Page	pp. 148 - 153
Keyword	Machine learning, BNN, LSI
Abstract	Convolutional neural networks are widely used to implement machine learning tasks such as image recognition. Binarized neural networks (BNNs), which binarize data and convolution weights, are advantageous for reducing power consumption and implementation size. In this paper, a method using the register-bridge architecture is proposed to reduce the number of registers required to store data and weights in LSI implementations of BNNs. The number of register bits was reduced by 44% compared with the conventional architecture.

R2-14

Title	Extending the Single-Target Droplet Generation Method CoDOS to Multi-Target Synthesis
Author	Yusuke Igarashi, *Shigeru Yamashita (Ritsumeikan University, Japan)
Page	pp. 154 - 158
Keyword	CoDOS, Multi-Target
Abstract	This paper presents an enhanced method based on CoDOS, a droplet generation algorithm for digital microfluidic biochips (DMFBs), to efficiently generate multiple droplet types. By optimizing droplet generation, our approach reduces reagent usage and operational costs. Compared with manual synthesis and dilution, our method offers improved automation and cost-effectiveness. Experimental results demonstrate an improvement of approximately 1% in droplet efficiency.

[To Session Table]

Panel Discussion
Time: 16:00 - 17:30, Thursday, October 9, 2025
Moderator: Shin-ichi Minato (Kyoto University, Japan)

D-1 (Time: 16:00 - 17:30)

Title	(Panel Discussion) Classical Techniques Meet Emerging Technologies: Retrospectives and Future Prospects
Author	Moderator: Shin-ichi Minato (Kyoto University, Japan), Panelists: Randal E. Bryant (Carnegie Mellon University, USA), Hideki Nishizawa (NTT Network Innovation Laboratories, Japan), Jie-Hong Roland Jiang (National Taiwan University, Taiwan), Koji Inoue (Kyushu University, Japan), Chun-Yao Wang (National Tsing Hua University, Taiwan), Shigeru Yamashita (Ritsumeikan University, Japan), Organizer: Shin-ichi Minato (Kyoto University, Japan)
Page	p. 159
Abstract	Generative AI is currently taking the world by storm, reshaping how we approach computation and design. Yet in areas where generative models fall short, such as formal verification and structured problem solving, symbolic computation is expected to play an increasingly critical role. This renewed interest in classical methods reflects broader trends: analog computation, once replaced by digital techniques, is now back in the spotlight with the rise of quantum computing; circuit-switched communication, long overtaken by packet switching, is regaining relevance in optical routing. In this panel, we invite each panelist to talk about a piece of research from earlier in their career, something that, when revisited today, might offer fresh insights or unexpected relevance. Following brief presentations, we will have an open and informal discussion with the audience, exploring how foundational ideas can inform future directions and foster interdisciplinary innovation.

Friday, October 10, 2025

[To Session Table]

Keynote Speech II
Time: 9:20 - 10:30, Friday, October 10, 2025
Chair: Seiya Shibata (NEC, Japan)

K2-1 (Time: 9:20 - 10:30)

Title	(Keynote Speech) New Optical Path Design Trend on the IOWN Global Forum Open All-Photonics Network: Background, Application, and Key Enablers
Author	Hideki Nishizawa (NTT Network Innovation Laboratories, Japan)
Page	p. 160
Abstract	The evolution of network infrastructure is entering a transformative phase, driven by the demand for ultra-low latency, massive bandwidth, and sustainable scalability. The IOWN Global Forum’s Open All-Photonics Network (Open APN) introduces a disruptive architectural shift through end-to-end photonics-based transmission. This presentation explores emerging trends in optical path design within the Open APN framework, emphasizing the transition from traditional electrical routing to optical switching. This shift enables a new paradigm of intelligent, high-performance networking tailored for the AI era. We begin by providing a technical overview of the Open APN, focusing on the miniaturization and power efficiency of transponders enabled by digital coherent technology, as well as the standardization of their control interfaces. Next, we present practical use cases and the value proposition of advanced optical path design. Finally, we identify key technological enablers underpinning these innovations—including physical modeling of transceivers and line systems, digital longitudinal monitoring, and integration with digital twin-based control systems. This session aims to offer insights into how these design trends will shape the next generation of optical networks and the broader digital ecosystem.

[To Session Table]

Regular Poster Session III
Time: 10:30 - 12:00, Friday, October 10, 2025
Chairs: Hiroyuki Uzawa (NTT, Japan), Martin Lukac (Hiroshima City University, Japan)

Outstanding Paper Award
R3-1

Title	Cross-Design Power Trace Prediction using Graph Neural Network
Author	Shih-Chun Lin, *Bo-Hao Haung, Yung-Chih Chen (National Taiwan University of Science and Technology, Taiwan), Wang-Dauh Tseng (Yuan Ze University, Taiwan)
Page	pp. 161 - 166
Keyword	Power estimation, machine learning, graph neural network
Abstract	Accurate cycle-by-cycle power estimation plays a critical role in the early stages of chip design, facilitating power, performance, and area (PPA) optimization and accelerating time-to-market. Recently, machine learning (ML)-based methods have emerged to strike a balance between estimation speed and accuracy compared to traditional electronic design automation (EDA) tools. However, existing cycle-based ML approaches typically require model retraining for each new design, significantly limiting their general applicability and efficiency during early-stage design exploration. To overcome this challenge, this paper introduces a graph neural network (GNN)-based estimator specifically developed for gate-level cycle-by-cycle power prediction, aiming for cross-design generalizability. By encoding standard cell types derived directly from the design library into node embeddings, our proposed model effectively generalizes to unseen circuit designs without retraining. Experimental results demonstrate that our GNN-based estimator offers significantly faster cycle-by-cycle power estimation compared to commercial EDA tools, while maintaining a high level of accuracy. This enables more practical and efficient power analysis in modern VLSI design flows.

R3-2

Title	ILP-Based Movable Layout Replacer for Standard Cells with Extending Metal Boundaries
Author	Ya-Chu Yang, Shih-Sian Tang, *Chen-Chen Yeh, Shao-Chien Lu, Hui-Lin Cho, Yu-Cheng Lin, Rung-Bin Lin (Department of Computer Science and Engineering, Yuan Ze University, Taiwan)
Page	pp. 167 - 172
Keyword	Standard cell, library, integer linear programming
Abstract	We introduce a new standard cell architecture with extended metal boundaries, based on the ASAP7 PDK, to improve routability in complex designs. By integrating these cells with vertically shrunk counterparts and applying a layout replacer based on integer linear programming that allows cell movement, our method achieves average routing overflow reductions of 4.08% and 13.75% for fixed and movable layouts, respectively, without any DRC violations.

R3-3

Title	Compact QUBO Formulation of Resource-Constrained Operation Scheduling in High-Level LSI Design
Author	*Haruki Yamagishi, Takuto Kishimoto, Kazuhito Ito (Saitama University, Japan)
Page	pp. 173 - 178
Keyword	Ising model, QUBO, scheduling, LSI
Abstract	Resource-constrained operation scheduling in LSI design determines the start time of each operation so as to satisfy precedence constraints, and is known to be an NP-hard combinatorial optimization problem. By formulating the operation scheduling problem as a QUBO model, an optimal operation schedule can be searched for by solving the QUBO model in parallel. This paper proposes a QUBO formulation for operation scheduling that reduces the number of variables. As a result, the number of variables was reduced by up to 65%, and the solving time by up to 81%.

R3-4

Title	Efficient FPGA Implementation of Multiple-Input Adders Using Generalized Parallel Counter (6,0,7;5)
Author	Mugi Noda, *Ryo Kanai, Nagisa Ishiura (Kwansei Gakuin University, Japan)
Page	pp. 179 - 184
Keyword	GPC, FPGA, neural networks, multi-input adders
Abstract	This paper proposes an efficient method for implementing multi-input adders on FPGAs, which are essential components in multipliers and neural networks, by hierarchically connecting 6-input 2-output adders. One major approach for FPGA implementation of multi-input adders involves constructing a tree of carry-save adders using Generalized Parallel Counters (GPCs) optimized through integer linear programming. However, many GPCs with seven inputs at the lowest-order bit position occupy two FPGA slices (the basic building blocks of an FPGA), resulting in reduced efficiency. In this research, we utilize the GPC (6,0,7;5), which achieves the highest bit reduction rate and can be implemented in a single slice only when the carry outputs are chain-connected. By cascading this GPC, we construct a 6-input, 2-output adder. These adders are then arranged into a carry-save tree structure to perform multi-input addition. Based on this method, we designed circuits that add m binary numbers of n bits for n = 16, 32, 64 and m = 16, 32, ..., 512, targeting the Xilinx 7 Series FPGA. The results demonstrate that, on average, the proposed method reduced the circuit area by 8.9% and the critical path delay by 6.7%, and reduced the time required for circuit construction to less than 0.001% of that of conventional methods.

R3-5

Title	Genetic Algorithm-based Layer-wise Adaptive Filter Pruning
Author	Ting-Yi Liu, Yi-Ting Li, Wuqian Tang (National Tsing Hua University, Taiwan), *Yung-Chih Chen (National Taiwan University of Science and Technology, Taiwan), Shih-Chieh Chang, Chun-Yao Wang (National Tsing Hua University, Taiwan)
Page	pp. 185 - 190
Keyword	model pruning, genetic algorithm, filter pruning
Abstract	Model pruning reduces model size and inference cost by removing redundant weights, channels, or filters. However, existing filter pruning methods often rely on fixed heuristics and global criteria, limiting their adaptability. They also require manually set pruning ratios, which are inefficient to tune. To address these issues, we propose a genetic algorithm-based pruning approach that automatically determines layer-wise pruning ratios and criteria, enabling adaptive, data-driven compression without manual intervention.

R3-6

Title	Evaluation of Free-form Conversation Learning Effects in a Tsugaru Dialect Speech Recognition Model
Author	*Akihiro Murakami, Masashi Imai (Hirosaki University, Japan)
Page	pp. 191 - 192
Keyword	Tsugaru dialect, Speech recognition system, Free-form conversation, Artificial Intelligence
Abstract	The Tsugaru dialect, a regional vernacular of Aomori Prefecture, can pose communication challenges between local residents and people from outside the prefecture. We are conducting research aimed at developing a bidirectional speech and text conversion system between the dialect and standard Japanese using artificial intelligence. This paper presents the results of evaluating the impact of training on spontaneous Tsugaru dialect speech data to improve automatic speech recognition accuracy.

R3-7

Title	High-Speed SIFT Descriptor Generation with 36 Small-Region Division and Logic-Synthesis Evaluation
Author	*Ayumu Mitsumoto, Tetsuo Hironaka (Hiroshima City University, Japan)
Page	pp. 193 - 198
Keyword	SIFT, Acceleration, Image Processing, Architecture, Hardware
Abstract	SIFT’s rotation operation is sequential and therefore becomes a bottleneck in hardware implementations. We divide the descriptor region into 36 small regions, compute orientation histograms independently, and sum them into 17 subregion histograms, enabling 36-way parallel processing. This enables fast execution and high matching accuracy under accuracy-oriented parameter settings. Furthermore, we present the RTL design of the feature descriptor generator and perform logic synthesis with FreePDK45, achieving up to 31.8 times speedup compared with the method proposed in previous research.

R3-8

Title	Numberlink Problem Variants Modeled after FPGA Routing Fabrics and their Solvers that Enumerate all the Solutions
Author	*Ryohei Komi, Hiroyuki Ochi (Ritsumeikan University, Japan)
Page	pp. 199 - 204
Keyword	combinatorial problem, zero-suppressed binary decision diagram, field programmable gate array, design space exploration of routing architecture
Abstract	In this study, we define numberlink problem variants that mimic the routing fabrics of FPGAs, and develop solvers that enumerate all their solutions. The target FPGA architectures are early SRAM-based FPGAs and via-switch FPGAs. The existing method, which uses a top-down ZDD construction method (TdZdd), efficiently enumerates all solutions to the numberlink problem; however, it is specialized for planar grid-based routing problems. In this study, we extend the algorithm to multi-layer problems, targeting actual FPGAs where horizontal and vertical segments may overlap without intersections.

R3-9

Title	Implementation and Evaluation of a Speculative Execution-Based FPGA Accelerator for Electronic Circuit Simulation Using Gauss-Jordan and BiCGSTAB Methods
Author	*Yuma Omoto, Atsushi Kubota, Tetsuo Hironaka (Hiroshima City University, Japan)
Page	pp. 205 - 210
Keyword	FPGA, Electronic Circuit Simulator, Speculative Execution, GJE Method, BiCGSTAB Method
Abstract	This paper proposes a speculative execution system on FPGA to accelerate circuit simulation by combining Gauss–Jordan Elimination (GJE) and the BiCGSTAB method. Both solvers run in parallel, and the first to converge is adopted to reduce latency. Experiments on 20×20, 40×40, 80×80, and 160×160 matrices measured latency and resource use to evaluate scalability. A prototype on a Xilinx ZCU104 FPGA was further tested with a 20×20 circuit. Results show that GJE is effective for small problems, while BiCGSTAB achieves higher efficiency for larger dimensions and certain conditions, confirming the benefit of the proposed speculative execution approach.

R3-10

Title	Efficient and Accurate SC Arithmetic Circuits Using Bit Manipulation Based on Interval Partitioning of Bit Strings
Author	*Yota Yanagida, Shigeru Yamashita (Ritsumeikan University, Japan)
Page	pp. 211 - 216
Keyword	stochastic computing, bit-shuffling
Abstract	Stochastic computing (SC) is an approximate computation method using Stochastic Numbers (SNs), which are expressed based on the probability of 1s in a bitstream. Generating low-correlated SNs generally requires the use of independent Linear Feedback Shift Registers (LFSRs), which increases circuit area. In this paper, an SC arithmetic circuit using a single LFSR and bit-shuffling with bit manipulation based on interval partitioning is proposed. The proposed method is applied to an SC arithmetic circuit and the computational accuracy and circuit area are evaluated. The proposed method outperforms the conventional method using independent LFSRs in both computational accuracy and circuit area for SC arithmetic circuits.

R3-11

Title	Logic Gate Design Using Vertical Nanowire Transistors
Author	*Genta Nakamura (Kyushu University, Japan), Katsuhiro Tomioka (Hokkaido University, Japan), Koji Inoue (Kyushu University, Japan)
Page	pp. 217 - 222
Keyword	Logic Gate Design, TFET, Novel Device, Low Power Consumption
Abstract	Reducing the power consumption of VLSI systems has become increasingly important. Although power consumption can be reduced by lowering the supply voltage, it is difficult to lower the supply voltage of conventional MOSFETs below 0.6 V. In this study, we propose logic gate structures using a novel device, the VGAA-TFET (Vertical Gate-All-Around Tunnel Field Effect Transistor), which can operate at supply voltages below the minimum voltage limit of conventional MOSFETs, and we evaluate the proposed structures in terms of area and wire length.

R3-12

Title	Subitizing-Inspired Large Language Models for Floorplanning
Author	Chen-Chen Yeh, *Shao-Chien Lu, Hui-Lin Cho, Yu-Cheng Lin, Rung-Bin Lin (Department of Computer Science and Engineering, Yuan Ze University, Taiwan)
Page	pp. 223 - 228
Keyword	floorplanning, large language model, electronic design automation
Abstract	We present a novel approach to solving the floorplanning problem by leveraging fine-tuned Large Language Models (LLMs). Inspired by subitizing, the human ability to instantly count small numbers of items at a glance, we hypothesize that LLMs can similarly address floorplanning challenges accurately. Our experimental results demonstrate that LLMs achieve high success and optimal rates while attaining relatively low average dead space. These findings underscore the potential of LLMs as promising solutions for complex optimization tasks in VLSI design.

R3-13

Title	Reinforcement Learning-Based Loop Optimization Using the Polyhedral Model
Author	*Hayato Takahashi, Motoki Amagasaki, Masato Kiyama, Kenshu Seto, Mery Diana (Kumamoto University, Japan)
Page	pp. 229 - 234
Keyword	High-Level Synthesis, Polyhedral Model, Loop Optimization, Reinforcement Learning
Abstract	In current high-level synthesis technology, code optimization is often required to improve the performance of the generated hardware. Programs typically execute a large number of instructions inside loops, especially nested loops, which can considerably degrade performance; it is therefore necessary to execute nested loops efficiently. As a formal framework for representing program structure, the polyhedral model facilitates the construction of a complete nested loop. However, it remains challenging to obtain an optimal solution that minimizes the number of instruction executions. Therefore, this paper reports the use of reinforcement learning to search efficiently for the optimal fully nested loop. Performance is evaluated by analyzing the number of instructions executed in the generated code and the circuit synthesized by the high-level synthesis tool.

R3-14

Title	Evaluation of FPGA Development Boards in a Cryogenic Environment
Author	*Tomoki Takashima, Masashi Imai (Hirosaki University, Japan)
Page	pp. 235 - 240
Keyword	Superconducting quantum computer, FPGA, Cryogenic environment, Asynchronous circuit
Abstract	The practical application of very large-scale superconducting quantum computers is highly anticipated. Although the use of FPGAs operating in cryogenic environments has been proposed for their control circuits, its feasibility has not been sufficiently evaluated. This study evaluates the performance of FPGA development boards under cryogenic conditions. In addition, the effectiveness of using asynchronous circuits to cope with changes in circuit characteristics under cryogenic conditions is also evaluated. The results show that the FPGA development boards do not operate normally at 4 K. This paper presents the causes of failure and possible countermeasures based on the changes in power consumption and delay time observed prior to the failure.

[To Session Table]

Invited Talk II
Time: 13:30 - 14:30, Friday, October 10, 2025
Chair: Kenshu Seto (Kumamoto University, Japan)

I2-1 (Time: 13:30 - 14:30)

Title	(Invited Talk) A Symbolic Approach to Exact Quantum Circuit Simulation and Verification
Author	Jie-Hong Roland Jiang (National Taiwan University, Taiwan)
Page	p. 241
Abstract	Recent advancements in quantum technologies are paving the way toward practical quantum computation in the near future. Within this landscape, accurate quantum circuit simulation and verification are critical components in the development of quantum computing systems. However, due to the exponential growth of the Hilbert space with the number of qubits, classical simulation and verification of quantum circuits remain extremely challenging. In this talk, we present advances in both the accuracy and scalability of quantum circuit simulation and verification. To achieve exactness, we represent complex numbers algebraically rather than numerically. For scalability, we adopt a bit-sliced number representation and perform matrix-vector and matrix-matrix operations using symbolic Boolean manipulation. As a result, our simulation framework demonstrates substantial improvements over state-of-the-art tools, successfully handling quantum circuits with up to tens of thousands of qubits. In the context of verification, our method offers superior scalability and exactness, effectively overcoming the limitations of previous inexact approaches. Moreover, it exhibits significantly greater robustness in verifying functionally dissimilar circuits—an area where existing methods often struggle.

[To Session Table]

Regular Poster Session IV
Time: 14:30 - 16:00, Friday, October 10, 2025
Chairs: Yutaka Masuda (Nagoya University, Japan), Kun-Chih Chen (National Yang Ming Chiao Tung University, Taiwan)

Outstanding Paper Award
R4-1

Title	Combatting Transient Errors and Aging in Heterogeneous Multicores: A Framework for Reliable and Energy-Efficient Task Deployment
Author	Yin-Rong Zhuo, *Yu-Guang Chen (National Central University, Taiwan), Zheng-Wei Chen (National Taiwan University, Taiwan), Ing-Chao Lin (National Cheng Kung University, Taiwan)
Page	pp. 242 - 247
Keyword	Heterogeneous multicore systems, aging effects, DVFS, energy efficient
Abstract	CMOS advancements allow ICs to handle critical tasks, but balancing performance and energy efficiency challenges edge devices. Heterogeneous multicore systems optimize energy while meeting performance demands. Yet, transient errors and aging effects degrade reliability. Strategies like task replication and DVFS address these, but poor integration shortens lifespan. We introduce a reliability-aware task deployment framework to extend lifespan, ensure reliability, and reduce energy. Experiments show a 4.98× lifespan increase and 44% energy reduction compared to prior methods.

R4-2

Title	HW/SW Co-Design for Efficient GPT-2 Inference on FPGA via High-Level Synthesis
Author	Shao-Tang Sung, Yi-Wen Tang, Fen-Yu Hsieh, Rong-Yi Lin, *Fang-Yu Hsu, Chih-Tsun Huang (National Tsing Hua University, Taiwan)
Page	pp. 248 - 253
Keyword	High-Level Synthesis, Large Language Model, GPT-2, Hardware, FPGA
Abstract	This paper presents an efficient hardware accelerator for GPT-2 inference on an AMD Alveo U280 FPGA using High-Level Synthesis (HLS). With row-wise GEMM scheduling, we optimize GPT-2 components via data packing, loop unrolling, and kernel fusion. Our design achieves a 2.16x speedup over CPU while consuming only 23% of the power, demonstrating a scalable and sustainable solution for deploying Transformer-based models in resource-constrained environments.

R4-3

Title	Optimization of Power, Area, and Slack via Multi-bit Flip-Flop Generation
Author	Chi Hsu, Yi-Ting Li, Woei-Haur Hung, Chun-Yao Wang, *Ting-Chi Wang (National Tsing Hua University, Taiwan)
Page	pp. 254 - 259
Keyword	Multi-bit Flip-flop, VLSI Optimization, Timing Optimization, Power-Area Trade-off
Abstract	Power and area are key objectives to minimize in modern circuit design. Multi-bit flip-flops (MBFFs) are commonly used to reduce clock load and layout area but may increase total negative slack (TNS) and degrade timing. Balancing these metrics remains a critical challenge. We propose an approach that dynamically adjusts banking and debanking of flip-flops to simultaneously optimize these metrics across different scenarios. Experiments on the test cases of the ICCAD 2024 contest [1] show that on average, our approach reduces the value of the objective function by 7.8%, and shortens the execution time by 41% compared to the first-place winner, demonstrating both effectiveness and efficiency.

R4-4

Title	Error Recovery in MEDA Biochips Using Deep Reinforcement Learning with Electrode Health Awareness
Author	*Yash Gupta, Purrnima Singh (Netaji Subhas University of Technology, New Delhi, India), Syed Rameem Zahra (Sher-e-Kashmir University of Agricultural Sciences and Technology, Kashmir, J&K, India), Ankur Gupta (Netaji Subhas University of Technology, New Delhi, India), Shigeru Yamashita (Ritsumeikan University, Japan)
Page	pp. 260 - 265
Keyword	Swin Transformer, Proximal Policy Optimization, Temporal Graph Transformer, Graph neural network - ordinary differential equation, Biochip life-span
Abstract	Micro-Electrode-Dot-Array (MEDA) biochips offer flexible, scalable solutions for applications like diagnostics and DNA sequencing. However, frequent hardware issues limit their reliability. To address this, we introduce a deep reinforcement learning (DRL)-based framework that adapts in real time using sensor feedback. Uniquely, it also accounts for electrode-health degradation, improving system resilience. This intelligent, adaptive approach enhances the biochip’s reliability, making it more suitable for real-world, long-term use.

R4-5

Title	A Stochastic Number Comparator by Utilizing Positive Correlation
Author	*Nao Shinoda, Zhou Songyu, Shigeru Yamashita (Ritsumeikan University, Japan)
Page	pp. 266 - 271
Keyword	Stochastic Computing, Comparator, Positive Correlation
Abstract	Stochastic Computing (SC) is an approximate computation method that encodes values by the probability of 1s in bit streams. One possible application of SC is image processing, where median filters are widely used for noise reduction. To implement SC median filters, stochastic numbers (SNs) must be compared. However, conventional SC comparison circuits typically rely on up-down counters, leading to a large circuit area. This paper therefore proposes a compact SN comparator that exploits the positive correlation between SNs. The proposed design reduces the circuit area by approximately 91.0%, from 1581 μm² to 142 μm², compared with conventional approaches.

R4-6

Title	Concurrent Detection of Multiple Thermal Fault Injection Attacks on Optical Neural Networks
Author	*Kota Nishida, Yoshihiro Midoh, Noriyuki Miura (The University of Osaka, Japan), Satoshi Kawakami (Kyushu University, Japan), Alex Orailoglu (University of California, San Diego, USA), Jun Shiomi (The University of Osaka, Japan)
Page	pp. 272 - 277
Keyword	Silicon Photonics-based AI Accelerator (SPAA), Optical Neural Network (ONN), Thermal Fault Injection Attack
Abstract	Optical Neural Networks (ONNs) have been regarded as a promising paradigm providing high computational efficiency for artificial intelligence-based applications. Although ONNs have been widely studied to maximize their computational efficiency, their physical security has only recently received attention. This paper tackles thermal fault injection attacks on Silicon Photonics AI Accelerators (SPAAs), which tamper with optical signals in SPAAs to cause misprediction. A concurrent detection method for thermal fault injection attacks is proposed. The proposed method achieves an average recall of over 96% for attack-caused mispredictions with no significant hardware or computational overhead.

R4-7

Title	A Design Hackathon to Bridge AI and Hardware
Author	*Hideharu Amano, Takao Goto, Mizuho Nitami, Yuki Mitarai, Jiawei Yu, Yuxuan Pan, Atsutake Kosuge, Makoto Ikeda (The University of Tokyo, Japan)
Page	pp. 278 - 283
Keyword	YOLOv3, Design Hackathon, FPGA, Vitis-AI, SoM
Abstract	This paper introduces a design hackathon approach and case study that allows beginners in hardware design to compete in terms of performance and accuracy using YOLO, a representative object detection method. Rather than optimizing the hardware itself, the focus is on employing system-level optimization techniques on a low-cost KV260 board. Even students from non-technical backgrounds were able to achieve several-fold performance improvements with the help of tools like ChatGPT.

R4-8

Title	A Study of Image Classifier Combining In-pixel Array Operations and Digital Matrix Operations in Image Sensors
Author	*Takeshi Enomoto, Kota Imagawa, Kota Yoshida, Shunsuke Okura (Research Organization of Science and Engineering, Ritsumeikan University, Japan)
Page	pp. 284 - 289
Keyword	CMOS image sensor, image classification, systolic array, convolutional neural networks
Abstract	Toward the IoT era, where a vast number of sensors and AI-driven analysis are deployed, this study proposes an on-chip image classification system that integrates lightweight neural networks within CMOS image sensors (CIS). The system combines in-pixel and in-column analog convolution with digital matrix computations, enabling data processing directly at the sensor level. This approach reduces communication overhead and system power consumption compared to MCU-based classification, thereby enhancing efficiency and real-time performance. To evaluate the feasibility of the proposed on-chip image classification system, we evaluated the area of the matrix computation circuit and the classification accuracy through software simulations. In the circuit area evaluation, the matrix computation circuit was designed to occupy less than 10% of the total area of the CIS chip. The matrix computation circuit supports up to 10 classification classes for an image with a 1:1 aspect ratio when the PE array has 8 or 16 columns, and up to 5 classes with 32 columns. For an image with a 1:2 aspect ratio and 2-class classification, a 32-column PE array can be supported. Under a given area constraint, the image classification accuracy for the MNIST, Fashion-MNIST, and INRIA-Person datasets reached 88.75%, 79.91%, and 83.79%, respectively.
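
The digital matrix computation can be pictured as data flowing through a PE array. The sketch below is a cycle-level software model of an output-stationary systolic array computing C = A·B, where PE (i, j) accumulates one output element while A values move right and B values move down. The output-stationary dataflow and the function name are assumptions for illustration, since the abstract does not state which dataflow the PE array uses.

```python
def systolic_matmul(A, B):
    """Cycle-level model of an output-stationary systolic array computing A @ B."""
    n, kdim, m = len(A), len(A[0]), len(B[0])
    acc = [[0] * m for _ in range(n)]    # per-PE accumulator (the stationary output)
    a_reg = [[0] * m for _ in range(n)]  # A value each PE forwards to its right
    b_reg = [[0] * m for _ in range(n)]  # B value each PE forwards downward
    # Row i of A is skewed by i cycles and column j of B by j cycles, so PE (i, j)
    # sees A[i][k] and B[k][j] together at cycle t = i + j + k.
    for t in range(kdim + n + m - 2):
        # Visit PEs bottom-right first so neighbors' registers still hold last cycle's values.
        for i in reversed(range(n)):
            for j in reversed(range(m)):
                if j == 0:
                    k = t - i
                    a_in = A[i][k] if 0 <= k < kdim else 0
                else:
                    a_in = a_reg[i][j - 1]
                if i == 0:
                    k = t - j
                    b_in = B[k][j] if 0 <= k < kdim else 0
                else:
                    b_in = b_reg[i - 1][j]
                acc[i][j] += a_in * b_in  # multiply-accumulate in the PE
                a_reg[i][j] = a_in
                b_reg[i][j] = b_in
    return acc

print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

In this model, the number of PE columns bounds how many output columns are produced in one pass, which is one way to read the trade-off between PE column count and supported classes reported above.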

R4-9

Title	LMESN: A Low-Power Hardware Reservoir Computing Architecture Based on MOSFET Leakage Variation
Author	*Masami Utsunomiya, Hiroya Murata, Ryuto Seki (Kyoto University, Japan), Haoyuan Li (Xi'an Jiaotong University, China), Hiromitsu Awano, Takashi Sato (Kyoto University, Japan)
Page	pp. 290 - 295
Keyword	Reservoir Computing, Echo State Network, Edge Computing, Leakage Current, Analog Computing
Abstract	We propose LMESN, a low-power hardware reservoir computing architecture that leverages variations in MOSFET leakage currents to perform analog computation. In LMESN, input values and internal states are encoded as pulse widths, and computation is achieved by discharging capacitors through the leakage currents of MOSFETs. By eliminating the analog peripheral circuits required in conventional MOSFET-based Echo State Networks (ESNs), such as operational amplifiers, LMESN achieves a simplified design with improved energy efficiency. Each circuit component is designed and validated using SPICE simulations based on a commercial 22 nm process, and the extracted characteristics are incorporated into a Python-based inference model via lookup tables. Evaluation experiments on two time-series classification datasets, JapaneseVowels and PenDigits, demonstrate that LMESN achieves classification accuracy comparable to software-based ESNs. Power consumption estimates highlight that cell leakage currents are the primary contributors to energy use, underscoring the importance of leakage control for further power reduction. Overall, LMESN offers a promising hardware platform that balances miniaturization, low power consumption, and competitive inference performance, making it well-suited for real-time analog time-series processing in edge devices.
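
For reference, a software ESN of the kind LMESN is compared against can be sketched briefly. The sketch below is a minimal pure-Python echo state network reservoir, assuming a tanh activation and a recurrent matrix rescaled so that its maximum absolute row sum is below 1 (a sufficient condition for the echo state property). It does not model LMESN's leakage-current discharge or pulse-width encoding, and the reservoir size, input signal, and function names are hypothetical. A linear readout (for example, ridge regression on the collected states) would be trained on top for classification.

```python
import math
import random

def make_reservoir(n, n_in, scale=0.9, seed=0):
    rng = random.Random(seed)
    W = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    # Rescale so the infinity norm (max absolute row sum) equals `scale` < 1.
    # Since tanh is 1-Lipschitz, each update is then a contraction, which
    # guarantees the echo state (fading memory) property.
    norm = max(sum(abs(w) for w in row) for row in W)
    W = [[w * scale / norm for w in row] for row in W]
    W_in = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n)]
    return W, W_in

def step(x, u, W, W_in):
    """One reservoir update: x(t+1) = tanh(W_in u(t) + W x(t))."""
    return [
        math.tanh(sum(wi * ui for wi, ui in zip(W_in[i], u))
                  + sum(wij * xj for wij, xj in zip(W[i], x)))
        for i in range(len(x))
    ]

def run(inputs, W, W_in, x0):
    """Drive the reservoir with an input sequence and return the final state."""
    x = list(x0)
    for u in inputs:
        x = step(x, u, W, W_in)
    return x

W, W_in = make_reservoir(n=20, n_in=2)
inputs = [[math.sin(0.1 * t), math.cos(0.1 * t)] for t in range(100)]
xa = run(inputs, W, W_in, [0.0] * 20)
xb = run(inputs, W, W_in, [0.5] * 20)
# Different initial states converge: the final state depends on the input history only.
print(max(abs(p - q) for p, q in zip(xa, xb)))
```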

R4-10

Title	Comparison of Latch-Based Circuit and Flip-Flop-Based Circuit in Actual Device
Author	*Kenji Takahashi, Tadaaki Tanimoto, Keizo Hiraga, Masayuki Hayashi, Takato Inoue, Kazuhiro Bessho, Toshimasa Shimizu (Sony Semiconductor Solutions Corporation, Japan)
Page	pp. 296 - 301
Keyword	Latch, Power consumption, Maximum operating frequency characteristic, Minimum operating voltage characteristic, Actual device measurement
Abstract	This paper reports the comparison results of current consumption, maximum operating frequency characteristics (Fmax), and minimum operating voltage characteristics (Vmin) of latch-based and flip-flop-based circuits. The latch-based circuits, under certain conditions, consume less current, have a higher Fmax at the same voltage, and a lower Vmin at the same frequency. These results show that latch-based circuits can reduce power at the same frequency as flip-flop-based circuits.

R4-11

Title	Wafer to Lot-level S-parameter Prediction in Radio Frequency Testing Using Radial Basis Function Neural Network
Author	*Huimin Wang, Yasuhiko Iguchi, Chika Tanaka (Kioxia Corporation, Japan)
Page	pp. 302 - 305
Keyword	S-parameter, Radial Basis Function Neural Network, Machine Learning, RF test
Abstract	To reduce production testing costs without sacrificing quality, wafer-to-lot-level performance prediction has gained traction as a key enabler for production tests. Although many effective methods for predicting physical properties have been proposed, only a few can predict S-parameters accurately. In this study, we propose a novel S-parameter prediction method based on a Radial Basis Function (RBF) neural network that makes accurate predictions from a minimal data set.
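
To show what an RBF network computes, the sketch below fits a Gaussian RBF network with one center per training sample (exact interpolation) by solving a linear system for the output weights. It is a minimal pure-Python illustration under stated assumptions: the 1-D input, the sine targets standing in for a measured S-parameter trend, the width, and the function names are all hypothetical, and the abstract does not describe the paper's network structure, features, or training procedure.

```python
import math

def gaussian(r, width):
    return math.exp(-(r / width) ** 2)

def dist(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_rbf(X, y, width):
    """Output weights so the network reproduces every training target."""
    Phi = [[gaussian(dist(xi, cj), width) for cj in X] for xi in X]
    return solve(Phi, y)

def predict(x, centers, weights, width):
    """Weighted sum of Gaussian basis functions centered on the training inputs."""
    return sum(w * gaussian(dist(x, c), width) for w, c in zip(weights, centers))

X = [[0.0], [0.5], [1.0], [1.5], [2.0]]  # hypothetical input points
y = [math.sin(v[0]) for v in X]           # hypothetical target values
weights = fit_rbf(X, y, width=0.5)
print(predict([0.75], X, weights, width=0.5))
```

Because the Gaussian kernel matrix is positive definite for distinct centers, the system always has a unique solution; with larger data sets one would typically use fewer centers and a least-squares fit instead.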

R4-12

Title	Striking Force Estimation on a Punching Bag Using IMU and Computer Vision
Author	Tsung-Han Lai, Ming-Qi Hsu, Yi-Ting Li, Wuqian Tang, Yun-Ju Lee (National Tsing Hua University, Taiwan), Yung-Chih Chen (National Taiwan University of Science and Technology, Taiwan), Wen-Hsin Chiu, *Chun-Yao Wang (National Tsing Hua University, Taiwan)
Page	pp. 306 - 311
Keyword	boxing, inertial measurement unit, computer vision
Abstract	We propose a striking force estimation system for punching bag training that combines an inertial measurement unit (IMU) and a vision-based pipeline to extract angular acceleration and impact location. A machine learning model is trained using data from a pneumatic impact device. Experimental results show that our method achieves accurate and robust predictions without interfering with the athlete’s movement, making it suitable for real-world training scenarios.

R4-13

Title	Improving Bokeh Simulation on CPUs: Faster Inference and Better Perception
Author	*Chia-Lin Chang, Hao-Cheng Hsu, Cen-En Jian, Yu-Hui Huang (Department of Electrical Engineering, Yuan Ze University, Taiwan)
Page	pp. 312 - 313
Keyword	Bokeh Simulation, CPU Inference, Model Optimization
Abstract	We present a set of lightweight optimization techniques to accelerate bokeh simulation on CPUs while improving perceptual quality. By integrating XLA compilation, multiprocessing, edge-aware interpolation, and weight pruning, our method achieves a 7.3% speedup over the baseline and improves both PSNR and SSIM. The results demonstrate practical value for CPU-only deployment scenarios.

R4-14

Title	Cross-Modal Quantization of BLIP-2 Using Activation-Aware Weight Quantization
Author	Hui-Yun Deng, Chia-Yun Chiang, *Yu-Hui Huang (Yuan Ze University, Taiwan)
Page	pp. 314 - 315
Keyword	Activation-aware Weight Quantization (AWQ), BLIP-2, Cross-modal Inference, Vision-Language Models, Edge AI Deployment
Abstract	This work explores the extension of Activation-aware Weight Quantization (AWQ) to the multimodal BLIP-2 architecture, encompassing both the language and vision components. We apply AWQ to the OPT-2.7B model and adapt it for the EVA-ViT-G vision encoder by selectively quantizing only the Value projection within the fused QKV attention layers. Our method aims to enable efficient inference on edge devices with limited memory capacity. Evaluation on the COCO VQAv2 dataset in terms of memory consumption, inference latency, and task performance shows that AWQ effectively reduces memory usage and preserves reasonable accuracy for the language module, whereas applying full quantization to both components yields greater efficiency at the cost of accuracy. This study highlights the trade-offs and design considerations involved in deploying cross-modal quantization strategies on resource-constrained hardware.
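
The core AWQ idea can be sketched as follows: scale each input channel of a weight matrix by s = (mean |activation|)^alpha before quantizing, divide the activations by the same factor so the full-precision product is unchanged, and grid-search alpha to minimize output error on calibration data. The sketch below is a simplified pure-Python illustration, not the authors' implementation or the official AWQ code. It uses per-row symmetric round-to-nearest quantization, and the [Q; K; V] row ordering assumed for the fused QKV weight, the helper names, the bit width, and the synthetic data are hypothetical.

```python
import random

def quantize_row(row, bits=4):
    """Symmetric round-to-nearest quantization of one output row."""
    qmax = 2 ** (bits - 1) - 1
    step = max(abs(w) for w in row) / qmax or 1.0
    return [round(w / step) * step for w in row]

def matvec(W, x):
    return [sum(w * xk for w, xk in zip(row, x)) for row in W]

def quant_error(W, X, s, bits=4):
    """Output error when W * diag(s) is quantized and inputs are divided by s."""
    Wq = [quantize_row([w * sk for w, sk in zip(row, s)], bits) for row in W]
    err = 0.0
    for x in X:
        y_ref = matvec(W, x)
        y = matvec(Wq, [xk / sk for xk, sk in zip(x, s)])
        err += sum((a - b) ** 2 for a, b in zip(y, y_ref))
    return err

def awq_search(W, X, bits=4, grid=20):
    """Grid-search alpha in [0, 1]; alpha = 0 (s = 1) is plain quantization."""
    n_in = len(W[0])
    mag = [max(sum(abs(x[k]) for x in X) / len(X), 1e-8) for k in range(n_in)]
    best = None
    for i in range(grid + 1):
        alpha = i / grid
        s = [m ** alpha for m in mag]
        err = quant_error(W, X, s, bits)
        if best is None or err < best[1]:
            best = (alpha, err, s)
    return best

def value_rows(W_qkv):
    """Hypothetical helper: V rows of a fused QKV weight assumed ordered [Q; K; V]."""
    d = len(W_qkv) // 3
    return W_qkv[2 * d:]

rng = random.Random(0)
dim = 8
W_qkv = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(3 * dim)]
# Calibration activations with one salient (large-magnitude) input channel.
X = [[rng.gauss(0, 1) * (10.0 if k == 0 else 1.0) for k in range(dim)] for _ in range(32)]
W_v = value_rows(W_qkv)
alpha, err, s = awq_search(W_v, X)
print(alpha, err, quant_error(W_v, X, [1.0] * dim))
```

Since alpha = 0 is part of the search grid, the selected scaling never does worse than plain round-to-nearest quantization on the calibration set.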
