SASIMI 2025
The 26th Workshop on Synthesis And System Integration of Mixed Information Technologies
Technical Program

Remark: The presenter of each paper is marked with "*".

Session Schedule

Thursday, October 9, 2025

    Registration                      8:30 -
Op  Opening                           9:00 - 9:20
K1  Keynote Speech I                  9:20 - 10:30
R1  Regular Poster Session I          10:30 - 12:00
    Lunch Break                       12:00 - 13:30
I1  Invited Talk I                    13:30 - 14:30
R2  Regular Poster Session II         14:30 - 16:00
D   Panel Discussion                  16:00 - 17:30
    Banquet                           18:00 - 20:00

Friday, October 10, 2025

K2  Keynote Speech II                 9:20 - 10:30
R3  Regular Poster Session III        10:30 - 12:00
    Lunch Break                       12:00 - 13:30
I2  Invited Talk II                   13:30 - 14:30
R4  Regular Poster Session IV         14:30 - 16:00
Cl  Closing                           16:00 -


List of papers

Thursday, October 9, 2025

Keynote Speech I
Time: 9:20 - 10:30, Thursday, October 9, 2025

K1-1 (Time: 9:20 - 10:30)
Title(Keynote Speech) Verification Tools Should Certify Their Results
AuthorRandal E. Bryant (Carnegie Mellon University, USA)


Regular Poster Session I
Time: 10:30 - 12:00, Thursday, October 9, 2025

R1-1
TitleModeling of Dynamic Input Capacitance in Trench-Gate SiC MOSFETs via Voltage-Dependent Gate Oxide Capacitance Partitioning
Author*Taiki Nishioka, Kazuki Matsumoto, Hajime Takayama (Kyoto Institute of Technology, Japan), Jun Furuta (Okayama Prefectural University, Japan), Kazutoshi Kobayashi, Michihiro Shintani (Kyoto Institute of Technology, Japan)
KeywordSiC, Trench, Gate Capacitance, Model, MOSFET
AbstractThis paper presents a dynamic input capacitance model for trench-gate silicon carbide (SiC) MOSFETs that accounts for voltage-dependent partitioning between the gate-to-source (Cgs) and gate-to-drain (Cgd) capacitances. Unlike conventional models, which assume static boundaries derived from planar device assumptions, the proposed model introduces a dynamically varying boundary governed by instantaneous changes in gate-source voltage Vgs and gate-drain voltage Vgd. The model is developed based on transient electric field behavior extracted from technology computer-aided design (TCAD) simulations, which reveal time-dependent shifts in the internal gate oxide field boundary during switching transitions. Experimental validation is conducted using gate charge measurements obtained from a double pulse test (DPT), with model fitting performed across a two-dimensional voltage space. Compared to the conventional static-boundary model, the proposed model achieves improved accuracy, reducing the relative error against the measurements. Furthermore, it successfully captures the voltage dependence of Cgs, thereby offering enhanced physical fidelity for circuit simulation and the design of trench-gate SiC MOSFETs.

R1-2
TitleA Detailed Analysis of LLM Execution on IMAX3 and Initial Evaluation of IMAX4 Prototype for Server Environment
Author*Takuto ANDO, Yu ETO, Yasuhiko NAKASHIMA (Nara Institute of Science and Technology, Japan)
KeywordLLM, IMAX, CGRA, FPGA
AbstractIn this paper, we present a detailed analysis of the IMAX3 CGRA-based accelerator for energy efficiency and the IMAX4 prototype with an upgraded host CPU. To address the host CPU bottlenecks of its predecessor, IMAX3, IMAX4 incorporates a server-oriented Intel Xeon processor and PCIe Gen5 connectivity, realizing IMAX scalability. We implemented and evaluated the IMAX4 prototype using microbenchmarks and the LLaMA3 8B quantized model. The results demonstrate significantly reduced host-side overheads and improved data transfer compared to IMAX3. While performance characteristics vary by quantization, IMAX4 shows promise, shifting bottlenecks from the host to data pathways and core performance, indicating potential for CGRA in server-based LLM acceleration.

R1-3
TitleGStreamer-integrated HLS-based JPEG Encoder for Edge FPGA SoCs
Author*Yuri Guimaraes Pereira Primo da Silva, Shinya Honda (Nagoya University, Japan), Sugako Ootani (Renesas, Japan), Masato Edahiro (Nagoya University, Japan), Abraham Monrroy Cano (MapIV, Japan)
KeywordHardware Acceleration, JPEG Encoding, Vitis HLS, GStreamer, Kria KV260
AbstractWith the rise in ADAS (Advanced Driver Assistance System) and autonomous driving, there is a demand for lightweight image compression on edge devices. Although many image compression FPGA accelerators have been designed to provide high performance and low power consumption for data centers, they often do not meet the resource constraints of edge devices. This paper presents a hardware-accelerated JPEG encoder implemented using Vitis HLS on an AMD KV260 board for edge processing. We integrate the encoder with GStreamer, a multimedia framework that allows for easy integration with existing Linux tools. The system achieved the capture and encoding of 1080p NV12 images from a MIPI camera at 19.5 fps, demonstrating low variation when compared to CPU-based encoding, even under heavy load, and thus being suited for low-latency applications.

R1-4
TitleAutoPre-ACM: Autoencoder Based Precision-Enhanced Anomaly Detection For Cyber-Physical Attacks in MEDA Biochips
Author*Purrnima Singh, Yash Gupta (Netaji Subhas University of Technology, New Delhi, India), Syed Rameem Zahra (Sher-e-Kashmir University of Agricultural Sciences and Technology, Kashmir, J&K, India), Ankur Gupta (Netaji Subhas University of Technology, New Delhi, India), Shigeru Yamashita (Ritsumeikan University, Japan)
KeywordDeep Learning, Lab-on-a-chip Security, Actuation Tampering Attacks, Cyber Physical System, MEDA biochip
AbstractAddressing critical cyber-physical security concerns in MEDA biochips, an autoencoder-based anomaly detection framework is presented. While actuation tampering attacks pose significant threats, existing detection methods often struggle with operational noise, resulting in false alarms. Our approach leverages a 1D CNN autoencoder to learn complex normal patterns, enabling robust differentiation between benign system variations and malicious attacks. This demonstrates superior precision and accuracy, significantly enhancing the reliability and trustworthiness of digital microfluidic biochip operations.

R1-5
TitleBinary Synthesis from ARM Machine Code Using a General-Purpose High-Level Synthesis System
Author*Yuga Sugimoto, Nagisa Ishiura (Kwansei Gakuin University, Japan)
KeywordBinary Synthesis, High-Level Synthesis, ARMv6, Thumb
AbstractThis paper presents the implementation of a binary synthesis system that utilizes a general-purpose high-level synthesis (HLS) system as a backend, starting from ARM machine code. Binary synthesis is a technique that generates a hardware design description functionally equivalent to the CPU executing the original machine code program. However, implementing such a system for each instruction set architecture incurs a high development cost. Nakado has proposed a method for easily implementing binary synthesis systems by converting machine code programs into C programs, which are then synthesized into hardware using a general-purpose HLS system. Based on this approach, we implement a binary synthesis system that generates hardware design descriptions from machine code programs using the ARMv6 instruction set. ARMv6 includes features such as program counter-relative addressing for immediate load instructions and conditional execution based on flags. Additionally, division and modulo operations are typically handled by runtime libraries, which increase circuit size and execution time; to address this, we convert runtime library calls directly into division/modulo instructions. The proposed binary synthesis system is implemented in Python. Experimental results using machine code programs obtained from several C programs show that, compared to hardware generated directly from the original C programs, the circuit size increased by a factor of approximately 1.3 to 5.5, while the delay remained nearly the same.

R1-6
TitleQuantification of Design Difficulty of Analog Circuits Based on Volume of Effectual Design Space
AuthorRiku Anan, *Akira Tsuchiya, Toshiyuki Inoue, Keiji Kishine (The University of Shiga Prefecture, Japan)
Keywordanalog circuit, design automation
AbstractThis paper proposes a method to quantify the design difficulty level of analog circuits. Analog circuits have many design parameters and performance measures, so it is difficult to set an adequate target specification as well as to design the circuit. However, there has been little discussion of how to evaluate the design difficulty level quantitatively. We focus on the volume of the design space where the performance meets the target specification. The proposed method calculates the volume of the design space by using linear classifiers and slack variables. Numerical experiments show that the effective volume of the design space can be used as a measure of the design difficulty level.

R1-7
TitleFlow-based Augmented Droplet Routing Algorithm for MEDA-Based DMFB
Author*Emuun Purevdagva, Masayuki Shimoda, Satoshi Tayu, Atsushi Takahashi (Institute of Science Tokyo, Japan)
KeywordMEDA biochip, Droplet routing, Flow network problem
AbstractMicro Electrode Dot Array-based Digital Microfluidic Biochip (MEDA) is a promising platform in biological and medical applications, offering high flexibility in droplet-based operations such as disease diagnosis, DNA analysis, and PCR testing. To fully exploit its capabilities, fast and accurate droplet routing is essential. In this work, we formulate droplet transportation in MEDA as a flow network problem and propose a high-speed algorithm that efficiently finds a minimum-time routing with low computation cost. The algorithm first determines the shortest routing time using maximum flow search, and then refines the solution using minimum-cost flow optimization to reduce redundant movements. The proposed algorithm enables efficient handling of large-scale problems, contributing to improved experimental efficiency of MEDA biochips.

R1-8
TitleEnhancing Hardware Trojan Detection via ATPG-Based Transition Delay Fault Testing with Split Fault Lists
Author*Asuka Koike, Yutaka Masuda, Tohru Ishihara (Graduate School of Informatics, Nagoya University, Japan)
Keywordhardware trojan, automatic test pattern generation, transition delay fault
AbstractRecent advances in LSI design have shifted development from vertically integrated models to horizontally specialized models involving multiple companies. Although cost-effective, this approach increases the risk of Hardware Trojan (HT) insertion by untrusted third parties. HTs are malicious circuit modifications that trigger under rare conditions, causing abnormal behavior or information leakage. This work presents an HT detection method based on post-silicon power analysis. The method tackles two key challenges: (1) the difficulty of activating HTs, and (2) the relatively small power consumption of HT circuits, which makes it difficult to distinguish their effects from power variations due to process variation. To address the first challenge, the proposed method employs automatic test pattern generation (ATPG) to comprehensively activate transition delay faults (TDFs) in a clean netlist, thereby increasing the likelihood of HT activation. To tackle the second challenge, our method partitions the TDF list to reduce the simultaneous activation of non-HT logic gates and better highlight the power dissipation associated with the target faults. Experimental evaluations using the Trust-Hub AES-T2500 benchmark demonstrate that the proposed method can effectively reveal relative power differences caused by HTs.

R1-9
TitleVoltage and Frequency Dependence of Single Event Transient Induced by Alpha-Particle
Author*Arata Matsumoto, Haruto Sugisaki, Ryuichi Nakajima (Kyoto Institute of Technology, Japan), Jun Furuta (Okayama Prefectural University, Japan), Kazutoshi Kobayashi (Kyoto Institute of Technology, Japan)
KeywordSoft error, Single Event Transients, Single Event Upsets, Frequency Dependence, Dynamic Irradiation
AbstractA soft error on a semiconductor chip is a temporary malfunction in a latch or flip-flop (FF). Soft errors caused by Single Event Transients (SETs) become more significant at high clock frequencies in the GHz range. This study investigated the frequency dependence of SETs using a test circuit that eliminates the influence of Single Event Upsets (SEUs). By irradiating the circuit with alpha particles while the clock was running, soft error rates (SERs) were measured. A linear increase in SERs was observed below 700 MHz.

R1-10
TitleEvaluating Signal Integrity in InFO Package via Learning-based Methods
Author*Yi-Hua Yeh, Haoru Chang, Hung-Ming Chen, Chien-Nan Liu (NYCU, Taiwan)
KeywordSI, InFO package, Learning-based methods
AbstractThis work further investigates signal integrity issues in the InFO package, focusing on high-bandwidth memory. Ensuring signal integrity is crucial in layout design to accommodate increasingly complex modern circuit designs, particularly in high-speed signal transmission. This research proposes regression prediction models based on CatBoost, XGBoost, and LightGBM to predict the values of signal quality indicators such as eye height and eye width, providing a methodology to assess signal integrity. Using a systematic method of multi-layer modeling and angle simulation, signal integrity in the design is quantitatively characterized and optimized. The framework demonstrates high accuracy in evaluating and optimizing the signal integrity of InFO packages through rapid evaluation.

R1-11
TitleA High Precision Heuristic for Motif Extraction Using Random Forests
AuthorJigen Murata, Masato Inagi, Martin Lukac, Shin'ichi Wakabayashi, *Shinobu Nagayama (Hiroshima City University, Japan)
KeywordMotif extraction problem, heuristic method, Gibbs sampling method, Random Forest, string classification
AbstractMotif extraction is the problem of finding similar substrings, called motifs, in many DNA sequences. Gibbs sampling is the best-known heuristic for this problem, but it has the disadvantage that it tends to fall into a local optimum because its solution search space is limited by its greedy search. This paper proposes a heuristic that uses Random Forests to classify substrings and searches a wide range of solutions. Since the proposed heuristic improves the precision of solutions significantly, this paper also proposes a hybrid method combining Gibbs sampling with the Random Forest based method to reduce computation time while keeping the precision of solutions.

R1-12
TitleSizing Transformation Technique for Analog Design Migration Across Different Technologies
AuthorYao-Cheng Wu, King-Ho Wong, *Chien-Nan Jimmy Liu (Institute of Electronics, National Yang Ming Chiao Tung University, Taiwan), Chia-Tseng Chiang (Richtek Technology Corporation, Taiwan)
KeywordTechnology Node Transfer, Sample Transformation, Initial Sizing, Analog Circuit Design Automation, Transistor Sizing
AbstractWhen migrating to a new technology, analog circuits often require a complete redesign due to different circuit parameters, which is a time-consuming process. This work proposes a quick sizing transformation technique to obtain the corresponding sizing of each device in the new technology without simulation. Based on the assumption that the node voltages and operation region of each transistor remain the same after migration, the proposed approximation method obtains a new circuit based on a prior comparison of a unit-sized transistor between different technologies. Although this approach cannot guarantee an optimized solution, it can help to bypass the global exploration in evolutionary optimization and significantly reduce the runtime by up to 65%, as shown in the experimental results.

R1-13
TitleEqualizing QAM Waveform Distortion with Linear SVM Classifier and its Machine Learning Dataset Generation
Author*Yiwei Liu, Yukina Haruta, Yutaka Masuda, Tohru Ishihara (Nagoya University, Japan)
KeywordQAM, Waveform distortion, Machine learning, SVM classifier, Dataset generation
AbstractMulti-level modulation formats such as quadrature amplitude modulation (QAM) have been widely used in modern wideband communications. QAM waveforms are distorted by various influences during communication. To learn the distortion trends in the QAM waveform accurately, this paper first focuses on generating datasets for QAM communications. The paper then proposes a method for equalizing QAM waveform distortion with a support vector machine (SVM) classifier which identifies the received QAM code based on pre-learned distortion trends. Simulation results obtained for our SVM classifier designed as a dedicated digital circuit demonstrate that the proposed method reduces the computational cost by 80% compared to existing methods that achieve similar classification accuracy. We also confirmed that the classification accuracy was improved from 96.4% to 99.0% at 1.4 times the computational cost compared to one of the simplest existing SVM classifiers.

R1-14
TitleA Hardware Design Environment for ROS2 Node to FPGA-Integrated SoC
AuthorXingze Li (Nagoya University, Japan), *Ryota Yamamoto (National Institute of Technology, Tomakomai College, Japan), Shinya Honda (Nagoya University, Japan)
KeywordHigh Level Synthesis, ROS2, Co-design
AbstractAs robotic applications increasingly demand both high performance and energy efficiency, FPGA-integrated system-on-chip (SoC) platforms have emerged as a promising solution for offloading intensive workloads. Despite their potential, integrating hardware (HW) accelerators with ROS2-based applications remains a significant difficulty, often requiring deep domain expertise. In this paper, we present CWB2ROS, a development framework that enables seamless integration between ROS2 nodes and HW modules synthesized via high-level synthesis (HLS) using Cyber Work Bench (CWB). CWB2ROS supports core ROS2 communication models, including Publish/Subscribe, Service and Parameters, and automates the generation of necessary software wrappers. Experimental results on the Kria KR260 platform demonstrate that CWB2ROS achieves competitive communication latency and offers greater flexibility compared to existing solutions.


Invited Talk I
Time: 13:30 - 14:30, Thursday, October 9, 2025

I1-1
Title(Invited Talk) Towards Emerging Device Computing for the Post-Moore Era
AuthorKoji Inoue (Kyushu University, Japan)


Regular Poster Session II
Time: 14:30 - 16:00, Thursday, October 9, 2025

R2-1
TitleOptimal Golomb Coding via Dynamic Programming and Its Application on Large Language Model Compression
AuthorJen-Hung Yang, Zhi-Kai Xu, *Juinn-Dar Huang (National Yang Ming Chiao Tung University, Taiwan)
KeywordGolomb coding, Large language models, model compression, machine learning, AI
AbstractLarge language models (LLMs) pose computational and storage challenges for resource-constrained devices. Lossless compression is preferred over lossy methods like quantization. Huffman coding, though optimal, is hardware-inefficient. We propose optimal Golomb coding, a lossless scheme enhancing exp-Golomb’s hardware-friendly prefix structure with distribution-dependent suffix optimization. It improves compression by 12.8% over exp-Golomb, coming within 0.21% of the theoretical limit, and compresses LLaMA3-70B’s INT8 weights by 56.08%, enabling practical LLM deployment on edge devices.
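
For readers unfamiliar with the exp-Golomb baseline named in this abstract, the following is a minimal sketch of the standard order-k exp-Golomb code only. It is not the authors' optimal Golomb scheme, whose suffix optimization is not described here; the function names are hypothetical and chosen for illustration.

```python
def exp_golomb_encode(n: int, k: int = 0) -> str:
    """Return the order-k exp-Golomb code word of a non-negative integer as a bit string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    value = n + (1 << k)          # shift so that the binary form carries the length
    nbits = value.bit_length()    # number of significant bits in the shifted value
    # Unary-style prefix of zeros, followed by the shifted value in binary.
    return "0" * (nbits - k - 1) + format(value, "b")


def exp_golomb_decode(bits: str, k: int = 0) -> int:
    """Decode a single order-k exp-Golomb code word given as a bit string."""
    zeros = len(bits) - len(bits.lstrip("0"))   # length of the zero prefix
    nbits = zeros + k + 1                       # bits that follow the prefix
    value = int(bits[zeros:zeros + nbits], 2)
    return value - (1 << k)


if __name__ == "__main__":
    for n in range(6):
        print(n, exp_golomb_encode(n))
```

The prefix of zeros tells a decoder how many bits follow, which is the hardware-friendly prefix structure the abstract refers to: no code table is needed to find code word boundaries.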

R2-2
TitleOn Ratio of Embedded RECON Spare Cell Types for Technology Remapping
AuthorYasuaki Nabetani, Nobutaka Kuroki, *Masahiro Numa (Kobe University, Japan)
KeywordECO
AbstractThis paper presents an approach to determine the ratio of RECON spare cell types embedded in the circuit to improve the feasibility of technology remapping using RECON cells to deal with post-mask ECOs through metal fixes in the LSI design process. The ratio of 2T/4T/6T-RECON spare cell types is determined by using the pre-calculated average number of RECON cells required for each possible modification pattern. These averages are weighted by the frequency of each pattern based on the number of cell inputs in the netlist. Experimental results have shown that the proposed approach improves the ECO success ratio by 1.3 to 2.0 pt, at the cost of reducing initial slack by 7 to 31%.

R2-3
TitleA Seamless Hardware/Software Switching Technique for Embedded Systems Using HDLRuby
Author*Lovic Gauthier, Sachi Yoshigai (National Institute of Technology, Ariake College, Japan)
KeywordHardware Description Language, Hardware/Software Co-Design, RTL Simulation, Co-Simulation
AbstractThis paper presents a new hardware/software (HW/SW) co-design technique for HDLRuby, a Register Transfer Level (RTL) Hardware Description Language. This technique enables automatic switching between HW and SW implementations for HDLRuby processes that describe finite state machines. It is designed to facilitate smooth HW/SW exploration and accelerated HW simulation. Experimental results show that, with this technique, simulation can be orders of magnitude faster than standard RTL simulation.

R2-4
TitleSpeeding Up a Routing Method Considering Droplet Splitting on MEDA Biochips by Dijkstra’s Method
Author*Issei Nakamura, Shigeru Yamashita, Hiroyuki Tomiyama (Ritsumeikan University, Japan), Ankur Gupta (Netaji Subhas University of Technology, India)
KeywordMEDA, Optimal Routing, Droplet Split, Dijkstra’s Algorithm
AbstractRecently, a biochip called MEDA (Micro Electrode Dot Array) has been studied in the field of biochemistry. In experiments using MEDA biochips, droplets are moved to a sink point, and the time required for this movement can be reduced by selecting an optimal path. This paper proposes a method to find the optimal path and timing of droplet splitting by applying Dijkstra’s method to weighted graphs.
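
As background for the abstract above, the sketch below shows plain Dijkstra shortest-path search on a small weighted graph. It is only the textbook algorithm; how the authors build the graph and model droplet splitting is not given in the abstract, so the node names and edge weights here are hypothetical.

```python
import heapq


def dijkstra(graph, source):
    """Return the shortest distance from source to every reachable node.

    graph maps a node to a list of (neighbor, non-negative weight) pairs.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path to u was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


if __name__ == "__main__":
    # Hypothetical graph: weights stand in for movement time between positions.
    example = {
        "src": [("a", 2), ("b", 5)],
        "a": [("b", 1), ("sink", 6)],
        "b": [("sink", 2)],
    }
    print(dijkstra(example, "src"))
```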

R2-5
TitleEMESN: an Extended MOSFET Reservoir Computing Architecture for Echo State Networks with Hardware-Software Co-Optimization
Author*Haoyuan Li (Xi'an Jiaotong University/Kyoto University, China), Masami Utsunomiya, Ryuto Seki, Takashi Sato (Kyoto University, Japan), Feng Liang (Xi'an Jiaotong University, China)
KeywordReservoir Computing, Echo State Network, MOSFET-based Hardware, Genetic Algorithm Optimization, Time-Series Classification
AbstractThis paper presents EMESN, an extended MOSFET hardware reservoir-computing architecture for time-series tasks. A pulse-based crossbar exploits intrinsic threshold-voltage variation to realize fixed random weights, while multi-mask mapping, ADC-range tuning and genetic optimization jointly enhance inference accuracy. Evaluations on eight public datasets demonstrate accuracy gains of up to 10.4 points and a fivefold reduction in accuracy standard deviation, all with markedly lower static power than previous MOS-ESN designs.

R2-6
TitleDevelopment of Tsugaru Dialect Translation System Using Transparent Display
Author*Haruto Saito, Masashi Imai (Hirosaki University, Japan)
KeywordTranslation system, speech recognition, Artificial Intelligence, Tsugaru dialect, Docker
AbstractThe Tsugaru region of Aomori prefecture possesses a distinctive regional dialect known as Tsugaru-ben. The dialect can lead to significant communication issues between local residents and people from outside the region. This paper presents the prototype construction of a Tsugaru dialect translation system utilizing a transparent display and its system architecture. The system consists of three main components: an automatic speech recognition module, a translation module from the Tsugaru dialect to standard Japanese, and a user interface module. Evaluation results demonstrate a complete processing pipeline capable of handling end-to-end translation within an average of 1.16 seconds, resulting in the first real-time Tsugaru dialect translation system.

R2-7
TitleAccelerated Behavioral Simulation for Optimizing MOSFET-Based Echo State Networks
Author*Ryuto Seki, Masami Utsunomiya (Kyoto University, Japan), Haoyuan Li (Xi'an Jiaotong University, China), Hiromitsu Awano, Takashi Sato (Kyoto University, Japan)
KeywordEcho State Network, MOSFET, Behavioral Simulation
AbstractThe Leakage-based MOSFET Echo State Network (LMESN) is a hardware reservoir computing architecture that achieves low power consumption by leveraging the subthreshold leakage current of MOSFETs, making it a promising candidate for edge-oriented AI applications. In this paper, we propose a fast behavioral simulation method to accurately capture the transistor-level characteristics of LMESN. Using this method, we investigate how the inference accuracy of LMESN is influenced by two key factors: the bit width of the feedback signal and the device temperature.

R2-8
TitleA Design Method for Single-Rail LUT Cascades
Author*Tsutomu Sasao (Meiji University, Japan)
KeywordFPGA, cascade, LUT, decomposition
AbstractThis paper presents a method to realize logic functions by single-rail LUT cascades. Main results include: 1) Any 2m-variable function can be realized by a single-rail cascade with (m+1)-LUTs. The number of LUTs is at most 2^{m+1}-1. There exists a (2m+1)-variable function that cannot be realized by the single-rail cascade with (m+1)-LUTs. 2) When a 2m-variable function has a functional decomposition f(X1,X2), where X1 and X2 have m variables, and the column multiplicity of the decomposition is \mu, the number of LUTs can be reduced to 2\mu-1. 3) Any n-variable function can be realized by a single-rail cascade with seven (n-1)-LUTs. 4) Ad-hoc methods to realize an n-variable function using two or three (n-1)-LUTs.

R2-9
TitleImplementation of Interrupt Handlers in Full Hardware Implementation of RTOS-Based Systems
Author*Yuki Nakatani, Nagisa Ishiura (Kwansei Gakuin University, Japan), Hiroyuki Tomiyama, Hiroyuki Kanbara (Ritsumeikan University, Japan)
KeywordRTOS, Handler, Full Hardware
AbstractThis paper proposes a method for implementing interrupt handling in the context of a fully hardware-implemented RTOS-based system. To enhance the responsiveness of real-time systems, previous research by Oosako et al. has explored the complete hardware implementation of RTOS functionalities and tasks. Ando has proposed an architecture that enables task hardwareization using general-purpose high-level synthesis; however, this approach does not address interrupt handling. In this paper, we present a hardware implementation of alarm handlers, cyclic handlers, and interrupt handlers. Each handler is associated with a dedicated timer that counts down and triggers the handler when the counter reaches zero, thereby simplifying the control logic. The interrupt handler is designed to accelerate activation by evaluating invocation conditions in parallel. Service calls related to handlers are implemented by updating or referencing specific status registers. Experimental results demonstrate that, in the proposed RTOS hardware, alarm and interrupt handlers can be triggered within 1 and 2 cycles, respectively, after their conditions are satisfied. Furthermore, all service calls related to handlers can be executed within 5 cycles.

R2-10
TitleAn Error Diagnosis Technique Applicable to Single Line Errors Based on Location Variable Simulation
Author*Kazuki Sakamoto, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan)
KeywordError diagnosis, ECO
AbstractThis paper presents an error diagnosis technique applicable to single line errors based on the location variable (LV) simulation and the truth variable (TV) simulation. The LV-simulation, originally proposed for diagnosing functional errors, has been extended to diagnose signal line errors to efficiently diagnose both functional and single line errors. Experimental results have shown that the proposed technique reduces the processing time by 99.7% for the C7552 benchmark circuit and by 73.7% for the C5315 benchmark circuit compared to the conventional error diagnosis technique applicable to single line errors.

R2-11
TitleMulti-Objective Optimization of RESURF Structure in SiC MOSFET for I-V and C-V characteristics
Author*Sota Oyama (Hirosaki University, Japan), Ichirota Takazawa (Jedat Inc., Japan), Satoru Honda, Toshiki Kanamoto (Hirosaki University, Japan)
KeywordPower, SiC, MOS, Optimization, AI
AbstractThis paper proposes an optimization method for the vertical RESURF structure implemented in SiC trench MOSFETs, which are utilized as one of the most energy-efficient devices for automotive power modules. Regarding energy efficiency, the RESURF structure is introduced to enable operation at higher voltages. The most critical parameter of the RESURF structure is the thickness of the P-type vertical RESURF region surrounding the N-drift layer near the drain. We leverage artificial intelligence to optimize the thickness for the desired electrical characteristics. Our previous work searched for the thickness to meet the desired current-voltage (I-V) characteristics by applying simulated annealing. In this paper, we further enhance the method to also optimize for capacitance-voltage (C-V) characteristics. First, we model the relationship between the output characteristics and the layer thicknesses using machine learning. Based on the resulting model, we then employ simulated annealing to identify the optimal thicknesses that satisfy the desired I-V as well as C-V characteristics. Experimental results demonstrate that the proposed method successfully attains the Pareto-optimal front with respect to both I-V and C-V characteristics, achieving an adjusted coefficient of determination greater than 0.95 in the regression analysis.

R2-12
TitleA Systematic Hardware Solution for GDPR Compliance
AuthorYi-Chun Yang, *Ren-Song Tsay (National Tsing-Hua University, Taiwan)
KeywordAccess control, accountability, data ownership, GDPR, hardware security
AbstractThe increasing deployment of Internet of Things (IoT) devices presents significant challenges for compliance with the General Data Protection Regulation (GDPR) due to their inherent privacy concerns and often opaque operational nature. This paper introduces GDPR-Guard, a novel and systematic hardware-based solution embedded within IoT devices to ensure GDPR compliance. By shifting control from enterprises to users through a transparent "glass box" approach, GDPR-Guard enhances accountability and transparency by auditing the entire device lifecycle from manufacturing. This paper details the architecture and functionality of the GDPR-Guard hardware component, its integration into the device manufacturing process under supervisory authority (SA) oversight, and its mechanisms for enforcing consent-based access control and generating tamper-proof audit records. A proof-of-concept implementation and security/performance evaluations demonstrate the feasibility and effectiveness of GDPR-Guard as a systematic hardware solution for achieving GDPR compliance in IoT networks.

R2-13
TitleReducing Registers in Convolution Operation for Binarized Neural Networks with Register-Bridge LSI Architecture
Author*Jun Masuda, Kazuhito Ito (Saitama University, Japan)
KeywordMachine learning, BNN, LSI
AbstractConvolutional neural networks are widely used for machine learning tasks such as image recognition. Binarized neural networks (BNNs), which binarize data and convolution weights, are advantageous in terms of reducing power consumption and miniaturizing implementations. In this paper, a method to reduce the number of registers required to store data and weights in LSI implementations of BNNs is proposed using the register-bridge architecture. The number of register bits was reduced by 44% compared to the conventional architecture.

R2-14
TitleExtending the Single-Target Droplet Generation Method CoDOS to Multi-Target Synthesis
AuthorYusuke Igarashi, *Shigeru Yamashita (Ritsumeikan University, Japan)
KeywordCoDOS, Multi-Target
AbstractThis paper presents an enhanced method based on CoDOS, a droplet generation algorithm for DMFB biochips, to efficiently generate multiple droplet types. By optimizing droplet generation, our approach reduces reagent usage and operational costs. Compared to manual synthesis and dilution, our method offers improved automation and cost-effectiveness. Experimental results demonstrate approximately a 1% improvement in droplet efficiency.



Friday, October 10, 2025

[To Session Table]

Keynote Speech II
Time: 9:20 - 10:30, Friday, October 10, 2025

K2-1 (Time: 9:20 - 10:30)
Title(Keynote Speech) New Optical Path Design Trend on the IOWN Global Forum Open All-Photonics Network: Background, Application, and Key Enablers
AuthorHideki Nishizawa (NTT Network Innovation Laboratories, Japan)


[To Session Table]

Regular Poster Session III
Time: 10:30 - 12:00, Friday, October 10, 2025

R3-1
TitleCross-Design Power Trace Prediction using Graph Neural Network
AuthorShih-Chun Lin, *Bo-Hao Haung, Yung-Chih Chen (National Taiwan University of Science and Technology, Taiwan), Wang-Dauh Tseng (Yuan Ze University, Taiwan)
KeywordPower estimation, machine learning, graph neural network
AbstractAccurate cycle-by-cycle power estimation plays a critical role in the early stages of chip design, facilitating power, performance, and area (PPA) optimization and accelerating time-to-market. Recently, machine learning (ML)-based methods have emerged to strike a balance between estimation speed and accuracy compared to traditional electronic design automation (EDA) tools. However, existing cycle-based ML approaches typically require model retraining for each new design, significantly limiting their general applicability and efficiency during early-stage design exploration. To overcome this challenge, this paper introduces a graph neural network (GNN)-based estimator specifically developed for gate-level cycle-by-cycle power prediction, aiming for cross-design generalizability. By encoding standard cell types derived directly from the design library into node embeddings, our proposed model effectively generalizes to unseen circuit designs without retraining. Experimental results demonstrate that our GNN-based estimator offers significantly faster cycle-by-cycle power estimation compared to commercial EDA tools, while maintaining a high level of accuracy. This enables more practical and efficient power analysis in modern VLSI design flows.

R3-2
TitleILP-Based Movable Layout Replacer for Standard Cells with Extending Metal Boundaries
AuthorYa-Chu Yang, Shih-Sian Tang, *Chen-Chen Yeh, Shao-Chien Lu, Hui-Lin Cho, Yu-Cheng Lin, Rung-Bin Lin (Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, Taiwan)
KeywordStandard cell, library, integer linear programming
AbstractWe introduce a new standard cell architecture with extended metal boundaries, based on the ASAP7 PDK, to improve routability in complex designs. By integrating these cells with vertically shrunk counterparts and applying a layout replacer based on integer linear programming that allows cell movement, our method achieves average routing overflow reductions of 4.08% and 13.75% for fixed and movable layouts, respectively, without any DRC violations.

R3-3
TitleCompact QUBO Formulation of Resource-Constrained Operation Scheduling in High-Level LSI Design
Author*Haruki Yamagishi, Takuto Kishimoto, Kazuhito Ito (Saitama University, Japan)
KeywordIsing model, QUBO, scheduling, LSI
AbstractResource-constrained operation scheduling in LSI design determines the start time of operation execution so as to satisfy precedence constraints, and is known to be an NP-hard combinatorial optimization problem. By formulating the operation scheduling problem as a QUBO model, it is possible to search for an optimal operation schedule through parallel solution of the QUBO model. This paper proposes a QUBO formulation for operation scheduling that reduces the number of variables. As a result, the number of variables was reduced by up to 65%, and the solving time was reduced by up to 81%.
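For readers unfamiliar with QUBO models, the following is a minimal sketch of a generic one-hot constraint, not the compact formulation proposed in R3-3. It assumes a single operation that must start in exactly one of three time steps; the function names and the brute-force search are illustrative only.

```python
from itertools import product


def one_hot_qubo(n, penalty=1.0):
    """QUBO coefficients for penalty * (sum_t x_t - 1)^2, constant term dropped."""
    q = {}
    for i in range(n):
        # For binary x_i: x_i^2 - 2*x_i == -x_i, so the diagonal term is -penalty.
        q[(i, i)] = -penalty
        for j in range(i + 1, n):
            q[(i, j)] = 2.0 * penalty
    return q


def energy(q, x):
    """Evaluate the QUBO objective for a binary assignment x."""
    return sum(c * x[i] * x[j] for (i, j), c in q.items())


def brute_force_minima(q, n):
    """Enumerate all 2^n assignments and return the minimum energy and its minimizers."""
    best, argmin = None, []
    for x in product((0, 1), repeat=n):
        e = energy(q, x)
        if best is None or e < best:
            best, argmin = e, [x]
        elif e == best:
            argmin.append(x)
    return best, argmin


# x_t = 1 means "the operation starts at time step t" (hypothetical encoding).
Q = one_hot_qubo(3)
best_energy, minima = brute_force_minima(Q, 3)
```

The minimizers are exactly the assignments with a single 1, which is why reducing the number of such binary variables directly shrinks the search space.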

R3-4
TitleEfficient FPGA Implementation of Multiple-Input Adders Using Generalized Parallel Counter (6,0,7;5)
Author*Mugi Noda, Ryo Kanai, Nagisa Ishiura (Kwansei Gakuin University, Japan)
KeywordGPC, FPGA, neural networks, multi-input adders
AbstractThis paper proposes an efficient method for implementing multi-input adders on FPGAs, which are essential components in multipliers and neural networks, by hierarchically connecting 6-input 2-output adders. One major approach for FPGA implementation of multi-input adders involves constructing a tree of carry-save adders using Generalized Parallel Counters (GPCs) optimized through integer linear programming. However, many GPCs with a lowest-level input of 7 often consume two units of FPGA slices (basic FPGA components), resulting in reduced efficiency. In this research, we utilize the GPC (6,0,7;5), which achieves the highest bit reduction rate and can be implemented in a single slice only when the carry outputs are chain-connected. By cascading this GPC, we construct a 6-input, 2-output adder. These adders are then arranged into a carry-save tree structure to perform multi-input addition. Based on this method, we designed circuits to add m binary numbers of n bits for n = 16, 32, 64 and m = 16, 32, ..., 512, targeting the Xilinx 7 Series FPGA. The results demonstrate that, on average, the proposed method reduced the circuit area by 8.9% and the critical path delay by 6.7%, and reduced the time required for circuit construction to less than 0.001% of that of conventional methods.
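As a reading aid, the notation GPC (6,0,7;5) can be illustrated with a small behavioural model. This is a sketch under the usual GPC convention (input counts listed from the highest-weight column down to weight 1, then the number of output bits); it says nothing about the slice-level mapping in R3-4, and the function names are hypothetical.

```python
def gpc_6_0_7_5(col0_bits, col2_bits):
    """Behavioural model of GPC (6,0,7;5).

    col0_bits: 7 input bits of weight 1
    col2_bits: 6 input bits of weight 4
    (the middle column, weight 2, has no inputs)
    Returns the 5-bit sum, least significant bit first.
    """
    assert len(col0_bits) == 7 and len(col2_bits) == 6
    total = sum(col0_bits) + 4 * sum(col2_bits)
    return [(total >> i) & 1 for i in range(5)]


def bits_to_int(bits):
    """Convert an LSB-first bit list to an integer."""
    return sum(b << i for i, b in enumerate(bits))


# Largest possible sum: 7*1 + 6*4 = 31, which just fits in 5 output bits.
max_out = bits_to_int(gpc_6_0_7_5([1] * 7, [1] * 6))
```

The counter therefore compresses 13 input bits into 5 output bits while using the full 5-bit output range.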

R3-5
TitleGenetic Algorithm-based Layer-wise Adaptive Filter Pruning
AuthorTing-Yi Liu, Yi-Ting Li, Wuqian Tang (National Tsing Hua University, Taiwan), *Yung-Chih Chen (National Taiwan University of Science and Technology, Taiwan), Shih-Chieh Chang, Chun-Yao Wang (National Tsing Hua University, Taiwan)
Keywordmodel pruning, genetic algorithm, filter pruning
AbstractModel pruning reduces model size and inference cost by removing redundant weights, channels, or filters. However, existing filter pruning methods often rely on fixed heuristics and global criteria, limiting their adaptability. They also require manually set pruning ratios, which are inefficient to tune. To address these issues, we propose a genetic algorithm-based pruning approach that automatically determines layer-wise pruning ratios and criteria, enabling adaptive, data-driven compression without manual intervention.

R3-6
TitleEvaluation of Free-form Conversation Learning Effects in a Tsugaru Dialect Speech Recognition Model
Author*Akihiro Murakami, Masashi Imai (Hirosaki University, Japan)
KeywordTsugaru dialect, Speech recognition system, Free-form conversation, Artificial Intelligence
AbstractThe Tsugaru dialect, which is a regional vernacular of Aomori Prefecture, can pose communication challenges between local residents and individuals from outside the prefecture. We are conducting research aimed at developing a bidirectional speech and text conversion system between the dialect and standard Japanese using artificial intelligence. This paper presents the results of evaluating the impact of training on spontaneous Tsugaru dialect speech data to improve automatic speech recognition accuracy.

R3-7
TitleHigh-Speed SIFT Descriptor Generation with 36 Small-Region Division and Logic-Synthesis Evaluation
Author*Ayumu Mitsumoto, Tetsuo Hironaka (Hiroshima City University, Japan)
KeywordSIFT, Acceleration, Image Processing, Architecture, Hardware
AbstractSIFT’s rotation operation is sequential and therefore becomes a bottleneck in hardware implementations. We divide the descriptor region into 36 small regions, compute orientation histograms independently, and sum them into 17 subregion histograms, enabling 36-way parallel processing. This enables fast execution and high matching accuracy under accuracy-oriented parameter settings. Furthermore, we present the RTL design of the feature descriptor generator and perform logic synthesis with FreePDK45, achieving up to 31.8 times speedup compared with the method proposed in previous research.

R3-8
TitleNumberlink Problem Variants Modeled after FPGA Routing Fabrics and their Solvers that Enumerate all the Solutions
Author*Ryohei Komi, Hiroyuki Ochi (Ritsumeikan University, Japan)
Keywordcombinatorial problem, zero-suppressed binary decision diagram, field programmable gate array, design space exploration of routing architecture
AbstractIn this study, we define numberlink problem variants that mimic the routing fabrics of FPGAs, and develop solvers that enumerate all their solutions. The target FPGA architectures are early SRAM-based FPGAs and via-switch FPGAs. The existing method, which uses a top-down ZDD construction method (TdZdd), efficiently enumerates all solutions to the numberlink problem; however, it is specialized for planar grid-based routing problems. In this study, we extend the algorithm to multi-layer problems, targeting actual FPGAs where horizontal and vertical segments may overlap without intersections.

R3-9
TitleImplementation and Evaluation of a Speculative Execution-Based FPGA Accelerator for Electronic Circuit Simulation Using Gauss-Jordan and BiCGSTAB Methods
Author*Yuma Omoto, Atsushi Kubota, Tetsuo Hironaka (Hiroshima City University, Japan)
KeywordFPGA, Electronic Circuit Simulator, Speculative Execution, GJE Method, BiCGSTAB Method
AbstractThis paper proposes a speculative execution system on FPGA to accelerate circuit simulation by combining Gauss–Jordan Elimination (GJE) and the BiCGSTAB method. Both solvers run in parallel, and the first to converge is adopted to reduce latency. Experiments on 20×20, 40×40, 80×80, and 160×160 matrices measured latency and resource use to evaluate scalability. A prototype on a Xilinx ZCU104 FPGA was further tested with a 20×20 circuit. Results show that GJE is effective for small problems, while BiCGSTAB achieves higher efficiency for larger dimensions and certain conditions, confirming the benefit of the proposed speculative execution approach.

R3-10
TitleEfficient and Accurate SC Arithmetic Circuits Using Bit Manipulation Based on Interval Partitioning of Bit Strings
Author*Yota Yanagida, Shigeru Yamashita (Ritsumeikan University, Japan)
Keywordstochastic computing, bit-shuffling
AbstractStochastic computing (SC) is an approximate computation method using Stochastic Numbers (SNs), which are expressed based on the probability of 1s in a bitstream. Generating low-correlated SNs generally requires the use of independent Linear Feedback Shift Registers (LFSRs), which increases circuit area. In this paper, an SC arithmetic circuit using a single LFSR and bit-shuffling with bit manipulation based on interval partitioning is proposed. The proposed method is applied to an SC arithmetic circuit and the computational accuracy and circuit area are evaluated. The proposed method outperforms the conventional method using independent LFSRs in both computational accuracy and circuit area for SC arithmetic circuits.

R3-11
TitleLogic Gate Design Using Vertical Nanowire Transistors
Author*Genta Nakamura (Kyushu University, Japan), Katsuhiro Tomioka (Hokkaido University, Japan), Koji Inoue (Kyushu University, Japan)
KeywordLogic Gate Design, TFET, Novel Device, Low Power Consumption
AbstractReducing power consumption of VLSI systems has become increasingly important. Although power consumption can be reduced by lowering the supply voltage, it is difficult to lower the supply voltage of conventional MOSFETs below 0.6 V. In this study, we propose structures of logic gates using a novel device, VGAA-TFET (Vertical Gate-All-Around Tunnel Field Effect Transistor), which is capable of operating at supply voltages lower than the minimum voltage limit of conventional MOSFETs, and evaluate the proposed structures in terms of area and wire-length.

R3-12
TitleSubitizing-Inspired Large Language Models for Floorplanning
AuthorChen-Chen Yeh, *Shao-Chien Lu, Hui-Lin Cho, Yu-Cheng Lin, Rung-Bin Lin (Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, Taiwan)
Keywordfloorplanning, large language model, electronic design automation
AbstractWe present a novel approach to solving the floorplanning problem by leveraging fine-tuned Large Language Models (LLMs). Inspired by subitizing, the human ability to instantly count small numbers of items at a glance, we hypothesize that LLMs can similarly address floorplanning challenges accurately. Our experimental results demonstrate that LLMs achieve high success and optimal rates while attaining relatively low average dead space. These findings underscore the potential of LLMs as promising solutions for complex optimization tasks in VLSI design.

R3-13
TitleReinforcement Learning-Based Loop Optimization Using the Polyhedral Model
Author*Hayato Takahashi, Motoki Amagasaki, Masato Kiyama, Kenshu Seto, Mery Diana (Kumamoto University, Japan)
KeywordHigh-Level Synthesis, Polyhedral Model, Loop Optimization, Reinforcement Learning
AbstractIn current high-level synthesis technology, code optimization is often required in order to improve the performance of the generated hardware. A program typically contains very many instructions in loops and especially nested loops, thereby degrading its performance considerably, so it is necessary to execute nested loops efficiently. As a formal framework for the structure of a program, the polyhedral model facilitates the construction of a complete nested loop. However, it remains challenging to obtain an optimal solution that minimizes the number of instruction executions. Therefore, this paper reports the use of reinforcement learning to search effectively for the optimal fully nested loop. Performance evaluation is conducted by analyzing the number of instructions executed in the generated code and the circuit synthesized by the high-level synthesis tool.

R3-14
TitleEvaluation of FPGA Development Boards in a Cryogenic Environment
Author*Tomoki Takashima, Masashi Imai (Hirosaki University, Japan)
KeywordSuperconducting quantum computer, FPGA, Cryogenic environment, Asynchronous circuit
AbstractThe practical application of very large-scale superconducting quantum computers is highly anticipated. Although the use of FPGAs in cryogenic environments has been proposed for their control circuits, its feasibility has not been sufficiently evaluated. This study evaluates the performance of FPGA development boards under cryogenic conditions. In addition, the effectiveness of using asynchronous circuits is also evaluated to address changes in circuit characteristics under cryogenic conditions. The results show that the FPGA development boards do not operate normally at 4 K. This paper presents the causes of failure and possible countermeasures based on changes in power consumption and delay time prior to the failure.


[To Session Table]

Invited Talk II
Time: 13:30 - 14:30, Friday, October 10, 2025

I2-1 (Time: 13:30 - 14:30)
Title(Invited Talk) A Symbolic Approach to Exact Quantum Circuit Simulation and Verification
AuthorJie-Hong Roland Jiang (National Taiwan University, Taiwan)


[To Session Table]

Regular Poster Session IV
Time: 14:30 - 16:00, Friday, October 10, 2025

R4-1
TitleCombatting Transient Errors and Aging in Heterogeneous Multicores: A Framework for Reliable and Energy-Efficient Task Deployment
AuthorYin-Rong Zhuo, *Yu-Guang Chen (National Central University, Taiwan), Zheng-Wei Chen (National Taiwan University, Taiwan), Ing-Chao Lin (National Cheng Kung University, Taiwan)
KeywordHeterogeneous multicore systems, aging effects, DVFS, energy efficient
AbstractCMOS advancements allow ICs to handle critical tasks, but balancing performance and energy efficiency challenges edge devices. Heterogeneous multicore systems optimize energy while meeting performance demands. Yet, transient errors and aging effects degrade reliability. Strategies like task replication and DVFS address these, but poor integration shortens lifespan. We introduce a reliability-aware task deployment framework to extend lifespan, ensure reliability, and reduce energy. Experiments show a 4.98× lifespan increase and 44% energy reduction compared to prior methods.

R4-2
TitleHW/SW Co-Design for Efficient GPT-2 Inference on FPGA via High-Level Synthesis
AuthorShao-Tang Sung, Yi-Wen Tang, Fen-Yu Hsieh, Rong-Yi Lin, *Fang-Yu Hsu, Chih-Tsun Huang (National Tsing Hua University, Taiwan)
KeywordHigh-Level Synthesis, Large Language Model, GPT-2, Hardware, FPGA
AbstractThis paper presents an efficient hardware accelerator for GPT-2 inference on an AMD Alveo U280 FPGA using High-Level Synthesis (HLS). With row-wise GEMM scheduling, we optimize GPT-2 components via data packing, loop unrolling, and kernel fusion. Our design achieves a 2.16x speedup over CPU while consuming only 23% of the power, demonstrating a scalable and sustainable solution for deploying Transformer-based models in resource-constrained environments.

R4-3
TitleOptimization of Power, Area, and Slack via Multi-bit Flip-Flop Generation
Author*Chi Hsu, Yi-Ting Li, Woei-Haur Hung, Chun-Yao Wang, Ting-Chi Wang (National Tsing Hua University, Taiwan)
KeywordMulti-bit Flip-flop, VLSI Optimization, Timing Optimization, Power-Area Trade-off
AbstractPower and area are key objectives to minimize in modern circuit design. Multi-bit flip-flops (MBFFs) are commonly used to reduce clock load and layout area but may increase total negative slack (TNS) and degrade timing. Balancing these metrics remains a critical challenge. We propose an approach that dynamically adjusts banking and debanking of flip-flops to simultaneously optimize these metrics across different scenarios. Experiments on the test cases of the ICCAD 2024 contest [1] show that on average, our approach reduces the value of the objective function by 7.8%, and shortens the execution time by 41% compared to the first-place winner, demonstrating both effectiveness and efficiency.

R4-4
TitleError Recovery in MEDA Biochips Using Deep Reinforcement Learning with Electrode Health Awareness
Author*Yash Gupta, Purrnima Singh (Netaji Subhas University of Technology, New Delhi, India), Syed Rameem Zahra (Sher-e-Kashmir University of Agricultural Sciences and Technology, Kashmir, J&K, India), Ankur Gupta (Netaji Subhas University of Technology, New Delhi, India), Shigeru Yamashita (Ritsumeikan University, Japan)
KeywordSwin Transformer, Proximal Policy Optimization, Temporal Graph Transformer, Graph neural network - ordinary differential equation, Biochip life-span
AbstractMicro-Electrode-Dot-Array (MEDA) biochips offer flexible, scalable solutions for applications like diagnostics and DNA sequencing. However, frequent hardware issues limit their reliability. To address this, we introduce a deep reinforcement learning (DRL)-based framework that adapts in real time using sensor feedback. Uniquely, it also accounts for electrode-health degradation, improving system resilience. This intelligent, adaptive approach enhances the biochip’s reliability, making it more suitable for real-world, long-term use.

R4-5
TitleA Stochastic Number Comparator by Utilizing Positive Correlation
Author*Nao Shinoda, Zhou Songyu, Shigeru Yamashita (Ritsumeikan University, Japan)
KeywordStochastic Computing, Comparator, Positive Correlation
AbstractStochastic Computing (SC) is an approximate computation method that encodes values by the probability of 1s in bit streams. One possible application of SC is image processing, where median filters are widely used for noise reduction. To implement SC median filters, we need to compare stochastic numbers (SNs). However, conventional SC comparison circuits typically rely on up-down counters, leading to a large circuit area. Considering the above, this paper proposes a compact SN comparator that exploits the positive correlation between SNs. The proposed design reduces the circuit area by approximately 91.0%, from 1581 μm² to 142 μm², compared to conventional approaches.
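To illustrate what positive correlation between SNs means, here is a minimal sketch, not the comparator proposed in R4-5. It assumes both SNs are generated by comparing their values against the same random sequence, which makes them maximally positively correlated; Python's software random generator stands in for a hardware source, and all names are illustrative.

```python
import random


def to_sn(p, randoms):
    """Encode probability p as a bit stream: a bit is 1 where the random value is below p."""
    return [1 if r < p else 0 for r in randoms]


def value(sn):
    """Decode a stochastic number as the fraction of 1s in its bit stream."""
    return sum(sn) / len(sn)


rng = random.Random(0)
shared = [rng.random() for _ in range(1024)]  # one random source shared by both SNs

x = to_sn(0.3, shared)
y = to_sn(0.7, shared)

# With a shared source, every 1 in x is also a 1 in y, so
# bitwise AND reproduces the smaller SN and bitwise OR the larger one.
and_stream = [a & b for a, b in zip(x, y)]
or_stream = [a | b for a, b in zip(x, y)]
```

Under this assumption, a single gate yields the minimum or maximum of two SNs, which is the kind of property a correlation-based comparator can build on.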

R4-6
TitleConcurrent Detection of Multiple Thermal Fault Injection Attacks on Optical Neural Networks
Author*Kota Nishida, Yoshihiro Midoh, Noriyuki Miura (The University of Osaka, Japan), Satoshi Kawakami (Kyushu University, Japan), Alex Orailoglu (University of California, San Diego, USA), Jun Shiomi (The University of Osaka, Japan)
KeywordSilicon Photonics-based AI Accelerator (SPAA), Optical Neural Network (ONN), Thermal Fault Injection Attack
AbstractOptical Neural Networks (ONNs) have been regarded as a promising paradigm providing high computational efficiency for artificial intelligence-based applications. Although ONNs have been widely studied to maximize their computational efficiency, their physical security has only recently received attention. This paper tackles thermal fault injection attacks on Silicon Photonics AI Accelerators (SPAAs), which tamper with optical signals in SPAAs to cause misprediction. A concurrent detection method for thermal fault injection attacks is proposed in this paper. The proposed method achieves over 96% attack-caused average misprediction recall with no significant hardware and computational overhead.

R4-7
TitleA design hackathon to bridge AI and hardware
Author*Hideharu Amano, Takao Goto, Mizuho Nitami, Yuki Mitarai, Jiawei Yu, Yuxuan Pan, Atsutake Kosuge, Makoto Ikeda (The University of Tokyo, Japan)
KeywordYOLOv3, Design Hackathon, FPGA, Vitis-AI, SoM
AbstractThis paper introduces a design hackathon approach and case study that allows beginners in hardware design to compete in terms of performance and accuracy using YOLO, a representative object detection method. Rather than optimizing the hardware itself, the focus is on employing system-level optimization techniques on a low-cost KV260 board. Even students from non-technical backgrounds were able to achieve several-fold performance improvements with the help of tools like ChatGPT.

R4-8
TitleA Study of Image Classifier Combining In-pixel Array Operations and Digital Matrix Operations in Image Sensors
Author*Takeshi ENOMOTO, Kota IMAGAWA, Kota YOSHIDA, Shunsuke OKURA (Research Organization of Science and Engineering, Ritsumeikan University, Japan)
KeywordCMOS image sensor, image classification, systolic array, convolutional neural networks
AbstractToward the IoT era, where a vast number of sensors and AI-driven analysis are deployed, this study proposes an on-chip image classification system that integrates lightweight neural networks within CMOS image sensors (CIS). The system combines in-pixel and in-column analog convolution with digital matrix computations, enabling data processing directly at the sensor level. This approach reduces communication overhead and system power consumption compared to MCU-based classification, thereby enhancing efficiency and real-time performance. To evaluate the feasibility of the proposed on-chip image classification system, we evaluated the matrix computation circuit area and the classification accuracy through software simulations. In the circuit area evaluation, the matrix computation circuit was designed to occupy less than 10% of the total area of the CIS chip. The matrix computation circuit supports up to 10 classification classes for a 1:1 aspect ratio image when the PE array column count is 8 or 16, and up to 5 classes with 32 columns. For a 1:2 aspect ratio image and 2-class classification, a 32-column PE array can be supported. Under a given area constraint, the image classification accuracy for the MNIST, Fashion-MNIST, and INRIA-Person datasets reached 88.75%, 79.91%, and 83.79%, respectively.

R4-9
TitleLMESN: A Low-Power Hardware Reservoir Computing Architecture Based on MOSFET Leakage Variation
Author*Masami Utsunomiya, Hiroya Murata, Ryuto Seki (Kyoto University, Japan), Haoyuan Li (Xi’an Jiaotong University, China), Hiromitsu Awano, Takashi Sato (Kyoto University, Japan)
KeywordReservoir Computing, Echo State Network, Edge Computing, Leakage Current, Analog Computing
AbstractWe propose LMESN, a low-power hardware reservoir computing architecture that leverages variations in MOSFET leakage currents to perform analog computation. In LMESN, input values and internal states are encoded as pulse widths, and computation is achieved by discharging capacitors through the leakage currents of MOSFETs. By eliminating the analog peripheral circuits required in conventional MOSFET-based Echo State Networks (ESNs), such as operational amplifiers, LMESN achieves a simplified design with improved energy efficiency. Each circuit component is designed and validated using SPICE simulations based on a commercial 22 nm process, and the extracted characteristics are incorporated into a Python-based inference model via lookup tables. Evaluation experiments on two time-series classification datasets, JapaneseVowels and PenDigits, demonstrate that LMESN achieves classification accuracy comparable to software-based ESNs. Power consumption estimates highlight that cell leakage currents are the primary contributors to energy use, underscoring the importance of leakage control for further power reduction. Overall, LMESN offers a promising hardware platform that balances miniaturization, low power consumption, and competitive inference performance, making it well-suited for real-time analog time-series processing in edge devices.

R4-10
TitleComparison of latch-based circuit and flip-flop-based circuit in actual device
Author*Kenji TAKAHASHI, Tadaaki TANIMOTO, Keizo HIRAGA, Masayuki HAYASHI, Takato INOUE, Kazuhiro BESSHO, Toshimasa SHIMIZU (Sony Semiconductor Solutions Corporation, Japan)
KeywordLatch, Power consumption, Maximum operating frequency characteristic, Minimum operating voltage characteristic, Actual device measurement
AbstractThis paper reports the comparison results of current consumption, maximum operating frequency characteristics (Fmax), and minimum operating voltage characteristics (Vmin) of latch-based and flip-flop-based circuits. The latch-based circuits, under certain conditions, consume less current, have a higher Fmax at the same voltage, and a lower Vmin at the same frequency. These results show that latch-based circuits can reduce power at the same frequency as flip-flop-based circuits.

R4-11
TitleWafer to Lot-level S-parameter Prediction in Radio Frequency Testing Using Radial Basis Function Neural Network
Author*Huimin WANG, Yasuhiko Iguchi, Chika Tanaka (Kioxia Corporation, Japan)
KeywordS-parameter, Radial Basis Function Neural Network, Machine Learning, RF test
AbstractTo reduce production testing costs without sacrificing quality, wafer to lot-level performance prediction has gained traction as a key enabler for production tests. Although many effective prediction methods for physical properties have been proposed, there are few methods that can predict S-parameters accurately. In this study, we propose a novel S-parameter prediction method based on a Radial Basis Function neural network to make accurate predictions using a minimal data set.

R4-12
TitleStriking Force Estimation on a Punching Bag Using IMU and Computer Vision
AuthorTsung-Han Lai, Ming-Qi Hsu, Yi-Ting Li, Wuqian Tang (National Tsing Hua University, Taiwan), Yun-Ju Lee (National Tsing Hua University, Taiwan), Yung-Chih Chen (National Taiwan University of Science and Technology, Taiwan), Wen-Hsin Chiu, *Chun-Yao Wang (National Tsing Hua University, Taiwan)
Keywordboxing, inertial measurement unit, computer vision
AbstractWe propose a striking force estimation system for punching bag training that combines an inertial measurement unit (IMU) and a vision-based pipeline to extract angular acceleration and impact location. A machine learning model is trained using data from a pneumatic impact device. Experimental results show that our method achieves accurate and robust predictions without interfering with the athlete’s movement, making it suitable for real-world training scenarios.

R4-13
TitleImproving Bokeh Simulation on CPUs: Faster Inference and Better Perception
Author*Chia-Lin Chang, Hao-Cheng Hsu, Cen-En Jian, Yu-Hui Huang (Yuan Ze University/Department of Electrical Engineering, Taiwan)
KeywordBokeh Simulation, CPU Inference, Model Optimization
AbstractWe present a set of lightweight optimization techniques to accelerate bokeh simulation on CPUs while improving perceptual quality. By integrating XLA compilation, multiprocessing, edge-aware interpolation, and weight pruning, our method achieves a 7.3% speedup over the baseline and improves both PSNR and SSIM. The results demonstrate practical value for CPU-only deployment scenarios.

R4-14
TitleCross-Modal Quantization of BLIP-2 Using Activation-Aware Weight Quantization
AuthorHui-Yun Deng, Chia-Yun Chiang, *Yu-Hui Huang (Yuan Ze University, Taiwan)
KeywordActivation-aware Weight Quantization (AWQ), BLIP-2, Cross-modal Inference, Vision-Language Models, Edge AI Deployment
AbstractThis work explores the extension of Activation-aware Weight Quantization (AWQ) to the multimodal BLIP-2 architecture, encompassing both the language and vision components. We apply AWQ to the OPT-2.7B model and adapt it for the EVA-ViT-G vision encoder by selectively quantizing only the Value projection within the fused QKV attention layers. Our method aims to enable efficient inference on edge devices with limited memory capacity. Evaluation on the COCO VQAv2 dataset in terms of memory consumption, inference latency, and performance shows that AWQ effectively reduces memory usage and preserves reasonable accuracy for the language module, whereas applying full quantization to both components yields greater efficiency at the cost of accuracy. This study highlights the trade-offs and design considerations involved in deploying cross-modal quantization strategies on resource-constrained hardware.