[3DTV/FTV/multi-view-related topics]
Title | Focus on Visual Rendering Quality through Content-Based Depth Map Coding |
Author | Emilie Bosc, Luce Morin, Muriel Pressigout (INSA of Rennes, France) |
Page | pp. 158 - 161 |
Keyword | 3D video coding, adaptive coding, depth coding |
Abstract | Multi-view video plus depth (MVD) data is a set of multiple sequences capturing the same scene from different viewpoints, together with their associated per-pixel depth values. Handling this large amount of data requires an effective coding framework. Yet a simple but essential question concerns the means of assessing the proposed coding methods. While the challenge in compression is the optimization of the rate-distortion trade-off, a widely used objective metric for evaluating the distortion is the Peak Signal-to-Noise Ratio (PSNR), because of its simplicity and mathematical tractability. This paper points out the reliability problem of this metric when estimating 3D video codec performance. We investigated the visual performance of two methods, namely H.264/MVC and the Locally Adaptive Resolution (LAR) method, by encoding depth maps and reconstructing existing views from the degraded depth images. The experiments revealed that lower coding efficiency, in terms of PSNR, does not imply lower rendering visual quality, and that the LAR method preserves the depth map properties correctly. |
Title | Bit Allocation of Vertices and Colors for Patch-Based Coding in Time-Varying Meshes |
Author | Toshihiko Yamasaki, Kiyoharu Aizawa (The University of Tokyo, Japan) |
Page | pp. 162 - 165 |
Keyword | 3DTV, Time-varying mesh (TVM), inter-frame compression, vector quantization (VQ) |
Abstract | This paper discusses the optimal bit rate assignment for vertices and colors, and for reference frames (I frames) and target frames (P frames), in the patch-based compression method for Time-Varying Meshes (TVMs). TVMs are non-isomorphic 3D mesh sequences generated from multi-view images. Experimental results demonstrate that the bit rate for vertices strongly affects the visual quality of the rendered 3D model, whereas that for color contributes little to the quality improvement. Therefore, as many bits as possible should be assigned to vertices, and 8-10 bits per vertex (bpv) is enough for color. In inter-frame coding, the bit rate for the target frames improves the visual quality proportionally, but it is also demonstrated that fewer bits (5-6 bpv) are enough to achieve the same visual quality as the intra-frames. |
Title | Motion Activity-Based Block Size Decision for Multi-view Video Coding |
Author | Huanqiang Zeng, Kai-Kuang Ma (School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore), Canhui Cai (Institute of Information Science and Technology, Huaqiao University, China) |
Page | pp. 166 - 169 |
Keyword | multi-view video coding, motion estimation, disparity estimation, block size decision, motion activity |
Abstract | Motion estimation and disparity estimation using variable block sizes have been exploited in multi-view video coding to effectively improve the coding efficiency, but at the expense of higher computational complexity. In this paper, a fast block size decision algorithm, called motion activity-based block size decision (MABSD), is proposed. In our approach, the various motion estimation and disparity estimation block sizes are classified into four classes, and only one class is chosen, according to the measured motion activity of the current macroblock, to further identify the optimal block size within that class. The motion activity is measured by the maximum city-block distance of a set of motion vectors taken from the adjacent macroblocks in the current view and its neighboring view. Experimental results have shown that, compared with the exhaustive block size decision that is the default approach in the JMVM reference software, the proposed MABSD algorithm achieves a reduction of computational complexity by 42% on average, while incurring only a 0.01 dB loss in peak signal-to-noise ratio (PSNR) and a 1% increase in total bit rate. |
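As a rough illustration of such a motion-activity measure, the sketch below computes the maximum pairwise city-block (L1) distance among neighboring motion vectors and maps it to one of four block-size classes. The helper names and class thresholds are hypothetical placeholders, not the values used by the MABSD algorithm.

```python
import numpy as np

def motion_activity(neighbor_mvs):
    """Maximum city-block (L1) distance among a set of motion vectors.

    neighbor_mvs: array-like of shape (N, 2) holding (dx, dy) motion vectors
    taken from adjacent macroblocks in the current view and a neighboring view.
    """
    mvs = np.asarray(neighbor_mvs, dtype=float)
    # pairwise L1 distances between all motion vectors
    diffs = np.abs(mvs[:, None, :] - mvs[None, :, :]).sum(axis=2)
    return diffs.max()

def select_block_size_class(activity, thresholds=(1.0, 4.0, 8.0)):
    """Map the measured motion activity to one of four block-size classes.
    The thresholds here are placeholders, not those of the paper."""
    t1, t2, t3 = thresholds
    if activity <= t1:
        return "class 0: 16x16 / SKIP"
    elif activity <= t2:
        return "class 1: 16x8 / 8x16"
    elif activity <= t3:
        return "class 2: 8x8 / 8x4 / 4x8"
    return "class 3: 4x4"

# example: motion vectors from four adjacent macroblocks
print(select_block_size_class(motion_activity([(0, 0), (1, -1), (2, 0), (5, 3)])))
```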
Title | Color Based Depth Up-Sampling for Depth Compression |
Author | Meindert Onno Wildeboer, Tomohiro Yendo, Mehrdad Panahpour Tehrani (Nagoya University, Japan), Toshiaki Fujii (Tokyo Institute of Technology, Japan), Masayuki Tanimoto (Nagoya University, Japan) |
Page | pp. 170 - 173 |
Keyword | depth up-sampling, depth coding, depth-map, FTV, 3DTV |
Abstract | 3D scene information can be represented in several ways. In applications based on an (N-)view plus (N-)depth representation, both view and depth data are compressed. In this paper we present a depth compression method containing a depth up-sampling filter that uses the color view as a prior. Our method of depth down-/up-sampling is able to maintain clear object boundaries in the reconstructed depth maps. Our experimental results show that the proposed depth re-sampling filter, used in combination with a standard state-of-the-art video encoder, can increase both the coding efficiency and the rendering quality. |
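The paper's exact filter is not reproduced here, but joint bilateral upsampling, in which the full-resolution color view weights the contribution of the low-resolution depth samples, is a common way to realize color-guided depth up-sampling. The sketch below is such a generic filter, with assumed parameter values, given only to make the idea concrete.

```python
import numpy as np

def joint_bilateral_upsample(depth_lr, color_hr, scale, sigma_s=2.0, sigma_c=10.0, radius=2):
    """Upsample a low-resolution depth map guided by a high-resolution color image.

    depth_lr : (h, w) low-resolution depth map
    color_hr : (H, W) full-resolution guidance image (a single luma channel here)
    scale    : integer upsampling factor, H = h * scale, W = w * scale
    """
    H, W = color_hr.shape
    h, w = depth_lr.shape
    out = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            yl, xl = y / scale, x / scale            # position in the low-res grid
            acc, wsum = 0.0, 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ys, xs = int(round(yl)) + dy, int(round(xl)) + dx
                    if not (0 <= ys < h and 0 <= xs < w):
                        continue
                    # spatial weight in the low-resolution grid
                    ws = np.exp(-((ys - yl) ** 2 + (xs - xl) ** 2) / (2 * sigma_s ** 2))
                    # range weight from the full-resolution color (guidance) image
                    yc, xc = min(ys * scale, H - 1), min(xs * scale, W - 1)
                    wc = np.exp(-((color_hr[y, x] - color_hr[yc, xc]) ** 2) / (2 * sigma_c ** 2))
                    acc += ws * wc * depth_lr[ys, xs]
                    wsum += ws * wc
            out[y, x] = acc / wsum if wsum > 0 else depth_lr[int(yl), int(xl)]
    return out
```

Because the range weight is taken from the color image, depth values are not averaged across color edges, which is what keeps object boundaries sharp in the reconstructed depth map.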
Title | Efficient Free Viewpoint Video-On-Demand Scheme Realizing Walk-Through Experience |
Author | Akio Ishikawa, Hiroshi Sankoh, Sei Naito, Shigeyuki Sakazawa (KDDI R&D Laboratories Inc., Japan) |
Page | pp. 174 - 177 |
Keyword | Freeviewpoint Television, Multi-view Video, Freeviewpoint Video Transmission, Walk-through, Multi-texturing |
Abstract | This paper presents an efficient video-on-demand (VOD) scheme for FTV and proposes a data format and its generation method to provide a walk-through experience. We employ a hybrid rendering approach that describes a 3D scene using 3D model data for objects and textures. In this paper we propose an efficient texture data format that removes the redundancy due to occlusion of objects by employing an orthogonal projection image for each object. The advantage of the data format is that it greatly simplifies the server-side selection of the transmitted images corresponding to the requested viewpoint. |
Title | 3-D Video Coding Using Depth Transition Data |
Author | Woo-Shik Kim, Antonio Ortega (University of Southern California, U.S.A.), Jaejoon Lee, HoCheon Wey (Samsung Electronics Co., Ltd., Republic of Korea) |
Page | pp. 178 - 181 |
Keyword | 3-D video coding, multiview plus depth, view synthesis |
Abstract | The objective is to develop a new 3-D video coding system that provides better coding efficiency with improved subjective quality. We analyzed rendered view distortions in a depth-image-based rendering (DIBR) system and found that the depth map coding distortion produces “erosion artifacts”, which lead to significant perceptual quality degradation. To solve this, we propose a solution using depth transition data, which indicates the camera position at which the depth changes. Simulation results show significant subjective quality improvement, with maximum PSNR gains of 0.5 dB. |
Title | Subjective Assessment of Frame Loss Concealment Methods in 3D Video |
Author | Joao Carreira, Luis Pinto, Nuno Rodrigues, Sergio Faria, Pedro Assuncao (Institute of Telecommunications / Polytechnic Institute of Leiria - ESTG, Portugal) |
Page | pp. 182 - 185 |
Keyword | 3D video, Frame loss concealment, Subjective Assessment |
Abstract | This paper investigates the subjective impact resulting from different concealment methods for coping with lost frames in 3D video communication systems. It is assumed that a high priority channel is assigned to the main view and only the auxiliary view is subject to either transmission errors or packet loss, leading to missing frames at the decoder output. Three methods are used for frame concealment under different loss ratios. The results show that depth is well perceived by users and the subjective impact of frame loss not only depends on the concealment method but also exhibits high correlation with the disparity of the original sequence. It is also shown that under heavy loss conditions it is better to switch from 3D to 2D rather than presenting concealed 3D video to users. |
Title | A Novel Upsampling Scheme for Depth Map Compression in 3DTV System |
Author | Yanjie Li, Lifeng Sun (Tsinghua University, China) |
Page | pp. 186 - 189 |
Keyword | Depth map compression, Resolution reduction, Upsampling, Rate-distortion |
Abstract | In this paper, we propose a novel two-step depth map upsampling scheme to address the depth map compression problem in 3DTV systems. The first step utilizes the full-resolution 2D color map to help reconstruct a more accurate full-resolution depth map. The second step further flattens the reconstructed depth map to ensure its local uniformity. Test results show that the proposed upsampling scheme achieves up to 2 dB coding gain for the rendering of free-viewpoint video and improves its perceptual quality significantly. |
[Beyond H.264/MPEG-4 AVC and related topics]
Title | Adaptive Direct Vector Derivation for Video Coding |
Author | Yusuke Itani, Shunichi Sekiguchi, Yoshihisa Yamada (Information Technology R&D Center, Mitsubishi Electric Corporation, Japan) |
Page | pp. 190 - 193 |
Keyword | direct mode, motion vector predictor, HEVC, H.264, extended macroblock |
Abstract | This paper proposes a new method for improving the direct prediction scheme employed in conventional video coding standards such as AVC/H.264. We extend the direct prediction concept to achieve better adaptation to the local statistics of the video source, assuming the use of motion blocks larger than the conventional macroblock size. Experimental results show that the proposed method provides up to 3.3% bitrate saving in low-bitrate coding. |
Title | Inter Prediction Based on Spatio-Temporal Adaptive Localized Learning Model |
Author | Hao Chen, Ruimin Hu, Zhongyuan Wang, Rui Zhong (Wuhan University, China) |
Page | pp. 194 - 197 |
Keyword | Inter prediction, STALL, LSP |
Abstract | Inter prediction based on block-matching motion estimation is important for video coding. However, this method suffers from the additional rate overhead of the motion information that needs to be transmitted to the decoder. To solve this problem, we present an improved implicit-motion-information inter prediction algorithm for P slices in H.264/AVC based on the spatio-temporal adaptive localized learning (STALL) model. In accordance with the 4x4 block transform structure in H.264/AVC, we first adaptively choose nine spatial neighbors and nine temporal neighbors, and design a localized 3D causal cube as the training window. Using this information, the model parameters can be adaptively computed with the least-squares prediction (LSP) method. Finally, we add a new inter prediction mode for P slices into the H.264/AVC framework. The experimental results show that our algorithm improves coding efficiency compared with the H.264/AVC standard, with a moderate increase in complexity. |
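The least-squares prediction step can be illustrated as follows: the prediction coefficients over a set of spatio-temporal neighbors are fitted by ordinary least squares inside a causal training window and then applied to the current sample. The neighbor layout and training data below are simplified placeholders, not the exact 3D causal cube of the paper.

```python
import numpy as np

def lsp_predict(train_targets, train_neighbors, cur_neighbors):
    """Least-squares prediction (LSP) from spatio-temporal neighbors.

    train_targets   : (M,) known sample values inside the causal training window
    train_neighbors : (M, K) the K spatio-temporal neighbor values of each training sample
    cur_neighbors   : (K,) neighbor values of the sample to be predicted
    """
    A = np.asarray(train_neighbors, dtype=float)
    b = np.asarray(train_targets, dtype=float)
    # model parameters a minimizing ||A a - b||^2, solved by least squares
    a, *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(np.dot(cur_neighbors, a))

# toy example: 3 neighbors, 8 training samples from the causal window
rng = np.random.default_rng(0)
A = rng.normal(size=(8, 3))
true_a = np.array([0.5, 0.3, 0.2])
b = A @ true_a
print(lsp_predict(b, A, np.array([1.0, 2.0, 3.0])))  # ~1.7 when the model fits exactly
```

Because the same fit can be repeated at the decoder from already-reconstructed samples, no motion information needs to be transmitted, which is the point the abstract makes.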
Title | Intra Picture Coding with Planar Representations |
Author | Jani Lainema, Kemal Ugur (Nokia Research Center, Finland) |
Page | pp. 198 - 201 |
Keyword | video coding, intra coding, planar coding, H.264/AVC, HEVC |
Abstract | In this paper we introduce a novel concept for intra coding of pictures that is especially suitable for representing smooth image segments. Traditional block-based transform coding methods cause visually annoying blocking artifacts in image segments with gradually changing smooth content. The proposed solution overcomes this drawback by defining a fully continuous surface of sample values approximating the original image. The gradient of the surface is indicated by transmitting values for selected control points within the image segment, and the surface itself is obtained by interpolating sample values between the control points. This approach is found to provide up to 30 percent bitrate reduction for natural imagery and has also been adopted into the initial HEVC codec design by JCT-VC. |
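A minimal sketch of the planar idea, assuming rectangular blocks with control points at the block corners: the block is filled by bilinear interpolation between the four corner values, so only the corner samples need to be signalled. This is an illustration of the concept only, not the exact planar mode adopted into HEVC.

```python
import numpy as np

def planar_block(top_left, top_right, bottom_left, bottom_right, size):
    """Fill a size x size block by bilinear interpolation of four corner control points."""
    t = np.linspace(0.0, 1.0, size)
    top = (1 - t) * top_left + t * top_right           # interpolate along the top edge
    bottom = (1 - t) * bottom_left + t * bottom_right  # interpolate along the bottom edge
    s = np.linspace(0.0, 1.0, size)[:, None]
    return (1 - s) * top[None, :] + s * bottom[None, :]  # blend rows top-to-bottom

# example: a smooth 8x8 gradient defined entirely by its four corners
print(np.round(planar_block(10, 50, 30, 90, 8)))
```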
Title | Adaptive Global Motion Temporal Prediction for Video Coding |
Author | Alexander Glantz, Andreas Krutz, Thomas Sikora (Technische Universität Berlin, Germany) |
Page | pp. 202 - 205 |
Keyword | H.264/AVC, video coding, global motion, temporal filtering, prediction |
Abstract | Depending on the content of a video sequence, the number of bits spent on the transmission of motion vectors can be enormous. A global motion model can be a better representation of movement in such regions than individual motion vectors. This paper presents a novel prediction technique based on global motion compensation and temporal filtering. The new approach is incorporated into H.264/AVC and outperforms the reference by up to 14%. |
Title | Highly Efficient Video Compression Using Quadtree Structures and Improved Techniques for Motion Representation and Entropy Coding |
Author | Detlev Marpe, Heiko Schwarz, Thomas Wiegand, Sebastian Bosse, Benjamin Bross, Philipp Helle, Tobias Hinz, Heiner Kirchhoffer, Haricharan Lakshman, Tung Nguyen, Simon Oudin, Mischa Siekmann, Karsten Suehring, Martin Winken (Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute, Germany) |
Page | pp. 206 - 209 |
Keyword | Video coding, H.265, HEVC |
Abstract | This paper describes a novel video coding scheme that can be considered a generalization of the block-based hybrid video coding approach of H.264/AVC. While the individual building blocks of our approach are kept as simple as in H.264/AVC, the flexibility of the block partitioning for prediction and transform coding has been substantially increased. This is achieved by the use of nested and pre-configurable quadtree structures, such that the block partitioning for temporal and spatial prediction, as well as the space-frequency resolution of the corresponding prediction residual, can be adapted to the given video signal in a highly flexible way. In addition, techniques for an improved motion representation as well as a novel entropy coding concept are included. The presented video codec was submitted to a Call for Proposals of ITU-T VCEG and ISO/IEC MPEG and was ranked among the five best performing proposals, both in terms of subjective and objective quality. |
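As a toy illustration of quadtree block partitioning, the sketch below splits a block recursively while its content is too non-uniform and the minimum size has not been reached. The variance-based split criterion and threshold are assumptions for illustration; a real codec such as the one described above uses a rate-distortion-driven decision.

```python
import numpy as np

def quadtree_partition(block, y=0, x=0, min_size=4, thresh=100.0, leaves=None):
    """Recursively split a square block into four quadrants while its sample
    variance exceeds a threshold. Returns a list of (y, x, size) leaf blocks."""
    if leaves is None:
        leaves = []
    size = block.shape[0]
    if size <= min_size or block.var() <= thresh:
        leaves.append((y, x, size))
        return leaves
    h = size // 2
    quadtree_partition(block[:h, :h], y, x, min_size, thresh, leaves)
    quadtree_partition(block[:h, h:], y, x + h, min_size, thresh, leaves)
    quadtree_partition(block[h:, :h], y + h, x, min_size, thresh, leaves)
    quadtree_partition(block[h:, h:], y + h, x + h, min_size, thresh, leaves)
    return leaves

# example: a 32x32 block that is flat on the left and textured on the right
rng = np.random.default_rng(6)
blk = np.zeros((32, 32))
blk[:, 16:] = rng.normal(scale=20, size=(32, 16))
print(quadtree_partition(blk))  # large leaves on the flat side, small leaves on the textured side
```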
[Image/video coding and related topics]
Title | Dictionary Learning-Based Distributed Compressive Video Sensing |
Author | Hung-Wei Chen, Li-Wei Kang, Chun-Shien Lu (Academia Sinica, Taiwan) |
Page | pp. 210 - 213 |
Keyword | compressive sensing, sparse representation, dictionary learning, single-pixel camera, l1-minimization |
Abstract | We address the important issue of low-cost, low-complexity video compression for use in severely resource-limited sensors/devices. Conventional motion estimation-based video compression and distributed video coding (DVC) techniques all rely on a costly mechanism in which sensing/sampling and compression are performed separately, resulting in unnecessary consumption of resources: most of the acquired raw video data are discarded in the (possibly) complex compression stage. In this paper, we propose a dictionary learning-based distributed compressive video sensing (DCVS) framework to “directly” acquire compressed video data. Embedded in the compressive sensing (CS)-based single-pixel camera architecture, DCVS can compressively sense each video frame in a distributed manner. At the DCVS decoder, video reconstruction can be formulated as an l1-minimization problem that solves for the sparse coefficients with respect to some basis functions. We investigate adaptive dictionary/basis learning for each frame based on training samples extracted from previously reconstructed neighboring frames, and argue that a much better basis can be obtained to represent the frame compared to fixed-basis representations and recent popular “CS-based DVC” approaches that do not rely on dictionary learning. |
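The decoder-side reconstruction described above amounts to recovering sparse coefficients from compressive measurements. Below is a minimal sketch of one such l1-flavoured recovery, the iterative soft-thresholding algorithm (ISTA), using an assumed random measurement matrix and a generic basis; the paper's learned dictionaries and measurement operator are not reproduced here.

```python
import numpy as np

def ista(y, A, lam=0.05, n_iter=500):
    """Iterative soft-thresholding for min_x 0.5*||A x - y||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the data-fit gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        z = x - grad / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x

# toy example: recover a sparse coefficient vector from few random measurements
rng = np.random.default_rng(1)
n, m, k = 100, 40, 5                       # signal length, measurements, sparsity
A = rng.normal(size=(m, n)) / np.sqrt(m)   # stand-in for measurement matrix times dictionary
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.normal(size=k)
y = A @ x_true
x_hat = ista(y, A, lam=0.01)
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```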
Title | Medium-Granularity Computational Complexity Control for H.264/AVC |
Author | Xiang Li, Mathias Wien, Jens-Rainer Ohm (Institute of Communications Engineering, RWTH Aachen University, Germany) |
Page | pp. 214 - 217 |
Keyword | Computational complexity control, computational scalability |
Abstract | Today, video applications on handheld devices are becoming more and more popular. Due to the limited computational capability of handheld devices, complexity-constrained video coding has drawn much attention. In this paper, a medium-granularity computational complexity control (MGCC) scheme is proposed for H.264/AVC. First, a large dynamic range in complexity is achieved by taking 16x16 motion estimation in a single reference frame as the basic computational unit. Then a high coding efficiency is obtained by adaptive computation allocation at the MB level. Simulations show that coarse-granularity methods fail when the normalized complexity is below 15%. In contrast, the proposed MGCC performs well even when the complexity is reduced to 8.8%. Moreover, an average gain of 0.3 dB in BD-PSNR over coarse-granularity methods is obtained for 11 sequences when the complexity is around 20%. |
Title | An Improved Wyner-Ziv Video Coding With Feedback Channel |
Author | Feng Ye, Aidong Men, Bo Yang, Manman Fan, Kan Chang (School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, China) |
Page | pp. 218 - 221 |
Keyword | Wyner-Ziv video coding, motion activity, 3DRS motion estimation, side information |
Abstract | This paper presents an improved feedback-assisted low-complexity Wyner-Ziv video coding (WZVC) scheme. The performance of this scheme is improved by two enhancements: an improved mode-based key frame encoding and a 3DRS-assisted (three-dimensional recursive search assisted) motion estimation algorithm for WZ encoding. Experimental results show that our coding scheme achieves significant gains compared to a state-of-the-art TDWZ codec while maintaining low encoding complexity. |
Title | Background Aided Surveillance-Oriented Distributed Video Coding |
Author | Hongbin Liu (Harbin Institute of Technology, China), Siwei Ma (Peking University, China), Debin Zhao (Harbin Institute of Technology, China), Wen Gao (Peking University, China), Xiaopeng Fan (Harbin Institute of Technology, China) |
Page | pp. 222 - 225 |
Keyword | surveillance, background, distributed video coding |
Abstract | This paper presents a background-aided surveillance-oriented distributed video coding system. A high-quality background frame is encoded for each group of pictures (GOP), which provides high-quality side information (SI) for the background parts of the Wyner-Ziv (WZ) frames. Consequently, the bit rate for the WZ frames can be reduced. Experimental results demonstrate that the proposed system can decrease the bit rate by up to 67.4% compared with a traditional DVC codec. |
Title | Content-Adaptive Spatial Scalability for Scalable Video Coding |
Author | Yongzhe Wang (Shanghai Jiao Tong University, China), Nikolce Stefanoski (Disney Research Zurich, Switzerland), Xiangzhong Fang (Shanghai Jiao Tong University, China), Aljoscha Smolic (Disney Research Zurich, Switzerland) |
Page | pp. 226 - 229 |
Keyword | H.264/AVC, scalable video coding, spatial scalability, content-adaptation, non-linear image warping |
Abstract | This paper presents an enhancement of the SVC extension of the H.264/AVC standard by content-adaptive spatial scalability (CASS). The video streams (spatial layers) used as input to the encoder are created by content-adaptive and art-directable retargeting of existing high-resolution video. The non-linear dependencies between such video streams are efficiently exploited by CASS for scalable coding. This is achieved by integrating warping-based non-linear texture prediction and warp coding into the SVC framework. |
Title | Colorization-Based Coding by Focusing on Characteristics of Colorization Bases |
Author | Shunsuke Ono, Takamichi Miyata, Yoshinori Sakai (Tokyo Institute of Technology, Japan) |
Page | pp. 230 - 233 |
Keyword | Colorization, Colorization-based coding, Representative pixels, Redundancy, Correct color |
Abstract | A novel approach to image compression called colorization-based coding has recently been proposed. It automatically extracts representative pixels from an original color image at the encoder and restores a full color image by colorization at the decoder. However, previous studies on colorization-based coding extract redundant representative pixels. We propose a new colorization-based coding method that focuses on the characteristics of the colorization bases. Experimental results reveal that our method can drastically reduce the amount of information compared with the conventional method while maintaining objective quality. |
Title | Wyner-Ziv Coding of Multispectral Images for Space and Airborne Platforms |
Author | Shantanu Rane, Yige Wang, Petros Boufounos, Anthony Vetro (Mitsubishi Electric Research Laboratories, U.S.A.) |
Page | pp. 234 - 237 |
Keyword | Multispectral, Wyner-Ziv coding, LDPC code |
Abstract | This paper investigates the application of lossy distributed source coding to high resolution multispectral images. The choice of distributed source coding is motivated by the need for very low encoding complexity on space and airborne platforms. The data consists of red, blue, green and infra-red channels and is compressed in an asymmetric Wyner-Ziv setting. One image channel is compressed using traditional JPEG and transmitted to the ground station where it is available as side information for Wyner-Ziv coding of the other channels. Encoding is accomplished by quantizing the image data, applying a Low-Density Parity Check code to the remaining three image channels, and transmitting the resulting syndromes. At the ground station, the image data is recovered from the syndromes by exploiting the correlation in the frequency spectrum of the band being decoded and the JPEG-decoded side information band. In experiments with real uncompressed images obtained by a satellite, the rate-distortion performance is found to be vastly superior to JPEG compression of individual image channels and rivals that of JPEG2000 at much lower encoding complexity. |
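To make the "transmit the resulting syndromes" step concrete: with a binary parity-check matrix H, the encoder sends s = H·b (mod 2) for each quantized bit-plane b instead of the bit-plane itself, and the decoder searches for the sequence consistent with s that best matches the side information. The sketch below uses a random toy H (not an actual LDPC design) and shows only the syndrome-forming step; the decoding side is omitted.

```python
import numpy as np

rng = np.random.default_rng(5)

# toy parity-check matrix: m syndrome bits replace an n-bit bit-plane
n, m = 24, 8
H = (rng.random((m, n)) < 0.25).astype(np.uint8)   # sparse random H, NOT a real LDPC code

bitplane = rng.integers(0, 2, size=n, dtype=np.uint8)   # one quantized bit-plane
syndrome = (H @ bitplane) % 2                            # s = H b (mod 2); this is what is sent

print("bit-plane:", bitplane)
print("syndrome :", syndrome, f"({m} bits transmitted instead of {n})")
```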
Title | Reversible Component Transforms by the LU Factorization |
Author | Hisakazu Kikuchi, Junghyeun Hwang, Shogo Muramatsu (Niigata University, Japan), Jaeho Shin (Dongguk University, Republic of Korea) |
Page | pp. 238 - 241 |
Keyword | component transform, image compression, LU factorization, lifting, round-off error |
Abstract | A scaled transform is defined for a given irreversible linear transformation based on the LU factorization of a nonsingular matrix, so that the transformation can be computed in a lifting form and hence made reversible. Round-off errors in the lifting computation and the computational complexity are analyzed. Some reversible component transforms are presented and evaluated, with remarks on image compression applications. Discussions are developed in terms of coding gain and actual bit rates. |
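As an illustration of the general idea, assuming (not taken from the paper) a simple 2x2 case: the unit-triangular factors of an LU factorization can each be implemented as a lifting step with rounding, and reversing the steps recovers the integer input exactly. The diagonal scaling of the factorization is left out, which is why the result is a scaled transform.

```python
import numpy as np

def forward_lifting(x, m_low, m_up):
    """Integer-reversible version of the unit-triangular factors of a 2x2 transform.

    x = (x1, x2) integer samples; m_low and m_up are lifting multipliers taken
    from the off-diagonal entries of the LU factors (placeholder values below).
    """
    x1, x2 = x
    x2 = x2 + int(round(m_up * x1))   # upper-triangular lifting step
    x1 = x1 + int(round(m_low * x2))  # lower-triangular lifting step
    return x1, x2

def inverse_lifting(y, m_low, m_up):
    y1, y2 = y
    y1 = y1 - int(round(m_low * y2))  # undo the steps in reverse order
    y2 = y2 - int(round(m_up * y1))
    return y1, y2

# round-trip check on random integer pairs
rng = np.random.default_rng(2)
m_low, m_up = -0.331, 0.587           # example multipliers (placeholders)
for x in rng.integers(0, 256, size=(5, 2)):
    x = tuple(int(v) for v in x)
    assert inverse_lifting(forward_lifting(x, m_low, m_up), m_low, m_up) == x
print("lossless round trip verified")
```

The round-off inside each lifting step is exactly undone by the inverse step, which is why the rounded transform remains perfectly reversible even though the underlying linear transform is irreversible in finite precision.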
Title | MOS-Based Bit Allocation in SNR-Temporal Scalable Coding |
Author | Yuya Yamasaki, Toshiyuki Yoshida (University of Fukui, Japan) |
Page | pp. 242 - 245 |
Keyword | scalable coding, video coding, MOS, bit allocation |
Abstract | In scalable video coding, improvements in motion smoothness (frame rate) and in spatial quality (SNR) conflict under a given bit rate. Since the motion and spatial activities of a target video vary from scene to scene, these activities should be taken into account in order to optimally allocate bits between the temporal and quality scalability. This paper proposes a fundamental idea for allocating bit rates between the temporal and quality scalability based on maximizing the estimated mean opinion score (MOS) for each scene. A technique for reducing MOS fluctuation in the enhancement layer is discussed as an application of the proposed technique. |
[Image/video processing and related topics]
Title | Automatic Moving Object Extraction Using X-means Clustering |
Author | Kousuke Imamura, Naoki Kubo, Hideo Hashimoto (Kanazawa University, Japan) |
Page | pp. 246 - 249 |
Keyword | moving object extraction, x-means clustering, watershed algorithm, voting method |
Abstract | The present paper proposes an automatic moving object extraction technique using x-means clustering. X-means clustering is an extension of k-means clustering that can determine the optimal number of clusters based on the Bayesian Information Criterion (BIC). In the proposed method, feature points are extracted from the current frame, and x-means clustering classifies the feature points based on their estimated affine motion parameters. A label is assigned to each segmented region, obtained by the morphological watershed, by voting on the feature point clusters within the region. The labeling result yields the extracted moving objects. Experimental results reveal that the proposed method provides extraction results with the appropriate number of objects. |
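A rough sketch of the model-selection step, assuming a spherical-Gaussian BIC in the spirit of the original x-means formulation: k-means is run for increasing k and the k with the best BIC is kept. In the paper the feature points are clustered on their estimated affine motion parameters; generic 2-D points stand in here, and the BIC expression is an approximation.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def bic_kmeans(X, labels, centers):
    """Approximate BIC of a k-means clustering under a spherical Gaussian model."""
    n, d = X.shape
    k = centers.shape[0]
    # pooled within-cluster variance (maximum-likelihood style estimate)
    sse = sum(np.sum((X[labels == j] - centers[j]) ** 2) for j in range(k))
    var = max(sse / max(n - k, 1), 1e-12)
    log_lik = 0.0
    for j in range(k):
        nj = np.sum(labels == j)
        if nj == 0:
            continue
        log_lik += (nj * np.log(nj / n)
                    - nj * d / 2 * np.log(2 * np.pi * var)
                    - (nj - 1) * d / 2)
    p = k * (d + 1)                       # free parameters: centroids plus variance terms
    return log_lik - p / 2 * np.log(n)

def choose_k(X, k_max=6):
    """Run k-means for k = 1..k_max and return the k with the best BIC."""
    best = None
    for k in range(1, k_max + 1):
        centers, labels = kmeans2(X, k, minit="++")
        score = bic_kmeans(X, labels, centers)
        if best is None or score > best[0]:
            best = (score, k)
    return best[1]

# toy example: three well-separated 2-D clusters
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in ([0, 0], [5, 5], [0, 5])])
print("estimated number of clusters:", choose_k(X))
```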
Title | Accurate Motion Estimation for Image of Spatial Periodic Pattern |
Author | Jun-ichi Kimura, Naohisa Komatsu (School of Fundamental Science and Engineering, Waseda University, Japan) |
Page | pp. 250 - 253 |
Keyword | motion estimation, motion vector |
Abstract | We investigate the mechanism of motion estimation errors for images containing spatially periodic patterns. Using a motion estimation model, we conclude that block-matching distortion caused by motion vector sampling error (DVSE) degrades the motion estimation accuracy. We propose a new motion estimation method that uses the maximum DVSE value for each block. Simulations show that the precision of the proposed method is over 98%, which is superior to the full-search method. |
Title | Direction-Adaptive Image Upsampling Using Double Interpolation |
Author | Yi-Chun Lin, Yi-Nung Liu, Shao-Yi Chien (National Taiwan University, Taiwan) |
Page | pp. 254 - 257 |
Keyword | double interpolation, upsampling, direction-adaptive, zigzagging, bicubic |
Abstract | Double interpolation quality evaluation can be used as a measure of an interpolation operation. Using this double interpolation framework, a highly efficient direction-adaptive upsampling algorithm is proposed that requires no threshold setting or post-processing. With the proposed upsampling algorithm, zigzagging artifacts along edges no longer occur. Moreover, the proposed algorithm has low computational complexity. The experimental results show that the proposed algorithm produces high-quality output images. |
Title | Blind GOP Structure Analysis of MPEG-2 and H.264/AVC Decoded Video |
Author | Gilbert Yammine, Eugen Wige, Andre Kaup (Multimedia Communications and Signal Processing - University of Erlangen-Nuremberg, Germany) |
Page | pp. 258 - 261 |
Keyword | GOP Structure, Blind Analysis, Noise Estimation |
Abstract | In this paper, we provide a simple method for analyzing the GOP structure of an MPEG-2 or H.264/AVC decoded video without access to the bitstream. Noise estimation is applied to the decoded frames, and the variance of the noise in the different I-, P-, and B-frames is measured. After the encoding process, the noise variance in the video sequence shows a periodic pattern, which helps in extracting the GOP period as well as the frame types. This algorithm can be used along with other algorithms to blindly analyze the encoding history of a video sequence. The method has been tested on several MPEG-2 DVB and DVD streams, as well as on H.264/AVC encoded sequences, and shows successful results in both cases. |
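A simplified sketch of the idea, with the noise estimator and period detector assumed rather than taken from the paper: a per-frame noise variance is estimated from a high-pass residual, and the period of the resulting sequence, found here via autocorrelation, gives the GOP length.

```python
import numpy as np

def estimate_noise_variance(frame):
    """Very rough per-frame noise estimate: variance of a high-pass residual.
    (A stand-in for the noise estimator used in the paper.)"""
    f = frame.astype(float)
    residual = f[1:-1, 1:-1] - 0.25 * (f[:-2, 1:-1] + f[2:, 1:-1] + f[1:-1, :-2] + f[1:-1, 2:])
    return residual.var()

def estimate_gop_period(noise_vars, max_period=32):
    """Dominant period of the per-frame noise-variance sequence,
    found from the autocorrelation of the mean-removed signal."""
    v = np.asarray(noise_vars, dtype=float)
    v = v - v.mean()
    ac = np.correlate(v, v, mode="full")[len(v) - 1:]
    lags = range(2, min(max_period, len(v) // 2))
    return max(lags, key=lambda lag: ac[lag])

# toy example: I-frames every 12 frames carry less residual noise than P/B-frames
noise = [0.8 if i % 12 == 0 else 1.5 + 0.05 * np.random.rand() for i in range(120)]
print("estimated GOP period:", estimate_gop_period(noise))
```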
Title | Distributed Video Coding Based on Adaptive Slice Size Using Received Motion Vectors |
Author | Kyungyeon Min, Seanae Park, Donggyu Sim (Kwangwoon University, Republic of Korea) |
Page | pp. 262 - 265 |
Keyword | DVC, crossover, adaptive slice control |
Abstract | In this paper, we propose a new distributed video coding (DVC) method based on adaptive slice sizes using received motion vectors (MVs). In the proposed algorithm, the MVs estimated at the DVC decoder are transmitted back to the encoder. At the encoder, predicted side information (PSI) is reconstructed from the transmitted MVs and the key frames, so the PSI can be generated identically to the side information (SI) at the decoder. We can also calculate the exact crossover rate between the SI and the original input frame using the PSI and the original frame. As a result, the proposed method can transmit the minimum number of parity bits needed to maximize the error correction ability of the channel decoder, with minimal computational complexity. Experimental results show that the proposed algorithm outperforms several conventional DVC methods. |
Title | Improved Watermark Sharing Scheme Using Minimum Error Selection and Shuffling |
Author | Aroba Khan, Yohei Yokoyama, Kiyoshi Tanaka (Shinshu University, Japan) |
Page | pp. 266 - 269 |
Keyword | Watermark Sharing, Halftoning, error diffusion |
Abstract | In this work, we focus on a watermark sharing scheme using error diffusion called DHCED and try to overcome some drawbacks of this method. The proposed method simultaneously generates carrier halftone images that share the watermark information by selecting the minimum error caused in the noise function for watermark embedding. The proposed method also shuffles the watermark image before embedding, not only to increase the secrecy of the embedded watermark information but also to improve the watermark detection ratio and the appearance of the watermark in the detection process. |
[Quality, system, applications, and other topics]
Title | On the Duality of Rate Allocation and Quality Indices |
Author | Thomas Richter (University of Stuttgart, Germany) |
Page | pp. 270 - 273 |
Keyword | JPEG 2000, SSIM |
Abstract | In a recent work, the author proposed to study the performance of still image quality indices such as the SSIM by using them as the objective function of rate allocation algorithms. The outcome of that work was not only a multi-scale SSIM optimal JPEG 2000 implementation, but also a first-order approximation of the MS-SSIM that is surprisingly similar to more traditional contrast-sensitivity and visual-masking based approaches. It is shown in this work that the only difference between the latter approaches and the MS-SSIM index is the choice of the exponent of the masking term, and furthermore, that a slight modification of the SSIM definition reproducing the traditional exponent is able to improve the performance of the index at or below the visual threshold. It is hence demonstrated that the duality of quality indices and rate allocation helps to improve both the visual performance of the compression codec and the performance of the index. |
Title | Image Quality Assessment Based on Local Orientation Distributions |
Author | Yue Wang (Graduate University of Chinese Academy of Sciences, China), Tingting Jiang, Siwei Ma, Wen Gao (Institute of Digital Media, Peking University, China) |
Page | pp. 274 - 277 |
Keyword | image quality assessment (IQA), human visual system (HVS), Histograms of Oriented Gradients (HOG) |
Abstract | Image quality assessment (IQA) is very important for many image and video processing applications, e.g. compression, archiving, restoration and enhancement. An ideal image quality metric should achieve consistency between predicted image distortion and the psychological perception of the human visual system (HVS). Inspired by the fact that the HVS is quite sensitive to local orientation features in images, we propose a new structural-information-based image quality metric, which evaluates image distortion by computing the distance between Histograms of Oriented Gradients (HOG) descriptors. Experimental results on the LIVE database show that the proposed IQA metric is competitive with state-of-the-art IQA metrics, while keeping computational complexity relatively low. |
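A toy version of the idea, assuming (not from the paper) simple per-block gradient-orientation histograms and a chi-square-style distance; the actual HOG descriptor, block layout and distance used by the authors may differ.

```python
import numpy as np

def orientation_histogram(block, n_bins=8):
    """Histogram of gradient orientations, weighted by gradient magnitude."""
    gy, gx = np.gradient(block.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)          # unsigned orientation in [0, pi)
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-12)

def hog_distance(ref, dist, block=16, n_bins=8):
    """Average chi-square distance between block-wise orientation histograms
    of a reference and a distorted image (a crude HOG-style IQA score)."""
    H, W = ref.shape
    d, count = 0.0, 0
    for y in range(0, H - block + 1, block):
        for x in range(0, W - block + 1, block):
            h1 = orientation_histogram(ref[y:y+block, x:x+block], n_bins)
            h2 = orientation_histogram(dist[y:y+block, x:x+block], n_bins)
            d += 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + 1e-12))
            count += 1
    return d / max(count, 1)

# example: the distance grows as noise is added to the reference image
rng = np.random.default_rng(4)
ref = rng.normal(size=(64, 64)).cumsum(axis=1)       # smooth-ish synthetic image
print(hog_distance(ref, ref + rng.normal(scale=0.5, size=ref.shape)))
```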
Title | Distance and Relative Speed Estimation of Binocular Camera Images Based on Defocus and Disparity Information |
Author | Mitsuyasu Ito, Yoshiaki Takada, Takayuki Hamamoto (Tokyo University of Science, Japan) |
Page | pp. 278 - 281 |
Keyword | ITS, focus blur, disparity information, distance estimation, relative speed |
Abstract | In this paper, we discuss a method for distance and relative speed estimation for ITS. In this method, the two cameras are set to different focus positions in order to obtain the amount of focus blur. We then propose a distance estimation method based on the blur amount and the disparity information. Simulation results show that the proposed method performs reasonably well. In addition, we built a prototype system for real-time estimation, and its implementation validated the proposed approach. |
Title | Comparing Two Eye-Tracking Databases: The Effect of Experimental Setup and Image Presentation Time on the Creation of Saliency Maps |
Author | Ulrich Engelke (Blekinge Institute of Technology, Sweden), Hantao Liu (Delft University of Technology, Netherlands), Hans-Jürgen Zepernick (Blekinge Institute of Technology, Sweden), Ingrid Heynderickx (Philips Research Laboratories, Netherlands), Anthony Maeder (University of Western Sydney, Australia) |
Page | pp. 282 - 285 |
Keyword | Eye tracking experiments, Visual saliency, Correlation analysis, Natural images |
Abstract | Visual attention models are typically designed based on human gaze patterns recorded through eye tracking. In this paper, two similar eye tracking experiments from independent laboratories are presented, in which humans observed natural images under a task-free condition. The resulting saliency maps are analysed with respect to two criteria: the consistency between the experiments and the impact of the image presentation time. It is shown that the saliency maps from the two experiments are strongly correlated, independent of presentation time. It is further revealed that the presentation time can be reduced without substantially sacrificing the accuracy of the convergent saliency map. The results provide valuable insight into the similarity of saliency maps from independent laboratories and are highly beneficial for the creation of converging saliency maps at reduced experimental time and cost. |
Title | Successive Refinement of Overlapped Cell Side Quantizers for Scalable Multiple Description Coding |
Author | Muhammad Majid, Charith Abhayaratne (The University of Sheffield, U.K.) |
Page | pp. 286 - 289 |
Keyword | Multiple description coding, scalability, robustness |
Abstract | Scalable multiple description coding (SMDC) provides reliability and the ability to truncate descriptions according to user rate-distortion requirements. In this paper we generalize the conditions for successive refinement of the side quantizers of a multiple description scalar quantizer that has overlapped quantizer cells generated by a modified linear index assignment matrix. We propose that the split or refinement factor of each refinement side quantizer should be greater than the maximum side quantizer bin spread, and that the factors should not be integer multiples of each other, in order to satisfy the SMDC distortion conditions. These conditions are verified through simulation results on scalable multiple description image coding. |
Title | VQ Based Data Hiding Method for Still Images by Tree-Structured Links |
Author | Hisashi Igarashi, Yuichi Tanaka, Madoka Hasegawa, Shigeo Kato (Utsunomiya University, Japan) |
Page | pp. 290 - 293 |
Keyword | Data Hiding, Vector Quantization, Tree-Structured Links |
Abstract | In this paper, we propose a method for embedding data into still images based on vector quantization (VQ). Several VQ-based data embedding methods, such as the Mean Gray-Level Embedding method (MGLE) and the Pairwise Nearest-Neighbor Embedding method (PNNE), have been proposed, but these methods are not sufficiently effective. Meanwhile, an efficient adaptive data hiding method called the Adaptive Clustering Embedding method (ACE) has been proposed, but it is somewhat complicated because the VQ indices have to be adaptively clustered in the embedding process. In our proposed method, output vectors are linked in a tree structure, and information is embedded using some of the linked vectors. The simulation results show that our proposed method achieves higher SNR than the conventional methods for the same amount of embedded data. |