Tutorials

A. Murat Tekalp

Koç University, Turkey

Deep Learning for Image and Video Compression

Abstract

Conventional image/video compression methods employ a linear transform and a block motion model (for video), and the steps of motion estimation, mode and quantization parameter selection, and entropy coding are typically optimized individually due to the combinatorial nature of the end-to-end optimization problem. Learned image/video compression allows end-to-end rate-distortion (R-D) optimized training of the nonlinear transform, motion compensation, and entropy models simultaneously. A further benefit of the data-driven deep learning approach is that neural models can be optimized for any differentiable loss function, including visual perceptual loss functions, leading to perceptual image/video compression, which cannot be easily handled by traditional codecs. This tutorial is divided into two parts. In the first part, I will review the fundamentals of and recent advances in learned image compression, including multi-rate neural models and the rate-perception-distortion tradeoff in learned image coding. The second part is devoted to learned video compression. After an introduction to approaches for learned motion compensation, including recurrent models, I will discuss the state of the art in learned video compression and present recent results on learned low-delay and random-access codec configurations, including our own work on hierarchical bi-directional video compression, which combines the benefits of hierarchical bi-directional motion compensation and end-to-end rate-distortion optimization.
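The end-to-end R-D optimization mentioned above can be made concrete with a minimal, illustrative sketch (not taken from the tutorial itself): learned codecs are trained by minimizing a Lagrangian objective L = R + λ·D, where R is the estimated bitrate from the entropy model, D is any differentiable distortion (MSE here, but a perceptual loss could be substituted), and λ selects an operating point on the R-D curve. The function and values below are purely hypothetical.

```python
def rate_distortion_loss(rate_bits, distortion, lam):
    """Lagrangian R-D objective minimized during end-to-end training.

    rate_bits  -- estimated bitrate (e.g., from a learned entropy model)
    distortion -- differentiable distortion (MSE, or a perceptual loss)
    lam        -- tradeoff weight; each lambda yields one R-D operating point
    """
    return rate_bits + lam * distortion

# Illustrative operating points: a low-rate point (high distortion) and a
# high-rate point (low distortion), compared under the same lambda.
low_rate_loss = rate_distortion_loss(rate_bits=0.2, distortion=40.0, lam=0.01)
high_rate_loss = rate_distortion_loss(rate_bits=1.5, distortion=5.0, lam=0.01)
```

A multi-rate neural model, as covered in the first part of the tutorial, amounts to training one network that can serve a whole range of λ values instead of one network per operating point.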

Speaker’s Bio

A. Murat Tekalp received BS degrees in electrical engineering and mathematics from Bogazici University (1980) with high honors, and MS and PhD degrees in electrical, computer, and systems engineering from Rensselaer Polytechnic Institute (RPI), Troy, New York (1982 and 1984, respectively). He was with Eastman Kodak Company, Rochester, New York, from December 1984 to June 1987, and with the University of Rochester, Rochester, New York, from July 1987 to June 2005, where he was promoted to Distinguished University Professor. Since June 2001, he has been a Professor at Koç University, Istanbul, Turkey, where he served as Dean of Engineering from 2010 to 2013. His research interests are in the area of digital image and video processing, including image/video compression, super-resolution, segmentation, content-based video analysis and summarization, 3D video processing, deep learning for image and video processing, video streaming and real-time video communication services, and software-defined networking. Prof. Tekalp is a Fellow of IEEE and a member of the Turkish Academy of Sciences, the Science Academy, and Academia Europaea. He was named a Distinguished Lecturer by the IEEE Signal Processing Society in 1998 and was awarded a Fulbright Senior Scholarship in 1999. He was the General Chair of the IEEE Int. Conf. on Image Processing (ICIP) in 2002 and the Technical Program Co-Chair of ICIP 2020. He received the TUBITAK Science Award (the highest scientific award in Turkey) in 2004. He served on the Editorial Boards of IEEE Signal Processing Magazine and the Proceedings of the IEEE, and is currently on the Editorial Board of Wiley-IEEE Press. The new edition of his Prentice Hall book Digital Video Processing (1995) was published in June 2015. Dr. Tekalp holds eight US patents. He has participated in several European Framework projects, and is also a project evaluator for the European Commission and a panel member for the European Research Council.


Erdem Sahin

Tampere University, Finland

Robert Bregovic

Tampere University, Finland

Atanas Gotchev

Tampere University, Finland

Signal Processing Methods for Light Field Displays

Abstract

Light field displays are expected to truly recreate visual reality by accurately presenting all 3D visual cues, including binocularity, focus, and continuous parallax. To achieve this, they aim at recreating the light field: a light model that considers all light rays traveling in every direction through every point in space. Such a model comes in the form of high-dimensional functions, which must be properly sensed (sampled), processed, and reconstructed. This calls for corresponding signal processing methods, which form the topic of this tutorial. More specifically, the tutorial is organized in three parts. In the first part, we overview the basics of light field imaging and displays, including light field parameterization and light ray propagation; visual cues and computational modelling of the human visual system; and a nomenclature of current light field displays. In the second part, we present approaches for display-specific light field analysis, including the notion of display bandwidth, display models, and camera-display setup optimizations. In the third part, we address methods aimed at light field reconstruction, making use of signal sparsification in directional transform domains or data-driven machine learning models.
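To make the parameterization and propagation topics of the first part concrete, a standard formulation (illustrative here, not quoted from the tutorial) describes each ray in a plane by its position $x$ and direction $u$, so the light field is a function $L(x, u)$; in the paraxial regime, free-space propagation over a distance $z$ then acts as a shear of this function:

\[
L_z(x, u) \;=\; L_0(x - z\,u,\; u),
\]

i.e., a ray arriving at position $x$ with direction $u$ originated at position $x - z\,u$ on the source plane. This shear structure underlies the display bandwidth analysis discussed in the second part of the tutorial.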

Speakers’ Bios

Erdem Sahin received the Ph.D. degree from the Electrical and Electronics Engineering Department, Bilkent University, in 2013. In 2014, he joined the 3D Media Group in the Faculty of Information Technology and Communication Sciences at Tampere University as a Marie Curie Experienced Researcher, where he has been a Senior Research Fellow since 2019. Currently, he leads the Imaging Systems and Displays Team within the 3D Media Group and is the principal investigator of the Plenoptics Lab at the Centre for Immersive Visual Technologies. Erdem has co-initiated several national and international research projects on plenoptic imaging. His current research interests include the development of computational light field and holographic imaging algorithms and methods for next-generation plenoptic cameras and displays.

Robert Bregovic received the M.Sc. degree in electrical engineering from the University of Zagreb, in 1998, and the D.Sc. (Tech) degree in information technology from the Tampere University of Technology, in 2003. He has been with Tampere University (formerly the Tampere University of Technology) since 1998. His research interests include the design and implementation of digital filters and filterbanks, multirate signal processing, and topics related to the acquisition, processing/modeling, and visualization of 3D content. He has been the Project Manager of four Marie Sklodowska-Curie doctoral networks on various aspects of light field (plenoptic) imaging.

Atanas Gotchev received the M.Sc. degree in radio and television engineering, the M.Sc. degree in applied mathematics, and the Ph.D. degree in telecommunications from the Technical University of Sofia, in 1990, 1992, and 1996, respectively, and the D.Sc. (Tech) degree in information technologies from the Tampere University of Technology in 2003. He is currently a Professor with Tampere University. He has been active in developing doctoral education at the European level in the field of plenoptic imaging, serving as the Coordinator of four Marie Sklodowska-Curie doctoral networks in the field. His recent work concentrates on algorithms for multisensor 3D scene capture, transform-domain light-field reconstruction, and Fourier analysis of 3D displays.