Books in the Synthesis Lectures on Image, Video, and Multimedia Processing series

  • by Nasser Kehtarnavaz & Arian Azarang
    376,-

  • by Liqiang Nie
    796,-

    Micro-videos, a new form of user-generated content, have been spreading widely across various social platforms, such as Vine, Kuaishou, and Tik Tok. Different from traditional long videos, micro-videos are usually recorded by smart mobile devices at any place within a few seconds. Due to their brevity and low bandwidth cost, micro-videos are gaining increasing user enthusiasm. The blossoming of micro-videos opens the door to many promising applications, ranging from network content caching to online advertising. Thus, it is highly desirable to develop an effective scheme for high-order micro-video understanding. Micro-video understanding is, however, non-trivial due to the following challenges: (1) how to represent micro-videos that only convey one or a few high-level themes or concepts; (2) how to utilize the hierarchical structure of the venue categories to guide the micro-video analysis; (3) how to alleviate the influence of low quality caused by complex surrounding environments and camera shake; (4) how to model the multimodal sequential data, i.e., textual, acoustic, visual, and social modalities, to enhance micro-video understanding; and (5) how to construct large-scale benchmark datasets for the analysis. These challenges have been largely unexplored to date. In this book, we focus on addressing the challenges presented above by proposing some state-of-the-art multimodal learning theories. To demonstrate the effectiveness of these models, we apply them to three practical tasks of micro-video understanding: popularity prediction, venue category estimation, and micro-video routing. In particular, we first build three large-scale real-world micro-video datasets for these practical tasks. We then present a multimodal transductive learning framework for micro-video popularity prediction.
Furthermore, we introduce several multimodal cooperative learning approaches and a multimodal transfer learning scheme for micro-video venue category estimation. Meanwhile, we develop a multimodal sequential learning approach for micro-video recommendation. Finally, we conclude the book and outline future research directions in multimodal learning toward micro-video understanding.

  • by Sudipta Mukhopadhyay
    446,-

    Every year, lives and property are lost in road accidents. About one-fourth of these accidents are due to poor visibility in foggy weather. At present, there is no algorithm that is specifically designed for the removal of fog from videos. Applying a single-image fog removal algorithm to each video frame is time-consuming and costly. It is demonstrated that, with the intelligent use of temporal redundancy, fog removal algorithms designed for a single image can be extended to real-time video applications. Results confirm that the presented framework for extending image fog removal algorithms to videos can reduce the complexity to a great extent with no loss of perceptual quality. This paves the way for the real-life application of video fog removal algorithms. In order to remove fog, an efficient fog removal algorithm using anisotropic diffusion is developed. The presented fog removal algorithm uses a new dark channel assumption and anisotropic diffusion for the initialization and refinement of the airlight map, respectively. The use of anisotropic diffusion helps to produce a better estimate of the airlight map. The algorithm requires a single image captured by an uncalibrated camera system. The anisotropic diffusion-based fog removal algorithm can be applied in both RGB and HSI color spaces. This book shows that the use of the HSI color space reduces the complexity further. The algorithm requires pre- and post-processing steps for better restoration of the foggy image. These pre- and post-processing steps have either data-driven or constant parameters, which avoids user intervention. The presented fog removal algorithm is independent of the intensity of the fog; thus, it performs well even in heavy fog.
Qualitative and quantitative results confirm that the presented fog removal algorithm outperforms previous algorithms in terms of perceptual quality, color fidelity, and execution time. The work presented in this book can find wide application in the entertainment industry, transportation, tracking, and consumer electronics.

  • by Sudipta Mukhopadhyay
    636,-

    Current vision systems are designed to perform in normal weather conditions. However, no one can escape from severe weather conditions. Bad weather reduces scene contrast and visibility, which degrades the performance of various computer vision algorithms such as object tracking, segmentation, and recognition. Thus, current vision systems must include mechanisms that enable them to perform up to the mark in bad weather conditions such as rain and fog. Rain causes spatial and temporal intensity variations in images or video frames. These intensity changes are due to the random distribution and high velocities of the raindrops. Fog causes low contrast and whiteness in the image and leads to a shift in the color. This book studies rain and fog from the perspective of vision. The book has two main goals: 1) removal of rain from videos captured by a moving or static camera, and 2) removal of fog from images and videos captured by a moving single uncalibrated camera system. The book begins with a literature survey. Pros and cons of selected prior art algorithms are described, and a general framework for the development of an efficient rain removal algorithm is explored. Temporal and spatiotemporal properties of rain pixels are analyzed, and using these properties, two rain removal algorithms for videos captured by a static camera are developed. For the removal of rain, the temporal and spatiotemporal algorithms require fewer consecutive frames, which reduces buffer size and delay. These algorithms make no assumptions about the shape, size, and velocity of raindrops, which makes them robust to different rain conditions (i.e., heavy rain, light rain, and moderate rain). In a practical situation, there is no ground truth available for rain video. Thus, no-reference quality metrics are very useful in measuring the efficacy of rain removal algorithms.
Temporal variance and spatiotemporal variance are presented in this book as no-reference quality metrics. An efficient rain removal algorithm using meteorological properties of rain is developed. The relation among the orientation of the raindrops, wind velocity, and terminal velocity is established. This relation is used in the estimation of shape-based features of the raindrop. Meteorological property-based features help to discriminate between rain and non-rain pixels. Most of the prior art algorithms are designed for videos captured by a static camera. The use of global motion compensation with all rain removal algorithms designed for videos captured by a static camera results in better accuracy for videos captured by a moving camera. Qualitative and quantitative results confirm that the probabilistic temporal, spatiotemporal, and meteorological algorithms outperform other prior art algorithms in terms of perceptual quality, buffer size, execution delay, and system cost. The work presented in this book can find wide application in the entertainment industry, transportation, tracking, and consumer electronics. Table of Contents: Acknowledgments / Introduction / Analysis of Rain / Dataset and Performance Metrics / Important Rain Detection Algorithms / Probabilistic Approach for Detection and Removal of Rain / Impact of Camera Motion on Detection of Rain / Meteorological Approach for Detection and Removal of Rain from Videos / Conclusion and Scope of Future Work / Bibliography / Authors' Biographies

  • by Jayaraman J. Thiagarajan
    526,-

    Image understanding has been playing an increasingly crucial role in several inverse problems and computer vision. Sparse models form an important component in image understanding, since they emulate the activity of neural receptors in the primary visual cortex of the human brain. Sparse methods have been utilized in several learning problems because of their ability to provide parsimonious, interpretable, and efficient models. Exploiting the sparsity of natural signals has led to advances in several application areas including image compression, denoising, inpainting, compressed sensing, blind source separation, super-resolution, and classification. The primary goal of this book is to present the theory and algorithmic considerations in using sparse models for image understanding and computer vision applications. To this end, algorithms for obtaining sparse representations and their performance guarantees are discussed in the initial chapters. Furthermore, approaches for designing overcomplete, data-adapted dictionaries to model natural images are described. The development of theory behind dictionary learning involves exploring its connection to unsupervised clustering and analyzing its generalization characteristics using principles from statistical learning theory. An exciting application area that has benefited extensively from the theory of sparse representations is compressed sensing of image and video data. Theory and algorithms pertinent to measurement design, recovery, and model-based compressed sensing are presented. The paradigm of sparse models, when suitably integrated with powerful machine learning frameworks, can lead to advances in computer vision applications such as object recognition, clustering, segmentation, and activity recognition. 
Frameworks are presented that enhance the performance of sparse models in such applications by imposing constraints based on prior discriminatory information and the underlying geometrical structure, and by kernelizing the sparse coding and dictionary learning methods. In addition to presenting theoretical fundamentals in sparse learning, this book provides a platform for interested readers to explore the rapidly growing application domains of sparse representations.

  • by William Pearlman
    480,-

    This book explains the stages necessary to create a wavelet compression system for images and describes state-of-the-art systems used in image compression standards and current research. It starts with a high-level discussion of the properties of the wavelet transform, especially the decomposition into multi-resolution subbands. It continues with an exposition of the null-zone, uniform quantization used in most subband coding systems and the optimal allocation of bitrate to the different subbands. Then the image compression systems of the FBI Fingerprint Compression Standard and the JPEG2000 Standard are described in detail. Following that, the set partitioning coders SPECK and SPIHT, as well as EZW, are explained in detail and compared, using a fictitious wavelet transform, in terms of the actions taken and the number of bits coded in a single pass in the top bit plane. The presentation teaches that, besides producing efficient compression, these coding systems, except for the FBI Standard, are capable of writing bit streams that have attributes of rate scalability, resolution scalability, and random access decoding. Many diagrams and tables accompany the text to aid understanding. The book is generous in pointing out references and resources to help readers who wish to expand their knowledge, learn the origins of the methods, or find resources for running the various algorithms or building their own coding system. Table of Contents: Introduction / Characteristics of the Wavelet Transform / Generic Wavelet-based Coding Systems / The FBI Fingerprint Image Compression Standard / Set Partition Embedded Block (SPECK) Coding / Tree-based Wavelet Transform Coding Systems / Rate Control for Embedded Block Coders / Conclusion

  • by Nasser Kehtarnavaz
    526,-

    This book presents an overview of the guidelines and strategies for transitioning an image or video processing algorithm from a research environment into a real-time constrained environment. Such guidelines and strategies are scattered in the literature of various disciplines, including image processing, computer engineering, and software engineering, and thus have not previously appeared in one place. By bringing these strategies into one place, the book is intended to serve the greater community of researchers, practicing engineers, and industrial professionals who are interested in taking an image or video processing algorithm from a research environment to an actual real-time implementation on a resource-constrained hardware platform. These strategies consist of algorithm simplifications, hardware architectures, and software methods. Throughout the book, carefully selected representative examples from the literature are presented to illustrate the discussed concepts. After reading the book, readers will have been exposed to a wide variety of techniques and tools, which they can then employ to design a real-time image or video processing system.

  • by Nilanjan Ray & Scott Acton
    530,-

  • by Eric Dubois
    530,-

    This lecture describes the author's approach to the representation of color spaces and their use for color image processing. The lecture starts with a precise formulation of the space of physical stimuli (light). The model includes both continuous spectra and monochromatic spectra in the form of Dirac deltas. The spectral densities are considered to be functions of a continuous wavelength variable. This leads into the formulation of color space as a three-dimensional vector space, with all the associated structure. The approach is to start with the axioms of color matching for normal human viewers, often called Grassmann's laws, and to develop the resulting vector space formulation. However, once the essential defining element of this vector space is identified, it can be extended to other color spaces, perhaps for different creatures and devices, and dimensions other than three. The CIE spaces are presented as main examples of color spaces. Many properties of the color space are examined. Once the vector space formulation is established, various useful decompositions of the space can be established. The first such decomposition is based on luminance, a measure of the relative brightness of a color. This leads to a direct-sum decomposition of color space where a two-dimensional subspace identifies the chromatic attribute, and a third coordinate provides the luminance. A different decomposition involving a projective space of chromaticity classes is then presented. Finally, it is shown how the three types of color deficiencies present in some groups of humans lead to a direct-sum decomposition into three one-dimensional subspaces that are associated with the three types of cone photoreceptors in the human retina. Next, a few specific linear and nonlinear color representations are presented. The color spaces of two digital cameras are also described. Then the issue of transformations between different color spaces is addressed.
Finally, these ideas are applied to signal and system theory for color images. This is done using a vector signal approach where a general linear system is represented by a three-by-three system matrix. The formulation is applied to both continuous and discrete space images, and specific problems in color filter array sampling and displays are presented for illustration. The book is mainly targeted to researchers and graduate students in fields of signal processing related to any aspect of color imaging.
