Multimodal signal processing is an important research and development field in which signals are processed and information is combined from a variety of modalities – speech, vision, language, text – significantly enhancing the understanding, modelling, and performance of human-computer interaction systems, as well as of systems supporting human-human communication. The overarching theme of this book is the application of signal processing and statistical machine learning techniques to problems arising in this multi-disciplinary field. It describes the capabilities and limitations of current technologies, and discusses the technical challenges that must be overcome to develop efficient and user-friendly multimodal interactive systems. With contributions from leading experts in the field, this book serves as a reference in multimodal signal processing for signal processing researchers, graduate students, R&D engineers, and computer engineers interested in this emerging field.
- Presents state-of-the-art methods for multimodal signal processing, analysis, and modeling
- Contains numerous examples of systems that combine different modalities
- Describes advanced applications in multimodal Human-Computer Interaction (HCI) as well as in computer-based analysis and modelling of multimodal human-human communication scenes
A comprehensive synthesis of recent advances in multimodal signal processing applications for human interaction analysis and meeting support technology. With directly applicable methods and metrics along with benchmark results, this guide is ideal for those interested in multimodal signal processing, its component disciplines and its application to human interaction analysis.
Author: Jean-Philippe Thiran
Release Date: 2009
Pages: 448 pages
Author: Sharon Oviatt, Björn Schuller, Philip Cohen, Daniel Sonntag, Gerasimos Potamianos, Antonio Krüger
Publisher: Morgan & Claypool
Release Date: 2018-10-08
ISBN 10: 1970001690
Pages: 555 pages
The Handbook of Multimodal-Multisensor Interfaces provides the first authoritative resource on what has become the dominant paradigm for new computer interfaces: user input involving new media (speech, multi-touch, hand and body gestures, facial expressions, writing) embedded in multimodal-multisensor interfaces that often include biosignals. This edited collection is written by international experts and pioneers in the field. It provides a textbook, reference, and technology roadmap for professionals working in this and related areas. This second volume of the handbook begins with multimodal signal processing, architectures, and machine learning. It includes recent deep learning approaches for processing multisensorial and multimodal user data and interaction, as well as context-sensitivity. A further highlight is processing of information about users' states and traits, an exciting emerging capability in next-generation user interfaces. These chapters discuss real-time multimodal analysis of emotion and social signals from various modalities, and perception of affective expression by users. Further chapters discuss multimodal processing of cognitive state using behavioral and physiological signals to detect cognitive load, domain expertise, deception, and depression. This collection of chapters provides walk-through examples of system design and processing, information on tools and practical resources for developing and evaluating new systems, and terminology and tutorial support for mastering this rapidly expanding field. In the final section of this volume, experts exchange views on the timely and controversial challenge topic of multimodal deep learning. The discussion focuses on how multimodal-multisensor interfaces are most likely to advance human performance during the next decade.
This relationship indicates how multimodal medical image processing can be unified to a large extent – e.g., multi-channel segmentation and image registration – and how information-theoretic registration can be extended to features other than image intensities. The framework is not at all restricted to medical images, though, and this is illustrated by applying it to multimedia sequences as well. In Chapter 4, the main results from the developments in plastic UIs and multimodal UIs are brought together using a theoretic and conceptual perspective as a unifying approach. It is aimed at defining models useful to support UI plasticity by relying on multimodality, at introducing and discussing basic principles that can drive the development of such UIs, and at describing some techniques as proof-of-concept of the aforementioned models and principles. In Chapter 4, the authors introduce running examples that serve as illustration throughout the discussion of the use of multimodality to support plasticity.
Author: Ervin Sejdic, Tiago H. Falk
Publisher: CRC Press
Release Date: 2018-07-04
ISBN 10: 1351061216
Pages: 606 pages
This will be a comprehensive, multi-contributed reference work that will detail the latest research and developments in biomedical signal processing related to big data medical analysis. It will describe signal processing, machine learning, and parallel computing strategies to revolutionize the world of medical analytics and diagnosis, as presented by world-class researchers and experts in this important field. The chapters will describe tools that can be used by biomedical and clinical practitioners as well as industry professionals. It will give signal processing researchers a glimpse into the issues faced with Big Medical Data.
Multimodal Behavioral Analysis in the Wild: Advances and Challenges presents the state-of-the-art in behavioral signal processing using different data modalities, with a special focus on identifying the strengths and limitations of current technologies. The book focuses on audio and video modalities, while also emphasizing emerging modalities, such as accelerometer or proximity data. It covers tasks at different levels of complexity, from low level (speaker detection, sensorimotor links, source separation), through middle level (conversational group detection, addresser and addressee identification), to high level (personality and emotion recognition), providing insights on how to exploit inter-level and intra-level links. This is a valuable resource on the state-of-the-art and future research challenges of multimodal behavioral analysis in the wild. It is suitable for researchers and graduate students in the fields of computer vision, audio processing, pattern recognition, machine learning and social signal processing.
- Gives a comprehensive collection of information on the state-of-the-art, limitations, and challenges associated with extracting behavioral cues from real-world scenarios
- Presents numerous applications of how different behavioral cues have been successfully extracted from different data sources
- Provides a wide variety of methodologies used to extract behavioral cues from multimodal data
Author: Bee Hock David Koh
Release Date: 2019
Pages: 247 pages
Author: Sina Fateri
Release Date: 2015
Pages: 329 pages
Author: Sharon Oviatt, Philip R. Cohen
Publisher: Morgan & Claypool Publishers
Release Date: 2015-04-01
ISBN 10: 1627057528
Pages: 243 pages
During the last decade, cell phones with multimodal interfaces based on combined new media have become the dominant computer interface worldwide. Multimodal interfaces support mobility and expand the expressive power of human input to computers. They have shifted the fulcrum of human-computer interaction much closer to the human. This book explains the foundation of human-centered multimodal interaction and interface design, based on the cognitive and neurosciences, as well as the major benefits of multimodal interfaces for human cognition and performance. It describes the data-intensive methodologies used to envision, prototype, and evaluate new multimodal interfaces. From a system development viewpoint, this book outlines major approaches for multimodal signal processing, fusion, architectures, and techniques for robustly interpreting users' meaning. Multimodal interfaces have been commercialized extensively for field and mobile applications during the last decade. Research is also growing rapidly in areas like multimodal data analytics, affect recognition, accessible interfaces, embedded and robotic interfaces, machine learning and new hybrid processing approaches, and similar topics. The expansion of multimodal interfaces is part of the long-term evolution of more expressively powerful input to computers, a trend that will substantially improve support for human cognition and performance.
Author: Anna Esposito, Amir Hussain, Maria Marinaro, Raffaele Martone
Publisher: Springer Science & Business Media
Release Date: 2009-02-27
ISBN 10: 3642005241
Pages: 348 pages
This volume brings together the peer-reviewed contributions of the participants at the COST 2102 and euCognition International Training School on "Multimodal Signals: Cognitive and Algorithmic Issues" held in Vietri sul Mare, Salerno, Italy, April 22–26, 2008. The school was sponsored by COST (European Cooperation in the Field of Scientific and Technical Research, www.cost.esf.org) in the domain of Information and Communication Technologies (ICT) for disseminating the advances of the research activities developed within Action 2102: "Cross-Modal Analysis of Verbal and Nonverbal Communication" (www.cost.esf.org/domains_actions/ict/Actions/Verbal_and_Nonverbal_Communication) and by euCognition: The European Network for the Advancement of Artificial Cognitive Systems (www.euCognition.org). COST Action 2102, in its second year of life, brought together about 60 European and 6 overseas scientific laboratories whose aim is to develop interactive dialogue systems and intelligent virtual avatars graphically embodied in a 2D and/or 3D interactive virtual world, able to interact intelligently with the environment, other avatars, and particularly with human users. The main theme of the school was to investigate the mathematical and psychological tools for modelling human–machine interaction through access to a graded series of tasks for measuring the amount of adjustment (as well as intelligence and achievement) needed for introducing new concepts in the information communication technology domain in order to develop adaptive, socially enabled and human-centered automatic systems able to serve remote applications in medicine, learning, care, rehabilitation, and for accessibility to work, employment, and information.
This volume of original papers has been assembled to honor the achievements of Professor Thomas S. Huang in the area of image processing and image analysis. Professor Huang's life of inquiry has spanned a number of decades, as his work on imaging problems began in the 1960s. Over these 40 years, he has made many fundamental and pioneering contributions to nearly every area of this field. Professor Huang has received numerous awards, including the prestigious Jack Kilby Signal Processing Medal from the IEEE. He has been elected to the National Academy of Engineering, and named Fellow of the IEEE, Fellow of the OSA, Fellow of the IAPR, and Fellow of SPIE. Professor Huang has made fundamental contributions to image processing, pattern recognition, and computer vision, including: design and stability tests of multidimensional digital filters; digital holography; compression techniques for documents and images; 3D motion and modeling; analysis and visualization of the human face, hand and body; multimodal human-computer interfaces; and multimedia databases. Many of his research ideas have been seminal, opening up new areas of research, and Professor Huang is continuing his contributions to the field in the new millennium. This book is intended to highlight his contributions by showing the breadth of areas in which his students are working. As such, contributed chapters were written by some of his many former graduate students (some with Professor Huang as a coauthor) and illustrate not only his contributions to imaging science but also his commitment to educational endeavor. The breadth of contributions is an indication of Professor Huang's influence on the fields of signal processing, image processing, computer vision and their applications; the book includes chapters on learning in image retrieval, facial motion analysis, cloud motion tracking, wavelet coding, robust video transmission, and many other topics.
The Appendix contains several reprints of Professor Huang's most influential papers from the 1970s to the 1990s. This book is directed toward image processing researchers, including academic faculty, graduate students and industry researchers, as well as toward professionals working in application areas.