Multimodal human behavior analysis in the wild: Recent advances and open problems
The automated analysis of human behavior in unstructured scenarios has many potential applications in health care, conflict and people management, sociology, marketing and surveillance. It is therefore unsurprising that many researchers have invested effort in developing computational approaches that automatically describe the behavior of a group or an individual. Generally speaking, the extraction of high-level information (e.g. emotional states, personality traits) is unfeasible unless the underlying low- and mid-level feature extraction methods are robust and accurate. In this tutorial we will describe recent research efforts in the area of unstructured social scene analysis. Special emphasis will be given to approaches that combine signal processing and machine learning to robustly extract crucial low- and mid-level information. We will consider tasks such as: (i) head and body pose estimation from multiple sensors, (ii) free-standing conversational group detection, (iii) audio-visual speaker detection, (iv) separation of moving sound sources and (v) estimation of physiological signals from visual data. An overview of the current state of research, its limitations and promising directions for future work will conclude the session.
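To make task (v) concrete, a classical baseline for remote photoplethysmography (rPPG) estimates pulse rate from the mean green-channel intensity of a facial video region, since blood volume changes modulate skin color. The sketch below is illustrative only and not the method covered in the tutorial: the function name, the synthetic frames and the chosen frequency band are assumptions, and real systems additionally need face tracking, motion compensation and illumination robustness.

```python
import numpy as np

def estimate_pulse_rate(frames, fps, low_hz=0.7, high_hz=4.0):
    """Estimate pulse rate (bpm) from the mean green-channel trace of a
    face-region video: a minimal, hypothetical rPPG baseline."""
    # frames: (T, H, W, 3) array; average the green channel per frame
    trace = frames[:, :, :, 1].mean(axis=(1, 2))
    trace = trace - trace.mean()                      # remove DC component
    spectrum = np.abs(np.fft.rfft(trace)) ** 2        # power spectrum
    freqs = np.fft.rfftfreq(len(trace), d=1.0 / fps)  # frequency of each bin (Hz)
    band = (freqs >= low_hz) & (freqs <= high_hz)     # plausible heart-rate band
    peak = freqs[band][np.argmax(spectrum[band])]     # dominant frequency in band
    return 60.0 * peak                                # convert Hz to beats/minute

# Synthetic check: a 1.2 Hz (72 bpm) pulsation added to the green channel
fps, T = 30, 300
t = np.arange(T) / fps
frames = np.full((T, 8, 8, 3), 100.0)
frames[:, :, :, 1] += 2.0 * np.sin(2 * np.pi * 1.2 * t)[:, None, None]
print(round(estimate_pulse_rate(frames, fps)))  # → 72
```

Restricting the spectral peak search to a physiologically plausible band (here 0.7-4.0 Hz, i.e. 42-240 bpm) is what makes this simple estimator tolerate broadband noise outside the heart-rate range.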