Session 15
Scene 1:
- The scenes address normal and abnormal human movement and behavior, mainly from a visual point of view. For example, a “person lying on the ground” is defined as an abnormal event, whereas “lying on a sofa” is defined as a normal event. These visual events are combined with acoustic events, namely knocking from different positions. In particular, “knocking on a table” while lying on a sofa is defined as an abnormal event, whereas “knocking sounds coming from the door” while lying on a sofa is defined as a normal event.
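The event definitions of Scene 1 can be read as a small lookup table. The following is a minimal sketch of that reading, not the labeling procedure used for the recordings: the function name `label_scene1`, the string keys, and the "undefined" fallback are assumptions made for illustration. Only the four combinations stated in the description are encoded.

```python
from typing import Dict, Optional, Tuple

# Keys are (where the person is lying, where the knocking comes from).
# None means that no knocking is present. Only the combinations named
# in the scene description are listed; the key strings are hypothetical.
SCENE1_RULES: Dict[Tuple[str, Optional[str]], str] = {
    ("ground", None): "abnormal",   # person lying on the ground
    ("sofa", None): "normal",       # person lying on a sofa
    ("sofa", "table"): "abnormal",  # knocking on a table while lying on a sofa
    ("sofa", "door"): "normal",     # knocking from the door while lying on a sofa
}


def label_scene1(lying_on: str, knock_from: Optional[str] = None) -> str:
    """Return the label for a combined visual/acoustic observation.

    Combinations that the scene description does not define are reported
    as "undefined" rather than guessed (an assumption of this sketch).
    """
    return SCENE1_RULES.get((lying_on, knock_from), "undefined")


print(label_scene1("sofa", "table"))
print(label_scene1("sofa", "door"))
```

A table is used instead of nested conditions so that the defined cases stay visible at a glance and undefined combinations are never silently assigned a label.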
Scene 2:
- Incongruent event detection is performed using a combination of audio-visual localizers. In particular, “speech coming from a single person” in a room, detected both by an acoustic localization scheme and by a visual person detector, is defined as a normal event, whereas a discrepancy between the detector outputs of the different modalities, i.e. “the localization does not match”, is defined as an incongruent event. The latter case is produced by adding a non-visible sound source to the scene (“speech from outside the field of view”, “speech from behind the door”). The source could be an object or a person, respectively.
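The congruence test of Scene 2 can be sketched as a comparison between the direction reported by the acoustic localizer and the directions of the visually detected persons. The sketch below is illustrative only: representing both detector outputs as azimuth angles in degrees, the 15-degree tolerance, and the names `angular_difference` and `classify_speech_event` are assumptions, not details taken from the description.

```python
from typing import Optional, Sequence


def angular_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two azimuths, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def classify_speech_event(
    audio_azimuth_deg: Optional[float],
    person_azimuths_deg: Sequence[float],
    tolerance_deg: float = 15.0,  # hypothetical matching tolerance
) -> str:
    """Compare the acoustic localization with the visual person detections.

    audio_azimuth_deg is None when the acoustic localizer reports no speech.
    Speech that matches a detected person is "normal"; speech without a
    matching visual detection (e.g. a source outside the field of view or
    behind the door) is "incongruent".
    """
    if audio_azimuth_deg is None:
        return "no speech"
    for person_azimuth in person_azimuths_deg:
        if angular_difference(audio_azimuth_deg, person_azimuth) <= tolerance_deg:
            return "normal"
    return "incongruent"


print(classify_speech_event(30.0, [25.0]))
print(classify_speech_event(120.0, [25.0]))
```

Under these assumptions, a speech source with no visual detection at all is also reported as incongruent, which corresponds to the case of a sound source that is not visible in the scene.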
Scene 3:
- Scene 3 describes a situation where a “person is looking for her/his ringing phone” in a living lab scenario. The phone is not visible in the scene, but can be detected using acoustic information. Therefore, video localizers for the searching person as well as specialized acoustic event detectors and localizers are applied to the audio-visual recordings. Restrictions and requirements on the cell-phone ring tones are given in Section 4.3.1.