Technical Setup

Recording Platform

  • A self-constructed recording platform was used.
  • The system, called "AWEAR II", consists of two cameras and four microphones and can be powered by rechargeable batteries or mains power.
  • It can be controlled via a Wi-Fi-connected netbook.

>>Detailed information<<


  • Portable Network Graphics (PNG) format is used for visual data.
  • Audio data are stored as WAV files with a 48 kHz sampling rate and 32-bit resolution.
  • Optionally, the audio-visual scene is rendered as a preview video using one or both camera signals and the front stereo microphone pair.
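
The audio format stated above can be checked before processing. The following is a minimal sketch using Python's standard `wave` module; the function name `check_wav_format` and the two constants are illustrative and not part of the database tools. It assumes the files are integer PCM. If the 32-bit samples are stored as floating point, `wave.open` will raise `wave.Error` and a different reader is needed.

```python
import wave

EXPECTED_RATE_HZ = 48_000   # sampling rate stated in the format description
EXPECTED_SAMPLE_BYTES = 4   # 32-bit resolution = 4 bytes per sample


def check_wav_format(source):
    """Return (ok, details) for a WAV file path or a binary file-like object.

    Assumption: the file holds integer PCM data, which is what the
    standard-library ``wave`` module can read.
    """
    with wave.open(source, "rb") as wav:
        details = {
            "channels": wav.getnchannels(),
            "sample_rate_hz": wav.getframerate(),
            "sample_bytes": wav.getsampwidth(),
            "frames": wav.getnframes(),
        }
    ok = (
        details["sample_rate_hz"] == EXPECTED_RATE_HZ
        and details["sample_bytes"] == EXPECTED_SAMPLE_BYTES
    )
    return ok, details
```

The channel count is reported but not checked, because the number of channels per file is not specified here.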

>>Detailed information<<

Metadata description

  • The metadata, i.e. the detector output and the ground truth annotations, are available as a ZIP file for each audio-visual sequence. The name of the sequence contains a 5-bit key that indicates which metadata files are present in the ZIP file:
    • 1st bit: detector_SnS.mat
    • 2nd bit: detector_DoA.mat
    • 3rd bit: detector_TT.mat
    • 4th bit: groundtruth_SnS.lab
    • 5th bit: groundtruth_TT.txt
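
The bit-to-file mapping above can be decoded as in the following sketch. It assumes the key is available as a string of five `0`/`1` characters with the 1st bit leftmost; how the key is embedded in the sequence name is not specified here, so extracting it is left to the caller. The names `METADATA_FILES` and `decode_metadata_key` are illustrative.

```python
# File names in bit order (1st bit first), as listed in the metadata description.
METADATA_FILES = (
    "detector_SnS.mat",
    "detector_DoA.mat",
    "detector_TT.mat",
    "groundtruth_SnS.lab",
    "groundtruth_TT.txt",
)


def decode_metadata_key(key):
    """Return the metadata files flagged as present by a 5-bit key.

    Assumption: ``key`` is a string such as "10011", read left to right,
    where "1" means the corresponding file is in the ZIP file.
    """
    if len(key) != len(METADATA_FILES) or set(key) - {"0", "1"}:
        raise ValueError(f"expected five characters '0' or '1', got {key!r}")
    return [name for bit, name in zip(key, METADATA_FILES) if bit == "1"]
```

Under these assumptions, `decode_metadata_key("10011")` returns `detector_SnS.mat`, `groundtruth_SnS.lab` and `groundtruth_TT.txt`.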

>>Detailed information<<

Ground Truth Audio

  • Ground truth annotations were generated for the database of audiovisual recordings.
  • This includes labels for speech/non-speech discrimination and acoustic object detection, as well as ground truth labels for acoustic object localization.
  • A semi-supervised tool was developed to annotate the data efficiently. This tool provides output interfaces for MATLAB, Excel and the HTK speech recognition toolkit.

>>Detailed information<<

Ground Truth Video

  • Ground truth annotations were generated for the database of audiovisual recordings.
  • This includes binary labels for human movement analysis and interpretation, as well as the absolute positions of body parts within a video frame, given in pixels.
  • A semi-supervised tool was developed to annotate the data efficiently. This tool provides a *.txt output interface for easy import and further processing.

>>Detailed information<<


  • Each audio-visual recording contains a label file that includes the name of the recording and the scene, the date, the location, the device used, the frame rate, a placeholder for comments, and a detailed description of the scene.
  • All labels have been generated manually, either by hand or with custom-made semi-supervised tools.

>>Detailed information<<