Metadata Description

detector_SnS.mat

  • a MATLAB file that contains binary labels (vLabel) for Speech-non-Speech discrimination over time (vTime)
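
    As a sketch, these variables can be read with SciPy. The variable names vLabel and vTime come from the description above; the synthetic file written here only stands in for the real detector_SnS.mat, and the time unit is an assumption:

```python
import numpy as np
from scipy.io import loadmat, savemat

# Synthetic stand-in for the real detector_SnS.mat (same variable names).
savemat("detector_SnS_demo.mat",
        {"vLabel": np.array([0, 0, 1, 1, 0]),
         "vTime": np.array([0.0, 0.1, 0.2, 0.3, 0.4])})

data = loadmat("detector_SnS_demo.mat")
vLabel = data["vLabel"].ravel()  # binary speech/non-speech decision per frame
vTime = data["vTime"].ravel()    # corresponding time stamps

# e.g. list the time stamps labelled as speech
speech_times = vTime[vLabel == 1]
print(speech_times)
```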

detector_DoA.mat

  • a MATLAB file that contains detector outputs for Direction of Arrival (source labels) as well as intermediate results from the generalized cross-correlation approach for acoustic source localization.
    • SVMresults.sourceLabel:
      the binary labels for acoustic objects within a scene over 61 directions and time.
    • SVMresults.sourceConfidence:
      the confidence measures for acoustic object localization within a scene over 61 directions and time.
    • matGCC:
      the output of the generalized cross-correlation
    • tau:
      the labels for all directions as a time-delay
    • theta:
      the labels for all directions as angles
    • time:
      the labels for the time axis
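
    A hedged sketch of accessing these fields in Python: SVMresults is a MATLAB struct, which scipy.io.loadmat exposes via attribute access when struct_as_record=False. The synthetic file and the -90..90 degree direction grid are illustration-only assumptions:

```python
import numpy as np
from scipy.io import loadmat, savemat

# Synthetic stand-in for detector_DoA.mat: 61 directions x 4 time frames.
rng = np.random.default_rng(0)
savemat("detector_DoA_demo.mat", {
    "SVMresults": {"sourceLabel": rng.integers(0, 2, (61, 4)),
                   "sourceConfidence": rng.random((61, 4))},
    "theta": np.linspace(-90, 90, 61),  # assumed direction grid in degrees
    "time": np.arange(4) * 0.1,
})

m = loadmat("detector_DoA_demo.mat", squeeze_me=True, struct_as_record=False)
labels = m["SVMresults"].sourceLabel      # (61 directions, nFrames) binary map
conf = m["SVMresults"].sourceConfidence   # matching confidence measures
theta = m["theta"]                        # direction labels as angles

# Per frame, the direction of the most confident active source
masked = np.where(labels == 1, conf, -np.inf)
best_dir = theta[np.argmax(masked, axis=0)]
print(best_dir.shape)
```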

detector_TT.mat

  • a MATLAB file that contains the detector output of the tracker tree.
  • ROIOutput [t,roi,nModel] describes the best ROI found by each manifold-based tracker, without considering the confidence of this tracker. t is the frame number, roi is the ROI as
    [upperLeftCornerY upperLeftCornerX lowerRightCornerY lowerRightCornerX centerY centerX],
    and nModel is, in the current version, between 1 and 7:
  1.  walking/standing
  2.  sitting
  3.  head and torso
  4.  upper body
  5.  lower body
  6.  picking up
  7.  lying on couch
  • Please note that the ROI has the size the tracker actually uses. For display purposes, the width may need to be scaled.
  • fzOut [t,val] is the output of the Felzenszwalb detector. t is the frame number; val is the output [upperLeftCornerX upperLeftCornerY lowerLeftCornerX lowerLeftCornerY score]. It is taken from [2] out of the box. If score > 0, a person is detected.
  • cstOut [t,val] is the output of the camShiftTracker. It has no confidence measure.
    val is [upperLeftCornerX upperLeftCornerY width height e1 e2 e3 e4 e5]; e1-5 are used to draw the ellipse, please consult the code.
  • maxWeight [t,nModel] is the confidence output of the trackers (the likelihood of the best sample). It should be used in log-scale. Based on this confidence, a tracker is judged active or not.
  • All other outputs are of no importance.
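
    For example, the per-frame activity decision described above might be sketched as follows (in Python rather than MATLAB; the random arrays merely stand in for ROIOutput and maxWeight from detector_TT.mat, and the threshold value is purely hypothetical):

```python
import numpy as np

# Hypothetical values; in practice these come from detector_TT.mat
# (e.g. via scipy.io.loadmat). Shapes follow the description above:
# ROIOutput is [t, roi, nModel], maxWeight is [t, nModel].
nFrames, nModels = 5, 7
rng = np.random.default_rng(1)
ROIOutput = rng.integers(0, 240, (nFrames, 6, nModels))
maxWeight = rng.random((nFrames, nModels)) * 1e-3

# zero-based index n corresponds to nModel = n + 1 in the file
model_names = ["walking/standing", "sitting", "head and torso", "upper body",
               "lower body", "picking up", "lying on couch"]

THRESH = -8.0  # hypothetical log-likelihood threshold for "tracker active"
for t in range(nFrames):
    active = np.flatnonzero(np.log(maxWeight[t]) > THRESH)
    for n in active:
        y1, x1, y2, x2, cy, cx = ROIOutput[t, :, n]
        print(f"frame {t}: {model_names[n]} ROI=({y1},{x1})-({y2},{x2})")
```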

groundtruth_SnS.lab

  • a text file that contains the ground truth annotations for the acoustic objects in the scene in HTK format. For each annotated segment, its temporal location within the acoustic scene and a label are available.
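
    As a sketch, an HTK-style label file can be parsed as below; each line holds start time, end time, and label, with times conventionally given in HTK's 100 ns units (verify against the actual file):

```python
# Minimal HTK .lab parser (assumes "start end label" per line,
# times in HTK's conventional 100 ns units).
def read_htk_lab(lines):
    segments = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        start, end, label = int(parts[0]), int(parts[1]), parts[2]
        segments.append((start / 1e7, end / 1e7, label))  # -> seconds
    return segments

example = ["0 12000000 speech", "12000000 20000000 nonspeech"]
segments = read_htk_lab(example)
print(segments)  # [(0.0, 1.2, 'speech'), (1.2, 2.0, 'nonspeech')]
```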

groundtruth_TT.txt

  • a text file that contains the ground truth annotations for the tracker tree. It consists of a frame-wise annotation of the audio-visual sequence. For each frame, the following information is available:
Name of the frame

binary label for: Person
binary label for: Head
binary label for: Upper Body
binary label for: Lower Body
binary label for: Sitting
binary label for: Stumbling
binary label for: Lying
binary label for: Walking
binary label for: Limping
binary label for: Standing
binary label for: Pick Up
location of body parts: Head (x-direction)
location of body parts: Head (y-direction)
location of body parts: Upper Body (x-direction)
location of body parts: Upper Body (y-direction)
location of body parts: Lower Body (x-direction)
location of body parts: Lower Body (y-direction)
location of body parts: Person (x-direction)
location of body parts: Person (y-direction)
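
A hypothetical parser for one line of this file, assuming whitespace-separated records in exactly the column order listed above (the real delimiter and coordinate format should be checked against the file):

```python
# Column order taken from the field list above; the example row and
# its values are made up for illustration.
BINARY = ["Person", "Head", "UpperBody", "LowerBody", "Sitting", "Stumbling",
          "Lying", "Walking", "Limping", "Standing", "PickUp"]
COORDS = ["Head_x", "Head_y", "UpperBody_x", "UpperBody_y",
          "LowerBody_x", "LowerBody_y", "Person_x", "Person_y"]

def parse_frame(line):
    parts = line.split()
    record = {"frame": parts[0]}                               # frame name
    record.update(zip(BINARY, (int(v) for v in parts[1:12])))  # 11 binary labels
    record.update(zip(COORDS, (float(v) for v in parts[12:20])))  # 8 coordinates
    return record

row = ("img_00042 1 1 1 1 0 0 0 1 0 0 0 "
       "120.0 80.0 118.0 140.0 121.0 210.0 119.5 150.0")
rec = parse_frame(row)
print(rec["frame"], rec["Walking"], rec["Head_x"])  # img_00042 1 120.0
```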