Metadata Description
detector_SnS.mat
- a MATLAB file that contains binary labels (vLabel) for speech/non-speech discrimination over time (vTime).
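The binary labels can be turned into speech segments by collecting runs of consecutive non-zero labels. The sketch below assumes that vTime and vLabel are vectors of equal length, that a label of 1 means speech, and that a segment ends at the time stamp of the first non-speech sample (or at the last time stamp). The helper names are hypothetical. Loading uses SciPy's `scipy.io.loadmat`, which cannot read MATLAB v7.3 files.

```python
def load_sns(path):
    """Load vTime and vLabel from detector_SnS.mat (requires SciPy; pre-v7.3 MAT files only)."""
    from scipy.io import loadmat  # imported lazily so speech_segments works without SciPy

    data = loadmat(path, squeeze_me=True)
    return data["vTime"], data["vLabel"]


def speech_segments(v_time, v_label):
    """Return (start, end) times of the contiguous runs where the label is non-zero (speech)."""
    segments = []
    start = None
    for t, label in zip(v_time, v_label):
        if label and start is None:
            start = t  # speech begins
        elif not label and start is not None:
            segments.append((start, t))  # speech ends at the first non-speech time stamp
            start = None
    if start is not None:
        segments.append((start, v_time[-1]))  # speech runs until the last time stamp
    return segments
```

For example, `speech_segments([0.0, 0.1, 0.2, 0.3], [0, 1, 1, 0])` returns `[(0.1, 0.3)]`; with the real file one would call `speech_segments(*load_sns("detector_SnS.mat"))`.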
detector_DoA.mat
- a MATLAB file that contains the Direction-of-Arrival (DoA) detector outputs as source labels, as well as intermediate results of the generalized cross-correlation (GCC) approach to acoustic source localization.
- SVMresults.sourceLabel:
the binary labels for acoustic objects within a scene over 61 directions and time.
- SVMresults.sourceConfidence:
the confidence measures for acoustic object localization within a scene over 61 directions and time.
- matGCC:
the output of the generalized cross-correlation (GCC)
- tau:
the labels for all directions as time delays
- theta:
the labels for all directions as angles
- time:
the labels for the time axis
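As a rough illustration of the quantities stored in this file, the sketch below computes a generalized cross-correlation with PHAT weighting between two microphone signals and picks the most likely direction per frame from a GCC map. The PHAT weighting, the assumption that matGCC is laid out as directions × time (one row per entry of theta, one column per entry of time), and the function names are all assumptions: the document does not state which GCC weighting or microphone geometry was used, so this is not the detector that produced detector_DoA.mat. Mapping a lag to an angle additionally depends on the microphone spacing, which is not given here.

```python
import numpy as np


def gcc_phat(x1, x2, max_lag):
    """GCC with PHAT weighting between two signals.

    Returns (lags, cc); a peak at lag k means x1 lags x2 by k samples.
    """
    n = len(x1) + len(x2)  # zero-pad so the circular correlation equals the linear one
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + 1e-12  # PHAT: keep only the phase of the cross-spectrum
    cc = np.fft.irfft(cross, n=n)
    # Reorder so that lags run from -max_lag to +max_lag.
    cc = np.concatenate((cc[n - max_lag:], cc[: max_lag + 1]))
    return np.arange(-max_lag, max_lag + 1), cc


def doa_per_frame(mat_gcc, theta):
    """For every time frame (column), return the direction with the largest GCC value."""
    best = np.argmax(np.asarray(mat_gcc), axis=0)
    return np.asarray(theta)[best]
```

SVMresults is a MATLAB struct; `scipy.io.loadmat(path, squeeze_me=True, struct_as_record=False)` returns it as an object whose fields (sourceLabel, sourceConfidence) are accessible as attributes.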
detector_TT.mat
- a MATLAB file that contains the detector output of the tracker tree.
- ROIOutput [t,roi,nModel] describes the best ROI found by each manifold-based tracker, without considering that tracker's confidence. t is the frame number, roi is the ROI as
[upperLeftCornerY upperLeftCornerX lowerRightCornerY lowerRightCornerX centerY centerX],
nModel is the model index, which in the current version ranges from 1 to 7:
- walking/standing
- sitting
- head and torso
- upper body
- lower body
- picking up
- lying on couch
- Please note that the ROI has the size the tracker actually uses. For display, you may want to scale the width.
- fzOut [t,val] is the output of the Felzenszwalb detector. t is the frame number and val is the output [upperLeftCornerX upperLeftCornerY lowerRightCornerX lowerRightCornerY score]. The detector from [2] is used unmodified. If score > 0, a person is detected.
- cstOut [t,val] is the output of the camShiftTracker. It has no confidence measure.
val is [upperLeftCornerX upperLeftCornerY width height e1 e2 e3 e4 e5]; e1 to e5 are the parameters used to draw the ellipse (please consult the code for details).
- maxWeight [t,nModel] is the confidence output of the trackers (the likelihood of the best sample). It should be interpreted on a log scale. This confidence determines whether a tracker is judged active.
- All other outputs can be ignored.
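The conventions above can be sketched in Python with NumPy. The sketch assumes that ROIOutput is indexed as ROIOutput(t, :, nModel), that fzOut and maxWeight have one row per frame with the fzOut score in the last column, and that centerX is the horizontal center used when scaling the width for display. The log threshold for judging a tracker active is a hypothetical parameter, since the document does not give the value used, and all function names are hypothetical. Note that MATLAB indices start at 1 whereas NumPy indices start at 0.

```python
import numpy as np

# nModel values 1..7 in the order listed for ROIOutput (index 0 corresponds to nModel = 1).
MODEL_NAMES = [
    "walking/standing",
    "sitting",
    "head and torso",
    "upper body",
    "lower body",
    "picking up",
    "lying on couch",
]


def roi_to_xywh(roi, width_scale=1.0):
    """Convert an ROI [y1 x1 y2 x2 centerY centerX] into (x, y, width, height).

    width_scale widens (or narrows) the box symmetrically around centerX for display.
    """
    y1, x1, y2, x2, _center_y, center_x = (float(v) for v in roi)
    width = (x2 - x1) * width_scale
    height = y2 - y1
    return (center_x - width / 2.0, y1, width, height)


def felzenszwalb_frames(fz_out):
    """Return the row indices of fzOut whose score (last column) is > 0, i.e. a person was detected."""
    fz = np.atleast_2d(np.asarray(fz_out, dtype=float))
    return np.flatnonzero(fz[:, -1] > 0)


def active_trackers(max_weight, log_threshold):
    """Boolean mask of active trackers: log(maxWeight) above a hypothetical threshold."""
    with np.errstate(divide="ignore"):  # log(0) -> -inf, which is never above the threshold
        return np.log(np.asarray(max_weight, dtype=float)) > log_threshold
```

For instance, the ROI of model nModel at frame t would be `roi_to_xywh(roi_output[t - 1, :, nModel - 1])` after loading ROIOutput into a NumPy array `roi_output`.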
groundtruth_SnS.lab
- a text file that contains the ground truth annotations for the acoustic objects in the scene in HTK format. For each annotated segment, its temporal location within the acoustic scene and a label are available.
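HTK label files list one segment per line as `start end label`, with start and end times given in units of 100 ns. A minimal parser could look as follows. It assumes that every line of groundtruth_SnS.lab carries explicit start and end times (HTK also allows them to be omitted, and may append scores or further fields, which this sketch ignores); the function name is hypothetical.

```python
def parse_htk_lab(text):
    """Parse HTK label text into (start_seconds, end_seconds, label) tuples."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank lines and lines without explicit start/end times
        start, end, label = float(fields[0]), float(fields[1]), fields[2]
        # HTK times are in 100 ns units, so divide by 1e7 to get seconds.
        segments.append((start / 1e7, end / 1e7, label))
    return segments
```

Typical usage would be `parse_htk_lab(open("groundtruth_SnS.lab").read())`.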
groundtruth_TT.txt
- a text file that contains the ground truth annotations for the tracker tree. It consists of a frame-wise annotation of the audio-visual sequence. For each frame, the following information is available:
- Name of the frame
- binary label for: Person
- binary label for: Head
- binary label for: Upper Body
- binary label for: Lower Body
- binary label for: Sitting
- binary label for: Stumbling
- binary label for: Lying
- binary label for: Walking
- binary label for: Limping
- binary label for: Standing
- binary label for: Pick Up
- location of body parts: Head (x-direction)
- location of body parts: Head (y-direction)
- location of body parts: Upper Body (x-direction)
- location of body parts: Upper Body (y-direction)
- location of body parts: Lower Body (x-direction)
- location of body parts: Lower Body (y-direction)
- location of body parts: Person (x-direction)
- location of body parts: Person (y-direction)
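A per-frame parser could map each line onto the fields listed above. This is a sketch under assumptions the document does not state: one frame per line, whitespace-separated values, columns in exactly the order listed, binary labels written as 0/1, and frame names without spaces. The field names and the function name are hypothetical.

```python
BINARY_FIELDS = [
    "Person", "Head", "UpperBody", "LowerBody", "Sitting", "Stumbling",
    "Lying", "Walking", "Limping", "Standing", "PickUp",
]
LOCATION_FIELDS = [
    "Head_x", "Head_y", "UpperBody_x", "UpperBody_y",
    "LowerBody_x", "LowerBody_y", "Person_x", "Person_y",
]


def parse_tt_line(line):
    """Parse one frame of groundtruth_TT.txt into a dict (assumed column order as listed)."""
    fields = line.split()
    expected = 1 + len(BINARY_FIELDS) + len(LOCATION_FIELDS)
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, got {len(fields)}")
    record = {"frame": fields[0]}
    n_bin = len(BINARY_FIELDS)
    for name, value in zip(BINARY_FIELDS, fields[1:1 + n_bin]):
        record[name] = int(value) == 1  # binary labels as booleans
    for name, value in zip(LOCATION_FIELDS, fields[1 + n_bin:]):
        record[name] = float(value)  # body-part locations in image coordinates
    return record
```

Raising an error on an unexpected column count makes it obvious early if the real file uses a different layout than assumed here.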