Metadata Description

detector_SnS.mat

a MATLAB file that contains binary labels (vLabel) for speech/non-speech discrimination over time (vTime).
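
As an illustration, the two vectors can be collapsed into speech segments. The following is a minimal sketch in Python, assuming that vLabel and vTime are equal-length vectors and that a label of 1 marks speech. The function name and the sample values are hypothetical, and loading the file itself (for instance with scipy.io.loadmat) is left out.

```python
def speech_segments(v_label, v_time):
    """Return (start, end) time pairs for runs of frames labelled 1.

    Assumes v_label and v_time have equal length and that 1 means speech.
    The end of a segment is the time stamp of its last speech frame.
    """
    if len(v_label) != len(v_time):
        raise ValueError("vLabel and vTime must have the same length")
    segments = []
    start = None
    prev_time = None
    for label, t in zip(v_label, v_time):
        if label and start is None:
            start = t  # a speech run begins
        elif not label and start is not None:
            segments.append((start, prev_time))  # run ended on the previous frame
            start = None
        prev_time = t
    if start is not None:  # the run reaches the end of the recording
        segments.append((start, prev_time))
    return segments


# Made-up example values, not taken from the database.
labels = [0, 1, 1, 0, 0, 1]
times = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
print(speech_segments(labels, times))
```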

detector_DoA.mat

a MATLAB file that contains the detector outputs for Direction of Arrival (DoA) as source labels, together with intermediate results from the generalized cross-correlation approach to acoustic source localization.

SVMresults.sourceLabel:
the binary labels for acoustic objects within a scene over 61 directions and time.

SVMresults.sourceConfidence:
the confidence measures for acoustic object localization within a scene over 61 directions and time.

matGCC:
the output of the generalized cross-correlation

tau:
the labels for all directions as a time-delay

theta:
the labels for all directions as angles

time:
the labels for the time axis
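
To show how the axis labels relate to the label matrix, here is a small Python sketch that lists the active directions for each time stamp. It assumes that SVMresults.sourceLabel is stored with one row per direction and one column per time frame; that orientation, the function name, and the example values are assumptions, not something the description states.

```python
def active_directions(source_label, theta, time):
    """Map each time stamp to the angles whose binary label is 1.

    Assumes source_label[i][j] belongs to direction theta[i] at time[j].
    """
    if len(source_label) != len(theta):
        raise ValueError("expected one row per direction")
    result = {}
    for j, t in enumerate(time):
        result[t] = [theta[i] for i, row in enumerate(source_label) if row[j]]
    return result


# Tiny made-up example with 3 directions instead of 61.
theta = [-90, 0, 90]
time = [0.0, 0.5]
source_label = [
    [0, 1],  # first direction
    [1, 1],  # second direction
    [0, 0],  # third direction
]
print(active_directions(source_label, theta, time))
```

The same indexing would apply to SVMresults.sourceConfidence if it shares the layout of the label matrix.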

detector_TT.mat

a MATLAB file that contains the detector output of the tracker tree.

ROIOutput [t,roi,nModel] describes the best ROI found by each manifold-based tracker, without considering the confidence of this tracker. t is the frame number, roi is the ROI as
[upperLeftCornerY upperLeftCornerX lowerRightCornerY lowerRightCornerX centerY centerX],
and nModel is the model index, which in the current version lies between 1 and 7:

walking/standing

sitting

head and torso

upper body

lower body

picking up

lying on couch

Please note that the ROI has the size the tracker actually uses. For display, the width may need to be scaled.
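
Because the roi vector lists Y before X, it is easy to swap the axes when drawing. The Python sketch below unpacks one roi vector and scales its width about the centre for display. The width_scale parameter, the function name, and the example values are hypothetical; the description does not say which scale factor is appropriate.

```python
from collections import namedtuple

# Field order as documented for roi: Y comes before X.
Roi = namedtuple(
    "Roi",
    ["upper_left_y", "upper_left_x", "lower_right_y", "lower_right_x",
     "center_y", "center_x"],
)


def display_box(roi_values, width_scale=1.0):
    """Return (left, top, right, bottom) with the width scaled about the centre.

    width_scale is a hypothetical display parameter, not part of the data.
    """
    roi = Roi(*roi_values)
    half_width = (roi.lower_right_x - roi.upper_left_x) * width_scale / 2.0
    left = roi.center_x - half_width
    right = roi.center_x + half_width
    return (left, roi.upper_left_y, right, roi.lower_right_y)


# Made-up ROI: rows 10 to 110, columns 20 to 60, centre at (60, 40).
example_roi = [10, 20, 110, 60, 60, 40]
print(display_box(example_roi))
print(display_box(example_roi, width_scale=1.5))
```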

fzOut [t,val] is the output of the Felzenszwalb detector. t is the frame number and val is the output [upperLeftCornerX upperLeftCornerY lowerLeftCornerX lowerLeftCornerY score]. The detector is taken from [2] out of the box. If score>0, a person is detected.
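
The detection rule can be sketched in a few lines of Python. The example assumes that fzOut has been converted into a sequence of (t, val) pairs with the score as the last element of val; the function name and the sample rows are made up for illustration.

```python
def frames_with_person(fz_out):
    """Return the frame numbers whose detector score is greater than 0.

    fz_out is assumed to be an iterable of (t, val) pairs, where the last
    element of val is the score.
    """
    return [t for t, val in fz_out if val[-1] > 0]


# Made-up rows: four box coordinates followed by the score.
fz_out = [
    (1, [12, 30, 80, 200, -0.7]),
    (2, [14, 31, 82, 201, 0.4]),
    (3, [15, 33, 83, 204, 1.2]),
]
print(frames_with_person(fz_out))
```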

cstOut [t,val] is the output of the camShiftTracker. It has no confidence measure.
val is [upperLeftCornerX upperLeftCornerY width height e1 e2 e3 e4 e5]; e1 to e5 are used to draw the ellipse (please consult the code).
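
Since this output stores a width and height rather than a second corner, a conversion can help when comparing it with the other boxes. The Python sketch below assumes the lower-right corner is simply the upper-left corner plus the size; the function name and example values are hypothetical, and the ellipse parameters e1 to e5 are ignored.

```python
def camshift_box_corners(val):
    """Convert [upperLeftCornerX, upperLeftCornerY, width, height, e1..e5]
    into (left, top, right, bottom).

    Assumption: right = left + width and bottom = top + height.
    """
    x, y, width, height = val[:4]
    return (x, y, x + width, y + height)


# Made-up example; the five ellipse parameters are placeholders.
example_val = [30, 50, 40, 100, 0, 0, 0, 0, 0]
print(camshift_box_corners(example_val))
```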

maxWeight [t,nModel] is the confidence output of the trackers (the likelihood of the best sample). It should be used on a log scale. Based on this confidence, a tracker is judged active or not.
All other outputs are of no importance.
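
A possible way to apply the confidence on a log scale is sketched below in Python. The threshold value is purely hypothetical, since the description does not state how the active/inactive decision is made; the function name and the sample weights are also made up.

```python
import math


def active_trackers(max_weight_row, log_threshold):
    """Return the 1-based model indices whose log-confidence exceeds the threshold.

    max_weight_row holds the maxWeight values of one frame, one per model.
    log_threshold is a hypothetical cut-off, not taken from the database.
    """
    active = []
    for n_model, weight in enumerate(max_weight_row, start=1):
        # A weight of 0 has no finite logarithm and is treated as inactive.
        if weight > 0 and math.log(weight) > log_threshold:
            active.append(n_model)
    return active


# Made-up likelihoods for the seven models of one frame.
weights = [1e-3, 0.5, 0.0, 1e-10, 0.2, 0.9, 1e-6]
print(active_trackers(weights, log_threshold=-5.0))
```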

groundtruth_SnS.lab

a text file that contains the ground truth annotations for the acoustic objects in the scene in HTK format. For each annotated segment, its temporal location within the acoustic scene and a label are available.
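
HTK label files give segment times as integers in units of 100 ns. The Python sketch below reads such lines into (start, end, label) tuples in seconds, assuming each line holds a start time, an end time, and a label separated by whitespace. The label names in the example are invented and may differ from those used in the file.

```python
HTK_TICKS_PER_SECOND = 10_000_000  # HTK label times are in units of 100 ns


def parse_htk_labels(text):
    """Parse 'start end label' lines into (start_s, end_s, label) tuples."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        start = int(fields[0]) / HTK_TICKS_PER_SECOND
        end = int(fields[1]) / HTK_TICKS_PER_SECOND
        segments.append((start, end, fields[2]))
    return segments


# Made-up example content, not taken from groundtruth_SnS.lab.
example = "0 12500000 silence\n12500000 30000000 speech\n"
for segment in parse_htk_labels(example):
    print(segment)
```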

groundtruth_TT.txt

a text file that contains the ground truth annotations for the tracker tree. It consists of a frame-wise annotation of the audio-visual sequence. For each frame, the following information is available:

Name of the frame

binary label for: Person
binary label for: Head
binary label for: Upper Body
binary label for: Lower Body
binary label for: Sitting
binary label for: Stumbling
binary label for: Lying
binary label for: Walking
binary label for: Limping
binary label for: Standing
binary label for: Pick Up
location of body parts: Head (x-direction)
location of body parts: Head (y-direction)
location of body parts: Upper Body (x-direction)
location of body parts: Upper Body (y-direction)
location of body parts: Lower Body (x-direction)
location of body parts: Lower Body (y-direction)
location of body parts: Person (x-direction)
location of body parts: Person (y-direction)
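
The field list above can be turned into a small parser. The Python sketch below assumes one frame per line, whitespace-separated fields in the order listed, a frame name without spaces, and binary labels written as 0 or 1; none of these layout details are confirmed by the description, and the example line is invented.

```python
LABELS = ["Person", "Head", "Upper Body", "Lower Body", "Sitting", "Stumbling",
          "Lying", "Walking", "Limping", "Standing", "Pick Up"]
PARTS = ["Head", "Upper Body", "Lower Body", "Person"]


def parse_frame_annotation(line):
    """Split one annotation line into (name, flags, locations)."""
    fields = line.split()
    expected = 1 + len(LABELS) + 2 * len(PARTS)  # name, 11 labels, 8 coordinates
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, got {len(fields)}")
    name = fields[0]
    flags = {label: fields[1 + i] == "1" for i, label in enumerate(LABELS)}
    coords = [float(v) for v in fields[1 + len(LABELS):]]
    # Coordinates come in (x, y) pairs, in the order given by PARTS.
    locations = {part: (coords[2 * i], coords[2 * i + 1])
                 for i, part in enumerate(PARTS)}
    return name, flags, locations


# Made-up example line, not taken from groundtruth_TT.txt.
line = "frame_0001.png 1 1 1 1 0 0 0 1 0 0 0 120 40 118 90 119 160 119 100"
name, flags, locations = parse_frame_annotation(line)
print(name, flags["Walking"], locations["Head"])
```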
