Acoustic Object Detection

For speech/non-speech discrimination, recordings from Oldenburg, Portland and Zurich were evaluated. Detailed results per recording are given in the table below. For the Oldenburg data set, 74.4% of 1928 frames were classified correctly.

Oldenburg data set:
True Positive: 1295    False Positive: 383
False Negative: 109    True Negative: 141
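The share of correctly classified frames follows from such a matrix as the number of true positives plus true negatives divided by the number of all frames. A worked example for the Oldenburg counts, assuming the first and last values are the true positives and true negatives (the order suggested by the Sum rows of the per-recording table for Portland and Zurich):

```latex
\[
\text{accuracy}
  = \frac{TP + TN}{TP + FP + FN + TN}
  = \frac{1295 + 141}{1295 + 383 + 109 + 141}
  = \frac{1436}{1928}
  \approx 0.7448
\]
```

Rounded to one decimal this gives about 74.5%, so the reported percentage appears to be truncated rather than rounded.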
 

For the Portland data set, 76.2% of 15312 frames were classified correctly.

Portland data set:
True Positive: 7478    False Positive: 2559
False Negative: 1073   True Negative: 4202
 

For the Zurich data set, 80.8% of 3240 frames were classified correctly.


Zurich data set:
True Positive: 1327    False Positive: 591
False Negative: 28     True Negative: 1294
 

For the joined data set, 76.8% of all 20480 frames were classified correctly.


Joined data set:
True Positive: 10100   False Positive: 3533
False Negative: 1210   True Negative: 5637
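The joined counts are the cell-wise sums of the three site matrices. The following is a minimal sketch of that bookkeeping, not the evaluation code used for the report. The class `Confusion` and its field names are hypothetical, and the cell order (true positives, false positives, false negatives, true negatives) is an assumption taken from the Sum rows of the per-recording table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Frame counts of a binary detector (hypothetical helper class)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    @property
    def accuracy(self) -> float:
        # Share of frames on the diagonal of the 2x2 matrix.
        return (self.tp + self.tn) / self.total

    def __add__(self, other: "Confusion") -> "Confusion":
        # Cell-wise sum, used to pool several data sets.
        return Confusion(self.tp + other.tp, self.fp + other.fp,
                         self.fn + other.fn, self.tn + other.tn)


# Counts as listed for the three sites (assumed cell order, see above).
sites = {
    "Oldenburg": Confusion(tp=1295, fp=383, fn=109, tn=141),
    "Portland": Confusion(tp=7478, fp=2559, fn=1073, tn=4202),
    "Zurich": Confusion(tp=1327, fp=591, fn=28, tn=1294),
}

joined = sum(sites.values(), Confusion(0, 0, 0, 0))

for name, cm in {**sites, "Joined": joined}.items():
    print(f"{name}: {cm.total} frames, accuracy {100 * cm.accuracy:.2f}%")
```

The pooled counts reproduce the joined matrix (10100, 3533, 1210, 5637) and the total of 20480 frames. Because the sites contribute very different numbers of frames, the joined accuracy is dominated by the Portland recordings.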
 

 

 

Audio‐visual Recordings True Positive True Negative False Positive False Negative
23_01Apr2010_HDH_office
1_HDH_office_bird_a 80 12 32 6
1_HDH_office_bird_b 66 12 34 12
1_HDH_office_bird_c 97 5 18 8
1_HDH_office_bird_d 90 9 28 15
1_HDH_office_birdnoise_a 88 6 39 21
1_HDH_office_tel_a 70 26 27 3
1_HDH_office_tel_b 73 5 41 9
1_HDH_office_tel_c 89 8 30 3
1_HDH_office_tel_d 100 12 50 6
1_HDH_office_tel_f 85 14 40 5
1_HDH_office_telnoise_a 104 16 32 14
Sum 942 125 371 102
27_12Jul2010_OHSU_watch_tv
take1 46 165 144 2
take2 98 177 119 9
take3 89 219 92 12
26_01Jul2010_OHSU_fall_picking_up_object
take1 74 76 25 17
take2 78 67 25 11
21_06Mar2010_OHSU_walk_walk_with_oov_walk
scenario9_take3_20100303 226 109 21 73
scenario9_take4_20100303 246 136 37 37
20_05Mar2010_OHSU_walk_lay_on_couch_walk
scenario8_take4_20100303 57 177 47 14
scenario8_take3_20100303 40 190 72 10
19_05Mar2010_OHSU_walk_pick_something_up_walk
scenario7_take3_20100303 54 170 16 10
scenario7_take4_20100303 55 145 18 23
trimmed
s39t10 30 20 45 0
s39t3 10 136 39 1
s39t5 33 22 33 5
s39t6 35 22 44 1
s39t7 38 13 46 1
s39t8 54 21 30 3
s39t9 36 50 38 1
s40t10 22 79 49 12
s40t5 2 59 52 3
s40t7 3 64 52 1
s40t9 22 62 82 11
s41t1 101 16 28 25
s41t10 63 38 27 7
s41t2 94 16 11 17
s41t3 93 29 14 19
s41t4 78 38 19 9
s41t5 52 44 27 16
s41t6 56 36 31 13
s41t7 60 31 29 12
s41t8 58 31 31 10
s41t9 64 22 36 12
s42t1 41 49 45 26
s42t10 98 35 20 9
s42t3 60 30 37 17
s42t5 76 54 17 17
s42t6 77 59 35 9
s42t7 95 57 23 7
s42t8 79 53 30 3
s42t9 100 11 13 1
s43t1 314 20 23 32
s43t10 366 14 1 52
s43t2 252 20 23 9
s43t3 227 9 10 25
s43t5 123 27 17 8
s43t6 243 35 3 32
s43t7 246 43 16 34
s43t8 227 18 9 26
s45t1 18 43 58 8
s45t10 41 50 34 7
s45t2 8 47 66 1
s45t3 21 61 52 2
s45t4 22 61 62 3
s45t5 18 49 46 4
s45t6 23 44 42 3
s45t7 29 35 34 13
s45t8 38 28 31 8
s45t9 18 51 43 1
s46t1 182 21 29 12
s46t10 128 43 11 13
s46t2 232 28 18 33
s46t3 412 3 7 46
s46t4 460 11 7 45
s46t5 223 6 4 32
s46t6 353 10 1 66
s46t7 92 28 25 5
s46t8 138 25 11 12
s46t9 129 29 10 13
s48t1 15 107 35 1
s48t10 27 49 34 13
s48t3 17 62 29 5
s48t4 24 52 35 4
s48t5 27 60 35 4
s48t7 27 68 35 2
s48t8 29 53 26 5
s48t9 36 64 38 8
Sum 7478 4202 2559 1073
22_17Mar2010_Zurich_living_lab
scene_01_woman_telephone_take_c 71 69 22 0
scene_01_woman_telephone_take_d 72 64 27 1
scene_01_woman_telephone_light_take_a 78 45 22 1
scene_01_woman_telephone_light_take_b 81 71 17 1
scene_02_knocking_light_take_c 12 39 17 0
scene_02_knocking_take_a 14 37 12 1
scene_02_knocking_take_b 6 32 14 0
scene_03_enter_room_fab_dan_light_take_a 37 32 27 0
scene_03_enter_room_fab_dan_light_take_b 46 48 18 0
scene_03_enter_room_fab_dan_take_c 42 38 28 0
scene_03_enter_room_fab_dan_take_d 37 43 32 0
scene_03_enter_room_fab_dan_take_e 43 37 34 0
scene_04_dan_radio_active_take_a 62 34 7 1
scene_04_dan_radio_active_take_b 71 40 4 1
scene_04_dan_radio_active_light_take_c 82 53 5 0
scene_04_dan_radio_active_light_take_d 87 45 4 2
scene_05_dan_radio_remote_take_d 40 45 1 0
scene_05_dan_radio_remote_light_take_a 48 41 9 4
scene_05_dan_radio_remote_light_take_b 40 40 4 4
scene_07_fab_standup_talkshimself_light_take_d 45 50 14 1
scene_07_fab_standup_talkshimself_light_take_e 44 48 15 1
scene_07_fab_standup_talkshimself_take_b 48 54 14 0
scene_07_fab_standup_talkshimself_take_c 48 52 17 1
scene_16_fab_oov_couch_take_e 31 33 12 4
scene_16_fab_oov_couch_take_f 29 28 18 3
scene_16_fab_oov_couch_light_take_c 28 29 23 2
scene_20_woman_hits_limping_speech_light_take_f 10 24 32 0
scene_20_woman_hits_limping_speech_take_d 10 31 23 0
scene_20_woman_hits_limping_speech_take_e 13 23 32 0
scene_20_fab_hits_limping_speech_light_take_a 18 20 32 0
scene_20_fab_hits_limping_speech_light_take_b 18 26 30 0
scene_20_fab_hits_limping_speech_take_c 16 23 25 0
Sum 1327 1294 591 28
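The Sum rows of the table can be checked, and a per-recording accuracy derived, with a few lines of code. The sketch below does this for the 23_01Apr2010_HDH_office group only. The helper names `parse_rows`, `column_sums` and `accuracy` are hypothetical, and the column order (True Positive, True Negative, False Positive, False Negative) is taken from the table header.

```python
# Rows copied from the 23_01Apr2010_HDH_office group of the table.
# Column order as in the table header: TP, TN, FP, FN.
OFFICE_ROWS = """\
1_HDH_office_bird_a 80 12 32 6
1_HDH_office_bird_b 66 12 34 12
1_HDH_office_bird_c 97 5 18 8
1_HDH_office_bird_d 90 9 28 15
1_HDH_office_birdnoise_a 88 6 39 21
1_HDH_office_tel_a 70 26 27 3
1_HDH_office_tel_b 73 5 41 9
1_HDH_office_tel_c 89 8 30 3
1_HDH_office_tel_d 100 12 50 6
1_HDH_office_tel_f 85 14 40 5
1_HDH_office_telnoise_a 104 16 32 14
"""


def parse_rows(text):
    """Map each recording name to its (TP, TN, FP, FN) frame counts."""
    rows = {}
    for line in text.splitlines():
        name, *counts = line.split()
        rows[name] = tuple(int(c) for c in counts)
    return rows


def column_sums(rows):
    """Sum every count column over all recordings."""
    return tuple(sum(column) for column in zip(*rows.values()))


def accuracy(tp, tn, fp, fn):
    """Share of correctly classified frames."""
    return (tp + tn) / (tp + tn + fp + fn)


rows = parse_rows(OFFICE_ROWS)
totals = column_sums(rows)
print("Sum", *totals)
for name, counts in rows.items():
    print(f"{name}: {100 * accuracy(*counts):.1f}% of {sum(counts)} frames")
```

For this group the column sums agree with the Sum row of the table (942, 125, 371, 102). The same check can be applied to the other groups by pasting their rows into the text block.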