Out-of-Vocabulary (OOV) word detection
For demonstration purposes and internal testing of OOV word detection, BUT collected a set of recordings containing OOV words and non-speech events. The data collection includes:
- 16 recordings from 7 speakers (2 female, 5 male)
- Ogg-compressed 8 kHz audio, comparable in quality to conversational telephone speech (CTS) data
- ASR transcripts produced by the BUT CTS recognizer
- strong and weak phone posteriors
- scores from the NN-based OOV detector
- ground-truth OOV labels for the words in the recognition output
- the pronunciation dictionary used in recognition
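Because the collection ships both NN-based OOV scores and ground-truth OOV labels for the recognized words, a natural first check is to compare the two. The sketch below is a minimal example under stated assumptions: the scores and labels have already been loaded into two parallel lists with one entry per recognized word, a higher score means "more likely OOV", and label 1 marks an OOV word while 0 marks an in-vocabulary word. The file format of the distributed scores and labels is not described here, and the function names `miss_false_alarm` and `roc_auc` are hypothetical.

```python
from typing import Sequence, Tuple


def miss_false_alarm(scores: Sequence[float], labels: Sequence[int],
                     threshold: float) -> Tuple[float, float]:
    """Return (miss rate, false-alarm rate) when every word with
    score >= threshold is flagged as OOV.

    Assumption: labels use 1 = OOV (ground truth), 0 = in-vocabulary.
    """
    n_oov = sum(1 for y in labels if y == 1)
    n_iv = len(labels) - n_oov
    if n_oov == 0 or n_iv == 0:
        raise ValueError("need at least one OOV and one in-vocabulary word")
    # Missed OOV words: true OOVs scored below the threshold.
    misses = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    # False alarms: in-vocabulary words scored at or above the threshold.
    false_alarms = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return misses / n_oov, false_alarms / n_iv


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Threshold-free summary: the probability that a randomly chosen OOV
    word scores higher than a randomly chosen in-vocabulary word
    (ties count as one half). Equals the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one OOV and one in-vocabulary word")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy values for illustration only; these are NOT taken from the BUT data.
    toy_scores = [0.9, 0.2, 0.7, 0.4, 0.1, 0.8]
    toy_labels = [1, 0, 1, 0, 0, 0]
    print(miss_false_alarm(toy_scores, toy_labels, threshold=0.5))
    print(roc_auc(toy_scores, toy_labels))
```

On the toy values, a threshold of 0.5 misses no OOV word but falsely flags one of the four in-vocabulary words, giving `(0.0, 0.25)`, and the AUC is 0.875. The pairwise AUC loop is quadratic, which is fine for a collection of 16 recordings; larger sets would call for a rank-based computation instead.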
The data sets are prepared to be used by the:
and can be downloaded as separate files from the following links: