Out-of-language (OOL) detection
These recordings were made internally by BUT (http://speech.fit.vutbr.cz/) with volunteers, for the purpose of out-of-language (OOL) detection.
Available data sets:
- 0/
Recorded at the Odyssey 2010 conference, featuring multiple speakers and multiple languages. It was intended as a reference test for the language identification (LID) system, showing that it works at least to some extent.
Duration 1284s
- 1/
Spanish speaker and an interview partner, in English mixed with some Spanish
Duration 598s
- 2/
Conversation between a German and a Czech speaker, in English mixed with some Czech
Duration 381s
- 3/
Conversation between two German speakers, in English mixed with some German
Duration 775s
- 4/
Israeli speaker and an interview partner, in English mixed with some Hebrew
Duration 567s
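The per-set durations listed above can be totalled to estimate the size of the whole collection. This is a minimal sketch; the dictionary keys simply mirror the directory names and are used only as labels.

```python
# Durations in seconds, copied from the data set list above.
durations_s = {"0/": 1284, "1/": 598, "2/": 381, "3/": 775, "4/": 567}

total_s = sum(durations_s.values())
minutes, seconds = divmod(total_s, 60)
print(f"Total: {total_s} s ({minutes} min {seconds} s)")  # Total: 3605 s (60 min 5 s)
```

In other words, the five data sets together contain roughly one hour of audio.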
Format of the reference segmentation:
- Each line of the *.seg files has the form s,e nr_label, e.g. 50.739567,10.1479 5_IL, where
- s,e are start and end times of the segment in seconds
- nr is the segment number
- label is one of the following reference labels (note that ool and OOL differ only in case):
- IL - in language, only English speech
- ool - up to 50% of non-English speech, e.g. a few words in a foreign language
- OOL - from 50% up to 100% non-English speech
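A reader for the segmentation format described above might look like the sketch below. It assumes one segment per line, a single whitespace gap between the s,e pair and the nr_label identifier, and an underscore separating the segment number from the label; the names Segment, parse_seg_line and parse_seg_file are hypothetical and not part of the distributed package. The label check is case-sensitive because ool and OOL are distinct labels.

```python
from dataclasses import dataclass
from typing import List

# Reference labels from the list above; "ool" and "OOL" differ only in case.
VALID_LABELS = {"IL", "ool", "OOL"}


@dataclass
class Segment:
    start: float  # s: segment start time in seconds
    end: float    # e: segment end time in seconds
    number: int   # nr: segment number
    label: str    # reference label: IL, ool or OOL


def parse_seg_line(line: str) -> Segment:
    """Parse one line of the form 's,e nr_label' (assumed layout)."""
    times, seg_id = line.split()          # e.g. "50.739567,10.1479" and "5_IL"
    start_str, end_str = times.split(",")
    nr_str, label = seg_id.split("_", 1)  # split on the first underscore only
    if label not in VALID_LABELS:
        raise ValueError(f"unknown reference label: {label!r}")
    return Segment(float(start_str), float(end_str), int(nr_str), label)


def parse_seg_file(path: str) -> List[Segment]:
    """Read all non-blank lines of a *.seg file into Segment objects."""
    segments = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # ignore empty lines
                segments.append(parse_seg_line(line))
    return segments


seg = parse_seg_line("50.739567,10.1479 5_IL")
print(seg)  # Segment(start=50.739567, end=10.1479, number=5, label='IL')
```

The parser deliberately does not check that s is smaller than e; it only reports the values as written in the file.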
All data sets are included in the following package for download.