Oddities
Optimizing Automatic Speech Recognition for Low-Proficient Non-Native Speakers
Department of Language and Speech, Radboud University, 6500 HD Nijmegen, The Netherlands
Abstract
Computer-Assisted Language Learning (CALL) applications for improving the oral skills of low-proficient learners have to cope with non-native speech that is particularly challenging. Since unconstrained non-native ASR is still problematic, a possible solution is to elicit constrained responses from the learners. In this paper, we describe experiments aimed at selecting utterances from lists of responses. The first experiment, on utterance selection, indicates that the decoding process can be improved by optimizing the language model and the acoustic models, thus reducing the utterance error rate from 29–26% to 10–8%. Since giving feedback on incorrectly recognized utterances is confusing, we verify the correctness of the utterance before providing feedback. The results of the second experiment, on utterance verification, indicate that combining duration-related features with a likelihood ratio (LR) yields an equal error rate (EER) of 10.3%, which is significantly better than the EER for the other measures in isolation.
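The verification result above is reported as an equal error rate: the operating point at which the rate of wrongly accepted misrecognitions equals the rate of wrongly rejected correct recognitions. The abstract does not describe how the EER was computed, so the following is only a minimal sketch, assuming each utterance already has a single verification score (for example an LR, or some combination of LR and duration features) where higher means "more likely correct". The function name and sample scores are hypothetical.

```python
from typing import Sequence


def equal_error_rate(correct_scores: Sequence[float],
                     incorrect_scores: Sequence[float]) -> tuple[float, float]:
    """Return (eer, threshold) for a verifier that accepts scores >= threshold.

    correct_scores:   scores of utterances that were recognized correctly
    incorrect_scores: scores of utterances that were misrecognized
    (Hypothetical helper; not taken from the paper.)
    """
    best = None
    # Try every observed score as a candidate decision threshold.
    for t in sorted(set(correct_scores) | set(incorrect_scores)):
        # False rejection: a correct utterance scored below the threshold.
        frr = sum(s < t for s in correct_scores) / len(correct_scores)
        # False acceptance: a misrecognized utterance scored at or above it.
        far = sum(s >= t for s in incorrect_scores) / len(incorrect_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


eer, threshold = equal_error_rate([2.1, 1.8, 0.9, 2.5, 1.2],
                                  [-0.5, 0.3, 1.0, -1.2, 0.1])
print(eer, threshold)  # 0.2 1.0
```

With only a handful of scores the two error rates rarely cross exactly, so the sketch reports the average of FAR and FRR at the closest threshold; on real data one would typically interpolate along the ROC curve instead.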
Carnegie Mellon Computer Vision Systems Decipher Outdoor Scenes
A new method devised by researchers at the Pittsburgh-based university enables computers to better understand an image by reasoning about the physical constraints of the scene.
By Robotics Trends Staff - Filed Sep 12, 2010
Computer vision systems can struggle to make sense of a single image, but a new method devised by computer scientists at Carnegie Mellon University enables computers to gain a deeper understanding of an image by reasoning about the physical constraints of the scene.
In much the same way that a child might use a set of toy building blocks to assemble something that looks like a building depicted on the cover of the toy set, the computer would analyze an outdoor scene by using virtual blocks to build a three-dimensional approximation of the image that makes sense based on volume and mass.
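The article does not describe the CMU algorithm itself. Purely as a toy illustration of what it can mean for a block arrangement to "make sense based on volume and mass", and not as the researchers' method, the sketch below checks the static stability of a single column of rigid blocks in a 2-D side view: every block must carry the combined centre of mass of everything above it within its contact area with the next block. The `Block` type, `is_stable` function, and the single-column, frictionless setup are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class Block:
    x_min: float  # left edge in a 2-D side view (hypothetical representation)
    x_max: float  # right edge
    mass: float   # e.g. volume times an assumed uniform density


def is_stable(stack: list[Block]) -> bool:
    """Toy check for a single column of blocks listed bottom to top.

    For each contact between block i and block i+1, the combined centre of
    mass of all blocks above the contact must lie within the overlap of the
    two blocks; otherwise the upper part of the stack would tip over.
    """
    for i in range(len(stack) - 1):
        above = stack[i + 1:]
        total = sum(b.mass for b in above)
        com = sum(b.mass * (b.x_min + b.x_max) / 2 for b in above) / total
        # Contact region between the supporting block and the one on top.
        lo = max(stack[i].x_min, stack[i + 1].x_min)
        hi = min(stack[i].x_max, stack[i + 1].x_max)
        if not (lo <= com <= hi):
            return False
    return True


print(is_stable([Block(0, 2, 1), Block(0.5, 2.5, 1), Block(1, 3, 1)]))  # True
print(is_stable([Block(0, 2, 2), Block(1.5, 3.5, 2)]))                  # False
```

A scene-understanding system would search over many candidate block layouts and prefer physically plausible ones; a check like this only shows the kind of constraint that reasoning about volume and mass could impose.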
Multimodal interaction
Multimodal interaction provides the user with multiple modes of interfacing with a system.
Multimodal input
Two major groups of multimodal interfaces have merged: one concerned with alternative input methods and the other with combined input/output. The first group of interfaces combined various user input modes beyond the traditional keyboard and mouse input/output, such as speech, pen, touch, manual gestures, gaze, and head and body movements. The most common such interface combines a visual modality (e.g. a display, keyboard, and mouse) with a voice modality (speech recognition for input, speech synthesis and recorded audio for output). However, other modalities, such as pen-based input or haptic input/output, may be used. Multimodal user interfaces are a research area in human-computer interaction (HCI).
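Combining input modes means the system has to decide which events from different channels belong together. As a hypothetical sketch (the event types, the one-second window, and the rule of resolving the word "that" are illustrative assumptions, not a standard API), the following late-fusion step pairs a recognized spoken command with the pointing event (pen, touch, or gaze) closest to it in time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeechEvent:
    text: str    # recognized utterance, e.g. "delete that"
    time: float  # seconds since session start


@dataclass
class PointEvent:
    target: str  # object under the pen, finger, or gaze
    time: float


def fuse(speech: SpeechEvent, points: list[PointEvent],
         window: float = 1.0) -> Optional[str]:
    """Replace the deictic word "that" with the target of the pointing event
    nearest in time, if one occurred within `window` seconds of the speech."""
    words = speech.text.split()
    if "that" not in words:
        return None  # nothing to resolve from the other modality
    candidates = [p for p in points if abs(p.time - speech.time) <= window]
    if not candidates:
        return None  # no gesture close enough in time
    nearest = min(candidates, key=lambda p: abs(p.time - speech.time))
    return " ".join(nearest.target if w == "that" else w for w in words)


print(fuse(SpeechEvent("delete that", 2.0),
           [PointEvent("circle", 0.5), PointEvent("square", 2.3)]))  # delete square
```

The time window is the key design choice: too narrow and slightly delayed gestures are missed, too wide and unrelated pointing events get attached to the command.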