ossible word sequences to be accepted by the system without providing any statistical information. This type of design is suitable for CALL applications in which the possible word combinations and phrases are known in advance and can be easily anticipated (eg, based on user data collected with a system pre-prototype). Because of the a priori constraining function of a grammar network, applications with clearly defined task grammars tend to perform at much higher accuracy rates than the quality of the acoustic recognition would suggest.
E. Decoder
Simply put, the decoder is an algorithm that tries to find the utterance that maximizes the probability that a given sequence of speech sounds corresponds to that utterance. This is a search problem, and especially in large vocabulary systems careful consideration must be given to questions of efficiency and optimization, for example to whether the decoder should pursue only the most likely hypothesis or a number of them in parallel (Young, 1996). An exhaustive search of all possible completions of an utterance might ultimately be more accurate but of questionable value if one has to wait two days to get a result. Trade-offs are therefore necessary to maximize the search results while at the same time minimizing the amount of CPU and recognition time.
В
PERFORMANCE AND DESIGN ISSUES IN SPEECH APPLICATIONS
For educators and developers interested in deploying ASR in CALL applications, perhaps the most important consideration is recognition performance: How good is the technology? Is it ready to be deployed in language learning? These questions cannot be answered except with reference to particular applications of the technology, and therefore touch on a key issue in ASR development: the issue of human-machine interface design.
As we recall, speech recognition performance is always domain specific - a machine can only do what it is programmed to do, and a recognizer with models trained to recognize business news dictation under laboratory conditions will be unable to handle spontaneous conversational speech transmitted over noisy telephone channels. The question that needs to be answered is therefore not simply "How good is ASR technology?" but rather, "What do we want to use it for?" and "How do we get it to perform the task? "
In the following section, we will address the issue of system performance as it relates to a number of successful commercial speech applications. By emphasizing the distinction between recognizer performance on the one hand - understood in terms of "raw" recognition accuracy - and system performance on the other; we suggest how the latter can be optimized within an overall design that takes into account not only the factors that affect recognizer performance as such, but also, and perhaps even more importantly, considerations of human-machine interface design.
Historically, basic speech recognition research has focused almost exclusively on optimizing large vocabulary speaker-independent recognition of continuous dictation. A major impetus for this research has come from US government sponsored competitions held annually by the Defense Advanced Research Projects Agency (DARPA). The main emphasis of these competitions has been on improving the "raw" recognition accuracy - calculated in terms of average omissions, insertions, and substitutions - of large-vocabulary continuous speech recognizers (LVCSRs) in the task of recognizing read sentence material from a number of standard sources (eg, The Wall Street Journal or The New York Times ). The best laboratory systems that participated in the WSJ large-vocabulary continuous dictation task have achieved word error rates as low as 5%, that is, on average, one recognition error in every twenty words (Pallet, 1994). h4 align=center> CURRENT TRENDS IN VOICE-INTERACTIVE CALL
In recent years, an increasing number of speech laboratories have begun deploying speech technology in CALL applications. Results include voice-interactive prototype systems for teaching pronunciation, reading, and limited conversational skills in semi-constrained contexts. Our review of these applications is far from exhaustive. It covers a select number of mostly experimental systems that explore paths we found promising and worth pursuing. We will discuss the range of voice-interactions these systems offer for practicing certain language skills, explain their technical implementation, and comment on the pedagogical value of these implementations. Apart from giving a brief system overview, we report experimental results if available and provide an assessment of how far away the technology is from being deployed in the commercial and educational environments.
Pronunciation Training
A useful and remarkably successful application of speech reco...