resenting queries about graphical displays of maps and charts. Students infer the right answers to a set of multiple-choice questions and produce spoken responses.
A more recent prototype currently under development in SRI is the Voice Interactive Language Training System (VILTS), a system designed to foster speaking and listening skills for beginning through advanced L2 learners of French (Egan, 1996; Neumeyer et al., 1996; Rypa, 1996). The system incorporates authentic , Unscripted conversational materials collected from French speakers into an engaging, flexible, and user-centered lesson architecture. The system deploys speech recognition to guide students through the lessons and automatic pronunciation scoring to provide feedback on the fluency of student responses. As far as we know, only the pronunciation scoring aspect of the system has been validated in experimental trials (Neumeyer et al., 1996). p> In pedagogically more sophisticated systems, the query-response mode is highly contextualized and presented as part of a simulated conversation with a virtual interlocutor. To stimulate student interest, closed response queries are often presented in the form of games or goal-driven tasks. One commercial system that exploits the full potential of this design is TraciTalk (Courseware Publishing International, Inc., Cupertino, CA), a voice-driven multimedia CALL system aimed at more advanced ESL learners. In a series of loosely connected scenarios, the system engages students in solving a mystery. Prior to each scenario, students are given a task (eg, eliciting a certain type of information), and they accomplish this task by verbally interacting with characters on the screen. Each voice interaction offers several possible responses, and each spoken response moves the conversation in a slightly different direction. There are many paths through each scenario, and not every path yields the desired information. This motivates students to return to the beginning of the scene and try out a different interrogation strategy. Moreover, TraciTalk features an agent that students can ask for assistance and accepts spoken commands for navigating the system. Apart from being more fun and interesting, games and task-oriented programs implicitly provide positive feedback by giving students the feeling of having solved a problem solely by communicating in the target language.
The speech recognition technology underlying closed response query implementations is very simple, even in the more sophisticated systems. For any given interaction, the task perplexity is low and the vocabulary size is comparatively small. As a result, these systems tend to be very robust. Recognition accuracy rates in the low to upper 90% range can be expected depending on task definition, vocabulary size, and the degree of non-native disfluency.
FUTURE TRENDS IN VOICE-INTERACTIVE CALL
In the previous sections, we reviewed the current state of speech technology, discussed some of the factors affecting recognition performance, and introduced a number of research prototypes that illustrate the range of speech-enabled CALL applications that are currently technically and pedagogically feasible. With the exception of a few exploratory open response dialog systems, most of these systems are designed to teach and evaluate linguistic form (pronunciation, fluency, vocabulary study, or grammatical structure). This is no coincidence. Formal features can be clearly identified and integrated into a focused task design. This means that robust performance can be expected. Furthermore, mastering linguistic form remains an important component of L2 instruction, despite the emphasis on communication (Holland, 1995). Prolonged, focused practice of a large number of items is still considered an effective means of expanding and reinforcing linguistic competence (Waters, 1994). However, such practice is time consuming. CALL can automate these aspects of language training, thereby freeing up valuable class time that would otherwise be spent on drills.
While such systems are an important step in the right direction, other more complex and ambitious applications are conceivable and no doubt desirable. Imagine a student being able to access the Internet, find the language of his or her choice, and tap into a comprehensive voice-interactive multimedia language program that would provide the equivalent of an entire first year of college instruction. The computer would evaluate the student's proficiency level and design a course of study tailored to his or her needs. Or think of using the same Internet resources and a set of high-level authoring tools to put together a series of virtual encounters surrounding the task of finding an apartment in Berlin. As a minimum, one would hope that natural speech input capacity becomes a routine feature of any CALL application.
To m...