
Essay: Modern technologies in teaching FLT





A. Analysis

The first step in automatic speech recognition consists of analyzing the incoming speech signal. When a person speaks into an ASR device, usually through a high-quality noise-canceling microphone, the computer samples the analog input into a series of 16- or 8-bit values at a particular sampling frequency (ranging from 8 to 22 kHz). These numbers provide a precise description of the speech signal's amplitude. The values are grouped together in predetermined, overlapping temporal intervals called "frames." In a second step, a number of acoustically relevant parameters, such as energy, spectral features, and pitch information, are extracted from the speech signal (for a visual representation of some of these parameters, see Figure 2 on page 53). During training, this information is used to model that particular portion of the speech signal. During recognition, this information is matched against the pre-existing model of the signal.
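The sampling and framing steps described above can be sketched in Python. This is a minimal illustration, not the method of any particular recognizer. The 16 kHz sampling rate, the 25 ms frame length, the 10 ms hop between frame starts, and the helper names `quantize_16bit`, `frame_signal` and `frame_energy` are all assumptions chosen for the example. Only one of the parameters named in the text, energy, is computed. Spectral and pitch features would need further processing.

```python
import math
from typing import List

# Assumed parameters for illustration only (not taken from the source text).
SAMPLE_RATE = 16000   # Hz, within the 8-22 kHz range mentioned above
FRAME_MS = 25         # length of one analysis frame
HOP_MS = 10           # step between frame starts; smaller than FRAME_MS, so frames overlap


def quantize_16bit(x: float) -> int:
    """Map an analog amplitude in [-1.0, 1.0] to a signed 16-bit sample value."""
    x = max(-1.0, min(1.0, x))
    return int(round(x * 32767))


def frame_signal(samples: List[int], frame_len: int, hop: int) -> List[List[int]]:
    """Split the sample stream into overlapping frames of frame_len samples."""
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop
    return frames


def frame_energy(frame: List[int]) -> int:
    """Short-time energy of one frame: the sum of squared sample values."""
    return sum(s * s for s in frame)


# Synthetic "analog" input: 0.1 s of silence followed by 0.1 s of a 440 Hz tone.
n = SAMPLE_RATE // 10
analog = [0.0] * n + [0.5 * math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(n)]
samples = [quantize_16bit(x) for x in analog]

frame_len = SAMPLE_RATE * FRAME_MS // 1000   # 400 samples
hop = SAMPLE_RATE * HOP_MS // 1000           # 160 samples
frames = frame_signal(samples, frame_len, hop)
energies = [frame_energy(f) for f in frames]
print(len(frames), energies[0], energies[-1] > 0)  # prints: 18 0 True
```

The silent frames have zero energy and the frames covering the tone do not. This is the kind of difference an energy parameter lets a recognizer exploit.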

B. Phone Models

Training a machine to recognize spoken language amounts to modeling the basic sounds of speech (phones). Automatic speech recognition strings these models together to form words. Recognizing an incoming speech signal involves matching the observed acoustic sequence against a set of hidden Markov models (HMMs). An HMM can model phones or other sub-word units, or it can model words or even whole sentences. Phones are modeled either as individual sounds (so-called monophones) or as phone combinations that model several phones and the transitions between them (biphones or triphones). After comparing the incoming acoustic signal with the HMMs representing the sounds of the language, the system computes a hypothesis based on the sequence of models that most closely resembles the incoming signal. The HMM for each linguistic unit (phone or word) contains a probabilistic representation of all the possible pronunciations of that unit, just as the model of the handwritten cursive b would have many different representations.

Building HMMs, a process called training, requires a large amount of speech data of the type the system is expected to recognize. Large-vocabulary, speaker-independent continuous dictation systems are typically trained on tens of thousands of read utterances by a cross-section of the population, including members of different dialect regions and age groups. As a general rule, an automatic speech recognizer cannot correctly process speech that differs in kind from the speech it has been trained on. This is why most commercial dictation systems trained on standard American English perform poorly on accented speech, whether by non-native speakers or by speakers of different dialects. We will return to this point in our discussion of voice-interactive CALL applications.
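Scoring an observed sequence against competing HMMs can be illustrated with a toy sketch based on the forward algorithm. That algorithm sums the probability of an observation sequence over all state paths of a model. The sketch makes strong simplifying assumptions. Its observations are discrete symbols rather than the continuous acoustic feature vectors real recognizers use. The two word models `yes` and `no`, and all their probabilities, are invented, and the names `DiscreteHMM` and `recognize` are hypothetical. The emission probabilities play the role of the probabilistic representation of possible pronunciations described above.

```python
from typing import Dict, List, Sequence


class DiscreteHMM:
    """Toy HMM over discrete observation symbols (hypothetical, for illustration)."""

    def __init__(self, start: List[float], trans: List[List[float]],
                 emit: List[Dict[str, float]]) -> None:
        self.start = start  # start[i]: probability of beginning in state i
        self.trans = trans  # trans[i][j]: probability of moving from state i to state j
        self.emit = emit    # emit[i][sym]: probability that state i emits symbol sym

    def likelihood(self, obs: Sequence[str]) -> float:
        """Forward algorithm: P(obs | model), summed over all state paths."""
        if not obs:
            raise ValueError("observation sequence must not be empty")
        n = len(self.start)
        alpha = [self.start[i] * self.emit[i].get(obs[0], 0.0) for i in range(n)]
        for sym in obs[1:]:
            alpha = [
                sum(alpha[i] * self.trans[i][j] for i in range(n)) * self.emit[j].get(sym, 0.0)
                for j in range(n)
            ]
        return sum(alpha)


def recognize(obs: Sequence[str], models: Dict[str, DiscreteHMM]) -> str:
    """Hypothesis = label of the model under which the observations are most likely."""
    return max(models, key=lambda label: models[label].likelihood(obs))


# Invented left-to-right word models; symbols stand for quantized acoustic labels.
MODELS = {
    "yes": DiscreteHMM(
        start=[1.0, 0.0, 0.0],
        trans=[[0.6, 0.4, 0.0], [0.0, 0.6, 0.4], [0.0, 0.0, 1.0]],
        emit=[{"y": 0.8, "e": 0.2}, {"e": 0.8, "s": 0.2}, {"s": 0.9, "e": 0.1}],
    ),
    "no": DiscreteHMM(
        start=[1.0, 0.0],
        trans=[[0.6, 0.4], [0.0, 1.0]],
        emit=[{"n": 0.9, "o": 0.1}, {"o": 0.9, "n": 0.1}],
    ),
}

print(recognize(["y", "e", "e", "s"], MODELS))  # prints: yes
print(recognize(["n", "o", "o"], MODELS))       # prints: no
```

In a real system, word models would themselves be built by chaining phone (or triphone) HMMs, as the text describes. The probabilities would also be computed in log space to avoid numerical underflow on long utterances.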

C. Lexicon

The lexicon, or dictionary, contains the phonetic spelling of all the words the recognizer is expected to encounter. It serves as a reference for converting the phone sequence determined by the search algorithm into a word. It must be carefully designed to cover the entire lexical domain in which the system is expected to perform. If the recognizer encounters a word it does not "know" (i.e., a word not defined in the lexicon), it will either choose the closest match or return an out-of-vocabulary recognition error. Whether a recognition error is registered as a misrecognition or an out-of-vocabulary error depends in part on the vocabulary size. If, for example, the vocabulary is too small for an unrestricted dictation task (say, fewer than 3K words), the out-of-vocabulary error rate is likely to be very high. If the vocabulary is too large, the chance of misrecognition errors increases, because the more similar-sounding words there are, the greater the confusability. The vocabulary size in most commercial dictation systems tends to vary between 5K and 60K words.
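A lexicon lookup with a closest-match fallback might look like the Python sketch below. The text does not specify how a recognizer chooses the closest match, so several details are assumptions for illustration:

- the five entries and their ARPAbet-style phone symbols;
- phone-level edit distance as the measure of "closest";
- the `max_distance` threshold;
- the `lookup` and `OutOfVocabularyError` names.

The homophones bear and bare share one phonetic spelling, so the lexicon alone cannot decide between them.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical mini-lexicon: word -> phonetic spelling (ARPAbet-style symbols).
LEXICON: Dict[str, Tuple[str, ...]] = {
    "bear": ("B", "EH", "R"),
    "bare": ("B", "EH", "R"),
    "beer": ("B", "IH", "R"),
    "attacked": ("AH", "T", "AE", "K", "T"),
    "him": ("HH", "IH", "M"),
}


class OutOfVocabularyError(Exception):
    """Raised when no lexicon entry is close enough to the decoded phones."""


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (pa != pb)))  # substitution or match
        prev = cur
    return prev[-1]


def lookup(phones: Sequence[str], lexicon: Dict[str, Tuple[str, ...]],
           max_distance: int = 1) -> List[str]:
    """Return all words whose spelling is closest to `phones`, or raise an OOV error."""
    best = None
    candidates: List[str] = []
    for word, spelling in lexicon.items():
        d = edit_distance(phones, spelling)
        if best is None or d < best:
            best, candidates = d, [word]
        elif d == best:
            candidates.append(word)
    if best is None or best > max_distance:
        raise OutOfVocabularyError(f"no entry within {max_distance} phones of {list(phones)}")
    return candidates


print(lookup(("HH", "IH", "M"), LEXICON))   # exact match: ['him']
print(lookup(("B", "EH", "R"), LEXICON))    # homophones: ['bear', 'bare']
print(lookup(("HH", "EH", "M"), LEXICON))   # closest match: ['him']
```

Raising `max_distance` trades out-of-vocabulary errors for more misrecognitions, which mirrors the vocabulary-size trade-off described in the paragraph above.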

D. The Language Model

The language model predicts the most likely continuation of an utterance on the basis of statistical information about the frequency with which word sequences occur, on average, in the language to be recognized. For example, the word sequence "A bare attacked him" will have a very low probability in any language model based on standard English usage, whereas the sequence "A bear attacked him" will have a higher probability of occurring. Thus the language model helps constrain the recognition hypothesis produced by the acoustic decoding, just as context helps decipher an unintelligible word in a handwritten note. Like the HMMs, an efficient language model must be trained on large amounts of data, in this case texts collected from the target domain.
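The bear/bare comparison can be reproduced with a toy bigram model, one common kind of statistical language model. The four training sentences are invented. The add-one (Laplace) smoothing, which gives unseen word pairs a small nonzero probability, is also an assumption, because the text does not say which model or smoothing method commercial systems use. The class and method names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def tokenize(sentence: str) -> List[str]:
    """Lowercase, split on whitespace, and add sentence-boundary markers."""
    return ["<s>"] + sentence.lower().split() + ["</s>"]


class BigramModel:
    """Bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences: Iterable[str]) -> None:
        self.history = Counter()  # how often each token is followed by another token
        self.bigrams = Counter()  # how often each (previous, next) pair occurs
        for sentence in sentences:
            tokens = tokenize(sentence)
            self.history.update(tokens[:-1])
            self.bigrams.update(zip(tokens[:-1], tokens[1:]))
        # Vocabulary used for smoothing: every token type seen in training.
        self.vocab = {w for pair in self.bigrams for w in pair}

    def prob(self, prev: str, word: str) -> float:
        """Smoothed estimate of P(word | prev)."""
        return (self.bigrams[(prev, word)] + 1) / (self.history[prev] + len(self.vocab))

    def sentence_prob(self, sentence: str) -> float:
        """Probability of a whole sentence as a product of bigram probabilities."""
        tokens = tokenize(sentence)
        p = 1.0
        for prev, word in zip(tokens[:-1], tokens[1:]):
            p *= self.prob(prev, word)
        return p

    def most_likely_next(self, prev: str) -> str:
        """Predict the most likely continuation after `prev`."""
        candidates = self.vocab - {"<s>"}
        return max(candidates, key=lambda w: self.prob(prev, w))


# Invented training text standing in for a large domain corpus.
corpus = [
    "a bear attacked him",
    "the bear ate the honey",
    "a bear was in the woods",
    "the floor was bare",
]
lm = BigramModel(corpus)
print(lm.sentence_prob("a bear attacked him") > lm.sentence_prob("a bare attacked him"))  # prints: True
print(lm.most_likely_next("a"))  # prints: bear
```

Because probabilities are multiplied, long sentences produce very small numbers. Practical systems therefore usually sum log probabilities instead.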

In ASR applications with a constrained lexical domain and/or a simple task definition, the language model consists of a grammatical network that defines the p...






