Speech Recognition and Synthesis

Speech recognition is a branch of artificial intelligence that enables spoken communication between humans and computers. However, it is difficult to obtain a reasonably acceptable interpretation of a spoken message, because combining information from different sources (acoustic, phonetic, semantic and pragmatic) is ambiguous, and some mistakes are unavoidable in the process.

Nearly all speech synthesizers use libraries of speech sounds. Building such a dictionary is also important for recognition, because the system has to identify the words the user says. To make recognition easier, there is separate recognition of vowels and of consonants, and also noise masking (some mobile phones, for example, respond when we “talk” to them, and if we are on the street, something must make our voice clear against the background noise). But even with these aids, mistakes cannot be completely avoided. Many speech recognition algorithms rely mainly on the sound of the individual words rather than on their context, so they do not understand speech; they only recognize words. Here is an example of what could happen:

The child wore a spider ring on Halloween.

He was an American spy during the war.

“Spider ring” and “spy during” sound almost the same. We hear the correct words thanks to the context, and we do this unconsciously.
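The way context can settle this kind of ambiguity can be sketched with a toy bigram language model. This is a minimal illustration under stated assumptions, not how any particular recognizer works: the training sentences in CORPUS are invented for the example, we assume the acoustic stage has already produced the two candidate transcriptions with equal acoustic scores, and the names train_bigrams, log_prob and pick_transcription are hypothetical. The model simply prefers the candidate whose neighbouring word pairs it has seen more often.

```python
import math
from collections import Counter

# Hypothetical toy training text, invented for illustration only.
CORPUS = [
    "she wore a spider ring on her hand",
    "he wore a ring on his finger",
    "he was a spy during the war",
    "it was cold during the war",
]


def train_bigrams(sentences):
    """Count single words and adjacent word pairs, with sentence markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Log-probability of a sentence under the bigram model.

    Add-one (Laplace) smoothing gives unseen word pairs a small,
    nonzero probability instead of zero.
    """
    vocab_size = len(unigrams)
    words = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(words, words[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total


def pick_transcription(candidates, unigrams, bigrams):
    """Choose the acoustically identical candidate that fits the context best."""
    return max(candidates, key=lambda c: log_prob(c, unigrams, bigrams))


UNIGRAMS, BIGRAMS = train_bigrams(CORPUS)

if __name__ == "__main__":
    print(pick_transcription(
        ["the child wore a spider ring on halloween",
         "the child wore a spy during on halloween"],
        UNIGRAMS, BIGRAMS))
    print(pick_transcription(
        ["he was an american spy during the war",
         "he was an american spider ring the war"],
        UNIGRAMS, BIGRAMS))
```

Because a bigram model only looks at pairs of neighbouring words, the decision here depends on pairs such as "ring on" versus "during on" and "during the" versus "ring the"; a practical system would need far more training text than this toy corpus.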

This technology has many applications, but I think the most interesting one is the benefit it brings to people with disabilities. Some of them are unable to use their hands, others are deaf and use deaf telephony (voicemail-to-text, relay services or captioned telephones), and others have learning disabilities. There is no doubt that our lives will be easier in a few years' time, when these systems get better.


