3. Characteristics of the main tasks solved in building a speech interface. Dynamic range of sound signals.

Lecture




The speech interface is a hardware-software complex: its implementation requires hardware that is external to the computer system itself (a microphone and an audio output device such as headphones, as well as a sound card). This imposes additional requirements not only on the software that gives a computer the ability to speak and hear, but also on the hardware, which determines, in particular, the quality of sound reproduction and the speed of information processing.

The main classes of speech interface tasks include:

• speech synthesis - a complex of subtasks whose goal is to enable a computer to pronounce speech from arbitrary orthographic text;

• speech analysis and recognition - a set of tasks that includes recording, digitizing and analyzing speech so that a computer system can recognize the received speech message;

• understanding (interpretation) of speech - a complex of tasks related to analyzing the meaning of speech messages and forming the reaction (response) of a computer system. This task is often treated as a subtask of speech recognition;

• voice recognition - a set of tasks that includes analyzing the characteristics of the speaker's voice in order to identify individual (personal) characteristics and qualities. This set of tasks is also called speaker verification and identification;

• computer cloning of voice and diction [4] (Lobanov B.M.2002st-Comp_K_P_G) - the creation of a close copy that is computational rather than biological, and not of the whole person but of only one intellectual ability: reading arbitrary orthographic text aloud. The goal is to preserve the personal acoustic features of the voice, the phonetic features of pronunciation and accent, and the prosodic (intonational) individuality of speech (melody, rhythm, dynamics).

In addition to the tasks above, which belong to the development of the speech interface proper, there are a number of auxiliary tasks on which the scientific and technical teams developing speech systems are working. This is because the problem of implementing a speech interface has not yet been completely solved: many open questions remain, and research teams both in our country and abroad are looking for answers to them. Such tasks include, in particular, the following:

• study of the phonetic structure of speech in various natural languages; study of the intonation patterns of speech in various languages;

• identification of sets of parameters for the description of speech, used both for speech synthesis and for its recognition;

• development of new speech synthesis methods;

• study of the differences between the speech of different speakers and, in particular, between male and female voices;

• development of new speech recognition methods;

• search for optimal ways of transmitting speech through communication channels;

• development of special noise-canceling microphones;

• development of special equipment for the study of speech characteristics;

• development of new methods for digitizing and optimal compression of the speech signal;

• development of special sound cards focused on speech synthesis and analysis;

• formation of databases with “samples” of speech of various speakers in order to increase the naturalness of the sound of synthesized speech;

• study of the structure of the human vocal tract and of how speech sounds are formed;

• study of the structure of the human ear;

• study of the features of human speech perception;

• search for ways to make optimal use of the speech interface in various technical and consumer systems, and development of the corresponding technologies, etc.

Dynamic range of sound signals

A person hears sound over an extremely wide range of sound pressure. This range extends from the absolute threshold of hearing to the pain threshold of 140 dB SPL, measured relative to the zero level, which is taken to be a pressure of 0.00002 Pa (Fig. 1). The risk zone in this figure marks the range of sound pressures that, with prolonged exposure, can lead to complete hearing loss. The pain threshold for tonal sounds depends on frequency; for sounds with an arbitrary spectrum, a level of 120 dB SPL is taken as the pain threshold. The graph of the absolute threshold of hearing is described accurately by an empirical equation.
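The decibel scale used above can be illustrated with a short sketch. It converts between sound pressure in pascals and level in dB SPL using the 0.00002 Pa reference from the text. The empirical equation for the threshold of hearing is not reproduced in this text, so the function `threshold_in_quiet_db` uses Terhardt's widely quoted approximation as an assumed stand-in; it may differ from the equation the lecture refers to. All function names are illustrative.

```python
import math

P0_PA = 20e-6  # reference pressure for 0 dB SPL (0.00002 Pa, as in the text)


def spl_from_pressure(p_pa: float) -> float:
    """Sound pressure level in dB SPL for an RMS pressure given in pascals."""
    return 20.0 * math.log10(p_pa / P0_PA)


def pressure_from_spl(level_db_spl: float) -> float:
    """RMS sound pressure in pascals for a level given in dB SPL."""
    return P0_PA * 10.0 ** (level_db_spl / 20.0)


def threshold_in_quiet_db(f_hz: float) -> float:
    """Approximate absolute threshold of hearing in dB SPL.

    Assumption: Terhardt's approximation, used here only as a stand-in
    for the empirical equation that the text mentions but does not show.
    """
    k = f_hz / 1000.0  # frequency in kHz
    return (3.64 * k ** -0.8
            - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2)
            + 1e-3 * k ** 4)


if __name__ == "__main__":
    # Pain threshold of 140 dB SPL expressed as a pressure in pascals.
    print(f"140 dB SPL = {pressure_from_spl(140.0):.1f} Pa")
    for f in (100.0, 1000.0, 3300.0, 10000.0):
        print(f"{f:7.0f} Hz: threshold ~ {threshold_in_quiet_db(f):6.2f} dB SPL")
```

Under this approximation the threshold is lowest near 3 to 4 kHz and rises steeply toward both edges of the audible range.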

In silence the sensitivity of hearing increases, and in a loud environment it decreases: hearing adapts to the surrounding sound environment, so its working dynamic range is not that large, about 70 ... 80 dB. From above it is limited by a pressure of 100 dB SPL, and from below by noise with a level of 30 ... 35 dB SPL. This dynamic range can shift up or down by up to 20 dB.

For comfortable listening to music, it is recommended that the sound pressure not exceed 104 dB SPL at home and 112 dB SPL in specially equipped rooms. The dynamic range of music is the ratio, in decibels, of the loudest sound (fortissimo) to the quietest sound (pianissimo). The dynamic range of symphonic music is 65 ... 75 dB; at rock concerts it increases to 105 dB, with sound pressure peaks reaching 122 ... 130 dB SPL. The dynamic range of vocal performers does not exceed 35 ... 45 dB (Table 1).

The dynamic range of music depends substantially on the choice of the maximum sound pressure Pmax, since from below it is limited by the absolute threshold of hearing. This dependence is most pronounced at the edges of the audible frequency range. Fig. 2 shows examples of how the dynamic range of tonal sounds changes. Depending on the choice of Pmax and on the frequency of the tone, a dynamic range of 80 dB shrinks to 40 ... 50 dB at the edges of the audible range. That is why it is customary to measure the dynamic range of sounds at a frequency of 1 kHz, where it can reach 117 dB. Room noise masks the sound and thereby reduces the dynamic range of music from below. Fig. 3 shows how, as the sound pressure is reduced from 120 to 80 dB SPL, the dynamic range of music decreases from 90 to 50 dB because of room noise.
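The relationship between the maximum level, the noise floor and the resulting dynamic range can be sketched as follows. This is a simplified model, not the method behind the figures: it assumes the usable range is the maximum level minus whichever is higher, the threshold of hearing or the room noise. The 30 dB SPL noise floor and the 0 dB SPL threshold in the example are assumptions chosen because they reproduce the 90 dB and 50 dB values quoted for Fig. 3.

```python
import math


def dynamic_range_db(p_max: float, p_min: float) -> float:
    """Dynamic range as the ratio of loudest to quietest sound pressure, in dB."""
    return 20.0 * math.log10(p_max / p_min)


def usable_dynamic_range_db(l_max_db_spl: float,
                            threshold_db_spl: float,
                            noise_db_spl: float) -> float:
    """Usable range between the maximum level and the effective floor.

    Simplifying assumption: the floor is the higher of the hearing
    threshold and the room noise level (masking is treated as a hard limit).
    """
    floor = max(threshold_db_spl, noise_db_spl)
    return max(0.0, l_max_db_spl - floor)


if __name__ == "__main__":
    ROOM_NOISE_DB_SPL = 30.0  # assumed value, consistent with the Fig. 3 numbers
    THRESHOLD_DB_SPL = 0.0    # assumed threshold near 1 kHz
    for l_max in (120.0, 100.0, 80.0):
        dr = usable_dynamic_range_db(l_max, THRESHOLD_DB_SPL, ROOM_NOISE_DB_SPL)
        print(f"Lmax = {l_max:5.1f} dB SPL -> usable range = {dr:4.1f} dB")
```

With these assumed values, lowering the maximum level by 40 dB removes 40 dB of usable range, because the noise floor does not move with the volume.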

The effect of noise can be neglected completely only if its level is 10 ... 20 dB below the minimum level of the musical sounds. In recording studios the noise level does not exceed 20 ... 30 dB SPL; at night, in apartments of "quiet" houses, it is 40 dB SPL; and any conversation raises it to 60 dB SPL. That is why quiet music often drowns in the noise of the listening room, and the listener involuntarily wants to turn up the volume. Quantization noise, which is white noise, is audible at an intensity of only 4 dB SPL, even when the total noise of the audio equipment in the room reaches 50 dB SPL. These figures should be compared with the fact that full scale (FS) on a digital level meter corresponds to a level between 105 and 112 dB SPL. Therefore, for domestic rooms the dynamic range of music should not exceed 101 ... 108 dB.
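The arithmetic behind the 101 ... 108 dB figure can be sketched as the full-scale level minus the 4 dB SPL level at which quantization noise becomes audible. The bit-depth estimate is an addition that is not in the text: it uses the standard ideal-quantizer relation SNR ≈ 6.02·N + 1.76 dB for a full-scale sine wave, and it ignores dither and noise shaping. The function names and the 10 dB default margin are illustrative.

```python
import math

QUANT_NOISE_AUDIBLE_DB_SPL = 4.0  # audibility level of quantization noise, from the text


def max_useful_dynamic_range_db(full_scale_db_spl: float) -> float:
    """Range between digital full scale and audible quantization noise."""
    return full_scale_db_spl - QUANT_NOISE_AUDIBLE_DB_SPL


def ideal_bits_for_range(dynamic_range_db: float) -> int:
    """Smallest bit depth whose ideal SNR covers the given range.

    Assumption: SNR = 6.02 * N + 1.76 dB (ideal uniform quantizer,
    full-scale sine wave, no dither or noise shaping).
    """
    return math.ceil((dynamic_range_db - 1.76) / 6.02)


def noise_is_negligible(noise_db_spl: float,
                        min_music_db_spl: float,
                        margin_db: float = 10.0) -> bool:
    """True if the noise lies at least margin_db below the quietest musical sound."""
    return noise_db_spl <= min_music_db_spl - margin_db


if __name__ == "__main__":
    for fs in (105.0, 112.0):
        dr = max_useful_dynamic_range_db(fs)
        print(f"FS = {fs:5.1f} dB SPL -> range {dr:5.1f} dB, "
              f"about {ideal_bits_for_range(dr)} bits")
    # A 40 dB SPL night-time room against a 45 dB SPL pianissimo.
    print(noise_is_negligible(40.0, 45.0))
```

Under these assumptions a range of 101 ... 108 dB corresponds to roughly 17 to 18 bits, slightly more than the 16 bits of an ideal quantizer with about 98 dB of signal-to-noise ratio.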

