Reverse Mastering: Is It Possible to Increase the Dynamic Range of Compressed Recordings? Speech synthesis and recognition. Modern solutions


Compression is one of the most myth-ridden topics in sound production. They say Beethoven even scared the neighbors' children with it :(

Okay, in fact, using compression is no more difficult than using distortion: the main thing is to understand how it works and to have good monitoring. This is what we will see together now.

What is audio compression

The first thing to understand is that compression means working with the dynamic range of sound. Dynamic range, in turn, is nothing more than the difference between the loudest and quietest levels of a signal:

So, compression is compression of the dynamic range. Yes, just dynamic range compression - in other words, lowering the level of the loud parts of the signal and raising the volume of the quiet parts. Nothing more.

You may quite reasonably wonder what all the hype is about, then. Why does everyone talk about recipes for correct compressor settings, but no one shares them? Why, despite the huge number of excellent plugins, do many studios still use expensive, rare hardware compressors? Why do some producers use compressors at extreme settings, while others do not use them at all? And which of them is right in the end?

Problems solved by compression

The answers to these questions lie in understanding the role of compression in working with sound. Compression allows you to:

  1. Emphasize the attack of a sound, making it more pronounced;
  2. “Seat” individual instrument parts into the mix, adding power and “weight” to them;
  3. Make groups of instruments or an entire mix more cohesive - a single monolith;
  4. Resolve conflicts between instruments using sidechain compression;
  5. Correct the mistakes of a vocalist or musicians by evening out their dynamics;
  6. Act as an artistic effect at certain settings.

As you can see, this is a creative process no less significant than, say, coming up with melodies or creating interesting timbres. Moreover, any of the above problems can be solved using four main parameters.

Basic parameters of the compressor

Despite the huge number of software and hardware models of compressors, all the “magic” of compression occurs when the main parameters are correctly configured: Threshold, Ratio, Attack and Release. Let's look at them in more detail:

Threshold or response threshold, dB

This parameter sets the level at which the compressor starts working (that is, compressing the audio signal). So, if we set the threshold to -12 dB, the compressor will only act on those parts of the signal that exceed this value. If our whole signal is quieter than -12 dB, the compressor will simply pass it through without affecting it in any way.

Ratio or compression ratio

The Ratio parameter determines how strongly a signal exceeding the threshold will be compressed. A little math to complete the picture: suppose we set up a compressor with a threshold of -12 dB and a ratio of 2:1, and fed it a drum loop in which the kick drum peaks at -4 dB. What will the compressor do in this case?

In our case, the kick level exceeds the threshold by 8 dB. According to the ratio, this difference will be compressed to 4 dB (8 dB / 2). Combined with the unprocessed part of the signal, this means that after the compressor the kick drum will peak at -8 dB (threshold of -12 dB + compressed overshoot of 4 dB).
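To make the arithmetic concrete, here is a minimal sketch of that static gain computation in Python (my own illustration; the function name and structure are not taken from any particular plugin):

```python
# A minimal sketch of the static compressor math described above
# (no attack/release smoothing); all levels are in dB.
def compressed_level(input_db: float, threshold_db: float, ratio: float) -> float:
    """Return the output level of a signal peak after downward compression."""
    if input_db <= threshold_db:
        return input_db  # below threshold: the compressor passes it through
    overshoot = input_db - threshold_db       # how far the peak exceeds the threshold
    return threshold_db + overshoot / ratio   # the overshoot is divided by the ratio

# The example from the text: threshold -12 dB, ratio 2:1, kick at -4 dB.
print(compressed_level(-4.0, threshold_db=-12.0, ratio=2.0))  # -> -8.0
```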

Attack, ms

This is the time after which the compressor responds to the threshold being exceeded. That is, if the attack time is above 0 ms, the compressor starts compressing the signal that exceeds the threshold not immediately, but after the specified time.

Release or recovery, ms

The opposite of attack: this parameter specifies how long after the signal level falls back below the threshold the compressor stops compressing.

Before we move on, I strongly recommend taking a familiar sample, placing any compressor on its channel, and experimenting with the above parameters for 5-10 minutes to firmly cement the material.

All other parameters are optional. They differ between compressor models, which is partly why producers use different models for specific purposes (for example, one compressor for vocals, another for the drum group, a third for the master channel). I will not dwell on these parameters in detail, but will give just enough general information to understand what they are about:

  • Knee (Hard/Soft Knee). This parameter determines how the compression ratio is applied: abruptly (hard) or smoothly, along a curve (soft). Note that in Soft Knee mode the compressor does not operate linearly, but begins to compress the sound smoothly (as far as that is appropriate when we are talking about milliseconds) even before the threshold value is reached. Soft knee is often used for processing groups of channels and the overall mix (since it works unnoticed), while hard knee is used to emphasize the attack and other features of individual instruments;
  • Response mode (Peak/RMS). Peak mode is justified when you need to strictly limit amplitude spikes, as well as on signals with a complex shape whose dynamics and readability must be conveyed in full. RMS mode is much gentler on the sound, allowing you to thicken it while preserving the attack;
  • Lookahead. This is the time during which the compressor knows what is coming to it: a kind of preliminary analysis of the incoming signal;
  • Makeup or Gain. A parameter that compensates for the drop in volume caused by compression.

The first and most important piece of advice, which eliminates all further questions about compression: if you a) understand the principle of compression, b) know firmly how each parameter affects the sound, and c) have tried several different models in practice, you no longer need any advice.

I am absolutely serious. If you have read this post carefully, experimented with your DAW's stock compressor and one or two plugins, and still do not understand when to set long attack times, what ratio to use, or in which mode to process the source signal - then you will keep searching the Internet for ready-made recipes and applying them thoughtlessly everywhere.

Recipes for fine-tuning a compressor are like recipes for fine-tuning a reverb or chorus: they make no sense and have nothing to do with creativity. So I persistently repeat the only correct recipe: arm yourself with this article, good monitor headphones and a plugin for visual control of the waveform, and spend an evening in the company of a couple of compressors.

Take action!

At a time when researchers were just beginning to solve the problem of creating a speech interface for computers, they often had to make their own equipment that would allow audio information to be input into the computer and also output it from the computer. Today, such devices may only be of historical interest, since modern computers can easily be equipped with audio input and output devices, such as sound adapters, microphones, headphones and speakers.

We will not delve into the details of the internal structure of these devices, but we will talk about how they work and provide some recommendations for choosing audio computer devices for working with speech recognition and synthesis systems.

As we already said in the previous chapter, sound is nothing more than air vibrations, the frequency of which lies in the range of frequencies perceived by humans. The exact boundaries of the audible frequency range may vary from person to person, but sound vibrations are believed to lie in the range of 16-20,000 Hz.

The purpose of a microphone is to convert sound vibrations into electrical vibrations, which can then be amplified, filtered to remove interference, and digitized for input of audio information into a computer.

Based on their operating principle, the most common microphones are divided into carbon, electrodynamic, condenser and electret types. Some of them require an external current source to operate (for example, carbon and condenser microphones), while others are capable of generating an alternating electrical voltage themselves under the influence of sound vibrations (electrodynamic and electret microphones).

Microphones can also be classified by purpose. There are studio microphones that are held in the hand or mounted on a stand, radio microphones that can be clipped to clothing, and so on.

There are also microphones designed specifically for computers. Such microphones are usually mounted on a stand placed on the surface of a table. Computer microphones can be combined with headphones, as shown in Fig. 2-1.

Fig. 2-1. Headphones with microphone

How, out of all this variety, do you choose the microphone best suited for speech recognition systems?

In principle, you can experiment with any microphone you have, as long as it can be connected to your computer's audio adapter. However, developers of speech recognition systems recommend purchasing a microphone that, during operation, will be at a constant distance from the speaker’s mouth.

If the distance between the microphone and the mouth does not change, then the average level of the electrical signal coming from the microphone will not change too much either. This will have a positive impact on the performance of modern speech recognition systems.

What's the problem?

A person is able to successfully recognize speech, the volume of which varies over a very wide range. The human brain is able to filter out quiet speech from interference, such as the noise of cars passing on the street, outside conversations and music.

As for modern speech recognition systems, their abilities in this area leave much to be desired. If the microphone is on a table, then when you turn your head or change your body position, the distance between your mouth and the microphone will change. This will change the microphone output level, which in turn will reduce the reliability of speech recognition.

Therefore, when working with speech recognition systems, the best results will be achieved if you use a microphone attached to headphones, as shown in Fig. 2-1. When using such a microphone, the distance between the mouth and the microphone will be constant.

We also draw your attention to the fact that all experiments with speech recognition systems are best carried out alone, in a quiet room. In this case the influence of interference will be minimal. Of course, if you need to select a speech recognition system that can operate under strong interference, the tests must be conducted differently. However, as far as the authors of the book know, the noise immunity of speech recognition systems is still very, very low.

So, the microphone converts sound vibrations into electrical oscillations for us. These oscillations can be seen on an oscilloscope screen, but do not rush to the store to buy this expensive device: we can carry out all our oscillographic studies with an ordinary computer equipped with a sound adapter, such as a Sound Blaster. Later we will tell you how to do this.

In Fig. 2-2 we show the oscillogram of a sound signal obtained while pronouncing a drawn-out sound "a". This waveform was obtained using the GoldWave program, which we will discuss later in this chapter, together with a Sound Blaster audio adapter and a microphone similar to the one shown in Fig. 2-1.

Fig. 2-2. Audio signal oscillogram

The GoldWave program allows you to stretch the oscillogram along the time axis, letting you see the smallest details. In Fig. 2-3 we show a stretched fragment of the above-mentioned oscillogram of the sound "a".

Fig. 2-3. Fragment of an oscillogram of an audio signal

Please note that the magnitude of the input signal coming from the microphone changes periodically and takes on both positive and negative values.

If there was only one frequency present in the input signal (that is, if the sound was “clean”), the waveform received from the microphone would be a sine wave. However, as we have already said, the spectrum of human speech sounds consists of a set of frequencies, as a result of which the shape of the oscillogram of the speech signal is far from sinusoidal.

We will call a signal whose magnitude changes continuously over time an analog signal. This is exactly the kind of signal that comes from the microphone. Unlike an analog signal, a digital signal is a set of numerical values that change discretely over time.

In order for a computer to process an audio signal, the signal must be converted from analog to digital form, that is, represented as a set of numerical values. This process is called digitization of the analog signal.

Digitization of an audio (or any analog) signal is performed by a special device called an analog-to-digital converter (ADC). This device is located on the sound adapter board and looks like an ordinary microcircuit.

How does an analog-to-digital converter work?

It periodically measures the level of the input signal and outputs the numerical value of the measurement result. This process is illustrated in Fig. 2-4, where the gray rectangles indicate input signal values measured at a constant time interval. The set of these values is the digitized representation of the input analog signal.

Fig. 2-4. Measurements of signal amplitude versus time

In Fig. 2-5 we show an analog-to-digital converter connected to a microphone. The analog signal is fed to input x1, and the digital signal is taken from outputs u1-un.

Fig. 2-5. Analog-to-digital converter

Analog-to-digital converters are characterized by two important parameters - the conversion frequency and the number of quantization levels of the input signal. Correct selection of these parameters is critical to achieving adequate digital representation of the analog signal.

How often do you need to measure the amplitude of the input analog signal so that information about changes in the input analog signal is not lost as a result of digitization?

It would seem that the answer is simple - the input signal needs to be measured as often as possible. Indeed, the more often an analog-to-digital converter makes such measurements, the better it will be able to track the slightest changes in the amplitude of the input analog signal.

However, excessively frequent measurements can lead to an unjustified increase in the flow of digital data and a waste of computer resources when processing the signal.

Fortunately, choosing the right conversion frequency (sampling frequency) is quite simple. It is enough to turn to Kotelnikov's theorem (known in the West as the Nyquist-Shannon sampling theorem), familiar to specialists in digital signal processing. The theorem states that the conversion frequency must be at least twice the maximum frequency in the spectrum of the converted signal. Therefore, to digitize an audio signal whose frequencies lie in the range of 16-20,000 Hz without loss of quality, you need a conversion frequency of no less than 40,000 Hz.
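For readers who want to see the theorem in action, here is a small numeric sketch (my own illustration, assuming NumPy is available): a tone above half the sampling frequency does not disappear but reappears at a false, "aliased" frequency:

```python
# A 5000 Hz tone sampled at 8000 Hz violates the criterion fs >= 2*f
# and shows up as an alias at fs - f = 3000 Hz.
import numpy as np

fs = 8000                          # sampling frequency, Hz
t = np.arange(fs) / fs             # one second of sample instants
x = np.sin(2 * np.pi * 5000 * t)   # 5 kHz tone, above fs/2 = 4 kHz

spectrum = np.abs(np.fft.rfft(x))
peak_hz = np.argmax(spectrum) * fs / len(x)
print(peak_hz)  # -> 3000.0: the alias frequency, not the original 5000 Hz
```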

Note, however, that in professional audio equipment the conversion frequency is selected several times higher than the specified value. This is done to achieve very high quality digitized audio. This quality is not relevant for speech recognition systems, so we will not focus your attention on this choice.

What conversion frequency is needed to digitize the sound of human speech?

Since the sounds of human speech lie in the frequency range of 300-4000 Hz, the minimum required conversion frequency is 8000 Hz. However, many computer speech recognition programs use the conversion frequency of 44,100 Hz that is standard for conventional sound adapters. On the one hand, such a conversion frequency does not lead to an excessive increase in the digital data stream; on the other hand, it ensures that speech is digitized with sufficient quality.

Back in school, we were taught that with any measurements errors arise, which cannot be completely eliminated. Such errors arise due to the limited resolution of measuring instruments, as well as due to the fact that the measurement process itself can introduce some changes into the measured value.

An analog-to-digital converter represents the input analog signal as a stream of numbers of limited bit depth. Conventional sound adapters contain 16-bit ADCs capable of representing the amplitude of the input signal as 2¹⁶ = 65,536 different values. The ADCs in high-end audio equipment can be 20-bit, providing greater accuracy in representing the amplitude of the audio signal.

Modern speech recognition systems and programs were created for ordinary computers equipped with ordinary sound adapters. Therefore, to conduct experiments with speech recognition, you do not need to purchase a professional audio adapter. An adapter such as Sound Blaster is quite suitable for digitizing speech for the purpose of its further recognition.

Along with the useful signal, various noises usually enter the microphone - noise from the street, wind noise, extraneous conversations, etc. Noise has a negative impact on the performance of speech recognition systems, so it has to be dealt with. We have already mentioned one of the ways - today's speech recognition systems are best used in a quiet room, alone with the computer.

However, it is not always possible to create ideal conditions, so special methods of interference suppression have to be used. To reduce the noise level, special tricks are employed in microphone design, along with filters that remove from the analog signal's spectrum the frequencies that carry no useful information. In addition, a technique such as compression of the dynamic range of input signal levels is used.

Let's talk about all this in order.

A frequency filter is a device that transforms the frequency spectrum of an analog signal. During the transformation, oscillations at certain frequencies are passed through (or suppressed).

You can imagine this device as a kind of black box with one input and one output. In relation to our situation, a microphone will be connected to the input of the frequency filter, and an analog-to-digital converter will be connected to the output.

There are different kinds of frequency filters:

· low-pass filters;

· high-pass filters;

· band-pass filters;

· band-stop filters.

Low-pass filters remove from the spectrum of the input signal all frequencies above a certain cutoff frequency, which depends on the filter setting.

Humans cannot hear sounds with frequencies of 20,000 Hz and above, so these can be cut from the spectrum without noticeable loss of sound quality. For speech recognition you can go further and cut out everything above 4000 Hz, which significantly reduces the level of high-frequency interference.

Likewise, high-pass filters cut out from the spectrum of the input signal all frequencies below a certain cutoff frequency.

Since audio signals lie in the range of 16-20,000 Hz, all frequencies below 16 Hz can be cut off without degrading the sound quality. For speech recognition the 300-4000 Hz band is what matters, so frequencies below 300 Hz can be cut out as well. Any interference whose frequency spectrum lies below 300 Hz will then be removed from the input signal and will not hinder the speech recognition process.

A band-pass filter can be thought of as a combination of a low-pass and a high-pass filter. Such a filter blocks all frequencies below the so-called lower cutoff frequency, as well as those above the upper cutoff frequency.

Thus, a band-pass filter that blocks all frequencies except those in the 300-4000 Hz range is convenient for a speech recognition system.

As for band-stop filters, they allow you to cut out all frequencies lying in a given range from the spectrum of the input signal. Such a filter is convenient, for example, for suppressing interference that occupies a certain continuous part of the signal spectrum.

In Fig. 2-6 we show the connection of a band-pass filter.

Fig. 2-6. Filtering the audio signal before digitizing

It must be said that conventional sound adapters installed in computers include a band-pass filter through which the analog signal passes before digitization. The passband of such a filter usually corresponds to the range of audio signals, namely 16-20,000 Hz (the exact upper and lower frequencies may vary slightly between adapters).

How to achieve a narrower bandwidth of 300-4000 Hz, corresponding to the most informative part of the spectrum of human speech?

Of course, if you have a penchant for designing electronic equipment, you can make your own filter from an operational amplifier chip, resistors and capacitors. This is roughly what the first creators of speech recognition systems did.

However, industrial speech recognition systems must be operational on standard computer hardware, so the route of making a special bandpass filter is not suitable here.

Instead, modern speech processing systems use so-called digital frequency filters, implemented in software. This became possible after the computer's central processor became powerful enough.

A digital frequency filter, implemented in software, converts an input digital signal into an output digital signal. During the conversion, the program processes in a special way the stream of numerical signal-amplitude values coming from the analog-to-digital converter. The result of the transformation is also a stream of numbers, but one corresponding to the already filtered signal.
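As a sketch of what such a software filter might look like today (the book does not prescribe a specific design, so a Butterworth filter from SciPy is assumed here purely as an example):

```python
# A sketch of a software band-pass filter for the 300-4000 Hz speech band.
import numpy as np
from scipy.signal import butter, sosfilt

def speech_bandpass(samples: np.ndarray, fs: float) -> np.ndarray:
    """Keep roughly the 300-4000 Hz band of a digitized signal."""
    sos = butter(4, [300, 4000], btype="bandpass", fs=fs, output="sos")
    return sosfilt(sos, samples)

# Example: filter one second of noise digitized at 44,100 Hz.
fs = 44100
noise = np.random.default_rng(0).standard_normal(fs)
filtered = speech_bandpass(noise, fs)
```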

While talking about the analog-to-digital converter, we noted such an important characteristic as the number of quantization levels. If a 16-bit analog-to-digital converter is installed in the sound adapter, then after digitization the audio signal levels can be represented as 2¹⁶ = 65,536 different values.

If there are few quantization levels, so-called quantization noise appears. To reduce this noise, high-quality audio digitization systems should use analog-to-digital converters with the largest available number of quantization levels.
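The effect of the number of quantization levels is easy to check numerically. A rough sketch (my own illustration, assuming NumPy), showing that each extra bit buys roughly 6 dB of signal-to-noise ratio:

```python
# Quantize a sine to n bits and estimate the signal-to-noise ratio;
# more quantization levels -> less quantization noise.
import numpy as np

def quantize(x: np.ndarray, bits: int) -> np.ndarray:
    steps = 2 ** bits / 2 - 1
    return np.round(x * steps) / steps

t = np.arange(44100) / 44100
x = np.sin(2 * np.pi * 440 * t)          # full-scale 440 Hz test tone
for bits in (8, 16):
    noise = quantize(x, bits) - x
    snr_db = 10 * np.log10(np.mean(x**2) / np.mean(noise**2))
    print(bits, round(snr_db, 1))        # roughly 6 dB of SNR per bit
```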

However, there is another technique for reducing the impact of quantization noise on audio quality, used in digital audio recording systems. With this technique, the signal is passed through a nonlinear amplifier before digitization, one that emphasizes low-amplitude signals: it amplifies weak signals more than strong ones.

This is illustrated by the graph of the output signal amplitude versus the input signal amplitude shown in Fig. 2-7.

Fig. 2-7. Nonlinear amplification before digitization

In the step of converting digitized audio back to analog (we'll look at this step later in this chapter), the analog signal is again passed through a nonlinear amplifier before being output to the speakers. This time, a different amplifier is used, which emphasizes high-amplitude signals and has a transfer characteristic (the dependence of the amplitude of the output signal on the amplitude of the input signal) inverse to that used during digitization.

How can all this help the creators of speech recognition systems?

A person, as is known, recognizes speech spoken in a quiet whisper or in a fairly loud voice quite well. We can say that the dynamic range of loudness levels of successfully recognized speech for a person is quite wide.

Today's computer speech recognition systems, unfortunately, cannot yet boast of this. However, in order to slightly expand the specified dynamic range, before digitizing, you can pass the signal from the microphone through a nonlinear amplifier, the transfer characteristic of which is shown in Fig. 2-7. This will reduce the quantization noise level when digitizing weak signals.

Developers of speech recognition systems, again, have to target mass-produced sound adapters, which do not provide the nonlinear signal conversion described above.

However, it is possible to create the software equivalent of a nonlinear amplifier that converts the digitized signal before passing it on to the speech recognition module. Although such a software amplifier will not be able to reduce quantization noise, it can be used to emphasize those signal levels that carry the most speech information. For example, you can reduce the amplitude of weak signals, thus ridding the signal of noise.
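For illustration, here is a sketch of such a software nonlinear transformation. The book does not name a specific transfer characteristic, so the classic mu-law companding curve from telephony is assumed here as one well-known example:

```python
# Mu-law companding: the forward curve emphasizes weak signals, and the
# inverse curve (applied before playback) restores the original levels.
import numpy as np

def mu_law_compress(x: np.ndarray, mu: float = 255.0) -> np.ndarray:
    """Emphasize weak signals; x is expected in the range [-1, 1]."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_law_expand(y: np.ndarray, mu: float = 255.0) -> np.ndarray:
    """Inverse transfer characteristic."""
    return np.sign(y) * ((1 + mu) ** np.abs(y) - 1) / mu

x = np.array([-0.5, -0.01, 0.0, 0.01, 0.5])
roundtrip = mu_law_expand(mu_law_compress(x))
print(np.allclose(x, roundtrip))  # -> True
```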

People who are passionate about home audio exhibit an interesting paradox. They are ready to rebuild the listening room and construct speakers with exotic drivers, yet they shyly retreat before canned music like a wolf before a line of red flags. But really, why not step over the flags and try to cook something more edible out of those canned goods?

From time to time, plaintive questions appear on forums: "Recommend some well-recorded albums." This is understandable. Special audiophile releases may delight the ear for the first minute, but nobody listens to them to the end: the repertoire is too dull. As for the rest of the music library, the problem seems obvious. You can pour a ton of money into components or you can save it, but either way few people enjoy listening to their favorite music at high volume, and the capabilities of the amplifier have nothing to do with it.

Today, even on Hi-Res albums, the peaks of the soundtrack are cut off and the volume is driven into clipping. The assumption is that the majority listens to music on all kinds of junk, and therefore one must "step on the gas" - a kind of loudness compensation.


Of course, this is not done deliberately to upset audiophiles; few people remember them at all. One might think of giving them the master files from which the main circulation is copied - CDs, MP3s, and so on. But the master has long since been flattened by a compressor; no one will deliberately prepare special versions for HD Tracks. At best a certain procedure is performed for vinyl media, which for this reason sounds more humane. For the digital path it all ends the same way: with a big fat compressor.

So, at present, practically 100% of published recordings, classical music aside, are compressed during mastering. Some perform this procedure more or less skillfully, others do it utterly crudely. As a result we have pilgrims on the forums with a DR-meter plugin in their bosom, agonizing comparisons of editions, and a flight to vinyl, where one must also hunt down first pressings.

The most hardened, at the sight of all these outrages, have literally turned into audio Satanists: no joke, they are reading the sound engineer's holy scripture backwards! Modern audio editors have tools for restoring a clipped sound wave.

Initially, this functionality was intended for studios. During mixing there are situations when clipping gets into the recording and, for a number of reasons, the session can no longer be redone; here the audio editor's arsenal comes to the rescue - a declipper, a decompressor, and so on.

And now ordinary listeners, whose ears bleed after every new release, are reaching for such software ever more boldly. Some prefer iZotope, others Adobe Audition, still others split the operations between several programs. The point of restoring the former dynamics is to programmatically reconstruct the clipped signal peaks which, flattened against 0 dB, resemble gear teeth.

Of course, there is no talk of a 100% revival of the original, since the interpolation relies on rather speculative algorithms. Still, some of the processing results seemed interesting to me and worthy of study.
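To show the idea (and only the idea - tools like iZotope RX are far more sophisticated), here is a toy declipper sketch, assuming NumPy and SciPy: samples stuck at the ceiling are treated as missing and re-drawn by cubic interpolation through their intact neighbors:

```python
# A naive declipper: detect samples at/above a ceiling and interpolate
# replacements from the surrounding unclipped samples.
import numpy as np
from scipy.interpolate import CubicSpline

def declip(samples: np.ndarray, ceiling: float = 0.99) -> np.ndarray:
    """Rebuild samples whose absolute value sits at or above the ceiling."""
    clipped = np.abs(samples) >= ceiling
    good = np.flatnonzero(~clipped)
    spline = CubicSpline(good, samples[good])
    out = samples.copy()
    # Interpolated peaks may rise above the old ceiling - that is the point.
    out[clipped] = spline(np.flatnonzero(clipped))
    return out
```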

Take, for example, Lana Del Rey's album "Lust For Life", full of swearing, ugh, swearing! Originally the song "When the World Was at War We Kept Dancing" looked like this.


And after a series of declippers and decompressors it became like this. The DR value went from 5 to 9. You can download and listen to the sample before and after processing.


I cannot say the method is universal and suits every ruined album, but in this case I chose to keep this version, processed by a RuTracker activist, in my collection instead of the official 24-bit edition.

Even if artificially pulling the peaks out of the sound mush does not bring back the true dynamics of the musical performance, your DAC will still thank you. After all, it had a hard time working without errors at extreme levels, where so-called inter-sample peaks (ISP) are highly likely. Now only rare flashes of the signal will jump to 0 dB. Besides, the now-quieter soundtrack will take up less space when compressed to FLAC or another lossless codec: more "air" in the signal saves hard drive space.

Try reviving your most hated albums killed in the "loudness war". To reserve headroom for the dynamics, first lower the track level by 6 dB and then run the declipper. Those who do not trust computers can simply plug a studio expander between the CD player and the amplifier. This device does essentially the same thing: it restores and stretches the peaks of a dynamically compressed audio signal as best it can. Such devices from the 80s-90s are not very expensive, and it would be very interesting to try one as an experiment.


The DBX 3BX dynamic range controller processes the signal separately in three bands - LF, MF and HF
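For the curious, the static curve of such an upward expander is easy to sketch in code; the parameter values below are illustrative and not taken from the DBX 3BX or any other device:

```python
# The static curve of an upward expander - the inverse idea of a compressor:
# level changes above the threshold are exaggerated by the ratio.
def expanded_level(input_db: float, threshold_db: float, ratio: float) -> float:
    """Map an input level (dB) to an output level with upward expansion."""
    if input_db <= threshold_db:
        return input_db
    return threshold_db + (input_db - threshold_db) * ratio

# A peak at -6 dB with threshold -12 dB and ratio 1.5 comes out at -3 dB.
print(expanded_level(-6.0, threshold_db=-12.0, ratio=1.5))  # -> -3.0
```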

Once upon a time, equalizers were a taken-for-granted component of an audio system, and no one was afraid of them. Today there is no need to compensate for the high-frequency roll-off of magnetic tape, but something has to be done about the ugly dynamics, brothers.

DRC (Dynamic Range Compression)

An encoding technology used in DVD players with their own sound decoders, and in receivers. Dynamic range compression (or reduction) is used to limit audio peaks when watching movies. If the viewer wants to watch a film in which sudden changes in volume are possible (a war film, for example) without disturbing family members, DRC mode should be turned on. Subjectively, by ear, after turning on DRC the proportion of low frequencies in the sound decreases and high sounds lose transparency, so you should not turn on DRC mode unless necessary.

DreamWeaver (See FrontPage)

A visual editor for hypertext documents developed by Macromedia Inc. The powerful, professional DreamWeaver program can generate HTML pages of any complexity and scale, and has built-in support for large network projects. It is a visual design tool supporting advanced WYSIWYG concepts.

Driver

A software component that allows the computer to interact with devices such as a network interface card (NIC), keyboard, printer, or monitor. Network equipment (such as a hub) connected to a PC requires drivers so that the PC can communicate with it.

DRM (Digital Rights Management)

1. A concept that involves using special technologies and methods to protect digital materials so that they are provided only to authorized users.

2. A client program for interacting with Digital Rights Management Services, designed to control access to and copying of copyright-protected information. DRM Services runs on Windows Server 2003. The client software runs on Windows 98, Me, 2000 and XP, allowing applications such as Office 2003 to access the related services. In the future, Microsoft is expected to release a digital rights management module for the Internet Explorer browser; eventually such a program is planned to be required on the computer to work with any content that uses DRM technologies to protect against illegal copying.

Droid (Robot) (See Agent)

DSA (Digital Signature Algorithm)

A public-key digital signature algorithm, developed by NIST (USA) in 1991.

DSL (Digital Subscriber Line)

A modern technology, supported by city telephone exchanges, for exchanging signals at higher frequencies than those used by conventional analog modems. A DSL modem can work simultaneously with both a telephone (analog signal) and a digital line. Since the spectra of the voice signal and the digital DSL signal do not overlap, i.e. do not affect each other, DSL lets you surf the Internet and talk on the phone over the same physical line. Moreover, DSL technology usually uses several frequencies, and the DSL modems on both ends of the line try to find the best ones for data transmission. A DSL modem not only transmits data but can also act as a router; equipped with an Ethernet port, it makes it possible to connect several computers.

DSOM (Distributed System Object Model, Distributed SOM)

IBM technology with corresponding software support.

DSR (Data Set Ready - data-readiness signal)

A serial-interface signal indicating that a device (for example, a modem) is ready to send a bit of data to the PC.

DSR (Device Status Report)

DSR (Device Status Register)

DSS (Decision Support System) (See.

The sound level is the same throughout the composition; there are several pauses.

Narrowing the dynamic range

Narrowing of the dynamic range - or, more simply put, compression - serves various purposes, the most common of which are:

1) Achieving a uniform volume level throughout a composition (or an instrument part).

2) Achieving a uniform volume level across songs throughout an album or radio broadcast.

3) Increasing intelligibility, mainly when compressing an individual part (vocals, kick drum).

How does dynamic range narrowing occur?

The compressor analyzes the sound level at its input, comparing it with a user-specified Threshold value.

If the signal level is below the Threshold, the compressor keeps analyzing the sound without changing it. If the sound level exceeds the Threshold, the compressor goes to work. Since the compressor's role is to narrow the dynamic range, it is logical that it limits the largest and smallest amplitude values (signal levels). At the first stage the largest values are limited: they are reduced with a certain strength, which is called Ratio. Let's look at an example:

The green curves display the sound level; the greater the amplitude of their oscillations around the X axis, the greater the signal level.

The yellow line is the threshold (Threshold) at which the compressor kicks in. Raising the Threshold moves it away from the X axis; lowering it brings it closer to the X axis. It is clear that the lower the threshold, the more often the compressor will act, and vice versa: the higher it is, the less often. If the Ratio value is very high, then once the signal reaches the Threshold, the entire subsequent signal will be suppressed down to silence; if the Ratio value is very small, nothing will happen at all. The choice of Threshold and Ratio values will be discussed later.

Now we should ask the following question: what would be the point of suppressing all the subsequent sound? Indeed, there is none - we only need to get rid of the amplitude values (peaks) that exceed the Threshold (marked in red on the graph). It is to solve this problem that the Release parameter exists, which sets the duration of the compression.

The example shows that the first and second overshoots of the Threshold last less time than the third. So, if Release is set to suit the first two peaks, then when the third is processed an unprocessed tail may remain (since its overshoot lasts longer). If Release is set to suit the third peak, then after the first and second peaks an undesirable dip in signal level appears behind them.

The same goes for the Ratio parameter. If Ratio is tuned to the first two peaks, the third will not be suppressed enough; if Ratio is tuned to the third peak, the first two will be suppressed too much.

These problems can be solved in two ways:

1) Setting the attack parameter (Attack) - a partial solution.

2) Dynamic compression - a complete solution.

The Attack parameter sets the time after which the compressor starts working once the Threshold is exceeded. If the parameter is close to zero (equal to zero in the case of parallel compression - see the corresponding article), the compressor begins to suppress the signal immediately and works for the length of time specified by Release. If the attack time is long, the compressor starts acting only after that time has passed (this is needed to preserve the clarity of the attack). In our case, we can tune Threshold, Release and Ratio to handle the first two peaks and set Attack close to zero. Then the compressor will suppress the first two peaks, and on the third it will keep suppressing for as long as the Threshold remains exceeded. However, this does not guarantee high-quality processing and is close to limiting (a rough cut of all amplitude values; in this case the compressor is called a limiter).
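To tie the four parameters together, here is a minimal sketch of a compressor in Python (one common construction: a static gain curve plus attack/release smoothing of the gain reduction; it is an illustration, not a production design, and assumes NumPy):

```python
# A minimal compressor with Threshold, Ratio, Attack and Release.
import numpy as np

def compress(x, fs, threshold_db=-12.0, ratio=2.0, attack_ms=5.0, release_ms=150.0):
    """Downward-compress a mono signal x (floats in [-1, 1]) sampled at fs Hz."""
    # Per-sample smoothing coefficients derived from the attack/release times.
    a_att = np.exp(-1.0 / (fs * attack_ms / 1000.0))
    a_rel = np.exp(-1.0 / (fs * release_ms / 1000.0))

    level_db = 20.0 * np.log10(np.abs(x) + 1e-10)   # instantaneous level, dB

    # Static curve: how much gain reduction each sample "wants".
    over = np.maximum(level_db - threshold_db, 0.0)
    target_gr = over * (1.0 - 1.0 / ratio)          # dB of reduction wanted

    # Smooth the gain reduction: fast onset (attack), slow recovery (release).
    gr = np.zeros_like(x)
    g = 0.0
    for i, tgt in enumerate(target_gr):
        coef = a_att if tgt > g else a_rel
        g = coef * g + (1.0 - coef) * tgt
        gr[i] = g

    return x * 10.0 ** (-gr / 20.0)
```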

Let's look at the result of sound processing with a compressor:

The peaks have disappeared. Note that the processing settings were quite gentle and we suppressed only the most prominent amplitude values. In practice the dynamic range is narrowed much further, and this trend keeps progressing. Many composers think this makes the music louder, but in practice it completely deprives it of dynamics for listeners who hear it at home rather than on the radio.

It remains to consider the last compression parameter, Gain. Gain increases the amplitude of the entire composition and is, in effect, equivalent to another sound editor tool: normalize. Let's look at the final result:

In our case, compression was justified and improved the sound, since the prominent peak was more likely an accident than a deliberate result. Besides, the music is clearly rhythmic and therefore has a narrow dynamic range anyway. In cases where high amplitude values are intentional, compression may be a mistake.

Dynamic compression

The difference between dynamic and non-dynamic compression is that in the former, the degree of suppression (Ratio) depends on the level of the input signal. Dynamic compressors are found in all modern programs; the Ratio and Threshold parameters are controlled via a graph window (each parameter has its own axis):

There is no single standard for displaying this graph: in some programs the Y axis shows the incoming signal level, in others the level after compression; in some the point (0,0) is in the upper right corner, in others in the lower left. In any case, as you move the mouse over this field, the displayed numbers corresponding to Ratio and Threshold change. That is, you set a compression level for every Threshold value, which allows very flexible compression settings.
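Such a user-drawn curve is easy to emulate. A sketch (assuming NumPy; the curve points are purely illustrative) where the effective Ratio grows with the input level:

```python
# "Dynamic" compression as a transfer curve mapping input dB to output dB.
import numpy as np

# Below -24 dB the signal is untouched; above that the slope gets
# progressively gentler, i.e. the effective Ratio grows with input level.
curve_in  = np.array([-96.0, -24.0, -12.0,  0.0])
curve_out = np.array([-96.0, -24.0, -18.0, -15.0])

def transfer(level_db: np.ndarray) -> np.ndarray:
    """Map input levels to output levels along the user-drawn curve."""
    return np.interp(level_db, curve_in, curve_out)

print(transfer(np.array([-30.0, -12.0, 0.0])))  # -> [-30. -18. -15.]
```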

Side Chain

A sidechain compressor analyzes the signal of one channel and, when its level exceeds the threshold, applies compression to another channel. Sidechaining is mostly useful for instruments that occupy the same frequency region (the bass/kick combination is the classic case), but sometimes instruments in different frequency regions are used too, which produces an interesting sidechain pumping effect.
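A sketch of the principle (assuming NumPy; the signal names are illustrative): the gain reduction is computed from the kick channel but applied to the bass channel:

```python
# Sidechain compression: duck the bass whenever the kick exceeds the threshold.
import numpy as np

def sidechain_gain(key: np.ndarray, threshold_db: float, ratio: float) -> np.ndarray:
    """Per-sample linear gain derived from the key (sidechain) signal."""
    level_db = 20.0 * np.log10(np.abs(key) + 1e-10)
    over = np.maximum(level_db - threshold_db, 0.0)
    reduction_db = over * (1.0 - 1.0 / ratio)
    return 10.0 ** (-reduction_db / 20.0)

fs = 44100
t = np.arange(fs) / fs
kick = np.sin(2 * np.pi * 60 * t) * (t % 0.5 < 0.05)   # short 60 Hz thumps
bass = 0.5 * np.sin(2 * np.pi * 80 * t)                # sustained bass line

ducked_bass = bass * sidechain_gain(kick, threshold_db=-20.0, ratio=4.0)
```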

Part Two – Compression Stages

There are three stages of compression:

1) The first stage is compression of individual sounds (one-shots).

The timbre of any instrument has the following envelope stages: Delay, Attack, Hold, Decay, Sustain, Release.

The stage of compression of individual sounds is divided into two parts:

1.1) Compression of individual sounds of rhythmic instruments

Often the components of a beat require separate compression to give them clarity. Many people process the kick drum separately from the other rhythmic instruments, both at the stage of compressing individual sounds and at the stage of compressing individual parts. This is because it sits in the low-frequency region, where usually only the bass keeps it company. Clarity of a kick drum means the presence of a characteristic click (the kick has a very short attack and hold time). If there is no click, process it with a compressor, setting the threshold to zero and the attack time from 10 to 50 ms. The compressor's release must end before the next kick hit; that can be worked out with the formula 60,000 / BPM, where BPM is the tempo of the composition. For example, 60,000 / 137 = 437.96 - the time in milliseconds between beats of a composition in 4/4.
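The formula is trivial to wrap in code; as in the example above, 137 BPM gives about 438 ms between beats:

```python
# The time between beats, as an upper bound for the kick compressor's release.
def beat_interval_ms(bpm: float) -> float:
    """Milliseconds per beat at the given tempo."""
    return 60_000.0 / bpm

print(round(beat_interval_ms(137), 2))  # -> 437.96, as in the example above
```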

All of the above applies to other rhythmic instruments with a short attack time: they should have an accentuated click that must not be suppressed by the compressor at any stage of compression.

1.2) Compression of individual sounds of harmonic instruments

Unlike rhythmic instruments, parts of harmonic instruments rarely consist of separate one-shot sounds. However, this does not mean they should not be processed at the individual-sound level. If you use a sample with an already recorded part, that belongs to the second level of compression; only synthesized harmonic instruments belong to this level. These can be samplers or synthesizers using various methods of sound synthesis (physical modeling, FM, additive, subtractive, etc.). As you have probably guessed, we are talking about programming the synthesizer's settings. Yes! This is also compression! Almost all synthesizers have a programmable envelope (ADSR). Using the envelope you set the Attack, Decay, Sustain and Release times. And if you tell me that this is not compression of each individual sound - you are my enemy for life!

2) Second stage – Compression of individual parts.

By compression of individual parts I mean narrowing the dynamic range of a number of combined individual sounds. This stage also covers recorded parts, including vocals, which need compression to gain clarity and intelligibility. When compressing parts, keep in mind that combining individual sounds can produce unwanted peaks, which must be removed at this stage: if this is not done now, the picture may get worse when the whole composition is mixed. At the stage of compressing individual parts you must also take into account the compression done at the individual-sound stage: if you have achieved clarity of the kick drum, careless re-processing at the second stage can ruin everything. It is not necessary to process every part with a compressor, just as it is not necessary to process every individual sound. I advise installing an amplitude analyzer, just in case, to detect unwanted side effects of combining individual sounds. Besides compression, at this stage you should make sure the parts sit, as far as possible, in different frequency ranges so that quantization can be performed. It is also useful to remember that sound has a property known as masking (psychoacoustics):

1) A quieter sound is masked by a louder sound that precedes it.

2) A quieter sound at a low frequency is masked by a louder sound at a high frequency.

So, for example, in a synthesizer part the notes often begin to play before the previous notes have finished sounding. Sometimes this is necessary (harmony, playing style, polyphony), but sometimes not at all: you can cut off their tails (Delay - Release) if they are audible in solo mode but inaudible when all parts play together. The same applies to effects such as reverb: it should not last until the sound source starts again. By cutting away unnecessary signal you make the sound cleaner, and this too can be considered compression, because you are removing unnecessary waves.

3) The third stage – Compression of the composition.

When compressing an entire composition, remember that all the parts are a combination of many individual sounds. Therefore, when combining them and compressing the result, we must make sure the final compression does not spoil what was achieved in the first two stages. You also need to distinguish between compositions where a wide dynamic range matters and those where a narrow one does. When compressing a composition with a wide dynamic range, it is enough to insert a compressor that crushes the short-term peaks formed by summing the parts. When compressing a composition where a narrow dynamic range matters, everything is much more complicated. Here compressors have recently given way to maximizers. A maximizer is a plugin that combines a compressor, limiter, graphic equalizer, enhancer and other sound-shaping tools, and it must also include sound-analysis tools. Maximizing - the final compressor processing - is largely needed to fight mistakes made at earlier stages. Mistakes not so much in compression (although doing at the last stage what could have been done at the first is already a mistake) as in the initial choice of good samples and instruments that do not interfere with each other in terms of frequency ranges; this is precisely why the frequency response gets corrected. It often happens that with strong compression on the master you have to change the compression and mixing parameters at earlier stages, because with a strong narrowing of the dynamic range quiet sounds that were previously masked come to the surface, and the sound of individual components of the composition changes.

In these parts I deliberately did not talk about specific compression parameters. I considered it necessary to write about the fact that during compression you must pay attention to every sound and every part at every stage of creating a composition. Only then will you end up with a result that is harmonious not only from the point of view of music theory but also from the point of view of sound engineering.

The following table gives practical advice for processing individual parts. However, in compression, numbers and presets can only suggest the area in which to search; the ideal settings depend on each individual case. The Gain and Threshold parameters assume a normal source level (sensible use of the entire range).

Part Three - Compression Parameters

Brief information:

Threshold – sets the level of the incoming signal at which the compressor starts working.

Attack – sets the time after which the compressor starts working.

Ratio – sets the degree of reduction of amplitude values (relative to the original amplitude).

Release – sets the time after which the compressor stops working.

Gain – sets the amount of boost applied to the signal after compression.

Compression table:

(Alternative values from different sources are separated by a slash.)

Vocals: Threshold 0 dB; Attack 1-2 ms / 2-5 ms / 10 ms / 0.1 ms / 0.1 ms; Ratio less than 4:1 / 2.5:1 / 4:1-12:1 / 2:1-8:1; Release 150 ms / 50-100 ms / 150 ms / 150 ms / 0.5 s. Compression during recording should be minimal; mandatory processing at the mixing stage is needed for clarity and intelligibility.

Wind instruments: Attack 1-5 ms; Ratio 6:1-15:1; Release 0.3 s.

Kick drum: Attack 10-50 ms / 10-100 ms; Ratio 4:1 and higher / 10:1; Release 50-100 ms / 1 ms. The lower the Threshold, the higher the Ratio and the longer the Attack, the more pronounced the click at the start of the kick.

Synthesizers: depends on the waveform (ADSR envelopes).

Snare drum: Attack 10-40 ms / 1-5 ms; Ratio 5:1 / 5:1-10:1; Release 50 ms / 0.2 s.

Hi-hat: Attack 20 ms; Ratio 10:1; Release 1 ms.

Overhead microphones: Attack 2-5 ms; Ratio 5:1; Release 1-50 ms.

Drums (full kit): Attack 5 ms; Ratio 5:1-8:1; Release 10 ms.

Bass guitar: Attack 100-200 ms / 4-10 ms; Ratio 5:1; Release 1 ms / 10 ms.

Strings: Attack 0-40 ms; Ratio 3:1; Release 500 ms.

Synth bass: Attack 4-10 ms; Ratio 4:1; Release 10 ms. Depends on the envelopes.

Percussion: Attack 0-20 ms; Ratio 10:1; Release 50 ms.

Acoustic guitar, piano: Attack 10-30 ms / 5-10 ms; Ratio 4:1 / 5:1-10:1; Release 50-100 ms / 0.5 s.

Electric guitar: Attack 2-5 ms; Ratio 8:1; Release 0.5 s.

Final compression: Attack 0.1 ms / 0.1 ms; Ratio 2:1 / 2:1-3:1; Release 50 ms / 0.1 ms; Gain to 0 dB output. The attack time depends on the goal: removing peaks or making the track smoother.

Limiter after final compression: Attack 0 ms; Ratio 10:1; Release 10-50 ms; Gain 0 dB output. For when you need a narrow dynamic range and a rough "cut" of the waves.

The information was taken from various sources referenced by popular resources on the Internet. The differences in compression parameters are explained by different sound preferences and different source material.






