Digital audio can suffer from quantization error at extremely low levels, where the information being stored is close to the value of one LSB. If the level falls below 0.5 LSB, the audio isn't registered at all, because it is quantized to 0. These low levels could be reverb tails, fade-outs, or other very quiet details in otherwise silent passages. Dither does not ensure there are always enough bits to register the quietest parts; that is what upward compression is for. Dither randomizes the quantization error. The error is just as large at low levels, but it is so random that we no longer perceive it as quantization distortion.
What dithering fights against is quantization distortion from bit truncation. At very low levels without dither, when we try to store values smaller than what discrete bits can represent precisely, the rounding errors become correlated with the signal itself, creating harmonic distortion. With dither, those errors become random noise instead.
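The low-level behaviour described above can be sketched numerically. The following is a minimal illustration, not a production implementation: it assumes one LSB equals 1.0, rounds to the nearest integer, and uses a hypothetical 0.4 LSB sine as the quiet signal. The dither is the sum of two uniform random numbers, and the names (`quantize`, `tpdf_dither`, `recovered_amplitude`) are invented for this example.

```python
import math
import random

def quantize(x):
    """Round to the nearest integer; 1.0 represents one LSB."""
    return math.floor(x + 0.5)

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def tpdf_dither():
    """Random value in [-1, 1] LSB, formed from two uniform numbers."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)

AMPLITUDE = 0.4  # sine peak in LSB, below the 0.5 LSB threshold (assumed)
PERIOD = 100     # samples per cycle (assumed)
N = 48_000       # number of samples, a whole number of cycles

sine = [math.sin(2 * math.pi * i / PERIOD) for i in range(N)]
plain = [quantize(AMPLITUDE * s) for s in sine]
dithered = [quantize(AMPLITUDE * s + tpdf_dither()) for s in sine]

def recovered_amplitude(output):
    """Least-squares estimate of how much of the sine survives in output."""
    return sum(o * s for o, s in zip(output, sine)) / sum(s * s for s in sine)

print(recovered_amplitude(plain))     # 0.0, every sample was rounded to zero
print(recovered_amplitude(dithered))  # close to 0.4, the sine lives on inside the noise
```

Without dither, the error is simply the negated signal, so it is fully correlated with it. With dither, the output is noisy, but the sine is still present on average.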
In theory, dithering may not be needed when working entirely in 24-bit, since the quantization noise floor (-144 dBFS) is far below the noise floor of any recording equipment or playback environment. However, dither becomes essential when reducing bit depth from 24-bit to 16-bit for distribution, or when applying certain processing that can expose quantization artifacts.
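As a quick check of the figures quoted here, the theoretical range of an n-bit word can be computed directly. This is a small sketch; the function name `dynamic_range_db` is made up for illustration.

```python
import math

def dynamic_range_db(bits):
    """Ratio between full scale and one LSB, in dB, for a given word length."""
    return 20 * math.log10(2 ** bits)

for bits in (16, 24):
    print(bits, round(dynamic_range_db(bits), 2))
# 16 96.33
# 24 144.49
```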
At high levels the effect of dithering is impossible to notice, but as the signal approaches very low levels, the absence of quantization distortion becomes audible. That is the effect of dithering.

The process of dithering is fairly simple: we generate two random numbers between -0.5 and 0.5 for every sample. For example, number A = -0.2 and number B = 0.4. Our dither sample value is -0.2 + 0.4 = 0.2. This value is added to the sample value before quantization. If the original sample value was 1234.466 and we didn't use dither, it would be quantized to 1234.0. With dither added, the sample value becomes 1234.666, which rounds to 1235.0. A fresh pair of random numbers is added to every sample, regardless of the original sample value.
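The per-sample procedure above can be sketched in a few lines. This is an illustrative sketch that assumes samples are already scaled so that one LSB equals 1.0 and that quantization rounds to the nearest integer; the function names are hypothetical.

```python
import math
import random

def quantize(x):
    """Round to the nearest integer step (1.0 == one LSB)."""
    return math.floor(x + 0.5)

def dither_value(rng):
    """Sum of two independent random numbers, each between -0.5 and 0.5."""
    a = rng.uniform(-0.5, 0.5)
    b = rng.uniform(-0.5, 0.5)
    return a + b

def quantize_with_dither(sample, rng):
    """Add a fresh dither value to the sample, then quantize."""
    return quantize(sample + dither_value(rng))

# The worked example from the text, with the two random numbers fixed.
sample = 1234.466
a, b = -0.2, 0.4
print(quantize(sample))          # 1234
print(quantize(sample + a + b))  # 1235

# With real random numbers, every sample gets its own dither value.
rng = random.Random(1)  # seeded only to make the sketch repeatable
block = [1234.466] * 8
print([quantize_with_dither(s, rng) for s in block])
```

The same input value can therefore land on different output values from one sample to the next, which is what breaks the link between the error and the signal.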
An example of what dither sounds like can be heard below. The audio has been amplified by 70 dB.

There are two common ways of generating the dither values: RPDF and TPDF. The first, rectangular probability density function, is a fancy way of saying a single uniformly distributed random number. The second, triangular probability density function, is a more sophisticated way of producing the number: it sums two such random numbers, which weights the results towards the middle (0) rather than spreading them evenly across the range. In simple terms, to reach either extreme (-1 or 1), both numbers must land on the same extreme (both -0.5 or both 0.5), whereas many different pairs of numbers sum to a value near zero. Over time the dither values cluster around zero, which means that most of the time we are adding only a small amount to the sample.
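The difference in shape between the two distributions is easy to see by counting how often the generated values fall into each quarter of their range. This is a rough sketch using Python's standard `random` module; the helper `bin_fractions` and the choice of four bins are assumptions made for the example.

```python
import random

rng = random.Random(2)  # seeded only to make the sketch repeatable
N = 100_000

# RPDF: one uniform number. TPDF: the sum of two uniform numbers.
rpdf = [rng.uniform(-0.5, 0.5) for _ in range(N)]
tpdf = [rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5) for _ in range(N)]

def bin_fractions(values, lo, hi, bins=4):
    """Fraction of values in each of `bins` equal-width bins between lo and hi."""
    counts = [0] * bins
    for v in values:
        index = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[index] += 1
    return [c / len(values) for c in counts]

rpdf_bins = bin_fractions(rpdf, -0.5, 0.5)  # roughly flat: about 0.25 each
tpdf_bins = bin_fractions(tpdf, -1.0, 1.0)  # peaked: about 0.125, 0.375, 0.375, 0.125
print(rpdf_bins)
print(tpdf_bins)
```

The rectangular distribution fills its range evenly, while the triangular one puts about three quarters of its values in the two middle bins.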

Using dither comes at a price. Essentially, we sacrifice around 3 dB of S/N ratio at 16 bits, which comes from summing two noise sources whose RMS is each taken to be around -96 dBFS. The first is the quantization error itself (Equation 1). The second is the dither, which is random noise added at the level of the LSB. Because the two sources are uncorrelated, we sum their powers rather than their amplitudes, which we do in Equation 2. A single sample may sit nearer to -96 dBFS than -93 dBFS, but over time, thanks to the random nature of the dither, the noise floor averages out at about -93 dBFS.
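The power summation can also be checked numerically. The sketch below assumes, as the text does, two uncorrelated sources at -96 dBFS each; the function name `sum_uncorrelated_db` is made up for this example.

```python
import math

def sum_uncorrelated_db(*levels_db):
    """Combine uncorrelated noise levels (in dB) by adding their powers."""
    total_power = sum(10 ** (level / 10) for level in levels_db)
    return 10 * math.log10(total_power)

print(round(sum_uncorrelated_db(-96, -96), 2))  # -92.99
```

Two equal sources always combine to a level about 3.01 dB above either one, whatever their absolute level.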
32 ⇒ 24 ⇒ 16 (origin is higher than destination, i.e. bit reduction): necessary.
16 ⇒ 24 ⇒ 32: not necessary, since there is nothing to dither.
16 ⇒ no processing ⇒ 16: nothing to dither.
16 ⇒ processing ⇒ 16: necessary if not done automatically, but only on the final output.
24 ⇒ 24 or 32 ⇒ 32: not necessary, since the noise floor is already higher than the LSB.
EQ1:
\begin{aligned}
S/E &= 20\log_{10}(2^n) \\
&= 20n\log_{10}(2) \\
&= 6.0206n \\
S/E_{16} &= 20\log_{10}(2^{16}) \\
&= 96.33\ \text{dB}
\end{aligned}
EQ2:
\begin{aligned}
dB_{res16} &= 10\log_{10}\left(10^{\frac{-96}{10}} + 10^{\frac{-96}{10}}\right) \\
&= 10\log_{10}\left(2\times 10^{\frac{-96}{10}}\right) \\
&= 10\log_{10}(2) + 10\log_{10}\left(10^{\frac{-96}{10}}\right) \\
&= 3.01 + (-96) \\
&= -92.99 \approx -93\ \text{dBFS}
\end{aligned}
Or simplified:
\begin{aligned}
\Delta dB &= 10\log_{10}(2) \\
&= 3.01
\end{aligned}