© 2006 DIGIFON
From "The ISDN Studio" by Dave Immer
Audio Engineering Society 99th Convention
Oct. 8, 1995, New York City
For audio applications, the algorithm is a model - or a set
of rules - by which a PCM bit-stream is analyzed and re-quantized into a reduced
bit-stream. Audio coding algorithms differ in how they deal with the irrelevancy
and redundancy contained in the PCM signal and fall into two basic categories:
Transform and Predictive, both of which have subband variations. Transform coding
is frequency-domain based and Predictive coding is time-domain based. A frequency-domain
based algorithm employs bit reduction that follows known characteristics (contained
in an on-board lookup table) of human hearing. This process is called perceptual
coding: only psycho-acoustically "relevant" waveform information
is transmitted and reconstructed at the decoder, where aliasing noise is dynamically
masked within the subbands of audio having the most energy at the moment. Audio
frequency response for frequency-domain coding is much less bit-rate dependent
(but has more coding delay) than a time-domain process. A time-domain approach
uses predictive analysis based on look-up tables available to the coder,
transmits only the differences between the prediction and the actual sample,
and then adds the redundant information back at the decoder. The audio frequency
response is dependent on the bit-rate of the transmission but this method results
in a very low coding delay. Both approaches work quite well and each has its
advantages. A major advantage of predictive coding such as APT-X is that it
is a "near-lossless" treatment, making it a good choice for production
applications. MPEG layer 2 allows for "tweaks" and improvements on
the encoding side (such as the Musicam implementation) that are "followed"
by the decoding side. This makes an MPEG layer 2 decoder much less complex (and
cheaper) than the encoder, and makes the format a major contender for digital radio
broadcasting.
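The time-domain idea described above (predict each sample, send only the difference, rebuild at the decoder) can be illustrated with a minimal sketch. This is not APT-X or G.722: those coders use adaptive predictors and quantize the differences within subbands, which is where the actual bit reduction comes from. The example below assumes the simplest possible predictor ("the next sample equals the previous one") and does no quantization, so it is exactly reversible. The function names and sample values are made up for illustration.

```python
def dpcm_encode(samples):
    """Return the difference between each sample and its prediction.

    Assumed predictor: each sample is predicted to equal the previous
    sample (the first prediction is 0).
    """
    residuals = []
    prediction = 0
    for sample in samples:
        residuals.append(sample - prediction)
        prediction = sample  # the decoder can track this value too
    return residuals


def dpcm_decode(residuals):
    """Rebuild the samples by adding each difference to the prediction."""
    samples = []
    prediction = 0
    for residual in residuals:
        sample = prediction + residual
        samples.append(sample)
        prediction = sample
    return samples


# A slowly varying (made-up) PCM fragment: neighbouring samples are similar,
# so the differences are smaller than the samples themselves.
pcm = [0, 120, 250, 360, 430, 450, 420, 340]
residuals = dpcm_encode(pcm)
restored = dpcm_decode(residuals)

print("samples:  ", pcm)
print("residuals:", residuals)
print("restored: ", restored)
```

Because the differences span a smaller range than the samples, they can be carried in fewer bits. A real coder would then quantize them, and how coarsely it does so depends on the bit-rate available.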
The reason all this number crunching must be done in the first place can be
illustrated with a few simple formulas. Stereo audio (2 channels) sampled
at the CD rate of 44,100 samples per second with 16-bit resolution creates
a real-time "bit-rate" of 1,411,200 bits per second. But the available
bit-rate - or "digital bandwidth" - of Basic Rate ISDN service is
only 128,000 bps; roughly 9% of what is needed for unreduced digital audio.
So a bit reduction of roughly 11:1 is needed to "fit" the digital audio
into the available speed of a single ISDN line. With the use of two ISDN lines
(256 Kbps) the bit reduction need only be about 5.5:1, and with 3 lines (384 Kbps) about 3.7:1.
"CD quality" stereo audio: 2 x 44,100 x 16 = 1,411,200 bits per second
(1.411 Mbps)
Basic Rate ISDN = 128,000 bits per second (128 Kbps)
128,000 ÷ 1,411,200 = 0.09 (9%)
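The same arithmetic can be checked with a few lines of Python. This is only a sketch of the formulas above. The constant and function names are chosen for illustration, and the ratios it prints are the exact values behind the rounded figures quoted in the text.

```python
CHANNELS = 2             # stereo
SAMPLE_RATE_HZ = 44_100  # CD sampling rate
BITS_PER_SAMPLE = 16     # CD resolution
BRI_BPS = 128_000        # one Basic Rate ISDN line (2 B channels)

# Unreduced PCM bit-rate: channels x samples per second x bits per sample
pcm_bps = CHANNELS * SAMPLE_RATE_HZ * BITS_PER_SAMPLE


def reduction_ratio(isdn_lines):
    """Bit reduction needed to fit the PCM stream into this many ISDN lines."""
    return pcm_bps / (isdn_lines * BRI_BPS)


print(f"PCM bit-rate: {pcm_bps:,} bps")
print(f"One ISDN line carries {BRI_BPS / pcm_bps:.0%} of that")
for lines in (1, 2, 3):
    kbps = lines * BRI_BPS // 1000
    print(f"{lines} line(s), {kbps} Kbps: {reduction_ratio(lines):.1f}:1 reduction")
```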
With inverse multiplexing, ISDN "B" - or bearer - channels can be
aggregated and synchronized in increments of 64 Kbps to create "data pipes"
of any size desired. Each 64 Kbps B channel, when used for a domestic call,
is billed by the phone company at roughly the same rate as a standard phone
call. So a 128 Kbps connection, which requires 2 B channels, is billed for 2
phone calls. A 256 Kbps connection = 4 phone calls, etc. At higher data speeds,
the digital audio bit-stream requires less reduction, resulting in either more
of the original waveform being transmitted and less noise to mask at the receiving
end, or, in the case of time-domain coding, improved audio bandwidth.
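As a rough illustration of the channel arithmetic above, the sketch below works out how many 64 Kbps B channels a given connection speed needs, and therefore roughly how many phone calls it is billed as. The function name is hypothetical, and actual tariffs vary by carrier. The only assumptions taken from the text are the 64 Kbps increment and the one-call-per-channel billing.

```python
B_CHANNEL_BPS = 64_000  # one ISDN bearer ("B") channel


def b_channels_needed(target_bps):
    """Smallest number of 64 Kbps B channels that can carry target_bps."""
    if target_bps <= 0:
        raise ValueError("target_bps must be positive")
    return -(-target_bps // B_CHANNEL_BPS)  # integer ceiling division


for kbps in (128, 256, 384):
    channels = b_channels_needed(kbps * 1000)
    print(f"{kbps} Kbps -> {channels} B channels, "
          f"billed as roughly {channels} phone calls")
```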
G.722: The first popular "hi-fi" CODECs.
As recently as 1990 the only algorithm in wide use was G.722 (pronounced
"gee dot seven twenty two"), a predictive time-domain based algorithm developed
by AT&T for broadcast applications. Back when ISDN was still only a gleam
in the eyes of most telephone companies (and equipment manufacturers), this algorithm
made possible 7.5 kHz mono audio over a single 56 Kbps channel, or "Switched
56". Numerous manufacturers built G.722 CODECs and they all were able to
interconnect with each other. G.722 using SW56 phone service or a single ISDN
"B" channel is still in wide use today and is quite adequate for speech-only
applications such as commercial voice-overs, live announce, interviews, sports
feeds, news reporting and high quality audio conferences. Voice programming
that has been transmitted via G.722 codecs is heard quite frequently on radio
and TV.