You don't need to know anything about sound or acoustics to use the MusicKit sound facilities for simple recording and playback. However, to access and manipulate sound data intelligently, you should be familiar with a few basic terms and concepts. This section presents a brief tutorial on the basic concepts of sound and its digital representation, followed by an in-depth examination of SndSoundStruct, the structure that's used by the SndKit software to represent sound.
Sound is a physical phenomenon produced by the vibration of matter. The matter can be almost anything: a violin string or a block of wood, for example. As the matter vibrates, pressure variations are created in the air surrounding it. This alternation of high and low pressure is propagated through the air in a wave-like motion. When the wave reaches our ears, we hear a sound.
Figure 2-1 graphs the oscillation of a pressure wave over time.
The pattern of the pressure oscillation is called a waveform. Notice that the waveform in Figure 2-1 repeats the same shape at regular intervals; the gray area shows one complete shape. This portion of the waveform is called a period. A waveform with a clearly defined period occurring at regular intervals is called a periodic waveform.
Since they occur naturally, sound waveforms are never as perfectly smooth nor as uniformly periodic as the waveform shown in Figure 2-1. However, sounds that display a recognizable periodicity tend to be more musical than those that are nonperiodic. Here are some sources of periodic and nonperiodic sounds:
Periodic:
Musical instruments other than unpitched percussion
Vowel sounds
Bird songs
Whistling wind
Nonperiodic:
Unpitched percussion instruments
Consonants, such as “t,” “f,” and “s”
Coughs and sneezes
Rushing water
The frequency of a sound―the number of times the pressure rises and falls, or oscillates, in a second―is measured in hertz (Hz). A frequency of 100 Hz means 100 oscillations per second. A convenient abbreviation, kHz for kilohertz, is used to indicate thousands of oscillations per second: 1 kHz equals 1000 Hz.
The frequency range of normal human hearing extends from around 20 Hz up to about 20 kHz.
Perceived pitch is logarithmic in frequency, not linear: to traverse the audio range from low to high by equal-sounding steps, each successive frequency increment must be greater than the last. For example, the frequency difference between the lowest note on a piano and the note an octave above it is about 27 Hz. Compare this to the piano's top octave, where the frequency difference is over 2000 Hz. Yet, subjectively, the two intervals sound the same.
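The octave comparison above can be checked with a few lines of C. This is only a sketch: the function names are invented for illustration, and the piano frequencies used in the usage note (27.5 Hz for the lowest note, roughly 2093 Hz for the start of the top octave) assume standard A440 tuning rather than anything stated in this document.

```c
#include <assert.h>

/* Frequency of the note `octaves` octaves above base_hz.
   Going up one octave doubles the frequency. */
double example_octave_above(double base_hz, int octaves)
{
    assert(octaves >= 0);
    double hz = base_hz;
    for (int i = 0; i < octaves; i++)
        hz *= 2.0;
    return hz;
}

/* Width, in Hz, of the octave that starts at base_hz. */
double example_octave_width(double base_hz)
{
    return example_octave_above(base_hz, 1) - base_hz;
}
```

Calling example_octave_width(27.5) gives 27.5 Hz, while example_octave_width(2093.0) gives 2093 Hz: the same musical interval, but a frequency step about 76 times larger.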
A sound also has an amplitude, a property subjectively heard as loudness. The amplitude of a sound is the measure of the displacement of air pressure from its mean, or quiescent state. The greater the amplitude, the louder the sound.
The smooth, continuous curve of a sound waveform isn't directly represented in a computer. A computer measures the amplitude of the waveform at regular time intervals to produce a series of numbers. Each of these measurements is called a sample. Figure 2-2 illustrates one period of a digitally sampled waveform.
Each vertical bar in Figure 2-2 represents a single sample. The height of a bar indicates the value of that sample.
The mechanism that converts an audio signal into digital samples is called an analog-to-digital converter, or ADC. To convert a digital signal back to analog, you need a digital-to-analog converter, or DAC (pronounced “dack”).
The rate at which a waveform is sampled is called the sampling rate. Like frequencies, sampling rates are measured in hertz. The CD standard sampling rate of 44100 Hz means that the waveform is sampled 44100 times per second. This may seem a bit excessive, considering that we can't hear frequencies above 20 kHz; however, the highest frequency that a digitally sampled signal can represent is equal to half the sampling rate. So a sampling rate of 44100 Hz can only represent frequencies up to 22050 Hz, a boundary much closer to that of human hearing.
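The relationship between sampling rate and representable frequency can be written down directly. The helpers below are a hypothetical sketch (none of these names come from the SndKit); they compute the highest representable frequency for a given rate and the number of samples that cover one period of a tone.

```c
#include <assert.h>

/* Highest frequency a signal sampled at rate_hz can represent:
   half the sampling rate. */
double example_max_frequency(double rate_hz)
{
    assert(rate_hz > 0.0);
    return rate_hz / 2.0;
}

/* Returns 1 if a tone at freq_hz can be represented at rate_hz, else 0. */
int example_can_represent(double freq_hz, double rate_hz)
{
    return freq_hz <= example_max_frequency(rate_hz);
}

/* Number of samples taken during one period of a tone at freq_hz. */
double example_samples_per_period(double rate_hz, double freq_hz)
{
    assert(freq_hz > 0.0);
    return rate_hz / freq_hz;
}
```

At 44100 Hz the limit is 22050 Hz, so a 20 kHz tone is representable; at 22050 Hz the limit drops to 11025 Hz and the same tone is not.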
Just as a waveform is sampled at discrete times, the value of the sample is also discrete. The quantization of a sample value depends on the number of bits used in measuring the height of the waveform. An 8-bit quantization yields 256 possible values; 16-bit CD-quality quantization yields 65,536. As an extreme example, Figure 2-3 shows the waveform used in the previous example sampled with a 3-bit quantization. This results in only eight possible values: .75, .5, .25, 0, -.25, -.5, -.75, and -1.
As you can see, the shape of the waveform becomes less discernible with a coarser quantization. The coarser the quantization, the “buzzier” the sound.
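To make the effect of quantization concrete, here is a minimal sketch that snaps an amplitude to one of the evenly spaced levels available at a given bit depth. It assumes amplitudes nominally in the range -1.0 to 1.0 and rounds down to the nearest level; both choices, and the function names, are assumptions made for this illustration rather than the behavior of any particular converter.

```c
#include <assert.h>

/* Number of distinct values available with the given number of bits. */
long example_level_count(int bits)
{
    assert(bits >= 1 && bits <= 16);
    return 1L << bits;
}

/* Quantize x to the level at or below it. With 3 bits the levels are
   -1, -.75, -.5, -.25, 0, .25, .5 and .75. */
double example_quantize(double x, int bits)
{
    long levels = example_level_count(bits);
    double step = 2.0 / (double)levels;   /* 0.25 for 3 bits */
    double scaled = x / step;
    long index = (long)scaled;            /* truncates toward zero */
    if ((double)index > scaled)           /* so step down for negatives */
        index--;
    if (index < -levels / 2)              /* clamp to the lowest level */
        index = -levels / 2;
    if (index > levels / 2 - 1)           /* clamp to the highest level */
        index = levels / 2 - 1;
    return (double)index * step;
}
```

With 3 bits, an amplitude of 0.3 becomes 0.25 and an amplitude of -0.3 becomes -0.5; the difference between the input and the quantized value is the error that is heard as buzz.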
Increasing the sampling rate and refining the quantization improve the fidelity of a digitally sampled waveform; however, the sound also takes up more storage space. Five seconds of single-channel sound sampled at 44.1 kHz with 16-bit quantization uses more than 400,000 bytes of storage; a minute consumes more than five megabytes. A number of data compression schemes have been devised to decrease storage while sacrificing some fidelity.
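The storage figures above follow from a single multiplication. The function below is a hypothetical helper written for this tutorial; it assumes uncompressed samples whose width is a whole number of bytes.

```c
#include <assert.h>

/* Bytes needed to store uncompressed sampled sound:
   seconds x samples per second x bytes per sample x channels. */
unsigned long example_sound_bytes(unsigned long seconds,
                                  unsigned long rate_hz,
                                  unsigned int bits,
                                  unsigned int channels)
{
    assert(bits % 8 == 0);   /* whole bytes per sample only */
    return seconds * rate_hz * (bits / 8) * channels;
}
```

For one channel at 44100 Hz and 16 bits, five seconds comes to 441,000 bytes and a minute to 5,292,000 bytes; a second channel doubles both figures.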
SndSoundStruct: How Sound is Represented
The SndKit defines the SndSoundStruct structure to represent sound. This structure defines the soundfile formats and the sound pasteboard type. It's also used to describe sounds in Interface Builder. In addition, each instance of the SndKit's Snd class encapsulates a SndSoundStruct and provides methods to access and modify its attributes.
Basic sound operations, such as playing, recording, and cut-and-paste editing, are most easily performed by a Snd object. In many cases, the SndKit obviates the need for in-depth understanding of the SndSoundStruct architecture. For example, if you simply want to incorporate sound effects into an application, or to provide a simple graphic sound editor (such as the one in the Mail application), you needn't be aware of the details of the SndSoundStruct. However, if you want to closely examine or manipulate sound data, you should be familiar with this structure.
The SndSoundStruct contains a header, information that describes the attributes of a sound, followed by the data (usually samples) that represents the sound. The structure is defined (in SndKit/soundstruct.h) as:
    typedef struct {
        int magic;          /* magic number SND_MAGIC */
        int dataLocation;   /* offset or pointer to the data */
        int dataSize;       /* number of bytes of data */
        int dataFormat;     /* the data format code */
        int samplingRate;   /* the sampling rate */
        int channelCount;   /* the number of channels */
        char info[4];       /* optional text information */
    } SndSoundStruct;
SndSoundStruct Fields
magic is a magic number that's used to identify the structure as a SndSoundStruct. Keep in mind that the structure also defines the soundfile format, so the magic number is also used to identify a soundfile as containing a sound.
It was mentioned above that the SndSoundStruct contains a header followed by sound data. In reality, the structure only contains the header; the data itself is external to, although usually contiguous with, the structure. (Nonetheless, it's often useful to speak of the SndSoundStruct as the header and the data.) dataLocation is used to point to the data. Usually, this value is an offset (in bytes) from the beginning of the SndSoundStruct to the first byte of sound data. The data, in this case, immediately follows the structure, so dataLocation can also be thought of as the size of the structure's header. The other use of dataLocation, as an address that locates data that isn't contiguous with the structure, is described in the Section called Format Codes, below.
These fields describe the sound data. dataSize is its size in bytes (not including the size of the SndSoundStruct).
dataFormat is a code that identifies the type of sound. For sampled sounds, this is the quantization format. However, the data can also be instructions for synthesizing a sound on the DSP. The codes are listed and explained in the Section called Format Codes, below.
samplingRate is the sampling rate (if the data is samples). Three sampling rates, represented as constants, are supported by the hardware:
Table 2-1. Sample Rate Constants
Constant | Sampling Rate (Hz) | Description |
---|---|---|
SND_RATE_CODEC | 8012.821 | CODEC input |
SND_RATE_LOW | 22050.0 | low sampling rate output |
SND_RATE_HIGH | 44100.0 | high sampling rate output |
channelCount is the number of channels of sampled sound.
info is a NULL-terminated string that you can supply to provide a textual description of the sound. The size of the info field is set when the structure is created and thereafter can't be enlarged. It's at least four bytes long (even if it's unused).
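The way the header, the info string, and the sample data sit together in memory can be sketched as follows. The structure is repeated from the definition above so that the example compiles on its own. Everything prefixed with example_ or EXAMPLE_ is hypothetical and is not part of the SndKit API, and the magic value shown (the ASCII characters ".snd") is assumed to match SND_MAGIC. The sketch covers only the common contiguous case, where dataLocation is a byte offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int magic;          /* magic number SND_MAGIC */
    int dataLocation;   /* offset or pointer to the data */
    int dataSize;       /* number of bytes of data */
    int dataFormat;     /* the data format code */
    int samplingRate;   /* the sampling rate */
    int channelCount;   /* the number of channels */
    char info[4];       /* optional text information */
} SndSoundStruct;

/* ".snd" in ASCII; assumed here to be the value of SND_MAGIC. */
#define EXAMPLE_SND_MAGIC 0x2e736e64

/* Hypothetical helper: allocate the header, an info string, and room for
   dataSize bytes of sound data in one contiguous, zero-filled block. */
SndSoundStruct *example_alloc_sound(const char *info, int dataSize)
{
    assert(info != NULL && dataSize >= 0);
    size_t infoSize = strlen(info) + 1;   /* text plus its terminator */
    if (infoSize < 4)
        infoSize = 4;                     /* info is at least four bytes */
    size_t headerSize = offsetof(SndSoundStruct, info) + infoSize;

    SndSoundStruct *s = calloc(1, headerSize + (size_t)dataSize);
    if (s == NULL)
        return NULL;
    s->magic = EXAMPLE_SND_MAGIC;
    s->dataLocation = (int)headerSize;    /* data starts right after info */
    s->dataSize = dataSize;
    memcpy((char *)s + offsetof(SndSoundStruct, info), info, strlen(info) + 1);
    return s;
}

/* First byte of sound data in a contiguous sound. */
unsigned char *example_sound_data(SndSoundStruct *s)
{
    return (unsigned char *)s + s->dataLocation;
}

/* Bytes available for the info string, fixed when the sound is created. */
int example_info_capacity(const SndSoundStruct *s)
{
    return s->dataLocation - (int)offsetof(SndSoundStruct, info);
}
```

Because the data begins where the info string ends, the info field can't grow later without moving the data; this is why its size is fixed at creation. A caller releases the whole block with a single free().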
A sound's format is represented as a positive 32-bit integer. NeXT reserves the integers 0 through 255; you can define your own format and represent it with an integer greater than 255. Most of the formats defined by NeXT describe the amplitude quantization of sampled sound data:
Table 2-2. NeXT/Sun Sound File Format Codes
Code | Format |
---|---|
SND_FORMAT_MULAW_8 | 8-bit mu-law samples |
SND_FORMAT_LINEAR_8 | 8-bit linear samples |
SND_FORMAT_LINEAR_16 | 16-bit linear samples |
SND_FORMAT_EMPHASIZED | 16-bit linear with emphasis |
SND_FORMAT_COMPRESSED | 16-bit linear with compression |
SND_FORMAT_COMPRESSED_EMPHASIZED | A combination of the two above |
SND_FORMAT_LINEAR_24 | 24-bit linear samples |
SND_FORMAT_LINEAR_32 | 32-bit linear samples |
SND_FORMAT_FLOAT | floating-point samples |
SND_FORMAT_DOUBLE | double-precision float samples |
SND_FORMAT_DSP_DATA_8 | 8-bit fixed-point samples |
SND_FORMAT_DSP_DATA_16 | 16-bit fixed-point samples |
SND_FORMAT_DSP_DATA_24 | 24-bit fixed-point samples |
SND_FORMAT_DSP_DATA_32 | 32-bit fixed-point samples |
SND_FORMAT_DSP_CORE | DSP program |
SND_FORMAT_DSP_COMMANDS | Music Kit DSP commands |
SND_FORMAT_DISPLAY | non-audio display data |
SND_FORMAT_INDIRECT | fragmented sampled data |
SND_FORMAT_UNSPECIFIED | unspecified format |
All but the last five formats identify different sizes and types of sampled data. The others deserve special note:
SND_FORMAT_DSP_CORE format contains data that represents a loadable DSP core program. Sounds in this format are required by the SNDBootDSP() and SNDRunDSP() functions. You create a SND_FORMAT_DSP_CORE sound by reading a DSP load file (extension “.lod”) with the SNDReadDSPfile() function.
SND_FORMAT_DSP_COMMANDS is used to distinguish sounds that contain DSP commands created by the MusicKit. Sounds in this format can only be created through the MusicKit's MKOrchestra class, but can be played back through the SNDStartPlaying() function.
SND_FORMAT_DISPLAY format is used by the SndKit's SndView class. Such sounds can't be played.
SND_FORMAT_INDIRECT indicates data that has become fragmented, as described in a separate section, below.
SND_FORMAT_UNSPECIFIED is used for unrecognized formats.
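For the sampled formats in Table 2-2, the format code determines how many bytes each sample occupies, which in turn lets you work out how many sample frames (one sample for each channel) a sound holds. The sketch below uses a hypothetical enumeration in place of the real SND_FORMAT_ constants, whose numeric values aren't given here, and assumes 4-byte floats and 8-byte doubles.

```c
#include <assert.h>

/* Hypothetical stand-ins for some of the codes in Table 2-2. The numeric
   values are arbitrary and are not those of the SND_FORMAT_ constants. */
typedef enum {
    EX_FORMAT_MULAW_8,
    EX_FORMAT_LINEAR_8,
    EX_FORMAT_LINEAR_16,
    EX_FORMAT_LINEAR_24,
    EX_FORMAT_LINEAR_32,
    EX_FORMAT_FLOAT,
    EX_FORMAT_DOUBLE,
    EX_FORMAT_DSP_CORE      /* not sampled data */
} ExampleFormat;

/* Bytes per sample for a sampled format, or 0 if the format
   doesn't describe samples. */
int example_bytes_per_sample(ExampleFormat format)
{
    switch (format) {
    case EX_FORMAT_MULAW_8:
    case EX_FORMAT_LINEAR_8:  return 1;
    case EX_FORMAT_LINEAR_16: return 2;
    case EX_FORMAT_LINEAR_24: return 3;
    case EX_FORMAT_LINEAR_32:
    case EX_FORMAT_FLOAT:     return 4;
    case EX_FORMAT_DOUBLE:    return 8;
    default:                  return 0;
    }
}

/* Number of sample frames in dataSize bytes, or -1 if the format
   isn't a sampled format. */
long example_frame_count(long dataSize, ExampleFormat format, int channelCount)
{
    assert(channelCount > 0);
    int width = example_bytes_per_sample(format);
    if (width == 0)
        return -1;
    return dataSize / ((long)width * channelCount);
}
```

For instance, 176,400 bytes of two-channel 16-bit linear data hold 44,100 frames, which is one second of sound at the high sampling rate.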
Sound data is usually stored in a contiguous block of memory. However, when sampled sound data is edited (such that a portion of the sound is deleted or a portion inserted), the data may become discontiguous, or fragmented. Each fragment of data is given its own SndSoundStruct header; thus, each fragment becomes a separate SndSoundStruct structure. The addresses of these new structures are collected into a contiguous, NULL-terminated block; the dataLocation field of the original SndSoundStruct is set to the address of this block, while the original format, sampling rate, and channel count are copied into the new SndSoundStructs.
Fragmentation serves one purpose: It avoids the high cost of moving data when the sound is edited. Playback of a fragmented sound is transparent: you never need to know whether the sound is fragmented before playing it. However, playback of a heavily fragmented sound is less efficient than that of a contiguous sound. The SNDCompactSamples() C function can be used to compact fragmented sound data.
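The idea behind a fragmented sound, a NULL-terminated list of pieces that together make up the samples, can be illustrated with a simplified model. This is not the SndKit implementation and does not call SNDCompactSamples(): the ExampleFragment type and both functions are hypothetical, and each fragment is reduced to a size and a data pointer rather than a full SndSoundStruct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for one fragment of a sound. */
typedef struct {
    size_t dataSize;             /* bytes of sample data in this fragment */
    const unsigned char *data;   /* the fragment's sample data */
} ExampleFragment;

/* Total bytes of sample data across a NULL-terminated fragment list. */
size_t example_total_size(const ExampleFragment *const *fragments)
{
    assert(fragments != NULL);
    size_t total = 0;
    for (size_t i = 0; fragments[i] != NULL; i++)
        total += fragments[i]->dataSize;
    return total;
}

/* Copy every fragment, in order, into one newly allocated contiguous
   buffer. Stores the buffer's size in *outSize; the caller frees it. */
unsigned char *example_compact(const ExampleFragment *const *fragments,
                               size_t *outSize)
{
    size_t total = example_total_size(fragments);
    unsigned char *flat = malloc(total > 0 ? total : 1);
    if (flat == NULL)
        return NULL;
    size_t position = 0;
    for (size_t i = 0; fragments[i] != NULL; i++) {
        memcpy(flat + position, fragments[i]->data, fragments[i]->dataSize);
        position += fragments[i]->dataSize;
    }
    *outSize = total;
    return flat;
}
```

The sketch shows the trade-off: editing only has to rearrange the short list of fragment addresses, whereas compacting copies every byte of sample data once.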
Sampled sound data is naturally unfragmented. A sound that's freshly recorded or retrieved from a soundfile, the Mach-O segment, or the pasteboard won't be fragmented. Keep in mind that only sampled data can become fragmented.
A number of C functions are provided that let you record, manipulate, and play sounds. These C functions operate on SndSoundStructs and demand a familiarity with the structure. It's expected that most sound operations will be performed through the SndKit, where knowledge of the SndSoundStruct isn't necessary. Nonetheless, the C functions are provided for generality and to allow sound manipulation without the SndKit. The functions are fully described in SndKit Function References.