You don't need to know anything about sound or acoustics to use the MusicKit sound facilities for simple recording and playback. However, to access and manipulate sound data intelligently, you should be familiar with a few basic terms and concepts. This section presents a brief tutorial on the basic concepts of sound and its digital representation, followed by an in-depth examination of SndSoundStruct, the structure that's used by the SndKit software to represent sound.
Sound is a physical phenomenon produced by the vibration of matter. The matter can be almost anything: a violin string or a block of wood, for example. As the matter vibrates, pressure variations are created in the air surrounding it. This alternation of high and low pressure is propagated through the air in a wave-like motion. When the wave reaches our ears, we hear a sound.
Figure 2-1 graphs the oscillation of a pressure wave over time.
The pattern of the pressure oscillation is called a waveform. Notice that the waveform in Figure 2-1 repeats the same shape at regular intervals; the gray area shows one complete shape. This portion of the waveform is called a period. A waveform with a clearly defined period occurring at regular intervals is called a periodic waveform.
Naturally occurring sound waveforms are never as perfectly smooth or as uniformly periodic as the waveform shown in Figure 2-1. However, sounds that display a recognizable periodicity tend to be more musical than those that are nonperiodic. Here are some sources of nonperiodic sounds:
Unpitched percussion instruments
Consonants, such as “t,” “f,” and “s”
Coughs and sneezes
Rushing water
The frequency of a sound (the number of times the pressure rises and falls, or oscillates, in a second) is measured in hertz (Hz). A frequency of 100 Hz means 100 oscillations per second. A convenient abbreviation, kHz for kilohertz, is used to indicate thousands of oscillations per second: 1 kHz equals 1000 Hz.
The frequency range of normal human hearing extends from around 20 Hz up to about 20 kHz.
Frequency perception is logarithmic, not linear: to traverse the audio range from low to high by equal-sounding steps, each successive frequency increment must be greater than the last. For example, the frequency difference between the lowest note on a piano and the note an octave above it is about 27 Hz. Compare this to the piano's top octave, where the frequency difference is over 2000 Hz. Yet, subjectively, the two intervals sound the same.
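To make this concrete, here's a brief sketch (ordinary C, not a SndKit function) using the standard equal-tempered formula, in which each semitone multiplies frequency by 2^(1/12) and each octave doubles it:

```c
/* A sketch of logarithmic pitch: equal-sounding steps are frequency
 * ratios, not frequency differences. Key 49 is A4 (440 Hz) in
 * standard piano key numbering. */
#include <math.h>
#include <stdio.h>

static double pianoKeyFrequency(int key)
{
    return 440.0 * pow(2.0, (key - 49) / 12.0);
}

int main(void)
{
    /* Bottom octave: A0 (key 1) to A1 (key 13) spans about 27.5 Hz. */
    printf("A0 to A1: %6.1f Hz apart\n",
           pianoKeyFrequency(13) - pianoKeyFrequency(1));
    /* Top octave: C7 (key 76) to C8 (key 88) spans about 2093 Hz. */
    printf("C7 to C8: %6.1f Hz apart\n",
           pianoKeyFrequency(88) - pianoKeyFrequency(76));
    return 0;
}
```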
The smooth, continuous curve of a sound waveform isn't directly represented in a computer. A computer measures the amplitude of the waveform at regular time intervals to produce a series of numbers. Each of these measurements is called a sample. Figure 2-2 illustrates one period of a digitally sampled waveform.
Each vertical bar in Figure 2-2 represents a single sample. The height of a bar indicates the value of that sample.
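As a simple illustration (plain C, not a SndKit call), the loop below computes the samples an ideal converter would take from a 440 Hz sine wave, at an assumed sampling rate of 8000 Hz:

```c
/* A sketch of sampling: measure the waveform's amplitude at regular
 * time intervals. Each loop iteration produces one sample. */
#include <math.h>
#include <stdio.h>

#define SAMPLING_RATE 8000.0   /* samples per second (assumed) */
#define FREQUENCY      440.0   /* waveform frequency in Hz */

int main(void)
{
    int i;
    for (i = 0; i < 20; i++) {
        double t = i / SAMPLING_RATE;                  /* time of this sample */
        double sample = sin(2.0 * M_PI * FREQUENCY * t);
        printf("sample %2d: % .4f\n", i, sample);
    }
    return 0;
}
```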
The mechanism that converts an audio signal into digital samples is called an analog-to-digital converter, or ADC. To convert a digital signal back to analog, you need a digital-to-analog converter, or DAC (pronounced “dack”).
The rate at which a waveform is sampled is called the sampling rate. Like frequencies, sampling rates are measured in hertz. The CD standard sampling rate of 44100 Hz means that the waveform is sampled 44100 times per second. This may seem a bit excessive, considering that we can't hear frequencies above 20 kHz; however, the highest frequency that a digitally sampled signal can represent is equal to half the sampling rate (a limit known as the Nyquist frequency). So a sampling rate of 44100 Hz can represent frequencies only up to 22050 Hz, a boundary much closer to that of human hearing.
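Frequencies above that limit don't disappear; they fold back into the representable band, an effect called aliasing. The arithmetic below is a standard identity, not a SndKit function:

```c
/* A sketch of aliasing: a sampled tone above half the sampling rate
 * is indistinguishable from a lower tone folded back into range. */
#include <math.h>
#include <stdio.h>

static double aliasedFrequency(double freq, double rate)
{
    return fabs(freq - rate * round(freq / rate));
}

int main(void)
{
    /* A 30000 Hz tone sampled at 44100 Hz plays back as 14100 Hz. */
    printf("%.0f Hz\n", aliasedFrequency(30000.0, 44100.0));
    return 0;
}
```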
Just as a waveform is sampled at discrete times, the value of the sample is also discrete. The quantization of a sample value depends on the number of bits used in measuring the height of the waveform. An 8-bit quantization yields 256 possible values; 16-bit CD-quality quantization yields 65,536 values. As an extreme example, Figure 2-3 shows the waveform used in the previous example sampled with a 3-bit quantization. This yields only eight possible values: .75, .5, .25, 0, -.25, -.5, -.75, and -1.
As you can see, the shape of the waveform becomes less discernible with a coarser quantization. The coarser the quantization, the “buzzier” the sound.
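Here's a small sketch (plain C, not part of the SndKit) of what quantization does to a sample value in the range -1 to 1:

```c
/* A sketch of quantization: n bits allow 2^n distinct sample values,
 * half at or above zero and half below. With n = 3 that's the eight
 * levels of Figure 2-3; with n = 16 it's 65,536. */
#include <math.h>
#include <stdio.h>

static double quantize(double sample, int bits)
{
    double levels = pow(2.0, bits - 1);   /* levels on each side of zero */
    return floor(sample * levels) / levels;
}

int main(void)
{
    printf("0.6 quantized to  3 bits: %g\n", quantize(0.6, 3));   /* 0.5  */
    printf("0.6 quantized to 16 bits: %g\n", quantize(0.6, 16));  /* ~0.6 */
    return 0;
}
```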
An increased sampling rate and a finer quantization improve the fidelity of a digitally sampled waveform; however, the sound also takes up more storage space. Five seconds of sound sampled at 44.1 kHz with a 16-bit quantization uses more than 400,000 bytes of storage; a minute consumes more than five megabytes. A number of data compression schemes have been devised to decrease storage at the cost of some fidelity.
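The storage arithmetic is simple enough to check by hand: bytes = seconds x sampling rate x bytes per sample x channels. The figures above assume one channel:

```c
/* A sketch of the storage arithmetic for uncompressed samples. */
#include <stdio.h>

int main(void)
{
    double seconds      = 5.0;
    double samplingRate = 44100.0;   /* CD-quality rate */
    int bytesPerSample  = 2;         /* 16-bit quantization */
    int channelCount    = 1;         /* monophonic (assumed) */

    double bytes = seconds * samplingRate * bytesPerSample * channelCount;
    printf("5 seconds: %.0f bytes\n", bytes);          /* 441,000   */
    printf("1 minute:  %.0f bytes\n", bytes * 12.0);   /* 5,292,000 */
    return 0;
}
```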
The SndKit defines the SndSoundStruct structure to represent sound. This structure defines the soundfile formats and the sound pasteboard type. It's also used to describe sounds in Interface Builder. In addition, each instance of the SndKit's Snd class encapsulates a SndSoundStruct and provides methods to access and modify its attributes.
Basic sound operations, such as playing, recording, and cut-and-paste editing, are most easily performed by a Snd object. In many cases, the SndKit obviates the need for in-depth understanding of the SndSoundStruct architecture. For example, if you simply want to incorporate sound effects into an application, or to provide a simple graphic sound editor (such as the one in the Mail application), you needn't be aware of the details of the SndSoundStruct. However, if you want to closely examine or manipulate sound data you should be familiar with this structure.
The SndSoundStruct contains a header, information that describes the attributes of a sound, followed by the data (usually samples) that represents the sound. The structure is defined (in SndKit/soundstruct.h) as:
```c
typedef struct {
    int magic;          /* magic number SND_MAGIC */
    int dataLocation;   /* offset or pointer to the data */
    int dataSize;       /* number of bytes of data */
    int dataFormat;     /* the data format code */
    int samplingRate;   /* the sampling rate */
    int channelCount;   /* the number of channels */
    char info[4];       /* optional text information */
} SndSoundStruct;
```
magic is a magic number that's used to identify the structure as a SndSoundStruct (SND_MAGIC is defined as 0x2e736e64, the ASCII codes for “.snd”). Keep in mind that the structure also defines the soundfile format, so the magic number is also used to identify a file as a soundfile.
It was mentioned above that the SndSoundStruct contains a header followed by sound data. In reality, the structure only contains the header; the data itself is external to, although usually contiguous with, the structure. (Nonetheless, it's often useful to speak of the SndSoundStruct as the header and the data.) dataLocation is used to point to the data. Usually, this value is an offset (in bytes) from the beginning of the SndSoundStruct to the first byte of sound data. The data, in this case, immediately follows the structure, so dataLocation can also be thought of as the size of the structure's header. The other use of dataLocation, as an address that locates data that isn't contiguous with the structure, is described in the Section called Format Codes, below.
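As a brief sketch of how these two fields work together (assuming the declarations in SndKit/soundstruct.h), the function below validates the magic number and returns the address of the first byte of data in a contiguous sound:

```c
/* A sketch, assuming SndKit/soundstruct.h: check the magic number and
 * locate the sound data of a contiguous (unfragmented) sound. */
#include <SndKit/soundstruct.h>
#include <stdio.h>

char *soundData(SndSoundStruct *sound)
{
    if (sound->magic != SND_MAGIC) {
        fprintf(stderr, "not a SndSoundStruct\n");
        return NULL;
    }
    /* dataLocation is a byte offset from the start of the structure;
     * since the data usually follows the header immediately, it also
     * gives the size of the header. */
    return (char *)sound + sound->dataLocation;
}
```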
The next four fields describe the sound data itself.
dataSize is its size in bytes (not including the size of the SndSoundStruct).
dataFormat is a code that identifies the type of sound. For sampled sounds, this is the quantization format. However, the data can also be instructions for synthesizing a sound on the DSP. The codes are listed and explained in the Section called Format Codes, below.
samplingRate is the sampling rate (if the data is samples). Three sampling rates, represented as integer constants, are supported by the hardware:
Table 2-1. Sample Rate Constants
Constant | Sampling Rate (Hz) | Description |
---|---|---|
SND_RATE_CODEC | 8012.821 | CODEC input |
SND_RATE_LOW | 22050.0 | low sampling rate output |
SND_RATE_HIGH | 44100.0 | high sampling rate output |
channelCount is the number of channels of sampled sound.
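Putting the fields together, here's a sketch (assuming SndKit/soundstruct.h) of a header describing one second of 16-bit linear, monophonic sound at the low output rate:

```c
/* A sketch, assuming SndKit/soundstruct.h: fill in a header for one
 * second of 16-bit linear, monophonic sound at SND_RATE_LOW. The
 * samples themselves would follow the structure in memory. */
#include <SndKit/soundstruct.h>
#include <string.h>

void initHeader(SndSoundStruct *sound)
{
    sound->magic        = SND_MAGIC;
    sound->dataLocation = sizeof(SndSoundStruct);  /* data follows header */
    sound->dataSize     = 22050 * sizeof(short);   /* one second of samples */
    sound->dataFormat   = SND_FORMAT_LINEAR_16;
    sound->samplingRate = SND_RATE_LOW;            /* 22050 Hz */
    sound->channelCount = 1;
    memset(sound->info, 0, sizeof(sound->info));   /* no text information */
}
```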
A sound's format is represented as a positive 32-bit integer. NeXT reserves the integers 0 through 255; you can define your own format and represent it with an integer greater than 255. Most of the formats defined by NeXT describe the amplitude quantization of sampled sound data:
Table 2-2. NeXT/Sun Sound File Format Codes
Code | Format |
---|---|
SND_FORMAT_MULAW_8 | 8-bit mu-law samples |
SND_FORMAT_LINEAR_8 | 8-bit linear samples |
SND_FORMAT_LINEAR_16 | 16-bit linear samples |
SND_FORMAT_EMPHASIZED | 16-bit linear with emphasis |
SND_FORMAT_COMPRESSED | 16-bit linear with compression |
SND_FORMAT_COMPRESSED_EMPHASIZED | A combination of the two above |
SND_FORMAT_LINEAR_24 | 24-bit linear samples |
SND_FORMAT_LINEAR_32 | 32-bit linear samples |
SND_FORMAT_FLOAT | floating-point samples |
SND_FORMAT_DOUBLE | double-precision float samples |
SND_FORMAT_DSP_DATA_8 | 8-bit fixed-point samples |
SND_FORMAT_DSP_DATA_16 | 16-bit fixed-point samples |
SND_FORMAT_DSP_DATA_24 | 24-bit fixed-point samples |
SND_FORMAT_DSP_DATA_32 | 32-bit fixed-point samples |
SND_FORMAT_DSP_CORE | DSP program |
SND_FORMAT_DSP_COMMANDS | Music Kit DSP commands |
SND_FORMAT_DISPLAY | non-audio display data |
SND_FORMAT_INDIRECT | fragmented sampled data |
SND_FORMAT_UNSPECIFIED | unspecified format |
All but the last five formats identify different sizes and types of sampled data. The others deserve special note:
SND_FORMAT_DSP_CORE format contains data that represents a loadable DSP core program. Sounds in this format are required by the SNDBootDSP() and SNDRunDSP() functions. You create a SND_FORMAT_DSP_CORE sound by reading a DSP load file (extension “.lod”) with the SNDReadDSPfile() function.
SND_FORMAT_DSP_COMMANDS is used to distinguish sounds that contain DSP commands created by the MusicKit. Sounds in this format can only be created through the MusicKit's MKOrchestra class, but can be played back through the SNDStartPlaying() function.
SND_FORMAT_DISPLAY format is used by the SndKit's SndView class. Such sounds can't be played.
SND_FORMAT_INDIRECT indicates data that has become fragmented, as described in a separate section, below.
SND_FORMAT_UNSPECIFIED is used for unrecognized formats.
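When you need to walk through raw sample data, the format code tells you the width of each sample. Here's a sketch using the constants above (the compressed, synthesis, and display formats have no fixed per-sample width and are omitted):

```c
/* A sketch: map the sampled-data format codes to bytes per sample. */
#include <SndKit/soundstruct.h>

int bytesPerSample(int dataFormat)
{
    switch (dataFormat) {
    case SND_FORMAT_MULAW_8:
    case SND_FORMAT_LINEAR_8:
    case SND_FORMAT_DSP_DATA_8:
        return 1;
    case SND_FORMAT_LINEAR_16:
    case SND_FORMAT_EMPHASIZED:
    case SND_FORMAT_DSP_DATA_16:
        return 2;
    case SND_FORMAT_LINEAR_24:
    case SND_FORMAT_DSP_DATA_24:
        return 3;
    case SND_FORMAT_LINEAR_32:
    case SND_FORMAT_DSP_DATA_32:
    case SND_FORMAT_FLOAT:
        return 4;
    case SND_FORMAT_DOUBLE:
        return 8;
    default:
        return 0;   /* compressed, synthesized, display, or unknown */
    }
}
```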
Sound data is usually stored in a contiguous block of memory. However, when sampled sound data is edited (such that a portion of the sound is deleted or a portion inserted), the data may become discontiguous, or fragmented. Each fragment of data is given its own SndSoundStruct header; thus, each fragment becomes a separate SndSoundStruct structure. The addresses of these new structures are collected into a contiguous, NULL-terminated block; the dataLocation field of the original SndSoundStruct is set to the address of this block, while the original format, sampling rate, and channel count are copied into the new SndSoundStructs.
Fragmentation serves one purpose: It avoids the high cost of moving data when the sound is edited. Playback of a fragmented sound is transparent; you never need to know whether the sound is fragmented before playing it. However, playback of a heavily fragmented sound is less efficient than that of a contiguous sound. The SNDCompactSamples() C function can be used to compact fragmented sound data.
Sampled sound data is naturally unfragmented. A sound that's freshly recorded or retrieved from a soundfile, the Mach-O segment, or the pasteboard won't be fragmented. Keep in mind that only sampled data can become fragmented.
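As a sketch of the layout just described (assuming SndKit/soundstruct.h), the function below walks the NULL-terminated block of fragment pointers and totals the data size. Note that in this one case dataLocation holds an address rather than an offset:

```c
/* A sketch, assuming SndKit/soundstruct.h: total the data bytes in a
 * possibly fragmented sound. For SND_FORMAT_INDIRECT, dataLocation is
 * the address of a NULL-terminated block of SndSoundStruct pointers. */
#include <SndKit/soundstruct.h>

int totalDataBytes(SndSoundStruct *sound)
{
    SndSoundStruct **fragment;
    int total = 0;

    if (sound->dataFormat != SND_FORMAT_INDIRECT)
        return sound->dataSize;             /* contiguous sound */

    fragment = (SndSoundStruct **)sound->dataLocation;
    while (*fragment != NULL)
        total += (*fragment++)->dataSize;
    return total;
}
```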
A number of C functions are provided that let you record, manipulate, and play sounds. These C functions operate on SndSoundStructs and demand a familiarity with the structure. It's expected that most sound operations will be performed through the SndKit, where knowledge of the SndSoundStruct isn't necessary. Nonetheless, the C functions are provided for generality and to allow sound manipulation without the SndKit. The functions are fully described in SndKit Function References.