Schroeter, J., Mehta, S.K., Carter, G.C. "Acoustic Signal Processing," The Electrical Engineering Handbook, Ed. Richard C. Dorf, Boca Raton: CRC Press LLC, 2000.

19 Acoustic Signal Processing

Juergen Schroeter, Acoustics Research Dept., AT&T Bell Laboratories
Sanjay K. Mehta, NUWC Detachment
G. Clifford Carter, NUWC Detachment

19.1 Digital Signal Processing in Audio and Electroacoustics
Steerable Microphone Arrays • Digital Hearing Aids • Spatial Processing • Audio Coding • Echo Cancellation • Active Noise and Sound Control

19.2 Underwater Acoustical Signal Processing
What Is Underwater Acoustical Signal Processing? • Technical Overview • Underwater Propagation • Processing Functions • Advanced Signal Processing • Application

19.1 Digital Signal Processing in Audio and Electroacoustics

Juergen Schroeter

In this section we will focus on advances in algorithms and technologies in digital signal processing (DSP) that have already had, or most likely will soon have, a major impact on audio and electroacoustics (A&E). Because A&E embraces a wide range of topics, it is impossible for us to go into any depth on any one of them here. Instead, this section will give a compressed overview of the topics the author judges to be most important. In the following, we will look into steerable microphone arrays, digital hearing aids, spatial processing, audio coding, echo cancellation, and active noise and sound control. We will not cover basic techniques in digital recording [Pohlmann, 1989] and computer music [Moore, 1990].

Steerable Microphone Arrays

Steerable microphone arrays have controllable directional characteristics. One important application is in teleconferencing. Here, sound pickup can be highly degraded by reverberation and room noise. One solution to this problem is to utilize highly directional microphones. Instead of pointing such a microphone manually at a desired talker, steerable microphone arrays, combined with a suitable speech detection algorithm, can be used for reliable automatic tracking of speakers as they move around in a noisy room or auditorium. Figure 19.1 depicts the simplest kind of steerable array, using N microphones that are uniformly spaced with distance d along the linear x-axis. It can be shown that the response of this system to a plane wave impinging at an angle θ is

    H(j\omega) = \sum_{n=0}^{N-1} a_n \, e^{-j(\omega/c)\, n d \cos\theta}                (19.1)

Here, j = \sqrt{-1}, ω is the radian frequency, and c is the speed of sound. Equation (19.1) is a spatial filter with coefficients a_n and the delay operator z^{-1} = exp(-j(ω/c) d cos θ). Therefore, we can apply finite impulse response (FIR) filter theory. For example, we could taper the weights a_n to suppress sidelobes of the array. We also have to guard against spatial aliasing, that is, grating lobes that make the directional characteristic of the array ambiguous. The array is steered to an angle θ_0 by introducing appropriate delays into the N microphone lines. In Eq. (19.1), we can incorporate these delays by letting

    a_n = e^{-j\omega\tau_0}\, e^{+j(\omega/c)\, n d \cos\theta_0}                (19.2)

Here τ_0 is an overall delay, equal to or larger than (Nd/c) cos θ_0, that ensures causality, while the second term in Eq. (19.2) cancels the corresponding term in Eq. (19.1) at θ = θ_0. Due to the axial symmetry of the one-dimensional (linear, 1-D) array, the directivity of the array is a figure of revolution around the x-axis. Therefore, in case we want the array to point to a single direction in space, we need a 2-D array.
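To make Eqs. (19.1) and (19.2) concrete, the following short numpy sketch evaluates the normalized magnitude response of a uniformly weighted line array steered to an angle θ_0; the element count, spacing, and frequency used here are illustrative values only, not taken from the text.

```python
import numpy as np

def line_array_response(theta, theta0, f, N=5, d=0.2, c=343.0):
    """|H(jw)|/N for an N-element line array: Eq. (19.1) with the steering
    weights a_n of Eq. (19.2) (the overall delay tau_0 is omitted, since it
    only contributes a common phase factor)."""
    w = 2.0 * np.pi * f                                   # radian frequency
    n = np.arange(N)
    a = np.exp(+1j * (w / c) * n * d * np.cos(theta0))    # Eq. (19.2)
    H = np.sum(a * np.exp(-1j * (w / c) * n * d * np.cos(theta)))  # Eq. (19.1)
    return np.abs(H) / N

# Illustrative use: beam steered to broadside (90 degrees) at 1 kHz
for deg in (30, 60, 90, 120):
    print(deg, round(line_array_response(np.radians(deg), np.radians(90), 1000.0), 3))
```

Tapering the weights a_n (e.g., with a window function) instead of using uniform weights would lower the sidelobes at the cost of a wider main lobe, as noted above.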
Since most of the energy of typical room noise and the highest level of reverberation in a room are at low frequencies, one would like to use arrays that have their highest directivity (i.e., narrowest beamwidth) at low frequencies. Unfortunately, this need collides with the physics of arrays: the smaller the array relative to the wavelength, the wider the beam. (Again, the corresponding notion in filter theory is that systems with shorter impulse responses have wider bandwidth.) One solution to this problem is to superimpose different-size arrays and filter each output by the appropriate bandpass filter, similar to a crossover network used in two- or three-way loudspeaker designs. Such a superposition of three five-element arrays is shown in Fig. 19.2. Note that we only need nine microphones in this example, instead of 5 × 3 = 15.

Another interesting application is the use of an array to mitigate discrete noise sources in a room. For this, we need to attach an FIR filter to each of the microphone signal outputs. For any given frequency, one can show that N microphones can produce N – 1 nulls in the directional characteristic of the array. Similarly, attaching an M-point FIR filter to each of the microphones, we can get these zeros at M – 1 frequencies. The weights for these filters have to be adapted, usually under the constraint that the transfer function (frequency characteristic) of the array for the desired source is optimally flat. In practical tests, systems of this kind work nicely in (almost) anechoic environments. Their performance degrades, however, with increasing reverberation.

More information on microphone arrays can be found in Flanagan et al. [1991]; in particular, they describe how to make arrays adapt to changing talker positions in a room by constantly scanning the room with a moving search beam and by switching the main beam accordingly. Current research issues are, among others, 3-D arrays and how to take advantage of low-order wall reflections.

FIGURE 19.1 A linear array of N microphones (here, N = 5; τ = d/c cos θ).

FIGURE 19.2 Three superimposed linear arrays depicted by large, midsize, and small circles. The largest array covers the low frequencies, the midsize array covers the midrange frequencies, and the smallest covers the high frequencies.

Digital Hearing Aids

Commonly used hearing aids attempt to compensate for sensorineural (cochlear) hearing loss by delivering an amplified acoustic signal to the external ear canal. As will be pointed out below, the most important problem is how to find the best aid for a given patient.

Historically, technology has been the limiting factor in hearing aids. Early on, carbon hearing aids provided a limited gain and a narrow, peaky frequency response. Nowadays, hearing aids have a broader bandwidth and a flatter frequency response. Consequently, more people can benefit from the improved technology. With the advent of digital technology, the promise is that even more people will be able to do so. Unfortunately, as will be pointed out below, this promise has not been fulfilled yet.

We distinguish between analog, digitally controlled analog, and digital hearing aids. Analog hearing aids contain only a (low-power) pre-amp, filter(s), (optional) automatic gain control (AGC) or compressor, power amp, and output limiter.
Digitally controlled aids have certain additional components: one kind adds a digital controller to monitor and adjust the analog components of the aid. Another kind contains switched-capacitor circuits that represent sampled signals in analog form, in effect allowing simple discrete-time processing (e.g., filtering). Aids with switched-capacitor circuits have a lower power consumption than digital aids. Digital aids—none are yet commercially available—contain A/D and D/A converters and at least one programmable digital signal processing (DSP) chip, allowing for the use of sophisticated DSP algorithms, (small) microphone arrays, speech enhancement in noise, etc. Experts disagree, however, as to the usefulness of these techniques. To date, the most successful approach seems to be to ensure that all parts of the signal get amplified so that they are clearly audible but not too loud, and to "let the brain sort out signal and noise."

Hearing aids pose a tremendous challenge for the DSP engineer, as well as for the audiologist and acoustician. Due to the continuing progress in chip technology, the physical size of a digital aid should no longer be a serious problem in the near future; however, power consumption will still be a problem for quite some time. Besides the obvious necessity of avoiding howling (acoustic feedback), for example, by employing sophisticated models of the electroacoustic transducers, acoustic leaks, and ear canal to control the aid accordingly, there is a much more fundamental problem: since DSP allows complex schemes of splitting, filtering, compressing, and (re-)combining the signal, hearing aid performance is no longer limited by bottlenecks in technology. It is still limited, however, by the lack of basic knowledge about how to map an arbitrary input signal (i.e., speech from a desired speaker) onto the reduced capabilities of the auditory system of the targeted wearer of the aid. Hence, the selection and fitting of an appropriate aid becomes the most important issue. This serious problem is illustrated in Fig. 19.3.

It is important to note that for speech presented at a constant level, a linear (no compression) hearing aid can be tuned to do as well as a hearing aid with compression. However, if parameters like signal and background noise levels change dynamically, compression aids, in particular those with two bands or more, should have an advantage. While a patient usually has no problem telling whether setting A or B is "clearer," adjusting more than just 2–3 (usually interdependent) parameters is very time consuming. For a multiparameter aid, an efficient fitting procedure that maximizes a certain objective is needed. Possible objectives are, for example, intelligibility maximization or loudness restoration. The latter objective is assumed in the following. It is known that an impaired ear has a reduced dynamic range. Therefore, the procedure for fitting a patient with a hearing aid could estimate the so-called loudness-growth function (LGF) that relates the sound pressure level of a specific (band-limited) sound to its loudness.

FIGURE 19.3 Peak third-octave band levels of normal to loud speech (hatched) and typical levels/dominant frequencies of speech sounds (identifiers). Both can be compared to the third-octave threshold of normal-hearing people (solid line), thresholds for a mildly hearing-impaired person (A), for a severely hearing-impaired person (B), and for a profoundly hearing-impaired person (C).
For example, for person (A), sibilants and some weak consonants in a normal conversation cannot be perceived. (Source: H. Levitt, "Speech discrimination ability in the hearing impaired: spectrum considerations," in The Vanderbilt Hearing-Aid Report: State of the Art-Research Needs, G.A. Studebaker and F.H. Bess (Eds.), Monographs in Contemporary Audiology, Upper Darby, Pa., 1982, p. 34. With permission.)

An efficient way of measuring the LGF is described by Allen et al. [1990]. Once the LGF of an impaired ear is known, a multiband hearing aid can implement the necessary compression for each band [Villchur, 1973]. Note, however, that this assumes that interactions between the bands can be neglected (the problem of summation of partial loudnesses). This might not be valid for aids with a large number of bands. Other open questions include the choice of widths and filter shapes of the bands, and optimization of dynamic aspects of the compression (e.g., time constants). For aids with just two bands, the crossover frequency is a crucial parameter that is difficult to optimize.

Spatial Processing

In spatial processing, audio signals are modified to give them new spatial attributes, such as, for example, the perception of having been recorded in a specific concert hall. The auditory system—using only the two ears as inputs—is capable of perceiving the direction and distance of a sound source with a high degree of accuracy, by exploiting binaural and monaural spectral cues. Wave propagation in the ear canal is essentially one-dimensional. Hence, the 3-D spatial information is coded by sound diffraction into spectral information before the sound enters the ear canal. The sound diffraction is caused by the head/torso (interaural level differences and delays on the order of 20 dB and 600 µs, respectively) and by the two pinnae (auriculae); see, for example, Shaw [1980]. Binaural techniques like the one discussed below can be used for evaluating room and concert-hall acoustics (optionally in reduced-scale model rooms using a miniature dummy head), for noise assessment (e.g., in cars), and for "Kunstkopfstereophonie" (dummy-head stereophony). In addition, there are techniques for loudspeaker reproduction (like "Q-Sound") that try to extend the range in horizontal angle of traditional stereo speakers by using interaural cross cancellation. Largely an open question is how to reproduce spatial information for large audiences, for example, in movie theaters.

Figure 19.4 illustrates the technique for filtering a single-channel source using measured head-related transfer functions, in effect creating a virtual sound source in a given direction of the listener's auditory space (assuming plane waves, i.e., infinite source distance). On the left in this figure, the measurement of head-related transfer functions is shown. Focusing on the left ear for a moment (subscript l), we need to estimate the so-called free-field transfer function (subscript ff) for given angles of incidence in the horizontal plane (azimuth φ) and vertical plane (elevation δ):

    H_{ff,l}(j\omega, \varphi, \delta) = P_{probe,l}(j\omega, \varphi, \delta) \, / \, P_{ref}(j\omega)                (19.3)

where P_probe,l is the Fourier transform of the sound pressure measured in the subject's left ear, and P_ref is the Fourier transform of the pressure measured at a suitable reference point in the free field without the subject being present (e.g., at the midpoint between the two ears). (Note that P_ref is independent of the direction of sound incidence since we assume an anechoic environment.)
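As a rough sketch of how Eq. (19.3) can be evaluated in practice, the following lines form the ratio of the spectra of two measured impulse responses. The variable names and the fabricated test data are purely illustrative; an actual measurement would use probe-microphone and reference recordings for every azimuth and elevation of interest.

```python
import numpy as np

def free_field_transfer_function(p_probe, p_ref, nfft=1024):
    """Eq. (19.3): H_ff(jw) = P_probe(jw) / P_ref(jw) for one ear and one
    direction (phi, delta); inputs are measured impulse responses."""
    P_probe = np.fft.rfft(p_probe, nfft)
    P_ref = np.fft.rfft(p_ref, nfft)
    return P_probe / (P_ref + 1e-12)   # small constant guards near-empty bins

# Fabricated example data: the "probe" response is a delayed, attenuated
# copy of the "reference" response (a real H_ff would show head/pinna cues).
p_ref = np.zeros(256); p_ref[10] = 1.0
p_probe = np.zeros(256); p_probe[14] = 0.8; p_probe[15] = -0.3
H_ff_left = free_field_transfer_function(p_probe, p_ref)
print(H_ff_left.shape, np.abs(H_ff_left[:4]))
```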
The middle of Fig. 19.4 depicts the convolution of any "dry" (e.g., mono, low-reverberation) source with the stored H_ff,l(jω, φ, δ)s and corresponding H_ff,r(jω, φ, δ)s. On the right side in the figure, the resulting binaural signals are reproduced via equalized headphones. The equalization ensures that a sound source with a flat spectrum (e.g., white noise) does not suffer any perceivable coloration for any direction (φ, δ).

FIGURE 19.4 Measuring and using transfer functions of the external ear for binaural mixing (FIR = finite impulse response). (Source: E.M. Wenzel, Localization in virtual acoustic displays, Presence, vol. 1, p. 91, 1992. With permission.)

Implemented in a real-time "binaural mixing console," the above scheme can be used to create "virtual" sound sources. When combined with an appropriate scheme for interpolating head-related transfer functions, moving sound sources can be mimicked realistically. Furthermore, it is possible to superimpose early reflections of a hypothetical recording room, each filtered by the appropriate head-related transfer function. Such inclusion of a room in the simulation makes the spatial reproduction more robust against individual differences between "recording" and "listening" ears, in particular if the listener's head movements are fed back to the binaural mixing console. (Head movements are useful for disambiguating spatial cues.) Finally, such a system can be used to create "virtual acoustic displays," for example, for pilots and astronauts [Wenzel, 1992]. Other research issues are, for example, the required accuracy of the head-related transfer functions, intersubject variability, and psychoacoustic aspects of room simulations.

Audio Coding

Audio coding is concerned with compressing (reducing the bit rate of) audio signals. The uncompressed digital audio of compact disks (CDs) is recorded at a rate of 705.6 kbit/s for each of the two channels of a stereo signal (i.e., 16 bit/sample, 44.1-kHz sampling rate; 1411.2 kbit/s total). This is too high a bit rate for digital audio broadcasting (DAB) or for transmission via end-to-end digital telephone connections (integrated services digital network, ISDN). Current audio coding algorithms provide at least "better than FM" quality at a combined rate of 128 kbit/s for the two stereo channels (2 ISDN B channels!), "transparent coding" at rates of 96 to 128 kbit/s per mono channel, and "studio quality" at rates between 128 and 196 kbit/s per mono channel. (While a large number of people will be able to detect distortions in the first class of coders, even so-called "golden ears" should not be able to detect any differences between original and coded versions of known "critical" test signals; the highest quality category adds a safety margin for editing, filtering, and/or recoding.)

To compress audio signals by a factor as large as eleven while maintaining a quality exceeding that of a local FM radio station requires sophisticated algorithms for reducing the irrelevance and redundancy in a given signal. A large portion (but usually less than 50%) of the bit-rate reduction in an audio coder is due to the first of the two mechanisms. Eliminating irrelevant portions of an input signal is done with the help of psychoacoustic models.
It is obvious that a coder can eliminate portions of the input signal that—when played back—will be below the threshold of hearing. More complicated is the case when we have multiple signal components that tend to cover each other, that is, when weaker components cannot be heard due to the presence of stronger components. This effect is called masking. To let a coder take advantage of masking effects, we need to use good masking models. Masking can be modeled in the time domain, where we distinguish so-called simultaneous masking (masker and maskee occur at the same time), forward masking (masker occurs before maskee), and backward masking (masker occurs after maskee). Simultaneous masking usually is modeled in the frequency domain. This latter case is illustrated in Fig. 19.5.

FIGURE 19.5 Masked threshold in the frequency domain for a hypothetical input signal. In the vicinity of high-level spectral components, signal components below the current masked threshold cannot be heard.

Audio coders that employ common frequency-domain models of masking start out by splitting and subsampling the input signal into different frequency bands (using filterbanks such as subband filterbanks or time-frequency transforms). Then, the masking threshold (i.e., predicted masked threshold) is determined, followed by quantization of the spectral information and (optional) noiseless compression using variable-length coding. The encoding process is completed by multiplexing the spectral information with side information, adding error protection, etc.

The first stage, the filter bank, has the following requirements. First, decomposing and then simply reconstructing the signal should not lead to distortions ("perfect reconstruction filterbank"). This results in the advantage that all distortions are due to the quantization of the spectral data. Since each quantizer works on band-limited data, the distortion (also band-limited due to refiltering) is controllable by using the masking models described above. Second, the bandwidths of the filters should be narrow to provide sufficient coding gain. On the other hand, the length of the impulse responses of the filters should be short enough (time resolution of the coder!) to avoid so-called pre-echoes, that is, backward spreading of distortion components that result from sudden onsets (e.g., castanets). These two contradictory requirements, obviously, have to be worked out by a compromise. "Critical band" filters have the shortest impulse responses needed for coding of transient signals. On the other hand, the optimum frequency resolution (i.e., the one resulting in the highest coding gain) for a typical signal can be achieved by using, for example, a 2048-point modified discrete cosine transform (MDCT).

In the second stage, the (time-varying) masking threshold as determined by the psychoacoustic model usually controls an iterative analysis-by-synthesis quantization and coding loop. It can incorporate rules for masking of tones by noise and of noise by tones, though little is known in the psychoacoustic literature for more general signals. Quantizer step sizes can be set and bits can be allocated according to the known spectral estimate, by block companding with transmission of the scale factors as side information, or iteratively in a variable-length coding loop (Huffman coding). In the latter case, one can low-pass filter the signal if the total required bit rate is too high.

The decoder has to invert the processing steps of the encoder, that is, do the error correction, perform Huffman decoding, and reconstruct the filter signals or the inverse-transformed time-domain signal.
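As a minimal sketch of the transform stage discussed above, the following lines implement a windowed MDCT analysis/synthesis pair with 50% overlap. With no quantization in between, the overlap-added output reproduces the interior of the input essentially to machine precision; in a coder, the coefficients X would be quantized under control of the masking threshold before the inverse transform. The block length and test signal below are arbitrary.

```python
import numpy as np

def mdct_basis(N):
    """2N-point MDCT basis (N coefficients) and a sine window, which
    satisfies the w[n]**2 + w[n+N]**2 = 1 condition needed for
    time-domain aliasing cancellation."""
    n, k = np.arange(2 * N), np.arange(N)
    w = np.sin(np.pi / (2 * N) * (n + 0.5))
    C = np.cos(np.pi / N * np.outer(n + 0.5 + N / 2, k + 0.5))
    return w, C

N = 8
w, C = mdct_basis(N)
x = np.random.randn(6 * N)
y = np.zeros_like(x)
for start in range(0, len(x) - 2 * N + 1, N):           # 50%-overlapped blocks
    block = x[start:start + 2 * N]
    X = (block * w) @ C                                  # MDCT (quantization would go here)
    y[start:start + 2 * N] += (2.0 / N) * w * (C @ X)    # IMDCT plus overlap-add
print(np.max(np.abs(x[N:-N] - y[N:-N])))                 # near machine precision away from the edges
```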
Since the decoder is significantly less complex than the encoder, it is usually implemented on a single DSP chip, while the encoder uses several DSP chips. Current research topics encompass tonality measures and time-frequency representations of signals. More information can be found in Johnston and Brandenburg [1991].

Echo Cancellation

Echo cancellers were first deployed in the U.S. telephone network in 1979. Today, they are virtually ubiquitous in long-distance telephone circuits, where they cancel so-called line echoes (i.e., electrical echoes) resulting from nonperfect hybrids (the devices that couple local two-wire to long-distance four-wire circuits). In satellite circuits, echoes bouncing back from the far end of a telephone connection with a round-trip delay of about 600 ms are very annoying and disruptive. Acoustic echo cancellation—where the echo path is characterized by the transfer function H(z) between a loudspeaker and a microphone in a room (e.g., in a speakerphone)—is crucial for teleconferencing, where two or more parties are connected via full-duplex links. Here, echo cancellation can also alleviate acoustic feedback ("howling").

The principle of acoustic echo cancellation is depicted in Fig. 19.6(a). The echo path H(z) is cancelled by modeling H(z) with an adaptive filter and subtracting the filter's output ŷ(t) from the microphone signal y(t). The adaptability of the filter is necessary since H(z) changes appreciably with movement of people or objects in the room and because periodic measurements of the room would be impractical. Acoustic echo cancellation is more challenging than cancelling line echoes for several reasons. First, room impulse responses h(t) are longer than 200 ms, compared to less than 20 ms for line echo cancellers. Second, the echo path of a room h(t) is likely to change constantly (note that even small changes in temperature can cause significant changes of h). Third, teleconferencing eventually will demand larger audio bandwidths (e.g., 7 kHz) compared to standard telephone connections (about 3.2 kHz). Finally, we note that echo cancellation in a stereo setup (two microphones and two loudspeakers at each end) is an even harder problem on which very little work has been done so far.

It is obvious that the initially unknown echo path H(z) has to be "learned" by the canceller. It is also clear that for adaptation to work there needs to be a nonzero input signal x(t) that excites all the eigenmodes of the system (resonances, or "peaks" of the system magnitude response |H(jω)|). Another important problem is how to handle double-talk (speakers at both ends talking simultaneously). In such a case, the canceller could easily get confused by the speech from the near end, which acts as an uncorrelated noise in the adaptation. Finally, the convergence rate, that is, how fast the canceller adapts to a change in the echo path, is an important measure for comparing different algorithms.

Adaptive filter theory suggests several algorithms for use in echo cancellation. The most popular one is the so-called least-mean-square (LMS) algorithm, which models the echo path by an FIR filter with an impulse response ĥ(t).
Using vector notation h for the true echo-path impulse response, ĥ for its estimate, and x for the excitation time signal, an estimate of the echo is obtained by ŷ(t) = ĥ′x, where the prime denotes vector transpose. A reasonable objective for a canceller is to minimize the instantaneous squared error e²(t), where e(t) = y(t) – ŷ(t). The time derivative of ĥ can be set to

    \frac{d\hat{\mathbf h}}{dt} = -\mu \frac{\partial e^2(t)}{\partial \hat{\mathbf h}} = -2\mu\, e(t) \frac{\partial e(t)}{\partial \hat{\mathbf h}} = 2\mu\, e(t)\, \mathbf x(t)                (19.4)

resulting in the simple update equation ĥ_{k+1} = ĥ_k + α e_k x_k, where α (or μ) controls the rate of change. In practice, whenever the far-end signal x(t) is low in power, it is a good idea to freeze the canceller by setting α = 0. Sophisticated logic is needed to detect double-talk; when it occurs, α is also set to 0.

It can be shown that the spread of the eigenvalues of the autocorrelation matrix of x(t) determines the convergence rate, where the slowest-converging eigenmode corresponds to the smallest eigenvalue. Since the eigenvalues themselves scale with the power of the predominant spectral components in x(t), setting α = 2μ/(x′x) will make the convergence rate independent of the far-end power. This is the normalized LMS method. Even then, however, all eigenmodes will converge at the same rate only if x(t) is white noise. Therefore, pre-whitening the far-end signal will help in speeding up convergence.

The LMS method is an iterative approach to echo cancellation. An example of a noniterative, block-oriented approach is the least-squares (LS) algorithm. Solving a system of equations to get ĥ, however, is computationally more costly. This cost can be reduced considerably by running the LS method on a sample-by-sample basis and by taking advantage of the fact that the new signal vectors are the old vectors with the oldest sample dropped and one new sample added. This is the recursive least-squares (RLS) algorithm. It also has the advantage of normalizing x by multiplying it with the inverse of its autocorrelation matrix. This, in effect, equalizes the adaptation rate of all eigenmodes.

FIGURE 19.6 (a) Principle of using an echo canceller in teleconferencing. (b) Realization of the echo canceller in subbands. (After M. M. Sondhi and W. Kellermann, "Adaptive echo cancellation for speech signals," in Advances in Speech Signal Processing, S. Furui and M. M. Sondhi, Eds., New York: Marcel Dekker, 1991. By courtesy of Marcel Dekker, Inc.)

Another interesting approach is outlined in Fig. 19.6(b). As in subband coding (discussed earlier), splitting the signals x and y into subbands with analysis filterbanks A, doing the cancellation in bands, and resynthesizing the outgoing ("error") signal e through a synthesis filterbank S also reduces the eigenvalue spread of each bandpass signal compared to the eigenvalue spread of the fullband signal. This is true for the eigenvalues that correspond to the "center" (i.e., unattenuated) portions of each band. It turns out, however, that the slowly converging "transition-band" eigenmodes get attenuated significantly by the synthesis filter S. The main advantage of the subband approach is the reduction in computational complexity due to the down-sampling of the filterbank signals. The drawback of the subband approach, however, is the introduction of the combined delay of A and S.
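To summarize the adaptive-filter machinery described in this subsection, here is a small sketch of a fullband normalized LMS echo canceller, following the update ĥ_{k+1} = ĥ_k + α e_k x_k with α = 2μ/(x′x). The filter length, step size, and the three-tap toy "room" are made-up values for illustration only.

```python
import numpy as np

def nlms_echo_canceller(x, y, L=64, mu=0.25, eps=1e-8):
    """Normalized LMS: adapt an L-tap FIR estimate h_hat of the echo path so
    that y_hat = h_hat' x cancels the echo in the microphone signal y.
    Returns the residual (echo-cancelled) signal e and the final h_hat."""
    h_hat = np.zeros(L)
    e = np.zeros(len(y))
    for k in range(L - 1, len(y)):
        x_vec = x[k - L + 1:k + 1][::-1]           # most recent L far-end samples
        y_hat = h_hat @ x_vec                      # echo estimate
        e[k] = y[k] - y_hat                        # residual sent back to the far end
        alpha = 2.0 * mu / (x_vec @ x_vec + eps)   # normalization by far-end power
        h_hat += alpha * e[k] * x_vec              # h_hat(k+1) = h_hat(k) + alpha*e*x
    return e, h_hat

# Toy example: far-end white noise through a fictitious 3-tap "room"
rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
h_true = np.array([0.5, 0.3, -0.2])
y = np.convolve(x, h_true)[:len(x)] + 0.01 * rng.standard_normal(len(x))
e, h_hat = nlms_echo_canceller(x, y)
print(np.round(h_hat[:3], 3))                      # approaches h_true
```

In a real canceller, the adaptation would additionally be frozen (α = 0) during low far-end power and detected double-talk, as discussed above.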
Eliminating the analysis filterbank on y(t) and moving the synthesis filterbank into the adaptation branch (ŷ) will remove the combined delay of A and S just mentioned, with the result that the canceller will not be able to model the earliest portions of the echo-path impulse response h(t). To alleviate this problem, we could add in parallel a fullband echo canceller with a short filter. Further information and an extensive bibliography can be found in Haensler [1992].

Active Noise and Sound Control

Active noise control (ANC) is a way to reduce the sound pressure level of a given noise source through electroacoustic means. ANC and echo cancellation are somewhat related. While even acoustic echo cancellation is actually done on electrical signals, ANC could be labeled "wave cancellation," since it involves using one or more secondary acoustic or vibrational sources. Another important difference is the fact that in ANC one usually would like to cancel a given noise in a whole region in space, while echo cancellation commonly involves only one microphone picking up the echo signal at a single point in space. Finally, the transfer function of the transducer used to generate a cancellation ("secondary source") signal needs to be considered in ANC. Active sound control (ASC) can be viewed as an offspring of ANC. In ASC, instead of trying to cancel a given sound field, one tries to control specific spatial and temporal characteristics of the sound field. One application is in adaptive sound reproduction systems. Here, ASC aims at solving the large-audience spatial reproduction problem mentioned in the spatial processing section of this chapter.

Two important principles of ANC are depicted in Fig. 19.7. In the upper half [Fig. 19.7(a) and (b)], a feedback loop is formed between the controller G(s), the transfer function C(s) of the secondary source, and the acoustic path to the error microphone. Control theory suggests that E/Y = 1/[1 + C(s)G(s)], where E(s) and Y(s) are Laplace transforms of e(t) and y(t), respectively. Obviously, if we could make C a real constant and let G → ∞, we would get a "zone of quiet" around the error microphone. Unfortunately, in practice, C(s) will introduce at least a delay, thus causing stability problems for too large a magnitude |G| at high enough frequencies. The system can be kept stable, for example, by including a low-pass filter in G and by positioning the secondary source in close vicinity to the error microphone.

FIGURE 19.7 Two principles of active noise control. Feedback control system (a) and (b); feedforward control system (c) and (d). Physical block diagrams (a) and (c), and equivalent electrical forms (b) and (d). (After P. A. Nelson and S. J. Elliott, Active Control of Sound, London: Academic Press, 1992. With permission.)

A highly successful application of feedback control in ANC is in active hearing protective devices (HPDs), high-quality headsets, and "motional-feedback" loudspeakers. Passive HPDs offer little or no noise attenuation at low frequencies due to inherent physical limitations. Since the volume enclosed by earmuffs is rather small, HPDs can benefit from the increase in low-frequency attenuation brought about by feedback-control ANC. Finally, note that the same circuit can be used for high-quality reproduction of a communications signal s(t) fed into a headset by subtracting s(t) electrically from e(t). The resulting transfer function is E/S = C(s)G(s)/[1 + C(s)G(s)], assuming Y(s) = 0.
Thus, a high loop gain |G(s)| will ensure both a high noise attenuation at low frequencies and a faithful bass reproduction of the communications signal.

The principle of the feedforward control method in ANC is outlined in the lower half of Fig. 19.7(c) and (d). The obvious difference from the feedback control method is that a separate reference signal x(t) is used. Here, cancellation is achieved for the filter transfer function W = H(s)/C(s), which is most often implemented by an adaptive filter. The fact that x(t) reaches the ANC system earlier than e(t) allows for a causal filter, needed in broadband systems. However, a potential problem with this method is the possibility of feedback of the secondary source signal ŷ(t) into the path of the reference signal x(t). This is obviously the case when x(t) is picked up by a microphone in a duct just upstream of the secondary source C. An elegant solution for ANC in a duct without explicit feedback cancellation is to use a recursive filter W.

Single-error-signal/single-secondary-source systems cannot achieve global cancellation or sound control in a room. An intuitive argument for this fact is that one needs at least as many secondary sources and error microphones as there are orthogonal wave modes in the room. Since the number of wave modes in a room below a given frequency is approximately proportional to the third power of this frequency, it is clear that ANC (and ASC) is practical only at low frequencies. In practice, using small (point-source) transducers, it turns out that one should use more error microphones than secondary sources. Examples of such multidimensional ANC systems are employed for cancelling the lowest few harmonics of the engine noise in an airplane cabin and in a passenger car. In both of these cases, the adaptive filter matrix is controlled by a multiple-error version of the LMS algorithm. Further information can be found in Nelson and Elliott [1992].

Summary and Acknowledgment

In this section, we have touched upon several topics in audio and electroacoustics. The reader may be reminded that the author's choice of these topics was biased by his background in communication acoustics (and by his lack of knowledge in music). Furthermore, ongoing efforts in integrating different communication modalities into systems for teleconferencing [see, e.g., Flanagan et al., 1990] had a profound effect in focusing this contribution. Experts in topics covered in this contribution, like Jont Allen, David Berkley, Gary Elko, Joe Hall, Jim Johnston, Mead Killion, Harry Levitt, Dennis Morgan, and—last, but not least—Mohan Sondhi, are gratefully acknowledged for their patience and help.

Defining Terms

Audio: Science of processing signals that are within the frequency range of hearing, that is, roughly between 20 Hz and 20 kHz. Also the name for this kind of signal.

Critical bands: Broadly used to refer to psychoacoustic phenomena of limited frequency resolution in the cochlea. More specifically, the concept of critical bands evolved in experiments on the audibility of a tone in noise of varying bandwidth, centered around the frequency of the tone. Increasing the noise bandwidth beyond a certain critical value has little effect on the audibility of the tone.

Electroacoustics: Science of interfacing between acoustical waves and corresponding electrical signals.
This includes the engineering of transducers (e.g., loudspeakers and microphones), but also parts of the psychology of hearing, following the notion that it is not necessary to present to the ear signal components that cannot be perceived.

Intelligibility maximization and loudness restoration: Two different objectives in fitting hearing aids. Maximizing intelligibility involves conducting laborious intelligibility tests. Loudness restoration involves measuring the mapping between a given sound level and its perceived loudness. Here, we assume that recreating the loudness a normal-hearing person would perceive is close to maximizing the intelligibility of speech.

Irrelevance and redundancy: In audio coding, irrelevant portions of an audio signal can be removed without perceptual effect. Once removed, however, they cannot be regenerated in the decoder. Contrary to this, redundant portions of a signal that have been removed in the encoder can be regenerated in the decoder. The "lacking" irrelevant parts of an original signal constitute the major cause for a (misleadingly) low signal-to-noise ratio (SNR) of the decoded signal while its subjective quality can still be high.

Monaural/interaural/binaural: Monaural attributes of ear input signals (e.g., timbre, loudness) require, in principle, only one ear to be detected. Interaural attributes of ear input signals (e.g., localization in the horizontal plane) depend on differences between, or ratios of measures of, the two ear input signals (e.g., delay and level differences). Psychoacoustic effects (e.g., the cocktail-party effect) that depend on the fact that we have two ears are termed binaural.

Related Topics

15.2 Speech Enhancement and Noise Reduction • 73.2 Noise

References

J.B. Allen, J.L. Hall, and P.S. Jeng, "Loudness growth in 1/2-octave bands (LGOB)—A procedure for the assessment of loudness," J. Acoust. Soc. Am., vol. 88, no. 2, pp. 745–753, 1990.
J.L. Flanagan, D.A. Berkley, and K.L. Shipley, "Integrated information modalities for human/machine communication: HuMaNet, an experimental system for conferencing," J. of Visual Communication and Image Representation, vol. 1, no. 2, pp. 113–126, 1990.
J.L. Flanagan, D.A. Berkley, G.W. Elko, J.E. West, and M.M. Sondhi, "Autodirective microphone systems," Acustica, vol. 73, pp. 58–71, 1991.
E. Haensler, "The hands-free telephone problem—An annotated bibliography," Signal Processing, vol. 27, pp. 259–271, 1992.
J.D. Johnston and K. Brandenburg, "Wideband coding—perceptual considerations for speech and music," in Advances in Speech Signal Processing, S. Furui and M.M. Sondhi, Eds., New York: Marcel Dekker, 1991.
F.R. Moore, Elements of Computer Music, Englewood Cliffs, N.J.: Prentice-Hall, 1990.
P.A. Nelson and S.J. Elliott, Active Control of Sound, London: Academic Press, 1992.
K.C. Pohlmann, Principles of Digital Audio, 2nd ed., Carmel, Ind.: SAMS/Macmillan Computer Publishing, 1989.
E.A.G. Shaw, "The acoustics of the external ear," in Acoustical Factors Affecting Hearing Aid Performance, G.A. Studebaker and I. Hochberg, Eds., Baltimore, Md.: University Park Press, 1980.
E. Villchur, "Signal processing to improve speech intelligibility in perceptive deafness," J. Acoust. Soc. Am., vol. 53, no. 6, pp. 1646–1657, 1973.
E.M. Wenzel, "Localization in virtual acoustic displays," Presence, vol. 1, pp. 80–107, 1992.

Further Information

A highly informative article that is complementary to this contribution is the one by P.J. Bloom, "High-quality digital audio in the entertainment industry: An overview of achievements and challenges," IEEE-ASSP Magazine, Oct. 1985.
An excellent introduction to the fundamentals of audio, including music synthesis and digital recording, is contained in the 1992 book Music Speech Audio, by W.J. Strong and G.R. Plitnik, available from Soundprint, 2250 North 800 East, Provo, UT 84604 (ISBN 0-9611938-2-4). Oversampling Delta-Sigma Data Converters is a 1992 collection of papers edited by J.C. Candy and G.C. Temes. It is available from IEEE Press (IEEE order number PC0274-1). Specific issues of the Journal of Rehabilitation Research and Development (ISSN 007-506X), published by the Veterans Administration, are a good source of information on hearing aids, in particular the Fall 1987 issue. Spatial Hearing is the title of a 1982 book by J. Blauert, available from MIT Press (ISBN 0-262-02190-0). Anyone interested in Psychoacoustics should look into the 1990 book of this title by E. Zwicker and H. Fastl, available from Springer-Verlag (ISBN 0-387-52600-5).

The Institute of Electrical and Electronics Engineers (IEEE) Transactions on Speech and Audio Processing is keeping up-to-date on algorithms in audio. Every two to three years, a workshop on applications of signal processing to audio and electroacoustics covers the latest advances in areas introduced in this article. IEEE can be reached at 445 Hoes Lane, Piscataway, NJ 08855-1331, ph. (908) 981-0060. The Journal of the Audio Engineering Society (AES) is another useful source of information on audio. The AES can be reached at 60 East 42nd St., Suite 2520, New York, NY 10165-0075, ph. (212) 661-8528. The Journal of the Acoustical Society of America (ASA) contains information on physical, psychological, and physiological acoustics, as well as on acoustic signal processing, among other things. ASA's "Auditory Demonstrations" CD contains examples of signals demonstrating hearing-related phenomena ranging from "critical bands" over "pitch" to "binaural beats." ASA can be reached at 500 Sunnyside Blvd., Woodbury, NY 11797-2999, ph. (516) 576-2360.

19.2 Underwater Acoustical Signal Processing

Sanjay K. Mehta and G. Clifford Carter

What Is Underwater Acoustical Signal Processing?

The use of acoustical signals that have propagated through water to detect, classify, and localize underwater objects is referred to as underwater acoustical signal processing.

Why Exploit Sound for Underwater Applications?

It has been found that acoustic energy propagates better under water than other types of energy. For example, both light and radio waves (used for satellite or above-ground communications) are attenuated to a far greater degree under water than are sound waves. For this reason, sound waves have generally been used to extract information about underwater objects. A typical underwater acoustical signal processing scenario is shown in Fig. 19.8.

Technical Overview

In underwater acoustics, a number of units are used: distances of nautical miles (1852 m), yards (0.9144 m), and kiloyards; speeds of knots (nautical miles/h); depths of fathoms (6 ft or 1.8288 m); and bearings of degrees (0.01745 rad). However, in the past two decades there has been a conscious effort to be totally metric, i.e., to use MKS or Standard International units.

Underwater acoustic signals to be processed for detection, classification, and localization can be characterized from a statistical point of view.
When time averages of each waveform are the same as the ensemble average of waveforms, the signals are ergodic. When the statistics do not change with time, the signals are said to be stationary. The spatial equivalent to stationary is homogeneous. For many introductory problems, only stationary signals and homogeneous noise are considered; more complex problems involve nonstationary, inhomogeneous environments. Acoustic waveforms of interest have a probability density function (PDF); for example, the PDF may be Gaussian, or, in the case of clicking, sharp noise spikes, or crackling ice noise, the PDF may be non-Gaussian. In addition to being characterized by a PDF, signals can be characterized in the frequency domain by their power spectral density functions, which are Fourier transforms of the autocorrelation functions. White signals, which are uncorrelated from sample to sample, have a delta function autocorrelation and, equivalently, a flat (constant) power spectral density. Ocean signals in general are much more colorful and not limited to being stationary.

Passive sonar signals are primarily modeled as random signals. Their first-order PDFs are typically Gaussian; one exception is a stable sinusoidal signal that is non-Gaussian and has a power spectral density function that is a Dirac delta function in the frequency domain. However, in the ocean environment, an arbitrarily narrow frequency width is never observed, and signals have some finite narrow bandwidth. Indeed, the full spectrum of most underwater signals is quite "colorful." Received active sonar signals can be viewed as consisting of the results of a deterministic component (known transmit waveform) convolved with the medium and reflector transfer functions and a random (noise) component. Moreover, the Doppler imparted (frequency shift) to the reflected signal makes the total system effect nonlinear, thereby complicating analysis and processing of these signals.

SONAR

SONAR, "SOund NAvigation and Ranging," the acronym adopted in the 1940s, similar to the popular RADAR, "RAdio Detection And Ranging," involves the use of sound to explore the ocean and underwater objects.

• Passive sonar uses sound radiated from the underwater object itself. The duration of the radiated sound may be short or long in time and narrow or broad in frequency. Only one-way transmission through the ocean, from the acoustic source to a receiving sensor, is involved in this case.

• Active sonar involves echo-ranging, where an acoustical signal is transmitted from a source and reflected echoes are received back from the object. Here one is concerned with two-way transmission, from a transmitter to an object and back to a receiving sensor.

There are three types of active sonar systems:

1. Monostatic: In this most common form, the source and receiver are either identical or distinct but located on the same platform (e.g., a surface ship).
2. Bistatic: In this form, the transmitter and receiver are on different platforms.
3. Multistatic: Here, a single (or more) source or transmitter and multiple receivers, which can be located on different receiving platforms or ships, are used.

The performance of sonar systems can be assessed by the passive and active sonar equations.
The major parameters in the sonar equation, measured in decibels (dB), are as follows:

L_S = source level
L_N = noise level
N_DI = directivity index
N_TS = echo level or target strength
N_RD = recognition differential

Here, L_S is the target-radiated signal strength (for passive) or transmitted signal strength (for active), and L_N is the total background noise level. N_DI, or DI, is the directivity index, which is a measure of the capability of a receiving array to discriminate against unwanted noise. N_TS is the received echo level or target strength. Underwater objects with large values of N_TS are more easily detectable with active sonar than are those with small values of N_TS. In general, N_TS varies as a function of object size, aspect angle (i.e., the direction at which impinging acoustic energy reaches the underwater object), and reflection angle (the direction at which the impinging acoustic energy is reflected off the underwater object). N_RD is the recognition differential of the processing system.

FIGURE 19.8 Active and passive underwater acoustical signal processing.

The figure of merit (FOM), a basic performance measure involving parameters of the sonar system, ocean, and target, is computed for active and passive sonar systems (in dB) as follows:

For passive sonar,
    FOM_P = L_S – (L_N – N_DI) – N_RD                (19.5)

For active sonar,
    FOM_A = (L_S + N_TS) – (L_N – N_DI) – N_RD                (19.6)

Sonar systems, for a given set of parameters of the sonar equations, are designed so that the FOM exceeds the acoustic propagation loss. The amount by which the FOM exceeds the propagation loss is called the signal excess. When two sonar systems are compared, the one with the larger signal excess is said to hold the acoustic advantage. However, it should be noted that the set of parameters in the preceding FOM equations is not unique. Depending on the design or parameter measurability conditions, different parameters can be combined or expanded in terms of quantities such as the frequency dependency of the sonar system in particular ocean conditions, speed and bearing of the receiving or transmitting platforms, reverberation loss, and so forth. Furthermore, due to multipaths, differences in sonar system equipment and operation, and the constantly changing nature of the ocean medium, the FOM parameters fluctuate with time. Thus, the FOM is not an absolute measure of performance but rather an expected value of performance over time in a stochastic sense [for details, see Urick, 1983].

Underwater Propagation

Speed/Velocity of Sound

Sound speed, c, in the ocean in general lies between 1450 and 1540 m/s and varies as a function of several physical parameters, such as temperature, salinity, and pressure (depth). Variations in sound speed can significantly affect the propagation (range or quality) of sound in the ocean. Table 19.1 gives approximate expressions for sound speed as a function of these physical parameters.

TABLE 19.1 Expressions for Sound Speed in Meters per Second

c = 1492.9 + 3(T – 10) – 6 × 10^-3 (T – 10)^2 – 4 × 10^-2 (T – 18)^2 + 1.2(S – 35) – 10^-2 (T – 18)(S – 35) + D/61
    Limits: –2 ≤ T ≤ 24.5°, 30 ≤ S ≤ 42, 0 ≤ D ≤ 1,000

c = 1449.2 + 4.6T – 5.5 × 10^-2 T^2 + 2.9 × 10^-4 T^3 + (1.34 – 10^-2 T)(S – 35) + 1.6 × 10^-2 D
    Limits: 0 ≤ T ≤ 35°, 0 ≤ S ≤ 45, 0 ≤ D ≤ 1,000

c = 1448.96 + 4.591T – 5.304 × 10^-2 T^2 + 2.374 × 10^-4 T^3 + 1.340(S – 35) + 1.630 × 10^-2 D + 1.675 × 10^-7 D^2 – 1.025 × 10^-2 T(S – 35) – 7.139 × 10^-13 T D^3
    Limits: 0 ≤ T ≤ 30°, 30 ≤ S ≤ 40, 0 ≤ D ≤ 8,000

D = depth, in meters. S = salinity, in parts per thousand. T = temperature, in degrees Celsius.
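As a quick numerical check of Table 19.1, the following sketch evaluates the second expression for one representative set of conditions; the particular temperature, salinity, and depth values are arbitrary.

```python
def sound_speed(T, S, D):
    """Second expression of Table 19.1 (0 <= T <= 35 deg C, 0 <= S <= 45 ppt,
    0 <= D <= 1,000 m); returns the speed of sound in m/s."""
    return (1449.2 + 4.6 * T - 5.5e-2 * T**2 + 2.9e-4 * T**3
            + (1.34 - 1e-2 * T) * (S - 35) + 1.6e-2 * D)

# e.g., 10 deg C, 35 ppt salinity, 100 m depth -> roughly 1490 m/s
print(round(sound_speed(10.0, 35.0, 100.0), 1))
```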
Sound Velocity Profiles

Sound rays that are normal (perpendicular) to the acoustic wavefront can be traced from the source to the receiver by a process called ray tracing.¹ In general, the acoustic ray paths are not straight, but bend in a manner analogous to optical rays focused by a lens. In underwater acoustics, the ray paths are determined by the sound velocity profile (SVP) or sound speed profile (SSP), that is, the speed of sound in water as a function of water depth. The sound speed not only varies with depth but also varies in different regions of the ocean and with time as well. In deep water, the SVP fluctuates the most in the upper ocean due to variations of temperature and weather. Just below the sea surface is the surface layer, where the sound speed is greatly affected by temperature and wind action. Below this layer lies the seasonal thermocline, where the temperature and speed decrease with depth and the variations are seasonal. In the next layer, the main thermocline, the temperature and speed decrease with depth, and surface conditions or seasons have little effect. Finally, there is the deep isothermal layer, where the temperature is nearly constant at 39°F and the sound velocity increases almost linearly with depth. A typical deep-water sound velocity profile as a function of depth is shown in Fig. 19.9.

¹Ray tracing models are used for high-frequency signals and in deep water. Generally, if the depth-to-wavelength ratio is 100 or more, ray tracing models are accurate. Below that, corrections must be made to the ray trace models. In shallow water or at low frequencies, i.e., when the depth-to-wavelength ratio is about 30 or less, "mode theory" models are used.

FIGURE 19.9 A typical sound velocity profile (SVP). (Axes: velocity of sound in ft/s versus depth in ft; the surface layer, seasonal thermocline, main thermocline, and deep isothermal layer are indicated.)

If the sound speed is a minimum at a certain depth below the surface, then this depth is called the axis of the underwater sound channel.² The sound velocity increases both above and below this axis. When a sound wave travels through a medium with a sound speed gradient, the direction of travel of the sound wave is bent towards the area of lower sound speed. Although the definition of shallow water can be signal dependent, in terms of depth-to-wavelength ratio, water depth of less than 1000 meters is generally referred to as shallow water. In shallow water the SVP is irregular and difficult to predict because of large surface temperature and salinity variations, wind effects, and multiple reflections of sound from the ocean bottom.

²Often called the SOFAR (Sound Fixing and Ranging) channel.

Propagation Modes

In general, there are three dominant propagation paths, which depend on the distance or range between the acoustic source and the receiver (Fig. 19.10).

• Direct Path: Sound energy travels in a (nominally) straight-line path between the source and receiver, usually present at short ranges.
• Bottom Bounce Path: Sound energy is reflected from the ocean bottom (present at intermediate ranges).
• Convergence Zone (CZ) Path: Sound energy converges at longer ranges, where multiple acoustic ray paths add or recombine coherently to reinforce the presence of acoustic energy from the radiating/reflecting source.

Figure 19.11 shows the propagation loss as a function of range for different frequencies of the signal.
Note the recombination of energy at the convergence zones.

FIGURE 19.10 Typical sound paths between source and receiver. (Source: A.W. Cox, Sonar and Underwater Sound, Lexington, Mass.: Lexington Books, D.C. Heath and Company, 1974, p. 25. With permission.)

FIGURE 19.11 Propagation loss as a function of range. (Legend: DP = direct sound path, BB = bottom bounce sound path, CZ = convergence zone sound path; area assumed: mid North Atlantic Ocean; range in kiloyards.)

Multipaths

The ocean contains multiple acoustic paths that split the acoustic energy. When the receiving system can resolve these multiple paths (or multipaths), then they should be recombined by optimal signal processing to fully exploit the available acoustic energy for detection [Chan, 1989]. It is also theoretically possible to exploit the geometrical properties of multipaths present in the bottom bounce path by investigation of the apparent aperture created by the different path arrivals to localize the energy source. In the case of first-order bottom bounce transmission, i.e., only one bottom interaction, there are four paths (from source to receiver):

1. A bottom bounce ray path (B).
2. A surface interaction followed by a bottom interaction (SB).
3. A bottom bounce followed by a surface interaction (BS).
4. A path that first hits the surface, then the bottom, and finally the surface (SBS).

Typical first-order bottom bounce ocean propagation paths are depicted in Fig. 19.12.

FIGURE 19.12 Multipaths for a first-order bottom bounce propagation model.

Performance Limitations

In a typical reception of a signal wavefront, noise and interference can degrade the performance of a sonar system and limit the system's ability to detect signals in the underwater environment. The noise or interference could be sounds from a school of fish, shipping (surface or subsurface) noise, active transmission interference (e.g., jammers), or interference when multiple receivers or sonar systems are in operation simultaneously. Also, the ambient noise may have unusual vertical or horizontal directivity, and in some environments, such as the Arctic, the noise due to ice motion may produce unfamiliar interference. Unwanted backscatter, similar to the headlights of a car driving in fog, can cause a signal-induced noise that degrades processing gain without proper processing. Some other performance-limiting factors are the loss of signal level and acoustic coherence due to boundary interaction as a function of grazing angle; the radiated pattern (signal level) of the object and its spatial coherence; the presence of surface, bottom, and volume reverberation (in active sonar); signal spreading owing to the modulating effect of surface motion; biologic noise as a function of time (both time of day and time of year); and statistics of the noise in the medium. (Does the noise arrive at the same or at different ray path angles as the signal?)

Hydrophone Sensors and Output

Hydrophone sensors are underwater microphones capable of operating in water and under hydrostatic pressure.
These sensors receive radiated and reflected acoustic energy that arrives through the multiple paths of the ocean medium from a variety of sources and reflectors. As with a microphone, hydrophones convert acoustic pressure to electrical voltages or to optical signals. A block diagram model of a stationary acoustic source, s(t), input to M unique hydrophone receivers is shown in Fig. 19.13. Multipaths from the source to each receiver can be characterized by the source-to-(each individual)-receiver impulse response. The inverse Fourier transforms of these impulse responses are the transfer functions shown in the block diagram as A_j(f), where the subscript j = 1, ..., M denotes the appropriate source-to-receiver transfer function. For widely spaced receivers, there will be a different transfer function from the source to each receiver. Also, for multiple sources and widely spaced receivers, there will be a different transfer function from each source to each receiver. The receiver outputs from a single source are modeled as being corrupted by additive noise, n_j(t), as shown in Fig. 19.13.

FIGURE 19.13 Hydrophone receiver model: source signal s(t) passes through medium filter A_j(f) and is corrupted by additive noise before being received at one of M hydrophones.

Processing Functions

Beamforming

Beamforming is a process in which outputs from the hydrophone sensors of an array are coherently combined by delaying and summing the outputs to provide enhanced detection and estimation. In underwater applications, one is trying to detect a directional (single-direction) signal in the presence of normalized background noise that is ideally isotropic (nondirectional). By arranging the hydrophone (array) sensors in different physical geometries and electronically steering them in a particular direction, one can increase the signal-to-noise ratio (SNR) in a given direction by rejecting or canceling the noise in other directions. There are many different kinds of arrays (e.g., equally spaced line, continuous line, circular, cylindrical, spherical, or random sonobuoy arrays). The beam pattern specifies the response of these arrays to the variation in direction. In the simplest case, the increase in SNR due to the beamformer, called the array gain (in dB), is given by

    AG = 10 \log \frac{\mathrm{SNR}_{\mathrm{array\,(output)}}}{\mathrm{SNR}_{\mathrm{single\ sensor\,(input)}}}                (19.7)

Detection

Detection of signals in the presence of noise, using classical Bayes or Neyman-Pearson decision criteria, is based on hypothesis testing. In the simplest binary hypothesis case, the detection problem is posed as two hypotheses:

• H_0: Signal is not present (referred to as the null hypothesis).
• H_1: Signal is present.

For a received wavefront, H_0 relates to the noise-only case and H_1 to the signal-plus-noise case. Complex hypotheses (M-hypotheses) can also be formed if detecting a signal among a variety of sources is required. Probability is a measure, between zero and unity, of how likely an event is to occur. For a received wavefront, the likelihood ratio, L, is the ratio of P_H1 (the probability that hypothesis H_1 is true) to P_H0 (the probability that hypothesis H_0 is true). A decision (detection) is made by comparing the likelihood ratio, or the logarithm of the likelihood ratio called the log-likelihood ratio, to a predetermined threshold η. That is, if L = P_H1/P_H0 > η, a decision is made that the signal is present. Probability of detection, P_D, measures the likelihood of detecting an event or object when the event does occur.
Detection

Detection of signals in the presence of noise, using classical Bayes or Neyman-Pearson decision criteria, is based on hypothesis testing. In the simplest binary hypothesis case, the detection problem is posed as two hypotheses:
H0: Signal is not present (referred to as the null hypothesis).
H1: Signal is present.
For a received wavefront, H0 relates to the noise-only case and H1 to the signal-plus-noise case. More complex, M-ary hypotheses can also be formed if a signal must be detected from among a variety of sources. Probability is a measure, between zero and unity, of how likely an event is to occur. For a received wavefront, the likelihood ratio, L, is the ratio of P_H1 (the probability that hypothesis H1 is true) to P_H0 (the probability that hypothesis H0 is true). A decision (detection) is made by comparing the likelihood ratio, or its logarithm (the log-likelihood ratio), to a predetermined threshold h. That is, if L = P_H1/P_H0 > h, a decision is made that the signal is present.

Probability of detection, P_D, measures the likelihood of detecting an event or object when the event does occur. Probability of false alarm, P_fa, measures the likelihood of declaring that an event occurred when it did NOT occur. Receiver operating characteristic (ROC) curves plot P_D versus P_fa for a particular (sonar signal) processing system. A single plot of P_D versus P_fa for one system must fix the SNR and the processing time; the threshold h is then varied to sweep out the ROC curve. The curve is often plotted on either a log-log scale or a "probability" scale. In comparing a variety of processing systems, one would like to select the system (or develop a new one) that maximizes P_D for every given P_fa. Processing systems must operate on their ROC curves, but most systems allow the operator to select where on the ROC curve the system operates by adjusting a threshold; low thresholds ensure a high probability of detection at the expense of a high false alarm rate. A sketch of two monotonically increasing ROC curves is given in Fig. 19.14. By proper adjustment of the decision threshold, one can trade off detection performance against false alarm performance. Since the points (0,0) and (1,1) lie on all ROC curves, one can always guarantee a 100% probability of detection with an arbitrarily low threshold (albeit at the expense of a 100% probability of false alarm) or a 0% probability of false alarm with an arbitrarily high threshold (albeit at the expense of a 0% probability of detection).

The (log-)likelihood detector is the detector that achieves the maximum probability of detection for a fixed probability of false alarm; it is shown in Fig. 19.15 for detecting Gaussian signals reflected or radiated from the stationary objects modeled in Fig. 19.13. For moving objects, more complicated time compression or Doppler compensation processing is required. For spiky non-Gaussian noise, other signal processing is required; indeed, clipping prior to filtering improves detection performance by "eliminating" strong noise "pulses". In active sonar, the filters are matched to the known transmitted waveforms. If the object (acoustic reflector) is moving, it will induce a Doppler shift on the reflected signal, and the receiver is complicated by the addition of a bank of Doppler compensators. Returns from a moving object are shifted in frequency by Δf = (2v/c)f, where v is the relative velocity (range rate) between the source and the object, c is the speed of sound in water, and f is the operating frequency of the source transmitter. In passive sonar, at low SNR, the optimal filters in Fig. 19.15 (so-called Eckart filters) are functions of G_ss^1/2(f)/G_nn(f), where f is frequency in hertz, G_ss(f) is the signal power spectrum, and G_nn(f) is the noise power spectrum [see Carter, 1993, p. 484].

FIGURE 19.14 Typical ROC curves. Note that the points (0,0) and (1,1) lie on all ROC curves; the upper curve represents a higher P_D for fixed P_fa and hence better performance, obtained through higher SNR or longer processing time.

FIGURE 19.15 Log-likelihood detector structure for uncorrelated Gaussian noise in the received signals r_j(t), j = 1, ..., M.
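To show how sweeping the threshold traces out a ROC curve, the sketch below uses the classical Gaussian shift-in-mean model, for which the likelihood-ratio test reduces to thresholding a Gaussian statistic and P_D and P_fa have closed forms in terms of the normal distribution. The detection-index values are hypothetical and stand in for the combined effect of SNR and processing time; the example is not tied to any particular sonar processor.

    import numpy as np
    from scipy.stats import norm

    # Hypothetical detection indices d; larger d corresponds to higher SNR
    # and/or longer processing time, hence a higher ROC curve.
    detection_indices = [1.0, 2.0, 3.0]

    # Sweep the threshold indirectly by choosing a grid of false alarm rates.
    p_fa = np.logspace(-6, 0, 200)

    for d in detection_indices:
        # Under H0 the test statistic is N(0,1); under H1 it is N(d,1).
        threshold = norm.isf(p_fa)        # threshold that yields each P_fa
        p_d = norm.sf(threshold - d)      # resulting probability of detection
        idx = p_fa.searchsorted(1e-3)
        print(f"d = {d:.1f}:  P_D = {p_d[idx]:.3f} at P_fa = 1e-3")

Plotting p_d against p_fa for a single value of d traces out one of the monotonically increasing curves sketched in Fig. 19.14.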
Estimation/Localization

The second function of underwater acoustic signal processing is to estimate the parameters that localize the position of the detected object. The source position is estimated in range, bearing, and depth, typically from the underlying parameter of time delay associated with the acoustic wavefront. The statistical uncertainty of the positional estimates is important. Knowledge of the first-order probability density function, or of its first- and second-order moments, the mean (expected value) and the variance, is vital to understanding the expected performance of the processing system. In the passive case, the ability to estimate range is severely limited by the geometry of the measurements; indeed, the variance of passive range estimates can be extremely large, especially when the true range to the acoustic source is long compared with the aperture length of the receiving array. Figure 19.16 depicts the direct-path passive ranging uncertainty for a collinear array with its sensors clustered so as to minimize the bearing and range uncertainty region. Beyond the direct path, multipath signals can be processed to estimate source depth covertly. Accurate range estimation is not difficult with active sonar, but active sonar is not covert, which for some applications can be important.

FIGURE 19.16 Array geometry used to estimate source position. (Source: G.C. Carter, "Coherence and time delay estimation," Proceedings of the IEEE, vol. 75, no. 2, p. 251, © 1987 IEEE. With permission.)
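Because time delay is the underlying parameter behind most passive bearing and range estimates, the following sketch estimates the delay between two hydrophone outputs by picking the peak of their cross-correlation, the simplest (unweighted) member of the generalized cross-correlation family surveyed by Carter [1987]. The sampling rate, delay, and noise level are hypothetical, an integer-sample delay is used so that sample-rate resolution suffices, and a reasonably recent SciPy is assumed for correlation_lags.

    import numpy as np
    from scipy.signal import correlate, correlation_lags

    rng = np.random.default_rng(1)

    fs = 10_000           # sampling rate, Hz (hypothetical)
    n = 8192              # samples in each observation
    true_delay = 37       # inter-hydrophone delay, in samples

    # Common source waveform observed at two hydrophones with independent noise;
    # hydrophone 2 receives the wavefront true_delay samples after hydrophone 1.
    s = rng.standard_normal(n + true_delay)
    x1 = s[true_delay:] + 0.5 * rng.standard_normal(n)
    x2 = s[:n] + 0.5 * rng.standard_normal(n)

    # Cross-correlate and pick the lag of the peak (basic, unweighted GCC)
    c = correlate(x2, x1, mode="full")
    lags = correlation_lags(len(x2), len(x1), mode="full")
    d_hat = lags[np.argmax(c)]

    print(f"estimated delay: {d_hat} samples ({d_hat / fs * 1e3:.2f} ms); "
          f"true delay: {true_delay} samples")

With two or more such pairwise delays from a known array geometry, bearing (and, with enough aperture, range) can then be triangulated.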
Classification

The third function of sonar signal processing is classification. This function determines the type of object that radiated or reflected the acoustic energy. For example, was the sonar return from a school of fish or a reflection from the ocean bottom? The action one takes is highly dependent upon this important function. The amount of radiated or reflected signal power relative to the background noise (that is, the SNR) necessary to achieve good classification may be higher than that required for detection. Also, the type of signal processing required for classification may differ from the type of processing required for detection; processing methods developed on the basis of detection might not have the requisite SNR to perform the classification function adequately. Classifiers are, in general, divided into feature (or clue) extractors followed by a classifier decision box. A key to successful classification is feature extraction. The performance of classifiers is plotted, as with ROC detection curves, as the probability of deciding on class A given that A was actually present, P(A/A), versus the probability of deciding on class B given that A was present, P(B/A), for two different classes of objects, A and B. Of course, for the same class of objects, one could also plot P(B/B) versus P(A/B).

Motion Analysis or Tracking

The fourth function of underwater acoustic signal processing is to perform contact (or target) motion analysis (TMA), that is, to estimate parameters such as bearing and speed. Generally, nonlinear filtering methods, including Kalman-Bucy filters, are applied; typically these methods rely on a state-space model for the motion of the contact. For example, the underlying model of motion could assume a straight-line course and constant speed of the contact of interest. When the acoustic source of interest behaves like the model, results consistent with the basic theory can be expected. It is also possible to incorporate motion compensation into the signal processing detection function. For example, in the active sonar case, proper waveform selection and processing can reduce the degradation of detector performance caused by uncompensated Doppler. Moreover, joint detection and estimation can provide clues to the TMA and classification processes. For example, if the processor simultaneously estimates depth in the process of performing detection, then a submerged object will not be classified as a surface object. Also, joint detection and estimation using Doppler for detection can directly improve contact motion estimates.

Normalization

Another important signal processing function for the detection of weak signals in the presence of unknown and (temporally and spatially) varying noise is normalization. The statistics of the noise or reverberation in the ocean typically vary in time, frequency, and/or bearing, from measurement to measurement and from location to location. To detect a weak signal in a broadband, nonstationary, inhomogeneous background, it is usually desirable to make the background noise statistics as uniform as possible across these variations in time, frequency, and/or bearing. The background noise estimates are first obtained from a window of resolution cells (which usually surrounds the cell under test). These estimates are then used to normalize the test cell, thus reducing the effect of the background noise on detection. The window length and the distance of the window from the test cell are two of the parameters that can be adjusted to obtain accurate estimates of the different types of stationary or nonstationary noise.

Advanced Signal Processing

Adaptive Beamforming

Beamforming was discussed in an earlier subsection. The cancellation of noise through beamforming can also be done adaptively, which can further improve the array gain. Some of the adaptive beamforming techniques [Knight et al., 1981] are DICANNE, sidelobe cancellers, maximum entropy array processing, and maximum-likelihood (ML) array processing.

Coherence Processing

Coherence is a normalized (to lie between zero and unity) cross-spectral density function that is a measure of the similarity of the received signals and noise between any two sensors of the array. The complex coherence function between two wide-sense-stationary processes x and y is defined by

    γ_xy(f) = G_xy(f) / [G_xx(f) G_yy(f)]^1/2    (19.8)

where, as before, f is the frequency in hertz and G is the power spectrum function (G_xy being the cross-spectral density between x and y). Array gain depends on the coherence of the signal and noise between the sensors of the array. To increase the array gain, it is necessary to have good coherence between the sensors for the signal but poor coherence (incoherence) for the noise. The coherence of the signal between sensors improves as the sensor separation, the frequency of the received waveform, the total bandwidth, and the integration time decrease. Loss of signal coherence can be due to ocean motion, object motion, multipaths, reverberation, or scattering. The coherence function has many uses, including measurement of SNR or array gain, system identification, and determination of time delays [Carter, 1987].
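Equation (19.8) is usually estimated from data with segment-averaged (Welch) spectra. The sketch below uses SciPy's coherence routine, which returns the magnitude-squared coherence |γ_xy(f)|^2, on two simulated channels that share a band-limited signal but have independent additive noise; the sampling rate, band, and noise levels are hypothetical. In-band the estimate should be high, and out of band it should fall toward zero, mirroring the signal-coherent/noise-incoherent behavior described above.

    import numpy as np
    from scipy.signal import coherence, butter, lfilter

    rng = np.random.default_rng(2)

    fs = 2048          # sampling rate, Hz (hypothetical)
    n = 1 << 16        # samples per channel

    # Common signal: white noise band-limited to roughly 100-300 Hz
    b, a = butter(4, [100 / (fs / 2), 300 / (fs / 2)], btype="band")
    s = lfilter(b, a, rng.standard_normal(n))

    # Two sensor channels sharing the signal, with independent additive noise
    x = s + 0.5 * rng.standard_normal(n)
    y = s + 0.5 * rng.standard_normal(n)

    # Magnitude-squared coherence estimate, |gamma_xy(f)|^2 of Eq. (19.8)
    f, cxy = coherence(x, y, fs=fs, nperseg=1024)

    in_band = (f > 100) & (f < 300)
    print(f"mean coherence in band:     {cxy[in_band].mean():.2f}")
    print(f"mean coherence out of band: {cxy[~in_band].mean():.2f}")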
Acoustic Data Fusion

Acoustic data fusion is a technique that combines information from multiple receivers or receiving platforms about a common object or channel. Instead of each receiver making a decision, relevant information from the different receivers is sent to a common control unit, where the acoustic data are combined and processed (hence the name data fusion). After fusion, a decision can be relayed or "fed back" to each of the receivers. If transmission of the raw data is a concern, because of time constraints, cost, or security, other techniques can be used in which each receiver makes a local decision and transmits only that decision. The control unit then makes a global decision based on the decisions of all the receivers and relays this global decision back to the receivers; this is called distributed detection. The receivers can then be asked to re-evaluate their individual decisions based on the new global decision, and the process can continue until all the receivers agree, or it can be terminated whenever an acceptable level of consensus is attained. An advantage of data fusion is that the receivers can be located at different ranges (e.g., on two different ships), in different media (shallow or deep water, or even at the surface), and at different bearings from the object, thus giving comprehensive information about the object or the underwater acoustic channel.
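The sketch below illustrates the distributed-detection idea in its simplest form: each receiver reports a binary decision made at its own operating point, and the control unit fuses them with a k-out-of-M voting rule. The number of receivers, the local P_D and P_fa, and the value of k are hypothetical, and the receivers are assumed independent with identical operating points, assumptions made only to show how the global error rates follow from the local ones.

    from math import comb

    def k_out_of_m(p_local, m, k):
        """Probability that at least k of m independent receivers declare 'signal',
        given that each does so with probability p_local."""
        return sum(comb(m, i) * p_local**i * (1 - p_local)**(m - i)
                   for i in range(k, m + 1))

    # Hypothetical local operating point shared by every receiver
    M = 5             # number of receivers reporting to the control unit
    K = 3             # fusion rule: declare 'signal' if at least K agree (majority)
    local_pd = 0.60   # each receiver's probability of detection
    local_pfa = 0.05  # each receiver's probability of false alarm

    global_pd = k_out_of_m(local_pd, M, K)
    global_pfa = k_out_of_m(local_pfa, M, K)
    print(f"global P_D  = {global_pd:.3f}   (local {local_pd})")
    print(f"global P_fa = {global_pfa:.5f} (local {local_pfa})")

For these numbers the fused decision improves both the detection and the false alarm probabilities relative to a single receiver; other choices of K trade P_D against P_fa, much as the threshold does for a single detector.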
Application

Since World War II, in addition to military applications, there has been an expansion in commercial and industrial underwater acoustics applications. Table 19.2 lists the military and nonmilitary functions of sonar along with some of the current applications.

TABLE 19.2 Underwater Acoustics Applications

Military functions:
Detection — Deciding if a target is present or not.
Classification — Deciding if a detected target does or does not belong to a specific class.
Localization — Measuring at least one of the instantaneous positions and velocity components of a target (either relative or absolute), such as range, bearing, range rate, or bearing rate.
Navigation — Determining, controlling, and/or steering a course through a medium (includes avoidance of obstacles and the boundaries of the medium).
Communications — Transmitting and receiving acoustic power and information, instead of using a wire link.
Control — Using a sound-activated release mechanism.
Position marking — Transmitting a sound signal continuously (beacons) or only when suitably interrogated (transponders).
Depth sounding — Sending short pulses downward and timing the bottom return.
Acoustic speedometers — Using pairs of transducers pointing obliquely downward to obtain speed over the bottom from the Doppler shift of the bottom return.

Commercial applications:
Industrial — Fish finders/fish herding; oil and mineral exploration; river flow meters; acoustic holography; viscosimeters; acoustic ship docking systems; ultrasonic grinding/drilling.
Oceanological — Subbottom geological mapping; ocean topography; bathyvelocimeters; emergency telephones; seismic simulation and measurement; biological signal and noise measurement; sonar calibration.

Defining Terms

Decibel (dB): Logarithmic scale for representing the ratio of two quantities, given as 10 log10(P1/P0) for power ratios and 20 log10(V1/V0) for acoustic pressure or voltage ratios. A standard reference pressure or intensity level in SI units is equal to 1 micropascal (1 pascal = 1 newton per square meter = 10 dynes per square centimeter).
Doppler shift: Shift in the frequency of the transmitted waveform due to the relative motion between the source and the object.
Figure of merit/sonar equation: Performance evaluation measure for the various target and equipment parameters of a sonar system. It is a subset of the broader sonar performance given by the sonar equations, which include reverberation effects.
Hydrophone: Receiving sensor that converts sound energy into electrical or optical energy (analogous to an underwater microphone).
Receiver operating characteristic (ROC) curves: Plots of the probability of detection (likelihood of detecting the object when the object is present) versus the probability of false alarm (likelihood of detecting the object when the object is not present) for a particular processing system.
Reverberation/clutter: Inhomogeneities, such as dust, sea organisms, schools of fish, and sea mounds on the bottom of the sea, form mass density discontinuities in the ocean medium. When an acoustic wave strikes these inhomogeneities, some of the acoustic energy is reflected and reradiated. The sum total of all such reradiations is called reverberation. Reverberation is present only in active sonar, and when the object echoes are completely masked by reverberation, the sonar system is said to be "reverberation limited."
SONAR: Acronym for "SOund NAvigation and Ranging," adopted in the 1940s; the use of sound to explore the ocean and underwater objects.
Sound velocity profile (SVP): Description of the speed of sound in water as a function of water depth.
SNR: The signal-to-noise (power) ratio, usually measured in decibels (dB).
Time delay: The time (delay) difference, in seconds, from when an acoustic wavefront impinges on one hydrophone or receiver until it strikes another.

Related Topic

16.1 Spectral Analysis

References

L. Brekhovskikh and Yu. Lysanov, Fundamentals of Ocean Acoustics, New York: Springer-Verlag, 1982.
W.S. Burdic, Underwater Acoustic System Analysis, Englewood Cliffs, N.J.: Prentice-Hall, 1984.
G.C. Carter, Coherence and Time Delay Estimation, Piscataway, N.J.: IEEE Press, 1993.
A.W. Cox, Sonar and Underwater Sound, Lexington, Mass.: Lexington Books, D.C. Heath and Company, 1974.
W.C. Knight, R.G. Pridham, and S.M. Kay, "Digital signal processing for sonar," Proceedings of the IEEE, vol. 69, no. 11, pp. 1451–1506, Nov. 1981.
R.O. Nielsen, Sonar Signal Processing, Boston: Artech House, 1991.
A.V. Oppenheim, Ed., Applications of Digital Signal Processing, Englewood Cliffs, N.J.: Prentice-Hall, 1980.
R.J. Urick, Principles of Underwater Sound, New York: McGraw-Hill, 1983.
H.L. Van Trees, Detection, Estimation, and Modulation Theory, New York: John Wiley & Sons, 1968.
L.J. Ziomek, Underwater Acoustics: A Linear Systems Theory Approach, New York: Academic Press, 1985.

Further Information

The Journal of the Acoustical Society of America (JASA), the IEEE Transactions on Signal Processing (formerly the IEEE Transactions on Acoustics, Speech, and Signal Processing), and the IEEE Journal of Oceanic Engineering are professional journals providing current information on underwater acoustical signal processing. The annual meetings of the International Conference on Acoustics, Speech, and Signal Processing, sponsored by the IEEE, and the twice-yearly meetings of the Acoustical Society of America are good sources for current trends and technologies. The tutorial "Digital signal processing for sonar" by W.C. Knight et al. [1981] is an informative and detailed treatment of the subject.