Pricer, W.D., Katz, R.H., Lee, P.A., Mansuripur, M. "Memory Devices," The Electrical Engineering Handbook, Ed. Richard C. Dorf, Boca Raton: CRC Press LLC, 2000.

80 Memory Devices

W. David Pricer, IBM
Randy H. Katz, University of California, Berkeley
Peter A. Lee, Department of Trade and Industry, London
M. Mansuripur, University of Arizona, Tucson

80.1 Integrated Circuits (RAM, ROM)
     Dynamic RAMs (DRAMs) • Static RAMs (SRAMs) • Nonvolatile Programmable Memories • Read-Only Memories (ROMs)
80.2 Basic Disk System Architectures
     Basic Magnetic Disk System Architecture • Characterization of I/O Workloads • Extensions to Conventional Disk Architectures
80.3 Magnetic Tape
     A Brief Historical Review • Introduction • Magnetic Tape • Tape Format • Recording Modes
80.4 Magneto-Optical Disk Data Storage
     Preliminaries and Basic Definitions • The Optical Path • Automatic Focusing • Automatic Tracking • Thermomagnetic Recording Process • Magneto-Optical Readout • Materials of Magneto-Optical Data Storage

80.1 Integrated Circuits (RAM, ROM)

W. David Pricer

The major forms of semiconductor memory, in descending order of present economic importance, are

1. Dynamic Random-Access Memories (DRAMs)
2. Static Random-Access Memories (SRAMs)
3. Nonvolatile Programmable Memories (PROMs, EEPROMs, EAROMs, EPROMs)
4. Read-Only Memories (ROMs)

DRAMs and SRAMs differ little in their applications. DRAMs are distinguished from SRAMs in that no bistable electronic circuit internal to the storage cell maintains the information. Instead, DRAM information is stored "dynamically" as charge on a capacitor. All modern designs feature one field-effect transistor (FET) to access the information for both reading and writing and a thin-film capacitor for information storage. SRAMs maintain their bistability, so long as power is applied, by a cross-coupled pair of inverters within each storage cell. Almost always, two additional transistors serve to access the internal nodes for reading and writing. Most modern cell designs are CMOS, with two P-channel and four N-channel FETs.
Programmable memories operate much like read-only memories, with the important attribute that they can be programmed at least once, and some can be reprogrammed a million times or more. Storage is almost always by means of a floating-gate FET. Information in such storage cells is not indefinitely nonvolatile; the discharge time constant is on the order of ten years. ROMs are generally programmed by a custom information mask within the fabrication sequence. As the name implies, the stored information can thereafter only be read. The information thus stored is truly nonvolatile, even when power is removed. This is the most dense form of semiconductor storage (and the least flexible). Other forms of semiconductor memories, such as associative memories and charge-coupled devices, are used rarely.

Dynamic RAMs (DRAMs)

The universally used storage cell circuit of one transistor and one capacitor has remained unchanged for over 20 years. The physical implementation, however, has undergone much diversity and many refinements. The innovation in physical implementation is driven primarily by the need to maintain a nearly constant value of capacitance while the surface area of the cell has decreased. A nearly fixed value of capacitance is needed to meet two important design goals. The cell has no internal amplification. Once the information is accessed, the stored voltage is vastly attenuated by the much larger bit line capacitance (see Fig. 80.1).
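To make the attenuation concrete, the short Python sketch below models the charge sharing between the storage capacitor and the bit line during a read. The capacitance and voltage values are illustrative assumptions chosen only to show the effect; they are not figures taken from the text.

    # Charge-sharing model of a DRAM read (illustrative values, not from the text).
    # When the access FET turns on, the small cell capacitor shares its charge with
    # the much larger bit line capacitance, so the sensed voltage swing is small.

    def read_signal(v_cell, c_cell, c_bitline, v_precharge):
        """Voltage swing seen by the sense amplifier after charge sharing."""
        v_final = (c_cell * v_cell + c_bitline * v_precharge) / (c_cell + c_bitline)
        return v_final - v_precharge

    C_CELL = 30e-15      # 30-fF storage capacitor (assumed)
    C_BITLINE = 300e-15  # 300-fF bit line, roughly ten times the cell (assumed)
    VDD = 3.3            # supply voltage (assumed)

    swing = read_signal(v_cell=VDD, c_cell=C_CELL, c_bitline=C_BITLINE,
                        v_precharge=VDD / 2)
    print(f"Sensed swing: {swing * 1e3:.0f} mV")   # about 150 mV for these values

A swing of only some 150 mV for these assumed values illustrates how severe the attenuation is.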
The resulting signal must be kept larger than the resolution limits of the sensing amplifier. DRAMs in particular are also sensitive to a problem called soft errors. These are typically initiated by atomic events such as the incidence of a single alpha particle. An alpha particle can cause a spurious signal of 50,000 electrons or more. All modern DRAM designs resolve this problem by constructing the capacitor in space out of the plane of the transistors (see Fig. 80.2 for examples). Placing the capacitor in space unusable for transistor fabrication has allowed great strides in DRAM density, generally at the expense of fabrication complexity. DRAM chip capacity has increased by about a factor of four every three years.
DRAMs are somewhat slower than SRAMs. This relationship derives directly from the smaller signal available from DRAMs and from certain constraints put on the support circuitry by the DRAM array. DRAMs also require periodic intervals to "refresh" lost charge from the capacitor. This charge is lost primarily across the semiconductor junctions and must be replenished every few milliseconds. The manufacturer usually supplies these "housekeeping" functions with on-chip circuitry.
Signal detection and amplification remain a critical focus of good DRAM design. Figure 80.3 illustrates an arrangement called a "folded bit line." This design cancels many of the noise sources originating in the array and decreases circuit sensitivity to manufacturing process variations. It also achieves a high ratio of storage cells per sense amplifier. Note the presence of the dummy cells, which create a reference signal midway between a "one" and a "zero" for the convenience of the sense amplifier. The stored reference voltage in this case is created by shorting two driven bit lines after one of the storage cells has been written.
Large DRAM integrated circuit chips frequently provide other features that users may find useful. Faster access is provided between certain adjacent addresses, usually along a common word line. Some designs feature on-chip buffer memories, low standby power modes, or error correction circuitry. A few DRAM chips are designed to mesh with the constraints of particular applications such as image support for CRT displays. Some on-chip features are effectively hidden from the user. These may include redundant memory addresses which the maker activates by laser to improve manufacturing yield.

FIGURE 80.1 Cell and bit line capacitance.

FIGURE 80.2 (a) Cross section of "trench capacitors" etched vertically into the semiconductor surface of a DRAM integrated circuit. (Courtesy of IBM.) (b) Cross section of "stacked" capacitors fabricated above the semiconductor surface of a DRAM integrated circuit. (Source: M. Taguchi et al., "A 40-ns 64-b parallel data bus architecture," IEEE J. Solid-State Circuits, vol. 26, no. 11, p. 1495. © 1991 IEEE. With permission.)

THE REVOLUTION OF ELECTRONICS TECHNOLOGY

The last three decades have witnessed a revolution in electrical and, especially, electronics technology. This revolution was paced by changes in solid-state electronics that greatly expanded capabilities while at the same time radically reduced costs. The entire field of electrical engineering has grown far beyond the boundaries that characterized it just a generation ago. Electrical engineers have become the creators and masters of the most pervasive technology of our time, with profound effects on society and on their profession.
The effects of the electronics revolution are complex. For the profession, the most obvious impact has been explosive growth. The increase in the number of students studying in the field continues to be dramatic and shows no signs of slowing. The electrical engineering community represents the largest single technical group in the world, and the members of the IEEE make up the world's largest engineering society. (Courtesy of the IEEE Center for the History of Electrical Engineering.)
This 64-kB random access memory chip, developed by IBM in 1978, was one of the densest of its time. It could store as many as 64,000 bits of information, roughly equivalent to 1,000 eight-letter words. (Photo courtesy of the IEEE Center for the History of Electrical Engineering.)
The largest single market for DRAMs is with microprocessors in personal computers. Rapid microprocessor performance improvements have led DRAM manufacturers to offer improvements especially designed for the "PC" environment. Extended Data Out (EDO) mode keeps the data accessed from a DRAM valid over a longer period of the DRAM cycle. EDO mode is intended to ease the synchronization problem between a DRAM and the increasingly higher speed microprocessor. Synchronous DRAM (SDRAM) allows the rapid sequential transfer of large blocks of data between the microprocessor and the DRAM without extensive signal "handshaking". While SDRAMs do nothing to improve the access time to first data, they greatly improve the "bandwidth" between microprocessor and DRAM.

Static RAMs (SRAMs)

The primary advantages of SRAMs as compared to DRAMs are high speed and ease of use. In addition, SRAMs fabricated in CMOS technology exhibit extremely low standby power. This latter feature is effectively used in much portable equipment like pocket calculators. Bipolar SRAMs are generally faster but less dense than FET versions. Figure 80.4 illustrates two cells. SRAM performance is dominated by the speed of the support circuits, leading some manufacturers to design bipolar support circuits to FET arrays.
Bipolar designs frequently incorporate circuit consolidation unavailable in FET technology, such as the multi-emitter cell shown in Fig. 80.4(a). Here one of the two lower emitters is normally forward biased, turning one inverter on and the other off for bistability. The upper emitters can be used either to extract a differential signal
or to discharge one collector towards ground in order to write the cell. The word line is pulsed positive to both read and write the cell.
A few RAMs use polysilicon load resistors of very high resistance value in place of the two P-channel transistors shown in Fig. 80.4(b). Most are full CMOS designs like the one shown. Sometimes the P-channel transistors are constructed by thin film techniques and are physically placed over the N-channel transistors to improve density. When both P- and N-channel transistors are fabricated in the same plane of the single-crystal semiconductor, the standby current can be extremely low. Typically this can be microamps for megabit chips. The low standby current is possible because each cell sources and sinks only that current needed to overcome the actual node leakage within the cell.
Selecting the proper transconductance for each transistor is an important focus of the designer. The accessing transistors should be large enough to extract a large read signal but not so large as to disturb the stored information. During the write operation, these same transistors must be capable of overriding the current drive of at least one of the internal CMOS inverters.
The superior performance of SRAMs derives from their larger signal and the absence of a need to refresh the stored information as in a DRAM. As a result, SRAMs need fewer sense amplifiers. Likewise these amplifiers are not constrained to match the cell pitch of the array. SRAM design engineers have exploited this freedom to realize higher-performance sense amplifiers.
Practical SRAM designs routinely achieve access times of a few nanoseconds to a few tens of nanoseconds. Cycle time typically equals access time, and in at least one pipelined design, cycle time is actually less than access time. SRAM integrated circuit chips have fewer special on-chip features than DRAM chips, primarily because no special performance enhancements are needed. By contrast, many other integrated circuit chips feature on-chip SRAMs. For example, many ASICs (application-specific integrated circuits) feature on-chip RAMs because of their low power and ease of use. All modern microprocessors include one or more on-chip "cache" SRAM memories which provide a high speed link between processor and memory.

FIGURE 80.3 Folded bit line array.

FIGURE 80.4 (a) Bipolar SRAM cell. (b) CMOS SRAM cell.

MULTICOORDINATE DIGITAL INFORMATION STORAGE DEVICE
Jay W. Forrester
Patented February 28, 1956
#2,736,880

Up to this time, digital data storage was generally done by encoding binary data on rotating magnetic drums or other means where data had to be stored and retrieved sequentially. This patent describes a system whereby data could be stored and retrieved randomly by a simple addressing scheme. It used tiny doughnut-shaped ferromagnetic cores with windings to magnetically polarize the material in one direction or the other. This was about one hundred times faster than rotating drums and took up perhaps 2% of the volume. A 4-Kbyte core memory module would take up about 60 cubic inches and could access data in less than one millisecond. Random access memory (RAM) was born. Core memory (as it has become known) was non-volatile; that is, the information would not be lost when power was cut. Modern non-volatile "flash" memory is yet again thousands of times faster and achieves data density over 100,000 times greater than the breakthrough magnetic core memory described by Forrester. (Copyright © 1995, DewRay Products, Inc. Used with permission.)

Nonvolatile Programmable Memories

A few nonvolatile memories are programmable just once. These have arrays of diodes or transistors with fuses or antifuses in series with each semiconductor cross point. Aluminum, titanium, tungsten, platinum silicide, and polysilicon have all been successfully used as fuse technology (see Fig. 80.5).
Most nonvolatile cells rely on trapped charge stored on a floating gate in an FET. These can be rewritten many times. The trapped charge is subject to very long term leakage, on the order of ten years. The number of times the cell may be rewritten is limited by programming stress-induced degradation of the dielectric. Charge reaches the floating gate either by tunneling or by avalanche injection from a region near the drain.
Both phenomena are induced by over-voltage conditions, and hence the degradation after repeated erase/write cycles. Commercially available chips typically promise 100 to 100,000 write cycles. Erasure of charge from the floating gate may be by tunneling or by exposure to ultraviolet light. Asperities on the polysilicon gate and silicon-rich oxide have both been shown to enhance charging and discharging of the gate.
The nomenclature used is not entirely consistent throughout the industry. However, EPROM is generally used to describe cells which are electronically written but UV erased. EEPROM is used to describe cells which are electronically both written and erased. Cells are of either a two- or a one-transistor design. Where two transistors are used, the second transistor is a conventional enhancement mode transistor (see Fig. 80.6). The second transistor works to minimize the disturb of unselected cells. It also removes some constraints on the writing limits of the programmable transistor, which in one state may be depletion mode. The two transistors in series then assume the threshold of the second (enhancement) transistor, or a very high threshold as determined by the programmable transistor. Some designs are so cleverly integrated that the features of the two transistors are merged.
Flash EEPROMs describe a family of single-transistor cell EEPROMs. Cell sizes are about half that of two-transistor EEPROMs, an important economic consideration. Care must be taken that these cells are not programmed into the depletion mode. An array of depletion mode cells would confound the read operation by providing multiple signal paths. Programming to enhancement-only thresholds can be accomplished by a sequence of partial program and then monitor subcycles, until the threshold is brought to compliance with specification limits. Flash EEPROMs require bulk erasure of large portions of the array.
NVRAM is a term used to describe a SRAM or DRAM with nonvolatile circuit elements. The cell is built to operate as a RAM with normal power applied. On command or with power failure imminent, the EEPROM elements can be activated to capture the last state of the RAM cell. The nonvolatile information is restored to a SRAM cell by normal internal cell regeneration when power is restored.

FIGURE 80.5 PROM cells.

FIGURE 80.6 Cross section of two-transistor EEPROM cells.

Read-Only Memories (ROMs)

ROMs are the only form of semiconductor storage which is permanently nonvolatile. Information is retained without power applied, and there is not even very gradual information loss as in EEPROMs. It is also the most dense form of semiconductor storage. ROMs are, however, less used than RAMs or EEPROMs. ROMs must be personalized by a mask in the fabrication process. This method is cumbersome and expensive unless many identical parts are to be made. Furthermore, it seems much "permanent" information is not really permanent and must be occasionally updated.
ROM cells can be formed as diodes or transistors at every intersection of the word and bit lines of a ROM array (see Fig. 80.7). One of the masks in the chip fabrication process programs which of these devices will be active. Clever layout and circuit techniques may be used to obtain further density. Two such techniques are illustrated in Figs. 80.8 and 80.9. The X array shares bit and virtual ground lines. The AND array places many ROM cells in series. Each of these series AND ROM cells is either
an enhancement or a depletion channel of an FET. Sensing is accomplished by pulsing the gates of all series cells positive except the gate which is to be interrogated. Current will flow through all series channels only if the interrogated channel is depletion mode.
ROM applications include look-up tables, machine-level instruction code for computers, and small arrays used to perform logic (see PLA in Section 81.4 of this handbook).

FIGURE 80.7 ROM cell.

FIGURE 80.8 Layout of ROS X array.

FIGURE 80.9 Layout of ROS AND array.

Defining Terms

Antifuse: A fuse-like device which when activated becomes low impedance.
Application-specific integrated circuits (ASICs): Integrated circuits specifically designed for one particular application.
Avalanche injection: The physics whereby electrons highly energized in avalanche current at a semiconductor junction can penetrate into a dielectric.
Depletion mode: An FET which is on when zero volts bias is applied from gate to source.
Enhancement mode: An FET which is off when zero volts bias is applied from gate to source.
Polysilicon: Silicon in polycrystalline form.
Tunneling: A quantum-mechanical phenomenon whereby an electron can pass through a thin dielectric barrier.

Related Topic

25.3 Application-Specific Integrated Circuits

References

H. Kalter et al., "A 50 nsec 16 Mb DRAM with 10 nsec data rate and on-chip ECC," IEEE Journal of Solid-State Circuits, vol. SC-25, no. 5, 1990.
H. Kato, "A 9 nsec 4 Mb BiCMOS SRAM with 3.3 V operation," Digest of Technical Papers ISSCC, vol. 35, 1992.
H. Kawague and N. Tsuji, "Minimum size ROM structure compatible with silicon-gate E/D MOS LSI," IEEE Journal of Solid-State Circuits, vol. SC-11, no. 2, 1976.

Further Information

W. Donoghue et al., "A 256K H CMOS ROM using a four state cell approach," IEEE Journal of Solid-State Circuits, vol. SC-20, no. 2, 1985.
D. Frohmann-Bentchkowsky, "A fully decoded 2048 bit electronically programmable MOS-ROM," Digest of Technical Papers ISSCC, vol. 14, 1971.
L. A. Glasser and D. W. Dobberpuhl, The Design and Analysis of VLSI Circuits, Reading, Mass.: Addison-Wesley, 1985.
F. Masuoka, "Are you ready for the next generation dynamic RAM chips?" IEEE Spectrum Magazine, vol. 27, no. 11, 1990.
R. D. Pashley and S. K. Lai, "Flash memories: The best of two worlds," IEEE Spectrum Magazine, vol. 26, no. 12, 1989.

80.2 Basic Disk System Architectures

Randy H. Katz

Architects of high-performance computers have long been forced to acknowledge the existence of a large gap between the speed of the CPU and the speed of its attached I/O devices. A number of techniques have been developed in an attempt to narrow this gap, and we shall review them in this chapter.
A key measure of magnetic disk technology is the growth in the maximum number of bits that can be stored per square inch, i.e., the bits per inch in a disk track times the number of tracks per inch of media. Called MAD, for maximal areal density, the "First Law in Disk Density" predicts [Frank, 1987]:

    MAD = 10^((Year - 1971)/10)                                   (80.1)

This is plotted against several real disk products in Fig. 80.10. Magnetic disk technology has doubled capacity and halved price every three years, in line with the growth rate of semiconductor memory. Between 1967 and 1979 the growth in disk capacity of the average IBM data processing system more than kept up with its growth in main memory, maintaining a ratio of 1000:1 between disk capacity and physical memory size [Stevens, 1981].
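As a quick check of Eq. (80.1), the short Python sketch below evaluates the predicted density for a few years. The absolute units follow Fig. 80.10 and are commonly quoted in megabits per square inch; that unit is an assumption here rather than something stated in the text.

    # Evaluate the "First Law in Disk Density", Eq. (80.1): MAD = 10 ** ((year - 1971) / 10).
    # Units are assumed to be Mbits per square inch, following the usual statement of the law.

    def mad(year):
        return 10 ** ((year - 1971) / 10)

    for year in (1971, 1981, 1991):
        print(f"{year}: {mad(year):8.1f}  (x{mad(year) / mad(1971):.0f} over 1971)")

    # The law implies a tenfold density increase per decade, i.e., a doubling
    # roughly every three years (10 ** (3 / 10) is about 2), which matches the
    # capacity-doubling rate quoted in the text.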
In contrast to primary memory technologies, the performance of conventional magnetic disks has improved only modestly. These mechanical devices, the elements of which are described in more detail in the next section, are dominated by seek and rotation delays: from 1971 to 1981, the raw seek time for a high-end IBM disk improved by only a factor of two while the rotation time did not change [Harker et al., 1981]. Greater recording density translates into a higher transfer rate once the information is located, and extra positioning actuators for the read/write heads can reduce the average seek time, but the raw seek time only improved at a rate of 7% per year. This is to be compared to a doubling in processor power every year, a doubling in memory density every two years, and a doubling in disk density every three years.
The gap between processor performance and disk speeds continues to widen, and there is no reason to expect a radical improvement in raw disk performance in the near future. To maintain balance, computer systems have been using even larger main memories or solid-state disks to buffer some of the I/O activity. This may be an acceptable solution for applications whose I/O activity has locality of reference and for which volatility is not an issue, but applications dominated by a high rate of random requests for small pieces of data (e.g., transaction processing) or by a small number of sequential requests for massive amounts of data (e.g., supercomputer applications) face a serious performance limitation.
The rest of the chapter is organized as follows. In the next section, we will briefly review the fundamentals of disk system architecture. The third section describes the characteristics of the applications that demand high I/O system performance. Conventional ways to improve disk performance are discussed in the last section.

Basic Magnetic Disk System Architecture

We will review here the basic terminology of magnetic disk devices and controllers and then examine the disk subsystems of three manufacturers (IBM, Cray, and DEC). Throughout this section we are concerned with technologies that support random access, rather than sequential access (e.g., magnetic tape). A more detailed discussion, focusing on the structure of small dimension disk drives, can be found in Vasudeva [1988].
The basic concepts are illustrated in Fig. 80.11. A spindle consists of a collection of platters. Platters are metal disks
The actuator is a mechanical assembly that positions the head electronics over the appro- priate track. It is possible to have multiple read/write mechanisms per surface, e.g., multiple heads per arm—at one extreme, one could have a head-per-track position, that is, the disk equivalent of a magnetic drum—or 1 Some optical disks use a technique called constant linear velocity (CLV), where the platter rotates at different speeds depending on the relative position of the track. This allows more data to be stored on the outer tracks than the inner tracks, but because it takes more delay to vary the speed of rotation, the technique is better suited to sequential rather than random access. FIGURE 80.10 Maximal areal density law. Squares represent predicted density; triangles are the MAD reported for the indicated products. FIGURE 80.11 Disk terminology. Heads reside on arms which are positioned by actuators. Tracks are concentric rings on platters. A sector is the basic unit of read/write. A cylinder is a stack of tracks at one actuator position. An HDA is everything in the figure plus the air-tight casing. In some devices it is possible to transfer from multiple surfaces simultaneously. The collection of heads that participate in a single logical transfer that is spread over multiple surfaces is called a head group. ? 2000 by CRC Press LLC multiple arms per surface through multiple actuators. Due to costs and technical limitations, it is usually uneconomical to build a device with a large number of actuators and heads. A cylinder is a stack of tracks at one actuator position. A head disk assembly (HDA) is the collection of platters, heads, arms, and actuators, plus the air-tight casing. A disk drive is an HDA plus all associated electronics. A disk might be a platter, an actuator, or a drive depending the context. We can illustrate these concepts by describing two first-generation supercomputer disks, the Cray DD-19 and the CDC 819 [Bucher and Hayes, 1980]. These were state-of-the-art disks around 1980. Each disk has 40 recording surfaces (20 platters), 411 cylinders, and 18 (DD-19) or 20 (CDC 819) 512-byte sectors per track. Both disks possess a limited “parallel read-out” capability. A given data word is actually byte interleaved over four surfaces. Rather than a single set of read/write electronics for the actuator, these disks have four sets, so it is possible to read or write with four heads at a time. Four heads on adjacent arms are called a head group. A disk track is thus composed of the stacked recording tracks of four adjacent surfaces, and there are 10 tracks per cylinder, spread over 40 surfaces. The advances over the last decade can be illustrated by the Cray DD-49, which is a typical high-end supercomputer disk of today. It consists of 16 recording surfaces (9 platters), 886 cylinders, 42 4096-byte sectors per track, with 32 read/write heads organized into eight head groups, four groups on each of two independent actuators. Each actuator can sweep the entire range of tracks, and by “scheduling” the arms to position the actuator closest to the target track of the pending request, the average seek time can be reduced. The DD-49 has a capacity of 1.2 Gbytes of storage and can transfer at a sustained rate of 9.6 Mbytes/s. A variety of standard and proprietary interfaces are defined for transferring the data recorded on the disk to or from the host. We concentrate on industry standards here. On the disk surface, information is represented as alternating polarities of magnetic fields. 
These signals need to be sensed, amplified, and decoded into synchronized pulses by the read electronics. For example, the pulse-level protocol ST506/412 standard describes the way pulses can be extracted from the alternating flux fields. The bit-level ESDI, SMD, and IPI-2 standards describe the bit encoding of signals. At the packet level, these bits must be aligned into bytes, error correcting codes need to be applied, and the extracted data must be delivered to the host. These "intelligent" standards include SCSI (small computer system interface) and IPI-3.
The ST506 is a low-cost but primitive interface, most appropriate for interfacing floppy disks to personal computers and low-end workstations. For example, the controller must perform data separation on its own; this is not done for it by the disk device. As a result, its transfer rate is limited to 0.625 Mbytes/s. The SMD interface is higher performance and is used extensively in connecting disks to mainframe disk controllers. ESDI is similar, but geared more towards smaller disk systems. One of its innovations over the ST506 is its ability to specify a seek to a particular track number rather than requiring track positioning via step-by-step pulses. Its performance is in the range of 1.25–1.875 Mbytes/s. SCSI has so far been used primarily with workstations and minicomputers, but offers the highest degree of integration and intelligence. Implementations with performance at the level of 1.5–4 Mbytes/s are common. The newer IPI-3 standard has the advantages of SCSI, but provides even higher performance at a higher cost. It is beginning to make inroads into mainframe systems. However, because of the very widespread use of SCSI, many believe that SCSI-2, an extension of SCSI to wider signal paths, will become the de facto standard for high-performance small disks.
The connection pathway between the host and the disk device varies widely depending on the desired level of performance. A low-end workstation or personal computer would use a SCSI interface to directly connect the device to the host. A higher end file server or minicomputer would typically use a separate disk controller to manage several devices at the same time. These devices attach to the controller through SMD interfaces. It is the controller's responsibility to implement error checking and correction and direct memory transfer to the host.
Mainframes tend to have more devices and more complex interconnection schemes to access them. In IBM terminology [Buzen and Shum, 1986], the channel path, i.e., the set of cables and associated electronics that transfer data and control information between an I/O device and main memory, consists of a channel, a storage director, and a head of string (see Fig. 80.12). The collection of disks that share the same pathway to the head of string is called a string. In earlier IBM systems, a channel path and channel are essentially the same thing. The channel processor is the hardware that executes channel programs, which are fetched from the host's memory. A subchannel is the execution environment of a channel program, similar to a process on a conventional CPU. Formerly, a subchannel was statically assigned for execution to a particular channel, but a major innovation in high-end IBM systems (308X and 3090) allows subchannels to be dynamically switched among channel paths. This is like allocating a process to a new processor within a multiprocessor system every time it is rescheduled for execution.
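As a rough mental model of the pathway hierarchy just described, and of the multipathing idea shown in Fig. 80.12, the following Python sketch enumerates the alternative channel paths from a host to a device. The component names follow the text, but the topology and the selection rule are illustrative assumptions, not an actual IBM configuration.

    # Toy model of IBM-style host-to-device pathways: a device is reachable through
    # any (channel, storage director, head of string) triple cabled to its string.
    # The topology below is an illustrative assumption.

    channels = {"CH0": ["SD0", "SD1"], "CH1": ["SD1"]}          # channel -> storage directors
    directors = {"SD0": ["HOS0"], "SD1": ["HOS0", "HOS1"]}      # director -> heads of string
    strings = {"HOS0": ["disk0", "disk1"], "HOS1": ["disk2"]}   # head of string -> disks

    def paths_to(device):
        """All channel paths that reach the given device."""
        found = []
        for ch, sds in channels.items():
            for sd in sds:
                for hos in directors[sd]:
                    if device in strings[hos]:
                        found.append((ch, sd, hos))
        return found

    # A subchannel doing dynamic path reconnect may use any free path in this list:
    print(paths_to("disk1"))
    # [('CH0', 'SD0', 'HOS0'), ('CH0', 'SD1', 'HOS0'), ('CH1', 'SD1', 'HOS0')]

With several usable paths to the same device, a reconnecting subchannel need not wait for the particular path it used originally, which is the benefit of dynamic path switching described above.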
I/O program control statements, e.g., transfer in channel, are interpreted by the channel, while the storage director (also known as the device controller or control unit) handles seek and data-transfer requests. Besides these control functions, it may also perform certain datapath functions, such as error detection/correction and mapping between serial and parallel data. In response to requests from the storage director, the device will position the access mechanism, select the appropriate head, and perform the read or write. If the storage director is simply a control unit, then the datapath functions will be handled by the head of string (also known as a string controller). To minimize the latency caused by copying into and out of buffers, the IBM I/O system uses little buffering between the device and memory.1

1 Only the most recent generation of storage directors (e.g., IBM 3880, 3990) incorporate disk caches, but care must be taken to avoid cache management-related delays [Buzen, 1982].

FIGURE 80.12 Host-to-device pathways. For large IBM mainframes, the connection between host and device must pass through a channel, storage director, and string controller. Note that multiple storage directors can be attached to a channel, multiple string controllers per storage director, and multiple devices per string controller. This multipathing approach makes it possible to share devices among hosts and to provide alternative pathways to better utilize the drives and controllers. While logically correct, the figure does not reflect the true physical components of high-end IBM systems (308X, 3090). The concept of channel has disappeared from these systems and has been replaced by a channel path.

In a high-performance environment, devices spend a good deal of time waiting for the pathway's resources to become free. These resources are used for time periods related to disk transfer speeds, measured in milliseconds. One possible method for improving utilization is to support disconnect/reconnect. A subchannel can connect to a device, issue a seek, disconnect to free the channel path for other requests, and reconnect later to perform the transfer when the seek is completed. Unfortunately, not all reconnects can be serviced immediately, because the control units are busy servicing other devices. These RPS misses (to be described in more detail in the next section) are a major source of delay in heavily utilized IBM storage subsystems [Buzen and Shum, 1987]. Performance can be further improved by providing multiple paths between memory and devices. To this purpose, IBM's high-end systems support dynamic path reconnect, a
mechanism that allows a subchannel to change its channel path each time it cycles through a disconnect/reconnect with a given device. Rather than wait for its currently allocated path to become free, it can be assigned to another available path.
Turning to supercomputer I/O systems, we will now examine the I/O architecture of the Cray machines. Because the Cray I/O system (IOS) varies from model to model, the following discussion concentrates on the IOS found on the Cray X-MP and Y-MP [Cray, 1988]. In general, the IOS consists of two to four I/O processors (IOPs), each with its own local memory and sharing a common buffer memory with the other IOPs. The IOP is designed to be a simple, fast machine for controlling data transfers between devices and the central memory of the Cray main processors. Since it executes the control statements of an I/O program, it is not unlike the IBM channel processor in terms of its functionality, except that I/O programs reside in its local memory rather than in the host's. An IOP's local memory is connected through a high-speed communications interface, called a channel in Cray terminology, to a disk control unit (DCU). A given port into the local memory can be time multiplexed among multiple channels. Data is transferred back and forth between devices and the main processors through the IOP's local memory, which is interfaced to central memory through a 100-Mbyte/s channel pair (one pathway for each direction of transfer).
The DCU provides the interface between the IOP and the disk drives and is similar in functionality to IBM's storage director. It oversees the data transfers between devices and the IOP's local memory, provides speed matching buffer storage, and transmits control signals and status information between the IOP and the devices. Disk storage units (DSUs) are attached to the DCU through point-to-point connections. The DSU contains the disk device and is responsible for dealing with its own defect management, by using a technique called sector slipping. Figure 80.13 summarizes the elements of the Cray I/O system.

FIGURE 80.13 Elements of the Cray I/O system for the Y-MP. An IOS contains up to four IOPs. The MIOP connects to the operator workstation and performs mainly maintenance functions. The XIOP supports block multiplexing and is most appropriate for controlling relatively slow speed devices, such as tapes. The BIOP and DIOP are designed for controlling high-speed devices like disks. Up to four disk storage units (DSUs) can be attached through the disk control unit (DCU) to the IOP. Three DCUs can be connected to each of the BIOP and DIOP, leading to a total of 24 disks per IOS. The Y-MP can be configured with two IOSs, for a system total of 48 devices.

Digital Equipment Corporation's high-end I/O strategy is described in terms of the Digital Storage Architecture (DSA) and is embodied in system configurations such as the VAXCluster shared disk system (see Fig. 80.14). The architecture provides a rigorous definition of how storage subsystems and host computers interact. It achieves this by defining a client/server message-based model for I/O interaction based on device-independent interfaces [Massiglia, 1986; Kronenberg et al., 1986]. A mass storage subsystem is viewed at the architectural level as consisting of logical block machines capable of storing and retrieving fixed blocks of data, i.e., the I/O system supports the transfer of logical blocks between CPUs and devices given a logical block number. From the viewpoint of physical components, a subsystem consists of controllers which connect computers to drives. The software architecture is divided into four levels: the Operating System Client (also called the Class Driver), the Class Server (Controller), the Device Client (Data Controller), and the Device Server (Device). The Disk Class Driver, resident on a host CPU, accepts requests for disk I/O service from applications, packages these
The command set supported by the Class Server includes such relatively device-independent operations as read logical block, write logical block, bring on-line, and request status. The Disk Class Server 1 interprets the transmitted com- mands, handles the scheduling of command execution, tracks their progress, and reports status back to the Class Driver. Note the absence of seek or select head commands. This interface can be used equally well for solid-state disks as for conventional magnetic disks. Device-specific commands are issued at a lower level of the architecture, i.e., between the Device Client (disk controller) and Device Server (disk device). The former provides the path for moving commands and data between hosts and drives, and it is usually realized physically by a piece of hardware that corresponds to the device controller. The latter coincides with the physical drives used for storing and retrieving data. It is interesting to contrast these proprietary approaches with an industry standard approach like SCSI, admittedly targeted for the low to mid range of performance. SCSI defines the logical and physical interface between a host bus adapter (HBA) and a disk controller, usually embedded within the assembly of the disk device. The HBA accepts I/O requests from the host, initiates I/O actions by communicating with the controllers, and performs direct memory access transfers between its own buffers and the memory of the host. Requesters of service are called initiators, while providers of service are called targets. Up to eight nodes can reside on a single SCSI string, sharing a common pathway to the HBA. The embedded controller performs device handling and error recovery. Physically, the interface is implemented with a single daisy-chained cable, and the 8-bit datapath is used to communicate control and status information, as well as data. SCSI defines a layered communications protocol, including a message layer for protocol and status, and a command/status layer for target operation execution. The HBA roughly corresponds to the function of the IBM channel processor or Cray IOP, while the embedded controller is similar to the IBM storage director/string controller or the Cray DCU. Despite the differences in terminology, the systems we have surveyed exhibut significant commonality of function and similar approaches for partitioning these functions among hardware components. Characterization of I/O Workloads Before characterizing the I/O behavior of different workloads, it is necessary to first understand the elements of disk performance. Disk performance is a function of the service time, which consists of three main compo- nents: seek time, rotational latency, and data transfer time. 2 Seek time is the time needed to position the heads 1 Other kinds of class servers are also supported, such as for tape drives. 2 In a heavily utilized system, delays waiting for a device can match actual disk service times, which in reality is composed of device queuing, controller overhead, seek, rotational latency, reconnect misses, error retries, and data transfer. FIGURE 80.14 VAXCluster architecture. CPUs are connected to HSCs (hierarchical storage controllers) through a dual CI (computer interconnect) bus. Thirty-one hosts and 31 HSCs can be connected to a CI. Up to 32 disks can be connected to an HSC-70. ? 2000 by CRC Press LLC to the appropriate track position containing the desired data. 
It is a function of a substantial initial start-up cost to accelerate the disk head (on the order of 6 ms) as well as the number of tracks that must be traversed. Typical average seek times, i.e., the time to traverse between two randomly selected tracks (approximately 28% of the data band), are in the range of 10 to 20 ms. The track-to-track seek time is usually below 10 ms and as low as 2 ms. The second component of service time is rotational latency. It takes some time for the desired sector to rotate under the head position before it can be read or written. Today’s devices spin at a rate of approximately 3600 rpm, or 60 revolutions per second (we expect to see rotation speeds increase to 5400 rpm in the near future). For today’s disks, a full revolution is 16 ms, and the average latency is 8 ms. Note that the worst-case latencies are comparable to average seeks. The last component is the transfer time, i.e., the time to physically transfer the bytes from disk to the host. While the transfer time is a strong function of the number of bytes to be transferred, seek and rotational latencies times are independent of the transfer blocksize. If data is to be read or written in large chunks, it makes sense to choose a large blocksize, since the “fixed cost” of seek and latency is better amortized across a large data transfer. A low-performance I/O system might dedicate the pathway between the host and the disk for the entire duration of the seek, rotate, and transfer times. Assuming small blocksizes, transfer time is a small component of the overall service time, and these pathways can be better utilized if they are shared among multiple devices. Thus, higher performance systems support independent seeks, in which a device can be directed to detach itself from the pathway while seeking to the desired track (recall the discussion of dynamic path reconnect in the previous section). The advantage is that multiple seeks can be overlapped, reducing overall I/O latency and better utilizing the available I/O bandwidth. However, to make it possible for devices to reattach to the pathway, the I/O system must support a mechanism called rotational position sensing, i.e., the device interrupts the I/O controller when the desired sector is under the heads. If the pathway is currently in use, the device must pay a full rotational delay before it can again attempt to transfer. These rotational positional reconnect miss delays (RPS delays) represent a major source of degradation in many existing I/O systems [Buzen and Shum, 1987]. This arises from the lack of device buffering and the real-time service requirements of magnetic disks. At the time that these architectures were established, buffer memories were expensive and the demands for high I/O performance were less pressing with slower speed CPUs. An alternative, made more attractive by today’s relative costs of electronic and mechanical com- ponents, is to associate a track buffer with the device that can be filled immediately. This can then be used as the source of the transfer when the pathway becomes available [Houtekamer, 1985]. I/O intensive applications vary widely in the demand they place on the I/O system. They run the gamut from processing small numbers of bulk I/Os that must be handled with minimum delay (supercomputer I/O) to large numbers of simple tasks that touch small amounts of data (transaction processing). An important design challenge is to develop an I/O system that can handle the performance needs of these diverse workloads. 
A given workload's demand for I/O service can be specified in terms of three metrics: throughput, latency, and bandwidth. Throughput refers to the number of requests for service made per unit time. Latency measures how long it takes to service an individual request. Bandwidth gauges the amount of data flowing between service requesters (i.e., applications) and service providers (i.e., devices).
As observed by Bucher and Hayes [1980], supercomputer I/O can be characterized almost entirely by sequential I/O. Typically, computation parameters are moved in bulk from disk to in-memory data structures, and results are periodically written back to disk. These workloads demand large bandwidth and minimum latency, but are characterized by low throughput. Contrast this with transaction processing, which is characterized by enormous numbers of random accesses, relatively small units of work, and a demand for moderate latency with very high throughput.
Figure 80.15 shows another way of thinking about the varying demands of I/O intensive applications. It shows the percentage of time different applications spend in the three components of I/O service time. Transaction processing systems spend the majority of their service time in seek and rotational latency; thus technological advances which reduce the transfer time will not affect their performance very much. On the other hand, scientific applications spend a more equal amount of time in seek and data transfer, and their performance is sensitive to any improvement in disk technology.

Extensions to Conventional Disk Architectures

In this subsection, we will focus on techniques for improving the performance of conventional disk systems, i.e., methods which allow us to reduce the seek time, rotational latency, or transfer time of conventional disks. By reducing disk service times, we also decrease device queuing delays. These techniques include fixed-head disks, parallel transfer disks, increased disk density, solid-state disks, disk caches, and disk scheduling.

Fixed-Head Disk

The concept of a fixed-head disk is to place a read/write head at every track position. The need for positioning the heads is eliminated, thus eliminating the seek time altogether. The approach does not assist in reducing rotational latencies, nor does it lessen the transfer time. Fixed-head disks were often used in the early days of computing systems as a back-end store for virtual memory. However, since modern disks have hundreds of tracks per surface, placing a head at every position is no longer viewed as an economical solution.

Parallel Transfer Disks

Some high-performance disk drives make it possible to read or write from multiple disk surfaces at the same time. For example, the Cray DD-19 and DD-49 disks described in the second section have a parallel transfer capability. The advantage is that much higher transfer rates can be achieved, but no assistance is provided for seek or rotational latency. Thus transfer units are correspondingly larger in these systems.
A number of economic and technological issues limit the usefulness of parallel transfer disks. From the economic perspective, providing more than one set of read/write electronics per actuator is expensive. Further, current disks use sophisticated control systems to lock onto an individual track, and it is difficult to do this simultaneously across tracks within the same cylinder. Hence, the Cray strategy is to limit head groups to only four surfaces.
There appears to be a fundamental trade-off between track density and the number of platters: as the track density increases, it becomes ever more difficult to lock onto tracks across many platters, and the number of surfaces that can participate in a parallel transfer is reduced. For example, current Cray track densities are around 980 tracks/inch, and require a rather sophisticated closed-loop track-following servo system to position the heads accurately with finely controlled voice coil actuators. A high-performance disk system can be constructed at lower cost ($/megabyte) from several standard drives than from a single parallel transfer device, in part because of the relatively small sales volume of parallel transfer devices compared to standard drives.

FIGURE 80.15 I/O system parameters as a function of application. Transaction processing applications are seek and rotational latency limited, since only small blocks are usually transferred from disk. Image-processing applications, on the other hand, transfer huge blocks and thus spend most of their I/O time in data transfer. Scientific computing applications tend to fall in between. (Source: I. Y. Bucher and A. H. Hayes, "I/O performance measurement on Cray-1 and CDC 7600 computers," Proc. Cray Users Group Conference, October 1980. With permission.)

Increasing Disk Density

As described in the first section, the improvements in disk recording density are likely to continue. Higher bit densities are achieved through a combination of the use of thinner films on the disk platters (e.g., densities improve from 16,000 bpi to 21,000 bpi when thick iron oxide is replaced with thin film materials), smaller gaps between the poles of the read/write head's electromagnet, and heads which fly closer to the disk surface.
While vertical recording techniques have long been touted as the technology of the future, advances in head technology make it possible to continue using conventional horizontal methods, but still keep disks on the MAD curve. These magneto-resistive heads employ noninductive methods for reading, which work well with dense horizontal recording fields. However, a more conventional head is needed for writing, but this dual-head organization permits separate optimizations for read and write. Also, the choice of coding technique can have a significant effect on density. Standard modified frequency modulation techniques require approximately one flux change per bit, while more advanced run-length limited codes can increase density by an additional 50%. Densities as high as 31,429 bpi can be attained with these techniques.
As the recording densities increase, the transfer times decrease, as more bits transit beneath the heads per unit time. Of course, this approach provides no improvement in seek and latency times. Most of the increase in density comes from increases in the number of tracks per inch, which does not improve (and may actually reduce) performance. Although increased densities are inevitable, the problem is primarily economic. Increasing the tracks per inch may make seeks slower as it becomes more time consuming for the heads to correctly "lock" onto the appropriate track. The sensing electronics get more complex and thus more expensive. Once again, it can be argued that higher capacity can be achieved at lower cost by using several smaller disks rather than one expensive high-density disk.
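The relationship noted above between linear recording density and transfer time can be illustrated with a small calculation: the raw media rate is the number of bits passing under the head per second, i.e., linear density times track circumference times rotational speed. The track diameter and spindle speed below are illustrative assumptions; only the bpi figures come from the text.

    # Media transfer rate implied by linear density and spindle speed:
    # rate = bpi * (track circumference in inches) * (revolutions per second).
    # Track diameter and rpm are assumed values for illustration.
    import math

    def media_rate_mbyte_s(bpi, track_diameter_in, rpm):
        bits_per_track = bpi * math.pi * track_diameter_in
        bits_per_second = bits_per_track * rpm / 60.0
        return bits_per_second / 8 / 1e6

    for bpi in (16_000, 21_000, 31_429):   # densities quoted in the text
        rate = media_rate_mbyte_s(bpi, track_diameter_in=3.5, rpm=3600)
        print(f"{bpi:6d} bpi -> {rate:.2f} Mbytes/s")

The rate scales linearly with bits per inch (roughly 1.3 to 2.6 Mbytes/s for these assumed geometry and speed), while the seek and rotational latency terms are untouched, which is the point made in the paragraph above.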
Solid-State Disks

Solid-state disks (SSDs), constructed from relatively slow memory chips, can be viewed either as a kind of large and slow main memory or as a small and high-speed disk. When viewed as large main memory, the SSD is often called expanded storage (ES). The expanded storage found in the IBM 3090 class machines [Buzen and Shum, 1986] supports operations for paging data blocks from and to main memory. Usually, the expanded storage looks to the system more like memory than an I/O device: it is directly attached to main memory through a high-speed bus rather than an I/O controller. The maximum transfer bandwidth on the IBM 3090 between expanded store and memory is two orders of magnitude faster than conventional devices: approximately 216 Mbytes/s, or one word every 18.5 ns. Further, unlike conventional devices, a transfer between memory and expanded storage is performed synchronously with the CPU. This is viewed as acceptable, because the transfer requires so little time and does not involve the usual operating system overheads of I/O set-up and interrupts. Note that to transfer data from ES to disk requires the data to be first staged into main memory.
The Cray X-MP and Y-MP also support SSDs, which can come in configurations of up to 4096 Mbytes, approximately four times the capacity of the DD-49. The SSD has the potential for enormous bandwidth. It can be attached to the Cray I/O system or directly to the CPU through up to two 1000-Mbyte/s channels. Access can be arranged in one of three ways [Reinhardt, 1988]. The first alternative is to treat the SSD as a logical disk, with users responsible for staging heavily accessed files to it. Unfortunately, this leads to the inevitable contention for SSD space. Further, the operating system's disk device drivers are not tuned for the special capabilities of SSDs, and some performance is lost. The second alternative is to use the SSD as an extended memory, in much the same manner as IBM's expanded storage. Special system calls for accessing the SSD bypass the usual disk-handling code, and a 4096-byte sector can be accessed in 25 ms. The last alternative is to use the SSD as a logical device cache, i.e., as a second-level cache for multitrack chunks of files that resides between the system's in-main memory file cache and the physical disk devices. Cray engineers have observed workload speedups for their UNIX-like operating system of a factor of four over conventional disk when the cache is enabled. These results indicate that SSDs are most appropriate for containing "hot spot" data [Gawlick, 1987]. Conventional wisdom has it that 20% of the data receives 80% of the accesses, and this has been widely observed in transaction processing systems [Gawlick, 1987].
If SSDs are to be used to replace magnetic disks, then they must be made nonvolatile, and herein lies their greatest weakness. This can be achieved through battery back-up, but the technique is controversial. First, it is difficult to verify that the batteries will be fully charged when needed, i.e., when conventional power fails. Second, it is difficult to determine how long is long enough when powering the SSD with batteries. This should probably be long enough to off-load the disk's contents to magnetic media. Fortunately, low-power DRAM and wafer scale integration technology are making feasible longer battery hold times.
Another weakness is their cost.
At the present time, there is a 10- to 20-fold difference in price between a megabyte of magnetic disk storage and a megabyte of DRAM. While wafer scale integration may bring this price down in the future, for the near term SSDs will be limited to a staging or caching function.

Disk Caches

Disk caches place buffer memories between the host and the device. If disk data is likely to be re-referenced, caches can be effective in eliminating the seek and rotational latencies. Unfortunately, this effectiveness depends critically on the access behavior of the applications. Truly random access with little re-referencing cannot make effective use of disk caches. However, applications that exhibit a large degree of sequential access can use a cache to good purpose, because data can be staged into the cache before it is actually requested.
Disk caches can become even more useful if they are made nonvolatile using the battery back-up techniques described in the previous subsection (and with the same potential problems). A nonvolatile cache will allow "fast writes": the application need not wait for the write I/O to actually complete before it is notified that it has completed. For some application environments, disk caches have the beneficial effect of reducing the number of reads and thus the number of I/O requests seen by the disks. This has the interesting side effect of increasing the percentage of writes found in the I/O mix, and some observers believe that writes may dominate I/O performance in future systems.
As already mentioned, a disk cache can also lead to better utilization of the host-to-device pathways. A device can transfer data into a cache even if the pathway is in use by another device on the same string. Thus caches are effective in avoiding rotational position sensing misses.

Disk Scheduling

The mechanical delays as seen by a set of simultaneous I/O requests can be reduced through effective disk scheduling. For example, seek times can be reduced if a shortest-seek-time-first scheduling algorithm is used [Smith, 1981]. That is, among the queue of pending I/O requests, the one next selected for service is the one that requires the shortest seek time from the current location of the read/write heads. The literature on disk scheduling algorithms is vast, and the effectiveness of a particular scheduling approach depends critically on the workload. It has been observed that scheduling algorithms work best when there are long queues of pending requests; unfortunately, this situation seems to occur rarely in existing systems [Smith, 1981].

Disk Arrays

An alternative to the approaches just described is to exploit parallelism by grouping together a number of physical disks and making these appear to applications as a single logical disk. This has the advantage that the bandwidth of several disks can be harnessed to service a single logical I/O request or can support multiple independent I/Os in parallel. Further, arrays can be constructed using existing, widely available disk technology, rather than the more specialized and more expensive approaches described in the previous subsection. For example, Cray offers a device called the DS-40, which appears as a single logical disk device but which is actually implemented internally as four drives. A logical track is constructed from sectors across the four disks. The DS-40 can transfer at a peak rate of 20 Mbyte/s, with a sustained transfer rate of 9.6 Mbyte/s, and thus is strictly faster than the DD-49.
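The idea of constructing a logical track from sectors spread across several drives can be illustrated with a simple address mapping. The four-drive organization below mirrors the DS-40 description, but the round-robin mapping itself is a generic striping sketch and an assumption, not Cray's actual layout.

    # Generic block-striping map for a disk array: logical sectors are interleaved
    # round-robin across the member drives, so one large logical transfer engages
    # all of the drives at once. This is a sketch, not the actual DS-40 layout.

    NUM_DRIVES = 4   # as in the four-drive DS-40 organization described above

    def map_sector(logical_sector, num_drives=NUM_DRIVES):
        """Return (drive index, physical sector on that drive)."""
        return logical_sector % num_drives, logical_sector // num_drives

    # A logical "track" of 8 sectors spreads evenly over the four drives:
    for ls in range(8):
        drive, phys = map_sector(ls)
        print(f"logical sector {ls} -> drive {drive}, physical sector {phys}")

A long sequential logical transfer keeps every drive busy and multiplies the usable bandwidth, while several small independent requests can also proceed in parallel when they happen to fall on different drives.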
Defining Terms

Arm: A mechanical assembly that positions the head to the correct track for reading or writing.
Bandwidth: The amount of data per unit time flowing between host computers and storage devices.
Cylinder: A stack of tracks at one actuator position.
Disk drive: An HDA plus all associated electronics.
Head: An electromagnet that produces switchable magnetic fields to read and record bit streams on a platter's track.
Head disk assembly (HDA): The collection of platters, heads, arms, and actuators, plus the air-tight casing, that makes up the storage device. Basically, this is everything but the electronics for controlling the drive and interfacing it to a computer system.
Latency: How long it takes to service an individual request.
Maximal areal density (MAD): The maximum number of bits that can be stored per square inch. Computed by multiplying the bits per inch in a disk track by the number of tracks per inch of media.
Platters: Metal disks covered with a magnetic material for recording information.
Rotational latency: The time it takes for the desired sector to rotate under the head position before it can be read or written.
Rotational position sensing: A technique in which the storage device interrupts the I/O controller when the desired sector is under the heads.
Sector: A unit of storage that is physically read or written at the same time.
Seek time: The time needed to position the heads over the track containing the desired data.
Spindle: The collection of disk platters.
Track buffer: A memory buffer embedded in the disk drive. It can hold the contents of the current disk track.
Tracks: The circular recording regions on a platter.
Transfer time: The time taken to physically transfer the bytes from disk to the host.
Throughput: The number of requests for disk service per unit time.
Winchester disk: A magnetic disk in which the read/write heads fly above the recording surface on an air bearing. This is in contrast to contact recording, such as a floppy disk, in which the head and the magnetic media are actually touching.

Related Topic

36.2 Magnetic Recording

References

I. Y. Bucher and A. H. Hayes, "I/O performance measurement on Cray-1 and CDC 7600 computers," Proceedings of the Cray Users Group Conference, October 1980.
J. Buzen, "BEST/1 analysis of the IBM 3880-13 cached storage controller," Proc. CMG XIII Conference, 1982.
J. P. Buzen and A. Shum, "I/O architecture in MVS/370 and MVS/XA," CMG Transactions, vol. 54, pp. 19–26, Fall 1986.
J. P. Buzen and A. Shum, "A unified operational treatment of RPS reconnect delays," Proc. 1987 Sigmetrics Conference, Performance Evaluation Review, vol. 15, no. 1, May 1987.
Cray Research, Inc., "CRAY Y-MP Computer Systems Functional Description Manual," HR-4001, January 1988.
P. D. Frank, "Advances in head technology," presentation at Challenges in Disk Technology Short Course, Institute for Information Storage Technology, University of Santa Clara, Santa Clara, Calif., December 12–15, 1987.
D. Gawlick, private communication, November 1987.
J. M. Harker et al., "A quarter century of disk file innovation," IBM Journal of Research and Development, vol. 25, no. 5, pp. 677–689, September 1981.
G. Houtekamer, "The local disk controller," Proc. 1985 Sigmetrics Conference, August 1985.
N. P. Kronenberg, H. Levy, and W. D. Strecker, "VAXClusters: A closely-coupled distributed system," ACM Trans. on Comp. Systems, vol. 4, no. 2, pp. 130–146, May 1986.
P. Massiglia, Digital Large System Mass Storage Handbook, Colorado Springs, Colo.: Digital Equipment Corporation, 1986.
S. Reinhardt, "A blueprint for the UNICOS operating system," Cray Channels, vol. 10, no. 3, pp. 20–24, Fall 1988.
A. J. Smith, "Input/output optimization and disk architectures: A survey," in Performance and Evaluation 1, North-Holland Publishing Company, 1981, pp. 104–117.
L. D. Stevens, "The evolution of magnetic storage," IBM Journal of Research and Development, vol. 25, no. 5, pp. 663–675, September 1981.
A. Vasudeva, "A case for disk array storage system," Proc. Reliability Conference, Santa Clara, Calif., 1988.
J. Voelcker, "Winchester disks reach for a gigabyte," IEEE Spectrum, pp. 64–67, February 1987.

Further Information

International Business Machines (IBM) Corporation developed the first rotating magnetic storage device in the mid-1950s and has always been an industry leader in storage. In honor of the 25th anniversary of the invention of the magnetic disk, IBM's Journal of Research and Development in September 1981 reviewed the development of the technology up to that time. Two particularly notable papers are

L. D. Stevens, "The evolution of magnetic storage," IBM Journal of Research and Development, vol. 25, no. 5, pp. 663–675, September 1981.
J. M. Harker et al., "A quarter century of disk file innovation," IBM Journal of Research and Development, vol. 25, no. 5, pp. 677–689, September 1981.

For a more up-to-date review of progress in the disk drive industry, see:

J. Voelcker, "Winchester disks reach for a gigabyte," IEEE Spectrum, pp. 64–67, February 1987.

80.3 Magnetic Tape 1

Peter A. Lee

Computers depend on memory to execute programs and to store program code and data. They also need access to stored program code and data in a nonvolatile memory (i.e., a form in which the information is not lost when the power is removed from the computer system). Different types of memory have been developed for different tasks. This memory can be categorized according to its price per bit, access time, and other parameters. Table 80.1 shows a typical hierarchy for memory which places the smallest and fastest memory at the top in level 0 and, in general, the largest, slowest, and cheapest at the bottom in level 4 [Ciminiera and Valenzano, 1987]. Auxiliary (secondary or mass) memory of level 4 forms the large storage capacity for program code and data that are not currently required by the CPU. This is usually nonvolatile and is at a low cost per bit. Computer magnetic tape falls within this category and is the subject of this section.

A Brief Historical Review

Probably the first recorded storage device, developed by Schickard in 1623, used mechanical positions of cogs and gears to work a semi-automatic calculator. Then came Pascal's calculating machine based on 10 digits per wheel. In 1812 punched cards were used in weaving looms to store patterns for woven material. Since that time there have been many mechanical and, latterly, electromechanical devices developed for memory and storage. In 1948 at Manchester University in England the cathode ray tube (Williams tube) store and the magnetic drum were developed; the Williams tube stored between 1024 and 1280 bits, and the magnetic drum had a capacity of 120K bits. Cambridge University developed the mercury delay line in 1949, which represented the first fully operational delay line memory, consisting of 576 bits per tube with a total capacity of 18K bits and a circulation time of 1.1 ms.
The first commercial computer with a magnetic tape system was introduced in 1951. The UNIVAC I had a magnetic tape system of 1.44M bits on 150 feet of tape and was capable of storing 128 characters per inch. The tape could be read at a rate of 100 ips. Optical memories are now available as very fast storage devices and will replace magnetic storage in the next few years. At present these devices are expensive, although it is envisaged that optical disks with large silicon caches will be the storage arrangement of the future, where computer systems utilizing CAD software and image processing can take advantage of the large storage capacities with fast access times. In the future, semiconductor memories are likely to continue their advancing trend.

Introduction

Today's microprocessors are capable of addressing up to 16 Mbytes of main memory. To take advantage of this large capacity, it is usual to have several programs residing in memory at the same time. With intelligent memory management units (MMUs), the programs can be swapped in and out of the main memory to the auxiliary memory when required. For the system to keep pace with this program swapping, it must have a fast auxiliary memory to write to. In the past, most auxiliary systems like magnetic tape and disks have had slow access times, and this has meant that expensive systems have evolved to cater for this requirement. Now that auxiliary memory has improved, with fast access times and low cost per bit, computer systems have been developed that provide memory swapping with large nonvolatile storage systems. Although the basic technology has not changed over the last 20 years, new materials and different approaches have meant that a new form of auxiliary memory has been brought to the market at very low cost.

1 Based on P. A. Lee, "Memory subsystems," in Digital Systems Reference Book, B. Holdsworth and G. R. Martin, Eds., Oxford: Butterworth-Heinemann, 1991, chap. 2.6. With permission.

Magnetic Tape

Magnetic tape currently provides the cheapest form of storage for large quantities of computer data in a nonvolatile form. The tape is arranged on a reel and has several different packaging styles. It is made from a polyester transportation layer with a deposited layer of oxide having a property similar to a ferrite material with a large hysteresis. Magnetic tape is packaged either in a cartridge, on a reel, or in a cassette.

The magnetic cartridge is manufactured in several tape lengths and cartridge sizes capable of storing up to 2 Gbytes of data. These can be purchased in many popular preformatted styles. The magnetic tape reel is usually 1/2 inch or 1 inch wide and has lengths of 600, 1200, and 2400 feet. Most reels can store data at densities from 800 bits per inch (bpi) up to 6250 bpi. The reel-to-reel magnetic tape reader is generally bulkier and more expensive than the cartridge readers due to the complicated pneumatic drive mechanisms, but it provides a large data storage capacity with high access speeds [Wiehler, 1974]. An example of a typical magnetic tape drive with the reel-to-reel arrangement is shown in Fig. 80.16. A cheap storage medium is the magnetic cassette. Based on the audio cassette, this uses the normal audio cassette recorder for reading and writing data via the standard Kansas City interface through a serial computer I/O line. A logic data "1" is recorded by a high frequency and a logic data "0" by a lower frequency.
High-density cassettes can store up to 60 Mbytes of data on each tape and are popular with the computer games market as a cheap storage medium for program distribution.

Both reel-to-reel and cartridge tapes are generally organised by using nine separate tracks across the tape, as shown in Fig. 80.17(a). Each track has its own read and write head operated independently from other tracks [see Fig. 80.17(b)]. Tracks 1 to 8 are used for data and track 9 for the parity bit. Data is written on the tape in rows of magnetized islands, using for example EBCDIC (Extended Binary Coded Decimal Interchange Code).

Each read/write head is shaped from a ferromagnetic material with an air gap about 1 µm wide, as seen in Fig. 80.18. The writing head is concerned with converting an electrical pulse into a magnetic state and can be magnetized in one of two directions. This is done by passing a current through the magnetic coil, which sets up a leakage field across the 1-µm gap. When the current is reversed the field across the gap is changed, reversing the polarity of the magnetic field on the tape. The head magnetizes the passing magnetic tape, recording the state of the magnetic field in the air gap. A logic 1 is recorded as a change in polarity on the tape, and a logic 0 is recorded as no change in polarity, as seen in Fig. 80.19. Reading the magnetic states from the tape and converting them to electrical signals is done by the read head. The bit sequences in Fig. 80.19 show the change in magnetic states on the tape. When the tape is passed over the read head, it induces a voltage into the magnetic coil which is converted to digital levels to retrieve the original data.

TABLE 80.1  Memory Hierarchy

Level      Data               Code                    MMU
Level 0    CPU register       Instruction registers   MMU registers
Level 1    Data cache         Instruction cache       MMU memory
Level 2    On-board cache
Level 3    Main memory
Level 4    Auxiliary memory

Source: P. A. Lee, "Memory subsystems," in Digital Systems Reference Book, B. Holdsworth and G. R. Martin, Eds., Oxford: Butterworth-Heinemann, 1991, p. 2.6/3. With permission.

FIGURE 80.16 (a) Magnetic tape drive. (b) Magnetic tape reel arrangement. (Source: K. London, Introduction to Computers, London: Faber and Faber, 1986, p. 141. With permission.)

FIGURE 80.17 Magnetic tape format. (Source: P. A. Lee, "Memory subsystems," in Digital Systems Reference Book, B. Holdsworth and G. R. Martin, Eds., Oxford: Butterworth-Heinemann, 1991, p. 2.6/11. With permission.)

FIGURE 80.18 Read/write head layout. (Source: P. A. Lee, "Memory subsystems," in Digital Systems Reference Book, B. Holdsworth and G. R. Martin, Eds., Oxford: Butterworth-Heinemann, 1991, p. 2.6/12. With permission.)

Tape Format

Information is stored on magnetic tape in the form of a coherent sequence of rows forming a block. This usually corresponds to a page of computer memory and is the minimum amount of data written to or read from magnetic tape with each program statement. Each block of data is separated by a block gap which is approximately 15 mm long and has no data stored in it. This is shown in Fig. 80.20. Block gaps are used to allow the tape to accelerate to its operational speed and to decelerate when stopping at the end of a block. Block gaps can use up to 50% of the tape space available for recording; this may be reduced by making the block sizes larger, but larger blocks have the disadvantage of requiring larger memory buffers to accommodate the data.

FIGURE 80.19 Write and read pulses on magnetic tape.
(Source: P. A. Lee, "Memory subsystems," in Digital Systems Reference Book, B. Holdsworth and G. R. Martin, Eds., Oxford: Butterworth-Heinemann, 1991, p. 2.6/12. With permission.)

FIGURE 80.20 Magnetic tape format. (Source: P. A. Lee, "Memory subsystems," in Digital Systems Reference Book, B. Holdsworth and G. R. Martin, Eds., Oxford: Butterworth-Heinemann, 1991, p. 2.6/12. With permission.)

A number of blocks make up a file, identified by a tape file marker which is written to the tape by the tape controller. The entire length of tape is enclosed between the beginning-of-tape and end-of-tape markers. These normally consist of a photosensitive material that triggers sensors on the read/write heads. When a new tape is loaded, it normally advances to the beginning-of-tape marker, and it is then ready for access by the CPU. The end-of-tape marker is used to prevent the tape from running off the end of the spool and indicates the limit of the storage length.

Recording Modes

Several recording modes are used, with the objective of storing data at the highest density while keeping the retrieved data as free of corruption as possible. Two popular but contrasting modes are the non-return-to-zero (NRZ) and phase encoding (PE) modes. These are incompatible, although some magnetic tape drives have detectors to sense the mode and operate in a bimodal way.

The NRZ technique is shown in Fig. 80.19, where only the 1 bit is displayed by a reversal of magnetization on the tape. The magnetic polarity remains unchanged for logic 0. An external clock track is also required for this mode because a pulse is not always generated for each row of data on the tape.

The PE technique allows both the 0 and 1 states to be displayed by changes of magnetization. A 1 bit is given by a north-to-north pole on the tape, and a 0 bit is given by a south-to-south pole on the tape. PE provides approximately double the recording density, and hence data rate, of NRZ. PE tapes carry an identification mark called a burst, which consists of successive magnetization changes at the beginning of track 4. This allows the tape drive to recognize the tape mode and configure itself accordingly.

Defining Terms

Access time: The cycle time for the computer store to present information to the CPU. Access times vary from less than 40 ns for level 0 register storage up to tens of seconds for magnetic tape storage.
Auxiliary (secondary, mass, or backing) storage: Computer stores which have a capacity to store enormous amounts of information in a nonvolatile form. This type of memory has an access time usually greater than that of main memory and consists of magnetic tape drives, magnetic disk stores, and optical disk stores.
Ferromagnetic material: Materials that exhibit strong magnetic properties. These include metals such as cobalt, iron, and some alloys.
Magnetic tape: A polyester film sheet coated with a ferromagnetic powder, which is used extensively in auxiliary memory. It is produced on a reel, in a cassette, or in a cartridge transportation medium.
Nonvolatile memory: The class of computer memory that retains its stored information when the power supply is cut off. It includes magnetic tape, magnetic disks, flash memory, and most types of ROM.

Related Topic

36.2 Magnetic Recording

References

L. Ciminiera and A. Valenzano, Advanced Microprocessor Architectures, Reading, Mass.: Addison-Wesley, 1987.
B. Holdsworth and G. Martin, Eds., Digital Systems Reference Book, Oxford: Butterworth-Heinemann, 1991, pp. 2.6/1–2.6/11.
R. Hyde, "Overview of memory management," Byte, pp. 219–225, April 1988.
J. Isailović, Video Disc and Optical Memory Systems, Englewood Cliffs, N.J.: Prentice-Hall, 1985.
K. London, Introduction to Computers, London: Faber and Faber, 1986, p. 141.
M. Mano, Computer Systems Architecture, Englewood Cliffs, N.J.: Prentice-Hall, 1982.
R. Matick, Computer Storage Systems & Technology, New York: John Wiley, 1977.
A. Tanenbaum, Structured Computer Organisation, Englewood Cliffs, N.J.: Prentice-Hall, 1990.
G. Wiehler, Magnetic Peripheral Data Storage, Heydon & Son, 1974.

Further Information

The IEEE Transactions on Magnetics is available from the IEEE Service Center, Customer Service Department, 445 Hoes Lane, Piscataway, NJ 08855-1331; 800-678-IEEE (outside the USA: 908-981-0060). An IEEE-sponsored Conference on Magnetism and Magnetic Materials was held in December 1992.

The British Tape Industry Association (BTIA) has a computer media committee, and further information on standards, etc., can be obtained from the British Tape Industry Association, Carolyn House, 22-26 Dingwall Road, Croydon CR0 9XF, England. The equivalent American association also provides information on computer tape and can be contacted at the International Tape Manufacturers' Association, 505 Eighth Avenue, New York, NY 10018.

80.4 Magneto-Optical Disk Data Storage

M. Mansuripur

Since the early 1940s, magnetic recording has been the mainstay of electronic information storage worldwide. Audio tapes provided the first major application for the storage of information on magnetic media. Magnetic tape has been used extensively in consumer products such as audio tapes and video cassette recorders (VCRs); it has also found application in backup/archival storage of computer files, satellite images, medical records, etc. Large volumetric capacity and low cost are the hallmarks of tape data storage, although sequential access to the recorded information is perhaps the main drawback of this technology. Magnetic hard disk drives have been used as mass storage devices in the computer industry ever since their inception in 1957. With an areal density that has doubled roughly every other year, hard disks have been and remain the medium of choice for secondary storage in computers. 1 Another magnetic data storage device, the floppy disk, has been successful in areas where compactness, removability, and fairly rapid access to the recorded information have been of prime concern. In addition to providing backup and safe storage, inexpensive floppies with their moderate capacities (2 Mbyte on a 3.5-in. diameter platter is typical nowadays) and reasonable transfer rates have provided the crucial function of file/data transfer between isolated machines. All in all, it has been a great half-century of progress and market dominance for magnetic recording devices, which are only now beginning to face a potentially serious challenge from the technology of optical recording.

Like magnetic recording, a major application area for optical data storage systems is the secondary storage of information for computers and computerized systems. Like the high-end magnetic media, optical disks can provide recording densities in the range of 10⁷ bits/cm² and beyond. The added advantage of optical recording is that, like floppies, these disks can be removed from the drive and stored on the shelf.
Thus the functions of the hard disk (i.e., high capacity, high data transfer rate, rapid access) may be combined with those of the floppy (i.e., backup storage, removable media) in a single optical disk drive. Applications of optical recording are not confined to computer data storage. The enormously successful audio compact disk (CD), which was introduced in 1983 and has since become the de facto standard of the music industry, is but one example of the tremendous potential of optical technology.

A strength of optical recording is that, unlike its magnetic counterpart, it can support read-only, write-once, and erasable/rewritable modes of data storage. Consider, for example, the technology of optical audio/video disks. Here the information is recorded on a master disk which is then used as a stamper to transfer the embossed patterns to a plastic substrate for rapid, accurate, and inexpensive reproduction. The same process is employed in the mass production of read-only files (CD-ROM, O-ROM) which are now being used to distribute software, catalogues, and other large databases. Or consider the write-once read-many (WORM) technology, where one can permanently store massive amounts of information on a given medium and have rapid, random access to them afterwards. The optical drive can be designed to handle read-only, WORM, and erasable media all in one unit, thus combining their useful features without sacrificing performance and ease of use or occupying too much space. What is more, the media can contain regions with prerecorded information as well as regions for read/write/erase operations, both on the same platter. These possibilities open new vistas and offer opportunities for applications that have heretofore been unthinkable; the interactive video disk is perhaps a good example of such applications.

1 At the time of this writing, achievable densities on hard disks are in the range of 10⁷ bits/cm². Random access to arbitrary blocks of data in these devices can take on the order of 10 ms, and individual read/write heads can transfer data at the rate of several megabits per second.

In this article we will lay out the conceptual basis for optical data storage systems; the emphasis will be on disk technology in general and the magneto-optical disk in particular. The first section is devoted to a discussion of some elementary aspects of disk data storage, including the concept of track and the definition of access time. The second section describes the basic elements of the optical path and its functions; included are the properties of the semiconductor laser diode, characteristics of the beam-shaping optics, and certain features of the focusing objective lens. Because of the limited depth of focus of the objective and the eccentricity of tracks, optical disk systems must have a closed-loop feedback mechanism for maintaining the focused spot on the right track. These mechanisms are described in the third and fourth sections for automatic focusing and automatic track following, respectively. The physical process of thermomagnetic recording in magneto-optic (MO) media is described next, followed by a discussion of the MO readout process in the sixth section. The final section describes the properties of the MO media.

Preliminaries and Basic Definitions

A disk, whether magnetic or optical, consists of a number of tracks along which the information is recorded. These tracks may be concentric rings of a certain width, W_t, as shown in Fig. 80.21.
Neighboring tracks may be separated from each other by a guard band whose width we shall denote by W_g. In the least sophisticated recording scheme imaginable, marks of length D_0 are recorded along these tracks. Now, if each mark can be in either one of two states, present or absent, it may be associated with a binary digit, 0 or 1. When the entire disk surface of radius R is covered with such marks, its capacity C_0 will be

    C_0 = πR² / [(W_t + W_g) D_0]   bits per surface    (80.2)

Consider the parameter values typical of current optical disk technology: R = 67 mm corresponding to 5.25-in. diameter platters, D_0 = 0.5 µm which is roughly determined by the wavelength of the read/write laser diodes, and W_t + W_g = 1 µm for the track pitch. The disk capacity will then be around 28 × 10⁹ bits, or 3.5 gigabytes. This is a reasonable estimate and one that is fairly close to reality, despite the many simplifying assumptions made in its derivation. In the following paragraphs we examine some of these assumptions in more detail.

FIGURE 80.21 Physical appearance and general features of an optical disk. The read/write head gains access to the disk through a window in the jacket; the jacket itself is for protection purposes only. The hub is the mechanical interface with the drive for mounting and centering the disk on the spindle. The track shown at radius r_0 is of the concentric-ring type.

The disk was assumed to be fully covered with information-carrying marks. This is generally not the case in practice. Consider a disk rotating at ν revolutions per second (rps). For reasons to be clarified later, this rotational speed should remain constant during the disk operation. Let the electronic circuitry have a fixed clock duration T_c. Then only pulses of length T_c (or an integer multiple thereof) may be used for writing. Now, a mark written along a track of radius r, with a pulse width equal to T_c, will have length ℓ, where

    ℓ = 2πνrT_c    (80.3)

Thus for a given rotational speed ν and a fixed clock cycle T_c, the minimum mark length ℓ is a linear function of track radius r, and ℓ decreases toward zero as r approaches zero. One must, therefore, pick a minimum usable track radius, r_min, where the spatial extent of the recorded marks is always greater than the minimum allowed mark length, D_0. Equation (80.3) yields

    r_min = D_0 / (2πνT_c)    (80.4)

One may also define a maximum usable track radius r_max, although for present purposes r_max = R is a perfectly good choice. The region of the disk used for data storage is thus confined to the area between r_min and r_max. The total number N of tracks in this region is given by

    N = (r_max − r_min) / (W_t + W_g)    (80.5)

The number of marks on any given track in this scheme is independent of the track radius; in fact, the number is the same for all tracks, since the period of revolution of the disk and the clock cycle uniquely determine the total number of marks on any individual track. Multiplying the number of usable tracks N by the capacity per track, we obtain for the usable disk capacity

    C = N / (νT_c)    (80.6)

Replacing for N from Eq. (80.5) and for νT_c from Eq. (80.4), we find

    C = 2π r_min (r_max − r_min) / [(W_t + W_g) D_0]    (80.7)

If the capacity C in Eq. (80.7) is considered a function of r_min with the remaining parameters held constant, it is not difficult to show that maximum capacity is achieved when

    r_min = ½ r_max    (80.8)

With this optimum r_min, the value of C in Eq. (80.7) is only half that of C_0 in Eq. (80.2). In other words, the estimate of 3.5 gigabytes per side for 5.25-in. disks seems to have been optimistic by a factor of two.
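As a quick check of these estimates, the short sketch below evaluates Eqs. (80.2), (80.7), and (80.8) for the parameter values quoted above (R = r_max = 67 mm, D_0 = 0.5 µm, 1-µm track pitch). The numbers are illustrative only.

```python
from math import pi

# Illustrative evaluation of Eqs. (80.2), (80.7), and (80.8) using the
# parameter values quoted in the text (all lengths in meters).
R = r_max = 67e-3          # disk radius / maximum usable track radius
D0 = 0.5e-6                # minimum mark length
track_pitch = 1e-6         # W_t + W_g

C0 = pi * R**2 / (track_pitch * D0)                         # Eq. (80.2)
r_min = 0.5 * r_max                                          # Eq. (80.8)
C = 2 * pi * r_min * (r_max - r_min) / (track_pitch * D0)    # Eq. (80.7)

print(f"C0 = {C0:.2e} bits (~{C0 / 8e9:.1f} Gbytes)")   # ~2.8e10 bits, ~3.5 Gbytes
print(f"C  = {C:.2e} bits (~{C / 8e9:.1f} Gbytes)")     # half of C0, ~1.8 Gbytes
```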
One scheme often proposed to enhance the capacity entails the use of multiple zones, where either the rotation speed ν or the clock period T_c is allowed to vary from one zone to the next. In general, zoning schemes can reduce the minimum usable track radius below that given by Eq. (80.8). More importantly, however, they allow tracks with larger radii to store more data than tracks with smaller radii. The capacity of the zoned disk is somewhere between C of Eq. (80.7) and C_0 of Eq. (80.2), the exact value depending on the number of zones implemented.

A fraction of the disk surface area is usually reserved for preformat information and cannot be used for data storage. Also, prior to recording, additional bits are generally added to the data for error correction coding and other housekeeping chores. These constitute a certain amount of overhead on the user data and must be allowed for in determining the capacity. A good rule of thumb is that overhead consumes approximately 20% of the raw capacity of an optical disk, although the exact number may vary among the systems in use. Substrate defects and film contaminants introduced during the deposition process can create bad sectors on the disk. These are typically identified during the certification process and are marked for elimination from the sector directory. Needless to say, bad sectors must be discounted when evaluating the capacity.

Modulation codes may be used to enhance the capacity beyond what has been described so far. Modulation coding does not modify the minimum mark length of D_0, but frees the longer marks from the constraint of being integer multiples of D_0. The use of this type of code results in more efficient data storage and an effective number of bits per D_0 that is greater than unity. For example, the popular (2, 7) modulation code has an effective bit density of 1.5 bits per D_0. This or any other modulation code can increase the disk capacity beyond the estimate of Eq. (80.7).

The Concept of Track

The information on magnetic and optical disks is recorded along tracks. Typically, a track is a narrow annulus at some distance r from the disk center. The width of the annulus is denoted by W_t, while the width of the guard band, if any, between adjacent tracks is denoted by W_g. The track pitch is the center-to-center distance between neighboring tracks and is therefore equal to W_t + W_g. A major difference between the magnetic floppy disk, the magnetic hard disk, and the optical disk is that their respective track pitches are presently of the order of 100, 10, and 1 µm. Tracks may be fictitious entities, in the sense that no independent existence outside the pattern of recorded marks may be ascribed to them. This is the case, for example, with the audio compact disk format, where prerecorded marks simply define their own tracks and help guide the laser beam during readout. In the other extreme are tracks that are physically engraved on the disk surface before any data is ever recorded. Examples of this type of track are provided by pregrooved WORM and magneto-optical disks. Figure 80.22 shows micrographs from several recorded optical disk surfaces. The tracks along which the data are written are clearly visible in these pictures.

It is generally desired to keep the read/write head stationary while the disk spins and a given track is being read from or written onto.
Thus, in an ideal situation, not only should the track be perfectly circular, but also the disk must be precisely centered on the spindle axis. In practical systems, however, tracks are neither precisely circular, nor are they concentric with the spindle axis. These eccentricity problems are solved in low-performance floppy drives by making tracks wide enough to provide tolerance for misregistrations and misalignments. Thus the head moves blindly to a radius where the track center is nominally expected to be and stays put until the reading or writing is over. By making the head narrower than the track pitch, the track center is allowed to wobble around its nominal position without significantly degrading the performance during the read/write operation. This kind of wobble, however, is unacceptable in optical disk systems, which have a very narrow track, about the same size as the focused beam spot. In a typical situation arising in practice, the eccentricity of a given track may be as much as ±50 µm while the track pitch is only about 1 µm, thus requiring active track-following procedures.

One method of defining tracks on an optical disk is by means of pregrooves that are either etched, stamped, or molded onto the substrate. In grooved media of optical storage, the space between neighboring grooves is the so-called land [see Fig. 80.23(a)]. Data may be written in the grooves with the land acting as a guard band. Alternatively, the land regions may be used for recording while the grooves separate adjacent tracks. The groove depth is optimized for generating an optical signal sensitive to the radial position of the read/write laser beam. For the push-pull method of track-error detection the groove depth is in the neighborhood of λ/8, where λ is the wavelength of the laser beam.

In digital data storage applications, each track is divided into small segments or sectors, intended for the storage of a single block of data (typically either 512 or 1024 bytes). The physical length of a sector is thus a few millimeters. Each sector is preceded by header information such as the identity of the sector, the identity of the corresponding track, synchronization marks, etc. The header information may be preformatted onto the substrate, or it may be written on the storage layer prior to shipping the disk. Pregrooved tracks may be "carved" on the optical disk either as concentric rings or as a single continuous spiral. There are certain advantages to each format. A spiral track can contain a succession of sectors without interruption, whereas concentric rings may each end up with some empty space that is smaller than the required length for a sector. Also, large files may be written onto (and read from) spiral tracks without jumping to the next track, which occurs when concentric tracks are used. On the other hand, multiple-pass operations such as write-and-verify or erase-and-write, which require two passes each for a given sector, or still-frame video are more conveniently handled on concentric-ring tracks.

Another track format used in practice is based on the sampled-servo concept. Here the tracks are identified by occasional marks placed permanently on the substrate at regular intervals, as shown in Fig. 80.23. Details of track following by the sampled-servo scheme will follow shortly; suffice it to say at this point that servo marks help the system identify the position of the focused spot relative to the track center.
Once the position is determined it is fairly simple to steer the beam and adjust its position.

FIGURE 80.22 Micrographs of several types of optical storage media. The tracks are straight and narrow (track pitch = 1.6 µm), with an orientation angle of ≈ –45°. (A) Ablative, write-once tellurium alloy. (B) Ablative, write-once organic dye. (C) Amorphous-to-crystalline, write-once phase-change alloy GaSb. (D) Erasable, amorphous magneto-optic alloy GdTbFe. (E) Erasable, crystalline-to-amorphous phase-change tellurium alloy. (F) Read-only CD-audio, injection-molded from polycarbonate with a nickel stamper. (Source: Ullmann's Encyclopedia of Industrial Chemistry, 5th ed., vol. A14, Weinheim: VCH, 1989, p. 196. With permission.)

Disk Rotation Speed

When a disk rotates at a constant angular velocity ω, a track of radius r moves with the constant linear velocity V = rω. Ideally, one would like to have the same linear velocity for all the tracks, but this is impractical except in a limited number of situations. For instance, when the desired mode of access to the various tracks is sequential, such as in audio and video disk applications, it is possible to place the head in the beginning at the inner radius and move outward from the center thereafter while continuously decreasing the angular velocity. By keeping the product of r and ω constant, one can thus achieve constant linear velocity for all the tracks. 1 Sequential access mode, however, is the exception rather than the norm in data storage systems. In most applications, the tracks are accessed randomly with such rapidity that it becomes impossible to adjust the rotation speed for constant linear velocity. Under these circumstances, the angular velocity is best kept constant during the normal operation of the disk. Typical rotation speeds are 1200 and 1800 rpm for slower drives and 3600 rpm for the high data rate systems. Higher rotation rates (5000 rpm and beyond) are certainly feasible and will likely appear in future storage devices.

Access Time

The direct-access storage device, or DASD, used in computer systems for the mass storage of digital information, is a disk drive capable of storing large quantities of data and accessing blocks of this data rapidly and in arbitrary order. In read/write operations it is often necessary to move the head to new locations in search of sectors containing specific data items. Such relocations are usually time-consuming and can become the factor that limits performance in certain applications. The access time t_a is defined as the average time spent in going from one randomly selected spot on the disk to another. t_a can be considered the sum of a seek time, t_s, which is the average time needed to acquire the target track, and a latency, t_l, which is the average time spent on the target track waiting for the desired sector. Thus,

    t_a = t_s + t_l    (80.9)

The latency is half the revolution period of the disk, since a randomly selected sector is, on the average, halfway along the track from the point where the head initially lands. Thus for a disk rotating at 1200 rpm t_l = 25 ms, while at 3600 rpm t_l ≈ 8.3 ms. The seek time, on the other hand, is independent of the rotation speed, but is determined by the traveling distance of the head during an average seek, as well as by the mechanism of head actuation. It can be shown that the average length of travel in a random seek is one third of the full stroke. (In our notation the full stroke is r_max – r_min.)
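The latency figures quoted above follow directly from the revolution period, and Eq. (80.9) then gives the access time once a seek time is known. The sketch below illustrates the arithmetic; the 30-ms seek time is an assumed placeholder, not a figure taken from the text.

```python
# Rotational latency (half a revolution) and average access time, Eq. (80.9).
# The 30-ms seek time is an assumed placeholder value for illustration only.

def rotational_latency_ms(rpm):
    """Average latency = half the revolution period, in milliseconds."""
    revolution_period_ms = 60_000.0 / rpm
    return revolution_period_ms / 2.0

t_s = 30.0                                   # assumed average seek time, ms
for rpm in (1200, 1800, 3600):
    t_l = rotational_latency_ms(rpm)         # 25.0, 16.7, 8.3 ms
    t_a = t_s + t_l                          # Eq. (80.9)
    print(f"{rpm} rpm: t_l = {t_l:.1f} ms, t_a = {t_a:.1f} ms")
```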
In magnetic disk drives, where the head/actuator assembly is relatively lightweight (a typical Winchester head weighs about 5 grams), the acceleration and deceleration periods are short, and seek times are typically around 10 ms in small drives (i.e., 5.25 and 3.5 in.). In optical disk systems, on the other hand, the head, being an assembly of discrete elements, is fairly large and heavy (typical weight ≈ 100 grams), resulting in values of t_s that are several times greater than those obtained in magnetic recording systems. The seek times reported for commercially available optical drives presently range from 20 ms in high-performance 3.5-in. drives to about 80 ms in larger drives. We emphasize, however, that the optical disk technology is still in its infancy; with the passage of time, the integration and miniaturization of the elements within the optical head will surely produce lightweight devices capable of achieving seek times of the order of a few milliseconds.

1 In compact disk players the linear velocity is kept constant at 1.2 m/s. The starting position of the head is at the inner radius r_min = 25 mm, where the disk spins at 460 rpm. The spiral track ends at the outer radius r_max = 58 mm, where the disk's angular velocity is 200 rpm.

FIGURE 80.23 (a) Lands and grooves in an optical disk. The substrate is transparent, and the laser beam must pass through it before reaching the storage medium. (b) Sampled-servo marks in an optical disk. These marks, which are offset from the track center, provide information regarding the position of the focused spot.

The Optical Path

The optical path begins at the light source which, in practically all laser disk systems in use today, is a semiconductor GaAs diode laser. Several unique features have made the laser diode indispensable in optical recording technology, not only for the readout of stored information but also for writing and erasure. The small size of this laser has made possible the construction of compact head assemblies, its coherence properties have enabled diffraction-limited focusing to extremely small spots, and its direct modulation capability has eliminated the need for external modulators. The laser beam is modulated by controlling the injection current; one applies pulses of variable duration to turn the laser on and off during the recording process. The pulse duration can be as short as a few nanoseconds, with rise and fall times typically less than 1 ns. Although readout can be accomplished at constant power level, i.e., in CW mode, it is customary for noise reduction purposes to modulate the laser at a high frequency (e.g., several hundred megahertz) during readout.

Collimation and Beam Shaping

Since the cross-sectional area of the active region in a laser diode is only about one square micrometer, diffraction effects cause the emerging beam to diverge rapidly. This phenomenon is depicted schematically in Fig. 80.24(a). In practical applications of the laser diode, the expansion of the emerging beam is arrested by a collimating lens, such as that shown in Fig. 80.24(b). If the beam happens to have aberrations (astigmatism is particularly severe in diode lasers), then the collimating lens must be designed to correct this defect as well. In optical recording it is most desirable to have a beam with circular cross section. The need for shaping the beam arises from the special geometry of the laser cavity with its rectangular cross section.
Since the emerging beam has different dimensions in the directions parallel and perpendicular to the junction, its cross section at the collimator becomes elliptical, with the initially narrow dimension expanding more rapidly to become the major axis of the ellipse. The collimating lens thus produces a beam with elliptical cross section. Circularization may be achieved by bending various rays of the beam at a prism, as shown in Fig. 80.24(c). The bending changes the beam's diameter in the plane of incidence but leaves the diameter in the perpendicular direction intact.

FIGURE 80.24 (a) Away from the facet, the output beam of a diode laser diverges rapidly. In general, the beam diameter along X is different from that along Y, which makes the cross section of the beam elliptical. Also, the radii of curvature R_x and R_y are not the same, thus creating a certain amount of astigmatism in the beam. (b) Multi-element collimator lens for laser diode applications. Aside from collimating, this lens also corrects astigmatic aberrations of the beam. (c) Beam shaping by deflection at a prism surface. θ_1 and θ_2 are related by Snell's law, and the ratio d_2/d_1 is the same as cos θ_2/cos θ_1. Passage through the prism circularizes the elliptical cross section of the beam.

Focusing by the Objective Lens

The collimated and circularized beam of the diode laser is focused on the surface of the disk using an objective lens. The objective is designed to be aberration-free, so that its focused spot size is limited only by the effects of diffraction. Figure 80.25(a) shows the design of a typical objective made from spherical optics. According to the classical theory of diffraction, the diameter of the beam, d, at the objective's focal plane is given by

    d ≈ λ/NA    (80.10)

where λ is the wavelength of light and NA is the numerical aperture of the objective. 1

In optical recording it is desired to achieve the smallest possible spot, since the size of the spot is directly related to the size of marks recorded on the medium. Also, in readout, the spot size determines the resolution of the system. According to Eq. (80.10) there are two ways to achieve a small spot: first by reducing the wavelength and, second, by increasing the numerical aperture of the objective. The wavelengths currently available from GaAs lasers are in the range of 670–840 nm. It is possible to use a nonlinear optical device to double the frequency of these diode lasers, thus achieving blue light. Good efficiencies have been demonstrated by frequency doubling. Also, recent developments in II–VI materials have improved the prospects for obtaining green and blue light directly from semiconductor lasers. Consequently, there is hope that in the near future optical storage systems will operate in the wavelength range of 400–500 nm. As for the numerical aperture, current practice is to use a lens with NA ≈ 0.5–0.6. Although this value might increase slightly in the coming years, much higher numerical apertures are unlikely, since they put strict constraints on the other characteristics of the system and limit the tolerances. For instance, the working distance at high numerical aperture is relatively short, making access to the recording layer through the substrate more difficult. The smaller depth of focus of a high numerical aperture lens will make attaining/maintaining proper focus more of a problem, while the limited field of view might restrict automatic track-following procedures.
A small field of view also places constraints on the possibility of read/write/erase operations involving multiple beams.

The depth of focus of a lens, δ, is the distance away from the focal plane over which tight focus can be maintained [see Fig. 80.25(b)]. According to the classical diffraction theory,

    δ ≈ λ/NA²    (80.11)

Thus for a wavelength of λ = 700 nm and NA = 0.6, the depth of focus is about ±1 µm. As the disk spins under the optical head at the rate of several thousand rpm, the objective lens must stay within a distance of f ± δ from the active layer if proper focus is to be maintained. Given the conditions under which drives usually operate, it is impossible to make mechanical systems rigid enough to yield the required positioning tolerances. On the other hand, it is fairly simple to mount the objective lens in an actuator capable of adjusting its position with the aid of closed-loop feedback control. We shall discuss the technique of automatic focusing in the next section. For now, let us emphasize that by going to shorter wavelengths and/or larger numerical apertures (as is required for attaining higher data densities) one will have to face a much stricter regime as far as automatic focusing is concerned. Increasing the numerical aperture is particularly worrisome, since δ drops with the square of NA.

A source of spherical aberrations in optical disk systems is the substrate through which the light must travel to reach the active layer of the disk. Figure 80.25(c) shows the bending of the rays at the disk surface that causes the aberration. This problem can be solved by taking into account the effects of the substrate in the design of the objective, so that the lens is corrected for all aberrations including those arising at the substrate. Recent developments in molding of aspheric glass lenses have gone a long way in simplifying the lens design problem. Figure 80.26 shows a pair of molded glass aspherics designed for optical disk system applications; both the collimator and the objective are single-element lenses and are corrected for aberrations.

1 Numerical aperture is defined as NA = n sin θ, where n is the refractive index of the image space and θ is the half-angle subtended by the exit pupil at the focal point. In optical recording systems the image space is air, whose index is very nearly unity; thus for all practical purposes NA = sin θ.

FIGURE 80.25 (a) Multi-element lens design for a high numerical aperture video disk objective. (Source: D. Kuntz, "Specifying laser diode optics," Laser Focus, March 1984. With permission.) (b) Various parameters of the objective lens. The numerical aperture is NA = sin θ. The spot diameter d and the depth of focus δ are given by Eqs. (80.10) and (80.11), respectively. (c) Focusing through the substrate can cause spherical aberration at the active layer. The problem can be corrected if the substrate is taken into account while designing the objective.

Automatic Focusing

We mentioned in the preceding section that since the objective has a large numerical aperture (NA ≥ 0.5), its depth of focus δ is rather shallow (δ ≈ ±1 µm at λ = 780 nm). During all read/write/erase operations, therefore, the disk must remain within a fraction of a micrometer from the focal plane of the objective. In practice, however, the disks are not flat and they are not always mounted rigidly parallel to the focal plane, so that movements away from focus occur a few times during each revolution.
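Equations (80.10) and (80.11) are easy to tabulate for representative parameters. The sketch below uses wavelengths and numerical apertures of the kind mentioned above and is purely illustrative.

```python
# Spot diameter d ≈ λ/NA (Eq. 80.10) and depth of focus δ ≈ λ/NA² (Eq. 80.11)
# for representative wavelengths and numerical apertures mentioned in the text.

wavelengths_nm = (780, 700, 450)       # current GaAs lasers and a hoped-for blue source
numerical_apertures = (0.5, 0.6)

for wl in wavelengths_nm:
    for na in numerical_apertures:
        d_um = wl * 1e-3 / na           # spot diameter, micrometers
        dof_um = wl * 1e-3 / na**2      # depth of focus, micrometers
        # For λ = 700 nm and NA = 0.6 this gives δ ≈ 1.9 µm, i.e., roughly the
        # "about ±1 µm" focus tolerance quoted in the text.
        print(f"λ = {wl} nm, NA = {na}: d ≈ {d_um:.2f} µm, δ ≈ {dof_um:.2f} µm")
```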
The peak-to-peak movement in and out of focus may be as much as 100 µm. Without automatic focusing of the objective along the optical axis, this runout (or disk flutter) will be detrimental to the operation of the system. In practice, the objective is mounted on a small motor (usually a voice coil) and allowed to move back and forth in order to keep its distance from the disk within an acceptable range. The spindle turns at a few thousand rpm, which is a hundred or so revolutions per second. If the disk moves in and out of focus a few times during each revolution, then the voice coil must be fast enough to follow these movements in real time; in other words, its frequency response must extend to several kilohertz.

The signal that controls the voice coil is obtained from the light reflected from the disk. There are several techniques for deriving the focus error signal, one of which is depicted in Fig. 80.27(a). In this so-called obscuration method a secondary lens is placed in the path of the reflected light, one-half of its aperture is covered, and a split detector is placed at its focal plane. When the disk is in focus, the returning beam is collimated and the secondary lens will focus the beam at the center of the split detector, giving a difference signal ΔS equal to zero. If the disk now moves away from the objective, the returning beam will become converging, as in Fig. 80.27(b), sending all the light to detector #1. In this case ΔS will be positive and the voice coil will push the lens towards the disk. On the other hand, when the disk moves close to the objective, the returning beam becomes diverging and detector #2 receives the light [see Fig. 80.27(c)]. This results in a negative ΔS that forces the voice coil to pull back in order to return ΔS to zero. A given focus error detection scheme is generally characterized by the shape of its focus error signal ΔS versus the amount of defocus Δz; one such curve is shown in Fig. 80.27(d). The slope of the focus error signal (FES) curve near the origin is of particular importance, since it determines the overall performance and stability of the servo loop.

FIGURE 80.26 Molded glass aspheric lens pair for optical disk applications. These singlets can replace the multi-element spherical lenses shown in Figs. 80.24(b) and 80.25(a).

Automatic Tracking

Consider a track at a certain radial location, say r_0, and imagine viewing this track through the access window shown in Fig. 80.21. It is through this window that the head gains access to arbitrarily selected tracks. To a viewer looking through the window, a perfectly circular track centered on the spindle axis will look stationary, irrespective of the rotation rate. However, any eccentricity will cause an apparent radial motion of the track. The peak-to-peak distance traveled by a track (as seen through the window) depends on a number of factors, including centering accuracy of the hub, deformability of the substrate, mechanical vibrations, manufacturing tolerances, etc. For a typical 3.5-in. disk, for example, this peak-to-peak motion can be as much as 100 µm during one revolution. Assuming a revolution rate of 3600 rpm, the apparent velocity of the track in the radial direction will be several millimeters per second. Now, if the focused spot remains stationary while trying to read from or write to this track, it is clear that the beam will miss the track for a good fraction of every revolution cycle.
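A one-line calculation shows why this runout matters: it spans many track widths. The 1.6-µm track pitch used below is taken from the micrographs of Fig. 80.22 and serves only as an illustrative value.

```python
# How far the apparent radial motion of an eccentric track ranges, measured
# in track pitches (illustrative values taken from the text and Fig. 80.22).

track_pitch_um = 1.6     # from the micrographs of Fig. 80.22
runout_pp_um = 100.0     # typical peak-to-peak radial runout of a 3.5-in. disk

print(f"runout spans ~{runout_pp_um / track_pitch_um:.0f} track pitches")  # ~60 tracks
```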
Practical solutions to the above problem are provided by automatic tracking techniques. Here the objective is placed in a fine actuator, typically a voice coil, which is capable of moving the necessary radial distances and maintaining a lock on the desired track. The signal that controls the movement of this actuator is derived from the reflected light itself, which carries information about the position of the focused spot.

FIGURE 80.27 Focus error detection by the obscuration method. In (a) the disk is in focus, and the two halves of the split detector receive equal amounts of light. When the disk is too far from the objective (b) or too close to it (c), the balance of detector signals shifts to one side or the other. A plot of the focus error signal (FES) versus defocus is shown in (d), and its slope near the origin is identified as the FES gain, G.

There exist several mechanisms for extracting the track error signal (TES); all these methods require some sort of structure on the disk surface in order to identify the track. In the case of read-only disks (CD, CD-ROM, and video disk), the embossed pattern of data provides ample information for tracking purposes. In the case of write-once and erasable disks, tracking guides are "carved" on the substrate in the manufacturing process. As mentioned earlier, the two major formats for these tracking guides are pregrooves (for continuous tracking) and sampled-servo marks (for discrete tracking). A combination of the two schemes, known as the continuous/composite format, is often used in practice. This scheme is depicted in Fig. 80.28, which shows a small section containing five tracks, each consisting of the tail end of a groove, synchronization marks, a mirror area used for adjusting focus/track offsets, a pair of wobble marks for sampled tracking, and header information for sector identification.

Tracking on Grooved Regions

As shown in Fig. 80.23(a), grooves are continuous depressions that are either embossed, etched, or molded onto the substrate prior to deposition of the storage medium. If the data is recorded in the grooves, then the lands are not used except for providing a guard band between neighboring grooves. Conversely, the land regions may be used to record the information, in which case the grooves provide the guard band. Typical track widths are about one wavelength. The guard bands are somewhat narrower than the tracks, their exact shape and dimensions depending on the beam size, required track-servo accuracy, and the acceptable levels of cross-talk between adjacent tracks. The groove depth is usually around one-eighth of one wavelength (λ/8), since this depth can be shown to give the largest TES in the push-pull method. Cross sections of the grooves may be rectangular, trapezoidal, triangular, etc.

When the focused spot is centered on the track, it is diffracted symmetrically from the two edges of the track, resulting in a balanced far-field pattern. As soon as the spot moves away from the center, the symmetry breaks down and the light distribution in the far field tends to shift to one side or the other. A split photodetector placed in the path of the reflected light can therefore sense the relative position of the spot and provide the appropriate feedback signal. This strategy is depicted schematically in Fig. 80.29; also shown in the figure are intensity plots at the detector plane for light reflected from various regions of the disk.
Note how the intensity shifts to one side or the other depending on the direction of motion of the spot.

Sampled Tracking

Since dynamic track runout is usually a slow and gradual process, there is actually no need for continuous tracking as done on grooved media. A pair of embedded marks, offset from the track center as in Fig. 80.23(b), can provide the necessary information for correcting the relative position of the focused spot. The reflected intensity will indicate the positions of the two servo marks as two successive short pulses. If the beam happens to be on track, the two pulses will have equal magnitudes and there will be no need for correction. If, on the other hand, the beam is off-track, one of the pulses will be stronger than the other. Depending on which pulse is the stronger, the system will recognize the direction in which it has to move and will correct the error accordingly. The servo marks must appear frequently enough along the track to ensure proper track following. In a typical application, the track might be divided into groups of 18 bytes, with 2 bytes dedicated as servo offset areas and 16 bytes filled with other format information or left blank for user data.

FIGURE 80.28 Servo fields in the continuous/composite format contain a mirror area and offset marks for tracking. (Field labels: Groove, Synch Mark, Mirror, Wobble Marks, Header.)

Thermomagnetic Recording Process

Recording and erasure of information on a magneto-optical disk are both achieved by the thermomagnetic process. The essence of thermomagnetic recording is shown in Fig. 80.30. At the ambient temperature the film has a high magnetic coercivity 1 and therefore does not respond to the externally applied field. When a focused beam raises the local temperature of the film, the hot spot becomes magnetically soft (i.e., its coercivity drops). As the temperature rises, coercivity drops continuously until such time as the field of the electromagnet finally overcomes the material's resistance to reversal and switches its magnetization. Turning the laser off brings the temperature back to normal, but the reverse-magnetized domain remains frozen in the film. In a typical situation in practice, the film thickness may be around 300 Å, the laser power at the disk ≈ 10 mW, the diameter of the focused spot ≈ 1 µm, the laser pulse duration ≈ 50 ns, the linear velocity of the track ≈ 10 m/s, and the magnetic field strength ≈ 200 gauss. The temperature may reach a peak of 500 K at the center of the spot, which is sufficient for magnetization reversal, but is not nearly high enough to melt or crystallize or in any other way modify the material's structure.

The materials of magneto-optical recording have strong perpendicular magnetic anisotropy. This type of anisotropy favors the "up" and "down" directions of magnetization over all other orientations. The disk is initialized in one of these two directions, say up, and the recording takes place when small regions are selectively reverse-magnetized by the thermomagnetic process. The resulting magnetization distribution then represents the pattern of recorded information. For instance, binary sequences may be represented by a mapping of zeros to up-magnetized regions and ones to down-magnetized regions (non-return to zero or NRZ). Alternatively, the NRZI scheme might be used, whereby transitions (up-to-down and down-to-up) are used to represent the ones in the bit sequence.

1 Coercivity of a magnetic medium is a measure of its resistance to magnetization reversal.
FIGURE 80.29 (a) Push-pull sensor for tracking on grooves; the net signal is the difference A – B between the two detector halves. (b) Calculated distribution of light intensity at the detector plane when the disk is in focus and the beam is centered on track. (c) Calculated intensity distribution at the detector plane with the disk in focus but the beam centered on the groove edge. (d) Same as (c) except for the spot being focused on the opposite edge of the groove.

Recording by Laser Power Modulation (LPM)

In this traditional approach to thermomagnetic recording, the electromagnet produces a constant field, while the information signal is used to modulate the power of the laser beam. As the disk rotates under the focused spot, the on/off laser pulses create a sequence of up/down domains along the track. The Lorentz electron micrograph in Fig. 80.30(b) shows a number of domains recorded by LPM. The domains are highly stable and may be read over and over again without significant degradation. If, however, the user decides to discard a recorded block and to use the space for new data, the LPM scheme does not allow direct overwrite; the system must erase the old data during one disk revolution and record the new data in a subsequent revolution.

During erasure, the direction of the external field is reversed, so that up-magnetized domains in Fig. 80.30(a) now become the favored ones. Whereas writing is achieved with a modulated laser beam, in erasure the laser stays on for a relatively long period of time, erasing an entire sector. Selective erasure of individual domains is not practical, nor is it desired, since mass data storage systems generally deal with data at the level of blocks, which are recorded onto and read from individual sectors. Note that at least one revolution period elapses between the erasure of an old block and its replacement by a new block. The electromagnet therefore need not be capable of rapid switching. (When the disk rotates at 3600 rpm, for example, there is a period of 16 ms or so between successive switchings.) This slow reversal allows the magnet to be large enough to cover all the tracks simultaneously, thereby eliminating the need for a moving magnet and an actuator. It also affords a relatively large gap between the disk and the magnet, which enables the use of double-sided disks and relaxes the mechanical tolerances of the system without overburdening the magnet's driver.

The obvious disadvantage of LPM is its lack of direct overwrite capability. A more subtle concern is that it is perhaps unsuitable for the pulse width modulation (PWM) scheme of representing binary waveforms. Due to fluctuations in the laser power, spatial variations of material properties, imperfect focusing and track following, etc., the length of a recorded domain along the track may fluctuate in small but unpredictable ways. If the information is to be encoded in the distance between adjacent domain walls (i.e., PWM), then the LPM scheme of thermomagnetic writing may suffer from excessive domain-wall jitter. Laser power modulation works well, however, when the information is encoded in the position of domain centers (i.e., pulse position modulation, or PPM). In general, PWM is superior to PPM in terms of recording density, and, therefore, recording techniques that allow PWM are preferred.
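A quick check of the timing argument quoted in parentheses above: at 3600 rpm one revolution takes 60/3600 s ≈ 16.7 ms, which is the shortest interval the magnet ever has between an erase pass and the following write pass. The sketch below performs this arithmetic; the comparison with the ≈50-ns write pulses mentioned earlier is only meant to show how relaxed the magnet's timing requirement is.

```python
# Worked check of the LPM timing argument: the external magnet only has to
# reverse between the erase pass and the write pass, i.e., at most once per
# disk revolution.

def revolution_period_ms(rpm: float) -> float:
    """Time for one disk revolution, in milliseconds."""
    return 60.0e3 / rpm

print(revolution_period_ms(3600))   # ~16.7 ms, consistent with the "16 ms or so" above
# For comparison, individual domains are written with ~50 ns laser pulses, so the
# magnet's switching budget is several orders of magnitude more relaxed.
```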
Recording by Magnetic Field Modulation

Another method of thermomagnetic recording is based on magnetic field modulation (MFM) and is depicted schematically in Fig. 80.31(a). Here the laser power may be kept constant while the information signal is used to modulate the magnetic field. Photomicrographs of typical domain patterns recorded in the MFM scheme are shown in Fig. 80.31(b). Crescent-shaped domains are the hallmark of the field modulation technique. If one assumes (using a much simplified model) that the magnetization aligns itself with the applied field within a region whose temperature has passed a certain critical value, T_crit, then one can explain the crescent shape of these domains in the following way: with the laser operating in the CW mode and the disk moving at constant velocity, the temperature distribution in the magnetic medium assumes a steady-state profile, such as that shown in Fig. 80.31(c). Relative to the laser beam the temperature profile is stationary, but in the frame of reference of the disk the profile moves along the track with the linear track velocity. The isotherm corresponding to T_crit is identified as such in the figure; within this isotherm the magnetization aligns itself with the applied field. Figure 80.31(d) shows a succession of critical isotherms along the track, each obtained at the particular instant of time when the magnetic field switches direction. From this picture it is easy to infer how the crescent-shaped domains form and also to understand the relation between the waveform that controls the magnet and the resulting domain pattern; a simple numerical illustration follows the figure captions below.

The advantages of magnetic field modulation recording are that (1) direct overwriting is possible and (2) domain-wall positions along the track, being rather insensitive to defocus and laser power fluctuations, are fairly accurately controlled by the timing of the magnetic field switchings. On the negative side, the magnet must now be small and fly close to the disk surface if it is to produce rapidly switched fields with a magnitude of a hundred gauss or so. Systems that utilize magnetic field modulation often fly a small electromagnet on the opposite side of the disk from the optical stylus. Since the mechanical tolerances are tight, this might compromise the removability of the disk. Moreover, the requirement of close proximity between the magnet and the storage medium dictates the use of single-sided disks in practice.

FIGURE 80.30 Thermomagnetic recording process. (a) The field of the electromagnet helps reverse the direction of magnetization in the area heated by the focused laser beam. (b) Lorentz micrograph of domains written thermomagnetically. The various tracks shown here were written at different laser powers, with the power level decreasing from top to bottom. (Source: F. Greidanus et al., Paper 26B-5, presented at the International Symposium on Optical Memory, Kobe, Japan, September 1989. With permission.)

FIGURE 80.31 (a) Thermomagnetic recording by magnetic field modulation. The power of the beam is kept constant, while the magnetic field direction is switched by the data signal. (b) Polarized-light photomicrograph of recorded domains. (c) Computed isotherms produced by a CW laser beam focused on the magnetic layer of a disk; the disk moves with constant velocity under the beam. The region inside the isotherm marked T_crit is above the critical temperature for writing, that is, its magnetization aligns with the direction of the applied field. (d) Magnetization within the heated region (above T_crit) follows the direction of the applied field, whose switchings occur at times t_n. The resulting domains are crescent-shaped.
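The numerical illustration promised above: under the simplified model, each field reversal at time t_n freezes a domain wall near track position x_n ≈ v·t_n, apart from a fixed offset set by the trailing edge of the T_crit isotherm, which is ignored here. The 10 m/s linear velocity matches the typical value quoted earlier; the switching times are illustrative.

```python
# Minimal sketch of the simplified MFM model: with the disk moving at constant
# linear velocity v, each magnetic-field reversal at time t_n leaves a domain
# wall near track position x_n ~ v * t_n. The fixed offset due to the T_crit
# isotherm shape is ignored, and the switching times are illustrative values.

def wall_positions_um(switch_times_ns, velocity_m_per_s=10.0):
    """Positions (micrometers along the track) of walls written by field switchings."""
    return [velocity_m_per_s * t * 1e-9 * 1e6 for t in switch_times_ns]

# Field reversals every 100 ns at 10 m/s -> walls spaced ~1 um apart along the track.
print(wall_positions_um([100, 200, 300, 400]))   # approximately [1.0, 2.0, 3.0, 4.0]
```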
Magneto-Optical Readout

The information recorded on a perpendicularly magnetized medium may be read with the aid of the polar magneto-optical Kerr effect. When linearly polarized light is normally incident on a perpendicular magnetic medium, its plane of polarization undergoes a slight rotation upon reflection. This rotation of the plane of polarization, whose sense depends on the direction of magnetization in the medium, is known as the polar Kerr effect. The schematic representation of this phenomenon in Fig. 80.32 shows that if the polarization vector suffers a counterclockwise rotation upon reflection from an up-magnetized region, then the same vector will rotate clockwise when the magnetization is down.

FIGURE 80.32 Schematic diagram describing the polar magneto-optical Kerr effect. Upon reflection from the surface of a perpendicularly magnetized medium, the polarization vector undergoes a rotation. The sense of rotation depends on the direction of magnetization, M, and switches sign when M is reversed.

A magneto-optical medium is characterized in terms of its reflectivity R and its Kerr rotation angle θk. R is a real number (between 0 and 1) that indicates the fraction of the incident power reflected back from the medium at normal incidence. θk is generally quoted as a positive number, but it is understood to be positive or negative depending on the direction of magnetization; in MO readout, it is the sign of θk that carries the information about the state of magnetization, i.e., the recorded bit pattern.

The laser used for readout is usually the same as that used for recording, but its output power level is substantially reduced in order to avoid erasing (or otherwise obliterating) the previously recorded information. For instance, if the power of the write/erase beam is 20 mW, then for the read operation the beam is attenuated to about 3 or 4 mW. The same objective lens that focuses the write beam is now used to focus the read beam, creating a diffraction-limited spot for resolving the recorded marks. Whereas in writing the laser was pulsed to selectively reverse-magnetize small regions along the track, in readout it operates with constant power, i.e., in CW mode. Both up- and down-magnetized regions are read as the track passes under the focused spot. The reflected beam, which is now polarization-modulated, goes back through the objective and becomes collimated once again; its information content is subsequently decoded by polarization-sensitive optics, and the scanned pattern of magnetization is reproduced as an electronic signal.

Differential Detection

Figure 80.33 shows the differential detection system that is the basis of magneto-optical readout in practically all erasable optical storage systems in use today. The beam splitter (BS) diverts half of the reflected beam away from the laser and into the detection module.¹ The polarizing beam splitter (PBS) splits the beam into two parts, each carrying the projection of the incident polarization along one axis of the PBS, as shown in Fig. 80.33(b). The component of polarization along one of the axes goes straight through, while the component along the other axis splits off and branches to the side.

¹The use of an ordinary beam splitter is an inefficient way of separating the incoming and outgoing beams, since half the light is lost in each pass through the splitter. One can do much better by using a so-called "leaky" polarizing beam splitter.
The PBS is oriented such that in the absence of the Kerr effect its two branches will receive equal amounts of light. In other words, if the polarization, upon reflection from the disk, did not undergo any rotation whatsoever, then the beam entering the PBS would be polarized at 45° to the PBS axes, in which case it would split equally between the two branches. Under this condition the two detectors generate identical signals and the differential signal DS will be zero. Now, if the beam returns from the disk with its polarization rotated clockwise (rotation angle = θk), then detector #1 will receive more light than detector #2, and the differential signal will be positive. Similarly, a counterclockwise rotation will generate a negative DS. Thus, as the disk rotates under the focused spot, the electronic signal DS reproduces the pattern of magnetization along the scanned track.

FIGURE 80.33 Differential detection scheme utilizes a polarizing beam splitter and two photodetectors in order to convert the rotation of polarization to an electronic signal. E∥ and E⊥ are the reflected components of polarization; they are, respectively, parallel and perpendicular to the direction of incident polarization. The diagram in (b) shows the orientation of the PBS axes relative to the polarization vectors.
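The geometry just described can be put into a small formula. With the PBS axes at 45° to the incident polarization, a Kerr rotation of θk changes the two projections to cos(45° − θk) and cos(45° + θk), so the detector powers are I1 = R·P·cos²(45° − θk) and I2 = R·P·cos²(45° + θk) and the differential signal is DS = I1 − I2 = R·P·sin 2θk, which changes sign with the sense of rotation. This is a simplified model that ignores ellipticity and optical losses; the sketch below merely evaluates it, and the ±0.5° rotation angle is an illustrative value rather than one taken from the text.

```python
# Minimal sketch of the idealized differential-detection signal: reflectivity R,
# incident read power P, PBS axes at 45 degrees to the incident polarization,
# Kerr rotation of +/- theta_k, no ellipticity and no optical losses assumed.

import math

def detector_signals(P, R, theta_k_deg):
    """Return (I1, I2, DS) for a reflected beam rotated by theta_k degrees."""
    t = math.radians(theta_k_deg)
    I1 = R * P * math.cos(math.radians(45) - t) ** 2
    I2 = R * P * math.cos(math.radians(45) + t) ** 2
    return I1, I2, I1 - I2            # DS = R * P * sin(2 * theta_k)

# No Kerr rotation: equal split between the branches, DS = 0
print(detector_signals(P=1.0, R=0.2, theta_k_deg=0.0))
# Up- vs down-magnetized regions (theta_k ~ +/- 0.5 deg, illustrative): DS changes sign
print(detector_signals(1.0, 0.2, +0.5))   # DS ~ +0.2*sin(1 deg) ~ +3.5e-3
print(detector_signals(1.0, 0.2, -0.5))   # DS ~ -3.5e-3
```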
Materials of Magneto-Optical Data Storage

Amorphous rare earth transition metal alloys are presently the media of choice for erasable optical data storage applications. The general formula for the composition of the alloy may be written (Tb_y Gd_{1–y})_x (Fe_z Co_{1–z})_{1–x}, where terbium and gadolinium are the rare earth (RE) elements, while iron and cobalt are the transition metals (TM). In practice, the transition metals constitute roughly 80 atomic percent of the alloy (i.e., x ≈ 0.2). In the transition metal subnetwork the fraction of cobalt is usually small, typically around 10%, and iron is the dominant element (z ≈ 0.9). Similarly, in the rare earth subnetwork Tb is the main element (y ≈ 0.9), while the gadolinium content is small or may even be absent in some cases. Since the rare earth elements are highly reactive to oxygen, RE-TM films tend to have poor corrosion resistance and therefore require protective coatings. In multilayer disk structures, the dielectric layers that enable optimization of the medium for the best optical/thermal behavior also perform the crucial function of protecting the MO layer from the environment.

The amorphous nature of the material allows its composition to be varied continuously until a number of desirable properties are achieved. In other words, the fractions x, y, z of the various elements are not constrained by the rules of stoichiometry. Disks with very large areas can be coated uniformly with thin films of these media, and, in contrast to polycrystalline films whose grains and grain boundaries scatter the beam and cause noise, amorphous films are continuous, smooth, and substantially free from noise. The films are deposited either by sputtering from an alloy target or by co-sputtering from multiple elemental targets. In the latter case, the substrate moves under the various targets, and the fraction of a given element in the alloy is determined by the time spent under each target as well as the power applied to that target. During film deposition the substrate is kept at a low temperature (usually by chilled water) in order to reduce the mobility of deposited atoms and thus inhibit crystal growth. The type of sputtering gas (argon, krypton, xenon, etc.) and its pressure during sputtering, the bias voltage applied to the substrate, the deposition rate, the nature of the substrate and its pretreatment, and the temperature of the substrate can all have dramatic effects on the composition and short-range order of the deposited film. A comprehensive discussion of the factors that influence film properties would take us beyond the intended scope here; the interested reader may consult the vast literature of this field for further information.

Defining Terms

Automatic focusing: The process in which the distance of the disk from the objective's focal plane is continuously monitored and fed back to the system in order to keep the disk in focus at all times.

Automatic tracking: The process in which the distance of the focused spot from the track center is continuously monitored and the information fed back to the system in order to maintain the read/write beam on track at all times.

Compact disk (CD): A plastic substrate embossed with a pattern of pits that encode audio signals in digital format. The disk is coated with a metallic layer (to enhance its reflectivity) and read in a drive (CD player) that employs a focused laser beam and monitors fluctuations of the reflected intensity in order to detect the pits.

Error correction coding (ECC): Systematic addition of redundant bits to a block of binary data, as insurance against possible read/write errors. A given error-correcting code can recover the original data from a contaminated block, provided that the number of erroneous bits is less than the maximum number allowed by that particular code.

Grooved media of optical storage: A disk embossed with grooves of either the concentric-ring type or the spiral type. If grooves are used as tracks, then the lands (i.e., the regions between adjacent grooves) are the guard bands. Alternatively, lands may be used as tracks, in which case the grooves act as guard bands. In a typical grooved optical disk in use today the track width is 1.1 μm, the width of the guard band is 0.5 μm, and the groove depth is 70 nm.

Magneto-optical Kerr effect: The rotation of the plane of polarization of a linearly polarized beam of light upon reflection from the surface of a perpendicularly magnetized medium.

Objective lens: A well-corrected lens of high numerical aperture, similar to a microscope objective, used to focus the beam of light onto the surface of the storage medium. The objective also collects and recollimates the light reflected from the medium.

Optical path: The optical elements in the path of the laser beam in an optical drive. The path begins at the laser itself and contains a collimating lens, beam-shaping optics, beam splitters, polarization-sensitive elements, photodetectors, and an objective lens.

Preformat: Information such as sector address, synchronization marks, servo marks, etc., embossed permanently on the optical disk substrate.

Sector: A small section of track with the capacity to store one block of user data (typical blocks are either 512 or 1024 bytes).
The surface of the disk is covered with tracks, and the tracks are divided into contiguous sectors.

Thermomagnetic process: The process of recording and erasure in magneto-optical media, involving local heating of the medium by a focused laser beam, followed by the formation or annihilation of a reverse-magnetized domain. The successful completion of the process usually requires an external magnetic field to assist the reversal of the magnetization.

Track: A narrow annulus or ring-like region on a disk surface, scanned by the read/write head during one revolution of the spindle; the data bits of magnetic and optical disks are stored sequentially along these tracks. The disk is covered either with concentric rings of densely packed circular tracks or with one continuous, fine-pitched spiral track.

Related Topics

42.2 Optical Fibers and Cables • 43.1 Introduction

References

A. B. Marchant, Optical Recording, Reading, Mass.: Addison-Wesley, 1990.
P. Hansen and H. Heitman, "Media for erasable magneto-optic recording," IEEE Trans. Mag., vol. 25, pp. 4390–4404, 1989.
M. H. Kryder, "Data-storage technologies for advanced computing," Scientific American, vol. 257, pp. 116–125, 1987.
G. Bouwhuis, J. Braat, A. Huijser, J. Pasman, G. Van Rosmalen, and K. S. Immink, Principles of Optical Disk Systems, Bristol: Adam Hilger Ltd., 1985, chaps. 2 and 3.
Special issue of Applied Optics on video disks, July 1, 1978.
E. Wolf, "Electromagnetic diffraction in optical systems. I. An integral representation of the image field," Proc. R. Soc. Ser. A, vol. 253, pp. 349–357, 1959.
M. Mansuripur, "Certain computational aspects of vector diffraction problems," J. Opt. Soc. Am. A, vol. 6, pp. 786–806, 1989.
D. O. Smith, "Magneto-optical scattering from multilayer magnetic and dielectric films," Opt. Acta, vol. 12, p. 13, 1965.
P. S. Pershan, "Magneto-optic effects," J. Appl. Phys., vol. 38, pp. 1482–1490, 1967.
K. Egashira and R. Yamada, "Kerr effect enhancement and improvement of readout characteristics in MnBi film memory," J. Appl. Phys., vol. 45, pp. 3643–3648, 1974.
H. S. Carslaw and J. C. Jaeger, Conduction of Heat in Solids, London: Oxford University Press, 1954.
P. Kivits, R. deBont, and P. Zalm, "Superheating of thin films for optical recording," Appl. Phys., vol. 24, pp. 273–278, 1981.
M. Mansuripur, G. A. N. Connell, and J. W. Goodman, "Laser-induced local heating of multilayers," Appl. Opt., vol. 21, p. 1106, 1982.
J. Heemskerk, "Noise in a video disk system: experiments with an (AlGa)As laser," Appl. Opt., vol. 17, p. 2007, 1978.
A. Arimoto, M. Ojima, N. Chinone, A. Oishi, T. Gotoh, and N. Ohnuki, "Optimum conditions for the high frequency noise reduction method in optical video disk players," Appl. Opt., vol. 25, p. 1398, 1986.
M. Mansuripur, G. A. N. Connell, and J. W. Goodman, "Signal and noise in magneto-optical readout," J. Appl. Phys., vol. 53, p. 4485, 1982.
J. W. Beck, "Noise considerations of optical beam recording," Appl. Opt., vol. 9, p. 2559, 1970.
S. Chikazumi and S. H. Charap, Physics of Magnetism, New York: John Wiley, 1964.
B. G. Huth, "Calculation of stable domain radii produced by thermomagnetic writing," IBM J. Res. Dev., pp. 100–109, 1974.
A. P. Malozemoff and J. C. Slonczewski, Magnetic Domain Walls in Bubble Materials, New York: Academic Press, 1979.
A. M. Patel, "Signal and error-control coding," in Magnetic Recording, vol. II, C. D. Mee and E. D. Daniel, Eds., New York: McGraw-Hill, 1988.
K. A. S. Immink, "Coding methods for high-density optical recording," Philips J. Res., vol. 41, pp. 410–430, 1986.
L. I. Maissel and R. Glang, Eds., Handbook of Thin Film Technology, New York: McGraw-Hill, 1970.
G. L. Weissler and R. W. Carlson, Eds., Vacuum Physics and Technology, vol. 14 of Methods of Experimental Physics, New York: Academic Press, 1979.
T. Suzuki, "Magneto-optic recording materials," Mater. Res. Soc. Bull., pp. 42–47, Sept. 1996.
K. G. Ashar, Magnetic Disk Drive Technology, New York: IEEE Press, 1997.

Further Information

Proceedings of the Optical Data Storage Conference are published annually by SPIE, the International Society for Optical Engineering. These proceedings document the latest developments in the field of optical recording each year. Two other conferences in this field are the International Symposium on Optical Memory (ISOM), whose proceedings are published as a special issue of the Japanese Journal of Applied Physics, and the Magneto-Optical Recording International Symposium (MORIS), whose proceedings appear in a special issue of the Journal of the Magnetics Society of Japan.