Ramakumar, R. "Reliability Engineering." The Electrical Engineering Handbook, Ed. Richard C. Dorf. Boca Raton: CRC Press LLC, 2000.

110 Reliability Engineering¹

110.1 Introduction
110.2 Catastrophic Failure Models
110.3 The Bathtub Curve
110.4 Mean Time To Failure (MTTF)
110.5 Average Failure Rate
110.6 A Posteriori Failure Probability
110.7 Units for Failure Rates
110.8 Application of the Binomial Distribution
110.9 Application of the Poisson Distribution
110.10 The Exponential Distribution
110.11 The Weibull Distribution
110.12 Combinatorial Aspects
110.13 Modeling Maintenance
110.14 Markov Models
110.15 Binary Model for a Repairable Component
110.16 Two Dissimilar Repairable Components
110.17 Two Identical Repairable Components
110.18 Frequency and Duration Techniques
110.19 Applications of Markov Process
110.20 Some Useful Approximations
110.21 Application Aspects
110.22 Reliability and Economics

110.1 Introduction

Reliability engineering is a vast field, and it has grown significantly during the past five decades (since World War II). The two major approaches to reliability assessment and prediction are (1) traditional methods based on probabilistic assessment of field data and (2) methods based on the analysis of failure mechanisms and the physics of failure. The latter is more accurate but difficult and time-consuming to implement; the former, in spite of its many flaws, continues to be in use.
Some of the many areas encompassed by reliability engineering are reliability allocation and optimization; reliability growth and modeling; reliability testing, including accelerated testing; data analysis and graphical techniques; quality control and acceptance sampling; maintenance engineering; repairable system modeling and analysis; software reliability; system safety analysis; Bayesian analysis; reliability management; simulation and Monte Carlo techniques; Failure Modes, Effects and Criticality Analysis (FMECA); and economic aspects of reliability, to mention a few. Application of reliability techniques is gaining importance in all branches of engineering because of its effectiveness in the detection, prevention, and correction of failures in the design, manufacturing, and operational phases of products and systems.¹

¹Some of the material in this chapter was previously published by CRC Press in The Engineering Handbook, R. C. Dorf, Ed., 1996.

R. Ramakumar, Oklahoma State University

© 2000 by CRC Press LLC

INFORMATION MANAGEMENT SYSTEM FOR MANUFACTURING EFFICIENCY

At current schedules, each of NASA's four Space Shuttle Orbiters must fly two or three times a year. Preparing an orbiter for its next mission is an incredibly complex process, and much of the work is accomplished in the Orbiter Processing Facility (OPF) at Kennedy Space Center. The average "flow" (the complete cycle of refurbishing an orbiter) requires the integration of approximately 10,000 work events, takes 65 days, and consumes some 40,000 technician labor hours. Under the best conditions, scheduling each of the 10,000 work events in a single flow would be a task of monumental proportions. But the job is further complicated by the fact that only half the work is standard and predictable; the other half is composed of problem-generated tasks and jobs specific to the next mission, which creates a highly dynamic processing environment and requires frequent rescheduling.

For all the difficulties, Kennedy Space Center and its prime contractor for shuttle processing, Lockheed Space Operations Company (LSOC), are doing an outstanding job of managing OPF operations with the help of a number of processing innovations in recent years. One of the most important is the Ground Processing Scheduling System, or GPSS. The GPSS is a software system for enhancing efficiency by providing an automated scheduling tool that predicts conflicts between scheduled tasks, helps human schedulers resolve those conflicts, and searches for near-optimal schedules. GPSS is a cooperative development of Ames Research Center, Kennedy Space Center, LSOC, and a related company, Lockheed Missiles and Space Company. It originated at Ames, where a group of computer scientists conducted basic research on the use of artificial intelligence techniques to automate the scheduling process. A product of the work was a software system for complex, multifaceted operations known as the Gerry scheduling engine. Kennedy Space Center brought Ames and Lockheed together, and the group formed an inter-center/NASA contractor partnership to transfer the technology of the Gerry scheduling engine to the Space Shuttle program. The transfer was successfully accomplished, and GPSS has become the accepted general-purpose scheduling tool for OPF operations. (Courtesy of National Aeronautics and Space Administration.)

[Photo: Kennedy Space Center technicians preparing a Space Shuttle Orbiter for its next mission, an intricate task that requires scheduling 10,000 separate events over 65 days. A NASA-developed computer program automated this extremely complex scheduling job. (Photo courtesy of National Aeronautics and Space Administration.)]
Increasing emphasis on the quality of components and systems, coupled with pressures to minimize cost and increase value, further underscores the need to study, understand, quantify, and predict reliability, and to arrive at innovative designs and operational and maintenance procedures. From the electrical engineering point of view, two (among several) areas that have received significant attention are electronic equipment (including computer hardware) and electric power systems. Other major areas include communication systems and software engineering. As the complexity of electronic equipment grew during and after World War II, and as the consequences of failures in the field became more and more apparent, the U.S. military became seriously involved, promoted the formation of groups, and became instrumental in the development of the earliest handbooks and specifications. The great northeast blackout in the U.S. in November 1965 triggered the serious application of reliability concepts in the power systems area.

The objectives of this chapter are to introduce the reader to the fundamentals and applications of classical reliability concepts and to bring out the important benefits of reliability considerations. Brief summaries of application aspects of reliability for electronic systems and power systems are also included.

110.2 Catastrophic Failure Models

Catastrophic failure refers to the case in which repair of the component is either not possible or not available, or of no value to the successful completion of the mission originally planned. Modeling such failures is typically based on life test results. We can consider the "lifetime" or "time to failure" T as a continuous random variable. Then

    P(survival up to time t) = P(T > t) ≜ R(t)        (110.1)

where R(t) is the reliability function. Obviously, as t → ∞, R(t) → 0, since the probability of failure increases with time of operation. Moreover,

    P(failure at or before time t) = P(T ≤ t) ≜ Q(t)        (110.2)

where Q(t) is the unreliability function. From the definition of the distribution function of a continuous random variable, it is clear that Q(t) is indeed the distribution function for T. Therefore, the failure density function f(t) can be obtained as

    f(t) = dQ(t)/dt        (110.3)

The hazard rate function λ(t) is defined as

    λ(t) ≜ lim_{Δt→0} (1/Δt) P(failure in (t, t + Δt), given survival up to t)        (110.4)

It can be shown that

    λ(t) = f(t)/R(t)        (110.5)

The four functions f(t), Q(t), R(t), and λ(t) constitute the set of functions used in basic reliability analysis. The relationships between these functions are given in Table 110.1.

110.3 The Bathtub Curve

Of the four functions discussed, the hazard rate function λ(t) displays the different stages during the lifetime of a component most clearly. In fact, typical λ(t) plots have the general shape of a bathtub, as shown in Fig. 110.1. The first region corresponds to wear-in (infant mortality) or early failures during debugging; the hazard rate goes down as debugging continues. The second region corresponds to an essentially constant and low failure rate, and failures can be considered to be nearly random; this is the useful lifetime of the component. The third region corresponds to the wearout or fatigue phase, with a sharply increasing hazard rate.

"Burn-in" refers to the practice of subjecting components to an initial operating period of t1 (see Fig. 110.1) before delivering them to the customer. This screens out the early failures so that they do not occur after delivery to customers requiring high-reliability components. Moreover, it is prudent to replace a component as it approaches the wearout region, i.e., after an operating period of (t2 − t1). Electronic components tend to have a long useful-life (constant-hazard) period, whereas the wearout region tends to dominate in the case of mechanical components.
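The relationships among the four basic functions can be exercised numerically. The sketch below is an illustration of mine, not part of the handbook: it recovers Q(t) and λ(t) from an assumed failure density f(t), and confirms that an exponential density (with an arbitrary hypothetical rate) yields a constant hazard, as Sec. 110.2 implies.

```python
import math

# Given a failure density f(t), recover Q(t) = integral of f over (0, t),
# R(t) = 1 - Q(t), and the hazard rate lambda(t) = f(t)/R(t)  [Eq. 110.5].

def q_of_t(f, t, steps=10_000):
    """Unreliability Q(t) by trapezoidal integration of the density f."""
    h = t / steps
    s = 0.5 * (f(0.0) + f(t))
    for i in range(1, steps):
        s += f(i * h)
    return s * h

def hazard(f, t):
    """Hazard rate lambda(t) = f(t) / R(t)."""
    return f(t) / (1.0 - q_of_t(f, t))

# Check with the constant-hazard (exponential) density f(t) = lam*exp(-lam*t);
# the rate below is hypothetical.
lam = 0.002  # failures per hour
f = lambda t: lam * math.exp(-lam * t)
for t in (100.0, 500.0, 1000.0):
    assert abs(hazard(f, t) - lam) < 1e-6  # hazard is constant, as expected
```

Any density could be substituted for `f`; for a Weibull density with β > 1, the same `hazard` function would return an increasing λ(t), matching region III of the bathtub curve.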
TABLE 110.1 Relationships Between the Different Reliability Functions

    f(t) = λ(t) exp[−∫_0^t λ(ξ) dξ]  =  dQ(t)/dt  =  −dR(t)/dt
    λ(t) = f(t) / [1 − ∫_0^t f(ξ) dξ]  =  [dQ(t)/dt] / [1 − Q(t)]  =  −d[ln R(t)]/dt
    Q(t) = ∫_0^t f(ξ) dξ  =  1 − exp[−∫_0^t λ(ξ) dξ]  =  1 − R(t)
    R(t) = 1 − ∫_0^t f(ξ) dξ  =  exp[−∫_0^t λ(ξ) dξ]  =  1 − Q(t)

[Figure 110.1: Bathtub-shaped hazard function λ(t), showing region I (wear-in), region II (useful life), and region III (wearout), with burn-in ending at t1 and wearout beginning near t2.]

110.4 Mean Time To Failure (MTTF)

The mean or expected value of the continuous random variable "time to failure" is the MTTF. This is a very useful parameter and is often enough to assess the suitability of components. It can be obtained using either the failure density function f(t) or the reliability function R(t) as follows:

    MTTF = ∫_0^∞ t f(t) dt = ∫_0^∞ R(t) dt        (110.6)

In the case of repairable components, the repair time can also be considered as a continuous random variable, with an expected value of MTTR. The mean time between failures, MTBF, is the sum of MTTF and MTTR. Since MTTR << MTTF for well-designed components, MTBF and MTTF are often used interchangeably.

110.5 Average Failure Rate

The average failure rate over the time interval 0 to T is defined as

    AFR(0, T) ≜ AFR(T) = −ln R(T) / T        (110.7)

110.6 A Posteriori Failure Probability

When components are subjected to a burn-in (or wear-in) period of duration T, and if the component survives during (0, T), the probability of failure during (T, T + t) is called the a posteriori failure probability Q_c(t). It can be found using

    Q_c(t) = [∫_T^(T+t) f(ξ) dξ] / [∫_T^∞ f(ξ) dξ]        (110.8)

The probability of survival during (T, T + t) is

    R(t | T) = 1 − Q_c(t) = R(T + t)/R(T) = exp[−∫_T^(T+t) λ(ξ) dξ]        (110.9)

110.7 Units for Failure Rates

Several units are used to express failure rates. In addition to λ(t), which is usually in number per hour, %/K is used to denote failure rate in percent per thousand hours, and PPM/K is used to express failure rate in parts per million per thousand hours. The last unit is also known as FIT, for "fails in time". The relationships between these units are given in Table 110.2.

110.8 Application of the Binomial Distribution

In an experiment consisting of n identical independent trials, with each trial resulting in success or failure with probabilities p and q = (1 − p), the probability P_r of r successes and (n − r) failures is

    P_r = C(n, r) p^r (1 − p)^(n−r)        (110.10)

If X denotes the number of successes in n trials, then X is a discrete random variable with a mean value of (np) and a variance of (npq). In a system consisting of a collection of n identical components with a probability p that a component is defective, the probability of finding r defects out of n is given by the P_r in Eq. (110.10). If p is the probability of success of one component and at least r of them must be good for system success, then the system reliability (probability of system success) is given by

    R = Σ_{k=r}^{n} C(n, k) p^k (1 − p)^(n−k)        (110.11)

For systems with redundancy, r < n.

110.9 Application of the Poisson Distribution

For events that occur "in time" at an average rate of λ occurrences per unit of time, the probability P_x(t) of exactly x occurrences during the time interval (0, t) is given by

    P_x(t) = (λt)^x e^(−λt) / x!        (110.12)

The number of occurrences X in (0, t) is a discrete random variable with a mean value μ of (λt) and a standard deviation σ of √(λt). By setting x = 0 in Eq. (110.12), we obtain the probability of no occurrence in (0, t) as e^(−λt). If the event is failure, then no occurrence means success, and e^(−λt) is the probability of success, or system reliability. This is the well-known and often-used exponential distribution, also known as the constant-hazard model.

110.10 The Exponential Distribution

A constant hazard rate (constant λ), corresponding to the useful lifetime of components, leads to the single-parameter exponential distribution.
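The binomial and Poisson results of the two preceding sections can be sketched in a few lines (my illustration, with arbitrary numbers, not from the handbook). The checks confirm the limiting cases noted in the text: r = n gives a series system, r = 1 a parallel system, and x = 0 in the Poisson model recovers the exponential reliability e^(−λt).

```python
import math

def binomial_system_reliability(n, r, p):
    """Probability that at least r of n identical independent components
    succeed, each with success probability p (Eq. 110.11)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(r, n + 1))

def poisson_prob(lam, t, x):
    """Probability of exactly x occurrences in (0, t) at rate lam (Eq. 110.12)."""
    return (lam * t)**x * math.exp(-lam * t) / math.factorial(x)

# r = n gives a series system: reliability p**n.
assert abs(binomial_system_reliability(3, 3, 0.9) - 0.9**3) < 1e-12
# r = 1 gives a parallel system: reliability 1 - (1 - p)**n.
assert abs(binomial_system_reliability(3, 1, 0.9) - (1 - 0.1**3)) < 1e-12
# x = 0 failures in (0, t) recovers the exponential reliability e**(-lam*t).
assert abs(poisson_prob(0.001, 100.0, 0) - math.exp(-0.1)) < 1e-12
```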
The functions of interest associated with a constant λ are:

    f(t) = λ e^(−λt),  t > 0        (110.13)

    R(t) = e^(−λt)        (110.14)

    Q_c(t) = Q(t) = 1 − e^(−λt)        (110.15)

The a posteriori failure probability Q_c(t) is independent of the prior operating time T, indicating that the component does not degrade no matter how long it operates. Obviously, such a scenario is valid only during the useful lifetime (the horizontal portion of the bathtub curve) of the component. The mean and standard deviation of the random variable "lifetime" are

    μ = MTTF = 1/λ  and  σ = 1/λ        (110.16)

TABLE 110.2 Relationships Between Different Failure Rate Units

    λ (#/hr)     = λ         = 10⁻⁵ × (%/K)   = 10⁻⁹ × (PPM/K)
    %/K          = 10⁵ × λ   = %/K            = 10⁻⁴ × (PPM/K)
    PPM/K (FIT)  = 10⁹ × λ   = 10⁴ × (%/K)    = PPM/K

110.11 The Weibull Distribution

The Weibull distribution has two parameters: a scale parameter α and a shape parameter β. By adjusting these two parameters, a wide range of experimental data can be modeled in system reliability studies. The associated functions are

    λ(t) = β t^(β−1) / α^β ;  t ≥ 0, α > 0, β > 0        (110.17)

    f(t) = (β t^(β−1) / α^β) exp[−(t/α)^β]        (110.18)

    R(t) = exp[−(t/α)^β]        (110.19)

With β = 1, the Weibull distribution reduces to the constant-hazard model with λ = (1/α). With β = 2, the Weibull distribution reduces to the Rayleigh distribution. The associated MTTF is

    MTTF = μ = α Γ(1 + 1/β)        (110.20)

where Γ denotes the gamma function.

110.12 Combinatorial Aspects

Analysis of complex systems is facilitated by decomposition into functional entities consisting of subsystems or units, and by the application of combinatorial considerations and network modeling techniques. A series or chain structure consisting of n units is shown in Fig. 110.2. From the reliability point of view, the system will succeed only if all the units succeed. The units may or may not be physically in series. If R_i is the probability of success of the ith unit, then the series system reliability R_s is given as

    R_s = Π_{i=1}^{n} R_i        (110.21)

if the units do not interact with each other. If they do, then the conditional probabilities must be carefully evaluated. If each of the units has a constant hazard, then

    R_s(t) = exp[−(Σ_{i=1}^{n} λ_i) t]        (110.22)

where λ_i is the constant failure rate for the ith unit or component. This enables us to replace the n components in series by an equivalent component with a constant hazard λ_s, where

    λ_s = Σ_{i=1}^{n} λ_i        (110.23)

If the components are identical, then λ_s = nλ and the MTTF of the equivalent component is (1/n) of the MTTF of one component.

[Figure 110.2: Series or chain structure; n units connected in a chain between cause and effect.]

A parallel structure consisting of n units is shown in Fig. 110.3. From the reliability point of view, the system will succeed if any one of the n units succeeds. Once again, the units may or may not be physically or topologically in parallel. If Q_i is the probability of failure of the ith unit, then the parallel system reliability R_p is given as

    R_p = 1 − Π_{i=1}^{n} Q_i        (110.24)

if the units do not interact with each other (meaning they are independent).

[Figure 110.3: Parallel structure; n units connected in parallel between cause and effect.]

If each of the units has a constant hazard, then

    R_p(t) = 1 − Π_{i=1}^{n} [1 − exp(−λ_i t)]        (110.25)

and we do not have the luxury of being able to replace the parallel system by an equivalent component with a constant hazard. The parallel system does not exhibit constant-hazard behavior even though each of its units has a constant hazard. The MTTF of the parallel system can be obtained by using Eq. (110.25) in Eq. (110.6). The results for the case of components with identical hazards λ are (1.5/λ), (1.833/λ), and (2.083/λ) for n = 2, 3, and 4, respectively. The largest gain in MTTF is obtained by going from one component to two components in parallel.
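The parallel-system MTTF figures quoted above can be verified by applying Eq. (110.6) to Eq. (110.25) numerically. This is a sketch of mine (the failure rate is arbitrary), not part of the chapter.

```python
import math

# MTTF of n identical constant-hazard units in parallel, by integrating
# R_p(t) = 1 - (1 - e**(-lam*t))**n over (0, inf) with the trapezoid rule.

def mttf_parallel(n, lam, t_max_factor=50.0, steps=200_000):
    t_max = t_max_factor / lam          # far enough out that R_p is negligible
    h = t_max / steps
    def r_p(t):
        return 1.0 - (1.0 - math.exp(-lam * t))**n
    s = 0.5 * (r_p(0.0) + r_p(t_max))
    for i in range(1, steps):
        s += r_p(i * h)
    return s * h

lam = 0.01  # hypothetical failure rate
for n, expected in ((2, 1.5), (3, 1.8333), (4, 2.0833)):
    # matches the (1.5/lam), (1.833/lam), (2.083/lam) values quoted above
    assert abs(mttf_parallel(n, lam) * lam - expected) < 1e-3
```

The diminishing returns are visible in the numbers themselves: the second unit adds 0.5/λ to the MTTF, the third only 0.333/λ, and the fourth 0.25/λ.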
It is uncommon to have more than two or three components in a truly parallel configuration because of the cost involved. For two non-identical components in parallel with hazard rates λ1 and λ2, the MTTF is given as

    MTTF = 1/λ1 + 1/λ2 − 1/(λ1 + λ2)        (110.26)

An r-out-of-n structure, also known as a partially redundant system, can be evaluated using Eq. (110.11). If all the components are identical, independent, and have a constant hazard λ, then the system reliability can be expressed as

    R(t) = Σ_{k=r}^{n} C(n, k) e^(−kλt) (1 − e^(−λt))^(n−k)        (110.27)

For r = 1 the structure becomes a parallel system, and for r = n it becomes a series system. Series-parallel systems are evaluated by repeated application of the expressions derived for series and parallel configurations, employing the well-known network reduction techniques. Several general techniques are available for evaluating the reliability of complex structures that are not purely series, parallel, or series-parallel. They range from inspection to cutset and tieset methods and connection matrix techniques that are amenable to computer programming.

110.13 Modeling Maintenance

Maintenance of a component can be either scheduled (preventive) or forced (corrective). The latter follows in-service failures and can be handled using the Markov models discussed later. Scheduled maintenance is conducted at fixed intervals of time, irrespective of whether the system continues to operate satisfactorily. Scheduled maintenance, under ideal conditions, takes very little time (compared to the time between maintenances), and the component is restored to an "as new" condition. Even if the component is not repairable, scheduled maintenance postpones failure and prolongs the life of the component. Scheduled maintenance makes sense only for those components with increasing hazard rates; most mechanical systems come under this category. It can be shown that the density function f_T*(t) with scheduled maintenance included can be expressed as

    f_T*(t) = Σ_{k=0}^{∞} f_1(t − kT_M) R^k(T_M)        (110.28)

where

    f_1(t) = f_T(t) for 0 < t ≤ T_M, and f_1(t) = 0 otherwise        (110.29)

with R(t) = component reliability function, T_M = time between maintenances (a constant), and f_T(t) = original failure density function.

In Eq. (110.28), the k = 0 term applies only between t = 0 and t = T_M, the k = 1 term applies only between t = T_M and t = 2T_M, and so on. A typical f_T*(t) is shown in Fig. 110.4. The time scale is divided into equal intervals of T_M each; the function in each segment is a scaled-down version of the one in the previous segment, the scaling factor being equal to R(T_M). Irrespective of the nature of the original failure density function, scheduled maintenance gives it an exponential tendency. This is another justification for the widespread use of the exponential distribution in system reliability evaluations.

[Figure 110.4: Density function with ideal scheduled maintenance incorporated; successive segments f_1(t − kT_M) R^k(T_M) over intervals of length T_M.]

110.14 Markov Models

Of the different Markov models available, the discrete-state, continuous-time Markov process has found many applications in system reliability evaluation, including the modeling of repairable systems. The model consists of a set of discrete states (the state space) in which the system can reside, and a set of transition rates between appropriate states. Using these, a set of first-order differential equations is derived in the standard vector-matrix form for the time-dependent probabilities of the various states. Solution of these equations, incorporating the proper initial conditions, gives the probabilities of the system residing in the different states as functions of time. Several useful results can be gleaned from these functions.

110.15 Binary Model for a Repairable Component

The binary model for a repairable component assumes that the component can exist in one of two states: the UP state or the DOWN state.
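The "exponential tendency" imparted by scheduled maintenance, Eq. (110.28), can be illustrated with a short sketch of mine (the Weibull parameters and maintenance interval are hypothetical): the probability mass falling in the kth maintenance interval is R(T_M)^k × Q(T_M), a geometric sequence.

```python
import math

# A Weibull component with increasing hazard (beta = 2) under ideal
# scheduled maintenance every T_M time units; all numbers hypothetical.
alpha, beta, T_M = 1000.0, 2.0, 200.0   # scale, shape, maintenance interval

def R(t):
    """Weibull reliability, Eq. (110.19)."""
    return math.exp(-((t / alpha) ** beta))

def interval_mass(k):
    """P(failure in (k*T_M, (k+1)*T_M)) with maintenance, from Eq. (110.28):
    integrating f_1(t - k*T_M) * R(T_M)**k over one interval gives
    R(T_M)**k * Q(T_M)."""
    return R(T_M) ** k * (1.0 - R(T_M))

# The interval masses form a geometric series summing to 1 ...
total = sum(interval_mass(k) for k in range(2000))
assert abs(total - 1.0) < 1e-9
# ... with constant ratio R(T_M): the exponential-like decay noted above.
assert abs(interval_mass(3) / interval_mass(2) - R(T_M)) < 1e-12
```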
The transition rates between these two states, S0 and S1, are assumed to be constant and equal to λ and μ. These transition rates are the constant failure and repair rates implied in the modeling process, and their reciprocals are the MTTF and MTTR, respectively. Figure 110.5 illustrates the binary model.

[Figure 110.5: State space diagram for a single repairable component; state S0 (component up) and state S1 (component down), with transition rates λ (failure) and μ (repair).]

The associated Markov differential equations are

    [P0′(t)]   [−λ   μ] [P0(t)]
    [P1′(t)] = [ λ  −μ] [P1(t)]        (110.30)

with the initial conditions

    P0(0) = 1,  P1(0) = 0        (110.31)

The coefficient matrix of the Markov differential equations,

    [−λ   μ]
    [ λ  −μ]

is obtained by transposing the matrix of rates of departure,

    [0   λ]
    [μ   0]

and replacing the diagonal entries by the negative of the sum of all the other entries in their respective columns. Solution of Eq. (110.30) with the initial conditions given by Eq. (110.31) yields

    P0(t) = μ/(λ + μ) + [λ/(λ + μ)] e^(−(λ+μ)t)        (110.32)

    P1(t) = [λ/(λ + μ)] [1 − e^(−(λ+μ)t)]        (110.33)

The limiting, or steady-state, probabilities are found by letting t → ∞. They are also known as the limiting availability A and limiting unavailability U:

    P0 ≜ A = μ/(λ + μ)  and  P1 ≜ U = λ/(λ + μ)        (110.34)

The time-dependent A(t) and U(t) are simply P0(t) and P1(t), respectively. Referring back to Eq. (110.14) for a constant-hazard component and comparing it with Eq. (110.32), which incorporates repair, the difference between R(t) and A(t) becomes obvious. Availability A(t) is the probability that the component is up at time t, whereas reliability R(t) is the probability that the component has operated continuously from 0 to t. Thus, R(t) is a much more stringent measure than A(t). While both R(0) and A(0) are unity, R(t) drops off rapidly as compared to A(t) as time progresses. With a small value of MTTR (or a large value of μ), it is possible to realize a very high availability for a repairable component.

110.16 Two Dissimilar Repairable Components

Irrespective of whether the two components are in series or in parallel, the state space consists of four possible states: S1 (1 up, 2 up), S2 (1 down, 2 up), S3 (1 up, 2 down), and S4 (1 down, 2 down). The actual system configuration determines which of these four states correspond to system success and failure. The associated state-space diagram is shown in Fig. 110.6.

[Figure 110.6: State space diagram for two dissimilar repairable components; states S1 (UU), S2 (DU), S3 (UD), and S4 (DD), with failure rates λ1, λ2 and repair rates μ1, μ2.]

Analysis of this system results in the following steady-state probabilities:

    P1 = μ1μ2/D;  P2 = λ1μ2/D;  P3 = μ1λ2/D;  P4 = λ1λ2/D        (110.35)

where the denominator D is

    D ≜ (λ1 + μ1)(λ2 + μ2)        (110.36)

For components in series, A = P1 and U = (P2 + P3 + P4), and the two components can be replaced by an equivalent component with a failure rate λ_s = (λ1 + λ2) and a mean repair duration r_s, where

    r_s ≈ (λ1 r1 + λ2 r2)/λ_s        (110.37)

Extending this to n components in series, the equivalent system will have

    λ_s = Σ_{i=1}^{n} λ_i  and  r_s ≈ (Σ_{i=1}^{n} λ_i r_i)/λ_s        (110.38)

    system unavailability = U ≈ λ_s r_s = Σ_{i=1}^{n} λ_i r_i        (110.39)

For components in parallel, A = (P1 + P2 + P3) and U = P4, and the two components can be replaced by an equivalent component with

    λ_p ≈ λ1 λ2 (r1 + r2)  and  r_p = r1 r2/(r1 + r2)        (110.40)

    system unavailability = U = λ_p r_p = λ1 λ2 r1 r2        (110.41)

Extension to more than two components in parallel follows similar lines.
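As a numerical cross-check of the binary repairable-component model (a sketch of mine with hypothetical rates), the Markov equations (110.30) can be integrated by forward Euler and compared with the closed-form availability of Eq. (110.32):

```python
import math

lam, mu = 0.01, 0.5      # hypothetical failure and repair rates (per hour)

def a_closed(t):
    """Closed-form availability A(t) = P0(t), Eq. (110.32)."""
    return mu / (lam + mu) + (lam / (lam + mu)) * math.exp(-(lam + mu) * t)

def a_euler(t, steps=100_000):
    """Forward-Euler integration of Eq. (110.30) from the initial
    conditions of Eq. (110.31)."""
    p0, p1 = 1.0, 0.0
    h = t / steps
    for _ in range(steps):
        dp0 = -lam * p0 + mu * p1
        dp1 = lam * p0 - mu * p1
        p0, p1 = p0 + h * dp0, p1 + h * dp1
    return p0

for t in (1.0, 10.0, 100.0):
    assert abs(a_euler(t) - a_closed(t)) < 1e-4

# Steady state, Eq. (110.34): A = mu/(lam + mu).
assert abs(a_closed(1e6) - mu / (lam + mu)) < 1e-12
```

With these numbers MTTR = 2 hours against an MTTF of 100 hours, giving a limiting availability of about 0.980, which illustrates the closing remark: a small MTTR yields a very high availability even for a modest MTTF.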
For three components in parallel,

    λ_p ≈ λ1 λ2 λ3 (r1 r2 + r2 r3 + r3 r1)  and  U = λ1 λ2 λ3 r1 r2 r3        (110.42)

110.17 Two Identical Repairable Components

In this case, only three states are needed to complete the state space: S1 (both up), S2 (one up and one down), and S3 (both down). The corresponding state-space diagram is shown in Fig. 110.7.

[Figure 110.7: State space diagram for two identical repairable components; transitions 2λ (S1 → S2), λ (S2 → S3), 2μ (S3 → S2), and μ (S2 → S1).]

Analysis of this system results in the following steady-state probabilities:

    P1 = [μ/(λ + μ)]²;  P2 = 2 [μ/(λ + μ)] [λ/(λ + μ)];  P3 = [λ/(λ + μ)]²        (110.43)

110.18 Frequency and Duration Techniques

The expected residence time in a state is the mean value of the passage time from the state in question to any other state. The cycle time is the time required to complete an "in" and "not-in" cycle for that state, and the frequency of occurrence (or encounter) of a state is the reciprocal of its cycle time. It can be shown that the frequency of occurrence of a state is equal to the steady-state probability of being in that state times the total rate of departure from it. Also, the expected residence time is equal to the reciprocal of the total rate of departure from that state.

Under steady-state conditions, the expected frequency of entering a state must equal the expected frequency of leaving that state (this assumes that the system is "ergodic", which will not be elaborated upon for lack of space). Using this principle, frequency balance equations can easily be written (one for each state) and solved, in conjunction with the fact that the steady-state probabilities of all the states must sum to unity, to obtain the steady-state probabilities. This procedure is much simpler than solving the Markov differential equations and letting t → ∞.

110.19 Applications of Markov Process

Once the different states are identified and a state-space diagram is developed, Markov analysis can proceed systematically (probably with the help of a computer in the case of large systems) to yield a wealth of results useful in system reliability evaluation. Inclusion of installation time after repair, maintenance, spares, standby systems, and the limitations imposed by restricted repair facilities are some of the many problems that can be studied.

110.20 Some Useful Approximations

1. For an r-out-of-n structure with failure and repair rates of λ and μ for each component, the equivalent MTTR and MTTF can be approximated as

    MTTR_eq = (MTTR of one component)/(n − r + 1)        (110.44)

    MTTF_eq ≈ (MTTF of one component) × (MTTF/MTTR)^(n−r) × {n!/[(r − 1)!(n − r)!]}⁻¹        (110.45)

2. The influence of weather must be considered for components operating in an outdoor environment. If λ and λ′ are the normal-weather and stormy-weather failure rates, λ′ will be much greater than λ, and the average failure rate λ_f can be approximated as

    λ_f ≈ [N/(N + S)] λ + [S/(N + S)] λ′        (110.46)

where N and S are the expected durations of normal and stormy weather.

3. For well-designed high-reliability components, the failure rate λ will be very small and λt << 1. Then, for a single component,

    R(t) ≈ 1 − λt  and  Q(t) ≈ λt        (110.47)

and for n dissimilar components in series,

    R(t) ≈ 1 − Σ_{i=1}^{n} λ_i t  and  Q(t) ≈ Σ_{i=1}^{n} λ_i t        (110.48)

For the case of n identical components in parallel,

    R(t) ≈ 1 − (λt)^n  and  Q(t) ≈ (λt)^n        (110.49)

For the case of an r-out-of-n configuration,

    Q(t) ≈ C(n, n − r + 1) (λt)^(n−r+1)        (110.50)

The approximations detailed in item 3 are called rare-event approximations.

110.21 Application Aspects

Electronic systems utilize large numbers of similar components over which the designer has very little control. Quality control methods can be used in the procurement and manufacturing phases.
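The frequency-balance procedure of Sec. 110.18 can be demonstrated on the three-state model of Sec. 110.17 (a sketch of mine with hypothetical rates): balancing the frequency of entering and leaving each state reproduces the closed forms of Eq. (110.43).

```python
# Frequency balance for two identical repairable components:
#   2*lam*P1 = mu*P2   (state S1: leave at 2*lam, enter at mu from S2)
#   lam*P2 = 2*mu*P3   (state S3: enter at lam from S2, leave at 2*mu)
# together with P1 + P2 + P3 = 1. Solved by substitution.

lam, mu = 0.01, 0.2      # hypothetical per-component failure/repair rates

ratio2 = 2.0 * lam / mu              # P2 / P1
ratio3 = ratio2 * lam / (2.0 * mu)   # P3 / P1
P1 = 1.0 / (1.0 + ratio2 + ratio3)
P2 = ratio2 * P1
P3 = ratio3 * P1

# Compare with the closed forms of Eq. (110.43).
p_up = mu / (lam + mu)
assert abs(P1 - p_up**2) < 1e-12
assert abs(P2 - 2.0 * p_up * (lam / (lam + mu))) < 1e-12
assert abs(P3 - (lam / (lam + mu))**2) < 1e-12
```

Note that no differential equations were solved: two balance equations and the normalization condition suffice, which is exactly the simplification claimed at the end of Sec. 110.18.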
However, the circuit designer has no control over the design reliability of the devices, except in cases such as custom-designed integrated circuits. In addition, electronic components cannot be inspected easily because of encapsulation. Although gross defects can be detected by suitable testing processes, it is the defects that are not immediately effective (for example, a weak mechanical bond of a lead-in conductor, material flaws in semiconductors, or defective sealing) that primarily contribute to unreliability. Temperature and voltage are the predominant failure-accelerating stresses for the majority of electronic components. As weaker components fail and are replaced by better ones, the percentage of defects in a population is reduced, resulting in a decreasing hazard rate. Wearout is rarely of significance in the failure of electronic components and systems. The designer should be careful to ensure that the loads (voltage, current, temperature) are within rated values and should strive for a design that minimizes hot spots and temperature rises. Parameter drifts and accidental short circuits at connections can also lead to system failures.

The circuit designer can follow a few basic rules to significantly improve electronic system reliability: reduce the number of adjustable components; avoid selecting components on the basis of parameter values obtained by testing; assemble components such that adjustments are easily accessible; and partition circuits into subassemblies for easy testing and diagnosis of problems.

Power systems are expected to provide all customers a reliable supply of the electric power upon which much of modern life depends. Power systems are also very large, consisting of scores of large generators, hundreds of miles of high-voltage transmission lines, and thousands of miles of distribution lines, along with the necessary transformers, switchgear, and substations interconnecting them.
Reliability at the customer level can be improved by additional investment; the challenge is to balance reliability and the associated investment cost against the cost of energy charged to customers. This must be done in the presence of a number of random inputs and events: generator outages, line outages (which are highly weather dependent), random component outages, and uncertainties in the load demand (which is also weather dependent). Probabilistic techniques for evaluating power system reliability have been used effectively to resolve this problem. The system is divided into a number of subsystems, and each is analyzed separately. Then, composite system reliability evaluation techniques are employed to combine the results and arrive at a number of quantifiable reliability indices as inputs to managerial decisions. The major subsystems involved are generation, transmission, distribution, substations, and protection systems. Care should be taken to ensure that the reliabilities of different parts of the system conform to one another and that no part of the system is unusually strong or weak. Obviously, different levels of reliability will be required for different parts of the system, depending on the impact that failures at different points have on the interconnected power system.

110.22 Reliability and Economics

Reliability and economics are very closely related. Issues such as the level of reliability required, the amount of additional expenditure justified, where to invest additional resources to maximize reliability, how to achieve a certain level of overall reliability at minimum cost, and how to assess the cost of failures and the monetary equivalent of non-monetary items are all quite complex and not purely technical. However, once managerial decisions are made and reliability goals are set, certain well-proven techniques, such as incorporating redundancy, improving maintenance procedures, and selecting better quality components,
can be employed by the designer to achieve the goals.

Defining Terms

Availability: The availability A(t) is the probability that a system is performing its required function successfully at time t. The steady-state availability A is the fraction of time that an item, system, or component is able to perform its specified or required function.

Bathtub curve: For most physical components and living entities, the plot of failure (or hazard) rate vs. time has the shape of the longitudinal cross-section of a bathtub, hence the name.

Hazard rate function: The plot of instantaneous failure rate vs. time. It clearly and distinctly exhibits the different life cycles of the component.

MTTF: The mean time to failure is the mean, or expected value, of the "time to failure".

Parallel structure: Also known as a completely redundant system; a system that succeeds when at least one of two or more components succeeds.

Redundancy: The existence of more than one means, identical or otherwise, for accomplishing a task or mission.

Reliability: The reliability R(t) of an item or system is the probability that it has performed successfully over the time interval from 0 to t. In the case of non-repairable systems, R(t) = A(t). With repair, R(t) ≤ A(t).

Series structure: Also known as a chain structure or non-redundant system; a system whose success depends on the success of all of its components.

Related Topics

23.2 Testing • 98.5 Mean Time to Failure • 98.10 Markov Modeling • 98.12 Reliability Calculations for Real Time Systems