System and Software Safety
Nancy G. Leveson
MIT Aero/Astro Dept.
Safeware Engineering Corp.

Copyright by the author, June 2001. All rights reserved. Copying without fee is permitted provided that the copies are not made or distributed for direct commercial advantage and provided that credit to the source is given. Abstracting with credit is permitted.

The Problem

The first step in solving any problem is to understand it. We often propose solutions to problems that we do not understand and then are surprised when the solutions fail to have the anticipated effect.

Accident with No Component Failures

[Figure: batch chemical reactor under computer control (LC), with catalyst and cooling-water valves, water cooling, condenser, vent, reflux line, reactor, vapor line (LA), and gearbox]

Types of Accidents

Component failure accidents
- Single or multiple component failures
- Usually assumed to be random failures

System accidents
- Arise in the interactions among components
- No component may have "failed"
- Caused by interactive complexity and tight coupling
- Exacerbated by the introduction of computers...

Interactive Complexity

Complexity is a moving target; the underlying factor is intellectual manageability.

1. A "simple" system has a small number of unknowns in its interactions within the system and with its environment.
2. A system is intellectually unmanageable when the level of interactions reaches the point where they cannot be thoroughly planned, understood, anticipated, or guarded against.
3. Introducing new technology introduces unknowns and even "unk-unks" (unknown unknowns).

Computers and Risk

We seem not to trust one another as much as would be desirable.
In lieu of trusting each other, are we putting too much trust in our technology? ... Perhaps we are not educating our children sufficiently well to understand the reasonable uses and limits of technology.
        -- Thomas B. Sheridan

The Computer Revolution

General-purpose machine + Software = Special-purpose machine

Software is simply the design of a machine abstracted from its physical realization.
- Machines that were physically impossible or impractical to build become feasible.
- Designs can be changed without retooling or manufacturing.
- Designers can concentrate on the steps to be achieved without worrying about how those steps will be realized physically.

Advantages = Disadvantages

The computer is so powerful and so useful because it has eliminated many of the physical constraints of previous machines. This is both its blessing and its curse:
+ We no longer have to worry about the physical realization of our designs.
- We no longer have physical laws that limit the complexity of our designs.

The Curse of Flexibility

Software is the resting place of afterthoughts.

There are no physical constraints
- to enforce discipline on design, construction, and modification
- to control complexity.

Software is so flexible that we start working with it before fully understanding what we need to do.

"And they looked upon the software and saw that it was good, but they just had to add one other feature..."

Software Myths

1. Good software engineering is the same for all types of software.
2. Software is easy to change.
3. Software errors are simply "teething" problems.
4. Reusing software will increase safety.
5. Testing or "proving" software correct will remove all the errors.

Abstraction from Physical Design

Software engineers are doing system design.

[Figure: an application expert (e.g., an autopilot engineer) writes the system requirements, from which software engineers produce the design of the special-purpose machine (the autopilot)]

- Most errors in operational software are related to requirements; completeness is a particular problem.
- Software "failure modes" are different: software usually does exactly what you tell it to do. Problems occur from operation, not lack of operation; the software is usually doing exactly what the software engineers wanted.

Typical Fault Trees

[Figure: fault-tree fragment in which "software error" appears as a basic event under an OR gate leading to the hazard, with a probability column and "test software" listed as the mitigation, i.e., software treated like a hardware component that "fails"]

Black Box Testing

Test data are derived solely from the specification (i.e., without knowledge of the internal structure of the program), so we would need to test every possible input.

Example: for a program intended to compute x := y * 2, a hidden special case could produce a wrong result for one particular input; since the box is black, the only way to be sure of detecting it is to try every input condition.

- All valid inputs up to the maximum size of the machine (enormous, if not astronomical).
- Also all invalid inputs (e.g., testing an Ada compiler requires all valid and invalid programs).
- If the program has "memory," all possible unique valid and invalid sequences must be tested.

So for most programs, exhaustive input testing is impractical.

White Box Testing

Derive test data by examining the program's logic. Exhaustive path testing has two flaws:

1) The number of unique paths through a program is astronomical. For a control-flow graph containing a loop that may iterate up to 20 times, with 5 paths through the loop body, the number of unique paths is

    5^20 + 5^19 + ... + 5^1  (about 10^14, or 100 trillion)

If we could develop, execute, and verify one test case every five minutes, exhaustive path testing would take about 1 billion years. With a magic test processor that could develop, execute, and evaluate one test per millisecond, it would still take 3,170 years.

White Box Testing (con't)

2) We could test every path and the program may still have errors!

- Path testing does not guarantee the program matches its specification, i.e., we may have built the wrong program.
- Missing paths: it would not detect the absence of necessary paths.
- We could still have data-sensitivity errors. For example, a program has to compare two numbers for convergence:

      if (A - B) < epsilon ...

  is wrong because it should compare to abs(A - B). Detection of this error depends on the values used for A and B and would not necessarily be found by executing every path through the program.

Mathematical Modeling Difficulties

- Large number of states and lack of regularity.
- Lack of physical continuity: requires discrete rather than continuous mathematics.
- Specifications and proofs using logic:
  - May be the same size as or larger than the code
  - More difficult to construct than the code
  - Harder to understand than the code
  - Therefore, as difficult and error-prone as the code itself.
- We have not found good ways to measure software quality.

A Possible Solution

- Enforce discipline and control complexity: the limits have changed from the structural integrity and physical constraints of materials to intellectual limits.
- Improve communication among engineers.
- Build safety in by enforcing constraints on behavior.

Example (batch reactor)
System safety constraint: Water must be flowing into the reflux condenser whenever catalyst is added to the reactor.
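The data-sensitivity error above is easy to demonstrate. The sketch below (hypothetical names and an illustrative epsilon) shows that a test suite can exercise every statement and branch of the buggy comparison and still miss the error, because the signed difference "converges" for any A < B:

```python
EPSILON = 1e-6

def converged_wrong(a: float, b: float) -> bool:
    # Buggy: compares the SIGNED difference against epsilon, so any case
    # where a < b appears "converged" no matter how far apart they are.
    return (a - b) < EPSILON

def converged_right(a: float, b: float) -> bool:
    # Correct: compares the magnitude of the difference.
    return abs(a - b) < EPSILON

# These two cases execute every path of converged_wrong and agree with the
# correct version, so path coverage alone does not reveal the bug:
assert converged_wrong(1.0, 1.0) == converged_right(1.0, 1.0)
assert converged_wrong(5.0, 1.0) == converged_right(5.0, 1.0)

# Only a data-sensitive case (a far below b) exposes the error:
assert converged_wrong(1.0, 5.0) is True    # wrongly reports convergence
assert converged_right(1.0, 5.0) is False
```

The point is the slide's: correctness here depends on the data values chosen, not on which paths are executed.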
Software safety constraint: The software must always open the water valve before the catalyst valve.

Stages in Process Control System Evolution

1. Mechanical systems
- Direct sensory perception of the process.
- Displays are directly connected to the process and thus are physical extensions of it.
- Design decisions highly constrained by available space, the physics of the underlying process, and the limited possibility of action at a distance.

Stages in Process Control System Evolution (2)

2. Electromechanical systems
- Capability for action at a distance.
- Need to provide an image of the process to operators.
- Need to provide feedback on actions taken.
- Relaxed constraints on designers but created new possibilities for designer and operator error.

Stages in Process Control System Evolution (3)

3. Computer-based systems
- Allow multiplexing of controls and displays.
- Relax even more constraints and introduce more possibility for error.
- But the old constraints shaped the environment in ways that efficiently transmitted valuable process information and supported the cognitive processes of operators. We are finding it hard to capture and present these qualities in new systems.

The Problem to be Solved

The primary safety problem in computer-based systems is the lack of appropriate constraints on design. The job of the system safety engineer is to identify the design constraints necessary to maintain safety and to ensure that the system and software design enforces them.
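The batch-reactor constraint above ("always open the water valve before the catalyst valve") is the kind of behavioral constraint that can be enforced directly in the software design. A minimal sketch, with hypothetical class and method names, of an interlock that makes the unsafe ordering impossible rather than merely unlikely:

```python
class ReactorController:
    """Sketch of enforcing the batch-reactor safety constraint:
    catalyst may flow only while cooling water is flowing."""

    def __init__(self):
        self.water_valve_open = False
        self.catalyst_valve_open = False

    def open_water_valve(self):
        self.water_valve_open = True

    def open_catalyst_valve(self):
        # Safety constraint: water must be flowing into the reflux
        # condenser whenever catalyst is added to the reactor.
        if not self.water_valve_open:
            raise RuntimeError("interlock: open water valve first")
        self.catalyst_valve_open = True

    def close_water_valve(self):
        # Symmetric constraint: cooling must not stop while catalyst flows.
        if self.catalyst_valve_open:
            raise RuntimeError("interlock: close catalyst valve first")
        self.water_valve_open = False
```

The design choice here is the slide's point: safety is built in as an explicit constraint on behavior, not left implicit in the order in which some calling code happens to issue commands.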
Safety ≠ Reliability

Accidents in high-tech systems are changing their nature, and we must change our approaches to safety accordingly.

Confusing Safety and Reliability

From an FAA report on ATC software architectures:

"The FAA's en route automation meets the criteria for consideration as a safety-critical system. Therefore, en route automation systems must possess ultra-high reliability."

From a blue ribbon panel report on the V-22 Osprey problems:

"Safety [software]: ... Recommendation: Improve reliability, then verify by extensive test/fix/test in challenging environments."

Does Software Fail?

Failure: Nonperformance or inability of a system or component to perform its intended function for a specified time under specified environmental conditions.
- A basic, abnormal occurrence, e.g., a burned-out bearing in a pump, or a relay not closing properly when voltage is applied.

Fault: A higher-order event, e.g., a relay closing at the wrong time due to the improper functioning of an upstream component.

All failures are faults, but not all faults are failures.

Reliability Engineering Approach to Safety

Reliability: The probability an item will perform its required function in the specified manner over a given time period and under specified or assumed conditions.

(Note: Most software-related accidents result from errors in the specified requirements or function and from deviations from assumed conditions.)
Concerned primarily with failures and failure-rate reduction:
- Parallel redundancy
- Standby sparing
- Safety factors and margins
- Derating
- Screening
- Timed replacements

Reliability Engineering Approach to Safety (2)

Assumes accidents are the result of component failure:
- Techniques exist to increase component reliability.
- Failure rates in hardware are quantifiable.

But this omits important factors in accidents, and may even decrease safety. Many accidents occur without any component "failure":
- Accidents may be caused by equipment operation outside the parameters and time limits upon which the reliability analyses are based.
- Or they may be caused by interactions of components all operating according to specification.

Highly reliable components are not necessarily safe.

Reliability Approach to Software Safety

Standard engineering techniques of
- preventing failures through redundancy,
- increasing component reliability, and
- reusing designs and learning from experience

won't work for software and system accidents.

Preventing Failures through Redundancy

Redundancy simply makes complexity worse (NASA experimental aircraft example). Any solution that involves adding complexity will not solve problems that stem from intellectual unmanageability and interactive complexity.

- The majority of software-related accidents are caused by requirements errors.
- Redundancy does not work for software even if the accident is caused by a software implementation error: software errors are not caused by random wear-out failures.
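The arithmetic behind parallel redundancy shows both why it works for random hardware failures and why it does not transfer to software. A minimal sketch (illustrative numbers, not from any real system):

```python
def parallel_reliability(r: float, n: int) -> float:
    """Probability that at least one of n INDEPENDENT channels works.
    Valid only when channel failures are independent random events."""
    return 1 - (1 - r) ** n

# Independent random hardware failures: redundancy helps dramatically.
single = parallel_reliability(0.99, 1)   # ~0.99
triple = parallel_reliability(0.99, 3)   # ~0.999999
print(round(single, 6), round(triple, 6))

# Identical software copies share the same requirements and design errors,
# so their "failures" are correlated, not independent: n copies of the
# same wrong program fail on exactly the same inputs as one copy does.
# The independence assumption behind the formula above simply does not hold.
```

This is why the slide's conclusion holds: the formula's gain comes entirely from the independence assumption, which software design errors violate.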
Increasing Software Reliability (Integrity)

"Safety integrity levels" appear in many new international standards for software safety (e.g., IEC 61508), sometimes with a reliability number attached (e.g., 10^-9).
- Can software reliability be measured? What does it even mean?
- Safety involves more than simply getting the software "correct."

Example: an altitude switch
1. If the signal is safety-increasing => require that ANY of the three altimeters report below the threshold.
2. If the signal is safety-reducing => require that ALL three altimeters report below the threshold.

Software Component Reuse

Reuse is one of the most common factors in software-related accidents. Software contains assumptions about its environment, and accidents occur when these assumptions are incorrect:
- Therac-25
- Ariane 5
- U.K. ATC software

The features embedded in or controlled by the software are the system features most likely to change. COTS software makes safety analysis more difficult.

Safety and reliability are different qualities!

Software-Related Accidents

Are usually caused by flawed requirements:
- Incomplete or wrong assumptions about the operation of the controlled system or the required operation of the computer.
- Unhandled controlled-system states and environmental conditions.

Merely trying to get the software "correct" or to make it reliable will not make it safer under these conditions.

Software-Related Accidents (con't.)

Software may be highly reliable and "correct" and still be unsafe:
- It correctly implements its requirements, but the specified behavior is unsafe from a system perspective.
- The requirements do not specify some particular behavior required for system safety (i.e., they are incomplete).
- The software has unintended (and unsafe) behavior beyond what is specified in the requirements.

A Little Systems Theory

Systems theory can act as an alternative to reliability theory for dealing with safety.

Ways to Cope with Complexity

Analytic reduction (Descartes):
- Divide the system into distinct parts for analysis purposes.
- Examine the parts separately.

Three important assumptions:
1. The division into parts will not distort the phenomenon being studied.
2. The components are the same when examined singly as when playing their part in the whole.
3. The principles governing the assembling of the components into the whole are themselves straightforward.

Ways to Cope with Complexity (con't.)

Statistics:
- Treat the system as a structureless mass with interchangeable parts.
- Use the Law of Large Numbers to describe behavior in terms of averages.
- Assumes the components are sufficiently regular and random in their behavior that they can be studied statistically.

What about software?

Too complex for complete analysis:
- Separation into non-interacting subsystems distorts the results.
- The most important properties are emergent.

Too organized for statistics:
- Too much underlying structure that distorts the statistics.
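Returning to the altitude-switch example from the "Increasing Software Reliability" slide: the safe voting logic depends on what the signal does at the system level, not on component reliability. A minimal sketch (the threshold value and function names are hypothetical):

```python
def issue_safety_increasing(alts, threshold=2000.0):
    """Signal arms a protective function (safety-increasing):
    issue it if ANY of the three altimeters reads below the threshold,
    so a single failed-high altimeter cannot suppress the protection."""
    return any(a < threshold for a in alts)

def issue_safety_reducing(alts, threshold=2000.0):
    """Signal removes a protection (safety-reducing):
    issue it only if ALL three altimeters read below the threshold,
    so a single failed-low altimeter cannot trigger it."""
    return all(a < threshold for a in alts)

readings = [1500.0, 2500.0, 2500.0]   # one altimeter disagrees with the others
assert issue_safety_increasing(readings) is True
assert issue_safety_reducing(readings) is False
```

Note that both functions are equally "reliable" implementations of a 2-line voting rule; which one is safe is a system-level question, which is exactly the slide's point.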
Systems Theory

Developed for biology (Bertalanffy) and cybernetics (Norbert Wiener), for systems that are
- too complex for complete analysis (separation into non-interacting subsystems distorts the results; the most important properties are emergent), and
- too organized for statistical analysis.

Concentrates on the analysis and design of the whole as distinct from the parts (the basis of system engineering):
- Some properties can only be treated adequately in their entirety, taking into account all social and technical aspects.
- These properties derive from the relationships between the parts of systems: how they interact and fit together.

Systems Theory (2)

Two pairs of ideas:

1. Emergence and hierarchy
- Levels of organization, each more complex than the one below.
- Levels are characterized by emergent properties:
  - Irreducible
  - Represent constraints upon the degrees of freedom of components at a lower level.

Safety is an emergent system property. It is NOT a component property; it can only be analyzed in the context of the whole.

Systems Theory (3)

2. Communication and control
- Hierarchies are characterized by control processes working at the interfaces between levels.
- A control action imposes constraints upon the activity at one level of a hierarchy.
- Open systems are viewed as interrelated components kept in a state of dynamic equilibrium by feedback loops of information and control.
- Control in open systems implies the need for communication.

An Overview of the Approach

Engineers should recognize that reducing risk is not an impossible task, even under financial and time constraints. All it takes in many cases is a different perspective on the design problem.
        -- Mike Martin and Roland Schinzinger, Ethics in Engineering

System Safety

A planned, disciplined, and systematic approach to preventing or reducing accidents throughout the life cycle of a system: "organized common sense" (Mueller, 1968).

The primary concern is the management of hazards:
- Hazard identification, evaluation, elimination, and control
- through analysis, design, and management (MIL-STD-882).

System Safety (2)

Hazard analysis and control is a continuous, iterative process throughout system development and use.

[Figure: life cycle from conceptual development through design, development, and operations, with hazard identification, hazard resolution, verification, change analysis, operational feedback, and management activities at each stage]

Hazard resolution precedence:
1. Eliminate the hazard.
2. Prevent or minimize the occurrence of the hazard.
3. Control the hazard if it occurs.
4. Minimize damage.

System Safety Engineering

- Emphasizes building safety in rather than adding it on to a completed design.
- Looks at systems as a whole, not just components.
- Takes a larger view of hazards than just failures.
- Emphasizes hazard analysis and design to eliminate or control hazards.
- Emphasizes qualitative rather than quantitative approaches.
Terminology

Accident: An undesired and unplanned (but not necessarily unexpected) event that results in (at least) a specified level of loss.

Incident: An event that involves no loss (or only minor loss) but with the potential for loss under different circumstances.

Hazard: A state or set of conditions that, together with other conditions in the environment, will lead to an accident (loss event). Note that a hazard is NOT equal to a failure.

"Distinguishing hazards from failures is implicit in understanding the difference between safety and reliability engineering."
        -- C.O. Miller

Hazard Level: A combination of severity (worst potential damage in case of an accident) and likelihood of occurrence of the hazard.

Risk: The hazard level combined with the likelihood of the hazard leading to an accident plus the exposure (or duration) of the hazard:

    RISK = f( hazard level [hazard severity, likelihood of hazard occurring],
              hazard exposure,
              likelihood of hazard leading to an accident )

Safety: Freedom from accidents or losses. (A scale running from no loss to increasing levels of loss; "safe" is the no-loss end.)

Hazard Analysis

Hazard analysis is the heart of any system safety program. It affects, and in turn is affected by, all aspects of the development process (design, test, QA, operations, training, maintenance, and management).

Used for:
- Developing requirements and design constraints
- Validating requirements and design for safety
- Preparing operational procedures and instructions
- Test planning
- Management planning

Serves as:
- A framework for ensuing steps
- A checklist to ensure management and technical responsibilities for safety are accomplished.
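The risk decomposition above can be made concrete with a small data structure. This is purely an illustrative sketch: the 1-4 category scales and the multiplicative combination are hypothetical conventions invented for the example, not part of the terminology slide (which deliberately leaves "combined with" qualitative):

```python
from dataclasses import dataclass

@dataclass
class HazardAssessment:
    """Illustrative decomposition of the risk definition above.
    All 1-4 scales are hypothetical, not from any standard."""
    severity: int             # worst potential damage (1 = minor .. 4 = catastrophic)
    hazard_likelihood: int    # likelihood of the hazard occurring (1 .. 4)
    exposure: int             # duration/exposure of the hazard (1 .. 4)
    accident_likelihood: int  # likelihood the hazard leads to an accident (1 .. 4)

    def hazard_level(self) -> int:
        # Hazard level = severity combined with hazard likelihood.
        return self.severity * self.hazard_likelihood

    def risk(self) -> int:
        # Risk = hazard level combined with exposure and the likelihood
        # of the hazard leading to an accident.
        return self.hazard_level() * self.exposure * self.accident_likelihood

# e.g., "train starts with door open": severe, occasional, moderate exposure
open_door = HazardAssessment(severity=4, hazard_likelihood=2,
                             exposure=3, accident_likelihood=2)
assert open_door.hazard_level() == 8
assert open_door.risk() == 48
```

The structure, not the numbers, is the point: risk factors the hazard itself apart from the conditions under which it leads to loss, which is what lets hazard analysis target each factor separately.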
"Types" (Stages) of Hazard Analysis

Preliminary Hazard Analysis (PHA)
- Identify, assess, and prioritize hazards.
- Identify high-level safety design constraints.

System Hazard Analysis (SHA)
- Examine subsystem interfaces to evaluate the safety of the system working as a whole.
- Refine design constraints and trace them to individual components (including operators).

"Types" (Stages) of Hazard Analysis (2)

Subsystem Hazard Analysis (SSHA)
- Determine how subsystem design and behavior can contribute to system hazards.
- Evaluate the subsystem design for compliance with safety constraints.

Change and Operations Analysis
- Evaluate all changes for their potential to contribute to hazards.
- Analyze operational experience.

Preliminary Hazard Analysis

1. Identify system hazards.
2. Translate system hazards into high-level system safety design constraints.
3. Assess hazards if required to do so.
4. Establish the hazard log.

System Hazards for Automated Train Doors

- Train starts with door open.
- Door opens while train is in motion.
- Door opens while improperly aligned with station platform.
- Door closes while someone is in doorway.
- Door that closes on an obstruction does not reopen, or reopened door does not reclose.
- Doors cannot be opened for emergency evacuation.

System Hazards for Air Traffic Control

- Controlled aircraft violate minimum separation standards (NMAC).
- Airborne controlled aircraft enters an unsafe atmospheric region.
- Controlled airborne aircraft enters restricted airspace without authorization.
- Controlled airborne aircraft gets too close to a fixed obstacle other than a safe point of touchdown on the assigned runway (CFIT).
- Controlled airborne aircraft and an intruder in controlled airspace violate minimum separation.
- Controlled aircraft operates outside its performance envelope.
- Aircraft on the ground comes too close to moving objects, collides with stationary objects, or leaves the paved area.
- Aircraft enters a runway for which it does not have clearance.
- Controlled aircraft executes an extreme maneuver within its performance envelope.
- Loss of aircraft control.

Exercise: Identify the system hazards for this cruise-control system

The cruise control system operates only when the engine is running. When the driver turns the system on, the speed at which the car is traveling at that instant is maintained. The system monitors the car's speed by sensing the rate at which the wheels are turning, and it maintains desired speed by controlling the throttle position. After the system has been turned on, the driver may tell it to start increasing speed, wait a period of time, and then tell it to stop increasing speed. Throughout the time period, the system will increase the speed at a fixed rate, and then will maintain the final speed reached.

The driver may turn off the system at any time. The system will turn off if it senses that the accelerator has been depressed far enough to override the throttle control. If the system is on and senses that the brake has been depressed, it will cease maintaining speed but will not turn off. The driver may tell the system to resume speed, whereupon it will return to the speed it was maintaining before braking and resume maintenance of that speed.
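One way to start the exercise is to make the specification's implicit states and transitions explicit; hazards often hide in the transitions (e.g., what "resume" does if conditions have changed since braking). The sketch below is one reading of the prose, for analysis only; all state names are hypothetical:

```python
class CruiseControl:
    """State-machine sketch of the exercise's cruise-control specification."""

    def __init__(self):
        self.state = "off"
        self.maintained_speed = None

    def turn_on(self, current_speed, engine_running):
        if engine_running:                      # operates only when engine runs
            self.state = "maintaining"
            self.maintained_speed = current_speed

    def start_increasing(self):
        if self.state == "maintaining":
            self.state = "increasing"           # speed rises at a fixed rate

    def stop_increasing(self, current_speed):
        if self.state == "increasing":
            self.state = "maintaining"
            self.maintained_speed = current_speed  # hold final speed reached

    def brake_pressed(self):
        if self.state in ("maintaining", "increasing"):
            self.state = "suspended"            # ceases maintaining, stays on

    def resume(self):
        if self.state == "suspended":
            self.state = "maintaining"          # returns to pre-braking speed

    def accelerator_override(self):
        self.state = "off"                      # depressed far enough: turns off

    def turn_off(self):
        self.state = "off"

cc = CruiseControl()
cc.turn_on(55.0, engine_running=True)
cc.brake_pressed()
cc.resume()
assert cc.state == "maintaining" and cc.maintained_speed == 55.0
```

Walking each transition against the system hazards (e.g., "resume" restoring a speed that is now unsafe, or behavior if the engine stops while maintaining) is exactly the kind of question a PHA of this system should raise.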
Hazard Identification

- Use historical safety experience, lessons learned, trouble reports, hazard analyses, and accident and incident files.
- Look at published lists, checklists, standards, and codes of practice.
- Examine basic energy sources, flows, high-energy items, and hazardous materials (fuels, propellants, lasers, explosives, toxic substances, and pressure systems).
- Look at potential interface problems such as material incompatibilities, possibilities for inadvertent activation, contamination, and adverse environmental scenarios.
- Review mission and basic performance requirements, including the environments in which operations will take place.
- Look at all possible system uses, all modes of operation, all possible environments, and all times during operation.

Hazard Identification (2)

- Examine the human-machine interface.
- Look at transition phases, nonroutine operating modes, system changes, changes in the technical and social environment, and changes between modes of operation.
- Use scientific investigation of the physical, chemical, and other properties of the system.
- Think through the entire process, step by step, anticipating what might go wrong, how to prepare for it, and what to do if the worst happens.

Hazards must be translated into design constraints.

HAZARD: Train starts with door open.
DESIGN CRITERION: Train must not be capable of moving with any door open.

HAZARD: Door opens while train is in motion.
DESIGN CRITERION: Doors must remain closed while train is in motion.

HAZARD: Door opens while improperly aligned with station platform.
DESIGN CRITERION: Door must be capable of opening only after train is stopped and properly aligned with platform, unless emergency exists (see below).

HAZARD: Door closes while someone is in doorway.
DESIGN CRITERION: Door areas must be clear before door closing begins.

HAZARD: Door that closes on an obstruction does not reopen, or reopened door does not reclose.
DESIGN CRITERION: An obstructed door must reopen to permit removal of obstruction and then automatically reclose.

HAZARD: Doors cannot be opened for emergency evacuation.
DESIGN CRITERION: Means must be provided to open doors anywhere when the train is stopped for emergency evacuation.

Example PHA for ATC Approach Control

HAZARD 1: A pair of controlled aircraft violate minimum separation standards.
REQUIREMENTS/CONSTRAINTS:
1a. ATC shall provide advisories that maintain safe separation between aircraft.
1b. ATC shall provide conflict alerts.

HAZARD 2: A controlled aircraft enters an unsafe atmospheric region (icing conditions, windshear areas, thunderstorm cells).
REQUIREMENTS/CONSTRAINTS:
2a. ATC must not issue advisories that direct aircraft into areas with unsafe atmospheric conditions.
2b. ATC shall provide weather advisories and alerts to flight crews.
2c. ATC shall warn aircraft that enter an unsafe atmospheric region.

Example PHA for ATC Approach Control (2)

HAZARD 3: A controlled aircraft enters restricted airspace without authorization.
REQUIREMENTS/CONSTRAINTS:
3a. ATC must not issue advisories that direct an aircraft into restricted airspace unless avoiding a greater hazard.
3b. ATC shall provide timely warnings to aircraft to prevent their incursion into restricted airspace.

HAZARD 4: A controlled aircraft gets too close to a fixed obstacle or terrain other than a safe point of touchdown on the assigned runway.
REQUIREMENT/CONSTRAINT:
4. ATC shall provide advisories that maintain safe separation between aircraft and terrain or physical obstacles.

HAZARD 5: A controlled aircraft and an intruder in controlled airspace violate minimum separation standards.
REQUIREMENT/CONSTRAINT:
5. ATC shall provide alerts and advisories to avoid intruders if at all possible.

HAZARD 6: Loss of controlled flight or loss of airframe integrity.
REQUIREMENTS/CONSTRAINTS:
6a. ATC must not issue advisories outside the safe performance envelope of the aircraft.
6b. ATC advisories must not distract or disrupt the crew from maintaining safety of flight.
6c. ATC must not issue advisories that the pilot or aircraft cannot fly or that degrade the continued safe flight of the aircraft.
6d. ATC must not provide advisories that cause an aircraft to fall below the standard glidepath or intersect it at the wrong place.
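Constraints like 6a and 6d above are the kind that can be carried forward from the PHA into the software as explicit checks on every candidate advisory. A minimal sketch; the envelope numbers, parameter names, and the idea of a single boolean glidepath input are all simplifying assumptions for illustration, not anything an ATC system actually uses:

```python
# Hypothetical performance envelope for one aircraft type (illustrative only).
ENVELOPE = {"min_speed_kts": 110.0, "max_speed_kts": 350.0,
            "min_altitude_ft": 0.0, "max_altitude_ft": 41000.0}

def advisory_permitted(target_speed_kts: float, target_altitude_ft: float,
                       keeps_standard_glidepath: bool) -> bool:
    """Reject candidate advisories that violate the PHA design constraints:
    outside the safe performance envelope (6a) or off the standard
    glidepath (6d)."""
    if not (ENVELOPE["min_speed_kts"] <= target_speed_kts
            <= ENVELOPE["max_speed_kts"]):
        return False
    if not (ENVELOPE["min_altitude_ft"] <= target_altitude_ft
            <= ENVELOPE["max_altitude_ft"]):
        return False
    return keeps_standard_glidepath

assert advisory_permitted(250.0, 10000.0, True) is True
assert advisory_permitted(90.0, 10000.0, True) is False  # below minimum safe speed
```

The design point, consistent with the slides, is traceability: each "must not" constraint from the hazard analysis becomes a named, testable condition in the design rather than an unstated assumption.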