Software System Safety
Copyright Nancy G. Leveson, July 2002.

Accident with No Component Failures
  [Figure: batch chemical reactor. Labels: LC, LA, computer, water, cooling, condenser, vent, reflux, reactor, vapor, catalyst, gearbox.]

Types of Accidents
  Component Failure Accidents
    Single or multiple component failures
    Usually assume random failure
  System Accidents
    Arise in interactions among components
    No components may have "failed"
    Caused by interactive complexity and tight coupling
    Exacerbated by the introduction of computers

Interactive Complexity
  Complexity is a moving target.
  The underlying factor is intellectual manageability.
  1. A "simple" system has a small number of unknowns in its interactions within the system and with its environment.
  2. A system is intellectually unmanageable when the level of interactions reaches the point where they cannot be thoroughly planned, understood, anticipated, or guarded against.
  3. Introducing new technology introduces unknowns and even "unk-unks."

Computers and Risk
  "We seem not to trust one another as much as would be desirable. In lieu of trusting each other, are we putting too much trust in our technology? ... Perhaps we are not educating our children sufficiently well to understand the reasonable uses and limits of technology."
    Thomas B. Sheridan

A Possible Solution
  Enforce discipline and control complexity
    Limits have changed from structural integrity and physical constraints of materials to intellectual limits
  Improve communication among engineers
  Build safety in by enforcing constraints on behavior
  Example (batch reactor)
    System safety constraint: Water must be flowing into reflux condenser whenever catalyst is added to reactor.
    Software safety constraint: Software must always open water valve before catalyst valve.

Stages in Process Control System Evolution
  1. Mechanical systems
    Direct sensory perception of process
    Displays are directly connected to process and thus are physical extensions of it.
    Design decisions highly constrained by:
      Available space
      Physics of underlying process
      Limited possibility of action at a distance

Stages in Process Control System Evolution (2)
  2. Electromechanical systems
    Capability for action at a distance
    Need to provide an image of process to operators
    Need to provide feedback on actions taken.
    Relaxed constraints on designers but created new possibilities for designer and operator error.

Stages in Process Control System Evolution (3)
  3. Computer-based systems
    Allow multiplexing of controls and displays.
    Relaxes even more constraints and introduces more possibility for error.
    But constraints shaped environment in ways that efficiently transmitted valuable process information and supported cognitive processes of operators.
    Finding it hard to capture and present these qualities in new systems.

The Problem to be Solved
  The primary safety problem in computer-based systems is the lack of appropriate constraints on design.
  The job of the system safety engineer is to identify the design constraints necessary to maintain safety and to ensure the system and software design enforces them.

Safety vs. Reliability
  Accidents in high-tech systems are changing their nature, and we must change our approaches to safety accordingly.

Confusing Safety and Reliability
  From an FAA report on ATC software architectures:
    "The FAA's en route automation meets the criteria for consideration as a safety-critical system. Therefore, en route automation systems must possess ultra-high reliability."
  From a blue ribbon panel report on the V-22 Osprey problems:
    "Safety [software]: ... Recommendation: Improve reliability, then verify by extensive test/fix/test in challenging environments."

Does Software Fail?
  Failure: Nonperformance or inability of system or component to perform its intended function for a specified time under specified environmental conditions.
    A basic abnormal occurrence, e.g.,
      burned-out bearing in a pump
      relay not closing properly when voltage applied
  Fault: Higher-order events, e.g.,
      relay closes at wrong time due to improper functioning of an upstream component.
  All failures are faults but not all faults are failures.

Reliability Engineering Approach to Safety
  Reliability: The probability an item will perform its required function in the specified manner over a given time period and under specified or assumed conditions.
    (Note: Most software-related accidents result from errors in specified requirements or function and deviations from assumed conditions.)
  Concerned primarily with failures and failure rate reduction:
    Parallel redundancy
    Standby sparing
    Safety factors and margins
    Derating
    Screening
    Timed replacements

Reliability Engineering Approach to Safety (2)
  Assumes accidents are the result of component failure.
    Techniques exist to increase component reliability.
    Failure rates in hardware are quantifiable.
  Omits important factors in accidents. May even decrease safety.
  Many accidents occur without any component "failure", e.g.:
    Accidents may be caused by equipment operation outside the parameters and time limits upon which reliability analyses are based.
    Or may be caused by interactions of components all operating according to specification.
  Highly reliable components are not necessarily safe.

Software Component Reuse
  One of the most common factors in software-related accidents
  Software contains assumptions about its environment.
  Accidents occur when these assumptions are incorrect.
    Therac-25
    Ariane 5
    U.K. ATC software
  Most likely to change the features embedded in or controlled by the software.
  COTS makes safety analysis more difficult.
  Safety and reliability are different qualities!
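
The point that reused software carries assumptions about its environment can be illustrated with a minimal sketch. Everything below is hypothetical: the Envelope record, the to_int16 routine, and the numeric ranges are illustrative assumptions, not taken from any of the systems named on the slide. The sketch only shows one way to make an environmental assumption explicit and checkable, so that reuse in a new environment fails visibly rather than silently.

```python
from dataclasses import dataclass

INT16_MIN, INT16_MAX = -32768, 32767


class AssumptionViolated(Exception):
    """Raised when the environment falls outside what the reused code assumed."""


@dataclass(frozen=True)
class Envelope:
    """Hypothetical record of the input range the original system guaranteed."""
    low: float
    high: float

    def check(self, name: str, value: float) -> None:
        if not (self.low <= value <= self.high):
            raise AssumptionViolated(
                f"{name}={value} is outside the assumed range [{self.low}, {self.high}]"
            )


def to_int16(value: float, envelope: Envelope) -> int:
    """Reused conversion routine: valid only while the envelope assumption holds."""
    envelope.check("value", value)           # assumption inherited from the old system
    result = int(round(value))
    if not (INT16_MIN <= result <= INT16_MAX):
        raise AssumptionViolated(f"{result} does not fit in a signed 16-bit integer")
    return result
```

A check like this does not make the reused component safe by itself; what happens after the violation is detected is still a system-level design decision.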

Software-Related Accidents
  Are usually caused by flawed requirements
    Incomplete or wrong assumptions about operation of controlled system or required operation of computer.
    Unhandled controlled-system states and environmental conditions.
  Merely trying to get the software "correct" or to make it reliable will not make it safer under these conditions.

Software-Related Accidents (cont.)
  Software may be highly reliable and "correct" and still be unsafe.
    Correctly implements requirements but specified behavior unsafe from a system perspective.
    Requirements do not specify some particular behavior required for system safety (incomplete).
    Software has unintended (and unsafe) behavior beyond what is specified in requirements.

System Safety
  A planned, disciplined, and systematic approach to preventing or reducing accidents throughout the life cycle of a system.
  "Organized common sense" (Mueller, 1968)
  Primary concern is the management of hazards:
    Hazard identification, evaluation, elimination, control
    through analysis, design, management
  MIL-STD-882

System Safety (2)
  Hazard analysis and control is a continuous, iterative process throughout system development and use.
  [Figure labels: hazard identification, hazard resolution, verification, change analysis, operational feedback, operations.]
  Hazard resolution precedence:
    1. Eliminate the hazard.
    2. Prevent or minimize the occurrence of the hazard.
    3. Control the hazard if it occurs.
    4. Minimize damage.
  [Figure labels: management, conceptual development, design, development.]

Process Steps
  1. Perform a Preliminary Hazard Analysis
       Produces hazard list
  2. Perform a System Hazard Analysis (not just Failure Analysis)
       Identifies potential causes of hazards
  3. Identify appropriate design constraints on system, software, and humans.
  4. Design at system level to eliminate or control hazards.
  5. Trace unresolved hazards and system hazard controls to software requirements.

Specifying Safety Constraints
  Most software requirements only specify nominal behavior
    Need to specify off-nominal behavior
    Need to specify what software must NOT do
  What the software must not do is not the inverse of what it must do
  Derive from system hazard analysis

Process Steps (2)
  6. Software requirements review and analysis
       Completeness
       Simulation and animation
       Software hazard analysis
       Robustness (environment) analysis
       Mode confusion and other human error analyses
       Human factors analyses (usability, workload, etc.)

Process Steps (3)
  7. Implementation with safety in mind
       Defensive programming
       Assertions and run-time checking
       Separation of critical functions
       Elimination of unnecessary functions
       Exception handling, etc.
  8. Off-nominal and safety testing

Process Steps (4)
  9. Operational Analysis and Auditing
       Change analysis
       Incident and accident analysis
       Performance monitoring
       Periodic audits

A Human-Centered, Safety-Driven Design Process
  [Figure: process diagram with parallel Human Factors Engineering and System Safety activities; the connecting text in the diagram is not recoverable from the extraction. Legible activity labels:]
  Human Factors Engineering
    Preliminary Task Analysis
    Operator Goals and Responsibilities
    Task Allocation Principles
    Operator Task and Training Requirements
    Operator Task Analysis
    Simulation/Experiments
    Usability Analysis
    Other Human Factors Evaluation (workload, situation awareness, etc.)
    Operational Analysis: performance monitoring, change analysis, periodic audits
  System Safety
    Preliminary Hazard Analysis
    Hazard List
    Fault Tree Analysis
    Safety Requirements and Constraints
    System Hazard Analysis
    Completeness/Consistency Analysis
    Simulation and Animation
    State Machine Hazard Analysis
    Deviation Analysis (FMECA)
    Mode Confusion Analysis
    Human Error Analysis
    Timing and other analyses
    Safety Verification: safety testing, software FTA
    Operational Analysis: performance monitoring, change analysis, incident and accident analysis, periodic audits

Level 1: System Purpose
  High-Level Requirements
    [1.2] TCAS shall provide collision avoidance protection for any two aircraft closing horizontally at any rate up to 1200 knots and vertically up to 10,000 feet per minute.
      Assumption: Commercial aircraft can operate up to 600 knots and 5000 fpm during vertical climb or controlled descent (and therefore the planes can close horizontally up to 1200 knots and vertically up to 10,000 fpm).
  Design and Safety Constraints
    [SC5] The system must not disrupt the pilot and ATC operations during critical phases of flight nor disrupt aircraft operation.
      [SC5.1] The pilot of a TCAS-equipped aircraft must have the option to switch to the Traffic-Advisory-Only mode where TAs are displayed but display of resolution advisories is prohibited.
        Assumption: This feature will be used during final approach to parallel runways when two aircraft are projected to come close to each other and TCAS would call for an evasive maneuver.

Example Level 1 Safety Constraints for TCAS
  SC-7 TCAS must not create near misses (result in a hazardous level of vertical separation) that would not have occurred had the aircraft not carried TCAS.
    SC-7.1 Crossing maneuvers must be avoided if possible.
      Links: 2.36, 2.38, 2.48, 2.49.2
    SC-7.2 The reversal of a displayed advisory must be extremely rare.
      Links: 2.51, 2.56.3, 2.65.3, 2.66
    SC-7.3 TCAS must not reverse an advisory if the pilot will have insufficient time to respond to the RA before the closest point of approach (four seconds or less) or if own and intruder aircraft are separated by less than 200 feet vertically when 10 seconds or less remain to closest point of approach.
      Links: 2.52

Level 1: System Purpose (3)
  System Limitations
    L.5 TCAS provides no protection against aircraft with nonoperational or non-Mode C transponders.
  Operator Requirements
    OP.4 After the threat is resolved the pilot shall return promptly and smoothly to his/her previously assigned flight path.
  Human-Interface Requirements
  Hazard and other System Analyses

Hazard List for TCAS
  H1: Near midair collision (NMAC): An encounter for which, at the closest point of approach, the vertical separation is less than 100 feet and the horizontal separation is less than 500 feet.
  H2: TCAS causes controlled maneuver into ground, e.g., descend command near terrain.
  H3: TCAS causes pilot to lose control of the aircraft.
  H4: TCAS interferes with other safety-related systems, e.g., interferes with ground proximity warning.

[Fault tree excerpt. Angle brackets mark basic events; trailing identifiers are trace links, kept in the order they appear in the extracted figure.]
  TCAS does not display a resolution advisory.
    TCAS unit is not providing RAs.
      <Self-monitor shuts down TCAS unit>
      Sensitivity level set such that no RAs are displayed.
      ...
    No RA inputs are provided to the display.
      No RA is generated by the logic
        Inputs do not satisfy RA criteria
          Surveillance puts threat outside corrective RA position.
            Surveillance does not pass adequate track to the logic
            <Threat is non-Mode C aircraft> L.5
            <Surveillance failure> 1.23.1
            ... to be calculated>
          Altitude reports put threat outside corrective RA position
            Altitude errors put threat on ground
              <Uneven terrain>
              <Intruder altitude error>
              <Own Mode C altitude error>
              <Own radar altimeter error>
              Links: 2.19, 1.23.1, 1.23.1
            Altitude errors put threat in non-threat position.
          ...
        <Intruder maneuver causes logic to delay RA beyond CPA> 2.35, SC4.2
      ...
      <Process/display connectors fail>
      <Display is preempted by other functions>
      <Display hardware fails>
      Links: 2.22, SC4.8, 1.23.1
  TCAS displays a resolution advisory that the pilot does not follow.
    Pilot does not execute RA at all.
      Crew does not perceive RA alarm.
        <Inadequate alarm design>
        <Crew is preoccupied>
        Links: 1.4 to 1.14; 2.74, 2.76
      <Crew does not believe RA is correct.> OP.1
      ...
    Pilot executes the RA but inadequately
      <Pilot stops before RA is removed>
      <Pilot continues beyond point RA is removed>
      <Pilot delays execution beyond time allowed>
      Links: OP.10, OP.4, OP.10

2.19 When below 1700 feet AGL, the CAS logic uses the difference between its own aircraft pressure altitude and radar altitude to determine the approximate elevation of the ground above sea level (see Figure 2.5).
It then subtracts the latter value from the pressure altitude value received from the target to determine the approximate altitude of the target above the ground (barometric altitude - radar altitude + 180 feet). If this altitude is less than 180 feet, TCAS considers the target to be on the ground (1.SC4.9). Traffic and resolution advisories are inhibited for any intruder whose tracked altitude is below this estimate. Hysteresis is provided to reduce vacillations in the display of traffic advisories that might result from hilly terrain (FTA-320). All RAs are inhibited when own TCAS is within 500 feet of the ground.
  [Figure labels: own TCAS, barometric altimeter, radar altimeter value, 180-foot allowance, declared airborne, declared on ground.]

Example Level-2 System Design for TCAS
  SENSE REVERSALS
    Reversal-Provides-More-Separation m-301
  2.51 In most encounter situations, the resolution advisory sense will be maintained for the duration of an encounter with a threat aircraft. [SC-7.2] However, under certain circumstances, it may be necessary for that sense to be reversed. For example, a conflict between two TCAS-equipped aircraft will, with very high probability, result in selection of complementary advisory senses because of the coordination protocol between the two aircraft. However, if coordination communications between the two aircraft are disrupted at a critical time of sense selection, both aircraft may choose their advisories independently. [FTA-1300] This could possibly result in selection of incompatible senses.
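
The ground-estimate arithmetic in design paragraph 2.19 above can be written out as a short sketch. This is an illustration of the stated subtraction and the 180-foot threshold only; the function names are hypothetical, the hysteresis mentioned in 2.19 is omitted, and returning None when own aircraft is not below 1700 feet AGL is an assumption (the paragraph does not say what happens in that case).

```python
from typing import Optional

GROUND_ALLOWANCE_FT = 180.0     # allowance quoted in paragraph 2.19
RADAR_ALT_CEILING_FT = 1700.0   # the estimate is made only below this height AGL


def target_height_above_ground_ft(own_pressure_alt_ft: float,
                                  own_radar_alt_ft: float,
                                  target_pressure_alt_ft: float) -> float:
    """Estimate the target's height above the ground from three altitude inputs."""
    # Ground elevation above sea level, as seen from own aircraft.
    ground_elevation_ft = own_pressure_alt_ft - own_radar_alt_ft
    return target_pressure_alt_ft - ground_elevation_ft


def target_declared_on_ground(own_pressure_alt_ft: float,
                              own_radar_alt_ft: float,
                              target_pressure_alt_ft: float) -> Optional[bool]:
    """True if the target would be treated as on the ground, None if no estimate applies."""
    if own_radar_alt_ft >= RADAR_ALT_CEILING_FT:
        return None  # assumption: outside the regime described in 2.19
    height_ft = target_height_above_ground_ft(
        own_pressure_alt_ft, own_radar_alt_ft, target_pressure_alt_ft
    )
    return height_ft < GROUND_ALLOWANCE_FT
```

For example, with own pressure altitude 2500 ft and radar altitude 1000 ft, the ground is estimated at 1500 ft; a target reporting 1600 ft is then 100 ft above the ground and would be declared on the ground, while one reporting 1700 ft would not.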
  [FTA-395]
  2.51.1 [Information about how incompatibilities are handled]

Level 3 Modeling Language Example
  [Figure: tabular state-transition specification for the Other_Traffic classification of an intruder. Legible names include Other_Traffic, Proximate_Traffic, Potential_Threat, Threat, Alt_Reporting, Bearing_Valid, Range_Valid, Proximate_Traffic_Condition, Potential_Threat_Condition, and Other_Aircraft On_Ground. The table entries and trace links are not recoverable from the extraction.]
  Description (recovered from the figure text): A threat is reclassified as other traffic if its altitude reporting has been lost and either the bearing or range inputs are invalid; if its altitude reporting has been lost and both the range and bearing are valid but neither the proximate nor potential threat classification criteria are satisfied; or the aircraft is on the ground.

Preliminary Hazard Analysis
  1. Identify system hazards.
  2. Translate system hazards into high-level system safety design constraints.
  3. Assess hazards if required to do so.
  4. Establish the hazard log.

System Hazards for Automated Train Doors
  Train starts with door open.
  Door opens while train is in motion.
  Door opens while improperly aligned with station platform.
  Door closes while someone is in doorway.
  Door that closes on an obstruction does not reopen or reopened door does not reclose.
  Doors cannot be opened for emergency evacuation.
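
Steps 2 and 4 of the Preliminary Hazard Analysis (translate hazards into constraints, establish the hazard log) can be pictured with a small data-structure sketch. The HazardEntry fields and the TD-n identifiers are hypothetical; a real hazard log would carry much more (severity, status, verification evidence). The hazard descriptions are the train-door hazards listed above, and the one constraint shown is the design criterion the slides give for the first hazard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HazardEntry:
    ident: str                       # hypothetical identifier scheme
    description: str
    design_constraints: List[str] = field(default_factory=list)

    @property
    def has_constraint(self) -> bool:
        """A hazard stays open until at least one design constraint is recorded."""
        return bool(self.design_constraints)


def open_hazards(log: List[HazardEntry]) -> List[str]:
    """Identifiers of hazards that have not yet been translated into constraints."""
    return [entry.ident for entry in log if not entry.has_constraint]


TRAIN_DOOR_HAZARDS = [
    "Train starts with door open.",
    "Door opens while train is in motion.",
    "Door opens while improperly aligned with station platform.",
    "Door closes while someone is in doorway.",
    "Door that closes on an obstruction does not reopen or reopened door does not reclose.",
    "Doors cannot be opened for emergency evacuation.",
]

log = [HazardEntry(f"TD-{n}", text) for n, text in enumerate(TRAIN_DOOR_HAZARDS, start=1)]
log[0].design_constraints.append("Train must not be capable of moving with any door open.")
```

Calling open_hazards(log) at this point lists TD-2 through TD-6, which is the kind of query that keeps untranslated hazards visible during design.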

System Hazards for Air Traffic Control
  Controlled aircraft violate minimum separation standards (NMAC).
  Airborne controlled aircraft enters an unsafe atmospheric region.
  Controlled airborne aircraft enters restricted airspace without authorization.
  Controlled airborne aircraft gets too close to a fixed obstacle other than a safe point of touchdown on assigned runway (CFIT).
  Controlled airborne aircraft and an intruder in controlled airspace violate minimum separation.
  Controlled aircraft operates outside its performance envelope.
  Aircraft on ground comes too close to moving objects or collides with stationary objects or leaves the paved area.
  Aircraft enters a runway for which it does not have clearance.
  Controlled aircraft executes an extreme maneuver within its performance envelope.
  Loss of aircraft control.

Exercise: Identify the system hazards for this cruise-control system
  The cruise control system operates only when the engine is running. When the driver turns the system on, the speed at which the car is traveling at that instant is maintained. The system monitors the car's speed by sensing the rate at which the wheels are turning, and it maintains desired speed by controlling the throttle position. After the system has been turned on, the driver may tell it to start increasing speed, wait a period of time, and then tell it to stop increasing speed. Throughout the time period, the system will increase the speed at a fixed rate, and then will maintain the final speed reached. The driver may turn off the system at any time. The system will turn off if it senses that the accelerator has been depressed far enough to override the throttle control.
  If the system is on and senses that the brake has been depressed, it will cease maintaining speed but will not turn off. The driver may tell the system to resume speed, whereupon it will return to the speed it was maintaining before braking and resume maintenance of that speed.

Hazards must be translated into design constraints.
  HAZARD: Train starts with door open.
    DESIGN CRITERION: Train must not be capable of moving with any door open.
  HAZARD: Door opens while train is in motion.
    DESIGN CRITERION: Doors must remain closed while train is in motion.
  HAZARD: Door opens while improperly aligned with station platform.
    DESIGN CRITERION: Door must be capable of opening only after train is stopped and properly aligned with platform unless emergency exists (see below).
  HAZARD: Door closes while someone is in doorway.
    DESIGN CRITERION: Door areas must be clear before door closing begins.
  HAZARD: Door that closes on an obstruction does not reopen or reopened door does not reclose.
    DESIGN CRITERION: An obstructed door must reopen to permit removal of obstruction and then automatically reclose.
  HAZARD: Doors cannot be opened for emergency evacuation.
    DESIGN CRITERION: Means must be provided to open doors anywhere when the train is stopped for emergency evacuation.

Example PHA for ATC Approach Control
  HAZARD 1. A pair of controlled aircraft violate minimum separation standards.
    REQUIREMENTS/CONSTRAINTS
    1a. ATC shall provide advisories that maintain safe separation between aircraft.
    1b. ATC shall provide conflict alerts.
  HAZARD 2. A controlled aircraft enters an unsafe atmospheric region (icing conditions, windshear areas, thunderstorm cells).
    REQUIREMENTS/CONSTRAINTS
    2a. ATC must not issue advisories that direct aircraft into areas with unsafe atmospheric conditions.
    2b. ATC shall provide weather advisories and alerts to flight crews.
    2c. ATC shall warn aircraft that enter an unsafe atmospheric region.
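
The train-door design criteria tabulated above lend themselves to a sketch as guard predicates that a door controller would have to satisfy before acting. The TrainState fields and function names are hypothetical, and the sketch covers only three of the six criteria; it shows the form such constraints might take in software, not an actual train-control design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainState:
    """Hypothetical snapshot of the sensed train and door state."""
    moving: bool
    aligned_with_platform: bool
    emergency: bool
    all_doors_closed: bool
    doorway_clear: bool


def may_start_train(state: TrainState) -> bool:
    # Train must not be capable of moving with any door open.
    return state.all_doors_closed


def may_open_doors(state: TrainState) -> bool:
    # Doors must remain closed while the train is in motion.
    if state.moving:
        return False
    # Open only when stopped and aligned with the platform, unless an emergency exists.
    return state.aligned_with_platform or state.emergency


def may_begin_closing(state: TrainState) -> bool:
    # Door areas must be clear before door closing begins.
    return state.doorway_clear
```

Writing each criterion as a separate predicate keeps the trace from hazard to constraint to code visible, which is what the later design slides ask for.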

Example PHA for ATC Approach Control (2)
  HAZARD 3. A controlled aircraft enters restricted airspace without authorization.
    REQUIREMENTS/CONSTRAINTS
    3a. ATC must not issue advisories that direct an aircraft into restricted airspace unless avoiding a greater hazard.
    3b. ATC shall provide timely warnings to aircraft to prevent their incursion into restricted airspace.
  HAZARD 4. A controlled aircraft gets too close to a fixed obstacle or terrain other than a safe point of touchdown on assigned runway.
    REQUIREMENTS/CONSTRAINTS
    4. ATC shall provide advisories that maintain safe separation between aircraft and terrain or physical obstacles.
  HAZARD 5. A controlled aircraft and an intruder in controlled airspace violate minimum separation standards.
    REQUIREMENTS/CONSTRAINTS
    5. ATC shall provide alerts and advisories to avoid intruders if at all possible.
  HAZARD 6. Loss of controlled flight or loss of airframe integrity.
    REQUIREMENTS/CONSTRAINTS
    6a. ATC must not issue advisories outside the safe performance envelope of the aircraft.
    6b. ATC advisories must not distract or disrupt the crew from maintaining safety of flight.
    6c. ATC must not issue advisories that the pilot or aircraft cannot fly or that degrade the continued safe flight of the aircraft.
    6d. ATC must not provide advisories that cause an aircraft to fall below the standard glidepath or intersect it at the wrong place.

Requirements Validation
  Requirements are source of most operational errors and almost all the software contributions to accidents.
  Much of software hazard analysis effort therefore should focus on requirements.
  Problem is dealing with complexity
    1) Use blackbox models to separate external behavior from complexity of internal design to accomplish the behavior.
    2) Use abstraction and metamodels to handle large number of discrete states required to describe software behavior.
       Do not have continuous math to assist us.
       But new types of state machine modeling languages drastically reduce number of states and transitions modeler needs to describe.

Requirements Analysis
  Model Execution, Animation, and Visualization
  Completeness
  State Machine Hazard Analysis (backwards reachability)
  Software Deviation Analysis
  Human Error Analysis
  Test Coverage Analysis and Test Case Generation
  Automatic code generation?

Requirements Completeness
  Most software-related accidents involve software requirements deficiencies.
  Accidents often result from unhandled and unspecified cases.
  We have defined a set of criteria to determine whether a requirements specification is complete.
    Derived from accidents and basic engineering principles.
    Validated (at JPL) and used on industrial projects.
  Completeness: Requirements are sufficient to distinguish the desired behavior of the software from that of any other undesired program that might be designed.

Requirements Completeness Criteria (2)
  How were criteria derived?
    Mapped the parts of a control loop to a state machine
      [Figure labels: I/O, I/O]
    Defined completeness for each part of state machine
      States, inputs, outputs, transitions
      Mathematical completeness
    Added basic engineering principles (e.g., feedback)
    Added what we have learned from accidents

Requirements Completeness Criteria (3)
  About 60 criteria in all, including human-computer interaction.
    (Won't go through them all; they are in the book.)
    Startup, shutdown
    Mode transitions
    Inputs and outputs
    Value and timing
    Load and capacity
    Environment capacity
    Failure states and transitions
    Human-computer interface
    Robustness
    Data age
    Latency
    Feedback
    Reversibility
    Preemption
    Path Robustness
  Most are integrated into the SpecTRM-RL language design, or simple tools can check them.

Design for Safety
  Software design must enforce safety constraints
  Should be able to trace from requirements to code (and vice versa)
  Design should incorporate basic safety design principles

Safe Design Precedence
  HAZARD ELIMINATION
    Substitution
    Simplification
    Decoupling
    Elimination of human errors
    Reduction of hazardous materials or conditions
  HAZARD REDUCTION
    Design for controllability
    Barriers
      Lockins, Lockouts, Interlocks
    Failure Minimization
      Safety Factors and Margins
      Redundancy
  HAZARD CONTROL
    Reducing exposure
    Isolation and containment
    Protection systems and fail-safe design
  DAMAGE REDUCTION
  [Figure arrows: decreasing cost, increasing effectiveness.]
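
The idea of defining completeness over the states, inputs, and transitions of a state machine, described in the completeness-criteria slides above, can be illustrated with one very narrow check: every state should specify a response to every input. This sketch is not the author's criteria set and has nothing to do with how SpecTRM-RL implements them; the function name, the dictionary encoding, and the cruise-control states and inputs are all hypothetical.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Transition = Tuple[str, str]  # (state, input)


def missing_transitions(states: Set[str],
                        inputs: Set[str],
                        transitions: Dict[Transition, str]) -> List[Transition]:
    """Return every (state, input) pair for which no response is specified."""
    return sorted(pair for pair in product(states, inputs) if pair not in transitions)


# Hypothetical, deliberately incomplete model loosely based on the cruise-control exercise.
states = {"off", "maintaining", "suspended"}
inputs = {"turn_on", "turn_off", "brake", "resume"}
transitions = {
    ("off", "turn_on"): "maintaining",
    ("maintaining", "turn_off"): "off",
    ("maintaining", "brake"): "suspended",
    ("suspended", "resume"): "maintaining",
    ("suspended", "turn_off"): "off",
}

gaps = missing_transitions(states, inputs, transitions)
```

Each pair in gaps is a case the specification leaves unhandled (for instance, what happens if "resume" arrives while the system is off), which is the kind of unspecified case the slides identify as a common source of accidents.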