Software System Safety
Copyright Nancy G. Leveson, July 2002.
Accident with No Component Failures
[Figure: computer-controlled batch chemical reactor. Labels: computer, cooling water, reflux condenser, vent, vapor, reactor, catalyst, gearbox, LC, LA.]
Types of Accidents
Component Failure Accidents
Single or multiple component failures
Usually assume random failure
System Accidents
Arise in interactions among components
No components may have "failed"
Caused by interactive complexity and tight coupling
Exacerbated by the introduction of computers.
Interactive Complexity
Complexity is a moving target
The underlying factor is intellectual manageability
1. A "simple" system has a small number of unknowns in its
interactions within the system and with its environment.
2. A system is intellectually unmanageable when the level of
interactions reaches the point where they cannot be thoroughly
planned
understood
anticipated
guarded against
3. Introducing new technology introduces unknowns and
even "unk-unks."
Computers and Risk
We seem not to trust one another as much as would be
desirable. In lieu of trusting each other, are we putting
too much trust in our technology? . . . Perhaps we are
not educating our children sufficiently well to understand
the reasonable uses and limits of technology.
Thomas B. Sheridan
A Possible Solution
Enforce discipline and control complexity
Limits have changed from structural integrity and physical
constraints of materials to intellectual limits
Improve communication among engineers
Build safety in by enforcing constraints on behavior
Example (batch reactor)
System safety constraint:
Water must be flowing into reflux condenser whenever
catalyst is added to reactor.
Software safety constraint:
Software must always open water valve before catalyst valve
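A minimal sketch of how the software safety constraint above might be enforced in code. The controller class, its method names, and the exception type are hypothetical and are not taken from the original system. The system-level constraint requires water to actually be flowing, so a real design would presumably check a flow sensor rather than the commanded valve state used here.

```python
class InterlockError(Exception):
    """Raised when a command would violate the safety constraint."""


class ReactorController:
    """Hypothetical batch-reactor controller (all names are illustrative)."""

    def __init__(self):
        self.water_valve_open = False
        self.catalyst_valve_open = False

    def open_water_valve(self):
        self.water_valve_open = True

    def close_water_valve(self):
        # Assumption: cooling must not be removed while catalyst is being added.
        if self.catalyst_valve_open:
            raise InterlockError("close catalyst valve before water valve")
        self.water_valve_open = False

    def open_catalyst_valve(self):
        # Software safety constraint: water valve is opened before catalyst valve.
        if not self.water_valve_open:
            raise InterlockError("water valve must be open before catalyst valve")
        self.catalyst_valve_open = True

    def close_catalyst_valve(self):
        self.catalyst_valve_open = False
```

The point of the sketch is that the ordering is enforced by the design itself instead of relying on every caller to issue commands in the right sequence.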
Stages in Process Control System Evolution
1. Mechanical systems
Direct sensory perception of process
Displays are directly connected to process and thus
are physical extensions of it.
Design decisions highly constrained by:
Available space
Physics of underlying process
Limited possibility of action at a distance
Stages in Process Control System Evolution (2)
2. Electromechanical systems
Capability for action at a distance
Need to provide an image of process to operators
Need to provide feedback on actions taken.
Relaxed constraints on designers but created new
possibilities for designer and operator error.
Stages in Process Control System Evolution (3)
3. Computer-based systems
Allow multiplexing of controls and displays.
Relaxes even more constraints and introduces
more possibility for error.
But constraints shaped environment in ways that efficiently
transmitted valuable process information and supported
cognitive processes of operators.
Finding it hard to capture and present these qualities
in new systems.
The Problem to be Solved
The primary safety problem in computer-based systems
is the lack of appropriate constraints on design.
The job of the system safety engineer is to identify the
design constraints necessary to maintain safety and to
ensure the system and software design enforces them.
Safety vs. Reliability
Accidents in high-tech systems are changing
their nature, and we must change our approaches
to safety accordingly.
Confusing Safety and Reliability
From an FAA report on ATC software architectures:
"The FAA’s en route automation meets the criteria for
consideration as a safety-critical system. Therefore,
en route automation systems must possess ultra-high
reliability."
From a blue ribbon panel report on the V-22 Osprey problems:
"Safety [software]: ...
Recommendation: Improve reliability, then verify by
extensive test/fix/test in challenging environments."
Does Software Fail?
Failure: Nonperformance or inability of system or component
to perform its intended function for a specified time
under specified environmental conditions.
A basic abnormal occurrence, e.g.,
burned out bearing in a pump
relay not closing properly when voltage applied
Fault: Higher-order events, e.g.,
relay closes at wrong time due to improper functioning
of an upstream component.
All failures are faults but not all faults are failures.
Reliability Engineering Approach to Safety
Reliability: The probability an item will perform its required
function in the specified manner over a given time
period and under specified or assumed conditions.
(Note: Most software-related accidents result from errors
in specified requirements or function and deviations
from assumed conditions.)
Concerned primarily with failures and failure rate reduction
Parallel redundancy
Standby sparing
Safety factors and margins
Derating
Screening
Timed replacements
Reliability Engineering Approach to Safety (2)
Assumes accidents are the result of component failure.
Techniques exist to increase component reliability
Failure rates in hardware are quantifiable.
Omits important factors in accidents.
May even decrease safety.
Many accidents occur without any component "failure"
e.g. Accidents may be caused by equipment operation
outside parameters and time limits upon which
reliability analyses are based.
Or may be caused by interactions of components
all operating according to specification
Highly reliable components are not necessarily safe.
Software Component Reuse
One of the most common factors in software-related accidents
Software contains assumptions about its environment.
Accidents occur when these assumptions are incorrect.
Therac-25
Ariane 5
U.K. ATC software
The features most likely to change are those embedded in or
controlled by the software.
COTS makes safety analysis more difficult.
Safety and reliability are different qualities!
Software-Related Accidents
Are usually caused by flawed requirements
Incomplete or wrong assumptions about operation of
controlled system or required operation of computer.
Unhandled controlled-system states and environmental
conditions.
Merely trying to get the software "correct" or to make it
reliable will not make it safer under these conditions.
Software-Related Accidents (cont.)
Software may be highly reliable and "correct" and still
be unsafe.
Correctly implements requirements but specified
behavior unsafe from a system perspective.
Requirements do not specify some particular behavior
required for system safety (incomplete)
Software has unintended (and unsafe) behavior beyond
what is specified in requirements.
System Safety
A planned, disciplined, and systematic approach to
preventing or reducing accidents throughout the life
cycle of a system.
"Organized common sense" (Mueller, 1968)
Primary concern is the management of hazards:
Hazard identification, evaluation, elimination, and control
through analysis, design, and management
MIL-STD-882
System Safety (2)
Hazard analysis and control is a continuous, iterative process
throughout system development and use.
[Figure: system life cycle. Labels: conceptual development, design, development, operations, management, hazard identification, hazard resolution, verification, change analysis, operational feedback.]
Hazard resolution precedence:
1. Eliminate the hazard.
2. Prevent or minimize the occurrence of the hazard.
3. Control the hazard if it occurs.
4. Minimize damage.
Process Steps
1. Perform a Preliminary Hazard Analysis
Produces hazard list
2. Perform a System Hazard Analysis (not just Failure Analysis)
Identifies potential causes of hazards
3. Identify appropriate design constraints on system, software,
and humans.
4. Design at system level to eliminate or control hazards.
5. Trace unresolved hazards and system hazard controls to
software requirements.
Specifying Safety Constraints
Most software requirements specify only nominal behavior
Need to specify off-nominal behavior
Need to specify what software must NOT do
What software must not do is not the inverse of what it must do
Derive from system hazard analysis
Process Steps (2)
6. Software requirements review and analysis
Completeness
Simulation and animation
Software hazard analysis
Robustness (environment) analysis
Mode confusion and other human error analyses
Human factors analyses (usability, workload, etc.)
Process Steps (3)
7. Implementation with safety in mind
Defensive programming
Assertions and run-time checking
Separation of critical functions
Elimination of unnecessary functions
Exception handling, etc.
8. Off-nominal and safety testing
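As an illustration of defensive programming with run-time checking, the following sketch validates a commanded value before it reaches an actuator. The function name, the limits, and the choice of fallback value are all assumptions made for the example, not part of the original material.

```python
import math

THROTTLE_MIN = 0.0   # assumed limits, for illustration only
THROTTLE_MAX = 1.0
SAFE_THROTTLE = 0.0  # assumed safe fallback value


def command_throttle(requested):
    """Return a throttle setting that always lies within the assumed limits."""
    # Defensive check: reject anything that is not a finite number.
    is_number = isinstance(requested, (int, float)) and not isinstance(requested, bool)
    if not is_number or not math.isfinite(requested):
        return SAFE_THROTTLE
    # Clamp out-of-range requests instead of passing them to the actuator.
    setting = min(max(float(requested), THROTTLE_MIN), THROTTLE_MAX)
    # Run-time assertion on the postcondition.
    assert THROTTLE_MIN <= setting <= THROTTLE_MAX
    return setting
```

Whether clamping or falling back to a fixed value is the safe response depends on the system hazard analysis; the sketch only shows where such checks sit in the code.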
Process Steps (4)
9. Operational Analysis and Auditing
Change analysis
Incident and accident analysis
Performance monitoring
Periodic audits
A Human-Centered, Safety-Driven Design Process
[Figure: design-process diagram with parallel Human Factors, System Engineering, and System Safety activities. Legible labels include:
Task Allocation Principles; Preliminary Task Analysis; Operator Goals and Responsibilities;
Operator Task and Training Requirements; Operator Task Analysis; Simulation/Experiments;
Usability Analysis; Other Human Factors Evaluation (workload, situation awareness, etc.);
Preliminary Hazard Analysis; Hazard List; Fault Tree Analysis; Safety Requirements and Constraints;
System Hazard Analysis; Completeness/Consistency Analysis; Simulation and Animation;
State Machine Hazard Analysis; Deviation Analysis (FMECA); Mode Confusion Analysis;
Human Error Analysis; Timing and other analyses; Safety Verification; Safety Testing; Software FTA;
Operational Analysis; Performance Monitoring; Change Analysis; Periodic audits;
Incident and accident analysis.]
Level 1: System Purpose
High-Level Requirements
[1.2] TCAS shall provide collision avoidance protection for any two
aircraft closing horizontally at any rate up to 1200 knots and
vertically up to 10,000 feet per minute.
Assumption: Commercial aircraft can operate up to 600 knots and
5000 fpm during vertical climb or controlled descent (and therefore
the planes can close horizontally up to 1200 knots and vertically
up to 10,000 fpm).
Design and Safety Constraints
[SC5] The system must not disrupt the pilot and ATC operations during
critical phases of flight nor disrupt aircraft operation.
[SC5.1] The pilot of a TCAS-equipped aircraft must have the
option to switch to the Traffic-Advisory-Only mode where TAs
are displayed but display of resolution advisories is prohibited.
Assumption: This feature will be used during final approach to
parallel runways when two aircraft are projected to come close
to each other and TCAS would call for an evasive maneuver.
Example Level 1 Safety Constraints for TCAS
SC-7 TCAS must not create near misses (result in a hazardous level of vertical
separation) that would not have occurred had the aircraft not carried TCAS.
SC-7.1 Crossing maneuvers must be avoided if possible.
2.36, 2.38, 2.48, 2.49.2
SC-7.2 The reversal of a displayed advisory must be extremely
rare.
2.51, 2.56.3, 2.65.3, 2.66
SC-7.3 TCAS must not reverse an advisory if the pilot will have
insufficient time to respond to the RA before the closest
point of approach (four seconds or less) or if own and
intruder aircraft are separated by less than 200 feet vertically
when 10 seconds or less remain to closest point of approach.
2.52
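Constraint SC-7.3 is precise enough to be written as a predicate. The sketch below is one possible reading of the two conditions; the function and parameter names are hypothetical, and it is not the actual TCAS logic.

```python
def reversal_permitted(time_to_cpa_s, vertical_separation_ft):
    """Return False when SC-7.3, as read here, prohibits reversing an advisory."""
    # Four seconds or less to closest point of approach: the pilot has
    # insufficient time to respond to a reversed RA.
    if time_to_cpa_s <= 4.0:
        return False
    # Ten seconds or less to CPA with under 200 feet of vertical separation.
    if time_to_cpa_s <= 10.0 and abs(vertical_separation_ft) < 200.0:
        return False
    return True
```

Writing the constraint this way exposes the boundary cases (exactly 4 s, exactly 10 s, exactly 200 ft) that a requirements review would need to settle explicitly.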
Level 1: System Purpose (3)
System Limitations
L.5 TCAS provides no protection against aircraft with nonoperational
or non-Mode C transponders.
Operator Requirements
OP.4 After the threat is resolved the pilot shall return promptly and
smoothly to his/her previously assigned flight path.
Human-Interface Requirements
Hazard and other System Analyses
Hazard List for TCAS
H1: Near midair collision (NMAC): An encounter for which, at the
closest point of approach, the vertical separation is less than
100 feet and the horizontal separation is less than 500 feet.
H2: TCAS causes controlled maneuver into ground
e.g. descend command near terrain
H3: TCAS causes pilot to lose control of the aircraft.
H4: TCAS interferes with other safety-related systems
e.g. interferes with ground proximity warning
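The definition of hazard H1 gives numeric thresholds, so it can be stated as a simple check on the separation at the closest point of approach. This is a sketch under that definition only; the function and constant names are illustrative.

```python
NMAC_VERTICAL_FT = 100.0    # threshold from the H1 definition
NMAC_HORIZONTAL_FT = 500.0  # threshold from the H1 definition


def is_nmac(vertical_sep_ft, horizontal_sep_ft):
    """True if separation at closest point of approach meets the H1 definition."""
    return (abs(vertical_sep_ft) < NMAC_VERTICAL_FT
            and abs(horizontal_sep_ft) < NMAC_HORIZONTAL_FT)
```

Both conditions must hold: aircraft passing 50 feet apart vertically but 2000 feet apart horizontally do not meet the definition.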
TCAS does not display a resolution advisory.
TCAS unit is not providing RAs.
<Self-monitor shuts down TCAS unit>
Sensitivity level set such that no RAs are displayed.
...
No RA inputs are provided to the display.
No RA is generated by the logic
Inputs do not satisfy RA criteria
Surveillance puts threat outside corrective RA position.
Surveillance does not pass adequate track to the logic
<Threat is non-Mode C aircraft> L.5
1.23.1<Surveillance failure>
to be calculated>
Altitude reports put threat outside corrective RA position
Altitude errors put threat on ground
<Uneven terrain>
<Intruder altitude error>
<Own Mode C altitude error>
<Own radar altimeter error>
2.19
1.23.1
1.23.1
Altitude errors put threat in non?threat position.
...
<Intruder maneuver causes logic to delay
RA beyond CPA>
2.35 SC4.2
...
<Process/display connectors fail>
<Display is preempted by other functions>
<Display hardware fails>
2.22 SC4.8
1.23.1
TCAS displays a resolution advisory that the pilot does not follow.
Pilot does not execute RA at all.
Crew does not perceive RA alarm.
<Inadequate alarm design>
<Crew is preoccupied>
1.4 to 1.14 2.74, 2.76
<Crew does not believe RA is correct.> OP.1
...
Pilot executes the RA but inadequately
<Pilot stops before RA is removed> OP.10
OP.4
OP.10
<Pilot continues beyond point RA is removed>
<Pilot delays execution beyond time allowed>
2.19 When below 1700 feet AGL, the CAS logic uses the difference
between its own aircraft pressure altitude and radar altitude to
determine the approximate elevation of the ground above sea
level (see Figure 2.5). It then subtracts the latter value from the
pressure altitude value received from the target to determine the
approximate altitude of the target above the ground (barometric
altitude - radar altitude + 180 feet). If this altitude is less than
180 feet, TCAS considers the target to be on the ground (1.SC4.9).
Traffic and resolution advisories are inhibited for any intruder whose
tracked altitude is below this estimate. Hysteresis is provided to
reduce vacillations in the display of traffic advisories that might
result from hilly terrain (FTA-320). All RAs are inhibited when
own TCAS is within 500 feet of the ground.
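The ground-estimation arithmetic in 2.19 can be sketched as follows. This is an interpretation of the paragraph, not the actual CAS logic: the function and parameter names are hypothetical, and the hysteresis mentioned in the text is deliberately left out.

```python
GROUND_ALLOWANCE_FT = 180.0  # allowance stated in 2.19
LOGIC_CEILING_FT = 1700.0    # logic applies when own aircraft is below this AGL


def target_on_ground(own_pressure_alt_ft, own_radar_alt_ft, target_pressure_alt_ft):
    """Return True/False for the on-ground test, or None when it does not apply."""
    if own_radar_alt_ft >= LOGIC_CEILING_FT:
        return None  # own aircraft is not below 1700 feet AGL
    # Approximate ground elevation above sea level under own aircraft.
    ground_elevation_ft = own_pressure_alt_ft - own_radar_alt_ft
    # Approximate altitude of the target above that ground level.
    target_agl_ft = target_pressure_alt_ft - ground_elevation_ft
    return target_agl_ft < GROUND_ALLOWANCE_FT
```

For example, with own pressure altitude 2000 ft and radar altitude 1000 ft, the ground is estimated at 1000 ft; a target reporting 1100 ft is 100 ft above that estimate and is treated as on the ground. The estimate uses the terrain under own aircraft, which is why uneven terrain appears as a cause in the fault tree.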
[Figure 2.5: ground-level estimation. Labels: own TCAS, barometric altimeter, radar altimeter value, 180-foot allowance, declared airborne, declared on ground.]
Example Level-2 System Design for TCAS
SENSE REVERSALS Reversal-Provides-More-Separation
m-301
2.51 In most encounter situations, the resolution advisory sense will be
maintained for the duration of an encounter with a threat aircraft.
SC-7.2
However, under certain circumstances, it may be necessary for
that sense to be reversed. For example, a conflict between two
TCAS-equipped aircraft will, with very high probability, result in
selection of complementary advisory senses because of the
coordination protocol between the two aircraft. However, if
coordination communications between the two aircraft are
disrupted at a critical time of sense selection, both aircraft may
choose their advisories independently.
FTA-1300
This could possibly result in selection of incompatible senses.
FTA-395
2.51.1 [Information about how incompatibilities are handled]
Level 3 Modeling Language Example
[Figure: Level 3 modeling-language example (content not recoverable).]
Preliminary Hazard Analysis
1. Identify system hazards
2. Translate system hazards into high-level
system safety design constraints.
3. Assess hazards if required to do so.
4. Establish the hazard log.
System Hazards for Automated Train Doors
Train starts with door open.
Door opens while train is in motion.
Door opens while improperly aligned with station platform.
Door closes while someone is in doorway
Door that closes on an obstruction does not reopen or reopened
door does not reclose.
Doors cannot be opened for emergency evacuation.
System Hazards for Air Traffic Control
Controlled aircraft violate minimum separation standards (NMAC).
Airborne controlled aircraft enters an unsafe atmospheric region.
Controlled airborne aircraft enters restricted airspace without
authorization.
Controlled airborne aircraft gets too close to a fixed obstacle
other than a safe point of touchdown on assigned runway (CFIT).
Controlled airborne aircraft and an intruder in controlled airspace
violate minimum separation.
Controlled aircraft operates outside its performance envelope.
Aircraft on ground comes too close to moving objects or collides
with stationary objects or leaves the paved area.
Aircraft enters a runway for which it does not have clearance.
Controlled aircraft executes an extreme maneuver within its
performance envelope.
Loss of aircraft control.
Exercise: Identify the system hazards for this cruise-control system
The cruise control system operates only when the engine is running.
When the driver turns the system on, the speed at which the car is
traveling at that instant is maintained. The system monitors the car’s
speed by sensing the rate at which the wheels are turning, and it
maintains desired speed by controlling the throttle position. After the
system has been turned on, the driver may tell it to start increasing
speed, wait a period of time, and then tell it to stop increasing speed.
Throughout the time period, the system will increase the speed at a
fixed rate, and then will maintain the final speed reached.
The driver may turn off the system at any time. The system will turn
off if it senses that the accelerator has been depressed far enough to
override the throttle control. If the system is on and senses that the
brake has been depressed, it will cease maintaining speed but will not
turn off. The driver may tell the system to resume speed, whereupon
it will return to the speed it was maintaining before braking and resume
maintenance of that speed.
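One way to prepare for the exercise is to write the described behavior down as a small state machine, since doing so forces the unstated cases into the open. The sketch below is an interpretation of the description; the class, state, and method names are hypothetical, the acceleration rate is an assumed value, and cases the text leaves open (for example, braking while the speed is being increased) are resolved by assumption and marked in comments.

```python
class CruiseControl:
    """Illustrative model of the cruise-control description (not a real design)."""

    RATE = 1.0  # assumed fixed rate of increase, in speed units per tick

    def __init__(self):
        self.state = "off"  # off, maintaining, accelerating, suspended
        self.target_speed = None

    def turn_on(self, engine_running, current_speed):
        # Operates only when the engine is running; captures the current speed.
        if engine_running and self.state == "off":
            self.state = "maintaining"
            self.target_speed = current_speed

    def turn_off(self):
        self.state = "off"
        self.target_speed = None

    def engine_stopped(self):
        # Assumption: losing the engine turns the system off.
        self.turn_off()

    def accelerator_override(self):
        # Accelerator pressed far enough to override the throttle control.
        self.turn_off()

    def start_increasing(self):
        if self.state == "maintaining":
            self.state = "accelerating"

    def stop_increasing(self):
        if self.state == "accelerating":
            self.state = "maintaining"

    def tick(self):
        # While accelerating, the target speed rises at the fixed rate.
        if self.state == "accelerating":
            self.target_speed += self.RATE

    def brake(self):
        # Ceases maintaining speed but does not turn off.
        # Assumption: braking while accelerating keeps the speed reached so far.
        if self.state in ("maintaining", "accelerating"):
            self.state = "suspended"

    def resume(self):
        if self.state == "suspended":
            self.state = "maintaining"
```

Each assumption marked in the comments is a place where the written description is silent, and therefore a candidate source of hazards to examine.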
Hazards must be translated into design constraints.

HAZARD: Train starts with door open.
DESIGN CRITERION: Train must not be capable of moving with any door open.

HAZARD: Door opens while train is in motion.
DESIGN CRITERION: Doors must remain closed while train is in motion.

HAZARD: Door opens while improperly aligned with station platform.
DESIGN CRITERION: Door must be capable of opening only after train is stopped
and properly aligned with platform unless emergency exists (see below).

HAZARD: Door closes while someone is in doorway.
DESIGN CRITERION: Door areas must be clear before door closing begins.

HAZARD: Door that closes on an obstruction does not reopen or reopened door
does not reclose.
DESIGN CRITERION: An obstructed door must reopen to permit removal of
obstruction and then automatically reclose.

HAZARD: Doors cannot be opened for emergency evacuation.
DESIGN CRITERION: Means must be provided to open doors anywhere when the
train is stopped for emergency evacuation.
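Several of the door design criteria above can be expressed as guard conditions that the controller evaluates before acting. The following sketch assumes a simplified train state with boolean fields; the field and function names are hypothetical, and the obstruction reopen/reclose behavior is not modeled.

```python
from dataclasses import dataclass


@dataclass
class TrainState:
    moving: bool
    aligned_with_platform: bool
    emergency: bool
    any_door_open: bool
    doorway_clear: bool


def may_start_moving(s: TrainState) -> bool:
    # Train must not be capable of moving with any door open.
    return not s.any_door_open


def may_open_door(s: TrainState) -> bool:
    # Doors stay closed in motion; when stopped, they open only if aligned
    # with the platform or if an emergency exists.
    if s.moving:
        return False
    return s.aligned_with_platform or s.emergency


def may_close_door(s: TrainState) -> bool:
    # Door areas must be clear before door closing begins.
    return s.doorway_clear
```

Keeping each criterion as its own named guard makes it straightforward to trace a line of code back to the hazard it addresses.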
Example PHA for ATC Approach Control
HAZARDS and REQUIREMENTS/CONSTRAINTS

1. A pair of controlled aircraft violate minimum separation standards.
   1a. ATC shall provide advisories that maintain safe separation between
       aircraft.
   1b. ATC shall provide conflict alerts.

2. A controlled aircraft enters an unsafe atmospheric region
   (icing conditions, windshear areas, thunderstorm cells).
   2a. ATC must not issue advisories that direct aircraft into areas with
       unsafe atmospheric conditions.
   2b. ATC shall provide weather advisories and alerts to flight crews.
   2c. ATC shall warn aircraft that enter an unsafe atmospheric region.
Example PHA for ATC Approach Control (2)
HAZARDS and REQUIREMENTS/CONSTRAINTS

3. A controlled aircraft enters restricted airspace without authorization.
   3a. ATC must not issue advisories that direct an aircraft into restricted
       airspace unless avoiding a greater hazard.
   3b. ATC shall provide timely warnings to aircraft to prevent their
       incursion into restricted airspace.

4. A controlled aircraft gets too close to a fixed obstacle or terrain other
   than a safe point of touchdown on assigned runway.
   4. ATC shall provide advisories that maintain safe separation between
      aircraft and terrain or physical obstacles.

5. A controlled aircraft and an intruder in controlled airspace violate
   minimum separation standards.
   5. ATC shall provide alerts and advisories to avoid intruders if at all
      possible.

6. Loss of controlled flight or loss of airframe integrity.
   6a. ATC must not issue advisories outside the safe performance envelope
       of the aircraft.
   6b. ATC advisories must not distract or disrupt the crew from maintaining
       safety of flight.
   6c. ATC must not issue advisories that the pilot or aircraft cannot fly or
       that degrade the continued safe flight of the aircraft.
   6d. ATC must not provide advisories that cause an aircraft to fall below
       the standard glidepath or intersect it at the wrong place.
Requirements Validation
Requirements are source of most operational errors and almost
all the software contributions to accidents.
Much of software hazard analysis effort therefore should focus on
requirements.
Problem is dealing with complexity
1) Use blackbox models to separate external behavior from
complexity of internal design to accomplish the behavior.
2) Use abstraction and metamodels to handle large number
of discrete states required to describe software behavior.
Do not have continuous math to assist us
But new types of state machine modeling languages
drastically reduce number of states and transitions
modeler needs to describe.
Requirements Analysis
Model Execution, Animation, and Visualization
Completeness
State Machine Hazard Analysis (backwards reachability)
Software Deviation Analysis
Human Error Analysis
Test Coverage Analysis and Test Case Generation
Automatic code generation?
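The idea behind backwards reachability can be shown on a small explicit state graph: start from the hazardous states and search backwards along transitions to find every state from which a hazard can be reached. The sketch below is a generic graph search written for illustration, not the analysis tool or algorithm used by the author; the function name and the toy train-door model are assumptions.

```python
from collections import deque


def states_that_can_reach(transitions, hazardous):
    """Return all states from which some hazardous state is reachable.

    transitions maps each state to an iterable of its successor states.
    """
    # Invert the transition relation so the search can run backwards.
    predecessors = {}
    for src, dsts in transitions.items():
        for dst in dsts:
            predecessors.setdefault(dst, set()).add(src)
    reached = set(hazardous)
    queue = deque(hazardous)
    while queue:
        state = queue.popleft()
        for prev in predecessors.get(state, ()):
            if prev not in reached:
                reached.add(prev)
                queue.append(prev)
    return reached


# Toy model: the hazardous state "moving_open" has no incoming transition.
door_model = {
    "stopped_closed": ["stopped_open", "moving_closed"],
    "stopped_open": ["stopped_closed"],
    "moving_closed": ["stopped_closed"],
}
unsafe_origins = states_that_can_reach(door_model, ["moving_open"])
```

If the initial state is not in the returned set, the hazard is unreachable in the model; if it is, the transitions on the backward path are where the design needs a constraint.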
Requirements Completeness
Most software-related accidents involve software requirements
deficiencies.
Accidents often result from unhandled and unspecified cases.
We have defined a set of criteria to determine whether a
requirements specification is complete.
Derived from accidents and basic engineering principles.
Validated (at JPL) and used on industrial projects.
Completeness: Requirements are sufficient to distinguish
the desired behavior of the software from
that of any other undesired program that
might be designed.
Requirements Completeness Criteria (2)
How were criteria derived?
Mapped the parts of a control loop to a state machine
[Figure: control loop and state machine, each labeled with I/O.]
Defined completeness for each part of state machine
States, inputs, outputs, transitions
Mathematical completeness
Added basic engineering principles (e.g., feedback)
Added what we have learned from accidents
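One narrow piece of mathematical completeness can be checked mechanically: whether every combination of state and input has a specified response. The sketch below assumes a specification given as a plain transition table; the function name and table layout are illustrative, and the criteria described in these slides cover far more than this single check.

```python
def missing_transitions(states, inputs, transitions):
    """List (state, input) pairs for which the specification defines no response.

    transitions maps (state, input) pairs to the next state.
    """
    return sorted(
        (state, event)
        for state in states
        for event in inputs
        if (state, event) not in transitions
    )


# Toy door specification that says nothing about repeated commands.
door_spec = {
    ("closed", "open_cmd"): "open",
    ("open", "close_cmd"): "closed",
}
gaps = missing_transitions(["closed", "open"], ["open_cmd", "close_cmd"], door_spec)
```

Each reported pair is an unhandled case that the requirements must address explicitly, even if the answer is simply to stay in the same state.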
Requirements Completeness Criteria (3)
About 60 criteria in all, including human-computer interaction.
(won’t go through them all; they are in the book)
Startup, shutdown
Mode transitions
Inputs and outputs
Value and timing
Load and capacity
Environment capacity
Failure states and transitions
Human-computer interface
Robustness
Data age
Latency
Feedback
Reversibility
Preemption
Path Robustness
Most integrated into SpecTRM-RL language design or simple
tools can check them.
Design for Safety
Software design must enforce safety constraints
Should be able to trace from requirements to code (and vice versa)
Design should incorporate basic safety design principles
Safe Design Precedence
HAZARD ELIMINATION
Substitution
Simplification
Decoupling
Elimination of human errors
Reduction of hazardous materials or conditions
HAZARD REDUCTION
Redundancy
Safety Factors and Margins
Failure Minimization
Design for controllability
Barriers
Lockins, Lockouts, Interlocks
HAZARD CONTROL
Reducing exposure
Isolation and containment
Protection systems and fail-safe design
DAMAGE REDUCTION
Decreasing cost
Increasing effectiveness