Zhao Minda (赵敏达), Baxter John
(Atomic Energy of Canada Limited, Ontario L5K 1B2, Canada)
CANDU (CANDU is a registered trade-mark of AECL) reactors have two independent shutdown systems, which trip, or terminate, the nuclear reaction if any of the designated measured process signals exceeds its specified trip setpoint. Computers are used in these two shutdown systems to improve overall plant reliability and simplicity. They are referred to as the “trip computers”. Dependability requirements for these essential computers are very high.
The trip computer software is classified as “safety-critical” and, as such, a rigorous software safety lifecycle is followed during the development of the trip computer software. Appropriate techniques and measures are selected according to the safety integrity level (SIL) of the system during the software development processes.
The final phase of the software safety lifecycle is to perform validation and reliability (V&R) testing on the already verified trip computer software, integrated with the actual trip computer hardware, before the trip computers are delivered to site. Verification and validation (V&V) are performed to ensure that the trip computer software has achieved its SIL capacity.
IEC 61508-4 defines verification as the confirmation by examination and provision of objective evidence that the requirements have been fulfilled. Within the context of the software safety lifecycle, verification is the activity of demonstrating that the deliverables from each phase of the software safety lifecycle meet the objectives and requirements set for that specific phase. Verification may be performed by analysis and test. Some examples of verification activities are:
① Reviews of output documentation;
② Design reviews;
③ White-box unit tests on system components;
④ White-box integration tests where different parts of a system are combined.
IEC 61508-4 defines validation as the confirmation by examination and provision of objective evidence that the particular requirements for a specific intended use are fulfilled. Within the context of the software safety lifecycle, validation is the activity of demonstrating that the safety-related system under consideration meets in all respects the safety requirements specification for that safety-related system. Software validation means confirming by examination and provision of objective evidence that the software satisfies the software safety requirements specification. Validation activities may include:
① Black-box tests of functional requirements, such as ensuring that a trip output opens when the instrument loop signal exceeds its setpoint;
② Black-box tests of performance requirements, such as ensuring that a trip output opens within the allowable time after the instrument loop signal exceeds its setpoint;
③ Statistical random tests as an optional means to demonstrate that the software does not perform unintended functions and to quantify the software reliability.
The development of trip computers used in CANDU reactors meets the requirements specified in the Standard for Software Engineering of Safety Critical Software [1]. This standard was prepared by Ontario Hydro and AECL, and is in general consistent with the IEC standards.
The V&V technology used by AECL includes design reviews by independent verifiers, as well as mathematical verification techniques and rigorous justifications, to ensure that the specified software requirements have been transferred correctly to the software design without the introduction of any errors during implementation.
The V&V activities include:
① Requirements review;
② Design verification;
③ Code verification;
④ Hazard analysis;
⑤ Testing.
Four phases of testing are required for the safety-critical software:
① Unit testing;
② Integration testing;
③ Validation testing;
④ Reliability testing.
CANDU reactors feature two independent shutdown systems, which operate using different physical mechanisms. Shutdown system number one (SDS1) drops vertically mounted shut-off rods into the reactor to terminate the chain reaction. Shutdown system number two (SDS2) injects liquid “poison” into the moderator through horizontally mounted nozzles. Each shutdown system is capable of shutting down the reactor on its own. The use of different physical devices reduces the probability of common-mode failures.
SDS1 and SDS2 each have independent, triplicated channels. This provides a fail-safe design while minimizing the risk of spurious trips. Each shutdown system contains three sets of independent instrumentation for the process signal loops, three trip computers and three sets of trip output contacts. At least two of the three trip output contacts must open before the system will trip the reactor. The triplicated logic allows testing and maintenance on a single channel without inhibiting the trip function. It also allows one trip computer to default to a safe, tripped state in the event of its associated signal failure without shutting down the reactor.
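The two-out-of-three general-coincidence vote can be summarised with a short sketch. The Python function below is for illustration only (the actual voting is performed by the hard-wired trip output contacts, not by software of this form); it simply returns the trip decision for a given set of channel states.
def shutdown_actuated(channel_tripped):
    # Two-out-of-three vote: channel_tripped holds three booleans, True when
    # that channel's trip output contacts are open (the channel is tripped).
    # The shutdown system actuates only when at least two channels agree.
    return sum(channel_tripped) >= 2

# One channel tripped (under test, or failed safe): the reactor keeps running.
assert not shutdown_actuated([True, False, False])
# Two channels tripped: the shutdown system trips the reactor.
assert shutdown_actuated([True, True, False])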
Design separation as a means of avoiding potential common-mode failures is applied further to the trip computer hardware and software development. SDS1 trip computer development follows the integrated approach (IA), in which function block diagrams define both the software requirements and the software design. The software code is generated automatically from the function block diagrams and is executed on a hardware platform with its operating system. SDS2 trip computer development follows the rational design process (RDP), in which requirements are specified in functional tables and code is written and compiled for execution on a general-purpose computer without an operating system.
Design independence is built into the structure of the team organization and the associated roles of team members, to ensure that the chance of making mistakes common to development, verification and validation is minimized and that the V&V processes can be performed effectively and objectively. No one can be a software designer on both systems. Similarly, a software designer on one system cannot be a verifier on the other.
The organizational set-up supporting the software engineering process consists of three separate groups: system functional designers, computer designers and quality surveyors, with each group reporting to separate management. The V&R team reports to the system functional design manager, not the software design manager.
A generic V&R test platform consisting of a test computer was developed to help the functional designers verify the software functionality and performance, and to demonstrate its reliability.
The V&R tests are performed using the V&R test computer, the inputs and outputs of which are connected to the outputs and inputs of the trip computer under test (i.e., the target computer). The essential acceptance criterion is that the target computer’s outputs, as measured by the V&R test computer, be consistent with the expected response to the target computer’s inputs, as provided by the V&R test computer.
Validation testing is performed prior to reliability testing, to ensure that the target computer has met the functional and performance requirements. Validation testing consists of sets of deterministic tests, with each test checking a particular set of functions. In the majority of the tests, the value of a single target computer analog input is moved above and below a setpoint while all other inputs are held constant.
After validation testing has been successfully completed, reliability testing is performed to ensure that the target computer’s reliability requirement has been met. During reliability testing, multiple analog inputs to the target computer are varied simultaneously to simulate a postulated plant accident as it would appear at the inputs of the target computer. Random effects such as signal noise and equipment failure are included in the simulation. The target computer’s responses to ten thousand randomly selected postulated plant accidents are evaluated.
The test computer selected is a National Instruments PXI controller with data acquisition boards. It can send and receive analog and digital data, and measure the opening and closing times of digital outputs for performance testing.
The test computer itself is qualified before it can be used for testing the trip computer software. A series of tests is performed on the test computer using traditional calibrated test equipment, such as voltmeters and oscilloscopes, to ensure that the test computer hardware and software are functioning as required. The tests include:
① Input and output tests to confirm that the test computer correctly transmits the desired test signals to the trip computer and correctly reads the trip computer output values;
② Test language tests to ensure that the test interpreter correctly reads and carries out the tests;
③ Test oracle tests, in which a subset of the validation tests to be performed on the trip computer is also performed on the test oracle, to ensure that it covers the trip computer functions as specified in the trip computer functional specification.
Validation testing confirms that the software satisfies all functional and performance requirements as specified. The basic requirements of individual tests are outlined as test cases in high-level, English-language test documentation. A validation cross-reference document is prepared to show that every target computer functional and performance requirement is covered by at least one test case.
The detailed test instructions from the test cases are coded in test scripts and carried out by the test interpreter program running on the V&R test computer.
The first step in validation testing is to group similar functional and performance requirements together so that the target computer’s responses can be systematically and efficiently tested. Requirements are grouped by functional type or performance type, rather than by associated process variable, and assigned to test cases. For example, a target computer might receive different analog signals for pressure, temperature and flow rate, but with similar requirements, such as a check for signal irrationality, determination of a trip setpoint, and the trip action. One test case might cover testing that the irrationality-check requirement is met for all inputs. Another test case might cover testing that setpoints are determined as specified. A third test case might cover testing of the trip action, and a fourth might cover trip timing requirements. Isolating and grouping the testing of similar functions allows similar test actions to be used on different signals, thereby increasing the tester’s efficiency.
A validation cross-reference document is prepared as a compliance document to show that every functional and performance requirement is covered by at least one test case. More than one test case can be listed for a given function, if different aspects of a requirement must be tested. For example, test cases applicable to the trip function might include:
① A basic trip test of each signal;
② Simultaneous trip tests on multiple signals to verify independent operation;
③ Measurement of trip times for single and simultaneous trips.
The test cases are implemented in ASCII files called test scripts, which contain the detailed instructions for the test interpreter software running on the test computer. The high-level test documentation lists the test scripts used to implement each test case.
Test scripts allow validation testing to be performed in much less time than would be required for equivalent manual tests, and provide complete documentation of the test actions. Furthermore, the test scripts can be stored in a software library, as the target computer application software is, so that the tests can be modified and repeated if the application software is revised.
The test interpreter is a software program that resides on the test computer and transforms the commands in the test scripts into the required test actions. The test commands are written in the ATLIN (AECL test language interpreter) test command language.
ATLIN allows the tester to assign meaningful names to I/O points. The names generally indicate the field signal connected to the I/O point at site. The I/O values can then be altered through structured English statements. ATLIN’s major advantage over commercial programming languages is that a system designer, whose computer experience may be limited to being a user of programs rather than a developer of programs, can easily develop and run detailed automated tests. The tester does not need to call device drivers to set or read I/O values, or build executable files, before tests can be run.
ATLIN supports most features of commercial programming languages. Variables are available for test calculations. Structures are defined for conditional and repeated commands (i.e., if-else, loop) and subroutine calls. User interface commands allow data entry, visual confirmation of results, and menu-based selection of tests.
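As an illustration of how a script-driven test interpreter of this kind works, the Python sketch below maps user-defined names to I/O channels and executes SET and COMPARE style commands. It is not the ATLIN implementation; the simplified command handling and the write_input and read_output hooks are hypothetical stand-ins for the data acquisition driver calls that ATLIN hides from the tester.
def write_input(channel, value):
    # Stand-in for a data acquisition write that drives a target computer input.
    print("drive", channel, "=", value)

def read_output(channel):
    # Stand-in for a data acquisition read of a target computer output state.
    return "OPEN"

names = {}      # user-defined names mapped to physical I/O channels
expected = {}   # expected target computer output states, set by the script

def run_line(line, log):
    # Execute one script line of the form "DEFINE alias channel",
    # "SET name=value" or "COMPARE" (literal numeric values only).
    cmd, _, rest = line.partition(" ")
    if cmd == "DEFINE":
        alias, channel = rest.split()
        names[alias] = channel
    elif cmd == "SET":
        name, value = rest.split("=")
        if value in ("OPEN", "CLOSED"):
            expected[names[name]] = value           # record an expected output state
        else:
            write_input(names[name], float(value))  # drive an analog input
    elif cmd == "COMPARE":
        bad = {ch: (want, read_output(ch)) for ch, want in expected.items()
               if read_output(ch) != want}
        log.append("PASSED" if not bad else "FAILED " + str(bad))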
During validation testing, the test computer checks the target computer outputs against expected outputs assigned by the tester through the test script. The COMPARE command performs the comparison for all defined I/O points.
The results of the COMPARE command are stored in an ASCII log file, in which the word “PASSED” and the line number in the test script are logged for a successful comparison (one in which all expected and actual outputs match). The word “FAILED” and the details of the mismatched output points are logged for an unsuccessful comparison. Off-line analysis of the test results is simple: if the log file indicates no failed comparison, then the test was completed successfully. If any comparisons did fail, then the tester can determine the cause by analysis of the I/O values logged at the moment of the failure.
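Because the log is a plain ASCII file with a fixed PASSED/FAILED convention, the off-line check can itself be automated. The short Python sketch below is one possible way to do so; the file-name argument and the assumption that every failed comparison starts a line with “FAILED” are illustrative, not a description of an actual AECL tool.
def scan_log(path):
    # Collect every failed comparison recorded in a V&R log file.
    failures = []
    with open(path) as log_file:
        for line in log_file:
            if line.startswith("FAILED"):
                failures.append(line.rstrip())  # keep the mismatch details
    if not failures:
        print("No failed comparisons - the test completed successfully.")
    else:
        print(len(failures), "failed comparison(s) to be analysed:")
        for entry in failures:
            print("  " + entry)
    return failures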
The test scripts are written with main lines to define test I/O and data, and subroutines to perform test actions. This minimizes the number of lines of test script by allowing actions common to several tests to be written only once. It also improves the maintainability of the test scripts, as changes to the value of a setpoint can be incorporated easily.
For example, the following main-line test script could be used to check that a digital output (DO_LVL) is opened if a process level analog input (AI_LVL) drops below a setpoint (SP) of 10 meters. The DEFINE statement shows the ease with which the tester can assign familiar field names to I/O points, or assign test labels to I/O points for use in generic test subroutines. The main-line script contains:
DEFINE AI_TEST AI_LVL
DEFINE DO_TEST DO_LVL
LET SP=10
LET MARGIN=0.1
CALL SUBTEST.SUB
The subroutine “SUBTEST.SUB” contains:
SET AI_TEST=SP+MARGIN
SET DO_TEST=CLOSED ;expected state
COMPARE
SET AI_TEST=SP-MARGIN
SET DO_TEST=OPEN ;expected state
COMPARE
SET AI_TEST=SP+MARGIN
SET DO_TEST=CLOSED ;expected state
COMPARE
RETURN
The value of the setpoint is specified in only one location, which greatly simplifies future changes, while the generic subroutine can be called for any analog input and associated digital output. Additions to the functional requirements (such as a hysteresis region around the trip setpoint) can be accommodated by modifying a single subroutine.
The log file, in addition to the COMPARE results, automatically records the names of the test scripts, user data entry and the branching of conditional or repeated command structures. The tester can also direct text to the log file using the command “WRITE” followed by a text string. The values of variables and I/O points can be included in the text by putting the variable or I/O point name in square brackets.
In addition, specific results can be directed to a separate ASCII report file by using the command “REPORT” followed by a text string. A report file differs from a log file in that it contains only data that is specifically requested. This makes it very useful in generating tables of results, such as for power-dependent setpoints, where the power is stepped through its range and the setpoint value recorded at every step. The automatic generation of report files eliminates potential human error in reporting.
While validation testing confirms functional and performance requirements by applying specific input values to the target computer and verifying that the appropriate output is generated, it does not confirm the target computer’s response to the interrelated and continuously changing input values encountered during actual operation, nor does it confirm whether any unintended function exists. Reliability testing addresses this concern by demonstrating that the target computer meets its functional requirements when subjected to randomly generated, simulated plant accidents. Validation testing seeks out software errors and failures, while reliability testing builds confidence in the software by showing that it will work when most needed.
The key to producing high-integrity software is to apply a rigorous software safety lifecycle that includes formal specification, “Information-Hiding” design, mathematical verification and black-box validation.
There is no industry-wide consensus on measures of software reliability as a function of SRT (statistical random testing). SRT is accepted as good practice to supplement a rigorous software safety lifecycle, but is not considered a conclusive proof of software quality or reliability.
SRT is identified as an alternative approach for determining software SIL capacity for pre-developed software that was not developed in accordance with the required software safety lifecycle. SRT is not required for software that was developed following a rigorous software safety lifecycle, but is acceptable as a supplementary means of increasing confidence that the software is consistent with the SIL of the system.
The target computer’s responses to a simulated plant accident are modeled as a Bernoulli trial: one of a series of independent tests, each with two possible outcomes. If the target computer response is consistent with the functional requirements, then the trial is successful.
The software reliability is interpreted as the probability “q” that the target computer will respond as specified to a simulated plant accident. Assuming that all of the trials are successful, the relationship between the reliability target “q”, the number of simulated accidents “n”, and the resulting level of confidence “α” that the target has been met is:
α = 1 − q^n  (1)
Eq.(1) leads to a minimum of approximately 3000 trials required to demonstrate that the system reliability requirement of 0.999 has been met with 95% confidence. Ten thousand trials on each target computer are typically performed.
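As a quick check of Eq.(1), the minimum number of failure-free trials for a reliability target of 0.999 at 95% confidence can be computed directly; the short Python calculation below reproduces the “approximately 3000” figure.
import math

q = 0.999      # reliability target: probability of a correct response per trial
alpha = 0.95   # required confidence that the reliability target has been met

# From alpha = 1 - q**n (all n trials successful), solve for the smallest n.
n_min = math.ceil(math.log(1.0 - alpha) / math.log(q))
print(n_min)               # 2995, i.e. approximately 3000 trials
print(1.0 - q ** 10000)    # confidence given by the ten thousand trials typically run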
The operating profile is the set of operating conditions to which the target computer must respond. The operating conditions are created as simulated accidents by the profile generator software, which contains a mathematical model of a CANDU reactor, the same model as is used for safety analysis. The profile generator produces a series of files, each file containing a table of time-indexed signal values. These values form the test profile, which is applied to the inputs of the target computer during the simulated accident.
The accident parameters (the initial plant conditions before the simulated accident, the type of accident to be simulated, the size and location of a pipe break, et cetera) are randomly generated, and other random effects (such as signal noise and transmitter failures) are also applied by the profile generator.
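The structure of a test profile can be illustrated with a toy generator. The Python sketch below only mimics the shape of the output described above, a table of time-indexed signal values with randomly drawn accident parameters, signal noise and a possible transmitter failure; the signal names, transient shapes and parameter ranges are invented for illustration and are unrelated to the actual CANDU reactor model.
import csv
import random

def generate_toy_profile(path, duration_s=30.0, step_s=0.1):
    # Randomly drawn accident parameters and random effects (illustrative only).
    break_size = random.uniform(0.1, 1.0)       # relative size of the pipe break
    flow_tx_failed = random.random() < 0.05     # occasional transmitter failure
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["time_s", "pressure_kPa", "flow_kg_s"])
        steps = int(duration_s / step_s) + 1
        for i in range(steps):
            t = i * step_s
            # Invented transient shapes plus Gaussian signal noise.
            pressure = 10000.0 - 200.0 * break_size * t + random.gauss(0.0, 20.0)
            flow = 0.0 if flow_tx_failed else 7000.0 - 50.0 * break_size * t + random.gauss(0.0, 30.0)
            writer.writerow([round(t, 1), round(pressure, 1), round(flow, 1)])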
The same ATLIN test interpreter program is used for both validation and reliability testing, but with different modes of operation. In validation mode, the test interpreter compares the actual target computer outputs with the expected outputs in the test script. In reliability mode, the test interpreter compares the actual target computer outputs with values predicted by a test oracle program.
The test oracle is a background program which contains an internal model of the target computer. The same input values are applied to both the test oracle and the target computer, except that the oracle’s inputs are read from an internal buffer rather than from physical devices. The test oracle then duplicates the target computer logic, and returns predicted output values through another internal buffer back to the test interpreter.
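The oracle arrangement can be sketched as follows. This Python fragment only illustrates the data flow described above; trip_logic stands in for the duplicated target computer application logic, and the dictionaries stand in for the internal buffers.
def oracle_predict(input_row, trip_logic):
    # The oracle receives the same input values the interpreter applied to the
    # target computer, but reads them from an internal buffer (here, input_row)
    # rather than from physical devices, and returns predicted output values.
    return trip_logic(input_row)

def reliability_compare(actual_outputs, predicted_outputs):
    # In reliability mode the interpreter compares measured target computer
    # outputs with the oracle's predictions instead of scripted expected values.
    return {name: (predicted_outputs[name], value)
            for name, value in actual_outputs.items()
            if value != predicted_outputs[name]}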
The ATLIN test language command LOADRTCASE loads the specified test profile from the file into memory and applies the first row in the table of signal values to the target computer and test oracle inputs. The command RUNRTCASE applies the remaining rows in real time. For example, the minimal ATLIN test script for a profile #721 would therefore be:
LOADRTCASE 721
RUNRTCASE 721
COMPARE
The above script is adequate if all COMPARE commands are successful. However, some comparisons may fail because of noise on the analog signals passing between the test computer and the target computer. If the value of an analog input is exactly equal to its setpoint, random signal noise can move the value read by the target computer around the setpoint. The target computer’s digital output will then “chatter” between the open and closed states. The test oracle’s digital output will not chatter, because the analog signals from the test interpreter are passed to the test oracle through an internal buffer and are not subject to noise.
Some analysis may be required to ensure that any discrepancies between the actual outputs and the predicted outputs are due to allowable levels of signal noise. This can be done off-line, manually or with software tools, or it can be done automatically on-line.
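One possible form of such an automated check, offered here only as a sketch rather than a description of the actual tooling, is to accept a target/oracle mismatch only when the analog input driving the affected output lay within the allowable noise band of its setpoint at the moment of the comparison.
def discrepancy_is_noise(analog_value, setpoint, noise_band):
    # Accept a mismatch between the target computer and the oracle only if the
    # driving analog input was close enough to its setpoint that allowable
    # signal noise could have flipped the trip decision either way.
    return abs(analog_value - setpoint) <= noise_band

# Example: a mismatch recorded while the level signal sat 0.02 m above a 10 m
# setpoint, with 0.05 m of allowable noise, is attributed to chatter.
assert discrepancy_is_noise(10.02, 10.0, 0.05)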
The rigorous software safety lifecycle, as the key to producing high-integrity software, is applied to the development of the safety-critical software used in the CANDU shutdown system trip computers. Techniques and measures are selected as appropriate for the safety integrity level of the system. The development processes fully meet the requirements specified in the standard [1].
The final phase in the software safety lifecycle is the V&R testing performed by the functional design group. Validation testing systematically confirms that the target computer’s functional and performance requirements are met in all respects. Reliability testing demonstrates that the software does not perform unintended functions and that the software can be relied upon to respond as required under accident conditions.
Verification and validation reduce the risk of software faults and improve the level of confidence that the software will respond as required to plant events.
[1] CE-1001-STD, Standard for Software Engineering of Safety Critical Software [S]. Revision 2, 1999.