Automated Test Case Generation | Array of Engineers, MI

Author: Michael Lingg, PhD, Array of Engineers

Abstract

Software testing is a vital part of software development to ensure correct

functionality, and necessary in safety critical systems. Various methods have been

developed to ensure that tests verify that the software correctly implements the

requirements, but increased coverage results in increased time to develop tests.

Automation can be used to provide significant improvements in test development

time, particularly when used to augment, rather than replace, human test developers.

This paper provides a format to structure software requirements, and an algorithm

to parse the requirements into a logical model of the software. This logical

model can be used to analyze the behavior of the requirements, or automatically

generate test procedures. We show that the test generation, and the information

the logical model provides to test developers, can significantly accelerate the time

required for test development.

Citation: M. Lingg, T. Kushnier, R. Proenza, H. Paul, ”Automation of Test Case Generation and Software System Modeling,” In Proceedings of the Ground Vehicle Systems Engineering and Technology Symposium (GVSETS), NDIA, Novi, MI, Aug. 16-18, 2022.

Introduction

The use of software is ever expanding, and continuously evolving. In the past, changing the behavior of equipment and vehicles would require physical servicing to replace physical components and hardware. The use of software to control behavior means mission parameters can be completely redefined in the field in seconds, making software invaluable in an ever changing environment. However the benefits of software are not without cost. Poor software quality has been estimated to cost the US around $2.84 trillion dollars [1] in 2018 alone, with losses due to software failures exceeding a third of this cost. Costs due to bugs can climb upwards of billions of dollars as evidenced in the Soviet Gas Pipeline Explosion in 1982, reach to the hundreds of billions in cost to repair the Y2K bug [2], or impact the safety and lives of people, as in the Boeing MCAS software failure that lead to the loss of two Boeing 737 MAX aircraft, and 346 lives [3].

In addition to the cost of failing to discover software bugs, the cost of sufficient testing to identify software bugs can be very high. Development of a single test case by hand can take as much as an hour or two for simple behavior, and can easily exceed ten or more hours for more complex behavior. In systems with thousands of requirements, manual test development often reaches into man years. Any use of automation can provide significant cost benefits, along with the consistency provided by automating the process. There exists a broad range of testing methods [4]. Tests can cover different levels or components of the system, require different levels of knowledge of the implementation, and be based on different types of testing stimuli. This paper focuses on Modified Condition/Decision Coverage (MC/DC) of requirements based tests as defined in DO-178B/C. DO-178B/C provides guidelines for developing safety critical software, primarily used in civilian aircraft, but is being increasingly considered for use in unmanned aircraft [5].

To help with understanding of this work, the relevant portion of DO-178B/C MC/DC requirements based tests will be discussed below. For a complete description, the DO-178B/C standards are published by the RTCA (Radio Technical Commission for Aeronautics) [6], or see NASA’s detailed tutorial on MC/DC [7]. MC/DC provides a high level of path coverage in software testing. In a system with a lower safety level, reduced path coverage, such as only verifying each output of each path, or only verifying critical path coverage, can be used. These methods of reduced path coverage result in test selection that is a subset of the rules of MC/DC. The choice of coverage method would be selected based on the chosen standard such as the DO-178B/C criticality level, AOP-52 [8] formal test coverage, or the Join Software Systems Safety Engineering Handbook (JSSSEH) [9] path coverage testing, and could be documented to satisfy the MIL-STD-882E [10] safety verification task.

Background

DO-178B/C testing has historically used the waterfall method of software development, where first requirements are written, then the software is implemented, and finally tests are developed. With this approach, test developers are expected to have no knowledge of the implementation, ensuring independence of implementation and testing. This means tests are expected to be developed only using the requirements, and a specification of inputs and outputs that are accessible by the test. Typically a test covers a single requirement, and may include multiple cases (sets of inputs and expected outputs), in order to fully cover the requirement. While the automated test case generation algorithm was developed in this environment, we will discus in the Methods section how the algorithm integrates well to

other testing environments. MC/DC coverage defines a method of testing that ensures a known coverage of conditions within the software, by showing the independent effect of each condition within a decision. This paper will focus on two main points of MC/DC coverage [11]:

Each non-constant condition in a Boolean expression is evaluated to both True and False.
Every non-constant condition in a Boolean expression is shown to independently affect the expression’s outcome.

Let us take a look at what this means. First start with a basic comparison condition which could be considered as part of a larger expression:

input == 10

Next two tests cases are created, one with input set equal to the constant (10), and another with input set equal to the constant + 1 (11). This produces the truth table shown in table 1:

The first test case would set the condition to true, while the second would set the condition to false, satisfying out MC/DC subset condition 1. MC/DC condition 2 is also satisfied as only input was changed between the test cases, showing it independently affects the output. Further rules could be added, such as testing the constant - 1 to verify the condition was not implemented as ≤, but this paper is focusing on the limited subset of rules defined above. Now consider the case of two variables being compared with ≥, rather than a direct comparison:

input1 ≥ input2

Declaring that there are no constraints on the inputs and they are both numeric, a test case is chosen of both input1 being greater than input2 and a second test case reversing the values so input1 is less than input2 to satisfy our MC/DC condition 1. However this does not satisfy condition number 2. So the second case is changed such that input2 is equal to its value in the first test case, and input1 is less than the value of input2, showing that input1 independently set the output to False. Then a third case is added where input1 is equal to its value in the first case, but input2 is greater than input1, showing input2 independently set the output to False. This produces the truth table in table 2:

The test above covers both equivalences classes that are possible in the requirement, and tests the boundary conditions of input1 greater than input2 and input1 less than input2. The test could also require as a rule that the boundary condition of input1== input2 be tested to fully cover the boundaries of ≥, and ensure the implementation was not >.

Last the logical Boolean conditions of AND and OR will be considered. First the following condition of two comparisons ANDed together:

input1 == True AND input2 == True

The condition can only be true if input1 is True AND input2 is True. This will be the first test case, satisfying the True output. Setting both inputs to False satisfies the False output for MC/DC condition 1, but does not satisfy the input independence requirement of MC/DC condition 2. So instead a second test case will be created with input1 False and input2 True, showing input1 independently sets the output to false. Finally a third test case sets input1 True and input2 False, showing input2 independently sets the output to false. This is summarized in the truth table in table 3:

Because this paper is focusing on MC/DC condition 1 of each possible output, and condition 2 of each input independently toggles the output, expanding our condition to any number of values ANDed together is fairly simple. There is still only one true output case, and each new input simply needs to show it can toggle the output to False. Any number of conditions ANDed is shown in the truth table in table 4:

Table 4: n conditions ANDed. Test case written as TC for space

OR conditions operate very similar to AND conditions. The difference is that an OR can only be False if all inputs are False, so the first AND case that outputs True in the examples above would be replaced with all False inputs and a False output. Then each subsequent case showing the input independently toggles the output by having each case set a subsequent input to True, while all other inputs are False, sets the output to True.

With more complex logic that includes combinations of ANDs and ORs (which could be simplified to combinations of NANDs) the MC/DC cases can be determined by applying these rules to every pair of conditions until a truth table is created for the entire test. One way to tell that the proper set of MC/DC tests has been created for N Boolean conditions is one base test, plus a test for each Boolean condition to show each input independently toggles the output, or N + 1 test steps [12]. This approach provides full equivalence class testing for the possible conditions. Other types of operators, such as timers, can be processed in a similar manner as logical operators, with each operator defining the rules that apply to it.

Next will be a brief discussion on how tests are developed in the present system we are working with. For each requirement MC/DC coverage of the requirement inputs is tested, but the requirement inputs may be system inputs or internal inputs, and the requirement output may be an internal output or system output. This method of testing ensures that all behavior can be exercised by system inputs, identifying any dead code by finding any outputs that cannot be tested by system inputs. This also means the test developer cannot simply set the desired inputs directly, but must trace through the requirements to find the system input combinations necessary to set the requirement inputs.

Having provided a background on the test methodology and test selection criteria this paper is being limited to, the paper will look at this method of generating tests and producing a logical model from requirements. First will be a discussion on how the requirements are formatted and parsed to determine the logical behavior the requirement is identifying. Next a method to analyze the behavior of multiple requirements in the system to produce a model of how the system operates will be presented. This system model will be used to identify possible conditions that produce system conditions being investigated, including finding if the system can get into undesired states. Finally a method of generating tests that can be run by an automated system, such as Jenkins, will be described.

Methods

Requirements Parsing

To parse requirements the automated test case generator (simply referred to from here as the algorithm) uses a layered approach. The algorithm’s core parser handles requirements that are still human readable, but have all formatting stripped out, and require specific ordering of key words, variables, and constants. This allows the core parser to produce truth tables from the condensed requirements using a very limited rule set. The next layer up removes unnecessary formatting, such as white space, to produce the core format. The top layer has rules for converting customer specific formats into the algorithm’s internal format, as well as rules for converting excessive grammar, common variants, and typos. This approach allows the algorithm’s core parser to remain simple and stable, while the algorithm can easily switch between different customer preferred formats, and addressing the normal human variations in writing.

Next is an example of a generic customer requirement, with a simplified data dictionary. The data dictionary exists to define the types of each variable, allowing the algorithm to determine the necessary set of values to test. For example, Boolean variables are fully tested just by being set to both True and False. Alternately, numerical values may have a minimum and maximum value that define the boundary of the equivalence classes, and may be tested for data type minimum and maximum for robustness.

Requirement:
(condition1) The software shall set
	OUTPUT to True and set OUTPUT_VALID
	to True if the following is True:
(
	(INPUT is equal to True) AND
	(INPUT_VALID is equal to True)
)

(condition2) otherwise, set OUTPUT to
	False and OUTPUT_VALID to
	True if the following logic
	evaluates to True:
(
	(INPUT is equal to False) AND
	(INPUT_VALID is equal to True)
)

(condition3) otherwise, set OUTPUT to
False and OUTPUT_VALID to False

Data Dictionary:
INPUT {"Type": "Boolean"}
INPUT_VALID {"Type": "Boolean"}
OUTPUT {"Type": "Boolean"}
OUTPUT_VALID {"Type": "Boolean"}
Boolean {"Range": "True, False"}

The red text in the requirement are labels added to identify each output set of the requirement, and will be referenced below.

After the requirement is simplified to the format for the core parser, the requirement is split into three sections, one for the first set of outputs and conditions (condition1), one for the first otherwise output (condition2) and conditions and one for the unconditional otherwise’s output (condition3) set. The following shows (condition1) split out, and colorized to show key parts that will be discussed next.

The software shall set OUTPUT to True
	and set OUTPUT_VALID to True
	if the following is True:
(
	(INPUT is equal to True) AND
	(INPUT_VALID is equal to True)
)

The parsing first splits the section identifying what to set output variables from the conditions of the requirement. The text, if the following is True, tells the parser that the setting statements all precede this text, the condition follows the text. Since the condition is if the following is True, the outputs are only to be set per the setting statement if the condition is True, otherwise the outputs are not defined here. The text, set OUTPUT to True tells the parser this is an output setting statement, where: set is a constant keyword, followed by an output variable, followed by a constant keyword of to, ending with an constant or a input variable, where each piece is separated by a space. The output variable, and the constant, or input variable, are stored as represented by table 4, where the first column is the output variable being set, the second column is the value to set the output variable to, and the third column is the truth table condition where this output rule will be applied:

Table 5: Setting Statements (condition1)

Next the condition, colorized in green, is parsed. Conditions are enclosed in parenthesis to very clearly ensure order of operation, even where repeating standard order of operations. The condition is parsed into a postfix equation, with variables replaced with a class representing the variable type, text operands replaced with enumerations to represent the operation, and string named constants replaced with the appropriate value. The postfix equation allows the algorithm to apply the MC/DC rules from the innermost conditions first, expanding outward as the condition is processed. The equation for this example would look like the following:

{INPUT, True, ==, INPUT_VALID, True, ==, AND}

The postfix equation is then processed by the algorithm as described in the background, section 2, producing the truth table evolution as follows:

Tables 6 and 7 show the truth tables for the first requirement conditions involving INPUT and INPUT VALID, respectively. As each table has a single Boolean input, which is being compared to a Boolean constant, only two cases are necessary. TC Results in these is the evaluation of the condition of truth table for each row.

The final operation is an AND of the two comparisons, so the algorithm merges the two table per the MC/DC AND rules:

Table 8: AND the comparisons (condition 1)

As described in the Background, only N + 1 or three cases are necessary to show each possible output, and that the two inputs independently change the output. While the algorithm has now merged the sub-conditions into the complete condition truth

table, it continues to store the sub-condition truth tables. Cases exist where the child truth tables are not sufficient to create the combined truth table. Such a case exists if two inputs are integers with range 1-3, and the conditions are input1 == 1 OR input1 == 2. The truth table for each comparison may only set input1 equal to 1 or 2 in their two cases to produce both True and False outputs, but neither value will set the combined truth table False when ORed together. In these cases the algorithm identifies the child cases needed by the combined truth table, and adds them to the child tables, before creating the combined truth table. If a reduced path coverage method, such as only verifying each output path, is chosen as the test selection criteria, the algorithm would only produce test case 1 and test case 2 or 3. The resulting two test cases would fully cover the two possible output conditions of the branch.

Finally the algorithm applies the setting statements to the truth table. In this case the outputs can only be set when the truth table evaluates to true, when this is not the case the output is not defined. The undefined output will need to be defined by the other conditions of the requirement, otherwise this may indicate a problem with the requirement. Optionally the customer may prefer to maintain the value of the output when no output is defined. Table

9 shows the setting statements in table 5 applied to the (condition1) truth table 8.

Table 9: Complete condition 1 (condition 1)

Next the algorithm processes the first otherwise statement and its condition (condition2). Initially the algorithm parses this independently of the previous condition, and ignores the last otherwise (condition3), to produce a similar truth table shown in table 10:

Table 10: Complete condition (condition 2)

Now the algorithm merges the two conditions into a single table. In each test case, an otherwise condition is only considered if the condition prior to it evaluates to false, essentially making the otherwise case a lower priority truth table. Then each row from the otherwise table is compared with rows in the prior condition(s) table. If all inputs match, and the row from the previous condition evaluated to false, the rows are merged, using the outputs that are not unknown. If inputs do not match, a new row is created for this condition. For the two truth tables 9 and 10 the algorithm would perform the following:

Row 1 of condition1 matches row 2 of condition2, condition1 outputs are known and condition1 has higher priority, so row 1 of condition1 is used for the combined table.
Row 2 of condition1 matches row 1 of condition2, condition2 outputs are known and condition1’s output condition is False, so row 1 of condition2 is used for the combined table.
Row 3 of both condition tables do not match any row from the other table, so row 3 from each table creates an entirely new row in the combined truth table.

The algorithm also adds a truth table output, the table now includes condition1 and condition2, so it can be shown if the first or second condition triggered the output being set in the combined table. The truth table in table 11 shows the two conditions combined, red shows entries pulled from condition1 and blue shows entries pulled from condition2. Note that the algorithm has to go back and compute condition1 for case 3 and condition2 for case 4 as they were not previously computed in the child cases.

Table 11: Two conditions combined (condition1 + 2)

This case sets both outputs to false, and has no conditions. Effectively the truth table for this case is a single true case. When merging with the previous combined truth table, condition3 is the lowest priority, so it can be applied to any case where condition1 and condition2 was false. In then final merged table, any case where a higher priority condition was true, the second otherwise will show as a False case. This produces the final truth table shown in table 12 where values set by the second otherwise case are shown in green:

Table 12: Complete MC/DC Table (condition1 + 2 +3)

We have described how the algorithm parses basic logic requirements to produce a truth table from the requirement. This same method can be easily extended to apply new types of programming logic or rules, such as mathematical comparisons or operators, along with other programming concepts. For each new rule we define the core requirement format, the key elements and how the truth table is formed from the elements. To expand this algorithm to new customers we add new rules as described, and add new layers to simplify the requirement down to the core. Each layer of parsing handles customer specific formats, and human variations, bringing the simplified requirement closer to the core format.

System Modeling

In addition to the logical text of the requirement, each requirement lists the input and output signals of the requirement. This allows the algorithm to form a tree of all system and internal inputs that impact a given requirement. Each requirement in the tree is parsed to produce a truth table.

Figure 1 shows how requirement truth tables are connected in a basic requirement dependency tree. In this example, in values are system inputs, int values are internal signals and out values are system outputs. The two tables on the left are child requirements that

generate the outputs int1 and int2, respectively. The table on the right generates the variable out from internal inputs int1 and int 2. To identify all system inputs to set a given output, the algorithm identifies all cases that set that output. Then traces the inputs back to the requirements that are setting those outputs to the desired values, and combines all combinations of these cases.

For example, if a user wants to see what input conditions can set out to True, the algorithm finds the necessary inputs are int1 = True and int2 = True. int1 is set True only if both in1 and in2 are True. int2 can be set True by either in3 or in4 set True. So out can be set True by in1 and in2 and in3 set True, or in1 and in2 and in4 set True. Alternatively for creating a test case, a user may want to know what system input combinations are needed to set the inputs to the requirement under test. Here the algorithm would simply trace back the inputs of each test case.

This method can be extended to any size requirement dependency tree, but can become very expensive as the number of paths grow. To reduce the amount of time it takes to locate system inputs, the algorithm will limit the number of test cases considered to the number necessary to find one or more complete set of system inputs.

This element in particular provides value in the waterfall, agile, or V-model development environments. By providing a logical model of how the requirements work, system validation can begin without even having an implementation. Further, we can generate tests procedures, as described in the next section, and begin reviewing the procedures while the implementation is in progress. While this primarily benefits agile or V-model development, all development models will benefit by the algorithm providing test developers with the system inputs that impact their requirement.

Test Procedure Generation

Generating the test procedure is a fairly straightforward process. The test procedure is based on the truth table for the requirement, and the set of system inputs needed for each case. The following snippet shows part of a first test case that would be generated for figure 1.

Set inputs:
int1 is set to True by setting:
in1 to True
in2 to True
int2 is set to True by setting:
in3 to True
in4 to False
Verify outputs:
out is set to True

One additional consideration with automated tools is the cost to ensure the tool works right every time, or to formally certify the tool. If complete automation with no human review is expected, tool qualification is required by DO-330, as well as by many company’s policies, to provide proof that the automated tool does not miss required test conditions, or provide false path conditions. This qualification is unnecessary when independent human review of the tests is provided, however human review takes longer than automation. Striking a balance between the lower short term cost of human review vs the cost to develop fully automated software, and the lower per run cost of automation vs manual test development, can be very beneficial.

Much of the time/cost in manual test development is in the creation of the basic framework of the test procedure. Further, a large portion of generating the truth table, and finding system inputs, can be easily automated, while a small portion is often corner cases that is much more complex to automate. This means the automated tool can generate most of the test, leaving a much smaller portion to be manually completed. While this does not perform all of the test generation, the time reduction provided by the tool

can provide significant benefits, at a reduced cost.

Results

During manual development of tests we found that developing one fairly basic test procedure required 120 minutes to develop the test procedure, and 30 minutes to debug initial failures in the procedure, for a total time of 150 minutes to complete the test. The automated test case generation tool required 30 minutes to generate the test procedure, including the test developers time to complete the test, and only 5 minutes to debug the

test, for a total time of 35 minutes to complete the test. This is a little over 4x improvement in overall time to complete the test procedure. Part of the time included the time for the test to be generated, during which the test developer can be doing other work, so the reduction in test developer time would be even better than reported here.

These performance numbers are based on a single very simple example as we do not have a comprehensive analysis of time to generate test procedures manually vs using this tool. However we expect the improvement in time to complete tests to improve from this. As requirements get more complicated, most time during manual test development is spent identifying the system inputs necessary to set the test inputs. The system modeling provided can generate sets of system inputs in a fraction the time it takes a test developer to identify all of the possible paths. Further, simply identifying the requirement paths back to system inputs can help test developers reduce the time necessary to develop tests, and help in identifying gaps that need to be tested in the requirements. Even if increasingly complex tests do not show a linear reduction in time to develop the test, a reduction of nearly 2 hours to develop each test would represent a reduction in a man year for each 1000 requirements. This is very significant in systems with thousands of requirements.

Conclusion

In this paper we have presented an algorithm for parsing structured human written requirements into a system model using truth tables. The system model is used to identify necessary system inputs for setting a given output or set of inputs. Finally the truth table and set of system inputs are used to generate a test procedure to test the requirement with MC/DC coverage.

We have shown that time to develop a simple test can be reduced by 75%, or two hours. Further we have discussed how this tool can reduce test developer time by automatically generating a test procedure skeleton that would need to be created manually, as well as identifying paths to system inputs. This framework for parsing requirements also works well as a template for writing non-ambiguous, testable, requirements that can be parsed with no

possibility of error.

This method of parsing requirements has worked very well for us so far. Parsing logic for each set of rules is kept very contained, and adding new rules results in minimal impacts to the system. Still we look at machine learning as the future of the automated test case parsing. The initial stage would focus on natural language processing of requirements. Machine learning would give us the flexibility to simplify any customer requirements down to the algorithm’s core format that can be easily parsed into truth tables, while still being able to provide us an indication of which requirements the algorithm is more confident that it parsed correctly. Eventually the generation of truth tables and test cases could be moved into machine learning, and possibly the machine learning algorithm could replace the truth tables themselves.

Acknowledgements

The authors would like to acknowledge the work of Hattie Pimentel and Dominic Smith in developing the original prototype of this system, as well as William VanSolkema and Jessica Ricksgers for testing the system.

References

[1] H. Krasner, “The cost of poor quality software in the us: A 2018 report,” Consortium for IT

Software Quality, Tech. Rep, vol. 10, 2018.

[2] “11 of the most costly software errors in history,” Raygun Blog, March 2022 [Online]. [Online]. Available: https://raygun.com/blog/costly-software-errors-history/

[3] P. Johnston and R. Harris, “The boeing 737 max saga: lessons for software organizations,”

Software Quality Professional, vol. 21, no. 3, pp. 4–12, 2019.

[4] “Types of software testing: Different testing types with details,” softwaretestinghelp.com,

March 2022 [Online]. [Online]. Available: https://www.softwaretestinghelp.com/types-of-software-testing/

[5] O. Media, “Military embedded systems.” [Online]. Available: https://militaryembedded.

com/avionics/safety-certification/unmanned-stringent-certification-process

[6] “RTCA products,” softwaretestinghelp.com, March 2022 [Online]. [Online]. Available:

https://www.rtca.org/products/

[7] K. J. Hayhurst, A practical tutorial on modified condition/decision coverage. DIANE

Publishing, 2001.

[8] “AOP-52,” hcrq.com, March 2022 [Online]. [Online]. Available: https://www.hcrq.com/uploads/2/6/9/6/26962125/aop-52_-_guidance_on_software

safety_design_and_assessment_of_munition-related_computing_systems.pdf

[9] “Joint software systems safety engineering handbook,” hcrq.com, March 2022

[Online]. [Online]. Available: https://www.acqnotes.com/Attachments/Joint-SWSystems-

Safety-Engineering-Handbook.pdf

[10] “MIL-STD-882E,” dau.edu, March 2022 [Online]. [Online]. Available: https://www.dau.edu/cop/armyesoh/DAU%

Michael Lingg is a principal research engineer at Array of Engineers, where he leads various research and development initiatives for the medical, aerospace, and space industries. Michael has a P.h.D. in High Performance Computing from Michigan State University and has over 15 years of experience in software design, development, and testing.

White Paper: Automation of Test Case Generation and Software System Modeling