White Paper: Automation of Test Case Generation and Software System Modeling

Author: Michael Lingg, PhD, Array of Engineers

Software model of an aircraft


Software testing is a vital part of software development to ensure correct

functionality, and necessary in safety critical systems. Various methods have been

developed to ensure that tests verify that the software correctly implements the

requirements, but increased coverage results in increased time to develop tests.

Automation can be used to provide significant improvements in test development

time, particularly when used to augment, rather than replace, human test developers.

This paper provides a format to structure software requirements, and an algorithm

to parse the requirements into a logical model of the software. This logical

model can be used to analyze the behavior of the requirements, or automatically

generate test procedures. We show that the test generation, and the information

the logical model provides to test developers, can significantly accelerate the time

required for test development.

Citation: M. Lingg, T. Kushnier, R. Proenza, H. Paul, ”Automation of Test Case Generation and Software System Modeling,” In Proceedings of the Ground Vehicle Systems Engineering and Technology Symposium (GVSETS), NDIA, Novi, MI, Aug. 16-18, 2022.


The use of software is ever expanding, and continuously evolving. In the past, changing the behavior of equipment and vehicles would require physical servicing to replace physical components and hardware. The use of software to control behavior means mission parameters can be completely redefined in the field in seconds, making software invaluable in an ever changing environment. However the benefits of software are not without cost. Poor software quality has been estimated to cost the US around $2.84 trillion dollars [1] in 2018 alone, with losses due to software failures exceeding a third of this cost. Costs due to bugs can climb upwards of billions of dollars as evidenced in the Soviet Gas Pipeline Explosion in 1982, reach to the hundreds of billions in cost to repair the Y2K bug [2], or impact the safety and lives of people, as in the Boeing MCAS software failure that lead to the loss of two Boeing 737 MAX aircraft, and 346 lives [3].

In addition to the cost of failing to discover software bugs, the cost of sufficient testing to identify software bugs can be very high. Development of a single test case by hand can take as much as an hour or two for simple behavior, and can easily exceed ten or more hours for more complex behavior. In systems with thousands of requirements, manual test development often reaches into man years. Any use of automation can provide significant cost benefits, along with the consistency provided by automating the process. There exists a broad range of testing methods [4]. Tests can cover different levels or components of the system, require different levels of knowledge of the implementation, and be based on different types of testing stimuli. This paper focuses on Modified Condition/Decision Coverage (MC/DC) of requirements based tests as defined in DO-178B/C. DO-178B/C provides guidelines for developing safety critical software, primarily used in civilian aircraft, but is being increasingly considered for use in unmanned aircraft [5].

To help with understanding of this work, the relevant portion of DO-178B/C MC/DC requirements based tests will be discussed below. For a complete description, the DO-178B/C standards are published by the RTCA (Radio Technical Commission for Aeronautics) [6], or see NASA’s detailed tutorial on MC/DC [7]. MC/DC provides a high level of path coverage in software testing. In a system with a lower safety level, reduced path coverage, such as only verifying each output of each path, or only verifying critical path coverage, can be used. These methods of reduced path coverage result in test selection that is a subset of the rules of MC/DC. The choice of coverage method would be selected based on the chosen standard such as the DO-178B/C criticality level, AOP-52 [8] formal test coverage, or the Join Software Systems Safety Engineering Handbook (JSSSEH) [9] path coverage testing, and could be documented to satisfy the MIL-STD-882E [10] safety verification task.


DO-178B/C testing has historically used the waterfall method of software development, where first requirements are written, then the software is implemented, and finally tests are developed. With this approach, test developers are expected to have no knowledge of the implementation, ensuring independence of implementation and testing. This means tests are expected to be developed only using the requirements, and a specification of inputs and outputs that are accessible by the test. Typically a test covers a single requirement, and may include multiple cases (sets of inputs and expected outputs), in order to fully cover the requirement. While the automated test case generation algorithm was developed in this environment, we will discus in the Methods section how the algorithm integrates well to

other testing environments. MC/DC coverage defines a method of testing that ensures a known coverage of conditions within the software, by showing the independent effect of each condition within a decision. This paper will focus on two main points of MC/DC coverage [11]:

  1. Each non-constant condition in a Boolean expression is evaluated to both True and False.

  2. Every non-constant condition in a Boolean expression is shown to independently affect the expression’s outcome.

Let us take a look at what this means. First start with a basic comparison condition which could be considered as part of a larger expression:

input == 10

Next two tests cases are created, one with input set equal to the constant (10), and another with input set equal to the constant + 1 (11). This produces the truth table shown in table 1:

Table 1: Compare with constant

The first test case would set the condition to true, while the second would set the condition to false, satisfying out MC/DC subset condition 1. MC/DC condition 2 is also satisfied as only input was changed between the test cases, showing it independently affects the output. Further rules could be added, such as testing the constant - 1 to verify the condition was not implemented as ≤, but this paper is focusing on the limited subset of rules defined above. Now consider the case of two variables being compared with ≥, rather than a direct comparison:

input1 ≥ input2

Declaring that there are no constraints on the inputs and they are both numeric, a test case is chosen of both input1 being greater than input2 and a second test case reversing the values so input1 is less than input2 to satisfy our MC/DC condition 1. However this does not satisfy condition number 2. So the second case is changed such that input2 is equal to its value in the first test case, and input1 is less than the value of input2, showing that input1 independently set the output to False. Then a third case is added where input1 is equal to its value in the first case, but input2 is greater than input1, showing input2 independently set the output to False. This produces the truth table in table 2:

Table 2: Compare two variables

The test above covers both equivalences classes that are possible in the requirement, and tests the boundary conditions of input1 greater than input2 and input1 less than input2. The test could also require as a rule that the boundary condition of input1== input2 be tested to fully cover the boundaries of ≥, and ensure the implementation was not >.

Last the logical Boolean conditions of AND and OR will be considered. First the following condition of two comparisons ANDed together:

input1 == True AND input2 == True 

The condition can only be true if input1 is True AND input2 is True. This will be the first test case, satisfying the True output. Setting both inputs to False satisfies the False output for MC/DC condition 1, but does not satisfy the input independence requirement of MC/DC condition 2. So instead a second test case will be created with input1 False and input2 True, showing input1 independently sets the output to false. Finally a third test case sets input1 True and input2 False, showing input2 independently sets the output to false. This is summarized in the truth table in table 3:

Table 3: Two conditions ANDed

Because this paper is focusing on MC/DC condition 1 of each possible output, and condition 2 of each input independently toggles the output, expanding our condition to any number of values ANDed together is fairly simple. There is still only one true output case, and each new input simply needs to show it can toggle the output to False. Any number of conditions ANDed is shown in the truth table in table 4:

Table 4: n conditions ANDed. Test case written as TC for space

OR conditions operate very similar to AND conditions. The difference is that an OR can only be False if all inputs are False, so the first AND case that outputs True in the examples above would be replaced with all False inputs and a False output. Then each subsequent case showing the input independently toggles the output by having each case set a subsequent input to True, while all other inputs are False, sets the output to True.

With more complex logic that includes combinations of ANDs and ORs (which could be simplified to combinations of NANDs) the MC/DC cases can be determined by applying these rules to every pair of conditions until a truth table is created for the entire test. One way to tell that the proper set of MC/DC tests has been created for N Boolean conditions is one base test, plus a test for each Boolean condition to show each input independently toggles the output, or N + 1 test steps [12]. This approach provides full equivalence class testing for the possible conditions. Other types of operators, such as timers, can be processed in a similar manner as logical operators, with each operator defining the rules that apply to it.

Next will be a brief discussion on how tests are developed in the present system we are working with. For each requirement MC/DC coverage of the requirement inputs is tested, but the requirement inputs may be system inputs or internal inputs, and the requirement output may be an internal output or system output. This method of testing ensures that all behavior can be exercised by system inputs, identifying any dead code by finding any outputs that cannot be tested by system inputs. This also means the test developer cannot simply set the desired inputs directly, but must trace through the requirements to find the system input combinations necessary to set the requirement inputs.

Having provided a background on the test methodology and test selection criteria this paper is being limited to, the paper will look at this method of generating tests and producing a logical model from requirements. First will be a discussion on how the requirements are formatted and parsed to determine the logical behavior the requirement is identifying. Next a method to analyze the behavior of multiple requirements in the system to produce a model of how the system operates will be presented. This system model will be used to identify possible conditions that produce system conditions being investigated, including finding if the system can get into undesired states. Finally a method of generating tests that can be run by an automated system, such as Jenkins, will be described.


Requirements Parsing

To parse requirements the automated test case generator (simply referred to from here as the algorithm) uses a layered approach. The algorithm’s core parser handles requirements that are still human readable, but have all formatting stripped out, and require specific ordering of key words, variables, and constants. This allows the core parser to produce truth tables from the condensed requirements using a very limited rule set. The next layer up removes unnecessary formatting, such as white space, to produce the core format. The top layer has rules for converting customer specific formats into the algorithm’s internal format, as well as rules for converting excessive grammar, common variants, and typos. This approach allows the algorithm’s core parser to remain simple and stable, while the algorithm can easily switch between different customer preferred formats, and addressing the normal human variations in writing.

Next is an example of a generic customer requirement, with a simplified data dictionary. The data dictionary exists to define the types of each variable, allowing the algorithm to determine the necessary set of values to test. For example, Boolean variables are fully tested just by being set to both True and False. Alternately, numerical values may have a minimum and maximum value that define the boundary of the equivalence classes, and may be tested for data type minimum and maximum for robustness.

(condition1) The software shall set
	OUTPUT to True and set OUTPUT_VALID
	to True if the following is True:
	(INPUT is equal to True) AND
	(INPUT_VALID is equal to True)

(condition2) otherwise, set OUTPUT to
	False and OUTPUT_VALID to
	True if the following logic
	evaluates to True:
	(INPUT is equal to False) AND
	(INPUT_VALID is equal to True)

(condition3) otherwise, set OUTPUT to
False and OUTPUT_VALID to False

Data Dictionary:
INPUT {"Type": "Boolean"}
INPUT_VALID {"Type": "Boolean"}
OUTPUT {"Type": "Boolean"}
OUTPUT_VALID {"Type": "Boolean"}
Boolean {"Range": "True, False"}

The red text in the requirement are labels added to identify each output set of the requirement, and will be referenced below.

After the requirement is simplified to the format for the core parser, the requirement is split into three sections, one for the first set of outputs and conditions (condition1), one for the first otherwise output (condition2) and conditions and one for the unconditional otherwise’s output (condition3) set. The following shows (condition1) split out, and colorized to show key parts that will be discussed next.

The software shall set OUTPUT to True
	and set OUTPUT_VALID to True
	if the following is True:
	(INPUT is equal to True) AND
	(INPUT_VALID is equal to True)

The parsing first splits the section identifying what to set output variables from the conditions of the requirement. The text, if the following is True, tells the parser that the setting statements all precede this text, the condition follows the text. Since the condition is if the following is True, the outputs are only to be set per the setting statement if the condition is True, otherwise the outputs are not defined here. The text, set OUTPUT to True tells the parser this is an output setting statement, where: set is a constant keyword, followed by an output variable, followed by a constant keyword of to, ending with an constant or a input variable, where each piece is separated by a space. The output variable, and the constant, or input variable, are stored as represented by table 4, where the first column is the output variable being set, the second column is the value to set the output variable to, and the third column is the truth table condition where this output rule will be applied:

Table 5: Setting Statements (condition1)

Next the condition, colorized in green, is parsed. Conditions are enclosed in parenthesis to very clearly ensure order of operation, even where repeating standard order of operations. The condition is parsed into a postfix equation, with variables replaced with a class representing the variable type, text operands replaced with enumerations to represent the operation, and string named constants replaced with the appropriate value. The postfix equation allows the algorithm to apply the MC/DC rules from the innermost conditions first, expanding outward as the condition is processed. The equation for this example would look like the following:

{INPUT, True, ==, INPUT_VALID, True, ==, AND}

The postfix equation is then processed by the algorithm as described in the background, section 2, producing the truth table evolution as follows:

Table 6: Comparison 1 (condition 1)

Table 7: Comparison 2 (condition 1)

Tables 6 and 7 show the truth tables for the first requirement conditions involving INPUT and INPUT VALID, respectively. As each table has a single Boolean input, which is being compared to a Boolean constant, only two cases are necessary. TC Results in these is the evaluation of the condition of truth table for each row.

The final operation is an AND of the two comparisons, so the algorithm merges the two table per the MC/DC AND rules:

Table 8: AND the comparisons (condition 1)

As described in the Background, only N + 1 or three cases are necessary to show each possible output, and that the two inputs independently change the output. While the algorithm has now merged the sub-conditions into the complete condition truth

table, it continues to store the sub-condition truth tables. Cases exist where the child truth tables are not sufficient to create the combined truth table. Such a case exists if two inputs are integers with range 1-3, and the conditions are input1 == 1 OR input1 == 2. The truth table for each comparison may only set input1 equal to 1 or 2 in their two cases to produce both True and False outputs, but neither value will set the combined truth table False when ORed together. In these cases the algorithm identifies the child cases needed by the combined truth table, and adds them to the child tables, before creating the combined truth table. If a reduced path coverage method, such as only verifying each output path, is chosen as the test selection criteria, the algorithm would only produce test case 1 and test case 2 or 3. The resulting two test cases would fully cover the two possible output conditions of the branch.

Finally the algorithm applies the setting statements to the truth table. In this case the outputs can only be set when the truth table evaluates to true, when this is not the case the output is not defined. The undefined output will need to be defined by the other conditions of the requirement, otherwise this may indicate a problem with the requirement. Optionally the customer may prefer to maintain the value of the output when no output is defined. Table

9 shows the setting statements in table 5 applied to the (condition1) truth table 8.

Table 9: Complete condition 1 (condition 1)

Next the algorithm processes the first otherwise statement and its condition (condition2). Initially the algorithm parses this independently of the previous condition, and ignores the last otherwise (condition3), to produce a similar truth table shown in table 10:

Table 10: Complete condition (condition 2)

Now the algorithm merges the two conditions into a single table. In each test case, an otherwise condition is only considered if the condition prior to it evaluates to false, essentially making the otherwise case a lower priority truth table. Then each row from the otherwise table is compared with rows in the prior condition(s) table. If all inputs match, and the row from the previous condition evaluated to false, the rows are merged, using the outputs that are not unknown. If inputs do not match, a new row is created for this condition. For the two truth tables 9 and 10 the algorithm would perform the following:

  • Row 1 of condition1 matches row 2 of condition2, condition1 outputs are known and condition1 has higher priority, so row 1 of condition1 is used for the combined table.

  • Row 2 of condition1 matches row 1 of condition2, condition2 outputs are known and condition1’s output condition is False, so row 1 of condition2 is used for the combined table.

  • Row 3 of both condition tables do not match any row from the other table, so row 3 from each table creates an entirely new row in the combined truth table.

The algorithm also adds a truth table output, the table now includes condition1 and condition2, so it can be shown if the first or second condition triggered the output being set in the combined table. The truth table in table 11 shows the two conditions combined, red shows entries pulled from condition1 and blue shows entries pulled from condition2. Note that the algorithm has to go back and compute condition1 for case 3 and condition2 for case 4 as they were not previously computed in the child cases.

Table 11: Two conditions combined (condition1 + 2)

This case sets both outputs to false, and has no conditions. Effectively the truth table for this case is a single true case. When merging with the previous combined truth table, condition3 is the lowest priority, so it can be applied to any case where condition1 and condition2 was false. In then final merged table, any case where a higher priority condition was true, the second otherwise will show as a False case. This produces the final truth table shown in table 12 where values set by the second otherwise case are shown in green: