Multifactor Algorithm for Test Case Selection and Ordering

Regression testing is expensive and therefore calls for optimization. Typically, test case optimization means selecting a reduced subset of test cases or prioritizing them so that potential faults are detected at an earlier phase. Many earlier studies relied on heuristic-dependent mechanisms to attain optimality while reducing or prioritizing test cases; nevertheless, those studies lacked systematic procedures to manage the issue of tied test cases. Moreover, evolutionary algorithms such as the genetic algorithm often help to reduce test cases while concurrently decreasing computational runtime; however, the method falls short when fault detection capacity must be examined alongside other parameters. Motivated by this, the current research proposes a multifactor algorithm that incorporates genetic operators and additional discriminating features. A factor-based prioritizer is introduced to properly handle the tied test cases that emerge during re-ordering. In addition, a Cost-based Fine Tuner (CFT) is embedded in the study to reveal stable test cases for processing. The effectiveness of the proposed minimization approach is analysed and compared with a specific heuristic (rule-based) method and a standard genetic methodology. Intra-validation of the result achieved by the reduction procedure is performed graphically. For the proposed prioritization scheme, this study contrasts randomly generated sequences with the procured re-ordered test sequences over 10 benchmark codes. Experimental analysis revealed that the proposed system achieved a reduction of 35-40% in testing effort by identifying and executing stable and coverage-efficacious test cases at an earlier phase.


Introduction:
Whenever software evolves, testing is required. Software testing is predominantly the operation conducted by testers to identify defects or gaps and to verify whether or not the system under consideration correctly complies with the client's specifications. During the software modification phase, the development teams, testers, and stakeholders are more concerned about the authenticity and reliability of the new features being worked on than about the existing features that have been extensively tested and are stable. As the latest piece of code is incorporated into the existing features, it is quite possible that some functionality in the existing code is broken. To ensure that the final product performs well even after the latest improvements have been pushed, regression testing must be executed. This notion also results in the formation and execution of a sizeable number of test cases and makes regression testing economically expensive in terms of maintenance, consuming up to approximately 80% of the testing budget (1).
Testing lives with the unarguable reality that there are always thousands of variations and potential explanations of why anything might go wrong. Sometimes, even testers with a vivid imagination are simply unable to spot every one of them, particularly when the launch date is getting closer. Also, there is never sufficient time or resources for all alternative test conditions to be found and tested, so it becomes necessary to reduce or prioritize the test conditions to sustain the testing process (2). Hence, this study's focal point is the prime issue in software testing research, i.e., optimization of test cases ('Minimization' + 'Prioritization' strategies). A large body of research exists for Test Case Minimization, which generally executes fewer test cases depending upon some criteria. A fundamental delineation of the problem of selecting a reduced set of test cases is (3,4): Definition 1: Given a test suite T, a series of testing requirements r1, r2, ..., rn that must be tested in order to have adequate testing coverage of the program, and a list of subsets of T, one associated with each of the requirements (iterating from 1 to n), such that any test case Ti belonging to the subset associated with requirement rj can be used to test rj. Problem: Find a representative set of test cases from T that satisfies all of the rj's.
Various approaches have been hypothesized to minimize test suites. For instance, Harris and Raju (5) expounded an idea for diminishing the test suite size by applying an uncomplicated approach that concentrates on test metrics (i.e., desideratum and size coverage) and accordingly proposed the CBTSR (Coverage Based Test Suite Reduction) algorithm. The principal contribution of their modus operandi embraced the construction of test cases and desiderata through data flow testing, which aimed to inspect the physical framework of the program and to discover the sub-paths traversed by variables. Lin et al. (6) emphasized and empirically estimated Greedy-based strategies (i.e., cost-aware Greedy tactics and the auxiliary Greedy) using the gzip, space, siemens, and ant applications. The result of their estimation indicated higher fault detection proficiency and a lower regression testing cost with the cost-aware procedures.
With progression in the production code, test suites can accumulate redundancies over time. Vahabzadeh et al. (7) focused on fine-graining the test minimization procedure and thus proposed a model for statement-level analysis of test cases. A technique accompanied by a tool (named Testler) was presented to lessen substantial redundancies in the test statements of test cases. Many empirical studies also articulated the degradation of FDE (Fault Detection Effectiveness) due to the reduction mechanism. Jeffrey and Gupta (8) focused on improving this fault detection capability by selectively retaining test cases that are fault-revealing but would otherwise be removed as redundant.
The work discussed in (9) offered a solution for the complications of regression test suite optimization, named FCBTSO (Fault Coverage-based Test Suite Optimization), which was formulated on the HGS (Harrold-Gupta-Soffa) test suite minimization strategy. Singh et al. (10) propounded an algorithm to balance the tradeoff between the time needed for test suite execution and its FDE.
A. Lawanna (11,12) described a design-based technique for test cases, resulting in refinement of the test case selection procedure, and devised efficacious algorithms with the embedded concepts of filtration, classification, and selection of germane test cases. Research regarding test case selection has also deployed a linear programming model to extract the subset of test cases to rerun (13). The apprehension of these selection procedures was improved in the study (14), where a weighted average sum of quantifiable aspects served as the base for the selection framework; testing cost, code coverage, FDE of test suites, and code change data were the aspects considered in that study. Over the years, researchers have also exercised NSGA-II (Non-dominated Sorting Genetic Algorithm II), a customary multi-objective approach, for the reduction scheme. A variant of NSGA-II titled MORE+ (Multi-Objective test suite REduction) is presented in (15).
Apart from reduction approaches, TCP (Test Case Prioritization) has also proved efficacious; it optimally arranges the set of test cases to attain certain criteria, such as fault detection capability, as expeditiously as possible. In one way or another, this technique pursues two objectives: re-ordering the test cases according to some criteria and detecting faults at an earlier stage, thereby reducing testing time with much smaller overhead (16).
Along with the minimization of test cases, prioritization has also encompassed a large body of research, including the contributions of Beena and others (19)(20)(21).
Many other researchers worked on improving the prioritization of test cases by concentrating on real-world aspects, i.e., practical priority features (22). The present study observed that historical execution data is also significant, as that data conveniently reveals how test cases failed previously and to what extent they are likely to fail later. Khalilian et al. (23) deployed historical execution data for the computation of prioritization equations and modified them to possess dynamic coefficients; these enhanced equations were composed of execution history, test case priority, and historical FDE. Research presented in (24) ameliorated the history-based approach by applying it to each altered line of code, i.e., prioritizing the modified lines first and afterwards following up with the concerned test cases. Moreover, the data also depict that some test cases have execution relations among them, i.e., the execution history of one test case predicts that of another; therefore, mining such execution relations among the test cases based on historical execution data would further improve the optimization approach (1,25). According to Tanzeem Bin Noor and Hadi Hemmati (26), in practice a failing test case need not be exactly identical to a test case that failed previously; viz., the failing test case could be a slightly altered version of the former failing test case, revealing a fault that would otherwise remain undetected.
In view of the studies surveyed and the need for enhancement in optimization tactics, this research addresses significant issues in the above-mentioned existing conventional systems for optimizing test cases: the existing traditional methods, and the genetic process deployed for test case minimization, exploit a single parameter, which is unjustified and results in non-fulfilment of either the objective or the requirements to be procured during software testing. The Genetic Algorithm (GA) is a widely used population-based approach inspired by the natural phenomenon of survival of the fittest. The applicability of the genetic algorithm to test case optimization is further elaborated in (27)(28)(29)(30)(31)(32). The current study aims to manage the issues that occur during test case analysis by incorporating GA as the base structure and by prioritizing test cases through suggested key variables and strategies for handling tied test cases.
The rest of the paper is organized as follows: the subsequent section (Section 2) deals with the technique proposed for test case optimization and comprises three subsections that give a concise overview of how the methodology progresses. Section 3 addresses the confirmation of the findings and possible future enhancements for the proposed method. The final section (Section 4) presents the conclusion.

Proposed Method:
The regression testing procedure makes it practically impossible to perform all possible and preferred tests. That is why the significant challenge is to choose adequate tests for the code. If this critical step is not accomplished, important characteristics of the code may not be covered by the testing process. Additionally, it is essential to prioritize the tests that are likely to expose issues and are paramount to the code's functioning. In order to deal with the complexities of testing procedures, this research suggests and implements techniques for optimizing test cases. The broad visual perspective is represented in Fig. 1.
The production of the proposed model requires multiple stages, which are described below.

Investigation and mining of test case linkage:
The conceptual framework of the proposed solution starts from here. In general, during regression testing, a background data depository is maintained, which stores the historical details of every test case: the number of times a particular test case has been deployed, the faults disclosed by the test cases, and the severity of the detected defects. In the course of progression and planning of the present work, it was observed that a test case's past performance provides insight into its fail/pass verdicts. No association was assumed before executing the test cases, but the results manifested links among the test cases after execution. The variables guided by this historical execution relationship of test cases form the current study's core concept. The pruned failure and pass rules extracted from the historical data (Table 2) drive the Test case Dependency Score (TDS) (equation (1)), which counts the number of specific test cases whose outcome can be determined by executing Ti. For example, the execution result of test case T9 can be determined only from the execution of test case T1 (rules 1 and 2 of Table 2), and thus the TDS value of T1 is '1'. Conversely, no test case's outcome can be determined from the execution of T5, so the TDS value of T5 is '0'.
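To make the TDS computation concrete, the following minimal Python sketch assumes the pruned rules are available as (antecedent, consequent) pairs; the rule representation and names are illustrative, not the paper's implementation.

```python
# Illustrative sketch (not the paper's exact data structures): rules mined from the
# historical repository are modelled as (antecedent, consequent) pairs, meaning the
# outcome of `consequent` can be inferred from executing `antecedent`.

def tds(test_case, rules):
    """TDS of `test_case`: number of distinct test cases whose verdict it determines."""
    return len({consequent for antecedent, consequent in rules if antecedent == test_case})

# Mirroring the example above: T1 determines T9, while nothing depends on T5.
rules = [("T1", "T9"), ("T1", "T9")]  # hypothetical pass and fail rules (rules 1 and 2)
print(tds("T1", rules))  # 1
print(tds("T5", rules))  # 0
```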

Reduction mechanism based on GA notion:
The concept proceeds with forming the initial population for every execution cycle (i.e., for every source code). The gene formation for every execution cycle describes the statement coverage by a particular test case: the gene value '0' depicts no coverage of the statement by the processing test case, whereas '1' illustrates that the test case covers that statement of the particular execution cycle. Algorithm 2 (Fig. 3) describes how to structure the initial population with respect to test coverage. The current work explains the detailed methodological account with a case study comprising ten test cases and five execution cycles. Initial population: a 0-1 matrix is formed depicting the statement coverage by the test cases, where each '0' or '1' is a gene, while the complete coverage information of a test case for a particular cycle is a chromosome (Fig. 4). Each statement of the execution cycle is assigned a random weightage in the range [0, 1]. The weightage factor discloses the criticality of the data and conditions that these statements hold (Table 3).

Table 3. Initial population (statement coverage by test cases) with respective weightage values for execution cycle 1
(Table 3 lists, for execution cycle '1', the weight Wj of every statement Sj and the coverage of each statement by the test cases.)

The fitness of a test case for an execution cycle is computed as the weighted sum of the statements it covers:

Fitness(Ti) = Σ CV(Sj) × Wj, for j = 1 to m

where 'p' denotes the number of execution cycles (for the considered case study, p = 5), CV(Sj) is the statement coverage value of statement Sj by the test case whose fitness is being computed (either '1' or '0'), Wj is the weight assigned to each statement, and 'm' is the total number of statements. The above-stated algorithms, i.e., Algorithms 3 and 4 (Fig. 5 and 6), elucidate how the reduction methodology works and how the genetic operators aid in extracting the minimal number of test cases for every execution cycle, with an initial termination criterion of at least 75% coverage. The final termination criterion is set to 50% for extracting the best test sequence from the original test suite for every execution cycle. The elaborative working of the genetic operators follows.
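For concreteness, the following minimal Python sketch illustrates the population encoding and the fitness evaluation described above; the dimensions, random values, and the weighted-sum fitness are illustrative assumptions based on this description, not the study's implementation.

```python
import random

# Illustrative sketch (toy data, not the paper's case study): each chromosome is a
# test case's 0/1 statement-coverage vector for one execution cycle, each statement
# carries a random weight in [0, 1], and fitness is assumed to be the weighted sum
# of the statements the test case covers.

NUM_TESTS, NUM_STATEMENTS = 10, 6          # hypothetical dimensions for one cycle

coverage = [[random.randint(0, 1) for _ in range(NUM_STATEMENTS)]
            for _ in range(NUM_TESTS)]      # in practice, obtained from instrumented runs
weights = [round(random.random(), 2) for _ in range(NUM_STATEMENTS)]  # statement criticality

def fitness(cv, w):
    """Fitness(Ti) = sum over statements j of CV(Sj) * Wj."""
    return sum(c * wj for c, wj in zip(cv, w))

best = max(range(NUM_TESTS), key=lambda i: fitness(coverage[i], weights))
```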
Selection: This study deploys a random selection scheme that initially selects two test cases from the test suite and performs an Ex-OR operation between them. The resultant chromosome is analysed on the coverage factor, i.e., its coverage percentile must equal or exceed 75% (initial termination) for the pair to enter the reduced test sequence of the processing execution cycle. Within the modules, a knock-out based selection scheme is followed. Coverage for every test case is enumerated as:

Coverage(Ti) = (S_covered / S_total) × 100     (2)

where S_covered is the number of statements executed by the test case and S_total is the total number of statements in the processing execution cycle.

Crossover: If the selected pair of test cases does not satisfy the initial termination criterion, crossover is carried out. The genes of those two test cases are swapped from the position where both test cases cover the same statement to the position where neither of the two test cases covers a statement (Fig. 7). This swapping is performed only once between the genes of the two test cases.
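A minimal Python sketch of these two operators follows; the exact swap positions and the Ex-OR-based coverage check are modelled on the description above, and the chromosomes are toy values rather than the paper's data.

```python
# Minimal sketch of the selection and crossover operators, under the assumption that
# the swap runs between the first position both test cases cover and the first
# position neither covers (the paper's Fig. 7 shows the exact scheme).

def coverage_percent(chromosome):
    # equation (2): statements covered / total statements, as a percentage
    return 100.0 * sum(chromosome) / len(chromosome)

def ex_or(a, b):
    return [x ^ y for x, y in zip(a, b)]

def crossover(a, b):
    both = next((j for j, (x, y) in enumerate(zip(a, b)) if x == 1 and y == 1), None)
    neither = next((j for j, (x, y) in enumerate(zip(a, b)) if x == 0 and y == 0), None)
    if both is None or neither is None:
        return a, b                               # no valid swap points; pair left unchanged
    lo, hi = sorted((both, neither))
    a2 = a[:lo] + b[lo:hi + 1] + a[hi + 1:]       # single swap of the enclosed gene segment
    b2 = b[:lo] + a[lo:hi + 1] + b[hi + 1:]
    return a2, b2

a, b = [0, 1, 1, 0, 1], [0, 0, 1, 1, 1]           # toy chromosomes
if coverage_percent(ex_or(a, b)) < 75:            # initial termination criterion not met
    a, b = crossover(a, b)                        # followed by mutation if still below 75%
```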

Figure 7. Crossover Operation
Ex-OR is then performed between the newly formed chromosomes, and the coverage is evaluated. If the coverage attained through the crossover mechanism is unsatisfactory, i.e., below 75%, the outcome of the crossover step is mutated.
Mutation: This refers to a slight change in the chromosome. The '0' bit is flipped to '1' according to the weightage assigned to the statements (single bit mutation).
If the initial termination criterion is not satisfied by any of the three operators for the two test cases, the test case of the pair with the lower fitness value is placed in a waiting queue. Thus, a waiting queue is maintained whenever a pair of test cases does not satisfy the coverage criterion (Fig. 8). If the combination of randomly selected test cases fulfils the genetic loop's coverage criterion, those two test cases are included in the processing cycle's test sequence. The test sequences are further maintained in the final repository, and the selected test cases are excluded from the original test suite (T). Therefore, a separate reduced test sequence is formed for every execution cycle.
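The sketch below illustrates, under stated assumptions, the single-bit mutation and the waiting queue; choosing the highest-weight uncovered statement for the flip is an assumption, since the text only states that the flip is guided by the statement weightage.

```python
from collections import deque

# Minimal sketch of the single-bit mutation and the waiting queue (toy usage below).

def mutate(chromosome, weights):
    uncovered = [j for j, gene in enumerate(chromosome) if gene == 0]
    if not uncovered:
        return chromosome
    flip = max(uncovered, key=lambda j: weights[j])  # weight-guided choice (assumption)
    mutated = list(chromosome)
    mutated[flip] = 1
    return mutated

waiting_queue = deque()     # parks the lower-fitness member of an unsuccessful pair
waiting_queue.append("T3")  # hypothetical test case awaiting a new partner
```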
Whenever a fitted pair of test cases is included in the reduced test sequence of a particular cycle, the waiting queue is inspected for the presence of any test case. If a test case exists there, it is extracted, and the procedure continues by pairing that test case with another randomly selected test case from the original test suite. The last test case in the reduced test sequence is selected on fitness alone. If no test case satisfies the coverage criterion (initial termination), the initial termination criterion is dropped (relaxation). This genetic loop is repeated until the final termination criterion, i.e., 50%, is met. The reduction methodology culminates in a catalogue embedding an individual test sequence for every specific execution cycle (EC), i.e., FR. Further, the CFT scrutinizes FR to obtain the single best test sequence among all the sequences existing in FR, applicable to every execution cycle, i.e., FMT. Every test case in its respective execution cycle covers a number of statements and carries random weightage values for those statements; the CFT utilizes these particulars to assess the cost factor of every test case residing in the sequences of FR (equation (3)). For example, {T5, T2, T6, T4, T1} is the sequence for execution cycle '1' of FR, and the cost of each test case in this sequence is computed accordingly. The final termination criterion and the highest-valued test cases according to the final cost are considered together to procure FMT (Fig. 9).
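As a hedged illustration of the CFT assessment, the sketch below assumes the cost of a test case (equation (3)) is the sum of the weights of the statements it covers in its cycle; the values are illustrative, not the paper's case study.

```python
# Minimal sketch of the CFT cost assessment, under the assumption that the cost of a
# test case aggregates the weights of the statements it covers in its execution cycle.
def cost(cv, weights):
    """`cv[j]` = 1 if the test case covers statement j, `weights[j]` = Wj."""
    return sum(w for c, w in zip(cv, weights) if c)

# e.g., cost of T5 within the sequence {T5, T2, T6, T4, T1} of execution cycle '1'
print(round(cost([1, 1, 0, 1, 0], [0.8, 0.3, 0.6, 0.9, 0.2]), 2))  # 2.0
```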

Figure 9. Outcome of the case study considered for reduction mechanism (FMT)
If the final cost of a test case collides with that of another test case while selecting it for the finally minimized test suite, the occurrence factor is considered for those two test cases. The occurrence of a specific test case is computed by summing the positions of that test case across every execution cycle. If two test cases have the same final cost during selection, the test case with the higher occurrence value is preferred.
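A small sketch of this tie-break follows; the 1-indexed positions and the FR sequences shown are assumptions for illustration.

```python
# Minimal sketch of the occurrence factor used to break a cost tie: sum the positions
# (1-indexed, by assumption) of a test case across the execution-cycle sequences of FR
# and prefer the higher total. The sequences below are hypothetical.
def occurrence(test_case, fr_sequences):
    return sum(seq.index(test_case) + 1 for seq in fr_sequences if test_case in seq)

fr_sequences = [["T5", "T2", "T6", "T4", "T1"],
                ["T2", "T5", "T1", "T6", "T4"],
                ["T5", "T6", "T2", "T4", "T1"]]
tied = ["T2", "T5"]
winner = max(tied, key=lambda t: occurrence(t, fr_sequences))
# occurrence(T2) = 2 + 1 + 3 = 6 > occurrence(T5) = 1 + 2 + 1 = 4, so T2 is preferred
```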
The test cases in FMT yield higher coverage compared to the test cases in the sequences residing in FR. Through the reduction methodology of the proposed approach, the test cases are thus lessened. Still, prioritizing the lessened test cases re-orders them into a sequence that is more effective at detecting more faults at an earlier phase.

Prioritization mechanism for lessened test cases:
The prioritization structure incorporates the test cases minimized by the reduction mechanism together with some indispensable features. These features are the coverage, cost, DU-pair, and requirement coverage of the test cases, plus the historical data exploiting the same features during the prioritization mechanism, i.e., eight features in total. Test case coverage relies on S_covered as elucidated in equation (2). The cost computation of test cases for prioritization builds on C(Ti) as defined in equation (3); the resulting value (equation (4)) is multiplied by 10 to simplify the handling of small decimal values. DU-pair is the abbreviation of Definition-Use pair, a dataflow-based adequacy criterion utilizing either the predicate-use (p-use) or computation-use (c-use) of a variable, such that there is at least one definition-clear path between the definition and the use of the variable. For instance, if variable 'a' has the DU-pair [2,6], then '2' is the statement number defining variable 'a' and '6' is the statement number using variable 'a' (the use could be either p-use or c-use).
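The following sketch illustrates the DU-pair feature (equation (5)); for simplicity it marks a DU-pair as covered when the test case executes both of its statements, whereas a full implementation would also verify the definition-clear path between them.

```python
# Minimal sketch of the DU-pair feature: d_covered / d_total for a test case, where a
# DU-pair [def_stmt, use_stmt] is (approximately) covered when the test case executes
# both statements. DU-pairs and executed statements below are toy values.
def du_pair_coverage(executed_stmts, du_pairs):
    covered = sum(1 for d, u in du_pairs if d in executed_stmts and u in executed_stmts)
    return covered / len(du_pairs)

du_pairs = [(2, 6), (3, 7), (1, 4)]              # e.g., variable 'a' has the DU-pair [2, 6]
print(du_pair_coverage({1, 2, 4, 6}, du_pairs))  # 2/3 -> about 0.67
```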
Another aspect is the requirement data, prepared from the requirement (desideratum) coverage of every test case; these features can be depicted for a certain code under consideration. The FMT test cases occupy five different positions; therefore, every test case in FMT has five candidate positions for re-ordering. Every position possesses a threshold value that is fixed at the time of re-ordering the test cases. These threshold values guide the evaluation of which test case is best suited for which particular position. The test cases could be arranged in 'n' different sequences, i.e., there are 'n' combinations of the test cases in FMT, but this methodology targets an optimal outcome: a highly preferable sequence of test cases with a high possibility of earlier fault detection.

Table 5. Threshold percentiles fixed for the five positions across the eight features considered
Position 1: 65, 25, 60, 25, 60, 26, 51, 45
Position 2: 85, 33, 75, 34, 81, 49, 75, 70
Position 3: 90, 34, 81, 35, 82, 50, 75, 71
Position 4: 91, 35, 82, 35, 83, 51, 80, 72
Position 5: 92, 37, 90, 35, 90, 59, 85, 73

Table 5 delineates the threshold percentiles fixed for the five positions. The threshold values of the various parameters in Table 5 vary every time a test case is evaluated for a specific position; different positions have distinct (or slightly equal) thresholds for every aspect under assessment. This variability in the threshold percentiles indicates that significant information from previously ordered test cases is weighed to find more optimal test cases at subsequent positions; hence, while reckoning a test case for succeeding positions, the percentile varies from the previous ones. The DU-pair and requirement features are estimated through equations (5) and (6). Equation (5) gives the DU-pair matrix particulars, where d_covered is the number of DU-pairs covered by the test case and d_total is the total number of DU-pairs; for the practical demonstration, the d_covered details are taken into account. In equation (6), R(Ti) denotes the requirement calculation for the test case, and rj is the requirement value attained by every statement of the cycle in execution. Algorithm 5 (Fig. 10) reveals the best-suited test case for position '1', while Fig. 11 plots the eight features for position '1' with the considered test sequence {T1, T2, T4, T5, T8}. The red-highlighted plots indicate the test cases that satisfy the threshold criteria set for the first position. Test case T8 satisfied all the aspects and was extracted as the one common to all the test cases meeting the feature thresholds. After being prioritized first, T8 is then used as the previous test case for all other Ti's left in the 'n' sequences, where ['n' sequences - {T8}] serve as input for the second iteration of the prioritization procedure. This study took '5' sequences into consideration; therefore, the data of test case T8 is utilized for every feature while prioritizing the test cases remaining in the sequences at the specific position. The computation of every aspect in Table 5 is updated for the test case being prioritized at the position succeeding T8.
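A minimal sketch of this position-wise, threshold-driven selection is given below; the feature names, threshold values, and feature data are hypothetical stand-ins for Table 5 and Fig. 11.

```python
# Minimal sketch of the position-wise selection (Algorithm 5): a test case is a
# candidate for a position if all of its feature values reach that position's
# thresholds, and the candidate common to every considered sequence is placed there.

def candidates(features, thresholds):
    """`features[t]` and `thresholds` are dicts keyed by the feature names."""
    return {t for t, f in features.items()
            if all(f[name] >= thresholds[name] for name in thresholds)}

def pick_for_position(per_sequence_features, thresholds):
    common = set.intersection(*(candidates(f, thresholds) for f in per_sequence_features))
    return next(iter(common)) if common else None  # relax thresholds if no common test case

thresholds = {"coverage": 65, "cost": 25}          # hypothetical subset of position-1 thresholds
features = {"T8": {"coverage": 90, "cost": 40}, "T1": {"coverage": 60, "cost": 30}}
print(pick_for_position([features], thresholds))   # T8
```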
The feature equations to be utilized at the succeeding positions (equations (7)-(10)) re-compute each feature relative to the already prioritized test cases, where Ti is the test case in processing, Ti-1 is the previously selected test case, Ti n(st) is the number of additional new statements covered by the processing test case with respect to the coverage of the previously selected test case, and Ti n(d) is the number of additional DU-pairs covered by the processing test case with respect to the previously selected test case.
The history features utilize the same equations, i.e., (7)-(10), for evaluating the features integrated in them. All features are assessed over the five sequences respectively, and the test case found to be common across all the sequences acquires the succeeding position with reference to the former test case. Some relaxation of the threshold is allowed if the commonality notion is not satisfied. This procedure for prioritizing the test cases continues until a final optimal sequence of the '5' test cases is procured.
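The sketch below illustrates one plausible reading of equations (7)-(10): each feature is re-computed over only the statements (or DU-pairs) that are additional with respect to the test cases already prioritized. The data are toy values.

```python
# Minimal sketch of the feature update for succeeding positions, assuming each feature
# counts only what is newly covered relative to the previously selected test cases.
def additional_coverage(candidate_stmts, already_covered, total_statements):
    new_stmts = candidate_stmts - already_covered      # Ti n(st): newly covered statements
    return 100.0 * len(new_stmts) / total_statements

already = {1, 2, 5}                                    # toy coverage of the previous test case
print(additional_coverage({2, 3, 4}, already, 10))     # 20.0 -> statements 3 and 4 are new
```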
The graphs for the sequence {T1, T2, T4, T5} are portrayed in Fig. 12, which reveals T5 as the test case to be prioritized at position '2'. Similarly, the three other sequences considered for the study, i.e., {T2, T4, T5, T1}, {T4, T5, T1, T2}, and {T5, T1, T2, T4}, are plotted; the fourth sequence turns out to be identical to the first, and the common test cases from these sequences are extracted. Among the test cases extracted from every sequence, the commonality notion decides the test case for the processing position. After affixing test case T5 at position '2', test cases T8 and T5 are considered as previous test cases when evaluating the aspects of the remaining test cases (i.e., {T1, T2, T4}) in order to prioritize them at the succeeding positions.
Evaluation of the remaining test cases reveals a tie among them for the succeeding positions. Hence, the tied test cases are prioritized further by allocating a priority or preference to each considered feature. According to these priority levels, the individual value of every feature corresponding to the tied test cases is examined; the test case with higher figures for the features is extracted from among the tied test cases and re-ordered at the appropriate position. Table 6 shows that T4 has the highest value for five features and an equal value for two features compared with test cases T1 and T2, consequently prioritizing T4 at the third position. A similar strategy is applied to the two remaining test cases, T1 and T2. The final optimal outcome, i.e., the prioritized sequence of test cases, is shown in Fig. 13.
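A small sketch of this feature-priority tie-break follows; the feature names and values are toy data, not those of Table 6.

```python
# Minimal sketch of the tie-break: among tied test cases, count how many features each
# one holds the highest (or equal-highest) value of, and re-order the test case with
# the most such features first.
def break_tie(tied):
    """`tied[t]` maps each tied test case to its dict of feature values."""
    def wins(t):
        others = [o for o in tied if o != t]
        return sum(all(tied[t][f] >= tied[o][f] for o in others) for f in tied[t])
    return max(tied, key=wins)

tied = {"T1": {"coverage": 60, "cost": 31, "du": 4},
        "T4": {"coverage": 70, "cost": 31, "du": 6}}
print(break_tie(tied))  # T4 (highest or equal on all three toy features)
```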

Figure 13. Final outcome of the proposed methodology
The test cases placed first in Fig. 13 are the most coverage-effective, i.e., they reveal more faults earlier, while the least coverage-effective test cases come last in Fig. 13, supporting the notion that most fault detection is procured at the earlier phase.

Results and Discussion:
Experimental evaluation of the proposed Test Case Minimization methodology:
To validate the significance of the strategy proposed in this study for lessening test cases, the proposed system was compared with previously stated, existing algorithms and processes, i.e., a rule-based methodology and the conventional genetic algorithm. For the experimental analysis, a few benchmark exemplar codes were taken and inspected for parameters such as the number of test cases reduced and the statements covered by the respective algorithm. The altered version of GA (the proposed algorithm) attained up to 80% statement coverage (maximum percentile) with only 50% of the test cases (maximum reduction achieved) during the analysis (Table 7). The test case sequence attained after the reduction procedure (i.e., FMT) (Fig. 9) was compared with the test sequences for the five execution cycles (i.e., with FR). According to the case study considered in this research, the test sequences embedded in FR and their graphical comparison are presented in Fig. 14. Three out of five cases show promising results for the proposed reduction methodology: the comparison between FMT and the test sequences of FR comes out either higher or equal for FMT (in terms of cost) in three cases (Fig. 14).

Performance analysis of the proposed Test Case Prioritization scheme:
The prioritized test cases (Fig. 13) were collated with a randomly generated sequence of these '5' test cases, i.e., {T2, T1, T5, T4, T8}. This was done to scrutinize the accuracy of the prioritized test cases at their specific positions, following the thresholds set in Table 5. The prioritized test sequence proved to be highly accurate when collated with the random sequence. The graph delineates the performance analysis of the two sequences.

Experimental evaluation of the proposed Test Case Prioritization scheme:
Effort calculated during this phase of the analysis showed that the test sequence achieved as the outcome (Fig. 13) required only 60% of the exertion, whereas the test sequence generated by randomly shuffling the tied test cases with the remaining test cases required 100% effort (Fig. 15). To understand this behaviour and verify the system's robustness, the analysis was repeated on ten benchmark exemplar codes (Table 8; for example, the row for finding the area of a parallelogram lists values of 60%, 100%, and 75%). The presented results and analysis suggest the significance of the reduction and prioritization methodologies proposed in this study. Earlier studies demonstrated relevant findings regarding the test case optimization problem but lacked an appropriate strategic solution for handling the tied test case problem together with optimizing the test cases. Although the current study proposes an optimal solution for the test case reduction and re-ordering problem, there are certain threats to validity. Earlier detection of fault-revealing test cases reduces the cost and effort required for the testing process. For this, the current study suggests an algorithmic framework; however, with increased complexity, supervised machine learning models such as the Support Vector Machine (SVM) could be employed for full automation. Such models and kindred algorithms provide a strong mathematical base (a separating hyperplane) to classify faulty and non-faulty test cases. Further, unsupervised machine learning tasks such as clustering would aid in forming clusters of those test cases that are going to collide on the same position while prioritizing, and hence provide a futuristic view of test case ties.
An increase in the lines of code directly affects the length of the generated population, which grows drastically. Therefore, more intelligent optimization methods such as Particle Swarm Optimization (PSO) or Grey Wolf Optimization could be exercised to reduce the population or to handle independent paths. Moreover, this study was evaluated with a limited number of programs and could be examined with much more complicated codes to get better insights.

Conclusion:
This paper exhibits a novel strategic approach towards regression test case optimization with certain predetermined objectives. The study proposes an algorithmic rule (the basic structure) for processing data obtained from the test case execution history at the earliest stage of the technique. Refinement of this data gives a visual illustration of the relationships existing among different test cases. The test cases then proceed to the reduction stage, where genetic operators are merged with factors that uncover fault and dependency ratios; further, upon obtaining the reduced repository, the filtration stage aids in procuring the most favourable test cases.
Additionally, to effectuate this study's objective, the prioritization stage is initiated, which discloses the optimal order for the filtered test cases. The analysis records clarified the effectiveness of the proposed system as compared to a randomly generated test sequence.
For future work, the multifactor algorithm propounded in this research can be combined with artificial intelligence concepts for robustness and can be evaluated on many more programs.