A Heuristic Approach to the Consecutive Ones Submatrix Problem

Abstract:
Given a (0, 1)-matrix, the Consecutive Ones Submatrix (C1S) problem seeks a permutation of the columns that maximizes the number of columns forming a submatrix in which the ones of every row appear as a single consecutive block. This paper proposes a heuristic approach to the problem. The related Consecutive Blocks Minimization (CBM) problem is also considered. A new procedure is proposed to improve the column insertion approach. Real-world and random matrices from the set covering problem are then evaluated, and the computational results are reported.


Introduction:
The C1S problem on a (0, 1)-matrix is a generalization of the Consecutive Ones Property (C1P). The latter was proposed many decades ago. Fulkerson and Gross 1 stated it as follows: given an incidence matrix, is it possible to rearrange the columns so that all the 1's in each row are together?
In combinatorial optimization, the C1P is important because a problem whose matrix has this property is simpler to solve than the original model. Indeed, such a matrix is totally unimodular. The property appears in many applications, including computational biology, railway optimization, file organization, and scheduling. The C1P is also used for ancestral genome reconstruction 2, 3.
In graph theory, it helps to detect interval and circle graphs 4, 5, 6.
The C1P has been extensively investigated. Kendall 7, 8 indicated that the property was first studied by the archaeologist Flinders Petrie in the 19th century. Some heuristic approaches were developed for the problem before Fulkerson and Gross 1 proposed the first polynomial-time solution. Tucker 9 gave a substructure characterization of the problem: he used a graph-theoretic method to characterize matrices with the C1P by means of forbidden submatrices. In 1976, Booth and Lueker 10 provided a linear-time algorithm that finds a permutation transforming a (0, 1)-matrix into one with the C1P. Their linear-time sequential algorithm is based on the PQ-tree data structure. A binary matrix has the property if and only if its PQ-tree exists.
Let $(a, b)$-matrices be the (0, 1)-matrices that have at most $a$ 1's per column and at most $b$ 1's per row. For the C1S problem, Hajiaghayi and Ganjali 11 solved the case of (2, 2)-matrices in polynomial time and showed that the problem is NP-hard for (2, 4)-matrices. These results raise the question of whether the C1S problem is NP-complete for (2, 3)- and (3, 2)-matrices. Tan and Zhang 12 answered the question by showing that both decision versions are NP-complete. They proved that the problem is 0.8-approximable for (2, 3)-matrices that have no two identical columns. In addition, they showed that it is 0.5-approximable for (3, 2)- and (2, ∞)-matrices. Finally, they showed that the problem for (∞, 2)-matrices is NP-hard to approximate within a factor of $n^{\varepsilon}$ for some $\varepsilon > 0$.
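As a small illustration of this classification, the sketch below computes the sparsity type of a binary matrix stored as a list of rows. The function name `sparsity_type` and the example matrix are hypothetical and are not taken from the cited works.

```python
def sparsity_type(matrix):
    """Return (max 1's in any column, max 1's in any row) of a 0/1 matrix."""
    max_per_row = max(sum(row) for row in matrix)
    # zip(*matrix) iterates over the columns of a list-of-rows matrix.
    max_per_col = max(sum(col) for col in zip(*matrix))
    return max_per_col, max_per_row


example = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]
print(sparsity_type(example))  # (2, 2)
```

A matrix belongs to a given sparsity class when both returned values are no larger than the corresponding bounds of that class.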
Preliminaries
Definition 1. Given a (0, 1)-matrix $A$, a maximal set of consecutive 1 entries (0 entries) in a row of $A$ is called a block of ones (block of zeros).
Definition 2. A (0, 1)-matrix $A$ is said to have the consecutive ones property (C1P) if its columns can be permuted so that the 1's of every row form a single consecutive block 13. The C1P for columns is defined in the same way on the transpose of the matrix.
Definition 3. Let $A$ be a (0, 1)-matrix. Finding a largest set of columns of $A$ that forms a submatrix with the C1P is called the C1S problem 12.
Definition 4. Given a (0, 1)-matrix $A$, a permutation of the columns of $A$ that leaves the 1 entries consecutive in every row is called a valid permutation. If $A$ is rearranged by such a permutation, the rearranged matrix exhibits the consecutive ones property.
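The definitions above can be made concrete with a minimal sketch. It assumes the matrix is stored as a list of rows and a permutation as a list of column indices; the helper names and the example matrix are illustrative only.

```python
def count_blocks(row):
    """Number of maximal runs of consecutive 1's in a 0/1 sequence."""
    blocks, previous = 0, 0
    for value in row:
        if value == 1 and previous == 0:
            blocks += 1  # a new block starts at every 0 -> 1 transition
        previous = value
    return blocks


def apply_permutation(matrix, order):
    """Reorder the columns of the matrix according to the list `order`."""
    return [[row[j] for j in order] for row in matrix]


def is_valid_permutation(matrix, order):
    """True if every row of the permuted matrix has at most one block of ones."""
    permuted = apply_permutation(matrix, order)
    return all(count_blocks(row) <= 1 for row in permuted)


A = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
]
print(is_valid_permutation(A, [0, 1, 2, 3]))  # False
print(is_valid_permutation(A, [0, 2, 1, 3]))  # True
```

In this example the original column order is not valid, but exchanging the two middle columns gives a valid permutation, so the matrix has the C1P.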
The C1P is desirable because it often leads to efficient algorithms. Modifying or transforming a (0, 1)-matrix into a matrix that satisfies the C1P has received considerable attention recently. These transformations can be stated as the following problems 14.
1. Max-C1S-C (Consecutive Ones Submatrix by Column): find a maximum number of columns of a (0, 1)-matrix that induce a submatrix with the C1P.
2. Max-C1S-R (Consecutive Ones Submatrix by Row): find a maximum number of rows of a (0, 1)-matrix that induce a submatrix with the C1P.
3. Min-C1S-C (Consecutive Ones Submatrix by Deleting Columns): delete a minimum number of columns so that the resulting matrix has the C1P.
4. Min-C1S-R (Consecutive Ones Submatrix by Deleting Rows): delete a minimum number of rows so that the resulting matrix has the C1P.
5. Min-C1-1E (Consecutive Ones by Flipping 1-Entries): find a minimum set of 1's whose change to 0 results in a matrix with the C1P.
The third and fourth problems are equivalent to the first two. In general, all of these problems are NP-hard, even for quite sparse matrices. Traditional methods rely on finding the Tucker forbidden submatrices 9, 14. This paper introduces an evolutionary method to solve the C1S problem.
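For very small matrices the Max-C1S-C problem can be solved by exhaustive search, which gives a reference point for heuristics. The sketch below is only such a brute-force baseline with exponential running time; it is not the method of this paper, and all names in it are hypothetical.

```python
from itertools import combinations, permutations


def has_single_block(values):
    """True if the 0/1 sequence contains at most one block of ones."""
    blocks, previous = 0, 0
    for v in values:
        if v == 1 and previous == 0:
            blocks += 1
        previous = v
    return blocks <= 1


def find_valid_order(matrix, columns):
    """Return an ordering of `columns` with the C1P in every row, or None."""
    for order in permutations(columns):
        if all(has_single_block([row[j] for j in order]) for row in matrix):
            return list(order)
    return None


def max_c1s_columns(matrix):
    """Largest column set inducing a C1P submatrix, tried from largest size down."""
    n = len(matrix[0])
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            order = find_valid_order(matrix, subset)
            if order is not None:
                return order
    return []


# Each pair of columns shares a row, so no ordering of all three columns works.
M = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
]
print(len(max_c1s_columns(M)))  # 2
```

Because the search enumerates all column subsets and all their orderings, it is only usable for a handful of columns, which is the motivation for heuristic approaches.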
A related problem is the so-called Consecutive Block Minimization (CBM). The goal is to reduce the number of blocks of ones per row by reordering the columns of the matrix 15. Haddadi et al. 13 proposed a polynomial-time heuristic for the problem. Leonardo CR et al. 16 proposed the most recent heuristic work on CBM. They introduced a heuristic that relies on a traditional algorithm from graph theory, designed a graph representation of the CBM problem, and used the reduction of CBM to the traveling salesman problem. Abo Alsabeh 17 suggested a metaheuristic approach for the C1S and CBM problems.
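The link between CBM and the traveling salesman problem can be illustrated by a simple identity: if an all-zero dummy column is placed at both ends of a column order, every block of ones contributes exactly one 0-to-1 and one 1-to-0 transition, so the sum of Hamming distances between adjacent columns equals twice the number of blocks. The sketch below checks this on a small example. It illustrates the general idea only and is not necessarily the construction used in 16; the function names are hypothetical.

```python
from itertools import permutations


def total_blocks(matrix, order):
    """Total number of blocks of ones over all rows for a column order."""
    total = 0
    for row in matrix:
        previous = 0
        for j in order:
            if row[j] == 1 and previous == 0:
                total += 1
            previous = row[j]
    return total


def column(matrix, j):
    """Column j as a list; None denotes a hypothetical all-zero dummy column."""
    if j is None:
        return [0] * len(matrix)
    return [row[j] for row in matrix]


def hamming(u, v):
    """Number of positions in which two columns differ."""
    return sum(1 for a, b in zip(u, v) if a != b)


def tour_length(matrix, order):
    """Sum of Hamming distances along dummy -> order -> dummy."""
    tour = [None] + list(order) + [None]
    return sum(
        hamming(column(matrix, tour[t]), column(matrix, tour[t + 1]))
        for t in range(len(tour) - 1)
    )


A = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
]
for order in permutations(range(4)):
    assert tour_length(A, order) == 2 * total_blocks(A, order)
best = min(permutations(range(4)), key=lambda o: total_blocks(A, o))
print(best, total_blocks(A, best))
```

Under this view, minimizing the number of blocks amounts to finding a shortest tour through the columns and the dummy column with Hamming distances as edge weights.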
Another related problem is the Simultaneous Consecutive Ones Property (SC1P), where a (0, 1)-matrix has the C1P for rows and columns simultaneously. Subashini et al. 18, 19 studied the classical complexity and fixed-parameter tractability of Simultaneous Consecutive Ones Submatrix (SC1S) and Simultaneous Consecutive Ones Editing (SC1E). They proved that the decision versions of the two problems are NP-complete.
Heuristic methods have been applied to a wide range of problems, such as those in 20, 21. The paper is arranged as follows. Section 2 reviews some previous solution methods. Section 3 describes the proposed column insertion algorithm. Section 4 reports the computational experience. Section 5 concludes.

Previous Solution Procedures
Polynomial Time Local Improvement Heuristic for CBM 13
The decision version of CBM is NP-complete even when restricted to (0, 1)-matrices with two ones per row. Nevertheless, Haddadi provided a polynomial-time algorithm that finds a permutation whose number of consecutive blocks exceeds the optimum by at most 50%. Haddadi et al. 13 provided a polynomial-time local search algorithm for the problem. For a binary $m \times n$-matrix $A$, they proposed two $O(n^2)$-sized local search neighbourhoods in which the number of blocks of a neighbour is found in $O(m)$ time.
• Improvement by two-column interchange: Let $A$ be a matrix associated with a permutation $\pi$, together with its total number of blocks. Let the $O(n^2)$-sized neighbourhood $N(\pi)$ be the set of all permutations produced from $\pi$ by swapping two columns. $N(\pi)$ is explored for a permutation that gives fewer blocks; if no such permutation is found, the algorithm stops. If an improvement is reached by interchanging two columns $\pi(i)$ and $\pi(j)$, the required updates are carried out and the algorithm is repeated for the new permutation.
• Improvement by column shifting: Let $N'(\pi)$ be the set of permutations that result from $\pi$ by inserting one column. In this method, a column is removed from its location and placed between two other columns. $N'(\pi)$ is searched for a permutation that produces fewer blocks, and the search stops when no such permutation exists. Suppose that an improvement is reached by inserting a column $\pi(i)$, and let the new permutation be $\pi'$. Once all the required updates are carried out, the algorithm is repeated for $N'(\pi')$.
The two procedures are given as Algorithms 1 and 2.
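The two neighbourhoods can be sketched in a few lines of Python. This is a simplified illustration and not the authors' Algorithms 1 and 2: it accepts the first improving neighbour and recomputes the block count from scratch for every candidate instead of using the faster incremental evaluation of 13. All function names are hypothetical.

```python
def total_blocks(matrix, order):
    """Total number of blocks of ones over all rows for a column order."""
    total = 0
    for row in matrix:
        previous = 0
        for j in order:
            if row[j] == 1 and previous == 0:
                total += 1
            previous = row[j]
    return total


def swap_neighbours(order):
    """All orders obtained by interchanging two columns."""
    n = len(order)
    for i in range(n - 1):
        for j in range(i + 1, n):
            neighbour = list(order)
            neighbour[i], neighbour[j] = neighbour[j], neighbour[i]
            yield neighbour


def shift_neighbours(order):
    """All orders obtained by removing one column and reinserting it elsewhere."""
    n = len(order)
    for i in range(n):
        rest = order[:i] + order[i + 1:]
        for position in range(n):
            if position != i:  # position == i would recreate the current order
                yield rest[:position] + [order[i]] + rest[position:]


def local_search(matrix, order, neighbours):
    """First-improvement local search; stops when no neighbour has fewer blocks."""
    order = list(order)
    best = total_blocks(matrix, order)
    improved = True
    while improved:
        improved = False
        for candidate in neighbours(order):
            cost = total_blocks(matrix, candidate)
            if cost < best:
                order, best, improved = candidate, cost, True
                break
    return order, best


A = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
]
order_swap, best_swap = local_search(A, [0, 1, 2, 3], swap_neighbours)
order_shift, best_shift = local_search(A, [0, 1, 2, 3], shift_neighbours)
print(order_swap, best_swap)
print(order_shift, best_shift)
```

Each accepted move strictly decreases the number of blocks, which is bounded below, so both searches terminate.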

New Solution Approaches
An Alternative Column Insertion Procedure
A permutation of the columns of a matrix that increases the size of the submatrix with the C1P is sought here. To this end, a local improvement heuristic is suggested, which runs in polynomial time. Suppose an $m \times n$-matrix $A$ is split into two submatrices: an $m \times k$-submatrix $A_1$ (called the remainder submatrix) and an $m \times (n - k)$-submatrix $A_2$ which has the C1P. The columns of $A_1$ are inserted one by one between the columns of $A_2$. Consider the matrix $A$ associated with a permutation $\pi$, together with its current number of C1S columns. Let the neighbourhood $N''(\pi)$ be the set of all permutations produced from $\pi$ by moving one column, where the column is removed from its location in $A_1$ and placed between two columns of $A_2$. $N''(\pi)$ is explored for a permutation that increases the number of columns with the C1S property. The procedure ends when no such permutation exists. The process starts by choosing the leftmost column of $A_1$, say $\pi(i)$ for $1 \le i \le k$. This column is inserted between two adjacent columns of $A_2$, say $\pi(j)$ and $\pi(j + 1)$ for $j = 1, \dots, n - k - 1$, and checked. If inserting $\pi(i)$ increases the number of C1S columns in $A_2$, then $A_2$ is updated and the next column $\pi(i + 1)$ is chosen for insertion in the new matrix. If $\pi(i)$ fails to provide an improvement, it is returned to the right-hand side of $A_1$ and $\pi(i + 1)$ is chosen for insertion.
If the number of columns of $A_2$ is greater than 2, another destructive case appears: when the entries of a row in columns $\pi(j)$, $\pi(i)$, $\pi(j + 1)$ are 0 1 0 and the same row contains another block of ones (giving, say, 1 0 1 0), the row loses the C1P. Table 1 shows the check for the different cases. The destructive cases are labelled false, and the cases that do not destroy the C1P of $A_2$ are labelled true.
If an improvement is achieved by inserting a column $\pi(i)$ in $A_2$, the permutation is updated accordingly, the matrix $A$ is updated to $A'$, and the process is repeated with $A'$. The number of blocks that results from inserting an arbitrary column between two other columns can be evaluated in $O(m)$ operations, see 13. Searching $N''(\pi)$ costs $O(mq)$ time, where $q$ is the number of candidate insertions, bounded as follows. There is one possibility for inserting $\pi(i)$ between $\pi(j)$ and $\pi(j + 1)$. If it fits, then $\pi(i + 1)$ has two possibilities, and so on. Hence the number of candidate insertions satisfies $q \le (n - 2)(n - 1)/2$. The count starts from the $n - k$ columns of $A_2$. For every neighbour, checking the cases for all the rows costs $O(m)$. Since the number of C1S columns cannot exceed $n$ and is no less than $n - k$, the insertion procedure is finite and the number of improvements is at most $n - (n - k) = k$ in the worst case. ∎
Algorithm 3: Column insertion procedure
Input: positive integers $m$, $n$; the total number of blocks; a binary $m \times k$-submatrix $A_1$; an $m \times (n - k)$-submatrix $A_2$ with the C1P; and a permutation $\pi$.
for $i = 1, \dots, k$
    for $j = 1, \dots, (n - k) - 1$
        Tentatively insert column $\pi(i)$ between $\pi(j)$ and $\pi(j + 1)$;
        Check $\pi(j)$ $\pi(i)$ $\pi(j + 1)$ in every row;
        if the result of the check is true for all rows
            Update the total number of blocks;
            Move $\pi(i)$ between $\pi(j)$ and $\pi(j + 1)$ and update $A_2$;
        end if
    end for
end for
Return the permuted matrix.
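A minimal Python sketch of the column insertion idea follows. It makes several simplifying assumptions that are not part of Algorithm 3: columns are referred to by index, a candidate position is validated by rechecking every row in full rather than through the case table, and a column is placed at the first gap that preserves the C1P. The names `column_insertion`, `accepted`, and `rejected` are hypothetical.

```python
def single_block(values):
    """True if the 0/1 sequence contains at most one block of ones."""
    blocks, previous = 0, 0
    for v in values:
        if v == 1 and previous == 0:
            blocks += 1
        previous = v
    return blocks <= 1


def keeps_c1p(matrix, order):
    """True if every row restricted to `order` has its ones consecutive."""
    return all(single_block([row[j] for j in order]) for row in matrix)


def column_insertion(matrix, remainder, c1p_order):
    """Insert remainder columns between adjacent columns of the C1P order.

    Returns the enlarged C1P column order and the columns that could not
    be inserted without destroying the property.
    """
    accepted = list(c1p_order)
    rejected = []
    for column in remainder:
        # Only interior gaps are tried, i.e. positions between two columns.
        for gap in range(1, len(accepted)):
            candidate = accepted[:gap] + [column] + accepted[gap:]
            if keeps_c1p(matrix, candidate):
                accepted = candidate
                break
        else:
            rejected.append(column)  # no gap worked for this column
    return accepted, rejected


A = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
]
# Columns 0 and 3 form the initial C1P submatrix; columns 1 and 2 remain.
print(column_insertion(A, remainder=[1, 2], c1p_order=[0, 3]))  # ([0, 2, 1, 3], [])
```

The outcome depends on the initial split: starting from columns 0 and 2 as the C1P submatrix, neither remaining column can be inserted between them, which shows why the result depends on the structure of the matrix and not only on the size of the initial submatrix.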

Improving the number of blocks
The column insertion procedure that searches the neighbourhood $N''(\pi)$ for a permutation maximizing the C1S submatrix also reduces the number of blocks. By arguments analogous to those of Lemma 1, the complexity of inserting the columns to find the minimum number of blocks is $O(z(n - k))$, where $z$ is the number of 1's in $A$, $n$ is the number of columns of $A$, and $k$ is the number of columns of the remainder submatrix. Also, since the number of blocks cannot be greater than $z$ and no less than the number of rows $m$, the insertion method is finite and the number of improvements is at most $z - m$ in the worst case. This algorithm can be applied directly to the matrix by extracting the C1S submatrix and then inserting the remainder columns.
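The effect of a single insertion on the number of blocks can be evaluated with one pass over the rows. A short case analysis shows that placing a column between two adjacent columns adds one block in a row exactly when the three entries read 0 1 0 or 1 0 1, and changes nothing otherwise. The sketch below implements this check and compares it with a full recount; the function names are hypothetical.

```python
from itertools import permutations


def total_blocks(matrix, order):
    """Total number of blocks of ones over all rows for a column order."""
    total = 0
    for row in matrix:
        previous = 0
        for j in order:
            if row[j] == 1 and previous == 0:
                total += 1
            previous = row[j]
    return total


def insertion_delta(matrix, left, new, right):
    """Change in the block count when `new` is placed between adjacent
    columns `left` and `right`; one pass over the rows."""
    delta = 0
    for row in matrix:
        triple = (row[left], row[new], row[right])
        if triple in ((0, 1, 0), (1, 0, 1)):
            delta += 1  # an isolated 1, or a 0 splitting an existing block
    return delta


A = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
]
# The incremental value agrees with a full recount for every choice of columns.
for left, new, right in permutations(range(4), 3):
    recount = total_blocks(A, [left, new, right]) - total_blocks(A, [left, right])
    assert insertion_delta(A, left, new, right) == recount
print(insertion_delta(A, 0, 1, 3))  # 1
```

Only the two neighbouring columns matter, so the same formula applies when the insertion takes place inside a longer column order.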

Computational Experience
Implementing Column Insertion Algorithm
The matrix is separated into two submatrices, one of which has the C1S (with at least 2 columns), and the column insertion algorithm is then applied. In our experiments, the column insertion algorithm gives better results on matrices with larger C1S submatrices. The results are promising: fewer blocks and more columns with the C1P are obtained in a reasonable time.
Overall, the heuristic is applied to a variety of matrices, taken from real-world data or generated at random. The generated square nonsymmetric matrices of the Set Covering Problem (SCP) are not tested for the C1P, so their optima are unknown. Therefore, the quality of the outcomes cannot be assessed against them. Ten random matrices are generated for each size, the algorithm is run on each, and the results are averaged per size. The remaining data, the real-world instances of Table 2, come from the stop location problem and were supplied by a German railway company 22. This problem is formulated as an SCP. The instances provide binary matrices which are supposed to almost have the C1P. The small matrices B and C generated by Ruf and Schöbel 22 are sparse, with densities of 3% and 5% respectively, and almost have the consecutive ones property; the results are shown in Table 4. The heuristic is tested as follows: the C1S submatrix of a given matrix is extracted directly, and then Algorithm 3 is applied. For the five real-world matrices, only two columns are separated as a C1S submatrix before the algorithm is applied. The results are in columns 5 to 8 of Table 3 and columns 4 to 7 of Tables 4 and 5.
Note that Tables 3, 4, and 5 show the average number of blocks and columns. The results of Table 3 show that Algorithm 3 gives a larger number of columns with the C1P and fewer blocks than the original matrix. The results of Table 4 show that roughly 30% to 50% of the columns of the B and C matrices have the C1P. Table 5 contains the outcomes on the real-world matrices. For Matrix 1, the final number of blocks equals m and the number of columns equals n, so the matrix has the C1P and the result is optimal. For the remaining real-world matrices, the final numbers of blocks and columns are close to the lower bound m and the upper bound n, respectively. The column insertion algorithm is fast with respect to computation time. Compared with the CBM results of 13, the numbers of blocks obtained do not differ much from theirs. The last column shows their results. Also, the numbers of columns with the C1P are not far from the numbers of columns of the matrices.
The three tables illustrate that:
1. The outcomes of Algorithm 3 do not depend entirely on the size of the C1S submatrix; they also depend on the structure of the matrices.
2. Computation time depends on the size and the density of the matrix. The algorithm performs well on sparser matrices.
3. Reducing the number of blocks does not necessarily produce a sizeable C1S submatrix. Enlarging the C1S submatrix is observed to improve the CBM objective. However, the converse may not hold, since the submatrix size depends on the location of the destructive column.

Conclusion:
A heuristic method for solving the C1S problem is presented. A column insertion procedure was suggested to address the problem. The CBM problem is addressed by Algorithm 3, and the C1S submatrix is enlarged by a polynomial-time local algorithm with complexity $O(mq)$, where $q \le (n - 2)(n - 1)/2$ is the number of candidate insertions for an $m \times n$ matrix. The same algorithm is used for the minimum consecutive blocks problem, where the complexity of finding only the CBM is $O(z(n - k))$, with $z$ the number of 1's in the matrix and $k$ the number of remainder columns. The algorithm is applied to a set of real-world matrices and to randomly generated matrices from set covering. The outcomes show that large submatrices having the C1P can be detected. However, as the optima are unknown for these matrices, it is impossible to say how far the solutions obtained by our procedure are from them.