Wireless Propagation Multipaths Using Spectral Clustering and Three-Constraint Affinity Matrix Spectral Clustering

Abstract: This study focused on spectral clustering (SC) and three-constraint affinity matrix spectral clustering (3CAM-SC) to determine the number of clusters and the membership of the clusters of the COST 2100 channel model (C2CM) multipath dataset simultaneously. Various multipath clustering approaches solve only the number of clusters without taking into consideration the membership of clusters. The problem with giving only the number of clusters is that there is no assurance that the membership of the multipath clusters is accurate even when the number of clusters is correct. SC and 3CAM-SC aimed to solve this problem by determining the membership of the clusters. The cluster and the cluster count were then computed through the cluster-wise Jaccard index of the membership of the multipaths to their clusters. The multipaths generated by C2CM were transformed using the directional cosine transform (DCT) and the whitening transform (WT). The transformed dataset was clustered using SC and 3CAM-SC. The clustering performance was validated using the Jaccard index by comparing the reference multipath dataset with the calculated multipath clusters. The results show that the effectiveness of SC is similar to that of the state-of-the-art clustering approaches. However, 3CAM-SC outperforms SC in all channel scenarios. SC can be used in indoor scenarios based on accuracy, while 3CAM-SC is applicable in indoor and semi-urban scenarios. Thus, the clustering approaches can be applied as alternative clustering techniques in the field of channel modeling.


Introduction:
Clustering is a process that analyses data by grouping items with similar structures. Clustering aims to categorize the data into several clusters such that points in the same group are similar while points in different groups are dissimilar. Datasets for feature selection, intrusion detection, white blood cell classification and wireless sensor networks have been clustered over the years (1)(2)(3)(4).
Clustering of wireless propagation multipaths gained interest due to the widespread application of multiple-input multiple-output (MIMO) antennas in wireless communications systems (5,6). MIMO systems are developed to increase data rates and ensure wireless transmission reliability (7). Cluster-based channel models are used to develop the MIMO propagation channel (8).
Inaccurate clustering of the multipaths leads to incorrect channel models. This results in the degradation of the performance of the MIMO system. An accurate clustering approach is needed to cluster the multipaths correctly. Over the years, various clustering approaches have been used to cluster wireless multipaths. The K-power means (KPM), which is based on K-means, incorporates the multipath power and minimizes the distance between the cluster centroids (9). The Kurtosis measure (KuM) overcomes the sensitivity of KPM to the input number of clusters by detecting the time-of-arrival of the multipaths and partitioning them into clusters (10). Kernel power density (KPD) uses the kernel density and the power of multipath components (MPCs) to identify the local density variations of MPCs (11). The Gaussian mixture model (GMM) combines the covariance structure and the mean information of the channel multipaths to reveal their similarity (12). However, these clustering approaches give only the number of clusters without considering the membership of the clusters. The problem with this clustering process is that the number of clusters may be correct while the membership of the clusters is not.
Simultaneous identification of the number of clusters and the membership of clusters addresses the problem of giving just the number of clusters. Blanza and Materum (2019) used simultaneous clustering and the model selection matrix affinity (SCAMSMA), which represents the data as the product of the data and an affinity matrix, to solve the number of clusters and the membership of clusters of wireless multipaths simultaneously (13,14). Blanza, Materum, and Hirano (2020) applied deep divergence-based clustering (DDC) to solve the membership of the clusters, with the cluster count calculated according to the membership of the multipaths to their clusters (15). However, the results of SCAMSMA and DDC show that there is a need to increase clustering accuracy by using other clustering approaches. Hence, SC and 3CAM-SC are used to improve the performance of clustering the multipaths.
This study aimed to solve the number of multipath clusters and the membership of multipath clusters simultaneously. A further objective of the study was to improve low clustering accuracy. The main contributions of this work are as follows:
• Applying SC and 3CAM-SC to cluster multipaths for the first time.
• Adopting the three-constraint affinity matrix (3CAM) to improve the accuracy of SC.
• Conducting a performance evaluation to show that SC and 3CAM-SC are comparable with state-of-the-art clustering approaches.
The paper is organized in the following way. Section Two presents the related work. Section Three discusses the methodology. Section Four shows the results of the clustering approaches. Section Five elaborates the findings. Section Six concludes the work.

Background and Related Studies:
The evolution of wireless communications has seen the development of novel technologies that improved the efficiency, reliability and popularity of wireless communications systems. MIMO antenna arrays, high-capacity base stations, fast mobile stations, multiple-core processors and large storage memories have enhanced the speed, bandwidth, and capacity of the Internet, cellular communications, WiFi, Bluetooth and the like. These new technologies comprise the physical layer of the wireless communications systems, as shown in Figure 1. They are expensive, intricate and tedious to create, so proper planning and design are essential before their implementation. Channel modeling eliminates the need to build the infrastructure of wireless communications systems right away.

Wireless Propagation Channel Models
The simulation of the propagation channel allows the performance evaluation of the communications system before the construction of the physical layer (17). The design of the communications system can be adjusted to optimize the potential of the channel to increase the speed of the system, utilize the bandwidth appropriately, and maximize the capacity. Simulation minimizes cost by identifying the appropriate equipment and avoids implementation delays by allowing the communications system to be designed adequately beforehand.
The channel impulse response (CIR) is essential in modeling a communications system as it characterizes the channel being designed. Among the popular channel models are Saleh-Valenzuela (SV), 3rd Generation Partnership Project (3GPP), Institute of Electrical and Electronics Engineers (IEEE) 802.15.4a, Wireless World Initiative New Radio (WINNER) II, and Cooperation in Science and Technology (COST) 2100 (18)(19)(20)(21)(8). The communications signals propagate in multiple directions as they move from the transmitter to the receiver. The MPCs are grouped in clusters, as can be seen in Figure 2. The MPCs in different environments must be characterized to determine the accuracy of the channel model. Multipath clusters having similar parameters of the MPCs such as delay, azimuth, and elevation of arrival and departure are considered to describe the propagation channel accurately using a clustering technique (22). Traditionally, clusters are identified through human visual inspection (20)(21)(22)(23)(24)(25). It works well when the dataset is small. However, this approach is subjective and tedious for large datasets (26,9). For this reason, automatic clustering approaches became popular, as they remove the bias of visually clustering the multipaths and address the difficulty of accurately clustering large sets of data.

Multipath Clustering Approaches
To overcome these concerns, different clustering techniques that automatically determine the clusters have been introduced over the years. Among them is KPM which uses K-means in clustering the multipaths (9). The powers of the multipaths are included, and the distance between cluster centroids is minimized to determine the number of clusters.

Figure 2. Multipath Clusters in Mobile Wireless Communications (27)
KPM, though, needs the initial number of clusters a priori. KuM overcomes the sensitivity of KPM to the input settings by detecting the time-of-arrival of the multipaths and partitioning them into clusters (10). KuM is independent of the channel and is still applicable even without prior knowledge of the impact of the environment on the CIR.
Ant colony clustering (ACC) combines the decaying amplitude and the time of arrival of MPCs (28). Clusters are identified based on the population and the positive-feedback collaboration of the evolution of the ant-agents. Automatic cluster identification (ACId) improved the mean cluster distance of K-means by iteratively assigning MPCs to a cluster as long as the cluster distance is within a threshold (29). The cluster centroid positions are dynamically updated, and MPCs that might be closer to existing clusters are reassigned.
The sparsity-based method (SBM) is built on the SV model feature that the power of the MPCs decreases exponentially with increasing delay (30). SBM does not need prior knowledge of clusters, such as the number and initial locations of the clusters, because it incorporates the expected behavior of clusters into the clustering framework.
The K-power means of scattering points (SBKPM) is based on the geometry of the scattering points of the measurement-based ray tracer (30). The scattering points are clustered using KPM. The cluster-centroid scattering points of successive snapshots are compared to track the clusters.
KPD utilizes the kernel density and power of multipaths to identify the local density variations of MPCs. A heuristic approach of cluster merging is used to improve the performance of the clustering approach (31). GMM relates the covariance structure with the mean information of the multipaths to reveal their similarity (12). A compact index is used to validate the close relationship between the GMM clustering mechanism and the multipath propagation characteristics.
The above-mentioned clustering approaches give only the number of clusters of MPCs. They do not consider the accuracy of the cluster membership. Thus, the number of clusters may be correct, but it does not necessarily mean that the correct members are in the clusters. This problem can be solved by simultaneously determining the number of clusters and the membership of clusters.
Blanza and Materum (2019) applied simultaneous clustering and the model selection matrix affinity (SCAMSMA) in clustering the multipaths (6,14). SCAMSMA represents the data as the product of the data and an affinity matrix to solve the number and membership of multipath clusters simultaneously. Blanza, Materum, and Hirano (2020) applied deep divergence-based clustering to solve the membership of the multipath clusters (5,15). The cluster count was then calculated according to the membership of the multipaths to their clusters.
The clustering approaches solved the problem of simultaneous identification of the number of clusters and the membership of clusters. However, the results of SCAMSMA and DDC show that there is a need to improve clustering performance by using other clustering approaches. Hence, SC and 3CAM-SC are used to improve the accuracy of clustering the multipaths.

Methodology:
The overall methodology of the study is outlined in Figure 3. There are two frequency bands: band 1 (B1) and band 2 (B2). The environment can be indoor or semi-urban. The transmission can be line-of-sight (LOS) or non-line-of-sight (NLOS). The communication link can be single or multiple. The dataset is clustered using the SC and the 3CAM-SC clustering approaches (6).

Pre-Processing of COST 2100 Channel Multipaths
The original form of the multipaths from the COST 2100 channel is not suitable for clustering. The multipaths need to be transformed first before they can be clustered. The pre-processing is detailed below.

Creation of COST 2100 Channel Multipaths
The European Cooperation in Science and Technology (COST) 2100 channel model (C2CM) can replicate the stochastic properties of MIMO wireless propagation channels (8). Multipath clusters characterize C2CM. Groups of MPCs with similar delay and angles comprise a multipath cluster. An MPC is classified based on the delay, the angle of departure (azimuth of departure (AoD) and elevation of departure (EoD)), and the angle of arrival (azimuth of arrival (AoA) and elevation of arrival (EoA)).
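A CIR that changes with time (designated by $t$) is the group of MPCs from all the multipath clusters according to the location of the mobile station (MS) relative to the base station (BS). The CIR is based on the delay and direction domains and is given as
$$h(t, \tau, \Omega^{\mathrm{A}}, \Omega^{\mathrm{D}}) = \sum_{k \in K} \sum_{p} \alpha_{k,p}\, \delta(\tau - \tau_{k,p})\, \delta(\Omega^{\mathrm{A}} - \Omega^{\mathrm{A}}_{k,p})\, \delta(\Omega^{\mathrm{D}} - \Omega^{\mathrm{D}}_{k,p}) \qquad (1)$$
where $\alpha_{k,p}$ is the complex amplitude of the $p$th MPC in the $k$th cluster, $\tau_{k,p}$ is its delay, $K$ is the set of visible cluster indices, $\Omega^{\mathrm{A}}_{k,p}$ is the direction of arrival (AoA, EoA) of the MPC, and $\Omega^{\mathrm{D}}_{k,p}$ is the direction of departure (AoD, EoD).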
Eight different channel scenarios generate the multipaths that serve as the input data for preprocessing. The eight channels are as follows: There are thirty trials for each channel scenario. Each trial has a different number of multipaths and multipath clusters. They represent the common propagation settings in a wireless communications system.
The study uses the MATLAB implementation of the COST 2100 channel (32). The generation of the eight channel scenarios has the following initializations:
• the network characteristics are based on the parameterization of the COST 2100 channel model (33, 34),
• the BS location is at the geometric reference point (0, 0, 0),
• the MS position is randomized at a given distance from the BS, with a maximum distance to ensure that the cluster measurements are nontrivial (greater than 2) and that the MS position is within the cell radius of the network,
• the MS elevation is randomized for the indoor and semi-urban channel scenarios, with a random height difference from the BS of up to 15 meters for the semi-urban environment and 9 meters for the indoor environment, and
• the MS velocity is randomized to be either standing still or at the average walking speed of 1.1 m/s in any random direction through the pseudorandom generator Mersenne Twister.

Extraction of Wireless Channel Multipaths
The clustering procedure of Xu and Wunsch (2005) begins with feature selection (35). For the double-directional radio channel developed by Steinbauer, Molisch, and Bonek (2001) (36), the parameters $\tau$, $\varphi_{\mathrm{AOD}}$, $\theta_{\mathrm{AOD}}$, $\varphi_{\mathrm{AOA}}$, and $\theta_{\mathrm{AOA}}$ are extracted and generated using MATLAB to serve as the raw data, which can be expressed as
$$X_{\mathrm{RAW}} = [\tau, \varphi_{\mathrm{AOD}}, \theta_{\mathrm{AOD}}, \varphi_{\mathrm{AOA}}, \theta_{\mathrm{AOA}}] \qquad (2)$$
The extraction process concurs with the COST 2100 channel. Each snapshot generates $X_{\mathrm{RAW}}$ with a dimension of 5. There are thirty sets of $X_{\mathrm{RAW}}$ data per channel scenario for clustering. The parameters obtained are representations of each multipath that is pre-assigned to a particular cluster. The multipaths are filtered to get only those that are visible in a single snapshot. LOS components with the highest amplitude and the least delay are removed as they do not constitute multipaths.

Transformation of Input Data
The input data from the COST 2100 channel is transformed using the directional cosine transform (DCT) and the whitening transform (WT). The problem with the circular nature of the angular domain is solved by the directional cosine Cartesian equivalents. The result is the transformation of Eq. (2) from 5 dimensions to 7 dimensions, which can be expressed as
$$X_{\mathrm{DCT}} = [\tau, x_{\mathrm{AOD}}, y_{\mathrm{AOD}}, z_{\mathrm{AOD}}, x_{\mathrm{AOA}}, y_{\mathrm{AOA}}, z_{\mathrm{AOA}}] \qquad (3)$$
where $x$, $y$, and $z$ are the directional cosines of the corresponding departure or arrival direction. Dip-dist examines the clusterability of the transformed data: data with two or more clusters can be clustered, while data with only one cluster cannot (37,38). WT follows to standardize the data, since the delay and direction dimensions have different units. The whitened data serves as the reference data in calculating the Jaccard index. WT eliminates unwanted noise, resulting in a more efficient clustering of data.
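The two transformation steps described above can be sketched in Python with NumPy. This is a minimal illustration, not the study's MATLAB code: the function names, the elevation convention (angle measured from the horizontal plane, in radians), and the use of PCA whitening are all assumptions made for the example.

```python
import numpy as np


def directional_cosine_transform(x_raw):
    """Map rows [tau, phi_aod, theta_aod, phi_aoa, theta_aoa] to 7 columns.

    Assumption: phi is azimuth and theta is elevation from the horizontal
    plane, both in radians.
    """
    x_raw = np.asarray(x_raw, dtype=float)
    tau, phi_d, theta_d, phi_a, theta_a = x_raw.T

    def cosines(phi, theta):
        # Cartesian equivalents of one (azimuth, elevation) pair.
        return (np.cos(theta) * np.cos(phi),
                np.cos(theta) * np.sin(phi),
                np.sin(theta))

    return np.column_stack([tau, *cosines(phi_d, theta_d), *cosines(phi_a, theta_a)])


def whiten(x, rel_tol=1e-12):
    """PCA whitening (an assumed variant): zero mean, identity covariance."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    # Directions with negligible variance are zeroed instead of amplified.
    keep = eigvals > rel_tol * eigvals.max()
    scale = np.zeros_like(eigvals)
    scale[keep] = 1.0 / np.sqrt(eigvals[keep])
    return (centered @ eigvecs) * scale
```

Mapping each angle pair to three cosines removes the wrap-around at ±π, so two multipaths with azimuths just either side of the discontinuity end up close together. Whitening then puts the delay and the unitless cosines on a common scale.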

Processing of the Input Data
The transformed data are clustered using SC and 3CAM-SC to enhance the accuracy of the existing clustering approaches (39,40). SC is a data analysis technique that reduces complex multidimensional datasets into clusters with fewer dimensions. The goal is to cluster the data based on their similarity. SC accepts the similarity matrix $S \in \mathbb{R}^{n \times n}$ for $n$ data points and the number of clusters $k$ as input. The similarity graph is constructed with $W$ as the weighted adjacency matrix. The normalized Laplacian $L$ is computed, followed by its first $k$ eigenvectors. The points are then clustered using K-means to give the clusters as the output.
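The SC steps listed above can be sketched as follows. This is an illustrative sketch only: the Gaussian similarity, the symmetric normalized Laplacian with row-normalized eigenvectors, and the small deterministic K-means are assumptions chosen to keep the example self-contained, and all function names are hypothetical.

```python
import numpy as np


def gaussian_similarity(x, sigma=1.0):
    """Assumed similarity: Gaussian kernel on squared Euclidean distance."""
    sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))


def kmeans(points, k, n_iter=100):
    """Plain Lloyd iterations with farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = np.sum((points[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
        labels = np.argmin(dist, axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_clustering(similarity, k):
    """Cluster n points given an n-by-n similarity matrix and a cluster count k."""
    w = np.array(similarity, dtype=float)   # weighted adjacency matrix W (copy)
    np.fill_diagonal(w, 0.0)                # no self-loops
    degree = w.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    # Symmetric normalized Laplacian: I - D^(-1/2) W D^(-1/2)
    laplacian = np.eye(len(w)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    embedding = eigvecs[:, :k]              # k eigenvectors of smallest eigenvalues
    norms = np.linalg.norm(embedding, axis=1, keepdims=True)
    embedding = embedding / np.maximum(norms, 1e-12)
    return kmeans(embedding, k)
```

A call such as `spectral_clustering(gaussian_similarity(x_whitened), k)` returns one integer label per multipath, where `x_whitened` stands for the transformed data and `k` for the chosen number of clusters.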
3CAM-SC modifies SC by using 3CAM to formulate the similarity matrix (41). 3CAM depends on three constraints: pairwise, binary and proximity. The pairwise constraint is based on the absolute distance between the corresponding pair of data for all dimensions. The binary constraint takes the sum of the values of the pairwise constraints over all dimensions. It returns a value of one (same cluster) if the sum is greater than or equal to a predefined value, or zero (not in the same cluster) otherwise. The proximity constraint combines all the data points to form clusters around the main diagonal, which form a 0-1 block diagonal of the similarity matrix. The rest of the SC procedure then follows to calculate the output clusters.
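One possible reading of the pairwise and binary constraints is sketched below. The exact 3CAM formulation is given in (41) and is not reproduced here; in particular, treating each pairwise value as a per-dimension closeness vote, the tolerance `tol`, and the threshold `min_votes` are assumptions made for illustration, and the proximity constraint is not implemented.

```python
import numpy as np


def binary_affinity_sketch(x, tol=0.5, min_votes=None):
    """Hypothetical 0-1 affinity built from per-dimension absolute distances.

    Assumptions (not taken from the source): a pair of points scores one
    vote in every dimension where their absolute difference is at most
    `tol`, and the pair is marked as same-cluster (1) when the vote sum
    reaches `min_votes` (default: all dimensions).
    """
    x = np.asarray(x, dtype=float)
    n_dims = x.shape[1]
    if min_votes is None:
        min_votes = n_dims
    # Pairwise step: absolute distance of every pair in every dimension.
    abs_diff = np.abs(x[:, None, :] - x[None, :, :])   # shape (n, n, d)
    votes = (abs_diff <= tol).sum(axis=-1)             # shape (n, n)
    # Binary step: 1 if the summed votes reach the threshold, else 0.
    return (votes >= min_votes).astype(float)
```

When the points fall into well-separated groups and are ordered by group, the resulting matrix is already a 0-1 block diagonal and can be passed to the remaining SC steps in place of a kernel-based similarity.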
The clustered data serve as the calculated data in computing the Jaccard index. Using common data for the clustering algorithms standardizes the evaluation of their clustering performance. The Jaccard index, which serves as the similarity measure, is calculated as
$$J = \frac{|R \cap C|}{|R \cup C|} = \frac{M_{11}}{M_{11} + M_{10} + M_{01}} \qquad (4)$$
where $|\cdot|$ refers to cardinality, $R$ is the set of reference clusters, $C$ is the set of calculated clusters, and the cardinality of a set of clusters is its number of multipath clusters. Each count is taken over multipath clusters when assessing the accuracy of the number of clusters, and over multipaths when assessing the accuracy of the membership of the clusters. $M_{11}$ is the total number of items in $R$ that are the same as in $C$, $M_{10}$ is the total number of items in $R$ that are not in $C$, and $M_{01}$ is the total number of items in $C$ that are not in $R$. The Jaccard index ranges between 0 and 1, with 1 being the highest. A Jaccard index of 1 means that the reference multipath clusters are the same as the calculated multipath clusters, and that the membership of the reference multipath clusters is the same as the membership of the calculated multipath clusters. On the other hand, a Jaccard index of 0 means that no calculated multipath cluster is the same as a reference multipath cluster, and that the membership of the calculated multipath clusters does not match the membership of the reference multipath clusters.
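As a small worked illustration of the Jaccard index, the sketch below computes it for one reference set and one calculated set of item identifiers (for example, the multipath indices of a single cluster). The function name, the variable names, and the convention of returning 1.0 for two empty sets are assumptions for this example.

```python
def jaccard_index(reference, calculated):
    """Jaccard index of two collections of hashable items."""
    reference, calculated = set(reference), set(calculated)
    m11 = len(reference & calculated)   # items present in both sets
    m10 = len(reference - calculated)   # items only in the reference set
    m01 = len(calculated - reference)   # items only in the calculated set
    total = m11 + m10 + m01
    # Assumed convention: two empty sets are treated as identical.
    return m11 / total if total else 1.0
```

For a reference cluster containing multipaths {1, 2, 3, 4} and a calculated cluster containing {3, 4, 5}, two items are shared, two are only in the reference, and one is only in the calculated set, so the index is 2 / (2 + 2 + 1) = 0.4.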

Evaluation of Results
The clustering algorithm results are evaluated through factor identification, accuracy, computational duration and robustness. Performance analyses on these areas show the strengths and weaknesses of the clustering algorithms quantitatively.

• Factor identification discovers the theoretical and practical features in the field of radio wave propagation that affect the clustering process and lead to the obtained results.
• The accuracy of the clustering algorithms is evaluated using the Jaccard index. Thirty sets of data, each with seven dimensions, are generated and clustered. The indices are assessed against each other and compared with the results of the state-of-the-art clustering approaches.
• Computational duration is the time an algorithm takes to cluster the data, from the press of the start button until the results are displayed. Since there are thirty sets of data for each algorithm, the mean serves as the basis. A short duration means that the algorithm is straightforward to compute, while a longer duration means that the algorithm is more likely to be computationally complex.
• Robustness is based on the performance of the clustering algorithms on the eight channel scenarios. A clustering algorithm is said to be robust when it performs consistently well for all channels. Robustness is assessed objectively by the standard deviation of the Jaccard indices. ANOVA is also applied to evaluate the consistency of the performance of the clustering algorithm. If the F-statistic p-value is smaller than the significance level (0.05), then the test rejects the null hypothesis that all group means are equal and concludes that at least one of the group means is different from the others.
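The robustness measures named above can be illustrated with a short standard-library sketch. It computes the one-way ANOVA F-statistic and its degrees of freedom for groups of Jaccard indices; the function name and the sample values in the usage note are hypothetical. Converting F to a p-value requires the F distribution, which a statistics package such as SciPy (`scipy.stats.f_oneway`) provides.

```python
from statistics import mean, stdev


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups.

    Assumes at least two groups and some variation within the groups,
    otherwise the within-group mean square is zero.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

For instance, `stdev([0.91, 0.95, 0.97])` gives the spread of one algorithm's indices, and `one_way_anova_f([sc_indices, cam_indices])` compares two lists of indices, where both list names stand for whatever per-trial results are available.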

Results:
The results of clustering the C2CM dataset using SC and 3CAM-SC are presented. The Jaccard indices of SC and 3CAM-SC for both the number of clusters and the membership of clusters for the eight channel scenarios are shown in Table 1, while the mean computational duration of SC and 3CAM-SC for each channel scenario is displayed in Table 2. Figure 4 presents the box plots of SC and 3CAM-SC for the number of clusters in indoor scenarios, while Figure 5 is for the semi-urban scenarios. Figure 6 shows the box plots for cluster membership in indoor scenarios, while Figure 7 is for the semi-urban scenarios.
The performance comparison of SC, 3CAM-SC, SCAMSMA and DDC on the number of clusters for indoor and semi-urban scenarios is shown in Figure 8. In contrast, Figure 9 illustrates the performance comparison of the clustering approaches on the membership of clusters.

Discussion:
The results of SC and 3CAM-SC were assessed and analyzed. Factor identification, clustering accuracy, computational complexity and robustness were the areas of evaluation. The clustering performance of SC and 3CAM-SC was compared with that of SCAMSMA and DDC (6,5).

Factor Identification
The clustering accuracy of SC and 3CAM-SC is shown in Table 1. The clustering approaches performed better in indoor scenarios than in semi-urban scenarios. The reason for this is that indoor channel scenarios generated a smaller number of multipaths and multipath clusters due to limited reflections of the enclosed wireless signals. The clustering approaches had fewer multipaths to classify, resulting in superior clustering performance. On this basis, the indoor channel scenarios had better accuracy for both the membership of clusters and the cluster count compared with the semi-urban channel scenarios. On the other hand, the semi-urban channel scenarios generated a higher number of multipaths and multipath clusters, because the wider surroundings contain more interacting objects that reflect the signals. Consequently, lower accuracy for both the membership of clusters and the cluster count was attained in semi-urban channel scenarios.

Clustering Accuracy
3CAM-SC outperformed SC in all channel scenarios for both the number of clusters and membership of clusters, as shown in Table 1. The reason for this is that 3CAM-SC improved the formulation of the affinity matrix of SC, resulting in a better clustering performance (41).
The performance of SC and 3CAM-SC is compared with that of SCAMSMA and DDC. The clustering approaches can be compared since all of them solve the number of clusters and the membership of clusters simultaneously. Figure 8 displays the bar graph comparing the accuracy for the number of clusters. SC is more accurate than SCAMSMA by 74.19% and DDC by 76.98% in semi-urban scenarios but is less accurate in indoor scenarios, by 34.10% compared with SCAMSMA and 44.98% compared with DDC. 3CAM-SC has the best performance among the clustering approaches in both channel scenarios. 3CAM-SC is 16.32% more accurate than the next-best DDC in indoor scenarios and 91.67% more accurate than the second-highest SC in semi-urban scenarios. Figure 9 shows the bar graph comparing the accuracy for the membership of clusters. SC again fared better than SCAMSMA and DDC in semi-urban scenarios but has lower accuracy in indoor scenarios. SC is more accurate than SCAMSMA by 40.23% and DDC by 19.62% in semi-urban scenarios. However, it is less accurate than SCAMSMA by 9% and DDC by 15.89% in indoor scenarios. Still, 3CAM-SC has the best clustering performance among the clustering approaches for both channel scenarios. It is more accurate than the second-best DDC by 16.22% in indoor scenarios and more accurate than the next-highest SC by 72.35% in semi-urban scenarios.
SCAMSMA and DDC are superior to SC in clustering a smaller number of multipaths (indoor scenarios). On the other hand, SC performs better than SCAMSMA and DDC when the number of multipaths increases (semi-urban scenarios). Nevertheless, 3CAM-SC has the best clustering accuracy due to the improved representation of its affinity matrix.

Computational Complexity
The mean computational duration of the thirty sets of data per channel scenario using SC and 3CAM-SC is presented in Table 2. The simulations were performed in MATLAB 2019a on a Dell 7730 mobile workstation with Windows 10 operating system, Intel Xeon E2186M 2.90 GHz CPU and 64 GB memory. The computing time is dependent on the number of multipath components and multipath clusters. The duration indicates that the higher the number of multipath components and multipath clusters, the longer the computational time is. That is why the indoor scenarios have short computational durations due to fewer multipath components and multipath clusters. In contrast, semi-urban scenarios, especially the multiple links with more multipath components and multipath clusters, have long computational durations.
3CAM-SC clusters the multipaths faster than SC due to the improved affinity matrix. The clustering approaches have almost the same computational duration for the indoor scenarios due to the low number of multipaths and multipath clusters. However, when the number of multipaths and multipath clusters increased, as in the semi-urban scenarios, 3CAM-SC clustered the multipaths faster than SC.
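The mean computational duration over a batch of datasets can be measured along the following lines. This is a generic Python sketch, not the MATLAB timing used in the study; `cluster_fn` and `datasets` are placeholder names for any clustering function and any list of inputs.

```python
import time
from statistics import mean


def mean_duration(cluster_fn, datasets):
    """Average wall-clock time, in seconds, of cluster_fn over all datasets."""
    durations = []
    for data in datasets:
        start = time.perf_counter()
        cluster_fn(data)                 # result discarded; only time is kept
        durations.append(time.perf_counter() - start)
    return mean(durations)
```

Timing each dataset separately and averaging, rather than timing the whole batch once, also makes it possible to inspect how the duration grows with the number of multipaths in each trial.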

Robustness
3CAM-SC has the best clustering performance among the four clustering approaches. It is the most robust clustering approach, as it registers a mean Jaccard index of 0.9732 for the membership of clusters across the eight channel scenarios. Also, 3CAM-SC is consistent in clustering the membership of the clusters across all channel scenarios, whether indoor or semi-urban. Lastly, it registers the highest clustering accuracy for the number of clusters in all channel scenarios.
The consistency of the clustering performance of SC and 3CAM-SC can be visualized using box plots. Figure 4 shows the box plots for the number of clusters of the indoor scenarios. The p-value is 0.0636, which indicates that the mean Jaccard indices of SC and 3CAM-SC are not significantly different. The box plots for the membership of clusters of the indoor scenarios are presented in Figure 5. There is no significant difference between the mean Jaccard indices, as the p-value is 0.0773, which is above the significance level of 0.05.
The box plots for the number of clusters of the semi-urban scenarios are displayed in Figure 6. Since the p-value is $8.17 \times 10^{-6}$, there is a significant difference between the mean Jaccard indices of SC and 3CAM-SC. Figure 7 illustrates the box plots of the membership of clusters of the semi-urban scenarios. There is a significant difference between the mean Jaccard indices of the clustering approaches since the p-value is $1.42 \times 10^{-7}$.
The box plots and the p-values show that 3CAM-SC is the more robust of the two clustering approaches since it has a more consistent clustering performance in all channel scenarios.

Conclusion:
The work introduced SC and 3CAM-SC in clustering the COST 2100 channel dataset. The clustering approaches are used to solve the number and membership of multipath clusters simultaneously. The results of SC and 3CAM-SC in clustering wireless propagation multipaths are presented and compared with SCAMSMA and DDC. The results show that the performance of SC is comparable with SCAMSMA and DDC. The three clustering approaches can be applied to indoor scenarios based on accuracy.
Moreover, 3CAM-SC outperforms the other three clustering approaches in all channel scenarios. Thus, 3CAM-SC can be used in channel modeling as an alternative clustering approach for identifying the membership of clusters. Still, improvements are necessary to further increase the multipath clustering accuracy of 3CAM-SC on the number of clusters.