A Hybrid Method of 1D-CNN and Machine Learning Algorithms for Breast Cancer Detection

Breast cancer is a major health concern, and early detection is crucial for effective treatment. Interest has recently grown in using artificial intelligence (AI) for breast cancer detection, which has shown promising results in improving accuracy and reducing false positives. However, detection accuracy is still limited. This study introduces an approach that uses a 1D CNN for feature extraction and machine learning (ML) algorithms, namely XGBoost, random forests (RF), decision trees (DT), support vector machines (SVM), and k-nearest neighbors (KNN), to classify samples as benign or malignant, with the aim of improving accuracy. Our findings reveal that the XGBoost algorithm with 1D-CNN feature extraction achieved an accuracy of 98.24% on the test set. The study uses the Wisconsin Breast Cancer (WBC) dataset and highlights the feasibility of combining machine learning and deep learning (DL) for breast cancer detection. The proposed approach shows good detection performance and provides an accurate and reliable tool for diagnosing breast cancer.


Introduction
Breast cancer is a significant global health issue, and timely identification and diagnosis play a key role in improving patient prognosis. With recent technological developments, ML and DL have emerged as useful tools in the fight against breast cancer. These techniques have shown promising results in breast cancer prediction, assisting healthcare professionals in making well-informed decisions about patient treatment 1.
Published Online First: March, 2024 https://doi.org/10.21123/bsj.2024.9443 P-ISSN: 2078-8665 - E-ISSN: 2411-7986 Baghdad Science Journal
… images and patient details, enabling the recognition of breast cancer attributes and potential risks 2. DL is a type of ML that uses artificial neural networks to model complex relationships between inputs and outputs. DL algorithms can be trained on large datasets, which makes it possible to identify and classify breast cancers automatically; this increases identification accuracy and reduces the need for manual evaluation 3.
The core benefit of using DL and ML for breast cancer detection is their ability to handle large datasets composed of demographic, clinical, and image data. From such data, covering a number of risk variables, models can be built that accurately estimate the likelihood of breast cancer. These techniques suit the complex and dynamic nature of breast cancer because they can also model nonlinear relationships between variables.
ML and DL have proven to be effective tools for breast cancer detection, giving healthcare providers valuable information for making informed decisions about patient care. However, accuracy remains a major challenge in applying DL and ML to breast cancer prediction. Combining different ML algorithms, incorporating prior knowledge and domain-specific information, and evaluating and validating the algorithms in clinical settings can improve the accuracy of these techniques and enhance their reliability and generalizability. Using deep learning and machine learning for breast cancer prediction holds great promise for improving patient outcomes and reducing the burden of breast cancer worldwide. To address the accuracy issue, this study proposes a novel approach that combines a 1D CNN with different ML algorithms (XGBoost, SVM, DT, RF, and KNN) to predict breast cancer with better performance and accuracy.

Related work
Arshad 4 addressed the accurate prediction and assessment of breast cancer, a prevalent cancer that ranks among the leading causes of death among women worldwide. ML methods have shown potential for the early detection and prediction of breast cancer. The study uses the WBC Diagnostic dataset to assess the efficiency of ensemble and ML classifiers, specifically RF, Logistic Regression (LR), AdaBoost, and the Extreme Gradient Boosting (XGBoost) classifier. The primary aim is to identify the ensemble and ML classifiers that detect and diagnose breast cancer most accurately.
Harika et al. 5 focused on using ML to aid cancer diagnosis, particularly the prediction of malignant neoplasms from fine needle aspiration samples. The study assesses six classification techniques with an emphasis on precision, objectivity, and reproducibility. These methods include Multilayer Perceptron (MLP), DT, RF, SVM, and Deep Neural Network (DNN). The evaluation uses the University of Wisconsin Hospital database, which contains thirty attributes describing the properties of the cell nuclei in breast masses.
Elsadig et al. 6 examined the contribution of AI to the early identification of breast cancer. The study evaluates eight classification models, comprising both individual and ensemble classifiers, and applies five feature selection techniques, producing a reduced dataset of only 17 features. The results show that three classifiers, namely the multilayer perceptron, SVM, and stacking models, achieve higher classification accuracy than the others.
Chen et al. 7 built several ML models, including XGBoost, RF, LR, and K-NN, to classify and predict breast cancer for early diagnosis. Recall is the main evaluation metric, with precision, accuracy, and the F1 score also considered. The dataset was standardized, and 15 features were selected using the Pearson correlation test. The K-NN model used cross-validation to select … a comparative analysis of classification accuracy among four ML algorithms: KNN, DT, Naive Bayes (NB), and SVM. The primary aim is to identify the most accurate supervised ML algorithm for diagnosing breast cancer. The findings show that, on the given dataset, NB achieves the highest accuracy, surpassing KNN, SVM, and DT. Based on these outcomes, the research suggests that integrating data mining and ML techniques can help practitioners develop tools for early breast cancer detection.
The key limitation of these studies is their reliance on either ML or DL alone, with detection accuracy that still needs improvement. This study therefore proposes 1D-CNN feature extraction combined with ML algorithms such as SVM, KNN, DT, RF, and XGBoost. The findings indicate that combining 1D-CNN with an ML algorithm is effective for breast cancer detection.

Research Methodology
This research explores the efficacy of 1D-CNN architectures as feature extractors combined with diverse ML classifiers, with the aim of assessing their joint potential for classifying and predicting breast cancer in a given dataset. The study starts with data acquisition, followed by a preprocessing stage with three sequential steps: data cleansing, attribute selection, and target role assignment. Next, features are extracted with the 1D-CNN. The extracted features are then used to train ML models capable of predicting breast cancer from new measurements. To evaluate performance, each model is applied to held-out data with known labels: the labeled dataset is partitioned into two subsets using the train_test_split function, with 70% forming the training set used to build the model and the remaining 30% forming the test set used to evaluate it. After testing, the models' results are compared to identify the algorithm with the highest accuracy, i.e., the most predictive approach for breast cancer detection. The proposed method's workflow is shown in Fig. 1.

Pre-processing
Before training the machine learning models, this study applied min-max scaling to the input features so that they were on a similar scale. Min-max scaling is a common preprocessing technique that rescales input features to a specified range, usually [0, 1] 11. This matters in machine learning because features on vastly different scales can cause problems during training and may give some features a disproportionately large influence on the model's predictions. Scaling all input features to a uniform range ensured that each feature was considered equitably during training, so the models could discern significant patterns in the data without undue bias from any single feature.
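As a minimal sketch of the min-max transformation x' = (x - min) / (max - min), the plain-Python code below rescales each column of a toy feature matrix. The function names and toy values are illustrative assumptions, not the study's code; in practice, scikit-learn's MinMaxScaler performs the same transformation. The sketch computes the minimum and maximum on the training rows only and reuses them for new rows, a common precaution against information leakage; the paper does not state whether its pipeline did this.

```python
from typing import List, Tuple

Rows = List[List[float]]

def fit_min_max(rows: Rows) -> Tuple[List[float], List[float]]:
    """Per-column minimum and maximum, computed on the training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def transform_min_max(rows: Rows, mins: List[float], maxs: List[float]) -> Rows:
    """Apply x' = (x - min) / (max - min) column by column.

    A constant column (max == min) is mapped to 0.0 to avoid division by zero.
    Values outside the training range fall outside [0, 1].
    """
    return [[(x - lo) / (hi - lo) if hi > lo else 0.0
             for x, lo, hi in zip(row, mins, maxs)]
            for row in rows]

# Toy training rows: two features on very different scales (illustrative only).
train = [[10.0, 500.0], [20.0, 1500.0], [15.0, 1000.0]]
mins, maxs = fit_min_max(train)
print(transform_min_max(train, mins, maxs))            # [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
print(transform_min_max([[25.0, 750.0]], mins, maxs))  # [[1.5, 0.25]]
```

The second call shows why the training minimum and maximum are reused: a new sample is expressed on the same scale as the training data, even if it lands slightly outside [0, 1].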

Splitting dataset
In this study, the dataset was split into 70% training and 30% testing data. Stratified sampling was used so that the training and testing subsets were representative of the entire dataset. This technique preserves the distribution of the target variable (benign vs. malignant) in both subsets, so the models are trained and evaluated on the same class balance.
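The stratified 70/30 split can be illustrated with a small standard-library sketch: sample indices are grouped by class, shuffled with a fixed seed, and 30% of each class goes to the test set. The function stratified_split and its default seed are hypothetical; the study presumably used scikit-learn's train_test_split with test_size=0.3 and the stratify argument, which achieves the same effect but may allocate borderline samples slightly differently.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

def stratified_split(labels: Sequence[int], test_size: float = 0.3,
                     seed: int = 42) -> Tuple[List[int], List[int]]:
    """Split sample indices into (train, test) so that each class contributes
    about `test_size` of its samples to the test set."""
    rng = random.Random(seed)  # fixed seed -> the same split on every run
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx: List[int] = []
    test_idx: List[int] = []
    for label in sorted(by_class):
        idxs = list(by_class[label])
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_size)  # this class's share of the test set
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Toy labels: 70 benign (0) and 30 malignant (1).
labels = [0] * 70 + [1] * 30
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))                     # 70 30
print(sum(labels[i] for i in test_idx) / len(test_idx))  # 0.3
```

Because each class is split separately, the malignant share is 30% in both subsets, which is exactly the property stratification is meant to guarantee.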

Feature extraction
In this study, the 1D-CNN is used to extract features from the input data rather than to perform the classification or prediction itself. The extracted features are then fed into another ML model, providing valuable input for its performance 12. The 1D-CNN is trained on a dataset using a supervised learning approach to learn relevant relationships in the data: during training, the network automatically learns a set of filters that capture local patterns and identify important features in the data 13. Once the 1D-CNN is trained, it can be used to extract features from new data to be classified or predicted. The new data are passed through the 1D-CNN, and the output of one or more intermediate layers, which represents the features learned by the network, is taken. These features are then used as input to another ML model, such as SVM, XGBoost, KNN, DT, or RF, which performs the final classification or prediction. The input data consist of 30 numeric feature columns, each representing a specific measurement or characteristic of the cell nuclei samples in the dataset. These features serve as input to the 1D Convolutional Neural Network (1D-CNN) for feature extraction.
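To make the layer shapes concrete, the sketch below runs one forward pass of a 1D-CNN feature extractor with the layer stack this study reports for Fig. 2 (Conv1D with 64 filters, kernel size 3, and ReLU; MaxPooling1D with pool size 2; Flatten; Dense with 64 units and ReLU) on a single 30-feature sample. It is plain Python with random, untrained weights, so it shows only the data flow and shapes, not learned features. It assumes "valid" padding and a pooling stride equal to the pool size (the Keras defaults), which gives 28 × 64, then 14 × 64, then 896, then 64 values. In the actual pipeline the network is trained first, and presumably the 64 dense-layer outputs per sample become the input to the ML classifiers.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # rows = positions, columns = filters

def conv1d_relu(x: Vector, kernels: Matrix, biases: Vector) -> Matrix:
    """'Valid' 1D convolution of a single-channel input, then ReLU.
    Output shape: (len(x) - kernel_size + 1) positions x len(kernels) filters."""
    k = len(kernels[0])
    out = []
    for start in range(len(x) - k + 1):
        window = x[start:start + k]
        out.append([max(0.0, sum(w * v for w, v in zip(kern, window)) + b)
                    for kern, b in zip(kernels, biases)])
    return out

def max_pool1d(fmap: Matrix, pool: int = 2) -> Matrix:
    """Non-overlapping max pooling over positions (stride = pool size)."""
    return [[max(col) for col in zip(*fmap[i:i + pool])]
            for i in range(0, len(fmap) - pool + 1, pool)]

def flatten(fmap: Matrix) -> Vector:
    """Concatenate the rows of a feature map into one vector."""
    return [v for row in fmap for v in row]

def dense_relu(x: Vector, weights: Matrix, biases: Vector) -> Vector:
    """Fully connected layer followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

# Layer sizes reported for the study's 1D-CNN; the weights here are random.
N_FEATURES, N_FILTERS, KERNEL, POOL, N_DENSE = 30, 64, 3, 2, 64
rng = random.Random(42)
kernels = [[rng.gauss(0.0, 0.1) for _ in range(KERNEL)] for _ in range(N_FILTERS)]
conv_bias = [0.0] * N_FILTERS
n_flat = ((N_FEATURES - KERNEL + 1) // POOL) * N_FILTERS  # 14 * 64 = 896
dense_w = [[rng.gauss(0.0, 0.05) for _ in range(n_flat)] for _ in range(N_DENSE)]
dense_bias = [0.0] * N_DENSE

def extract_features(sample: Vector) -> Vector:
    """Map one scaled 30-feature sample to a 64-dimensional feature vector."""
    fmap = conv1d_relu(sample, kernels, conv_bias)           # 28 x 64
    pooled = max_pool1d(fmap, POOL)                          # 14 x 64
    return dense_relu(flatten(pooled), dense_w, dense_bias)  # 64

sample = [rng.random() for _ in range(N_FEATURES)]  # stands in for one scaled row
features = extract_features(sample)
print(len(features))  # 64
```

In TensorFlow the same stack corresponds to the Keras layers Conv1D, MaxPooling1D, Flatten, and Dense, with the input reshaped to (30, 1).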

Machine learning classifiers
This section describes the ML classifiers used to classify breast cancer. This study used a combination of ML algorithms (XGBoost, SVM, DT, RF, and KNN), with random_state=42 used to set the random seed for reproducibility, so that running the code multiple times on the same dataset gives the same results. The Decision Tree, XGBoost, and SVM classifiers use their default settings, i.e., no hyperparameters are specified explicitly. For KNN, n_neighbors is set to 3, so the classifier considers the labels of the three nearest neighbors when making a prediction; all other KNN settings are left at their defaults. The Random Forest classifier uses two explicit hyperparameters: n_estimators is set to 10, so the forest consists of 10 decision trees, and max_depth is set to 15, so each tree can grow to a maximum depth of 15.

Decision Tree
A DT is a graphical representation that uses branching to depict possible courses of action and their outcomes. The technique handles both categorical and numerical variables and requires no assumptions about the data distribution or classifier configuration. DTs provide precise and streamlined classifications, even on large datasets 14,15.
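A decision tree grows by repeatedly choosing the split that best separates the classes. The plain-Python sketch below finds the single best threshold split of a toy dataset by minimizing the weighted Gini impurity, which is the default criterion of scikit-learn's DecisionTreeClassifier. It covers only one level; a full tree applies the same search recursively to each child node until a stopping rule, such as the max_depth=15 used for the RF trees, is reached. The function names and data are illustrative assumptions.

```python
from typing import Dict, List, Optional, Sequence, Tuple

def gini(labels: Sequence[int]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts: Dict[int, int] = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(X: List[List[float]], y: List[int]) -> Optional[Tuple[int, float, float]]:
    """Return (feature index, threshold, weighted child impurity) of the split
    'x[feature] <= threshold' with the lowest weighted Gini impurity."""
    n = len(y)
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best

# Toy data: two features, 0 = benign, 1 = malignant (illustrative only).
X = [[2.0, 1.0], [3.0, 8.0], [4.0, 2.0], [5.0, 9.0]]
y = [0, 1, 0, 1]
print(best_split(X, y))  # (1, 5.0, 0.0)
```

Here the second feature separates the classes perfectly at 5.0, so both children are pure and the weighted impurity drops to zero.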

XG-Boost
XGBoost is an ensemble technique that combines many decision trees to make predictions. It adds decision trees to the model one at a time, with each new tree trained to correct the errors made by the trees before it. This sequential process continues until a specified number of trees has been added or further trees no longer improve the model. XGBoost builds the trees with gradient boosting: each new tree is fitted using the gradients of the loss function with respect to the current predictions 16.
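The sequential error-correcting idea behind boosting can be sketched in a few lines. The code below is plain gradient boosting with squared-error loss on a single feature, using one-split trees (stumps): each round fits a stump to the current residuals, which are the negative gradient of the squared-error loss, and adds a shrunken copy to the model. This is a deliberate simplification, not XGBoost, which additionally uses second-order gradient information, a regularized objective, and (for binary classification) the logistic loss. The parameters n_trees and lr are hypothetical names that play roles similar to XGBoost's n_estimators and learning_rate.

```python
from typing import List, Tuple

Stump = Tuple[float, float, float]  # (threshold, value if x <= threshold, value otherwise)

def fit_stump(x: List[float], r: List[float]) -> Stump:
    """Least-squares one-split regression tree fitted to residuals r.
    Requires at least two distinct values in x."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((ri - lv) ** 2 for ri in left)
               + sum((ri - rv) ** 2 for ri in right))
        if best is None or sse < best[0]:
            best = (sse, (t, lv, rv))
    return best[1]

def stump_predict(s: Stump, xi: float) -> float:
    t, lv, rv = s
    return lv if xi <= t else rv

def gradient_boost(x: List[float], y: List[int], n_trees: int = 20, lr: float = 0.3):
    """Fit an additive model F(x) = base + lr * (sum of stumps)."""
    base = sum(y) / len(y)  # start from the mean target
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        # Residuals y - F(x) are the negative gradient of 0.5 * (y - F)^2.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        s = fit_stump(x, residuals)  # the new tree corrects the current errors
        stumps.append(s)
        pred = [pi + lr * stump_predict(s, xi) for pi, xi in zip(pred, x)]
    return base, lr, stumps

def boosted_score(model, xi: float) -> float:
    base, lr, stumps = model
    return base + lr * sum(stump_predict(s, xi) for s in stumps)

# Toy single-feature data: 0 = benign, 1 = malignant (illustrative only).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [0, 0, 0, 1, 1, 1]
model = gradient_boost(x, y)
print([1 if boosted_score(model, xi) >= 0.5 else 0 for xi in x])  # [0, 0, 0, 1, 1, 1]
```

Each round shrinks the remaining residuals by the factor 1 - lr, so after 20 rounds the scores sit very close to the 0/1 targets; the learning rate trades the number of trees needed against the risk of overfitting.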

Support Vector Machines
SVM is a widely adopted ML method used primarily for binary classification. SVM seeks the hyperplane that maximizes the separation between two classes, where the margin is the distance between this hyperplane and the nearest data points of each class. SVM has proven effective in binary breast cancer classification, where it aims to predict accurately whether a tumor is malignant or benign from various tumor attributes 17,18.

K-Nearest Neighbor Classifiers
The KNN algorithm labels an unlabeled sample according to the closest labeled samples with similar characteristics. Known for its simplicity and effectiveness, KNN is widely used for supervised classification with multiple variables. The KNN classifier is governed mainly by one parameter, the number K of nearest neighbors considered, which must be chosen to avoid overfitting (K too small) and underfitting (K too large) 19.

Random Forest Classifier
RF is a common ensemble ML technique for both classification and regression, built on decision trees. It combines many decision trees, each trained on a different subset of the training data, to make predictions. It handles high-dimensional datasets with many features well, which often makes it more accurate than conventional classification methods 20. The RF algorithm randomly selects subsets of the training data and of the features to build multiple decision trees, then aggregates the trees' individual predictions (by voting or averaging) into a final prediction. This strategy mitigates overfitting and improves the model's ability to generalize.
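The KNN rule described above reduces to a distance computation and a majority vote. The sketch below implements it in plain Python with K = 3, matching the n_neighbors=3 setting used in this study, together with Euclidean distance and uniform voting, which are also the defaults of scikit-learn's KNeighborsClassifier. The function name and toy data are illustrative assumptions.

```python
import math
from collections import Counter
from typing import List, Sequence

def knn_predict(train_X: List[Sequence[float]], train_y: List[int],
                query: Sequence[float], k: int = 3) -> int:
    """Predict the label of `query` by majority vote among its k nearest
    training points, using Euclidean distance and uniform weights."""
    # Sort (distance, label) pairs so the closest training points come first.
    neighbors = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]

# Toy 2-feature data: 0 = benign, 1 = malignant (illustrative values only).
X = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.15], [0.80, 0.90], [0.90, 0.80], [0.85, 0.85]]
y = [0, 0, 0, 1, 1, 1]
print(knn_predict(X, y, [0.20, 0.20]))  # 0
print(knn_predict(X, y, [0.70, 0.80]))  # 1

# A query closest to a lone benign point: the choice of K changes the answer.
X1, y1 = [[0.0], [1.0], [1.1], [1.2]], [0, 1, 1, 1]
print(knn_predict(X1, y1, [0.4], k=1), knn_predict(X1, y1, [0.4], k=3))  # 0 1
```

The last example shows the role of K: with K = 1 the query takes the label of its single nearest point, while with K = 3 the surrounding majority outvotes it.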

Evaluation
Accuracy, precision, recall, and F-score are commonly used to evaluate the performance of machine learning models. These metrics capture different aspects of model performance. Precision measures the proportion of predicted positive instances that are truly positive (TP), so a high precision indicates few false positives (FP) among the positive predictions:

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (1)$$

Recall measures the model's ability to detect all actual positive instances, i.e., the proportion of TP among all actual positives; a high recall indicates a low false negative (FN) rate 22:

$$\text{Recall} = \frac{TP}{TP + FN} \qquad (2)$$

Accuracy measures the overall correctness of the model as the ratio of correct predictions, true negatives (TN) and TP, to the total number of predictions made 23:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (3)$$

Lastly, the F-measure combines precision and recall into a single metric that balances the two 24:

$$F\text{-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (4)$$

In these equations, each metric is computed from the values of TP, TN, FP, and FN, which are obtained by comparing the models' predictions with the ground truth 25.
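The evaluation metrics described above translate directly into code. The sketch below computes them in plain Python from lists of true and predicted labels, treating label 1 (malignant) as the positive class, which is an assumption about the label encoding. Returning 0.0 when a denominator is zero is a choice made here for robustness; scikit-learn's precision_score, recall_score, accuracy_score, and f1_score compute the same quantities.

```python
from typing import Dict, Sequence, Tuple

def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Tuple[int, int, int, int]:
    """Count (TP, TN, FP, FN) for a binary problem."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t != positive and p != positive:
            tn += 1
        elif p == positive:  # predicted positive, actually negative
            fp += 1
        else:                # predicted negative, actually positive
            fn += 1
    return tp, tn, fp, fn

def evaluate(y_true: Sequence[int], y_pred: Sequence[int],
             positive: int = 1) -> Dict[str, float]:
    """Precision, recall, accuracy, and F-measure from the confusion counts."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f_measure": f_measure}

# Toy predictions for 10 test samples (illustrative only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(confusion_counts(y_true, y_pred))  # (3, 5, 1, 1)
print(evaluate(y_true, y_pred))
# {'precision': 0.75, 'recall': 0.75, 'accuracy': 0.8, 'f_measure': 0.75}
```

With one missed malignant case (FN) and one false alarm (FP), precision and recall are both 3/4, while accuracy also credits the five correct benign predictions.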

Results and Discussion
Breast cancer was classified into benign and malignant tumors using five classification methods. Fig. 3 compares the accuracy of all models. Notably, accuracy increased considerably across nearly all models, underscoring their efficacy.

Figure 3. Comparison of Results
The proposed method is compared against state-of-the-art methods in Table 3 and Fig. 4.

Figure 4. Comparison with related works
Arshad 4 used ensemble classifiers and ML to detect breast cancer and reported an accuracy of 98.1%. Fabiano Teixeira et al. 5 used MLP, SVM, RF, DT, and DNN to detect breast cancer and obtained an accuracy of 0.92%. Elsadig et al. 6 used MLP, SVM, and stacking to detect breast cancer and obtained an accuracy of 97.7%. Chen et al. 7, using various ML models, obtained an accuracy of 97.4%. Sakib et al. proposed ML and DL techniques to detect breast cancer and obtained an accuracy of 96.66%. The previous study used the same dataset. This study proposed a hybrid method combining 1D-CNN feature extraction with ML classifiers (XGBoost, RF, SVM, DT, and KNN) to detect breast cancer and achieved an accuracy of 98.24%. The accuracy of breast cancer detection with ML models depends on factors such as model selection, dataset quality and size, and preprocessing techniques, so the differences in reported accuracies can be attributed to these variations. Advances in ML and larger datasets may further improve accuracy rates.
ML and DL techniques are valuable tools for breast cancer detection, and the proposed 1D-CNN-based method remains competitive in the field, allowing for the variations in reported accuracies among studies.

Conclusion
Early detection of breast cancer remains an important and ongoing focus of scientific research. This study assessed the classification accuracy of five ML algorithms: KNN, DT, RF, XGBoost, and SVM. The principal aim was to improve the precision and effectiveness of classification algorithms. A key outcome of this investigation is that ML plays a pivotal role in improving the prediction and diagnosis of breast cancer. Notably, XGBoost with 1D-CNN feature extraction achieved higher accuracy than the other algorithms used. The proposed approach is therefore effective for breast cancer identification, achieving the highest accuracy of 98.24% with XGBoost. Future work will investigate ensemble models that combine the strengths of multiple ML algorithms to achieve even higher accuracy in breast cancer detection.

Figure 1. Research Methodology
Finally, using a pre-trained 1D-CNN for feature extraction lets the models benefit from the knowledge the network has learned on similar tasks, which can improve the performance of the ML models trained on the new features. Based on these features, the ML algorithms classify the samples in the diagnosis dataset as malignant or benign. Fig. 2 shows feature extraction with the 1D-CNN model. This study uses TensorFlow and sets random seeds for reproducibility. The input layer is defined from the shape of the training data, and the 1D-CNN model is built with the following layers:
• Conv1D layer with 64 filters, a kernel size of 3, and ReLU activation.
• MaxPooling1D layer with a pool size of 2.
• Flatten layer, which turns the 2D feature map (positions × filters) into a 1D vector.
• Dense layer with 64 units and ReLU activation.

Table 1. Description of features in the dataset.

Table 2. Comparison of algorithms
XGBoost, RF, SVM, DT, and KNN were compared, and XGBoost was found to be the dominant classifier for diagnosing breast cancer.