Deep Learning Techniques in the Cancer-Related Medical Domain: A Transfer Deep Learning Ensemble Model for Lung Cancer Prediction

Problem: Cancer is regarded as one of the world's deadliest diseases. Machine learning, and its newer branch deep learning, can facilitate dealing with cancer, especially in cancer prevention and detection, whereas traditional ways of analyzing cancer data have their limits.


Table 1. Cancer statistics: (A) Indian 2018 statistics, (B) global 2018 statistics, (C) global 2020 statistics.
According to 9, lung cancer was the deadliest cancer in 2020. Liver, stomach, and breast cancer were the next three deadliest cancers, with shares of 8.3%, 7.7%, and 6.9%, respectively. Fig 1 shows the global cancer-death statistics for 2020 9,10.

Figure 1. Cancer statistics
The Cancer Facts and Figures report estimated the number of cancer cases in 2022 at 1,918,030. The report indicated that the deadliest men-related cancer from 1930 to 2019 was lung cancer; the next three were stomach, colon, and prostate cancer. For women, the cancer rates were lower than for men; however, lung cancer also recorded the most deaths among women, and the next three most common cancers for women were breast, stomach, and colon cancer 11.
The main contributions of the current study can be summarized as follows:
- Solve the problem of the low accuracy of lung cancer prediction systems by proposing ensemble and fusion techniques.
- Introduce a new medical support tool for lung cancer prediction.
- Take into account the main lung cancer classes and classify them accordingly.
- Use small and efficient deep-learning models.

The rest of the paper is organized as follows: first, the related work is listed and compared; then, the proposed materials and methods are introduced and illustrated; after that, the results are presented together with a detailed discussion; the limitations and the conclusion are given at the end of the paper, with recommendations and future work included in the conclusion section.

Related Work
Because of the huge amount of multimodality data generated over the past ten years, the use of data analysis in health information systems has grown substantially.
In the field of medical health, the interest in developing machine learning (ML) models to manipulate and process this huge amount of medical data has increased significantly 12 .
In recent years, Deep Learning (DL), a method built on artificial neural networks, has emerged as a high-performance machine learning methodology that holds the potential to transform the field of artificial intelligence 12 .
Using DL in medical fields is very effective and has recorded many achievements that were previously hard to handle. DL presents different types of networks with many capabilities that can handle a huge amount of medical data (textual information, audio signals, medical images, and videos). These DL networks (models) provide a very powerful tool for many medical platforms [13][14][15] .
Many DL models are used in the medical domain. The nature of the medical field, the size of the processed information, and the aim of the research define the type and architecture of the DL network. Table 2 lists the most commonly used DL networks in the medical domain and their properties. Hundreds of studies are introduced in the medical domain every year, and many of them use ML and DL capabilities 34. Table 3 includes the most recent studies that use deep learning models in the fields of cancer prevention and detection. Cancer research statistics between 2014 and 2022 were obtained through a Google Scholar search. Fig 2 demonstrates the increasing interest in utilizing deep learning for cancer research; it also shows that lung cancer receives more attention than breast cancer, and that these two cancers have the highest share in this survey. All these statistics were collected via Google Scholar on August 23, 2022, at 7 p.m.

Figure 2. Deep learning cancer research between 2014 and 2022
Related work summary
Table 3 illustrates that there are some gaps in the previous studies. The low accuracy in some studies is due to unsuitable methods or inappropriate parameter selection. Some studies used highly computational models. Most studies used only one or two performance metrics, which is not sufficient to judge the models and evaluate their performance.
However, the current study takes advantage of ensemble learning, transfer learning, and the computational efficiency of specific deep models in order to achieve good performance with a low-computational model.

Convolutional Neural Network (CNN)
CNN is a deep neural network that accepts a 2D image as input and produces classes or class probabilities as output. CNNs are used in many applications, such as medical disease diagnosis, human recognition, and image classification 30,59.
The convolutional layer applies the convolution process, in which an image of size M*N is convolved with a kernel of size K*K. The kernel slides over the image from the upper-left corner to the lower-right corner; each pixel's neighborhood is multiplied element-wise by the kernel, and the sum of the products gives the convolution result at that pixel. The output of the convolutional layer is called the activation map, whose depth depends on the number of filters. Several parameters define the final convolution size, including the stride and the padding. The stride (S) is the step by which the kernel slides, and the padding (P) is the number of rows and columns added so that the boundary pixels can be convolved. For example, a kernel of size 5*5 needs a padding of 2 (add 2 columns and 2 rows), and a kernel of size 7*7 needs a padding of 3. The output size of the convolution layer is calculated as (W-F+2P)/S+1, where W is the size of the image, F is the size of the kernel, S is the stride, and P is the padding. The output of the convolution layer is then passed to the pooling layer, which reduces the image by a specific rate. Two pooling methods are commonly used in CNNs: max pooling, which takes the maximum value of the pooled pixels, and average pooling, which takes their average.
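The output-size formula and the two pooling modes can be sketched in a few lines of Python. This is a minimal NumPy illustration, not code from the study:

```python
import numpy as np

def conv_output_size(w, f, p, s):
    """Spatial output size of a convolution: floor((W - F + 2P) / S) + 1."""
    return (w - f + 2 * p) // s + 1

# A 5*5 kernel needs padding 2 to preserve the size at stride 1, as in the text.
assert conv_output_size(224, 5, 2, 1) == 224

def pool2d(img, k, mode="max"):
    """Non-overlapping k*k pooling of a 2D array (max or average)."""
    h, w = img.shape
    patches = img[: h - h % k, : w - w % k].reshape(h // k, k, w // k, k)
    reducer = np.max if mode == "max" else np.mean
    return reducer(patches, axis=(1, 3))

x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 10, 13, 14],
              [11, 12, 15, 16]])
print(pool2d(x, 2, "max"))  # [[ 4  8] [12 16]]
print(pool2d(x, 2, "avg"))  # [[ 2.5  6.5] [10.5 14.5]]
```

The same formula also reproduces standard network dimensions, e.g. a 7*7 kernel with stride 2 and padding 3 maps a 224-pixel side to 112.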
The Fully Connected (FC) layer is connected to all the neurons in the previous layer: each neuron in this layer takes the weighted sum of all the neurons of the previous layer as its value. Standard CNN networks use a combination of convolution and pooling layers, with a non-linear activation function applied after each convolutional layer and before the pooling layer to introduce non-linearity into the network. Many activation functions can be used, such as Sigmoid, Tanh, and ReLU. A flattening layer is usually placed before the FC layer in order to rearrange the final convolution results to be consistent with the FC layer.

Proposed transfer learning models
In the current study, three CNN architectures that are already pretrained are used. The three chosen models are ResNet50, ResNet101, and EfficientNetB3, because of their efficiency and high performance in image classification. Transfer learning is the concept of using previously trained models for a new problem that differs from the original one, as shown in Fig. 4-A. The ResNet and EfficientNet models are already trained on the ImageNet dataset; in this study, these models will be used for lung cancer prediction.
ResNet50 is another type of CNN that uses the residual units first introduced by He et al. (Fig 4-B). As a network goes deeper, the gradient shrinks, and after a certain depth it becomes very small or vanishes. In the ResNet architecture, the residual units provide skip connections that bypass two or more convolutional layers (3 in ResNet50), preventing the gradient from vanishing.
The architecture of ResNet50 consists of 50 layers, starting with a convolutional layer of 64 filters of size 7*7 with a stride of 2, followed by a max pooling layer (stride = 2) to reduce the spatial size. After that comes a block of three convolution layers (64 filters of size 1*1, 64 filters of size 3*3, and 256 filters of size 1*1), repeated 3 times. The next block (128 filters of size 1*1, 128 filters of size 3*3, and 512 filters of size 1*1) is repeated 4 times. The following block (256 filters of size 1*1, 256 filters of size 3*3, and 1024 filters of size 1*1) is repeated 6 times. The final block (512 filters of size 1*1, 512 filters of size 3*3, and 2048 filters of size 1*1) is repeated 3 times. ResNet50 ends with an average pooling layer followed by a fully connected layer of 1000 neurons (the final feature vector) with a "Softmax" activation function to classify the image into the corresponding class. ResNet101, on the other hand, has 101 layers, is trained on the ImageNet dataset, and includes 44.5 million training parameters.
Tan and Le 33 proposed the idea of EfficientNet, based on the CNN architecture and the concept of scaling all dimensions (depth, width, and resolution) using compound coefficients. As a result, they created a family of EfficientNet architectures with high accuracy and smaller size. EfficientNet proved its computational efficiency, exceeding previous models (ResNets, Xception, NasNet, Inception, etc.). Compound scaling (Fig 4-C) is used to uniformly scale the three dimensions of the network, allowing the model to adapt to the input size (the bigger the input size, the deeper the network).
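As a rough illustration of compound scaling, the rule can be written out with the coefficients reported in the EfficientNet paper (alpha = 1.2 for depth, beta = 1.1 for width, gamma = 1.15 for resolution). This is a sketch of the scaling arithmetic only, not code from the current study:

```python
# Coefficients from the EfficientNet paper (Tan & Le); phi is the compound coefficient.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution multipliers

def compound_scale(phi):
    """Return (depth, width, resolution) multipliers for a given compound coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi

# The constraint alpha * beta^2 * gamma^2 ~ 2 keeps FLOPs roughly doubling per unit of phi.
flops_factor = ALPHA * BETA ** 2 * GAMMA ** 2
print(round(flops_factor, 2))  # ~1.92, close to the target of 2

d, w, r = compound_scale(3)  # roughly the B3 regime
print(round(d, 2), round(w, 2), round(r, 2))
```

The single coefficient phi thus scales all three dimensions together, which is what lets the B0..B7 family trade compute for accuracy in a controlled way.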


Figure 4. Deep network main concepts: A) Transfer learning concept, B) Residual unit of the ResNet50 network, C) EfficientNet compound scaling concept

Dataset
The chosen dataset is the Chest CT-Scan images dataset available from Kaggle 60. The dataset consists of three separate folders: the training set, the validation set, and the test set, with a split ratio of 70% for training, 20% for validation, and 10% for testing. The dataset is used to classify lung cancer into different categories, which is the main challenge of this dataset. There are 4 classes: adenocarcinoma, large cell carcinoma, squamous cell carcinoma, and the normal case. Fig 5 displays examples of the training dataset from various categories and illustrates the similarity between them (such as adenocarcinoma and large cell carcinoma); distinguishing between those two types requires a robust classifier, which is the main reason for choosing deep learning. The similarity between classes is the main challenge of this dataset. Moreover, its size is small, and to address this problem, data augmentation will be used.

Figure 5. Sample training images: normal, adenocarcinoma, large cell carcinoma, and squamous cell carcinoma

Proposed architecture and parameter selection
The main steps of the lung cancer diagnosis system are described in Fig. 6. First, the lung CT image dataset is obtained. To be consistent with the input layer of the deep learning networks used, the training, validation, and test sets are pre-processed using several image processing steps, including RGB conversion and resizing to 224*224. The training set is also augmented: each lung CT image is rotated, flipped, and zoomed to obtain different versions of the same image. This step increases the number of training images and exposes the model to different variations of the same image, which can prevent overfitting and improve training. Flipping is applied horizontally, zooming uses a zoom range of 0.05, and rotation uses a rotation range of 0.05. The training and validation sets are then fed into three different models (ResNet50, ResNet101, and EfficientNetB3). These models were chosen for their efficiency in image classification tasks (EfficientNetB3, ResNet50, and ResNet101 are very common deep models, as Table 2 shows); EfficientNetB3 in particular is considered a low-computational deep model. The transfer learning approach is applied in order to retrain these pre-trained models on a specific problem (the lung cancer diagnosis problem). Transfer learning is applied with extra layers added to the deep network architecture.
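A minimal NumPy sketch of two of these augmentations (horizontal flip and a small zoom; rotation is omitted for brevity) might look as follows. The function name and the nearest-neighbor resize are illustrative choices, not the study's actual pipeline:

```python
import numpy as np

def augment(img, rng):
    """Randomly flip horizontally, then zoom in by up to 5% (nearest-neighbor resize)."""
    out = img
    if rng.random() < 0.5:
        out = np.fliplr(out)                  # horizontal flip
    zoom = 1.0 + rng.uniform(0.0, 0.05)       # zoom range of 0.05, as in the text
    h, w = out.shape
    ch, cw = int(h / zoom), int(w / zoom)     # central crop...
    top, left = (h - ch) // 2, (w - cw) // 2
    crop = out[top:top + ch, left:left + cw]
    rows = np.arange(h) * ch // h             # ...then nearest-neighbor resize back up
    cols = np.arange(w) * cw // w
    return crop[np.ix_(rows, cols)]

rng = np.random.default_rng(0)
img = np.arange(224 * 224, dtype=float).reshape(224, 224)
aug = augment(img, rng)
print(aug.shape)  # (224, 224)
```

Note that the augmented copies keep the 224*224 input size expected by the networks; in practice a framework utility (e.g. a Keras image data generator) would perform these transformations on the fly during training.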
The architecture of the three proposed deep learning models includes the following layers:
1. The base model (one of ResNet50, ResNet101, or EfficientNetB3).
2. A batch normalization layer.
3. A dense (fully connected) layer with 256 neurons and a "ReLU" activation function.
4. A dropout layer with a dropout rate of 35%.
5. A classification (dense) layer with 4 neurons representing the targets and a "Softmax" activation function.

The selected training parameters are:
- All models are compiled using the Adam optimizer (learning rate of 0.01).
- The categorical cross-entropy loss function is used (since the problem is a multi-class classification problem).
- Accuracy is chosen as the performance metric.
- The batch size is 50.
- The patience factor is 5 (the number of epochs to wait before stopping training if the monitored metric, validation accuracy, does not improve).
- The reduction factor for the learning rate is 0.5.
The performance evaluation process includes computing the training accuracy, validation accuracy, test accuracy, training loss, validation loss, test loss, training time per epoch, precision, recall, and F1-score.
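Precision, recall, and F1-score per class can be computed directly from a confusion matrix. The sketch below uses made-up labels for illustration, not the study's results:

```python
import numpy as np

def per_class_metrics(y_true, y_pred, n_classes):
    """Per-class precision, recall and F1-score from integer label sequences."""
    cm = np.zeros((n_classes, n_classes), dtype=int)  # rows: true, cols: predicted
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)    # TP / predicted positives
    recall = tp / np.maximum(cm.sum(axis=1), 1)       # TP / actual positives
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return precision, recall, f1

# Toy labels for the four classes (0=adeno, 1=large cell, 2=squamous, 3=normal).
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 3, 3, 3]
p, r, f = per_class_metrics(y_true, y_pred, 4)
print(np.round(p, 2), np.round(r, 2), np.round(f, 2))
```

In practice a library routine (e.g. scikit-learn's classification report) would compute the same quantities; the point here is only how precision and recall differ per class.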
After training the three different models, transfer ensemble learning is used to fuse the trained models together in order to get the best performance of all models. The stacking method is used, and the performance of the resulting deep ensemble models is evaluated.
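Score-level fusion amounts to combining the class-probability vectors of the individual models, for example by averaging; in stacking, a meta-classifier is trained on the concatenated scores instead of a fixed rule. A toy NumPy sketch with made-up scores (not the study's outputs):

```python
import numpy as np

# Hypothetical per-model class probabilities for two test images (4 classes each).
scores_resnet50  = np.array([[0.70, 0.10, 0.10, 0.10], [0.25, 0.25, 0.40, 0.10]])
scores_resnet101 = np.array([[0.60, 0.20, 0.10, 0.10], [0.10, 0.20, 0.60, 0.10]])
scores_effnetb3  = np.array([[0.80, 0.05, 0.05, 0.10], [0.20, 0.10, 0.65, 0.05]])

# Score-level fusion: average the class probabilities of the three models,
# then take the class with the highest fused score.
fused = (scores_resnet50 + scores_resnet101 + scores_effnetb3) / 3
pred = fused.argmax(axis=1)
print(pred)  # [0 2]
```

For stacking, the three score vectors per image would instead be concatenated into a 12-dimensional feature and fed to a trained meta-learner, which can weight the models unevenly where averaging cannot.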

Proposed training scenarios
The previous experimental part leads to the following training and evaluation scenarios:
1. Train the ResNet50-Dense-Dropout model using the training set and evaluate it using the validation set.
2. Test the trained ResNet50-Dense-Dropout model using the test set and the evaluation metrics.
3. Train the ResNet101-Dense-Dropout model using the training set and evaluate it using the validation set.
4. Test the trained ResNet101-Dense-Dropout model using the test set and the evaluation metrics.
5. Train the EfficientNetB3-Dense-Dropout model using the training set and evaluate it using the validation set.
6. Test the trained EfficientNetB3-Dense-Dropout model using the test set and the evaluation metrics.
7. Apply score-level fusion of the three trained models and evaluate the fused model.
8. Test the trained ensemble model using the test set and the evaluation metrics.

Experimental results
All models are trained according to the previous scenarios. The training and validation accuracy, along with the training and validation loss, is computed through the training epochs, and the best validation value for each scenario is recorded. Table 4 shows that the EfficientNetB3-Dense-Dropout model achieves the best results, with 94% average accuracy. Using the score-level fusion of all models increased the precision value by 1%, while the recall and F1-score remained the same. Table 4 also shows that the best class precision corresponds to the "Normal" class; the best recall value is related to the "Squamous" class, while the "Normal" class achieves the best F1-score. In all practical scenarios, the performance of ResNet101 is better than the corresponding performance of ResNet50. Table 5 shows that the ensemble model has the best validation accuracy (99.44%), an enhancement of 6.44% over the EfficientNetB3 and ResNet101 models and of 18.44% over ResNet50. The EfficientNetB3 and ResNet101 models have similar validation accuracy, which is also the case for the fused model; however, the ensemble of all these individual models achieves a validation accuracy of 99.44%. Table 6 compares the proposed methods with the related work; the comparison demonstrates the high performance and efficiency of the current system relative to other state-of-the-art studies.

Limitations
Despite the improvement in lung cancer prediction provided by the current study, there are some limitations, including the small data size and the use of specific pretrained models. In addition, lung images need some preprocessing steps, such as image segmentation, in order to extract the region of interest (ROI), i.e., the lung tissues.

Conclusion
The current research introduced theoretical and practical studies of cancer-related deep-learning methodologies. The theoretical part introduces an analysis and comparative study of the previous deep-learning cancer-related research. Many different types of cancer are also considered (lung, breast, colon, stomach, brain, skin, and so on). The study also compares different types of cancer datasets and their contributions to cancer research. Cancer prediction, cancer prevention, cancer diagnosis, cancer classification, and many other applications of deep learning models are also studied and discussed.
To address the problem of the low accuracy of current lung cancer prediction systems, a new ensemble transfer learning and score-level fusion of three powerful deep learning architectures was implemented and tested. Ensemble learning was chosen in order to improve the performance of lung cancer prediction systems. A multi-class dataset with four different classes was selected in order to make the trained model more reliable.
In the first step, the CT lung image dataset was pre-processed and the data augmentation process was applied in order to increase the dataset. Then, three different deep learning architectures were designed based on the EfficientNetB3, ResNet50 and ResNet101 models. The dense layers, dropout layers and classification layers were also added to each individual model. The training set (70% of the entire data) and the validation set (20% of the entire data) were used to train and validate the three models. After that, the score level fusion was used to fuse the decisions of the three models. Finally, an ensemble of the three models was built and trained using the stacking methodology.
Experiments show that ResNet101 and EfficientNetB3 have similar performance in all training scenarios, whereas ResNet50 has lower accuracy. The score-level fusion increased the accuracy for some lung cancer classes, but the overall accuracy was almost the same as EfficientNetB3's. Ensemble learning increased the accuracy by 6.44%.
Future studies can benefit from the theoretical comparison of the cancer deep models. This information can be used as a guide for future studies. The practical study can also be used by physicians as a medical support tool for the prediction of lung cancer based on CT scan images.
The current study's main limitation is the small data size. Lung images need some preprocessing steps to extract the ROI of lung tissues.
In future work, other deep learning models can be used in the same ensemble, and their performances can be compared. The next study can also focus on increasing the data size and comparing the current methodology to other types of cancer. Other future work can focus on applying some preprocessing steps like image segmentation in order to concentrate the deep learning on the effective parts of images and not the entire image.