Classification of Diseases in Oil Palm Leaves Using the GoogLeNet Model

The general health of palm trees, encompassing the roots, stems


Introduction
The oil palm industry plays an essential role in the economy. Its main product is a fundamental ingredient in cooking oil, and its exports improve communal welfare by creating employment opportunities. In 2016, Indonesia exported 24.15 million tons of crude palm oil (CPO) worth USD 14,744 million 1, surpassing the value of all other exported commodities. The oil palm plantation sector provides livelihoods for approximately 16.2 million citizens, for whom it is a primary source of income. Sustaining the quality and quantity of palm products is therefore important, as they contribute significantly to the country's GDP 3.
In the oil palm industry, effective plant management is crucial for increasing productivity and income. However, controlling pests and diseases remains a conspicuous challenge in cultivation. Diseases impede the growth and development of palm trees, leading to decreased productivity. Recent research by Satia, Firmansyah, and Umami has highlighted the importance of shrubs in regions where palm trees are grown. Additionally, palm trees thrive in tropical climates with consistent precipitation, as annual rainfall patterns influence their growth and fruit yield 4. Disease in oil palm trees manifests in three distinct forms depending on the location of the symptoms: the roots, the basal stem, or the leaves. This research focuses on diseases that predominantly affect the leaves. The basal stem nevertheless deserves attention because basal stem rot can inflict substantial damage on the plants, and it can be detected using image processing techniques 5. In the early stages of infestation, signs usually appear on the leaves, where fungi, parasites, or viruses incubate before outward symptoms become evident 6. Photosynthesis, a crucial process determining palm productivity, occurs mainly in the leaves. However, these leaves are susceptible to pests and disruptive organisms, which fundamentally compromise productivity 7. Tree health is essential for maximal yield, since diseases often impede oil production. Diseases can infect oil palm trees at any developmental stage, but they are most commonly recognized in mature plants 6. Sustaining high yield therefore requires adequate plant maintenance and disease control 8. Diseases on leaf surfaces can reduce oil palm fruit production and cause economic losses, yet a substantial proportion of farmers lack adequate awareness of prevalent diseases and their mitigation strategies 9. Plant disease identification generally relies on visual symptoms recognized by agricultural experts, which enables effective treatment. Novel field-based diagnostic techniques are therefore urgently needed in locations where experts are not readily available 10. Although farmers have access to various information about oil palm leaf diseases, their direct understanding remains confined to diseases manifesting within the plant 11,12. This research aimed to reduce identification errors by using image-based analysis to detect diseases or pests affecting the leaves. It also builds on earlier work on expert systems that detect disease from visual symptoms and data input, such as the agricultural Expert System for Identifying Diseases of Oil Palm Plants 12. Combining image processing with the Support Vector Machine (SVM) method has been reported as a high-precision solution for identifying diseases in oil palm leaves, providing both diagnoses and control strategies; this approach captures leaf images, after which the system recognizes patterns learned from training data 13.
In a related study, a Convolutional Neural Network (CNN) was used to classify diseases in oil palm leaves, but the results did not consistently reach the anticipated accuracy 14. This motivates a refined model for accurately detecting diseases in oil palm trees. Among the many deep learning models, GoogLeNet, a CNN architecture developed by Google, has emerged as a promising option. Trained on more than a million images, GoogLeNet won the ILSVRC competition in 2014 15, and in previous work on leaf plant recognition and disease detection it achieved an accuracy of 99.35% 16. Foliar diseases in oil palm trees can be categorized through image-based analysis using the GoogLeNet architecture with carefully selected hyperparameters. Accordingly, this research aimed to determine the best-performing model for classifying captured foliar images, enabling the recognition of diseases affecting oil palm leaves based on their distinctive textures.
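GoogLeNet's main building block is the Inception module, which applies 1×1, 3×3 and 5×5 convolutions and a max-pooling branch to the same input in parallel and concatenates their outputs along the channel axis. The sketch below shows only this shape bookkeeping, not a trainable network. The branch widths are those listed for Inception modules 3a and 3b in the original GoogLeNet paper; the class and function names are our own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InceptionBranches:
    """Output channels of the four parallel branches of one Inception module."""
    conv1x1: int    # 1x1 convolution
    conv3x3: int    # 1x1 reduction, then 3x3 convolution
    conv5x5: int    # 1x1 reduction, then 5x5 convolution
    pool_proj: int  # 3x3 max pooling, then 1x1 projection


def inception_output_shape(height, width, branches):
    """Return (height, width, channels) after concatenating all branches.

    Every branch uses stride 1 with 'same' padding, so the spatial size is
    unchanged and only the channel counts add up.
    """
    channels = (branches.conv1x1 + branches.conv3x3
                + branches.conv5x5 + branches.pool_proj)
    return height, width, channels


# Branch widths of Inception modules 3a and 3b in the original GoogLeNet.
inception_3a = InceptionBranches(conv1x1=64, conv3x3=128, conv5x5=32, pool_proj=32)
inception_3b = InceptionBranches(conv1x1=128, conv3x3=192, conv5x5=96, pool_proj=64)

shape_3a = inception_output_shape(28, 28, inception_3a)
shape_3b = inception_output_shape(shape_3a[0], shape_3a[1], inception_3b)
print(shape_3a, shape_3b)  # (28, 28, 256) (28, 28, 480)
```

Because the branches run side by side rather than in sequence, the module widens the feature map (256, then 480 channels) without shrinking it spatially; the 1×1 reductions keep the 3×3 and 5×5 branches affordable.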

Research Architecture
The architectural model employed in this research is presented in Fig. 1.

Figure 1. Research Architecture
As shown in Fig 1, the research began with data collection, using the technique explained in the following section. The next step was pre-processing, which separated suitable from unsuitable images for use as samples. This step identified and eliminated images that did not meet the established criteria, ensuring that the samples were accurate and aligned with the research objectives.
Pre-processing may include data cleaning, noise or interference removal, contrast or brightness normalization, and image cropping or resizing to meet preset requirements. It also ensured that the images used were of high quality and relevant to the analysis. After the data were split into training and testing sets, the GoogLeNet architecture was used to build a deep learning model that classifies disease types in oil palm leaves according to the predefined requirements. The resulting model was then evaluated using Eqs 1-4.
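The pre-processing step can be sketched as a suitability filter followed by cropping and normalization. The text does not specify the criteria used, so the minimum side length of 224 pixels (GoogLeNet's usual input size), the brightness threshold, and the function names below are hypothetical. A real pipeline would operate on decoded image arrays rather than nested lists.

```python
def is_suitable(image, min_side=224, min_brightness=20):
    """Hypothetical rule: reject images that are too small or too dark.

    `image` is a single-channel grid of 8-bit intensities (list of rows).
    """
    height, width = len(image), len(image[0])
    brightness = sum(map(sum, image)) / (height * width)
    return min(height, width) >= min_side and brightness >= min_brightness


def center_crop(image, size):
    """Cut a size x size square from the centre of the grid."""
    top = (len(image) - size) // 2
    left = (len(image[0]) - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


def normalize(image):
    """Scale 8-bit intensities from 0-255 to the range [0, 1]."""
    return [[pixel / 255 for pixel in row] for row in image]


photo = [[128] * 300 for _ in range(240)]  # synthetic mid-grey 240 x 300 image
if is_suitable(photo):
    sample = normalize(center_crop(photo, 224))
    print(len(sample), len(sample[0]), round(sample[0][0], 3))  # 224 224 0.502
```

Filtering before cropping means undersized or badly exposed photographs are discarded rather than silently upscaled.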

Data Collection Technique
To collect data for this research, images were captured at a distance of 30 cm using the camera of a Vivo Y35 mobile phone. The images were obtained through direct observation, by photographing oil palm leaves in the main plantations at Dolok Baja, Tanah Jawa District, Simalungun Regency, North Sumatra Province. Insights from discussions with plant protection professionals at Pusat Penelitian Kelapa Sawit (PPKS) also informed the dataset. The collected data comprised healthy, bagworm-infested, and fire caterpillar-infested oil palm leaves. After the necessary materials were gathered, each leaf type was photographed; to capture each image at 30 cm, the leaf was first placed on a sheet of HVS paper. Various caterpillar species infest and damage oil palm plantations, including polyphagous bagworms such as Cremastopsyche pendula. Similarly, Metisa plana commonly infests oil palm as well as cocoa, sago, acacia, coffee, tea, and albizia. The caterpillars' sacs often directly cover the leaf surfaces 17,18. Fire caterpillar species such as Setothosea asigna, Setora nitens, Darna trima, Darna diducta, Darna bradleyi, Susica malayana, Birthosea bisura, Thosea vetusta, and Olona gater pose a substantial threat to young oil palm plantations by devouring their leaves 19.

Data Analysis
Oil palm leaves were grouped into three categories: healthy, bagworm-infested, and fire caterpillar-infested. After data collection, each category contained 410 samples, giving a total of 1,230 palm leaf images suitable for this research. The dataset was divided into training and testing data at a 70:30 ratio, as detailed in Table 1.
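The text gives only the 70:30 ratio. Assuming the split was made separately within each class (a stratified split), each class contributes 410 × 70 / 100 = 287 training and 123 test images, for 861 and 369 images in total. The sketch below illustrates such a split; the seed, the class keys and the file names are hypothetical.

```python
import random


def stratified_split(samples_by_class, train_percent=70, seed=42):
    """Split each class separately so train and test keep the same class balance."""
    rng = random.Random(seed)
    train, test = [], []
    for label, samples in sorted(samples_by_class.items()):
        shuffled = list(samples)
        rng.shuffle(shuffled)
        # Integer arithmetic for the cut point avoids floating-point rounding surprises.
        cut = len(shuffled) * train_percent // 100
        train += [(name, label) for name in shuffled[:cut]]
        test += [(name, label) for name in shuffled[cut:]]
    return train, test


classes = ["healthy", "bagworm", "fire_caterpillar"]
# Hypothetical file names: 410 images per class, as in the dataset described above.
dataset = {c: [f"{c}_{i:03d}.jpg" for i in range(410)] for c in classes}
train, test = stratified_split(dataset)
print(len(train), len(test))  # 861 369
```

Splitting per class keeps the three categories equally represented in both sets, which matters when the dataset is deliberately balanced.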

Hyperparameter Initialization
Hyperparameters are pivotal to optimizing deep learning models. Table 2 presents the hyperparameters applied in this research.
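Table 2 itself is not reproduced here. Judging from the Results and Conclusion, the varied hyperparameters appear to be batch size (32, 64), optimizer (Adam, RMSprop), learning rate (0.001, 0.009) and number of epochs (15, 25); this reading is our inference. A full grid over these values gives 2 × 2 × 2 × 2 = 16 runs, matching the 16 models M1 to M16. The run identifiers below are hypothetical and do not reproduce the paper's M-numbering.

```python
from itertools import product

# Values inferred from the text; Table 2 may also list fixed settings.
GRID = {
    "batch_size": [32, 64],
    "optimizer": ["Adam", "RMSprop"],
    "learning_rate": [0.001, 0.009],
    "epochs": [15, 25],
}


def enumerate_runs(grid):
    """Yield (run_id, config) for every combination of the grid values."""
    keys = list(grid)
    for i, values in enumerate(product(*(grid[k] for k in keys)), start=1):
        yield f"run{i:02d}", dict(zip(keys, values))


runs = list(enumerate_runs(GRID))
print(len(runs))  # 16
```

Each configuration would then be trained and evaluated once, producing one row of the results table.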

Performance Measures
The confusion matrix is a fundamental tool for assessing the accuracy of a predictive model. It compares the actual classes of the input data with the classes predicted by the model, as shown in Fig 3 21,22.
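To make the row and column convention concrete, the following sketch builds a three-class confusion matrix from lists of actual and predicted labels. Rows are actual classes and columns are predicted classes, which is one common convention; the orientation used in Fig 3 is not stated, and the class names and toy labels are hypothetical.

```python
CLASSES = ["healthy", "bagworm", "fire_caterpillar"]


def confusion_matrix(actual, predicted, classes=CLASSES):
    """Count samples: matrix[i][j] = actual class i predicted as class j."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


actual = ["healthy", "healthy", "bagworm", "fire_caterpillar", "bagworm"]
predicted = ["healthy", "bagworm", "bagworm", "fire_caterpillar", "bagworm"]
cm = confusion_matrix(actual, predicted)
for row in cm:
    print(row)
# [1, 1, 0]
# [0, 2, 0]
# [0, 0, 1]
```

Correct predictions fall on the diagonal; the off-diagonal 1 in the first row records a healthy leaf mistaken for a bagworm-infested one.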
Accuracy is the proportion of all predictions that are correct 23. Precision is the proportion of positive predictions that are actually positive, often expressed as a percentage 24. Recall is the proportion of actual positive cases that the model correctly identifies 25. Precision and recall are combined into the F1-score, their harmonic mean, to give a single balanced measure. These metrics are calculated using Eqs 1-4, where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively [26][27][28].

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (1)$$

$$\text{Precision} = \frac{TP}{TP + FP} \quad (2)$$

$$\text{Recall} = \frac{TP}{TP + FN} \quad (3)$$

$$F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \quad (4)$$

With a batch size of 32, the highest accuracy was 0.932249 (M4) and the lowest was 0.558266 (M3). With a batch size of 64, the highest accuracy was 0.831978 (M16) and the lowest was 0.349593 (M7). In terms of optimizer selection, RMSprop proved more reliable than Adam: the best accuracy obtained with Adam was 0.883469 (M11) and the lowest was 0.349593 (M7), whereas RMSprop reached a maximum of 0.932249 (M4) and a minimum of 0.417344 (M6). Of the two learning rates, 0.001 and 0.009, the latter produced better results, reaching 0.932249 (M4) at 0.009 compared with 0.742547 (M10) at 0.001.
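For a three-class task, precision, recall and F1 are computed per class by treating that class as positive and the other two as negative, while overall accuracy reduces to the share of samples on the matrix diagonal. The sketch below applies this to a small hypothetical 3×3 matrix. The text does not say how per-class scores were averaged, so the macro average shown is an assumption.

```python
def metrics_from_confusion(matrix):
    """Return overall accuracy and per-class (precision, recall, f1).

    matrix[i][j] counts samples of actual class i predicted as class j.
    """
    n = len(matrix)
    total = sum(map(sum, matrix))
    accuracy = sum(matrix[i][i] for i in range(n)) / total
    per_class = []
    for k in range(n):
        tp = matrix[k][k]
        fp = sum(matrix[i][k] for i in range(n)) - tp  # predicted k, actually another class
        fn = sum(matrix[k]) - tp                       # actually k, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class.append((precision, recall, f1))
    return accuracy, per_class


cm = [[1, 1, 0],   # hypothetical counts: rows = actual, columns = predicted
      [0, 2, 0],
      [0, 0, 1]]
acc, scores = metrics_from_confusion(cm)
macro_f1 = sum(f1 for _, _, f1 in scores) / len(scores)  # assumed macro averaging
print(acc)                 # 0.8
print(round(macro_f1, 3))  # 0.822
```

Per-class scores reveal which category the model confuses most, something a single accuracy figure hides.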

Discussion
In this research, computational time was an important consideration in model creation; the GoogLeNet models were trained on the free version of Google Colab with a GPU runtime. Among the experimental models, M3, M6, and M8 had the shortest computational time at 2 minutes. However, these three models yielded unsatisfactory accuracy, the highest being 0.558266 for M3. Across the 16 experiments, computational time stayed within 10 minutes, mostly ranging from 2 to 7 minutes. M5 and M11 required longer computation than the other models, and M11 achieved the higher accuracy of the two at 0.883469. M12 and M15, each computed in 6 minutes, reached an accuracy of 0.775068, which was good but not sufficient. The models computed in 5 minutes (M7, M10, and M14) performed similarly, with a best accuracy of 0.742547. M9, M13, and M16 required only 4 minutes each, and GoogLeNet performed satisfactorily in this group, with a maximum accuracy of 0.831978 in M16. M4 delivered the highest accuracy of all experiments, 0.932249, in 3 minutes.
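The Discussion weighs accuracy first and computational time second. A minimal sketch of that selection rule follows. It uses only the runs whose accuracy and training time are both stated in the text; M11 is omitted because its exact time is not given. The ranking function is our own illustration, not a procedure described by the authors.

```python
# (accuracy, minutes) for runs whose values are both reported in the text.
RUNS = {
    "M3": (0.558266, 2),
    "M4": (0.932249, 3),
    "M6": (0.417344, 2),
    "M7": (0.349593, 5),
    "M10": (0.742547, 5),
    "M12": (0.775068, 6),
    "M15": (0.775068, 6),
    "M16": (0.831978, 4),
}


def rank_runs(runs):
    """Sort by accuracy (descending), breaking ties by training time (ascending)."""
    return sorted(runs, key=lambda name: (-runs[name][0], runs[name][1]))


ranking = rank_runs(RUNS)
print(ranking[:3])  # ['M4', 'M16', 'M12']
```

Sorting on the tuple (negative accuracy, minutes) lets time matter only when accuracies tie; Python's sort is stable, so M12 and M15, which tie on both, keep their listed order.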

Conclusion
In conclusion, this research provided new insights into the role of the Adam optimizer, which combines ideas from RMSprop and stochastic gradient descent with momentum. The observations indicated that Adam did not perform better than RMSprop. In these experiments, computational time was treated as secondary to achieving optimal accuracy. The number of training epochs significantly influenced the conditions needed for maximum accuracy during the training and validation stages, and training for 15 epochs produced better performance than training for 25 epochs. The combination of a batch size of 32, the RMSprop optimizer, and a learning rate of 0.009, trained for 15 epochs, was identified as the optimal model. This research revealed the critical role of hyperparameters in optimizing deep learning model performance and emphasized the importance of selecting the appropriate combination to achieve high accuracy within efficient training times. These results could provide valuable guidance for future research on deep learning models that classify diseases and pests in oil palm leaves, improving overall performance and efficiency. Subsequent investigations might expand and refine the model to classify a broader range of diseases and pests. Incorporating data from various geographical locations and agricultural settings would likely enhance the model's generalization capability. Transfer learning methods and other techniques for improving accuracy could also be explored. Future research might also implement the model in the field and adapt it to the specific needs of farmers and oil palm plantation managers.
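The statement that Adam combines RMSprop with momentum can be made concrete with the textbook update rules. The sketch below applies one-parameter versions of both optimizers to the toy objective f(x) = x², using the learning rate of 0.009 from the best model. The decay constants (rho, beta1, beta2) and epsilon are common default-like values assumed here, not settings reported in this research, and the code is not tied to any particular framework's implementation.

```python
import math


def rmsprop_step(x, grad, state, lr=0.009, rho=0.9, eps=1e-7):
    """One RMSprop update: divide the step by a running RMS of past gradients."""
    state["v"] = rho * state.get("v", 0.0) + (1 - rho) * grad ** 2
    return x - lr * grad / (math.sqrt(state["v"]) + eps)


def adam_step(x, grad, state, lr=0.009, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update: momentum (first moment) plus RMSprop-style scaling (second moment)."""
    t = state["t"] = state.get("t", 0) + 1
    state["m"] = beta1 * state.get("m", 0.0) + (1 - beta1) * grad
    state["v"] = beta2 * state.get("v", 0.0) + (1 - beta2) * grad ** 2
    m_hat = state["m"] / (1 - beta1 ** t)  # bias correction for zero-initialised moments
    v_hat = state["v"] / (1 - beta2 ** t)
    return x - lr * m_hat / (math.sqrt(v_hat) + eps)


def minimise(step, x=1.0, iterations=200):
    """Minimise f(x) = x**2, whose gradient is 2x, with the given update rule."""
    state = {}
    for _ in range(iterations):
        x = step(x, 2 * x, state)
    return x


print(minimise(rmsprop_step), minimise(adam_step))
```

Both rules move x from 1.0 toward the minimum at 0. Adam differs from RMSprop only by the momentum average `m` and the bias corrections; on a toy problem like this one the comparison says nothing about which optimizer suits GoogLeNet training.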

Figure 7. Confusion Matrices of the 16 Experiments
Fig 7 shows the confusion matrices of the 16 experiments. The confusion matrix, a tool for assessing the performance of classification models, was used here to evaluate each experiment and provided a comprehensive assessment of how well GoogLeNet classified the data. For comparison, Asrianda et al. 14 evaluated a CNN model that classified palm leaf diseases into six types: Curvularia sp., Cochliobolus carbonum, Capnodium sp., Drechslera, nutrient deficiency, and healthy leaves. Their dataset consisted of 60 samples, 10 for each type, and the reported accuracy was relatively low at approximately 69%. In contrast, this research successfully classified pest-infested oil palm leaves with an accuracy of 93.22%, which is of significant practical relevance to oil palm plantations. Comprehensive data preprocessing, including data augmentation, was also conducted to improve the quality and diversity of the dataset, and the GPU runtime on Google Colab accelerated model training. Hyperparameters of the GoogLeNet model, such as learning rate, activation function, and batch size, were carefully varied to determine the combination that yielded the best results. A balanced representation of each disease class and of healthy leaves in the dataset was essential to prevent bias and ensure accurate classification. The results demonstrate that the model can accurately identify and classify pest-infested oil palm leaves, positioning it as a potential tool to support pest management in oil palm plantations. Its ability to classify various diseases and pests could help farmers and plantation managers promptly address plant health issues and improve productivity.
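The text mentions data augmentation without saying which transformations were applied. As an illustration only, the sketch below generates three geometric variants of a leaf image (horizontal flip, vertical flip, 90° clockwise rotation) on a nested-list pixel grid. These transforms and the function names are assumptions; in practice, augmentation is normally applied to the training split only, so that test images stay untouched.

```python
def hflip(image):
    """Mirror a 2-D pixel grid left to right."""
    return [list(reversed(row)) for row in image]


def vflip(image):
    """Mirror a 2-D pixel grid top to bottom."""
    return [list(row) for row in reversed(image)]


def rot90(image):
    """Rotate a 2-D pixel grid 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment(image):
    """Return the original image plus three simple geometric variants."""
    return [image, hflip(image), vflip(image), rot90(image)]


leaf = [[1, 2, 3],
        [4, 5, 6]]  # tiny stand-in for a leaf image
for variant in augment(leaf):
    print(variant)
```

Leaf damage patterns have no fixed orientation, so flipped and rotated copies are plausible new samples that enlarge the training set without further photography.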

The hyperparameter values were combined as listed in Table 3, yielding 16 models, each with different results.

Table 3. Experiment Results of the GoogLeNet Model with Hyperparameter Combinations
Conflicts of Interest: None.
We hereby confirm that all the Figures and Tables in the manuscript are ours. Furthermore, any Figures and images that are not ours have been included with the necessary permission for republication, which is attached to the manuscript.