An Effective Hybrid Deep Neural Network for Arabic Fake News Detection

Abstract: The spread of fake news and misinformation across most fields has recently drawn wide attention in societies. Detecting misleading information manually is tedious, time-consuming, and impractical, so it is necessary to rely on artificial intelligence to address this problem. This study uses deep learning techniques to detect Arabic fake news based on an Arabic dataset called the AraNews dataset, which contains news articles covering multiple fields such as politics, economy, culture, sports and others. A Hybrid Deep Neural Network is proposed to improve accuracy. The network combines the properties of the Text-Convolution Neural Network (Text-CNN) and the Long Short-Term Memory (LSTM) architecture to produce an efficient hybrid model: Text-CNN identifies the relevant features, whereas the LSTM handles the long-term dependencies of the sequence. The results showed that the proposed model outperformed both the Text-CNN and the LSTM when each was trained individually. Accuracy was used as the measure of model quality: the Hybrid Deep Neural Network reached an accuracy of 0.914, while the Text-CNN and the LSTM reached 0.859 and 0.878, respectively. Moreover, the results of the proposed model are better than those of previous work that used the same dataset (AraNews dataset).


Introduction:
Information or claims that have been verified as incorrect are called fake news. This phenomenon is a serious problem because it spreads rapidly and thus threatens societal peace [1]. In recent years, addressing fake news (misinformation) and reducing its harm has attracted the attention of many researchers, who employ artificial intelligence techniques [2]. Although many studies have been conducted to identify English fake news, detecting Arabic fake news remains rather underdeveloped. This is due to the lack of available datasets and the difficulty of dealing with the challenges of the Arabic language [3].
Predicting the probability that an article, story, or publication is intentionally misleading is called Fake News Detection (FND). The detection of misinformation is one of the primary concerns of the Natural Language Processing (NLP) research community [4]. Within NLP, the news article is analyzed and classified as false or true, or it is classified according to the nature of the dataset (following the concept of text classification) [5]. In addition, many studies rely on user features (writer-based features) and content features (content-based features) in the news article classification process [6].
Because fake news is a source of concern to individuals, communities, and governments, it is constantly discussed. Before the Internet era, such news was transmitted and distributed through newspapers and magazines, with a focus on sensational topics such as rumors, crime, events, and others [7]. With the advent of the Internet, the threat posed by this phenomenon has become greater, as people intentionally or unintentionally tend to spread false news [8]. The original propagators of false news, however, are the ones who seek to target the innocent [9]. For all of these reasons, and to reduce the problem of fake news, it is necessary to detect fake news at an early stage. Unlike previous studies that use traditional machine learning methods to detect Arabic fake news, this study focuses on deep learning techniques to predict Arabic misinformation. The proposed Deep Neural Network uses the Text-CNN architecture to extract relevant features and then models the text sequence through the LSTM architecture. The results showed that the proposed model performs better than previous studies and other deep learning methods. Briefly, the contributions of this study are as follows:
1. Employing deep learning techniques instead of traditional machine learning methods to predict Arabic fake news.
2. Building a Hybrid Deep Neural Network based on the characteristics of both CNN and LSTM.
3. Improving the accuracy of predicting Arabic fake news compared to previous studies which used the same dataset.

Paper Outline:
This study is organized as follows: Section 2 reviews the works related to fake news detection. Section 3 presents the theoretical background for the techniques used. Section 4 explains the proposed method and introduces the details of the experiments conducted in this work. The results and discussion are presented in Section 5. The conclusions are stated in Section 6.

Related Work:
Although many contributions have been made toward building artificial intelligence systems that automatically detect fake news, only a few of them address Arabic fake news. Moreover, applying deep learning techniques to Arabic fake news has been difficult because there has not been enough data to train the network. Recently, a large dataset called the AraNews dataset was released. Examination of this dataset shows that it meets the needs of deep learning techniques, which require large datasets.
In [1], the authors used the AraNews dataset to build their models. The Term Frequency-Inverse Document Frequency (TF-IDF) technique was applied to extract word vectors or features. Next, a Random Forest Classifier, Naive Bayes, and Logistic Regression were applied to predict fake news. The Random Forest Classifier achieved the best accuracy, 0.866, whereas the accuracy of the other two models is 0.844 and 0.859, respectively. Compared with that study, which used the same dataset adopted here, our proposed model clearly achieves higher accuracy.
The authors in [3] used machine learning methods and Natural Language Processing (NLP) to identify Arabic misinformation. In that study, nearly 1,862 tweets were collected from the Twitter platform. The prediction process depends on user-based features, text-based features, and the content of the tweet. One of the strong points of this work is that it did not rely only on the content of the text, but also dealt with other important aspects such as the characteristics of the writer or user. However, the results were not promising because the dataset was not large enough.
As for the work in [6], the researchers provided models to identify untrusted tweets using artificial intelligence techniques. The dataset was collected from the Twitter platform and then labeled manually. Finally, a set of machine learning methods was applied to predict fake news. Due to the lack of training data, the models did not achieve the required accuracy.
In [7], deep learning techniques and transfer learning (a pre-trained model) were applied to detect fake news. The study made use of an English dataset that is large enough to build deep learning models efficiently, as it includes many topics such as politics, economics, culture, and others. Given the characteristics of this dataset, the work achieved promising results using an LSTM neural network and Global Vectors for Word Representation (GloVe).
The study in [8] addresses fake COVID-19 news released on social media platforms, which is analyzed with deep learning techniques. A set of deep learning algorithms was applied to a dataset consisting of 10,700 news articles that were collected and labeled from Twitter. The results showed that fine-tuning BERT (Bidirectional Encoder Representations from Transformers) outperformed the other models and also exceeded other research works that used the same dataset in terms of accuracy.

Theoretical Background:

Dataset
To distinguish true news from fake news with deep learning techniques, a sufficiently large dataset is necessary. The dataset used in this study is called the AraNews dataset and is available at kaggle.com [1]. The AraNews dataset is a large, general Arabic fake news dataset that was collected from many newspapers on many topics. It was gathered from 15 Arab countries, as well as the United States of America and the United Kingdom [10]. No additional features were used in this dataset, and therefore NLP was performed on the textual content of each article. Some common words in the dataset are shown in Fig. 1.

Figure 1. Word Clouds

Text Classification
Data mining applied to documents, articles, or texts is usually called text mining. Text classification is the primary task in text mining: it is the process of assigning each document or article to one of a set of pre-defined categories [11]. In other words, text categorization identifies the target class of a text (document, article, tweet, and others) using machine learning algorithms. Fig. 2 illustrates the framework of the text classification concept [12].

Text-Convolution Neural Network
The Convolution Neural Network (CNN) is a class of deep, feed-forward artificial neural networks in which connections between nodes do not form a cycle [13]. This network is generally used in computer vision and image processing; however, it has recently been applied to NLP tasks such as text classification and sentiment analysis [14]. As in image processing, the CNN extracts features from the text by applying kernels and then selects the most important features through a max-pooling process [15]. In brief, the Text-Convolution Neural Network (Text-CNN) uses the following steps to classify sequence data: 1) using pre-trained models such as GloVe to convert each word into a word feature vector; 2) defining kernels and performing the convolution process to obtain local features; 3) applying max-over-time pooling to find the most important features; and 4) using a fully connected layer to classify the text as fake or real.
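
Steps 2 and 3 above can be illustrated with a minimal sketch in plain Python. The embedding values, the kernel weights, and the function names are made up for illustration and are not taken from the study; no padding is assumed, so a kernel of width 3 over 5 tokens yields 3 local features.

```python
def relu(x):
    """Rectified Linear Unit: keep positive values, zero out the rest."""
    return x if x > 0.0 else 0.0


def conv1d_valid(embedded, kernel, bias=0.0):
    """Slide one kernel over a sequence of word vectors (no padding)."""
    width = len(kernel)  # number of consecutive words the kernel covers
    feature_map = []
    for start in range(len(embedded) - width + 1):
        window = embedded[start:start + width]
        total = bias
        for word_vec, kernel_row in zip(window, kernel):
            total += sum(w * k for w, k in zip(word_vec, kernel_row))
        feature_map.append(relu(total))
    return feature_map


def max_over_time(feature_map):
    """Keep only the strongest response of the kernel over the whole text."""
    return max(feature_map)


# Toy input: 5 tokens, each a 2-dimensional embedding (illustrative values).
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [2.0, 1.0]]
# One kernel spanning 3 words, i.e. 3 x 2 weights (illustrative values).
kernel = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]

feature_map = conv1d_valid(sentence, kernel)
pooled = max_over_time(feature_map)
print(feature_map, pooled)
```

With several kernels, each one contributes one pooled value, and the resulting vector is what the fully connected layer in step 4 would receive.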

Long Short-Term Memory (LSTM)
The Recurrent Neural Network (RNN) is one of the most important methods for processing sequential data, so it is commonly used in NLP. The network is called recurrent because the output of a node or neuron at the current time (t) is fed back to the node at the next time (t+1) [17]. An RNN suffers from several problems, such as the vanishing gradient, so many modifications have been made to it [18].
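Long Short-Term Memory (LSTM) is the most important improvement to the RNN and largely resolves the vanishing gradient problem [19]. This network handles long-distance sequences better and has therefore been more successful in NLP than the plain RNN. LSTM behavior depends on gates that decide what is kept and what is ignored from the sequence, as shown in Fig. 4 [20]. Each vector (network parameter) in the LSTM network is calculated according to Eq. 1, 2, 3, 4, 5, and 6, given here in the standard LSTM form [21]:

\[
\begin{aligned}
f_t &= \sigma\left(w_f\,[h_{t-1}, x_t] + b_f\right) && (1)\\
i_t &= \sigma\left(w_i\,[h_{t-1}, x_t] + b_i\right) && (2)\\
\tilde{c}_t &= \tanh\left(w_c\,[h_{t-1}, x_t] + b_c\right) && (3)\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && (4)\\
o_t &= \sigma\left(w_o\,[h_{t-1}, x_t] + b_o\right) && (5)\\
h_t &= o_t \odot \tanh(c_t) && (6)
\end{aligned}
\]

where $h_{t-1}$ and $h_t$ are hidden layer vectors, $x_t$ is the input vector, $f_t$ is the forget gate at time t, $i_t$ and $o_t$ are the input and output gates at time t, $c_t$ is the cell state, $w$ is a weight vector, $b$ is a bias term, tanh is the hyperbolic tangent activation function, and σ is the sigmoid activation function.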
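
As an illustration of how the gates interact, a single LSTM time step in the standard formulation can be sketched in plain Python. The function names, the parameter layout (a dictionary mapping each gate to a weight matrix and a bias), and the toy sizes are assumptions made for this sketch, not details from the study.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, vector, bias):
    """Return weights @ vector + bias for a list-of-rows matrix."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; params maps 'f', 'i', 'c', 'o' to (weights, bias)."""
    concat = h_prev + x_t  # concatenation [h_{t-1}, x_t]
    f_t = [sigmoid(z) for z in affine(*params["f"][:1], concat, params["f"][1])]
    i_t = [sigmoid(z) for z in affine(*params["i"][:1], concat, params["i"][1])]
    g_t = [math.tanh(z) for z in affine(*params["c"][:1], concat, params["c"][1])]
    o_t = [sigmoid(z) for z in affine(*params["o"][:1], concat, params["o"][1])]
    # The forget gate scales the old cell state; the input gate scales the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]
    # The output gate decides how much of the cell state becomes the hidden state.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# Toy sizes: hidden size 2, input size 3, all weights zero (illustrative only).
hidden, inputs = 2, 3
zero_w = [[0.0] * (hidden + inputs) for _ in range(hidden)]
zero_b = [0.0] * hidden
params = {name: (zero_w, zero_b) for name in ("f", "i", "c", "o")}

h_t, c_t = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [1.0, -2.0], params)
print(h_t, c_t)
```

With all-zero weights every gate evaluates to 0.5 and the candidate to 0, so the cell state is simply halved; trained weights would instead let the gates keep or discard information depending on the input.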

Word Embedding
The main goal of word embedding is to convert each word in the text into a vector that serves as input to the deep neural network. Unlike traditional methods (e.g., one-hot encoding or the bag-of-words method), which do not include semantic meaning in the representation, word embedding is a learned representation of text in which words with similar meanings have similar representations [21]. A recent trend in NLP is the use of pre-trained models such as GloVe, fastText, BERT and others. These models are important because semantically similar words receive similar representations, which helps address the problem of small datasets. Fig. 5 shows the steps for obtaining the word vectors [22].
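
The contrast between one-hot encoding and word embedding can be seen in a small sketch. The three transliterated Arabic words and their 3-dimensional embedding values are hypothetical and chosen only to illustrate the idea; real embeddings are learned and much longer (100 dimensions in this study).

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# khabar and naba both mean "news"; kura means "ball".
vocab = ["khabar", "naba", "kura"]

# One-hot: every word gets its own axis, so all pairs look equally unrelated.
one_hot = {word: [1.0 if i == j else 0.0 for j in range(len(vocab))]
           for i, word in enumerate(vocab)}

# Hypothetical learned embeddings: the two synonyms point in a similar direction.
embedding = {
    "khabar": [0.9, 0.1, 0.0],
    "naba": [0.8, 0.2, 0.1],
    "kura": [0.0, 0.1, 0.9],
}

print(cosine(one_hot["khabar"], one_hot["naba"]))
print(cosine(embedding["khabar"], embedding["naba"]))
print(cosine(embedding["khabar"], embedding["kura"]))
```

The one-hot similarity between the synonyms is zero, while the embedding similarity is high for the synonyms and low for the unrelated word.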

Methodology:
In this study, the AraNews dataset is used to predict Arabic fake news. As shown in Fig. 6, which represents the methodology of this study, the prediction process is carried out in four stages. In the first stage, the dataset is prepared. The following steps are performed to clean the data: removing punctuation, deleting special characters, ignoring white space, deleting any article with fewer than 10 words, removing Arabic stop words, and applying Arabic stemming.
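
The cleaning steps of the first stage can be sketched as follows. This is only an illustration under stated assumptions: the stop-word set is a tiny hypothetical subset, and `light_stem` is a placeholder that strips only the definite article, standing in for whichever Arabic stemmer the study used (the text does not name one). Text with diacritics would need additional handling.

```python
import re

# Hypothetical subset; a real pipeline would use a full Arabic stop-word list.
STOP_WORDS = {"في", "من", "على", "عن"}


def light_stem(token):
    """Placeholder stemmer: strips only the definite article prefix."""
    if token.startswith("ال") and len(token) > 3:
        return token[2:]
    return token


def clean_article(text, min_words=10):
    """Return the cleaned token list, or None if the article is too short."""
    # Replace punctuation and special characters (including Arabic ones).
    text = re.sub(r"[^\w\s]", " ", text)
    # split() with no arguments also collapses repeated white space.
    tokens = text.split()
    if len(tokens) < min_words:
        return None
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [light_stem(t) for t in tokens]
```

The steps follow the order in which they are listed above, so the 10-word threshold is applied before stop-word removal; applying it afterwards would discard more articles.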
The second stage converts words into vectors (features) to be fed into the deep neural network. First, each article is divided into tokens, and the maximum article length is set to 50. Next, padding and truncating are performed so that all articles have the same size. Finally, word embedding is used to obtain a vector for each token, because this representation captures the semantic meaning of the word.
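
The tokenization, truncation, and padding of the second stage can be sketched in plain Python. The text does not say whether padding and truncation are applied at the start or the end of the sequence; this sketch assumes the end, reserves index 0 for padding, and drops unknown tokens. The function names are illustrative.

```python
MAX_LEN = 50  # maximum article length stated in the text
PAD_ID = 0    # assumption: index 0 is reserved for padding


def build_vocab(tokenized_articles):
    """Assign each distinct token an integer id, starting at 1."""
    vocab = {}
    for tokens in tokenized_articles:
        for token in tokens:
            if token not in vocab:
                vocab[token] = len(vocab) + 1
    return vocab


def encode(tokens, vocab, max_len=MAX_LEN):
    """Map tokens to ids, then truncate or zero-pad to exactly max_len."""
    ids = [vocab[t] for t in tokens if t in vocab]
    ids = ids[:max_len]                           # truncate long articles
    return ids + [PAD_ID] * (max_len - len(ids))  # pad short articles


articles = [["a", "b", "a"], ["c"] * 60]
vocab = build_vocab(articles)
encoded = [encode(tokens, vocab) for tokens in articles]
```

Each encoded article is then a fixed-length list of 50 ids, which the embedding layer turns into a 50 x 100 matrix.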
The third stage is to build the proposed network architecture. Two deep learning architectures are combined: Text-CNN and LSTM. The objective of the Text-CNN in the proposed model is to extract the most important features, which then serve as the input to the LSTM architecture. More details about this stage are given in the next subsection.
Finally, the Hybrid Deep Neural Network is evaluated for its quality in predicting fake news. The two networks from which the proposed method was built (Text-CNN and LSTM) are also evaluated separately to determine how effective the Hybrid Deep Neural Network is in comparison. Algorithm 1 summarizes all the details mentioned above.

Hybrid Deep Neural Network Construction
To improve predictions of fake news, the features of both Text-CNN and LSTM are combined. The first layer in the network is the embedding layer, through which each article is represented as a row of vectors. The maximum length of each article is set to 50 to ensure that the network receives a fixed length for all articles, and the embedding dimension (the length of the vector for each word) is set to 100. Articles that exceed 50 words are truncated, and articles shorter than 50 words are padded with zeros; the output of this layer is therefore a matrix of size 50 * 100.
Then, a convolutional layer was added to extract local features. This layer uses 32 kernels, each of size 3 (i.e., three words are considered in each convolution). Through these kernels and the Rectified Linear Unit (ReLU) activation function, the convolution layer produces many features, on which the max-pooling operation is then performed. The max-pooling layer divides the input tensors or features (from the convolution layer) into subtensors of dimension n and selects the highest value in each subtensor to create the most important features.
Traditionally, the input to an LSTM network is the vectors generated by the embedding layer. Here, instead, the most important features created by the max-pooling layer are fed into the LSTM to capture the long-term dependencies of the sequence. The number of neurons in the LSTM network must be compatible with the input, so it was set to 256.
Finally, three layers were added. The first is a fully connected layer consisting of 100 neurons with a hyperbolic tangent (Tanh) activation function. In this layer, L2 regularization (L2 = 0.02) is applied to reduce overfitting. The second is a dropout layer with a probability of 0.2, which, like regularization, reduces overfitting. The last layer is the output layer, which classifies the article as fake or not-fake using the sigmoid activation function.
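
To make the layer stack concrete, the following sketch traces the output shape and the number of trainable parameters through the network using only the sizes given above. It is a back-of-the-envelope illustration, not the study's implementation: it assumes no padding in the convolution, a pool size of 2 (the text does not state n), the standard LSTM parameter count, and that the LSTM passes only its last hidden state to the dense layer. The embedding layer is omitted because its size depends on the vocabulary, and dropout is omitted because it changes neither shape nor parameter count.

```python
# Values stated in the text.
SEQ_LEN, EMB_DIM = 50, 100
FILTERS, KERNEL_SIZE = 32, 3
LSTM_UNITS, DENSE_UNITS = 256, 100
# Assumption: the pool size n is not given in the text.
POOL_SIZE = 2


def conv1d(shape, filters, kernel_size):
    length, channels = shape
    out_len = length - kernel_size + 1           # no padding assumed
    params = kernel_size * channels * filters + filters
    return (out_len, filters), params


def max_pool1d(shape, pool_size):
    length, channels = shape
    return (length // pool_size, channels), 0    # pooling has no weights


def lstm(shape, units):
    _, features = shape
    # Four weight blocks: three gates plus the candidate state.
    params = 4 * ((features + units) * units + units)
    return (units,), params                      # last hidden state only


def dense(shape, units):
    (features,) = shape
    return (units,), features * units + units


summary = []
shape = (SEQ_LEN, EMB_DIM)                       # output of the embedding layer
for name, layer, args in [
    ("conv", conv1d, (FILTERS, KERNEL_SIZE)),
    ("max_pool", max_pool1d, (POOL_SIZE,)),
    ("lstm", lstm, (LSTM_UNITS,)),
    ("dense_tanh", dense, (DENSE_UNITS,)),
    ("output_sigmoid", dense, (1,)),
]:
    shape, params = layer(shape, *args)
    summary.append((name, shape, params))

for row in summary:
    print(row)
```

Under these assumptions the LSTM layer holds most of the trainable weights outside the embedding, which is why the regularization and dropout described above are placed after it.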

Results and Discussion:
To fairly evaluate the quality of the proposed system, the experiment was conducted on the AraNews dataset. This dataset consists of 20,300 articles labeled as fake or not-fake. After preprocessing, the dataset was reduced to 16,600 articles, of which 8,406 are fake and 8,194 are not-fake. The dataset was then randomly divided into a training set (approximately 0.8) and a testing set (approximately 0.2). Tokenization and padding were then applied to obtain an equally sized sequence for each article. Table 1 illustrates the details of the corpus.
The Hybrid Deep Neural Network was implemented in Keras. Seven layers were constructed: Embedding layer, Convolution layer, Max-pooling layer, LSTM layer, Dense (fully connected) layer, Dropout layer, and output layer. Three activation functions were employed: ReLU for the convolution layer, Tanh for the Dense layer, and Sigmoid for the output layer. The network was then trained for 10 epochs using the Adam optimizer with a batch size of 32. Table 2 reviews the main parameters and Table 3 shows a summary of the proposed Hybrid Deep Neural Network.
Both LSTM and Text-CNN were used for comparison because the Hybrid Network is a combination of both networks. The results showed that the proposed method outperformed the Text-CNN and LSTM networks in terms of accuracy, as shown in Table 4. To demonstrate the effectiveness of the proposed model, it is also necessary to compare it with previous studies. Few studies have been conducted on the AraNews dataset; however, compared to the study in [1], which used the same dataset, the proposed network achieves better prediction accuracy.
Finally, to display the results visually, Fig. 8 shows the convergence of the training accuracy and validation accuracy for the Hybrid Deep Neural Network, Text-CNN, and LSTM.
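
The random 0.8/0.2 split and the accuracy measure used in the evaluation can be sketched in plain Python. The seed, the 0.5 decision threshold on the sigmoid output, and the function names are assumptions for illustration; the text specifies only the split ratio and the metric.

```python
import random


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of the items and cut off the test portion."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true, y_prob, threshold=0.5):
    """Share of articles whose thresholded prediction matches the label."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    correct = sum(int(t == p) for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# 16,600 articles remain after preprocessing; indices stand in for articles.
train_ids, test_ids = train_test_split(range(16600))
print(len(train_ids), len(test_ids))

# Toy labels and sigmoid outputs (illustrative values only).
print(accuracy([1, 0, 1, 1], [0.9, 0.2, 0.4, 0.7]))
```

With 16,600 articles this split yields 13,280 training and 3,320 testing articles.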

Conclusion:
Rising fears about the spread of fake news, which poses a threat to the security of societies, have made it necessary to place greater emphasis on reducing this problem. With the development of technology, it is possible to rely on deep learning and NLP to detect fake news. This study focuses on Arabic fake news detection based on text features. A Hybrid Deep Neural Network is proposed that combines a Text-CNN architecture and an LSTM architecture. In the proposed model, the Text-CNN extracts the relevant features from the news article, whereas the LSTM captures the long-term dependencies of the sequence. The network was trained on an Arabic fake news dataset called the AraNews dataset. In the performance evaluation stage, the results showed that the prediction accuracy of the Hybrid Deep Neural Network is considerably better than that of the Text-CNN, the LSTM, and previous studies which used the same dataset. In addition, this study faced several limitations and challenges, among them that most Arabic news articles are written in colloquial language, the lack of resources, and others.
In future work, an attention architecture can be used to predict Arabic fake news. In addition, the next aim is to use word embedding methods or so-called pre-trained models such as GloVe, fastText,