A Crime Data Analysis of Prediction Based on Classification Approaches

Abstract: Crime is any unlawful activity that is punishable by law. Crime affects a society's quality of life and economic development. With crime rising sharply worldwide, crime data must be analyzed in order to bring the crime rate down. Such analysis encourages the police and the public to take the necessary measures and to curb crime more effectively. The purpose of this research is to develop predictive models that can aid crime pattern analysis and thus support the Boston police department's crime prevention efforts. Geographical location is adopted as a factor in our model because it is influential in many situations, whether traveling to a specific area or living in it, and it helps people distinguish between a safe and an unsafe environment. Geo-location, combined with new approaches and techniques, can be extremely useful in crime investigation. The work is a comparative study of three supervised learning algorithms, in which the dataset is split to train and test each model. Decision Tree, Naïve Bayes, and Logistic Regression classifiers are applied to the Boston crime dataset to predict the type of crime that occurs in an area. The outputs of these methods are compared to find the model that best fits this type of data. In the results obtained, the Decision Tree achieved the highest performance, ahead of Naïve Bayes and Logistic Regression.


Introduction:
Crime is an offense against society that is often prosecuted and punishable by law. Criminals are known to commit crimes in a variety of locations and in many different ways. All over the world, criminal activity poses a threat to society. Law enforcement authorities generate a huge volume of crime data each year, and it is a major challenge for researchers to find an effective model or technique for managing such complex data and supporting decisions that prevent potential crimes [1].
Geo-location services, combined with new approaches and techniques, can be extremely useful in crime investigation. They promote a more holistic approach to criminal investigation, mapping, proactive decision-making, and crime prevention [2]. Machine learning provides powerful techniques and algorithms for this purpose. It is the science of training machines to make decisions without explicit human instruction. Law enforcement uses machine learning to evaluate crime data better and to predict potential future events based on crime pattern recognition, where predictive analysis is a statistical method for creating models that forecast future events [3,4]. Typically, these predictive models are evaluated using a set of metrics. Speech recognition [5], industry [6], optical networks [7], and medicine [8] are all examples of areas where machine learning has been applied recently.
The crime rate in Boston has risen significantly in recent years, especially for property crimes such as burglary, theft, and carjacking. Boston is the most populous city in Massachusetts and is divided into several districts. The FBI's Uniform Crime Reports (UCR) are reported to rank it as the most dangerous city in the country [9]. This work is a comparative study of three supervised learning algorithms, Decision Tree, Logistic Regression, and Naive Bayes, which are compared on how well they predict the type of crime in an area. The predictions give the police an opportunity to take the necessary actions and are also intended to raise people's awareness of security in a given area.
This contribution helps obtain better results in terms of time and effort than the traditional manual methods followed in police stations, where the lack of information available to security personnel has allowed many crimes to occur. A rapid response to crime problems can provide citizens with a greater degree of safety.
The research addresses two questions: How can the crime type be predicted from the available data? What are the theoretical concepts behind the modeling methods applied in the field of crime prediction?

Related Work
Tahani Almanie et al. [10] focused on finding temporal and spatial crime hotspots using real-world crime datasets for the cities of Los Angeles and Denver. Identifying relationships between elements of crime significantly helps to predict likely hotspots at a given point in the future. The strategy therefore concentrated on three main elements of crime data: the type of crime, the time of the incident, and the location of the crime. The Apriori algorithm was applied to the datasets to find all frequent crime patterns regardless of the type of crime committed. The Naïve Bayesian classifier and the Decision Tree classifier were then used to build two separate classification models that forecast the likely type of crime in a particular place over a particular period in the future. The Naïve Bayesian classifier achieved an accuracy of 51 percent for Denver and 54 percent for Los Angeles. The Decision Tree classifier recorded lower prediction accuracy, with 42 percent for Denver and 43 percent for Los Angeles.
Félix Mata et al. [11] focused on designing mobile information systems for routing and urban planning in urban environments. Their work produces a hybrid solution that uses semantic analysis and classification algorithms to find safe routes based on data from social media and official police reports. The Bayes algorithm uses data submitted by the mobile application (origin and destination points) to return a path that avoids locations where crime has occurred.
Jakaria Rabbi et al. [12] used a linear regression model to predict potential crime patterns in Bangladesh. The crime dataset was compiled from various sources within the Bangladesh Police, and the linear regression model was trained on this real dataset. After training, the model was used to forecast robbery, murder, persecution of women and children, abduction, theft, and other crimes in the various regions of Bangladesh. This work helps Bangladesh's police and law enforcement agencies to anticipate, prevent, or address potential crime.
Bhavna Saini et al. [13] developed a module that offers an interactive image for navigating around the crime scene using Google Maps. It can help the analyst evaluate the safety of an area by showing locations that may be the focus of upcoming attacks. K-Nearest Neighbor and Naïve Bayes are the classification methods used to forecast crimes in support of law enforcement agencies. The data was acquired from the official United Kingdom (U.K.) website, and the dataset, which is described as accurate and credible, includes 11 attributes in total.
Atharva Deshmukh et al. [14] provided an application that uses machine learning to estimate the average crime level in the zones of a city at different times of the day and night. With the aid of the latest crime data, the application is able to predict new crime trends in an area. Predicting crime hotspots is primarily aimed at helping people distinguish between safe and dangerous areas when traveling. The Django REST framework and React Native were used to implement the application.

Methodology
The dataset used in this analysis is a collection of records from the crime incident report database that spans from mid-2015 to the first half of 2018. Each record classifies the type of incident and provides details about when and where it occurred [15]. The data is in comma-separated values (CSV) format. Tab. 1 lists the machine learning methods applied to the Boston dataset.

Pre-processing Phase
The data pre-processing stage is one of the most important steps in building the model. From the standpoint of machine learning this phase is critical, as data pre-processing accounts for 60 to 80 percent of the entire analytical pipeline in a typical machine learning project [16]. In the proposed solution, missing values of an attribute are replaced with the mean of all values of that attribute, and the data is then converted into a dimensionless form by normalization. Using feature scaling, the raw data was normalized to the range 0 to 1. The normalization is defined by the equation below [17]:
$$x_i' = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}$$
where $x_i$ is the raw value of the chosen sample in the corresponding data series $x$, $x_{\max}$ is the largest raw value in the data series $x$, and $x_{\min}$ is the smallest raw value in the data series $x$.
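The two pre-processing steps described above, mean imputation followed by min-max scaling, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the use of None to mark a missing value, and the sample latitude readings are all hypothetical.

```python
from statistics import mean


def impute_with_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Scale values to the range [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every entry to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical latitude readings with one missing value.
latitudes = [42.30, None, 42.36, 42.33]
filled = impute_with_mean(latitudes)
scaled = min_max_normalize(filled)
```

After scaling, the smallest value in the column maps to 0 and the largest to 1, so attributes measured in different units become comparable.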

Machine Learning
Model building is one of the most crucial tasks in a machine learning methodology. To build the predictive model, Decision Tree (DT), Logistic Regression (LR), and Naive Bayes (NB) classifiers are trained and evaluated. The percentage split method is used for training and testing: 80% of the data is used for training and the remaining 20% for testing. This research focuses on prediction using the following three methods.
Logistic Regression is a supervised learning method. Despite its name, it is used for classification problems rather than for forecasting continuous variables. It generates a binomial result by estimating the likelihood that an event occurs or does not occur based on the values of the input variables. The benefits of logistic regression include ease of implementation, computational efficiency, training efficiency, and ease of regularization. Input features do not need to be scaled [18]. The model can be written as:
$$P(y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2)}}$$
where $\beta_0$ is the intercept and $\beta_1$, $\beta_2$ are the slopes for the independent variables $x_1$, $x_2$.
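The 80/20 percentage split and the logistic model described above can be illustrated with a short sketch. It assumes the standard logistic (sigmoid) form with an intercept and one slope per input variable. The function names, the fixed random seed, and the coefficient values are hypothetical and chosen only for illustration; in practice the coefficients would be fitted to the training data.

```python
import math
import random


def percentage_split(rows, train_fraction=0.8, seed=0):
    """Shuffle the rows and split them into training and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def logistic_probability(features, intercept, slopes):
    """Return 1 / (1 + exp(-(b0 + b1*x1 + ... + bn*xn)))."""
    z = intercept + sum(b * x for b, x in zip(slopes, features))
    return 1.0 / (1.0 + math.exp(-z))


# Ten placeholder records: 8 go to training, 2 to testing.
rows = list(range(10))
train, test = percentage_split(rows)

# Hypothetical coefficients applied to two normalized inputs.
p = logistic_probability([0.4, 0.7], intercept=-1.0, slopes=[2.0, 0.5])
```

The returned value always lies between 0 and 1, so it can be read as the estimated probability of the positive class and thresholded to obtain a binary prediction.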
Naive Bayes is a simple probabilistic classifier that builds a set of probabilities by counting the frequencies and combinations of data values. Bayes' theorem is used to estimate the likelihood that a given feature set belongs to a specific label [13]. Mathematically it can be stated as:
$$P(h \mid x) = \frac{P(x \mid h)\, P(h)}{P(x)}$$
where $P(x \mid h)$ is the probability of event $x$ occurring if $h$ is true, and $P(h)$ and $P(x)$ are the probabilities of observing $h$ and $x$ independently of each other.
Decision Tree is a supervised learning method for classification. In DT the dataset is divided into smaller parts, and the classification model builds a tree from them. Each internal node of the tree represents a feature, each branch denotes a decision (rule), and each leaf represents an outcome. Low-importance features (attributes) are found in the lower levels of the tree [19]. At each stage of the procedure, the DT selects the feature that best splits the data with the support of two functions [20]:
• Gini impurity calculates the likelihood of incorrectly classifying a random sample.
• Information gain aids in deciding which feature to split on next. Information gain can be calculated using entropy, which is defined as:
$$E = -\sum_{i} p_i \log_2 p_i$$
where $p_i$ is the proportion of samples of each class present in the child node after a split.
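The two split criteria can be made concrete with a small sketch. The Gini impurity formula used here, one minus the sum of squared class proportions, is the standard definition and is not given explicitly in the text above. The function names and the toy crime labels are hypothetical.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Return the fraction of samples belonging to each class."""
    counts = Counter(labels)
    total = len(labels)
    return [count / total for count in counts.values()]


def gini_impurity(labels):
    """Gini impurity: 1 - sum(p_i ** 2)."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Entropy: -sum(p_i * log2(p_i))."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Toy node with two crime types, and a candidate split that separates them.
parent = ["theft", "theft", "assault", "assault"]
children = [["theft", "theft"], ["assault", "assault"]]
gain = information_gain(parent, children)
```

For this toy node the Gini impurity is 0.5 and the entropy is 1 bit. Because the candidate split produces two pure child nodes, the information gain equals the full parent entropy.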

Crime Prediction
Security plays a major role in any society and should always be guaranteed so that people can work efficiently and effectively. Crime prediction can be performed using historical crime records: the crime types detected in those records are used to identify and analyze the crimes that occurred in an area, and this information is then used to help reduce those crimes [21].
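As an illustration of predicting a crime type from historical records, the sketch below applies Bayes' theorem by counting frequencies, in the spirit of the Naive Bayes classifier described in the Machine Learning section. It is a simplified sketch and not the model evaluated in this study. The class name, the Laplace (add-one) smoothing, the two categorical features (district and time of day), and the records themselves are all assumptions made for the example.

```python
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Frequency-count Naive Bayes for categorical features."""

    def fit(self, rows, labels):
        self.label_counts = Counter(labels)
        self.total = len(labels)
        # value_counts[(feature_index, label)][value] -> number of records
        self.value_counts = defaultdict(Counter)
        # Distinct values seen for each feature, used for smoothing.
        self.feature_values = defaultdict(set)
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.value_counts[(i, label)][value] += 1
                self.feature_values[i].add(value)
        return self

    def predict(self, row):
        best_label, best_score = None, -1.0
        for label, count in self.label_counts.items():
            score = count / self.total  # prior P(h)
            for i, value in enumerate(row):
                seen = self.value_counts[(i, label)][value]
                # Smoothed likelihood P(x_i | h); add-one avoids zeros.
                score *= (seen + 1) / (count + len(self.feature_values[i]))
            # P(x) is the same for every label, so it is omitted.
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical historical records: (district, time of day) -> crime type.
rows = [("D4", "night"), ("D4", "night"), ("B2", "day"),
        ("B2", "night"), ("D4", "day")]
labels = ["larceny", "larceny", "assault", "assault", "larceny"]

model = CategoricalNaiveBayes().fit(rows, labels)
prediction = model.predict(("D4", "night"))
```

With many features, the product of probabilities is usually replaced by a sum of logarithms to avoid numerical underflow.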

Results and Discussion:
The percentage split method is used here, in which the dataset is divided into two groups, the training set and the test set. The evaluation parameters are then computed to present the overall performance of the system. A classifier's success is estimated by comparing the predicted class labels with the actual class labels. The evaluation measures used are precision, recall, and F1-measure, and the performance of each classifier model is presented in Tab. 2. With TP denoting true positives, FP denoting false positives, and FN denoting false negatives, these classification metrics are defined as follows:
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$\text{F1-score} = \frac{2 \times \text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}}$$
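The three metrics can be computed directly from the predicted and actual labels. The sketch below treats one crime type as the positive class and all others as negative (a one-vs-rest view). How the per-class values are averaged across crime types is not stated here, so no averaging is shown. The function name and the sample labels are hypothetical.

```python
def precision_recall_f1(actual, predicted, positive):
    """Compute precision, recall, and F1 for one class treated as positive."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if p == positive and a == positive)
    fp = sum(1 for a, p in pairs if p == positive and a != positive)
    fn = sum(1 for a, p in pairs if p != positive and a == positive)

    # Guard against division by zero when a class is never predicted or seen.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * recall * precision / (recall + precision)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical test-set labels and classifier predictions.
actual = ["theft", "theft", "assault", "theft", "assault"]
predicted = ["theft", "assault", "assault", "theft", "theft"]
precision, recall, f1 = precision_recall_f1(actual, predicted, positive="theft")
```

In this example the "theft" class has two true positives, one false positive, and one false negative, so precision, recall, and F1 all equal 2/3.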

Figure 1. Chart of performance measures
The chart in Fig. 1, derived from the table of performance results, shows that the DT algorithm performs better than the others: its precision, recall, and F1-measure values are all greater than those of the other algorithms. After experimenting with various modeling combinations, the best option was a fairly robust tree that uses longitude and latitude. This is reasonable because the amount and type of crime are strongly linked to location. It suggests that the location of the crime scene alone can be enough to build an effective model.

Conclusion:
Three machine learning techniques, namely DT, NB, and LR, were applied to forecast types of crime. The results show that DT outperformed the other techniques, as shown by its metric values. These initial findings encourage us to relate crime to location factors more closely, and we look forward to exploring their connection in the near future. For the time being, we hope that our findings will help prevent crime or reduce the crime rate in specific locations, which will support police operations and thus help maintain everyone's safety.
In the future, other crime-related factors can be adopted, for example determining the gender or identity of the offender.