Interior Visual Intruders Detection Module Based on Multi-Connect Architecture MCA Associative Memory

Abstract: Most recent studies have focused on using modern intelligent techniques, especially those developed for Intruder Detection Systems (IDS). Such techniques are built on modern artificial intelligence-based modules. Those modules act like a human brain, so they should be able to learn and to recognize what they have learned. The importance of developing such systems arose from the requests of customers and establishments to protect their properties and avoid damage by intruders. This protection would be provided by an intelligent module that ensures a correct alarm. Thus, an interior visual intruder detection module based on Multi-Connect Architecture (MCA) associative memory is proposed. Using the MCA associative memory as a new trend, the proposed module goes through two phases: the training phase (which is executed once, during the module installation process) and the analysis phase. Both phases are built on MCA, each according to its process: the training phase takes place through the learning phase of MCA, while the analysis phase takes place through the convergence phase of MCA. The use of MCA increases the efficiency of the training process for the proposed system by requiring a minimal number of training images, no more than 10 of the total number of frames, in JPG format. The proposed module has been evaluated using 11,825 images extracted from 11 test videos. As a result, the module can detect the intruder with an accuracy ratio in the range of 97%–100%. The average training time for the videos was in the range of 10.2 s to 23.2 s.


Introduction
Intrusion detection is defined as the detection of a person or vehicle attempting to gain unauthorized access to a protected area. Intrusion Detection Systems (IDS) are divided into interior IDS and access control systems. Such systems rely on boundary-entry detection, interior motion sensors, and proximity sensors. The interior IDS is the most effective defense against internal threats. With the access control system, an alarm can be generated by unauthorized actions or by the unauthorized presence of internal or external parties. Interior monitoring is a key objective of interior IDS 1,2.
The performance of a normal IDS is limited under low light conditions. Although many ways to improve low-light images have recently been proposed 3,4, the IDS does not work when there is no light at all. High power consumption is inevitable when lighting an empty indoor space to obtain high-quality images. Common non-visual sensors such as passive infrared (PIR) and heat sensors can be used to detect objects under low light conditions 5,6.
Machine learning has been suggested as a way to identify intruders. There are two main types of machine learning: supervised and unsupervised learning. Supervised learning relies on the useful information in labelled data. Classification is the most common task in supervised learning (and is also widely used in IDS). However, labelling data manually is expensive and time-consuming. Common machine learning algorithms used in IDS are shown in Fig. 1. Machine learning models for IDS (shallow models) typically include the Artificial Neural Network (ANN). The design concept of an ANN is to mimic the way the human brain works. An ANN contains an input layer, several hidden layers, and an output layer. Units in adjacent layers are fully connected. An ANN contains a large number of units and can approximate arbitrary functions; therefore, it has a strong fitting ability, especially for nonlinear functions. Due to the complex model structure, training ANNs is time-consuming. It is noteworthy that ANN models are trained by the backpropagation algorithm, which can be used to train deep networks 8,9. Therefore, to overcome the above limitations of ANN-based IDS, this study proposes using the Multi-Connect Architecture (MCA) associative memory neural network as a new trend.
Typically, MCA is a single-layer neural network that uses auto-association functions and works in two phases, namely the learning phase (as shown in Algorithm 1 of 10,11) and the convergence phase (as shown in Algorithm 2 of 10,11). The MCA associative memory uses a multi-connect architecture, which makes it possible to use just four learning weights of size 3×3 (see Fig. 2). Thus, MCA can work well in real time with a small neural structure (which has three nodes) and small learning weights of size 3×3 10,11. As shown in Algorithm 1, the learning phase is implemented by dividing the training patterns into vectors (v) of length three. Since the training patterns are in bipolar representation, there are eight such vectors (-1-1-1, 1-1-1, ..., 111). One of the characteristics of these vectors (v) is that four of them are orthogonal to the other four. The learning algorithm has four weight matrices (w) that are assigned initially. Each weight matrix (w) is calculated using Eq. 1. In addition, initial values are assigned to the elements of the energy function matrix (E), which is calculated from the previously mentioned vectors (v) and weight matrices (w) using Eq. 2. As a result of the learning algorithm, the Stored Majority Description (smd) and Stored Vector Weight (svw) are calculated and stored in the lookup table.
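Eq. 1 and Eq. 2 are not reproduced in this text. The block below is only a sketch of the forms they plausibly take, assuming the standard Hopfield-type outer-product rule and energy function commonly used for associative memories; the exact expressions should be checked against Algorithm 1 and Algorithm 2 of 10,11.

```latex
% Assumed form of Eq. 1: outer-product weight matrix for a bipolar row vector V of length n = 3
W = V^{T} V, \qquad W_{ij} = V_i V_j
% Assumed form of Eq. 2: Hopfield-type energy of the vector V under the weight matrix W
E = -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} W_{ij}\, V_i\, V_j
```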
Where: E is artificial network energy. n is the number of elements in the vector V.
Wij is the weight from the output of neuron i to the input of neuron j.
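The relationship between the eight bipolar vectors and the four weight matrices can be illustrated with a minimal Python sketch. It assumes an outer-product weight rule and a Hopfield-type energy as stand-ins for Eq. 1 and Eq. 2, which are not reproduced in this text; the function names `outer` and `energy` are hypothetical. Under this assumption, a vector and its sign-complement produce the same matrix, which is one way to read the statement that four vectors pair with the other four.

```python
from itertools import product


def outer(v):
    """Assumed form of Eq. 1: w[i][j] = v[i] * v[j]."""
    return tuple(tuple(a * b for b in v) for a in v)


def energy(w, v):
    """Assumed form of Eq. 2: E = -1/2 * sum_i sum_j w[i][j] * v[i] * v[j]."""
    n = len(v)
    return -0.5 * sum(w[i][j] * v[i] * v[j] for i in range(n) for j in range(n))


# The eight bipolar vectors of length three.
vectors = list(product((-1, 1), repeat=3))

# v and -v give the same outer product, so only four distinct matrices remain.
weights = sorted({outer(v) for v in vectors})

print(len(vectors), len(weights))

# Energy of every vector under every stored weight matrix.
for v in vectors:
    print(v, [energy(w, v) for w in weights])
```

With these assumed formulas, each vector reaches its lowest energy (-4.5) under exactly one of the four matrices and a higher energy (-0.5) under the other three, so the energy value indicates which stored weight an input vector matches.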
The convergence phase of the MCA, shown in Algorithm 2, is completely dependent on the Energy Function Matrix (E). It provides the value that denotes the amount of convergence between the stored vectors (from the learning phase), which are represented by the learning weights (w), and the input pattern's vectors in the convergence phase. From the energy function value, MCA can tell whether the input pattern's vectors are similar to, or different from, the stored vectors. Thus, the main goal of this research is to develop an Interior Visual Intruders Detection (IVID) module with the following objectives:
1. To design and construct an IVID module based on Multi-Connect Architecture Associative Memory.
2. To generate an alarm signal if an intruder comes into a pre-specified region.
3. To outline and accurately describe the mode of operation and the conditions for optimal performance.
4. To successfully implement this IVID module.
Using the MCA associative memory paves the way to two main contributions, which are summarized below:
1. Using the MCA associative memory as a new trend paves the way to an efficient IVID that detects any moving object in the scene without object detection or object tracking. The proposed module uses human brain techniques to identify its environment, taking advantage of the fact that humans can recognize objects quickly and efficiently, although this may be difficult if the objects are seen from different viewpoints, rotated, or partially obstructed.
2. The proposed IVID module can adapt to its environment efficiently after the training process is carried out during the installation period.
The scope of this paper is limited to rooms with only one entry and exit. In such rooms, everyone can enter and leave only through a single door. This makes tracking access information much easier for the user.
In the proposed IVID module, MCA associative memory was used to detect any moving object in the scene. The training method will be done via the learning phase of MCA. On the other hand, motion detection will be done via the convergence phase of the MCA.
The rest of the paper is organized as follows. First, the problem statement is presented. Second, works related to the proposed module are introduced. Third, the proposed IVID module is presented in detail. Fourth, the proposed module is evaluated, and the results are analyzed. Fifth, a quantitative and qualitative comparison with the most closely related works is carried out. Sixth, the limitations of the proposed IVID module are discussed. Finally, the paper is concluded.

Problem Statement
Security is an important factor in all of our lives. A safe environment often gives people peace of mind. However, as the world economy recovers, people have turned to degrading practices to survive. The level of theft has increased, which has led to tensions in the areas where such incidents occur. This sometimes leads to unpleasant events such as fights and, sometimes, injuries. These injuries occur when a criminal is caught and subjected to mob action. The proposed design aims to reduce such incidents by alerting the appropriate quarters when an unauthorized entry into a room occurs. Therefore, the relevance of this paper cannot be overemphasized.

Related Works
Recently, human activity awareness has received a great deal of attention, as it promises to track anyone attempting to enter a protected area without permission. The authors of 2 observed that the detection threshold corresponds to the sensitivity level of the link to human movement and showed that the proposed threshold model achieved comparable detection performance. The work in 12 provided a model that characterizes the amplitude variation of Channel State Information (CSI) subcarriers due to human presence, explaining the non-linearity that can occur in CSI under human influence. The work in 13 was the first to incorporate meaningful phase information for device-free human detection by successfully eliminating the randomness contained in the raw phase; it proposed a new unified feature using the eigenvalues of the CSI correlation matrix. The work in 14 proposed an approach to detect the presence of stationary people and to provide a solution for domestic occupancy detection using the concept of Wi-Fi peripheral vision. It showed that the system could achieve 96.7% accuracy in occupancy detection across a variety of occupancy scenarios, including empty spaces and both moving and stationary subjects. The work in 15 designed and implemented a unified detection approach for stationary people by modelling and leveraging the chest movements of human breathing as an essential indicator of human presence. The work in 16 proposed a device-free presence and location detection algorithm using CSI fingerprint pattern analysis. The work in 17 proposed a new scheme for robust device-free through-the-wall detection of moving humans using commodity Wi-Fi devices. In particular, it examined the correlation changes between different subcarriers in the presence of human movement and averaged the first-order differences of the CSI eigenvectors between the different subcarriers.
Another promising technique for intrusion detection is computer vision 18,19. Vision-based IDS uses computer vision technology to detect intruders instantly from the flow of media information and greatly improves the usability of surveillance applications. However, vision-based IDS requires imaging hardware that is distributed across the environment and that is sensitive to differences in light intensity. While omnidirectional cameras can record information from all angles in Line-Of-Sight (LOS) environments, they do not work effectively in Non-Line-Of-Sight (NLOS) environments. In addition, vision-based systems can raise privacy concerns.
All of the works listed above rely on feature extraction, object detection, object tracking, and so on. The difference between this work and the related works above lies in the use of associative memory in general and MCA in particular. As a result, the memory size is reduced, and the abovementioned methods (i.e., feature extraction, object detection, object tracking, etc.) are no longer needed.

Interior Visual Intruders Detector IVID Module Based on MCA
The proposed module is a warning module that can identify movement entering the scene. Based on MCA associative memory, this module analyzes any motion in an environment captured by a previously installed camera. The general framework is illustrated in Fig. 3. According to this framework, the proposed module includes preprocessing of the video stream, which is needed to prepare the input of the proposed IVID module. The preprocessing converts the video stream into a stream of images in JPG format. Then, these color or greyscale images are converted to black-and-white images. These mono images are then represented in bipolar form using a hard limiter equation (see Eq. 3), which produces a matrix whose values are 1 or -1. Finally, these mono images are ready for the training process, where they are presented as training images for the IVID module.
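Eq. 3 is not reproduced in this text. A hard limiter of the kind described above is commonly written as follows; the threshold at zero and the mapping of non-zero (white) pixels to 1 are assumptions, and B(x, y) is a hypothetical name for the resulting bipolar value.

```latex
% Assumed form of Eq. 3: hard limiter applied to each black-and-white pixel P(x, y)
B(x, y) =
\begin{cases}
  1,  & \text{if } P(x, y) > 0 \\
  -1, & \text{otherwise}
\end{cases}
```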
Where: P is a pixel in the image at position (x, y) 11. The results of the preprocessing are bipolar black-and-white patterns. The first time the module runs, this image stream proceeds to the training phase of the proposed IVID module, which is implemented using the MCA learning phase. One of the inputs of this phase is a flow of patterns to be considered as training patterns (i.e., black-and-white patterns with bipolar representation). Since the training process is supervised, the training patterns are carefully selected from the flow of patterns entering the IVID module during the learning process. The chosen patterns should reflect different states: intruders in different positions in the scene, as well as the absence of any intruder. The training process is accompanied by periodic tests to ensure that the module can distinguish scenes that contain an intruder from scenes that do not. Using these periodic tests, the accuracy ratio AR is calculated using Eq. 4. The learning process continues, with an increasing number of training patterns, until AR reaches the target accuracy ratio TAR. At this point, the learning process stops and the module switches to the analysis process for the rest of its working life. Finally, the outputs of the training phase are the results of the MCA learning phase, namely four 3×3 weight matrices that are stored in a lookup table.
The analysis phase begins after the training phase has been completed and the lookup table has been constructed. Thus, the analysis phase starts only once the proposed IVID module has been trained well enough to make decisions using the MCA convergence phase. The input of the analysis phase, which depends on the convergence phase of MCA associative memory, is a flow of black-and-white images with bipolar representation that reflect the IVID module environment. The output of the analysis phase is the intruder detection alarm, if an intruder is detected.
In the analysis phase, the weights obtained in the training phase are reused to perform the Energy Function (E) check on each input image in the flow of bipolar images that reflect the IVID environment. In this process, the energy function E plays the central role, in line with the MCA convergence phase. The energy function E is calculated by applying Eq. 2, as used in 11.
According to the Energy Function (E), this phase checks whether each image in the stream converges towards any of the training images whose training results are saved in the lookup table. These results contain everything required to detect an intruder (if one is present). Depending on the lookup table, the converged image stream is checked to detect the intruder. Finally, in this phase, the module triggers a warning if an intruder is detected. The IVID module process, with both phases, is illustrated in Algorithm 3. In this algorithm, the training phase depends on the Target Accuracy Ratio (TAR), which starts with a minimum value, while AR is calculated using Eq. 4 as the ratio of Corrected_Images to Total_Images, expressed as a percentage. Where: Corrected_Images is the number of correctly detected images, and Total_Images is the total number of training images.
The parameters used by Algorithm 3 are provided in Table 1. In Algorithm 3, AR gradually increases during the training phase, which ends when AR reaches its target value, TAR. The analysis phase is entered only after AR has reached the target value. This phase is responsible for giving the alert (an audio alert, an emergency call, a text message, etc., or all of the above). As Algorithm 3 shows, AR and Intruder_Detection_Alert (IDA) start with the initial values 0 and False, respectively. In steps 3.1, 3.2, and 3.3, every image coming from the video stream is converted to bipolar format in preparation for the training and detection processes carried out by MCA, which only deals with patterns in bipolar representation. In step 3.4, if the AR value is less than the TAR value, the training phase of the proposed module is performed on the incoming image using MCA. If the AR value is greater than or equal to the TAR value, the training is considered sufficient, and the proposed module proceeds to the intruder analysis phase, which also uses MCA. Finally, if the proposed module detects an intruder, it gives the alert by setting the IDA flag to True.
Input: Stream of Images SI and the Target_Accuracy_Ratio TAR.
Output: Lookup Table LB and Intruder_Detection_Alert IDA.
Step 3: For each image, repeat steps 3.1 to 3.4:
Step 3.1: Convert the image to JPG format.
Step 3.2: Convert the JPG image to a black-and-white pattern.
Step 3.3: Represent the pattern in bipolar representation using the hard-limiter function (Eq. 3).
Step 3.4: If AR < TAR Then
Step 3.4.1: Apply Algorithm 1 (the MCA learning phase) using the pattern as input.
Step 3.4.2: Save the learning phase results in the LB.
Step 3.4.3: Calculate the AR using Eq. 4.
Else
Step 3.4.4: Apply Algorithm 2 (the MCA convergence phase) based on the LB using the pattern as input.
Step 3.4.5: If an intruder has been detected Then set IDA to True. End_If.
End_If.
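The control flow of Algorithm 3 can be sketched in Python as follows. This is only an illustration of the switch between the training and analysis phases: the MCA learning and convergence algorithms are passed in as callables with hypothetical signatures, the toy stand-ins at the bottom are not MCA, and the input is assumed to be already converted to black-and-white pixel arrays (steps 3.1 and 3.2).

```python
def to_bipolar(bw_image):
    """Step 3.3: hard limiter, non-zero pixel -> 1, zero pixel -> -1 (assumed threshold)."""
    return tuple(tuple(1 if p > 0 else -1 for p in row) for row in bw_image)


def run_ivid(images, tar, learn, converge, accuracy):
    """Control flow of Algorithm 3 with the MCA phases supplied as callables.

    learn(pattern, lookup)    -> None   stand-in for Algorithm 1 (hypothetical signature)
    converge(pattern, lookup) -> bool   stand-in for Algorithm 2, True if an intruder is seen
    accuracy(lookup)          -> float  stand-in for Eq. 4, in percent
    """
    ar, ida, lookup = 0.0, False, []        # initial values: AR = 0, IDA = False
    for bw_image in images:                 # Step 3
        pattern = to_bipolar(bw_image)      # Step 3.3
        if ar < tar:                        # Step 3.4: training phase
            learn(pattern, lookup)          # Steps 3.4.1 and 3.4.2
            ar = accuracy(lookup)           # Step 3.4.3
        elif converge(pattern, lookup):     # Steps 3.4.4 and 3.4.5: analysis phase
            ida = True
    return lookup, ida


# Toy stand-ins, for illustration only (these are not the MCA algorithms).
def toy_learn(pattern, lookup):
    lookup.append(pattern)


def toy_converge(pattern, lookup):
    return pattern not in lookup


def toy_accuracy(lookup):
    return min(100.0, 50.0 * len(lookup))


frames = [[[0, 1]], [[1, 0]], [[1, 1]]]
lookup, ida = run_ivid(frames, 100.0, toy_learn, toy_converge, toy_accuracy)
print(len(lookup), ida)
```

In this toy run, the first two frames are consumed by the training phase and the third frame is handled by the analysis phase, where it raises the alert because it matches neither stored pattern.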

Results and Analysis
To evaluate the IVID module, 11 videos were used. They cover different indoor locations and different camera positions so that the evaluation is as realistic and complete as possible. From each video, 1075 frames in JPG format were extracted by taking one frame out of every 50 frames, which ensures that the scene changes between extracted frames without losing any change that occurred during the video shooting period. Thus, the total number of images from all videos is 11,825. The parameters used for the evaluation of the proposed IVID module are provided in Table 2.
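The frame counts above can be checked with a short Python sketch. The sampling helper `sampled_indices` and the source length of 53,750 frames are hypothetical, chosen only so that taking one frame out of every 50 yields 1075 frames.

```python
def sampled_indices(total_frames, step=50):
    """Indices of the frames kept when taking one frame out of every `step` frames."""
    return list(range(0, total_frames, step))


frames_per_video = 1075
videos = 11
total_images = frames_per_video * videos
print(total_images)

# Hypothetical video length: 53,750 frames sampled every 50th frame.
kept = sampled_indices(53_750)
print(len(kept), kept[:3])
```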

1) Accuracy Ratio of the Proposed Module
The proposed module has been evaluated using the 11,825 images mentioned above, with 1075 images per video. The accuracy ratio has been calculated using Eq. 4. The accuracy ratio for all videos was high, between 97% and 100%. The accuracy of each video is illustrated in Fig. 4. In Fig. 4, videos 2 and 5 have the lowest accuracy ratios, 98% and 97%, respectively. For video 5, the presence of a window reduced the accuracy ratio of the proposed module because of the unwanted outside moving objects phenomenon. For example, a car moving outside and visible through the window led to a false sense of object movement in the scene, although this movement was outside the room, not inside it (see Fig. 5).

Figure 5. Unwanted Outside Moving Objects Phenomenon (A Car Could be Detected as a Moving Intruder).
In the same video, another phenomenon appeared, one that affected all 11 videos: color overlap between the intruder and the objects already in the scene. This overlap made the intruder partly invisible, so the proposed module could not sense the intruder until he/she changed location in the following frames.
Furthermore, in video 2, there is a phenomenon of distant moving objects that are outside the room but appear in the scene. This unwanted movement may be detected as a moving intruder, which reduces the accuracy of the proposed module (see Fig. 6).

2) The Number of Training Images Versus the Accuracy Ratio
The training process should be applied using a carefully selected set of images extracted from each video; these images are considered the training images. The strength of any training process also depends on the number of these training images: the fewer the training images, the faster the learning process, provided that the accuracy remains high.
The learning process was carried out by gradually increasing the number of training images, to ensure that the smallest number of training images was used before reaching the TAR or the saturation case. The saturation case is the case in which the learning process reaches the maximum useful number of training images, so that increasing the number of training images no longer improves the accuracy ratio. In this case, the training process must be stopped, and the accuracy reached should be considered the best case.
In this evaluation, the 11 videos have been evaluated according to the number of training images versus the accuracy ratio (see Fig. 7). Fig. 7 shows that the number of training images for each video was relatively small, between 6 and 10, with the accuracy ratio kept in the range of 97%-100%. Fig. 7 also shows the accuracy of the IVID module as a function of the number of training images: the accuracy gradually increases as the number of training images increases, until the TAR or the saturation case is reached. The smaller the number of training images used, the less time is required to finish the training process. Fig. 8 indicates that the time taken to complete the training process for all videos ranged from a minimum of 10.2 seconds to a maximum of 23.2 seconds. This indicates that the learning process was relatively fast, using a few learning images that produced very high accuracy, which gives the proposed IVID module a triple contribution (i.e., fast learning using relatively few learning images with high accuracy).
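The stopping rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `accuracy_of` is a hypothetical callback that returns the accuracy ratio (in percent) obtained with a given set of training images, saturation is detected as soon as one added image fails to improve the accuracy, and the accuracy curve in the example is invented for illustration rather than taken from the reported results.

```python
def train_until_target(candidates, tar, accuracy_of):
    """Add training images one at a time; stop at the TAR or at saturation."""
    selected = []
    best = float("-inf")
    for image in candidates:
        selected.append(image)
        ar = accuracy_of(selected)
        if ar >= tar:
            return selected, ar, "target reached"
        if ar <= best:
            # The extra image did not help: drop it and keep the best case.
            selected.pop()
            return selected, best, "saturated"
        best = ar
    return selected, best, "candidates exhausted"


# Invented accuracy curve, indexed by the number of training images.
curve = {1: 60.0, 2: 80.0, 3: 92.0, 4: 97.0, 5: 97.0, 6: 97.0}


def toy_accuracy(selected):
    return curve[len(selected)]


images = ["img1", "img2", "img3", "img4", "img5", "img6"]
print(train_until_target(images, 90.0, toy_accuracy))
print(train_until_target(images, 99.0, toy_accuracy))
```

With a target of 90, the toy run stops after three images; with a target of 99, which the invented curve never reaches, it stops at saturation with four images and an accuracy of 97.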

Quantitative and Qualitative Comparisons
In this section, quantitative and qualitative comparisons are made between the proposed module and the works from the Related Works section that are closest to this work, as shown in Table 3. The quantitative comparison in Table 3 shows the accuracy ratio of each of these works. It shows that the proposed module has high accuracy compared to the others, except for one work, 17, whose accuracy exceeds that of the proposed module by only 0.1.
As for the qualitative comparison, in addition to its high accuracy rate, the proposed IVID module is original in its use of MCA as an associative memory, which is a new trend. As a result, the memory size is reduced, and additional methods such as feature extraction, object detection, object tracking, etc. are no longer needed. Rather than using these additional methods, the proposed module identifies its environment using human brain techniques, taking advantage of the fact that humans can detect things quickly and effectively. The proposed module gains this advantage from the use of MCA.

The Proposed IVID Module Limitations
The position of the camera has a great impact on the efficiency of the proposed module. Windows that overlook scenes outside the room, as well as external and internal glass partitions, may show the movement of an object outside the room, causing a false sense of movement that reduces the accuracy of the proposed module. Another, less significant limitation that reduces the accuracy of intruder detection in the proposed module is the overlap of colors between the intruder (if any) and the objects already in the room, which are considered part of the scene. It is less significant because its effect disappears as soon as the intruder changes position in the scene and becomes visible to the proposed module.

Conclusion
From the outcome of the above evaluations, it can be concluded that the use of associative memory in general, and MCA in particular, as a new trend gives the proposed module the advantage of adapting to its environment efficiently and quickly. The efficiency is shown by the ability of the proposed module to detect intruders with an accuracy rate of 97%-100% using a small number of training images (no more than 10 images in the worst case). In other words, this small number of training images enabled the proposed module to work continuously in the environment in which it was trained. During its working time, that is, from the moment it enters service after the training phase, the proposed module faces a stream of images that it must analyze to reveal whether there are intruders. The speed of adapting to its surroundings is shown by the training time, which was in the range of 10.2-23.2 seconds. That is, the proposed module needed no more than this time to adapt to its environment and gain the ability to detect intruders in real time with a high accuracy rate, benefiting from the mechanism of MCA in both the training phase and the analysis phase for the images streaming from its environment during its working time.