Perceptually Important Points-Based Data Aggregation Method for Wireless Sensor Networks

Abstract: Transmitting and receiving data consume the most resources in Wireless Sensor Networks (WSNs). The energy supplied by the sensor node's battery is the most important resource affecting the lifespan of a WSN. Because sensor nodes run on limited batteries, saving energy is necessary. Data aggregation is a procedure that eliminates redundant transmissions and provides fused information to the base stations, which in turn improves energy efficiency and increases the lifespan of energy-constrained WSNs. In this paper, a Perceptually Important Points Based Data Aggregation (PIP-DA) method for Wireless Sensor Networks is proposed to reduce redundant data before sending them to the sink. The efficiency of the proposed method was measured using the Intel Berkeley Research Lab (IBRL) dataset. The experimental findings illustrate the benefits of the proposed method: it reduces the overhead at the sensor node level to as little as 1.25% of remaining data and reduces energy consumption by up to 93% compared to the prefix frequency filtering (PFF) and ATP protocols.


Introduction:
Wireless Sensor Networks (WSNs) can be utilized in a wide range of applications, including military surveillance and environmental and facility monitoring 1,2 . A WSN is usually made up of multiple sensor nodes with communication capabilities, as well as the ability to connect with any external sinks or base stations 3,4 . These sensors are either dispersed at random around a rugged landscape (such as a battlefield) or are strategically placed. Different communication networks, such as single and multi-hop networks, or a hierarchically organized system with a number of clusters and cluster heads, are formed by the synchronization of these sensors 5 . The sensor senses, processes, and transmits data to the base stations on a regular basis. The frequency at which data is reported, as well as the number of sensors involved in the operation, are both dictated by the application 6 .
Data gathering is the act of routinely extracting data from a variety of sensors and sending it to the base station for processing. Given the sensor node's energy constraints, direct data transfer to the base station by all sensors would be inefficient 7 . This is attributed to the high overlap of data from neighboring sensors, which results in redundancy. Furthermore, base stations are incapable of processing the massive quantities of data provided by a larger sensor network 8 .
As a result, such networks are expected to merge data and generate significant information at sensors or intermediate nodes, as this can help to reduce packet transfers to the base station, saving energy and bandwidth. Data aggregating techniques, which typically require the fusion of data collected from multiple sensors at intermediate nodes and routing aggregated data to base stations, may be used to do this 9 .
The major contributions of this paper concentrate on the design and application of a strategy for energy-efficient data aggregation to extend the lifespan of WSNs. The contributions are as follows:
1. At the sensor node level, a data aggregation method based on the perceptually important points is proposed. It reduces the number of data readings transmitted and the energy consumed, thereby extending the lifespan of the network while preserving the accuracy of the data readings obtained at the base station.
2. The proposed approach is evaluated through comprehensive simulation experiments using the OMNeT++ network simulator. The efficiency of the proposed technique is compared with two related works: the PFF protocol proposed in 10 and the ATP protocol proposed in 11 .
The remaining portion of the paper is organized as follows: Section II surveys existing works on data aggregation methods. Section III focuses on the proposed method that aggregates data based on the perceptually important points. Sections IV and V present the results and conclusions.

Related Works:
For WSNs, a large number of studies have addressed the reduction of data transmissions by removing data redundancy (i.e., using data aggregation). The primary target of this review is to thoroughly examine the published literature on extending the lifespan of WSNs. Several methods and principles are dedicated to saving energy and extending the lifespan of WSNs, concentrating mostly on reducing data transfer, such as predictive monitoring, routing, aggregation, elimination, prediction, adaptive sampling, clustering and data compression. Aggregating data before transmitting it is thus a key strategy in terms of energy efficiency. A description of the related works on data aggregation techniques in WSNs is shown in Table 1.

The Proposed Data Aggregation Method:
This section will include a comprehensive explanation of the proposed process, which is a technique of data aggregation based on the perceptually important points for decreasing the amount of transmitted data readings, reducing the energy consumed and thereby extending the network lifetime while retaining the precision of the data readings obtained at the base station.

WSN Topology:
The PIP-based data aggregation technique was developed for a cluster topology. The cluster-based WSN is shown in Fig. 1. The formation of the cluster topology is outside the scope of this proposal: a topology is assumed to exist already, and its formation is deliberately not discussed. The proposed technique can be applied to clusters produced by any clustering protocol. The focus is on designing an energy-efficient data aggregation method. More precisely, the objective of this technique is to reduce the sensed data at the sensor node level in order to prolong the WSN's lifetime. The data is first filtered using the fitting algorithm, which has two thresholds; if the data's usual value is less than one of the thresholds, the data is not sent to the aggregator for aggregation. To get rid of the redundant data readings, the SAX symbolic algorithm and adaptive piecewise constant approximation (APCA) were used as a data aggregation technique. In a cluster-based architecture, every cluster has several cluster members (sensor nodes) and one common cluster head (CH). When sensor nodes are scattered over nearby regions, they usually generate the same or similar readings, and therefore the temporal and spatial correlation can be exploited. At the end of each period, each sensor node sends its data set to the CH to which it belongs.
In the proposed method, the following assumptions have been made regarding the WSN topology:
• All sensors are homogeneous and deployed at predetermined positions within the communication range of the base station node.
• For energy consumption, each node is presumed to use the same radio model.
• Each node is presumed to use a periodic data collection mode, in which each node regularly processes the collected data and sends it to the corresponding CH.
• Data transfer from the sensor nodes to the relevant CH relies on single-hop communication.
• Each sensor node monitors environmental conditions or events such as pressure and temperature.
• The cluster-based network can be partitioned into disjoint clusters. There is one cluster head (CH) and several sensor nodes (SNs) in each cluster. Each CH gathers data from its SNs and transmits the processed data to the base station.

Data Collection:
The main objective of WSNs is to make human life easier and simpler. The implementation of a WSN is often concerned with data collection and communication of information. In the WSN context, the data is collected from sensors. Depending on application requirements, the collection of data may be event-driven (such as forest fire or oil and gas leak detection) or time-driven (such as habitat monitoring, or logging temperature and humidity in plants for precision agriculture). The time-driven data collection model, called Periodic, is considered in this article.
In periodic applications, each sensor node captures a vector of data readings in each period and then transmits it to the CH, as follows: $R = [r_1, r_2, \ldots, r_{\tau-1}, r_\tau]$, where $\tau$ is the total number of data readings obtained in the period $\rho$. Figure 2 displays a periodic data collection example in which every sensor node captures one data reading every 10 minutes and transmits the set of collected data, which includes 6 readings (i.e., $\tau = 6$), to the CH at the end of every hour.
The data readings obtained by a sensor in any period, i.e., in $R$, are often redundant, depending on how much the monitored conditions vary. In order to minimize the number of data readings transmitted and to preserve the energy of the sensor, searching for data redundancy in each sensor is therefore important. Hence, our objective is to reduce the size of $R$ by aggregating it using the perceptually important points (PIP) segmentation method.

Perceptually Important Points Based Data Aggregation (PIP-DA) Method:
Using PIPs is a promising approach for extracting salient points from a time series. In the data mining context, PIPs were used mostly for  34 and for clustering purposes 35 . A time series is formed from a series of sensor data readings $R = [r_1, r_2, \ldots, r_{\tau-1}, r_\tau]$, and each reading $r_i$ affects the overall shape of the time series to a different degree. That is, some readings determine the overall shape of the time series, whereas others have little or no effect on it and can be discarded. The PIP technique aims to find the readings that have a crucial effect on the overall shape of the time series 36 .
According to the proposed data aggregation method, the collected sensor data readings $R = [r_1, r_2, \ldots, r_{\tau-1}, r_\tau]$ can be expressed by a PIP series $P = [p_1, p_2, \ldots, p_n]$, where $n \ll \tau$, as follows:
• The PIP-DA begins by taking the first reading $r_1$ and the last reading $r_\tau$ of the original sensor data readings $R$ as the first two PIPs, $p_1$ and $p_2$.
• Next, it computes the distance between each of the remaining sensor data readings and the two initial PIPs.
• Subsequently, the PIP-DA chooses the sensor data reading with the maximum distance as the third PIP, $p_3$.
• The PIP-DA selects as the fourth PIP, $p_4$, the sensor data reading that maximizes its distance from its two neighbouring PIPs (which are either the first PIP $p_1$ and the third PIP $p_3$, or the third PIP $p_3$ and the second PIP $p_2$).
• The PIP-DA stops when the number of PIPs specified by the user has been reached.
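The selection steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names `euclidean_distance` and `pip_indices` are hypothetical, the reading index is assumed to serve as the time coordinate, and the Euclidean distance is taken as the sum of the distances from a candidate reading to its two adjacent PIPs.

```python
import bisect
import math


def euclidean_distance(i, readings, left, right):
    """Sum of Euclidean distances from reading i to its two adjacent PIPs.

    Assumption: the index of a reading is used as its time coordinate.
    """
    return (math.hypot(i - left, readings[i] - readings[left])
            + math.hypot(i - right, readings[i] - readings[right]))


def pip_indices(readings, n_pips):
    """Return the sorted indices of the selected PIPs (hypothetical helper)."""
    total = len(readings)
    if total <= 2 or n_pips >= total:
        return list(range(total))
    # The first and last readings are always the first two PIPs.
    selected = [0, total - 1]
    while len(selected) < max(n_pips, 2):
        best_idx, best_dist = None, -1.0
        # Scan every reading lying between each pair of adjacent PIPs.
        for left, right in zip(selected, selected[1:]):
            for i in range(left + 1, right):
                dist = euclidean_distance(i, readings, left, right)
                if dist > best_dist:
                    best_idx, best_dist = i, dist
        if best_idx is None:  # no candidate readings remain
            break
        bisect.insort(selected, best_idx)  # keep PIPs in time order
    return selected


readings = [20.0, 20.1, 20.0, 25.0, 20.1, 20.0]
kept = pip_indices(readings, 3)
print(kept, [readings[i] for i in kept])
```

In this example the spike at index 3 is chosen as the third PIP, so the node would forward three readings instead of six. Each pass rescans all remaining readings, which is simple but quadratic in the worst case; that is usually acceptable for the short per-period vectors described here.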

The Measures of Distance for PIP Detection:
In the PIP-DA technique, three distance metrics can be utilized, namely vertical distance (VD), perpendicular distance (PD) and Euclidean distance (ED), as shown in Fig. 3. This paper uses the Euclidean distance (ED) as the measure for PIP detection. Let $R = [r_1, r_2, \ldots, r_{\tau-1}, r_\tau]$ be the sensor data readings of length $\tau$, and let $p_j = r_a$ and $p_{j+1} = r_b$ be two adjacent PIPs. The Euclidean distance of each of the intermediate data readings $r_i$, for $i \in \{a + 1, \ldots, b - 1\}$, from the two PIPs is defined as in Eq. 1. The new PIP is the reading that maximizes this distance, $p_{new} = \arg\max_{r_i} ED(r_i, p_j, p_{j+1})$, where "argmax" returns the argument that attains the maximum.
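For reference, the three measures are commonly written as follows in the PIP literature. The coordinate notation is an assumption made here for illustration and is not the paper's own: a candidate reading is written as $(x_c, y_c)$, with $x$ the time index and $y$ the sensed value, and its two adjacent PIPs as $(x_1, y_1)$ and $(x_2, y_2)$.

```latex
\begin{align}
ED &= \sqrt{(x_1 - x_c)^2 + (y_1 - y_c)^2} + \sqrt{(x_2 - x_c)^2 + (y_2 - y_c)^2} \\
VD &= \left| \, y_1 + (y_2 - y_1)\,\frac{x_c - x_1}{x_2 - x_1} - y_c \, \right| \\
PD &= \frac{\left| (y_2 - y_1)\,x_c - (x_2 - x_1)\,y_c + x_2 y_1 - y_2 x_1 \right|}
           {\sqrt{(y_2 - y_1)^2 + (x_2 - x_1)^2}}
\end{align}
```

ED sums the distances from the candidate to both PIPs, VD measures the vertical gap between the candidate and the line joining the two PIPs, and PD is the perpendicular distance from the candidate to that line.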

Application Criticality:
Using different types of sensors, such as those for noise detection, concentration of chemicals, strain, displacement and temperature, a WSN may be used to track disasters. Disasters do not all have the same effect on individuals and on the ecosystem. Therefore, when the disaster risk level is high, the sensor node sends more data readings than when the risk level is low; this provides high-quality data that simplifies the analysis and helps in understanding the tracked catastrophe.
The number of PIPs in the proposed method depends on the application criticality (i.e., the disaster risk level). The number of PIPs is proportional to the risk level; therefore, when the risk level is high, the PIP-DA pushes the sensor node to send as many data readings as possible, and vice versa. In general, this saves energy because the sensor node can adjust its sending rate to the needs of the application. The criticality of the application is represented in the PIP-DA method as a minimum number of transmitted data readings for a sensor node, ℛ, over a period. ℛ takes values from 1 to 100, ranging from a low to a high criticality level, respectively. The number of PIPs is calculated as in Eq. 2. PIP is used to segment the sensor data readings dynamically.
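As an illustration of how the criticality level could drive the number of PIPs, the sketch below treats ℛ as a percentage of the readings captured in a period. This mapping is an assumption made for illustration only and is not taken from Eq. 2; the function name `pip_count` and the lower bound of two PIPs (the first and last readings) are likewise hypothetical.

```python
def pip_count(tau: int, criticality: int) -> int:
    """Hypothetical mapping from criticality (1..100) to a number of PIPs.

    Assumption: criticality is read as the percentage of the tau readings
    that must be kept; this is NOT the paper's Eq. 2.
    """
    if not 1 <= criticality <= 100:
        raise ValueError("criticality must be between 1 and 100")
    scaled = -(-tau * criticality // 100)  # integer ceiling division
    # Keep at least the first and last readings, and never more than tau.
    return min(tau, max(2, scaled))


for level in (5, 25, 75, 100):
    print(level, pip_count(200, level))
```

Under this assumed mapping, a higher criticality level keeps proportionally more readings, which matches the proportional relation described above.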

Algorithm 1: PIP-Data Aggregation Method
Input: Sensor data readings $R = [r_1, r_2, \ldots, r_{\tau-1}, r_\tau]$
Output: PIP series $P$

Finally, in order to further reduce the size of the resulting PIP series before it is sent to the CH, the proposed method encodes this PIP series: as shown in Fig. 4, each $p_i$ in the PIP series is encoded using two bytes, i.e., a 16-bit representation. The sign bit is 1 for negative numbers and 0 for positive numbers. The integer part of $p_i$ is stored in the 8 bits that follow. The remaining 7 bits hold the fraction part of $p_i$.
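The two-byte layout can be sketched as below. The bit widths (1 sign bit, 8 integer bits, 7 fraction bits) come from the description above; how the fraction is quantized is not stated, so this sketch assumes binary fixed-point with a resolution of 1/128. The function names `encode_reading` and `decode_reading` and the big-endian byte order are also assumptions.

```python
import struct


def encode_reading(value: float) -> bytes:
    """Pack a reading as sign (1 bit) | integer (8 bits) | fraction (7 bits)."""
    sign = 1 if value < 0 else 0
    # Assumption: fraction stored as binary fixed-point, 7 bits -> steps of 1/128.
    scaled = round(abs(value) * 128)
    if scaled > 0x7FFF:
        raise ValueError("magnitude does not fit in 8 integer + 7 fraction bits")
    word = (sign << 15) | scaled
    return struct.pack(">H", word)  # 16-bit unsigned, big-endian


def decode_reading(data: bytes) -> float:
    """Inverse of encode_reading."""
    (word,) = struct.unpack(">H", data)
    magnitude = (word & 0x7FFF) / 128
    return -magnitude if word >> 15 else magnitude


packed = encode_reading(19.5)
print(packed.hex(), decode_reading(packed))
```

With this assumed quantization, the round-trip error of a reading is at most 1/256, and every PIP costs 16 bits on the radio regardless of its value.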

Simulation Experiments and Results:
This section presents the performance evaluation and simulation results, as graphs and discussion, for the proposed technique outlined in Section 3. The goal is twofold: first, to evaluate the performance of the technique with different performance metrics using real sensor data, where the efficiency of PIP-DA is measured using the parameters given in Table 2; second, to compare the proposed technique with recent existing protocols from the same field.

Simulation Environment:
To evaluate the proposed technique, extensive simulation experiments were carried out using the OMNeT++ simulator, based on real sensor data. A sensor network with a single-hop topology, installed in a laboratory, was considered in these simulations. A single CH node is located in the middle of the laboratory. This installation is shown in Fig. 5.
Periodically, sensors take local measurements (e.g., temperature) at a set frequency. The proposed technique is deployed on every sensor node and relies on the Intel Berkeley Research Lab dataset 37 . The sensed weather data (such as light, humidity and temperature) were collected periodically every 31 seconds. In our simulations, the sensor nodes use a log file that contains 2.3 million readings previously obtained by 54 Mica2Dot sensor nodes in the lab, as shown in Fig. 5. This paper uses only one of the sensor measurements: temperature. Sensor nodes marked in yellow in Fig. 5 are excluded from our experiments because their data may be incomplete or truncated. The temperature readings of the remaining 47 sensor nodes are collected and processed, and the reported results are averaged over these 47 sensor nodes.

The Residual Data after Aggregation:
In this experiment, the aim is to demonstrate how the proposed technique can be used by the sensor nodes to aggregate the collected data readings (i.e., remove redundant data readings in each period). Figure 6 shows the proportion of data readings that remain once the redundancy has been removed by applying the PIP method. It can be seen that, across the various parameters, the proposed technique is able to adjust the sending rate based on the application criticality. Based on the findings obtained, the following can be inferred:
• The outcome of the aggregation in the proposed PIP-DA method depends on the selected application criticality level ℛ. The greater the risk level of the application, the greater the amount of residual data, and vice versa.
• ATP is found to keep smaller amounts of data if the amount of data collected or the similarity threshold increases.
• PFF keeps 100% of the collected data.

The Ratio of Transmitted Data Sets:
In this experiment, each sensor uses the proposed PIP-DA method to decrease the number of data sets transmitted to its respective CH. Figure 7 shows the ratio of data sets transmitted by a sensor node using the PIP-DA, PFF and ATP protocols. The PIP-DA allows each sensor node to change its transmission rate depending on the degree of application criticality. Based on the findings in Fig. 7, several observations can be made:
• As and ℛ are raised in the ATP and PIP-DA methods, the sensor node sends more sets.
• Our proposed method sends fewer data sets than the ATP and PFF protocols in all cases when varying the application criticality level ℛ between 5% and 75%.
• PFF sends 100% of the collected data.

The Analysis of Energy Consumption:
Our aim in this experiment is to study the energy cost at the sensor node level. The energy consumed at the sensor node is the energy consumed in sending data to the CH. The same energy model as in 27,28,29,30 is used. Transmission requires extra power to amplify the signal according to its distance from the endpoint. Thus, the radio consumes energy as defined in Eq. 3 to send a $k$-bit message over a distance $d$, where $E_{elec}$ is the energy required by the radio electronics and is equal to 50 nJ/bit, and $\varepsilon_{amp}$ is the energy required by the amplifier and is equal to 100 pJ/bit/m$^2$.

$E_{TX}(k, d) = E_{elec} \times k + \varepsilon_{amp} \times k \times d^2$    (3)

Figure 8 shows a comparison between our technique PIP-DA, ATP and PFF in terms of the amount of energy consumed using different parameters. The findings obtained indicate that our technique outperforms ATP and PFF in reducing energy consumption. Based on the findings obtained, the following can be inferred:
• In the proposed technique, the reduction of redundant sets has a very large impact on reducing energy consumption, because it reduces the operation of the radio unit (i.e., transmission and reception operations).
• The application criticality level ℛ plays an influential role in energy consumption: increasing ℛ increases the energy consumption because more packets are sent.
• The size of the captured data and the similarity threshold also affect the energy consumption: increasing either of them reduces the energy consumption, as is the case in ATP and PFF.
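The transmission energy model can be evaluated with a few lines of Python. This is a minimal sketch: the constant and function names are hypothetical, and the values are the 50 nJ/bit electronics cost and 100 pJ/bit/m² amplifier cost of the commonly used first-order radio model.

```python
E_ELEC = 50e-9     # J per bit, radio electronics (50 nJ/bit)
EPS_AMP = 100e-12  # J per bit per m^2, transmit amplifier (100 pJ/bit/m^2)


def tx_energy(bits: int, distance_m: float) -> float:
    """Energy in joules to transmit `bits` bits over `distance_m` metres."""
    return E_ELEC * bits + EPS_AMP * bits * distance_m ** 2


# Hypothetical comparison: 6 raw readings versus 2 PIPs, 16 bits each, 10 m hop.
raw = tx_energy(6 * 16, 10.0)
aggregated = tx_energy(2 * 16, 10.0)
print(raw, aggregated, aggregated / raw)
```

Because the cost is linear in the number of bits for a fixed distance, the fraction of energy saved in this model equals the fraction of bits removed by aggregation.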

Data Accuracy:
A significant challenge for WSNs is to delete redundant data without compromising accuracy. Data accuracy is expressed here as the "loss rate" of readings that are captured by sensor nodes but not received by the CH. Figure 9 shows the data loss rate, i.e., the proportion of readings that are not delivered to the CH once the data sets have been aggregated. It can be seen that, across the various parameters, there is a trade-off between data accuracy, the amount of data transmitted (see Fig. 7) and energy consumption (see Fig. 8). To achieve high accuracy, more data must be sent and thus more energy is spent. Some applications, such as environmental monitoring, do not need high data accuracy and can therefore send less data, while military and health applications need high data accuracy and therefore need to send a larger amount of data. Based on the findings obtained, the following can be inferred:
• In the proposed PIP-DA technique, the reduction of the data readings is related to the accuracy: the loss rate increases when the application criticality level ℛ decreases.
• The similarity threshold plays an influential role in the accuracy of the data, as increasing the similarity threshold increases the accuracy of the data.
• The size of the captured data can also affect the accuracy: increasing the volume of data collected increases the data loss rate.
• The accuracy of the ATP and PFF protocols is better than that of our proposed method PIP-DA for application criticality levels ℛ > 25%, but at the expense of more energy expenditure, as shown in Fig. 8.

Conclusion:
Wireless sensor networks are networks with a limited amount of energy. Given that data communication consumes the vast majority of resources, some method of data aggregation is needed. Because it can remove redundant data transfers within the network, data aggregation has lately attracted a lot of attention. To improve the WSN lifetime, the principal idea is to exploit the temporal correlation between a sensor node's data readings and minimize energy depletion by aggregating sensed data before sending them to the CH. At the sensor node level, a Perceptually Important Points Based Data Aggregation (PIP-DA) method for WSNs is proposed for reducing the number of transmitted data readings, decreasing the consumed energy, and thus extending the network lifespan while preserving the accuracy of the data readings received at the base station. The simulation results, which are based on real sensor network data and obtained using the OMNeT++ simulator, show that the proposed data aggregation approach outperforms some recent existing approaches in terms of several performance metrics, such as remaining data after aggregation, the ratio of sets sent to the CH, data accuracy at the CH and energy consumption. For future studies, it is recommended to investigate the possibility of implementing another dynamic segmentation algorithm that can be applied at two levels: the sensor node and the gateway. We also expect to implement a prediction approach that forecasts the missing data at the CH and to combine it with our work.