Shuang Jia,Lin Ma,*,Danyang Qin
1 School of Electronics and Information Engineering,Harbin Institute of Technology,Harbin 150080,China
2 Key Laboratory of Electronics Engineering,College of Heilongjiang Province,Heilongjiang University,Harbin 150080,China
Abstract:Wireless sensor network is an important technical support for ubiquitous communication.For the serious impacts of network failure caused by the unbalanced energy consumption of sensor nodes,hardware failure and attacker intrusion on data transmission,a low energy consumption distributed fault detection mechanism in wireless sensor network (LEFD) is proposed in this paper.Firstly,the time correlation information of nodes is used to detect fault nodes in LEFD,and then the spatial correlation information is adopted to detect the remaining fault nodes,so as to check the states of nodes comprehensively and improve the efficiency of data transmission.In addition,the nodes do not need to exchange information with their neighbor nodes in the initial detection process since LEFD adopts the data sensed by node itself to detect some types of faults,thus reducing the energy consumption of nodes effectively.Finally,LEFD also considers the nodes that may have transient faults.Performance analysis and simulation results show that the proposed detection mechanism can improve the transmission performance and reduce the energy consumption of network effectively.
Keywords:wireless sensor network; low energy consumption; fault detection; detection accuracy
Wireless sensor network (WSN) consists of a large number of sensor nodes deployed in a specific area by a self-organized manner.There is no central control node in the network,and the end-to-end information transmission can be achieved by the intermediate nodes in a multi-hop forwarding way [1].WSN with the flexible,distributed and dynamic characteristics has a wide range of applications,such as battlefield,disaster relief,exploration,environmental threats detection and other fields [2].The sensor nodes,however,often suffer from various attacks and other external damage since they are usually deployed in severe environments [3].In addition,sensor nodes have the low manufacturing cost,limited resources and radio coverage [4].All the factors will cause failure to the nodes,thus will reduce the accuracy of monitoring data.Therefore,the network node fault detection is very important for ensuring the accuracy of monitoring results.
For the serious impacts of network failure caused by the unbalanced energy consumption of sensor nodes,hardware failure and attacker intrusion on data transmission,a low energy consumption distributed fault detection mechanism in wireless sensor network (LEFD) is proposed in this paper.
The fault detection algorithms of sensor nodes in WSN can be divided into centralized fault detection and distributed fault detection according to different data processing methods [5].The centralized fault detection algorithms usually require all the information being collected by a particular node,and then determine the states of the other nodes.These algorithms can lead to many problems easily,such as single node failure,information loss and much energy consumption [6].The distributed fault detection algorithms require each node to possess the ability to detect faults and adopt the data collected by itself or the surrounding nodes to determine their own faults [7].At present,some problems existing in the fault detection algorithms are as following [8]:
(1) The network energy consumption is sacrificed for higher detection accuracy and lower false positive ratio.In the existing detection algorithms,the sensor node needs to communicate with its neighbor nodes during fault detection,which will lead to higher energy consumption.
(2) The types of fault nodes are not fully considered.Therefore,the detection performance of these algorithms will decline rapidly if the types of fault nodes increase.
(3) The ability of sensor nodes to collect data is not fully utilized,only the spatial correlation of sensor network is used to achieve fault detection so that the complexity of algorithms increases significantly.
In order to solve the above problems,a low energy consumption distributed fault detection mechanism in wireless sensor network is proposed in this paper.LEFD adopts the time correlation features of data collected by sensor nodes to detect certain types of fault nodes,and then removes them from the network.LEFD may reduce the time and energy consumption in the communication between neighbor nodes.Then,in the remaining detection process,LEFD adopts the spatial correlation properties of WSN to detect the remaining fault nodes that are not detected during the initial detection phase.If the measured value of a node is the same or similar to that of the neighbor who is in the normal state,the node can be considered as a normal node.Otherwise,the node is considered as a faulty node.The algorithm also considers the transient faults in the sensor readings and corrects the fault data using the data collected in a short period of time when transient faults occur,which avoids mistaking the normal node as a faulty node.The main contributions of this paper are summarized as follows:
(1) A low energy consumption distributed fault detection mechanism is proposed,which uses the time correlation information of sensor nodes to detect the fault nodes in the initial detection stage.During the detection of the remaining nodes,the other nodes do not need to communicate with the detected fault nodes,thus reducing the communication traffic and network energy consumption.
(2) All kinds of fault nodes are fully considered to ensure the performance of the detection algorithm.In addition,the proposed algorithm also considers the nodes that may have transient faults in the sensor readings.For transient faults,the fault values will be promptly corrected by LEFD to avoid mistaking the normal node as a faulty node,thus improving the utilization of nodes and reducing false positive ratio.
A distributed Bayesian algorithm for detecting and correcting node faults in a wireless sensor network (BAFD) is proposed [].In BAFD,sensor node exchanges information with its neighbor nodes to obtain the statistical probability of the event,the failure ratio of the node is used to identify events and fault nodes.A fault detection scheme is proposed in [10],where each node detects any suspicious behavior using time correlation in its own reading,and the suspect node is required to communicate with confident neighbor node to find the fault node.The algorithm has high detection accuracy and low communication overhead,but does not take into account the impact of transient failure.The authors in [11] propose a distributed byzantine fault detection method based on hypothesis testing,the Neyman-Pearson test method is used to predict the fault states of each sensor node and adjacent sensor nodes,and then the final state of the node is determined by voting in this mechanism.A distributed fault detection method based on metric correlation is proposed in [12].The algorithm detects the fault nodes through the internal metric correlation of sensor nodes.The computational complexity of the algorithm is low,but it does not consider the influence of transient faults.A distributed localized fault sensor detection algorithm for wireless sensor networks (DLFS) is proposed in [13],the mutual test results between nodes and neighboring nodes are utilized to determine the states of nodes.DLFS has high detection accuracy and low computational complexity,but the algorithm requires at least two communications between adjacent nodes,thus resulting in much energy consumption.The authors in [14] propose a fault detection mechanism based on the hidden Markov random field.The HMRF model is used to characterize the correlation between the measured value and the actual value of sensor node,and then the parameters of HMRF model are obtained through the variable error estimation method to determine the state of node.The method has high detection accuracy and low false positive ratio,but it also causes high computational complexity and much energy consumption.In [15],a fully distributed fault detection algorithm is proposed.In this algorithm,the nodes first collect the measurements of their neighborhoods,and process them to determine whether they contain exception value and broadcast the results.Then,nodes determine their own operational state autonomously.Therefore,the computational complexity of the algorithm is low,while the algorithm needs to communicate with its neighbor nodes several times,thus leading to much energy consumption.The authors in [16] propose a novel method for detecting a sensor that generates fault data in a distributed manner.The algorithm detects the fault nodes in the cluster locally through the cluster head and uses the trust concept to identify the type of data failure,which may reduce the influence of fault node on sensor probability.However,this method leads to the uneven energy consumption of sensor nodes.
Fig.1.Network model.
WSN is a special wireless communication system,which does not depend on any fixed communication facilities; it can be deployed in a complex environment for data communication rapidly,and the architecture of WSN is shown in Figure 1.Each node plays the role of router and endpoint and has access service and wireless backbone interface [17].There is no absolute domination of the nodes in WSN,and each node is equal and independent.The data between nodes are transmitted to destination node by intermediate nodes,that is,data transmission is carried out by multi-hop forwarding,which can guarantee the flexibility of network topology.
Assuming that the number of nodes randomly deployed in a specific area isN.These sensor nodes have the same communication radiusRmax.A node stores at leastksegments of data that has been collected before executing the fault detection algorithm.nirepresents thei-th node in WSN.The node in the nodeni′scommunication radius is called the neighbor node ofni.N(ni) denotes all neighbor nodes ofni,andNum(N(ni)) represents the number of neighbor nodes ofni.ditdenotes the measurement data of the nodeniatt.It is assumed that theksegments of data have been collected in the sensor and stored in memory beforet,that is,Nodeniand nodeN(ni) are in the same or similar environment,which means that the neighbor nodes ofniare also in the same event area if the nodeniis in the event area,and the neighbor nodes ofniare also in the same normal area if the nodeniis in the normal area.The remaining parameters are shown in table 1.
Nodes could still receive,send,collect and process data if the network is partially faulty,but the data received,sent and collected by node are usually wrong.According to the abnormal data obtained by nodes,the fault of sensor nodes can be divided into the following specific types [18].
(1) Fixed fault:a sensor with this fault collects data with the same reading,and the data is not affected by the environment.The fault will cause the network to be paralyzed,but the fixed fault is easier to detect since the data does not change with the environment.
(2) Random fault:node readings are random and uncertain.The fault will interrupt the data transmission in WSN,and the random fault is more difficult to detect since the node readings are random.
(3) Offset fault:the node readings deviate from normal values,and the readings may change if the environment changes.The fault will disrupt the normal communication of other nodes in WSN,which in turn affects the normal transmission of data.
(4) Transient fault:transient fault may occur in a short time due to the hardware characteristics and the impact of the environment in the data collecting process,resulting in data anomalies occurred one or more times.The fault will cause a short-term failure in WSN.
In order to improve the utilization of the sensor nodes,this paper considers that the nodes with transient faults are normal nodes because the readings of these nodes are available at most of the time.
Table I.Notations and their definitions.
The data collected by sensor nodes in a short time is temporally relevant,which means that the collected data is the same or similar in a short time,and the change is not so great [19].LEFD can detect some types of fault nodes based on this feature,such as random faults and transient faults.The value of the collected data in a short time is unstable when these faults occur.However,this paper will consider that the nodes with transient faults are normal nodes in order to improve the utilization of nodes,so only the collected data generated when the fault occurs is corrected; the normal node will not be mistaken for fault node.The matrix Q is established to determine whether there are transient faults or random faults based on the difference of data collected by nodes.The faulty data will be replaced by collected normal data at other times for transient faults; therefore,the false positive ratio can be effectively reduced.However,it is not enough to use only time correlation information.For example,the node's reading still satisfies the time correlation feature when fixed faults or offset faults occur,and this type of fault nodes cannot be detected only using the time correlation feature,so the neighbor nodes are necessary.The node fails if the collected data of most neighbor nodes is not similar to the node's collected data; that is,the sensor nodes have spatial correlation property,which means that most sensor nodes have the same or similar readings in smaller areas.
The differences between LEFD and existing algorithms can be summarized from the above analysis.Firstly,LEFD uses time correlation information to detect certain types of fault nodes and corrects some values as needed,and then the spatial correlation property of nodes is adopted to detect remaining fault nodes.However,the existing algorithms do not use time correlation information or only the spatial correlation information is adopted to detect fault nodes,so there always are undetected fault nodes in the network.Secondly,the existing algorithms do not consider the transient faults of nodes so that the normal node is mistaken as a faulty node,thus reducing the utilization ratio of nodes.
The latestksegments of data can be obtained after the nodenicollects the data at timet.The matrix Q is established according to (1):
For each row in matrix Q,viris calculated:
At timet,the value ofviris corrected byvit:
Any measured value at other times when=0 can be considered as its value at timet.
(4) is used to determine the initial states of sensor nodes:
For the nodeniwith state 0,the neighbor reading whose initial fault condition is 0 is obtained.Then,the final state of the node is determined according to (5) and (6):
whereNum(N(ni) a ndT=0) denotes the number of neighbor nodes ofniand the number of nodes whose state may be normal.βrepresents the node failure thresholds.FSi=0 represents that the nodeniis a normal node.Otherwise,the nodeniis a faulty node.
For example,assuming thatk=5 andNum N n((i))=5,theksegments of data collected by the nodeniat timetand beforetisthe value is {60.12,30.23,31.54,10.68,30.87} .The neighbor nodes' reading of nodeniisatt,the value is {70.22,31.35,65.79,30.84,31.10},andβ=2.Then,the matrix Q is established according to (1):
As can be seen from this example,LEFD is a very effective detection method for transient faults and random faults.The detailed description of LEFD is shown in Algorithm 1.
Algorithm 1.The pseudocode of low energy consumption distributed fault detection algorithm.(1) Begin (2 for each node ni in WSN (iN=1,2,...,)/* The following method is adopted to establish Q*/(3) for each k times before time t (including time t)(4) if dd mn - ≤β1(5) Qmn=0(6) else(7) Qmn=1(8) end if(9) end for/* Generate test vir*/(10) if ∑t j t k= - +1Qij <k/2(11) vir=0(12) else(13) vir=1(14) end if/*Correct vit*/(15) if ∑t ii t=1(16) vi t=0,d d r </2 and vi r t k= - +1v k i tt x ii=-//((t xt ktv- ∈ - +=) [ 1,and 0] i t x-)(17) else if ∑t t=0(18) vi t=1,d d r ≥/2 and vi r t k= - +1v k i t x- )(19) end if/* Generate a state value S=tt x ii=-//((t xt ktv- ∈ - +=) [ 1,and 1] i r*/(20) if ∑t i 0 based on the value of vi r </2(21) Si=0(22) else(23) Si=1(24) end if/* For the nodes with status value S=r t k= - +1v k i i 0,test each member of their neighbors to generate test v,t{0 1} by adopting the following way*/(25) if d d ij tt - ≤β2(26) vij ij t=0(27) else(28) vij t=1(29) end if/* Make the final decision of the nodes' state FSi*/(29) if v Num N nS iji t <=(()and 0/2)n N nS∈=(∑)and 0 jii(30) FSi=0(31) else(32) FSi=1(33) end if(34) end for Process end
The algorithm adopts the historical data sensed by nodes to determine the initial state of nodes.The node may be a normal node if the collected data is stable in a short time (almost no change).Otherwise,the node may be a faulty node.In other words,only the sensor node's own data can be used to identify some of the fault nodes.After determining the initial states of nodes,LEFD further determines that the initial states of their neighbor nodes are normal for the nodes with normal initial state.A node will be determined as a normal node if its measured value is similar to most of its neighbor nodes.In the whole algorithm implementation process,the fault nodes that are identified in initial detection process are no longer able to communicate with other normal nodes,the algorithm adopts the data from nodes whose initial state are normal.This method not only consumes less energy,but also reduces the error detection ratio.In addition,LEFD also considers the transient faults of nodes.The algorithm will correct the false readings when transient faults occur,which means that the algorithm adopts the reading at other times instead of the reading at this time to further improve the fault tolerance ability of sensor nodes to transient faults.
The two indicators are usually adopted to evaluate the effect of fault node identification,namely,detection accuracy and false positive ratio.Detection accuracy (DA) refers to the ratio between the number of fault nodes that have been correctly identified and the total number of actual fault nodes:
whereFrepresents the set of fault nodes detected by the algorithm,andArepresents the set of actual fault nodes.
False positive ratio (FPR) refers to the ratio between the number of normal nodes which are identified as fault nodes and the total number of normal nodes:
Most of the energy consumption is caused by communication between nodes [20].Thus,the total number of communications between nodes can be adopted to represent the total network.When the communication radius of node isRmax,it is assumed that the average energy consumption which the nodenicommunicates with its neighbor node once iseleci:
whereEcdenotes the total energy consumption.
The performance of LEFD was compared with DLFS and BAFD using NS2 in this study [21].In order to maintain the generality,it is assumed that the position of each node is known,and all nodes have the same communication radiusRmax,and the reading of the nodes in the normal region is subject to the distribution ofN(µ σ,),(µσ==35,1).At least 5 segments of data (k=5) are stored in each sensor node.The value of k should not be chosen too high because the sensor nodes have limited storage capacity.The data may take up too much storage space if the value of k is too high.The node failure thresholdβ=5,and the basic idea of the node failure threshold selection is to determine the node failure threshold according to the allowable deviation of the sensor node.The key step is designing an observer.The output of the observer and the output of the sensor node constitute a redundant signal,and then the two signals are compared to obtain the sensor residual sequence.The allowable error of the sensor node is selected as the node failure threshold [22].Since the fixed fault is similar to the offset fault,the two types of faults are also regarded as offset faults.The results were obtained from the mean of 100 experiments.All the simulation parameters are shown in table 2.
The time complexity of LEFD is O(N) for N sensor nodes in WSN.The algorithm only needs to determine the state of a node through the historical data sensed by a node and data collected by its neighbor nodes.Therefore,a node only needs to collect data from N-1 nodes at most.Then,the time complexity of LEFD is O(N).
Figure 2a and 2b show the performance comparison results of different algorithms in terms of DA and FPR when only offset faults occur.It can be seen that the DA of DLFS is much higher when the sensor fault probability is less than 30% in Figure 2a,and the DA of LEFD proposed in this paper is similar to that of BAFD; the DA of DLFS is rapidly reduced compared with LEFD and BAFD when the sensor fault probability is higher than 30%.However,DLFS has low FPR,and the FPR of LEFD is between DLFS and BAFD.Based on all the above factors,the performance of the LEFD algorithm is between DLFS and BAFD when only offset faults occur.
Table II.Simulation parameters.
Figure 3a and 3b show the performance comparison results of different algorithms in terms of DA and FPR when only random faults occur.It can be seen that the sensor fault probability has little effect on the DA,and the FPR increases with the increasing of the sensor fault probability in Figure 3.However,LEFD also has good performance and always maintains high DA and low FPR even in the case of high sensor fault probability for random faults,since the LEFD algorithm first checks whether the nodes' reading is stable in a short time.The node may be fault if its reading is unstable,because the data of the random fault sensor is random and unstable.Since the range of random faults is form 1 to 100,DLFS and BAFD are effective for this fault,but are less efficient than LEFD.The DA of DLFS and BAFD are also more than 93%,and the DA may increase (such as BAFD) when the sensor fault probability increases.But the FPR of DLFS and BAFD also increase when the sensor fault probability increases.However,the FPR of LEFD is almost zero.
Fig.2a.Detection accuracy under offset faults.
Fig.2b.False positive ratio under offset faults.
Fig.3a.Detection accuracy under random faults.
Fig.3b.False positive ratio under random faults.
Figure 4 shows the relationship between the sensor fault probability and the FPR if only transient faults occur whenk=5.It can be seen that the FPR of all algorithms increase with the increasing of sensor fault probability in Figure 4.The FPR of LEFD is very low because it has good ability to handle transient faults.For example,the FPR of LEFD is still less than 5% when the sensor fault probability is 50%,since LEFD determines whether the collected data is correct according to theksegments of data.LEFD will replace the current data with the data collected at another time to avoid the impact of transient faults if data is wrong.Therefore,LEFD has a good fault-tolerant performance for transient faults.However,both DLFS and BAFD do not consider transient faults,so the FPR will continue to increase as the sensor fault probability increases.
Figure 5a and 5b show the relationship between the sensor fault probability and the DA/FPR when the offset faults,random faults and transient faults occur randomly,respectively.The DA of DLFS and LEFD is almost the same in Figure 5a.The DA of DLFS and LEFD are higher than BAFD when the sensor fault probability is less than 35%.However,the DA of DLFS will decline rapidly when the sensor fault probability is greater than 35%.The FPR of LEFD is the lowest of the three algorithms.In short,the performance of the LEFD algorithm achieves our expectations in the event of mixed faults.
Fig.4.The relationship between the false positive ratio and the sensor fault probability under transient faults.
Fig.5a.The relationship between the detection accuracy and the sensor fault probability under mixed faults.
Fig.5b.The relationship between the false positive ratio and the sensor fault probability under mixed faults.
Figure 6 shows the relationship between the energy consumption (EC) of DLFS,BAFD,LEFD and the sensor fault probability when the transient fault ratio and the offset fault ratio is 1:1,the communication radiusRmaxis 2,and the non-transient faults occur.It can be seen that DLFS has much higher energy consumption as the sensor fault probability increases under the same conditions in Figure 6,since each node needs to communicate with its neighbor nodes at least twice (the first communication is to exchange the initial data set,the second communication is to exchange the initial state of each node).However,nodes that have not yet determined the final state need to make the third communication.As a result,the energy consumption of DLFS is always high relatively.Each node only needs to communicate with its neighbor nodes once for BAFD,so its network energy consumption is moderate,and does not change with the sensor fault probability.LEFD first adopts time correlation information for initial fault detection,and each node does not need to communicate with its neighbor nodes in this process.Only nodes that have been detected to have a normal state need to communicate with the neighbor nodes and consume additional energy.Therefore,most of nodes will be detected as fault nodes by the LEFD algorithm in the case of high sensor fault probability,and the energy consumption of network also decreases.In summary,the energy consumption of the LEFD algorithm is low.
Fig.6.Energy consumption of different algorithms.
WSN is an important component of modern mobile communication systems.However,network performance is seriously affected due to the breakage of data link and frequent changes of network topology.Therefore,a low energy consumption distributed fault detection mechanism in WSN is proposed in this paper.LEFD adopts the data sequence collected by the sensor node itself to detect a particular type of fault,and then further uses the neighbor data to determine the states of nodes,thus reducing the communication traffic and network energy consumption.In addition,LEFD also considers the nodes that may have transient faults.For transient faults,the fault values will be promptly corrected by LEFD to avoid mistaking the normal node as a faulty node,thus reducing false positive ratio.The simulation results show that LEFD has high detection accuracy,low false positive ratio and less energy consumption for various faults.Future research will study fault tolerance method for WSN,which may provide a new way for the effective transmission of data and ubiquitous routing.
ACKNOWLEDGMENTS
This work is supported by the National Natural Science Foundation of China No.61571162,61771186,Ministry of Education-China Mobile Research Foundation No.MCM20170106,Heilongjiang Province Natural Science Foundation No.F2016019 and University Nursing Program for Young Scholars with Creative Talents in Heilongjiang Province No.UNPYSCT-2017125.