Real-Time Implementation of Fault Detection in Wireless Sensor Networks Using Neural Networks
John W. Barron, Azzam I. Moustapha, and Rastko R. Selmic
Department of Electrical Engineering
College of Engineering and Science
Louisiana Tech University
Arizona Avenue, Nethken Hall 229
Ruston, LA 71272, USA
Tel: 318-257-4641 Fax: 318-257-4922
Abstract—This paper presents a real-time implementation of neural network-based fault detection for wireless sensor networks (WSNs). The method is implemented on the TinyOS operating system. A collection tree network is formed, and data is sent over multiple hops to the base station at the root. Nodes take environmental measurements every N seconds, while neighboring nodes overhear each measurement as it is forwarded to the base station and record it. After a node completes M of its own measurements and receives and stores M measurements from each neighboring node, recurrent neural networks (RNNs) are used to model the sensor node, the node’s dynamics, and its interconnections with neighboring nodes. The physical measurement is then compared against the predicted value, and a sensor fault is declared when the error exceeds a given threshold. Neural network training can be repeated indefinitely to maintain self-aware network fault detection. Because nodes simply overhear existing network traffic, this implementation uses no extra bandwidth or radio transmission power. The only costs of the approach are the battery power required to keep the receiver on to overhear packets and the MCU processor time needed to train the RNN.
Index Terms—Fault detection, wireless sensor networks, neural networks.
Introduction
Wireless sensor networks (WSNs) consist of a set of sensor nodes that can communicate with each other, sensors that measure a desired physical quantity, and a system base station for data collection, processing, and connection to the wide area network. Modern wireless sensor nodes have microprocessors for local data processing, networking, and control purposes [1]. Increases in embedded computing power have given rise to many WSN applications, ranging from medical projects to environmental measurements. For instance, networks have been developed to record vital signs and forward them to a base station for real-time analysis, possibly improving triage time. Sensor boards are also being developed to record movement data during the rehabilitation of stroke patients [2]. This raw data would improve physical therapists’ ability to track and quantify improvements.
The very heart of WSN technology is the ability to measure remote environmental quantities with low-cost nodes that self-organize into a network topology and reliably forward data to a base station. A vineyard monitoring system that measures soil moisture and the irrigation system’s water pressure is described in [3]. Environmental measurements can also help analyze structural health: a WSN has been deployed that spans the Golden Gate Bridge in San Francisco, CA [4]. These nodes allow engineers to monitor important remote quantities, such as ambient vibrations, both safely and cost effectively.
WSNs can take remote measurements, organize into a network, and forward data to a base station. Because of the environments they operate in and the need to keep costs low, node failures can occur, and the probability of a failure increases with the number of nodes in the network. The traditional solution to this problem is redundant hardware; however, duplicating sensor devices adds cost, complexity, and power consumption to the sensor node and to the whole network. Most current research efforts have concentrated on analytical redundancy [5], [6], in which sensor measurements are processed analytically and mathematical models are compared with physical measurements. However, with limited onboard microprocessors and battery power, these approaches reduce the number of measurements that can be taken and increase processing time and battery consumption.
In a previous paper [7], we have presented the theory and modeling of WSNs with recurrent neural networks (RNNs). This approach has been implemented on Moteiv’s Tmote Sky platform running the operating system TinyOS.
Modified Recurrent Neural Nets in Sensor Network Modeling
A. Wireless Sensor Network Model using RNNs
Dynamic RNNs consist of a set of dynamic nodes that provide internal feedback to their own inputs, see Figure 1. They can be used to simulate and model dynamic systems such as a network of sensors. WSNs consist of a large number of sensors, which in turn have their own dynamics. They interact between themselves and the base station which controls the network. In a multi-hop wireless sensor network, information hops from one node to another, and finally to the network gateway or the base station.
To develop a dynamic model for such sensors, without a loss of generality, we assume that there is one sensor per sensor node. More sensors per node will just increase the size of the RNNs.
Sensor nodes can be viewed as small dynamic systems with memory-like features. The output of one node provides the input to the next node (for example, node 3 provides the input to node 5 in Figure 1). While the standard RNN is structured in layers, we introduce an ad hoc RNN analogous to WSN systems, with confidence factors ($0 \le CF_{ij} \le 1$) between nodes i and j. The confidence factor depends on the signal strength and data quality of the communication link between the nodes. For instance, in tuning node 2, valuable inputs come from node 1 and node 4, provided that the corresponding confidence factors are close to 1. If node 7 is not in the coverage area of node 2, then the confidence factor is 0 and node 7 does not influence node 2 directly.
Figure 1. Ad hoc recurrent neural network with topology of a wireless sensor network.
Note that confidence factors do not provide stochastic modeling of the communication channel. The overall modeling process can be divided into two phases: the learning phase and the production phase. The learning phase is where the neural network (NN) adjusts its weights that correspond to the healthy and N faulty models, where N is the number of fault types. The production phase is where the current output of the sensor node is being compared with the output of the NN. The difference between these two signals is used as a measure of a sensor’s health status. In case of a fault, NN weights (model) are compared with the faulty models to isolate the fault. If there is no similar fault model, then the fault bank model is updated with the new type of fault and corresponding model parameters, i.e., NN weights. This whole process is repeated during the production phase.
It is shown in [7] that a sensor node i can be modeled as a modified recurrent neural network (RNN) with neural net inputs selected from previous output samples of the node and output samples of neighboring sensors.
$x_i(k) = R_i\big(x_i(k-1), x_i(k-2), \ldots, CF_{i_1 i}\,x_{i_1}(k), CF_{i_1 i}\,x_{i_1}(k-1), \ldots, CF_{i_n i}\,x_{i_n}(k), CF_{i_n i}\,x_{i_n}(k-1), \ldots\big)$, (9)
where $R_i$ is an unknown nonlinear function representing the model of sensor node i. The neighboring sensor node outputs are $x_{i_1}$, $x_{i_2}$, …, $x_{i_n}$, where $i_n$ is the n-th neighbor of sensor node i, and $CF_{i_n i}$ is the confidence factor between sensor node $i_n$ and sensor node i, which equals 1 when communication between the two nodes is ideal. A detailed analysis of the modeling is given in [7].
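As an illustration of how the inputs to such a model might be assembled, the following Python sketch builds an input vector from a node’s own delayed outputs and its neighbors’ samples scaled by confidence factors. The function name, the number of delays, and the ordering of the inputs are assumptions made for this example; they are not taken from [7] or from the TinyOS implementation.

```python
def build_rnn_input(own_history, neighbor_histories, confidence,
                    own_delays=2, neighbor_taps=2):
    """Assemble one input vector for a node model (hypothetical layout).

    own_history        -- the node's past outputs, most recent last
    neighbor_histories -- dict: neighbor id -> samples, most recent last
    confidence         -- dict: neighbor id -> confidence factor in [0, 1]
    own_delays, neighbor_taps -- illustrative tap counts (assumptions)
    """
    if len(own_history) < own_delays:
        raise ValueError("not enough own samples")
    # Own delayed outputs, newest first.
    inputs = list(reversed(own_history[-own_delays:]))
    # Neighbor samples, each scaled by the link's confidence factor.
    for neighbor_id in sorted(neighbor_histories):
        samples = neighbor_histories[neighbor_id]
        if len(samples) < neighbor_taps:
            raise ValueError("not enough samples from neighbor %r" % neighbor_id)
        cf = confidence.get(neighbor_id, 0.0)  # unknown link: no influence
        inputs.extend(cf * s for s in reversed(samples[-neighbor_taps:]))
    return inputs


example = build_rnn_input(
    [20.0, 21.0, 22.0],
    {4: [30.0, 31.0], 1: [10.0, 11.0]},
    {1: 1.0, 4: 0.5},
)
```

A neighbor with a confidence factor of 0 contributes only zeros, which matches the statement that an out-of-range node does not influence the local node directly.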
Confidence factors for sensor node i are proportional to the signal strength between node i and its neighbors. A confidence factor between neighboring nodes i and j represents a “confidence” of sensor node i from data generated by sensor node j. The factor depends on certain parameters such as proximity and distance between two nodes, terrain between nodes, topology of the sensor network, and the received signal strength. We used received signal strength which can be obtained from receiving sensor nodes as a measure of the confidence factor. As the received signal strength decreases, the confidence factor will decrease and eventually reach zero in the worst case scenario.
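One simple way to turn received signal strength into a confidence factor is a clamped linear map, sketched below in Python. The function name and the two RSSI bounds (-95 dBm and -45 dBm) are hypothetical values chosen for illustration; the paper does not specify the mapping it used.

```python
def confidence_factor(rssi_dbm, rssi_floor=-95.0, rssi_ceiling=-45.0):
    """Map an RSSI reading (dBm) to a confidence factor in [0, 1].

    rssi_floor and rssi_ceiling are illustrative assumptions: readings
    at or below the floor map to 0, at or above the ceiling map to 1.
    """
    if rssi_ceiling <= rssi_floor:
        raise ValueError("rssi_ceiling must exceed rssi_floor")
    scaled = (rssi_dbm - rssi_floor) / (rssi_ceiling - rssi_floor)
    # Clamp so that very weak or very strong signals stay in range.
    return max(0.0, min(1.0, scaled))
```

With these bounds, a reading of -70 dBm lies halfway between the floor and the ceiling and maps to 0.5, while any reading weaker than the floor maps to zero, the worst case described above.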
B. Application to Sensor Node Fault Detection
Previous results provide a tool for approximating a wireless sensor node output using recurrent neural networks. The method can be applied to a wide range of nonlinear dynamic models. A motivation for the above results stems from the need to detect faults in a network of distributed, WSN nodes.
To detect possible sensor faults at the node level, we compare the real output with the output of the recurrent neural net (RNN) approximation model. If the difference is larger than a threshold, then there is a fault at the sensor. For a sensor node i with real output $x_i(k)$ and RNN model output $\hat{x}_i(k)$, if $|x_i(k) - \hat{x}_i(k)| > \varepsilon$, where $\varepsilon$ is the fault threshold, then there is a fault at sensor node i.
Figures 2.a-b show the structure of the modified recurrent network with inputs consisting of the delayed output signals of the same NN and the previous and current modified output signals from neighboring sensors. It is initially assumed that all confidence factors between node i and the neighboring nodes are equal to 1. Figure 2.a shows the topology during the learning phase and Figure 2.b during the production phase, where a fault analyzer detects the difference between sensor and modified RNN.
Figure 2.(a) Block diagram of the system identification in the learning phase.
Figure 2.(b) Block diagram of the system identification in the production phase.
Implementation
The hardware testbed uses Moteiv’s Tmote Sky wireless sensor node modules. The nodes have an onboard microcontroller and wireless radio, and can take sensor readings of temperature, light, and humidity [8]. The microcontroller is an 8 MHz Texas Instruments MSP430. The radio chipset is a 2.4 GHz Chipcon CC2420 wireless transceiver with an integrated PCB trace antenna [9]. The onboard temperature sensor used in this implementation is Sensirion’s SHT11 temperature/humidity sensor [10]. The Tmote Sky node requires a minimum operating voltage of 2.1 volts. To conserve power, this implementation takes bandwidth, transmitting and receiving power, and MCU processing time into consideration. All information needed for training and prediction is gathered by simply overhearing radio transmissions. As environmental measurements are taken and forwarded over multiple hops to the root of the collection tree, neighboring nodes overhear the transmissions and record the neighbor’s measurements for training the local node’s NN. This approach uses no extra radio transmission power or network bandwidth.
The software was written for TinyOS, an open-source operating system designed for wireless embedded sensor networks [11]. It is specifically designed for embedded systems with tight memory constraints and low power budgets, and it has been ported to dozens of hardware platforms and many more chipsets. TinyOS uses the programming language NesC [12], which is similar to C but differs significantly in its linking model. NesC programs include a configuration file and a module file. The module file looks much like event-driven C code, but the configuration file is the key to NesC: it allows the programmer to link the module code to specific hardware chipsets, background functions, and communication network topologies.
To decrease development time, many common WSN functions are already built into the TinyOS architecture. An essential function of any WSN is to relay the collected data to a base station for later processing and analysis. TinyOS addresses this with the Collection Tree Protocol (CTP) [13]. CTP is based on a tree network in which the base station is the root and all other nodes branch out from their parent nodes. The routing engine is based on the expected number of transmissions (ETX). The ETX of a node is the ETX of its parent plus the link-level ETX to its parent. When a node searches for acceptable routes, it chooses the route with the lowest ETX. Because TinyOS is component based, sending a message to the root of a network is very similar to sending a message to a specific node. To multi-hop a message to the root, the Send.send() command is wired to the collection component instead of the address-driven component used for node-to-node messages. This component passes the packet to the multi-hop forwarding engine, which relays it through the tree to the base station via the route with the lowest ETX. The collection layer only triggers the Receive.receive() event when a packet reaches its final destination (normally the base station). In the fault detection implementation, however, neighboring nodes need access to the forwarded information to train their NNs. Under normal network operation, all packets received from the radio are screened at the hardware level to determine whether the packet is needed by the receiving node. The Snoop.receive() event bypasses this check and is triggered any time a packet is received by a node. The packet’s origin can then be compared against the list of nodes in the routing engine’s neighbor table. If the packet is from a valid neighbor, the information is stored for future training.
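The ETX rule described above can be illustrated with a short Python sketch. It is a simplified model, not the TinyOS routing engine: the function names and the candidate table are hypothetical, and details of the real protocol such as its ETX encoding and route-switching thresholds are omitted.

```python
def path_etx(parent_etx, link_etx):
    """ETX of a node: its parent's ETX plus the link ETX to that parent."""
    return parent_etx + link_etx


def choose_parent(candidates):
    """Pick the candidate parent that gives the lowest total ETX.

    candidates -- dict: neighbor id -> (neighbor's advertised ETX,
                  link-level ETX to that neighbor); hypothetical layout.
    Returns (neighbor id, resulting ETX), or None if there is no candidate.
    """
    if not candidates:
        return None
    best = min(candidates, key=lambda n: path_etx(*candidates[n]))
    return best, path_etx(*candidates[best])


# The root advertises an ETX of 0. Node 1 is the root reached over a
# poor link; node 2 is one good hop from the root over a good link.
table = {1: (0.0, 3.0), 2: (1.0, 1.5), 3: (2.0, 1.0)}
selected = choose_parent(table)
```

In this example the direct link to the root costs 3.0 expected transmissions, so the two-hop route through node 2, with a total of 2.5, is preferred.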
Code Overview
Upon applying voltage to the sensor node, the node will go through a pre-defined boot-up sequence. This will initialize components such as the radio, network engine, and specific sensors. Once the components have successfully come online, a periodic timer is started with a period T. Because TinyOS is event driven, this allows the node to remain idle until an event handler is triggered. From this point on in the program, the code is no longer sequentially executed. TinyOS will handle events as they occur.
Upon receiving a message, the node consults the forwarding engine to check that the sender is in the node’s neighbor table. If the message was overheard from a valid neighbor, the local node stores the information and checks whether all of the required data has now been gathered. If data collection is complete, it sets a flag and checks the flag set by the timer routine. If both flags are set, the program enters the NN training subroutine. This process is shown in Figure 3.
Figure 3. Flow chart of subroutine run after message is received.
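The message-received routine can be sketched in Python as follows. This is an illustrative model of the logic in the flow chart rather than the NesC event handler itself; the class and method names are hypothetical, and the neighbor table is reduced to a fixed set of node ids.

```python
class NeighborStore:
    """Stores overheard measurements until M samples per neighbor exist."""

    def __init__(self, neighbor_ids, samples_needed):
        self.samples_needed = samples_needed  # M in the text
        self.samples = {nid: [] for nid in neighbor_ids}

    def on_snoop(self, origin_id, value):
        """Handle one overheard packet; return True if it was stored."""
        if origin_id not in self.samples:
            return False  # sender is not in the neighbor table
        if len(self.samples[origin_id]) >= self.samples_needed:
            return False  # already have M samples from this neighbor
        self.samples[origin_id].append(value)
        return True

    def is_complete(self):
        """True once every neighbor has supplied M samples."""
        return all(len(s) == self.samples_needed
                   for s in self.samples.values())


store = NeighborStore(neighbor_ids=[1, 2], samples_needed=2)
store.on_snoop(1, 71.2)
store.on_snoop(5, 64.0)   # ignored: node 5 is not a neighbor
store.on_snoop(2, 70.8)
store.on_snoop(1, 71.4)
store.on_snoop(2, 70.9)
```

Once `is_complete()` returns True, the node would set its data-collected flag and, if its own measurements are also complete, begin training.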
The workhorse of this program is the subroutine (Figure 4) that is executed each time the periodic timer fires.
Figure 4. Flow chart of subroutine run when periodic timer fires.
First, the subroutine calls the sensor to take a reading. If training and prediction were completed before this timer cycle, the node compares the reading against the value predicted by the NN. If the error is greater than the threshold, the sensor is considered faulty, and this information is relayed so that the node can receive maintenance attention. If the reading is within the threshold, the data is verified as good and the node prepares to repeat the cycle.
If training and prediction have not yet been completed, the node stores the new measurement in an array. The subroutine then inspects a counter to determine whether this is the M-th measurement. If so, a flag is set and, if the flag set by the message-received routine allows, NN training begins. The NN is trained as described earlier. Once training has been completed and the next measurement has been predicted, a flag is set to signal that the node is ready to compare the next measurement against the NN’s prediction.
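The timer routine can be summarized by the Python sketch below. It is a simplified model under stated assumptions: the class name is hypothetical, the `mean_predictor` function is only a stand-in for RNN training and prediction, and the node is assumed to discard its stored samples and start collecting again after each comparison. In TinyOS, training would run as a separate task rather than inside the timer handler.

```python
def mean_predictor(history):
    """Placeholder for NN training and prediction (not an RNN)."""
    return sum(history) / len(history)


class TimerCycle:
    def __init__(self, samples_needed, threshold, train_and_predict):
        self.samples_needed = samples_needed      # M in the text
        self.threshold = threshold                # allowed prediction error
        self.train_and_predict = train_and_predict
        self.history = []
        self.prediction = None

    def on_timer(self, reading, neighbors_ready):
        """Process one periodic reading and report what happened."""
        if self.prediction is not None:
            # A prediction exists: compare it with the new reading.
            error = abs(reading - self.prediction)
            self.prediction = None
            self.history = []                     # assumed: restart the cycle
            return "fault" if error > self.threshold else "ok"
        if len(self.history) < self.samples_needed:
            self.history.append(reading)
        if len(self.history) == self.samples_needed and neighbors_ready:
            self.prediction = self.train_and_predict(self.history)
            return "trained"
        return "collecting"


cycle = TimerCycle(samples_needed=3, threshold=2.0,
                   train_and_predict=mean_predictor)
events = [cycle.on_timer(r, True) for r in (70.0, 71.0, 72.0, 71.5)]
events += [cycle.on_timer(r, True) for r in (70.0, 71.0, 72.0, 75.0)]
```

In the usage above, the fourth reading falls within 2 degrees of the prediction and is accepted, while the eighth reading differs from the prediction by 4 degrees and is reported as a fault.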
Experimental Setup
An experiment was conducted using a nine node network. These nodes were configured in a collection tree topology with two branches, as shown in Figure 5.
Figure 5. Network topology setup for experimental data collection.
The two branches were placed in different climates (i.e., separate rooms with differing temperatures). Temperature measurements were taken and forwarded to the base station node via the routes shown by the black lines. As these data samples were sent to the base station, nodes listened to overhear data from their neighbors, as shown by the red lines. After sufficient data had been collected to train the network, RNN prediction and fault detection began.
Results
To simplify the results, we examine the data collected from a single branch, shown in Figure 6.
Figure 6. Experimental results from a branch of the collection tree.
This branch consists of four neighboring nodes, shown connected by red lines in Figure 5. Neighboring node 1 was placed in a sunny window, while the other three nodes were dispersed throughout the room. At sampling point 9, an air conditioning unit was turned on to emulate a faulty sensor. The data show that the training node was closest to the air conditioning unit, followed by neighboring nodes 3 and 2. At sampling point 10, NN prediction and fault detection were started. The NN prediction is shown as a dotted line in Figure 6. At sampling point 15, the training node was placed directly on the air conditioning unit to simulate a quick drift fault. The fault was detected at sampling point 16. A fault is defined as a measured value lying outside a ±2 degree threshold around the prediction. Although only one node’s training is shown here, all nodes except the base station node train, predict, and detect faults simultaneously.
To reduce the power consumption and MCU runtime needed to complete the training process, the RNN is configured to allow the highest acceptable level of error in the result. This tolerance saves valuable seconds of processor time during training iterations by not computing results to more significant digits than needed. The implementation set a precision goal of less than one degree Fahrenheit; accuracy to hundredths of a degree was considered wasted power and time for our application. With this goal in mind, we were able to tune the NN so that the entire process, from the beginning of neural network training to the prediction of the next measurement, takes less than 12 seconds. Depending on the accuracy required, this time could be shortened or extended in other applications.
Power consumption measurements were isolated to the MCU in order to remove variables such as radio and sensor power consumption. Because node platforms are advancing rapidly, there are many available chipsets, all drawing different amounts of power. Therefore, current readings were taken with all peripherals other than the MCU turned off. While the MCU was idle, the current draw was 7 µA, and while it was calculating weights for NN training, it was 1800 µA. At the standard 3.0 volts and with 12 seconds required to train, this process consumes only 0.0648 joules. This is minuscule compared with the energy consumed by the onboard CC2420 low-power radio alone. While initialized, the idle radio draws 20 mA, so over the same 12-second timeframe the radio alone consumes 0.72 joules. Therefore, the fault detection application has extremely low power consumption.
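The energy figures above follow from E = V · I · t. The short Python check below reproduces them from the stated voltage, currents, and training time; the function and variable names are chosen for this example only.

```python
def energy_joules(voltage_v, current_a, seconds):
    """Energy consumed at a constant voltage and current: E = V * I * t."""
    return voltage_v * current_a * seconds


SUPPLY_V = 3.0          # supply voltage from the text
TRAIN_SECONDS = 12.0    # training time from the text

mcu_training_j = energy_joules(SUPPLY_V, 1800e-6, TRAIN_SECONDS)  # 1800 uA
radio_idle_j = energy_joules(SUPPLY_V, 20e-3, TRAIN_SECONDS)      # 20 mA
ratio = radio_idle_j / mcu_training_j
```

The MCU training energy evaluates to about 0.0648 J and the idle radio energy to about 0.72 J, so the idle radio consumes roughly eleven times as much energy as NN training over the same 12 seconds.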
Conclusion
As noted above, the traditional methods of fault detection rely on hardware. Because this approach is software based, it can be implemented at a lower cost and can be easily upgraded in older systems. By simply overhearing data as it is forwarded to the base station, no extra bandwidth or redundant hardware is needed to detect a fault.
The price to be paid for this approach is an amount of processor time and battery power. Neural network training can be completed in a matter of seconds, and with the implementation of TinyOS’s task scheduler, the system remains extremely responsive during the training. During training, nodes can continue to take measurements, transmit data, and forward packets to the base station.
Neighboring nodes that are closer to the local node are more likely to have similar values. This is especially true of environmental measurements: because of the laws of diffusion, nodes in close proximity will likely have closer temperature readings than nodes farther apart. This information can increase prediction accuracy when it is attached to the NN weights during training. If no extra hardware is added, so as to preserve the nodes’ cost effectiveness, the only onboard means of estimating the physical distance between nodes is the received signal strength indicator (RSSI). We attempted to implement the confidence factor, but because of interference and noise, the values changed wildly even when the actual distance changed by only inches. From numerous projects’ efforts on localization, we know that current indoor RSSI measurements are not capable of reliably measuring precise distance [14]. While the confidence factor approach is ahead of its time, future advances in low-cost localization algorithms will allow its implementation.
For indefinite real-time fault detection, the sampling interval N must be greater than the time required to train. A node must be able to train, predict, compare, and store new data every N seconds. With a sampling interval N greater than the training time, samples can be taken at uniform intervals with enough time between them for training, prediction, and fault detection.
References
[1] Crossbow, http://www.xbow.com.
[2] Harvard University. “Code Blue: Wireless Sensor Networks for Medical Care.” http://www.eecs.harvard.edu/~mdw/proj/codeblue.
[3] M. Holler, Ed. “Camalie Net: Wireless Sensor Network at Camalie Vineyards - Mt. Veeder, Napa Valley, California.” http://camalie.com/WirelessSensing/WirelessSensors.
[4] UC Berkeley. “Structural Health Monitoring of the Golden Gate Bridge.” http://www.cs.berkeley.edu/~binetude/ggb.
[5] S. C. Lee, “Sensor Value Validation Based On Systematic Exploration of the Sensor Redundancy For Fault Diagnosis,” IEEE Trans. on Systems, Man and Cybernetics, vol. 24, no. 4, pp. 594-605, Apr. 1994.
[6] M. L. Leushen, J. R. Cavallaro, I. D. Walker, “Robotic Fault Detection Using Analytical Redundancy,” in Proc. IEEE Conf. Robotics and Automation, 2002, pp. 456-463.
[7] A. I. Moustapha and R. R. Selmic, “Wireless Sensor Network Modeling Using Modified Recurrent Neural Networks: Application To Failure Detection,” to be published in IEEE Transactions on Instrumentation and Measurement, March 2008.
[8] Moteiv Corporation. “Tmote Sky Datasheet.” V1.04. November 2006. http://www.moteiv.com.
[9] Chipcon. “SmartRF CC2420 Datasheet.” V1.2. June 2004. http://www.chipcon.com.
[10] Sensirion. “SHT1x/SHT7x Datasheet.” V2.0. March 2003. www.sensirion.com.
[11] UC Berkeley, www.TinyOS.net.
[12] P. Levis. “TinyOS Programming.” June 2006. www.TinyOS.net.
[13] R. Fonseca, O. Gnawali, K. Jamieson, S. Kim, P. Levis, and A. Woo. “The Collection Tree Protocol (CTP).” TinyOS Extension Proposal 123. V1.8. August 2006.
[14] Y. Chraibi. “Localization in Wireless Sensor Networks.” KTH Signal Sensors and Systems. Stockholm, Sweden. 2005.