A Novel Search Engine for Internet of Everything Based on Dynamic Prediction

2019-03-21 07:21
China Communications, 2019, Issue 3

Hui Lu, Shen Su*, Zhihong Tian*, Chunsheng Zhu

1 Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510006, China

2 Department of Electrical and Computer Engineering, The University of British Columbia, V6T 1Z4, Canada

Abstract: In recent years, with the rapid development of sensing technology and the deployment of various Internet of Everything (IoE) devices, enabling real-time search queries for objects, data, and services in the IoE has become a crucial and practical challenge. Moreover, such efficient query processing techniques can strongly facilitate research on IoE security issues. Considering the unique characteristics of the IoE application environment, such as high heterogeneity, high dynamics, and distributed deployment, we develop a novel search engine model and build a dynamic prediction model of IoE sensor time series to meet the real-time requirements of IoE search. We validate the accuracy and effectiveness of the dynamic prediction model using a public sensor dataset from Intel Lab.

Keywords: IoE search engine; IoE security; real-time search model; dynamic prediction model; time series prediction

I. INTRODUCTION

With the development of Internet and sensing technologies, devices such as smartphones, power meters, heart rate monitors, thermometers, and various other sensors are being deployed at a rapidly growing pace. These devices form the Internet of Everything (IoE), which aims to connect uniquely identifiable and addressable objects to the Internet using standard communication protocols [1,2]. Billions of heterogeneous data items have already been created [3,4], and analysts predict that the number of connected objects will reach 212 billion by 2020 [5]. At the same time, these sensor nodes generate large amounts of real-time data [6]. Over time, the IoE will introduce more intelligent objects and countless related services. Finding data that meet user requirements in such large, highly heterogeneous, real-time data will become an important research hotspot in the IoE. As it becomes more intelligent, the IoE will also bring new cyber security challenges along with convenience, and IoE search technology can provide strong support for studying and investigating IoE security issues. References [7,8] use deep learning models to detect dangerous intrusion sources in the Internet of Things. References [9,10] study data leakage in the Internet of Things and propose new detection and sensing methods. Reference [11] studies correlation in intelligent networks, [12] studies a hidden attack in the Internet of Things, and [13] proposes a method for evaluating the trust level of Internet of Things architectures. References [14,15] point out that searching for objects, data, and services in the IoE is a key challenge, especially in real-time environments.

Due to the distributed, large-scale, heterogeneous, multi-modal, and dynamic characteristics of the IoE application environment, reliable data transmission in the IoE is challenging, and a variety of reliable IoE transport protocols and communication architectures have been proposed [16,17]. Given these new characteristics and requirements, existing Internet web search methods are poorly suited to the IoE, so a search model tailored to the IoE application environment is needed. Since the 1990s, extensive research has been conducted on search engine models, which play an important role on the Internet, and a large body of results has been obtained [18,19]. By contrast, search in the IoE scenario is still at an early stage of research, and many key issues remain to be solved. A distributed, highly dynamic IoE application scenario requires an efficient, extensible indexing and ranking mechanism. Reference [20] points out that the indexing mechanism for IoE application scenarios must integrate quality analysis, trust management, and usability analysis methods to process massive amounts of semantically described data and services. Data and service discovery mechanisms also need to support automatic association and semantic description, providing an extensible framework for accessing and searching IoE information. New methods for publishing data and services, distributed indexing and discovery, and effective subscription and access to information and services are all future research hotspots in IoE search. To discover IoE device resources connected to the network in real time, [21] designed WOTS2E, a search engine for the semantic Web of Things (WoT) that provides standardized, scalable, and flexible discovery.

In most IoE search scenarios, users care less about the raw data collected by sensors than about the high-level state of a sensor after data fusion. Assuming that sensors output such high-level states, [22] proposes Dyser, a search engine that uses the existing network architecture for IoE entity search. Dyser builds a prediction model of sensor state transitions. By calculating the probability that each sensor page matches the dynamic attributes in a search request, it obtains a set of physical sensors that may match the request and then accesses only those sensors to confirm whether they actually match, thereby reducing communication overhead during search.

The main contributions of this paper are summarized as follows:

1) This paper proposes a new search framework based on the characteristics of the IoE, which enables flexible access among network nodes and real-time search for the data users need.

2) We design a dynamic data prediction algorithm that has low computational complexity and is suitable for IoE search requests under resource-constrained conditions.

The rest of this paper is organized as follows. Section II establishes a search model suited to the IoE. Section III builds the dynamic prediction model of IoE sensor data. Section IV presents simulations and accuracy tests of the dynamic prediction model. Section V concludes the paper.

II. DESIGN OF INTERNET OF EVERYTHING SEARCH MODEL

The IoE environment is distributed, large-scale, heterogeneous, multi-modal, and dynamic. As a result, answering a query requires contacting all relevant sensor nodes within the period between the moment a user issues a search from the client and the moment the results are returned. This not only incurs substantial communication overhead and delay; when IoE sensor data change rapidly, returning the readings taken at the moment the query was initiated also fails to meet the user's real-time requirement. To keep up with the real-time data collected by IoE sensors, this paper designs a novel IoE search framework together with a prediction model that forecasts sensor data from the historical data each sensor has reported. The flow of the designed IoE search is shown in Figure 1.

Fig. 1. IoE search flow chart.

Fig. 2. Hierarchical architecture of the Internet of Everything search mode.

The sensor network collects environmental data and reports it to the local network management system, which stores the data in a local database. The prediction model extracts each sensor's historical data from the local database, predicts the sensor's data with its prediction algorithm, and sends the predicted results to the global central database. When a user submits a search request through the client, the request is parsed semantically. The parsed request is then matched against the sensor data in the central database, and the relevance of each candidate result to the user query is computed. The results are ranked by relevance, and the ranked list is returned to the user.
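The matching and ranking step can be sketched as follows. This is a minimal illustration under assumptions the paper does not state: the central database is modeled as a list of hypothetical `Prediction` records, a parsed query is reduced to an attribute name plus a target value and a tolerance, and relevance is a simple linear score chosen for illustration, not the authors' scoring function.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """One predicted reading in the (hypothetical) central database."""
    sensor_id: str
    attribute: str  # e.g. "temperature"
    value: float    # value predicted for the query time


def rank_by_relevance(predictions, attribute, target, tolerance):
    """Return (score, sensor_id) pairs sorted from most to least relevant.

    The score is 1.0 for an exact match and falls linearly to 0.0 when the
    predicted value is `tolerance` away from `target` (tolerance must be > 0).
    Readings outside the tolerance, or for another attribute, are dropped.
    """
    scored = []
    for p in predictions:
        if p.attribute != attribute:
            continue
        distance = abs(p.value - target)
        if distance <= tolerance:
            scored.append((1.0 - distance / tolerance, p.sensor_id))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


central_db = [
    Prediction("s1", "temperature", 21.0),
    Prediction("s2", "temperature", 23.5),
    Prediction("s3", "humidity", 40.0),
    Prediction("s4", "temperature", 19.8),
]
# Parsed query: "sensors reading about 20 degrees, within 2 degrees"
print(rank_by_relevance(central_db, "temperature", 20.0, 2.0))
```

Here s4 (0.2 degrees off) ranks ahead of s1 (1.0 degree off), while s2 falls outside the tolerance and s3 reports a different attribute. Because the values matched are predictions rather than live readings, no sensor has to be contacted at query time.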

As shown in Figure 2, the hierarchical architecture of the IoE search model has four layers. The application layer parses the search requests initiated by users. Because users express their requests in natural language, the requests must be parsed with natural language processing before searching. The application layer also matches and ranks the search results, returning first the results that best satisfy the user's request. The core layer consists of two parts: the first performs data cleaning, analysis, and preprocessing on the data reported by the network layer so that the data can be fed into the prediction model in a standardized form, and the second predicts the data collected by the sensors. Since sensor nodes in the IoE, as well as sensors and gateways, may be connected by different protocols, the network layer provides a stable communication environment for reporting sensor data and delivering user requests. The sensing layer forms a sensor network from sensor nodes and collects surrounding environmental data and sensor state data.
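The cleaning done in the first part of the core layer could look like the sketch below. The paper does not specify its cleaning rules, so the `(timestamp, value)` record layout, the plausibility range, the duplicate-handling policy, and the function name are all hypothetical choices made for illustration.

```python
import math


def clean_readings(records, low, high):
    """Drop unusable readings and return the rest in time order.

    records: iterable of (timestamp_seconds, value) pairs; value may be None
    or NaN when a report was lost. Readings outside [low, high] are treated
    as sensor faults (the bounds are an assumption chosen per sensor type).
    For duplicate timestamps, the first reading seen is kept.
    """
    seen = set()
    cleaned = []
    for ts, value in records:
        if value is None or math.isnan(value):
            continue  # missing report
        if not (low <= value <= high):
            continue  # physically implausible reading
        if ts in seen:
            continue  # duplicate report
        seen.add(ts)
        cleaned.append((ts, value))
    cleaned.sort(key=lambda r: r[0])
    return cleaned


raw = [(120, 19.6), (0, 19.5), (60, None), (180, 122.1), (120, 19.7), (240, float("nan"))]
print(clean_readings(raw, -10.0, 60.0))  # [(0, 19.5), (120, 19.6)]
```

The bounds -10.0 and 60.0 are arbitrary example values for an indoor temperature sensor.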

III. SENSOR CONTENT DATA DYNAMIC PREDICTION MODEL

The content data of a sensor are the real-time environmental data it collects. Most sensors report these data as time series, and after differencing, most of the collected environmental data become approximately stationary. For example, ambient temperature from a temperature sensor, humidity from a humidity sensor, PM2.5 concentration from a PM2.5 sensor, and object motion from a speed sensor are all transmitted as time series. To satisfy users' requests for real-time search, the sensor's value at a future time must be predicted from its historical data. Since most IoE nodes are resource-constrained, lightweight prediction algorithms should be used to reduce the communication and computational overhead on the nodes.

We build on the traditional time series prediction approach [23,24], namely the Box-Jenkins method, whose basic idea is to provide a large class of models with enough parameters to accommodate various datasets. Classic models of this approach include the autoregressive model (AR), the moving average model (MA), the autoregressive moving average model (ARMA), and the autoregressive integrated moving average model (ARIMA). Based on the ARIMA model, we establish a dynamic time series prediction model for the content data collected by sensors.

Suppose that the sensor collects and reports data periodically and that the collected data become stationary after differencing. Denote the value X reported at time t_i by (t_i, X_i); the sequence generated by the sensor from the start up to time t_n is then {(t_0, X_0), (t_1, X_1), (t_2, X_2), ..., (t_i, X_i), ..., (t_n, X_n)}.

3.1 Autoregressive integrated moving average ARIMA(p,d,q) model of the sensor time series

The time series collected by the sensor is differenced to make it stationary. Because the sensor time series fluctuates periodically, the differencing can follow the period at which the sensor reports data. The purpose is to turn a time series in which random errors have a long-term influence into one in which they have only a temporary influence. That is, if the differenced sequence follows an ARMA(p,q) model, the original sequence follows an ARIMA(p,d,q) model, where d is the number of differencing operations.
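The differencing step and its inverse can be sketched in a few lines of Python. The function names are hypothetical; `lag=1` gives the ordinary first difference used later in the experiments, while a larger `lag` differences at a reporting period as described above. Inverting the difference is what turns forecasts of the differenced (ARMA) series back into forecasts of the original (ARIMA) series.

```python
def difference(series, lag=1):
    """Return X_t - X_{t-lag} for every t that has a value `lag` steps earlier."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def undifference(diffs, history, lag=1):
    """Invert difference(): rebuild levels from differenced values.

    history must end with the last `lag` observed levels that precede the
    first differenced value; only those last `lag` values are used.
    """
    levels = list(history[-lag:])
    for d in diffs:
        levels.append(levels[-lag] + d)
    return levels[lag:]


temps = [20.0, 20.5, 21.5, 21.0]
first = difference(temps)    # [0.5, 1.0, -0.5]
second = difference(first)   # [0.5, -1.5]
# Two predicted first differences turned back into temperatures:
print(undifference([0.25, 0.25], temps))  # [21.25, 21.5]
```

Applying `difference` twice gives the second-order difference (d = 2); `undifference` must then be applied twice, in reverse order, to recover levels.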

The influence of relevant factors on the prediction target is reflected in the historical observations of the sensor time series variable, so the value of the sensor at time t is predicted from its values before time t. The p-order autoregressive model AR(p) of the sensor time series is

$$X_t = \delta + \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + \varepsilon_t$$

where $\varepsilon_t$ is the noise sequence, $\delta$ is a constant, and $\phi_1, \ldots, \phi_p$ are the autoregressive coefficients. The noise sequence is white noise, satisfying

$$E(\varepsilon_t) = 0,\quad \operatorname{Var}(\varepsilon_t) = \sigma_\varepsilon^2,\quad E(\varepsilon_t \varepsilon_s) = 0 \ (s \neq t)$$

In addition, the past values of the sensor series, i.e., the regressors, are uncorrelated with the current noise term:

$$E(X_s \varepsilon_t) = 0,\quad \forall s < t$$
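Because the paper stresses lightweight prediction on resource-constrained nodes, the sketch below fits the AR(p) model X_t = δ + φ_1 X_{t-1} + ... + φ_p X_{t-p} + ε_t by ordinary least squares using only the Python standard library. Least squares is one common estimator chosen here for illustration; the paper does not state how its coefficients are estimated, and this sketch does not fit the MA terms of the full ARIMA model. The function names are hypothetical.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ar(series, p):
    """Least-squares estimates of delta and [phi_1, ..., phi_p]."""
    # Each row is [1, X_{t-1}, ..., X_{t-p}] with target X_t.
    rows = [[1.0] + [series[t - i] for i in range(1, p + 1)]
            for t in range(p, len(series))]
    targets = series[p:]
    k = p + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations (X^T X) beta = X^T y
    return beta[0], beta[1:]


def ar_forecast(history, delta, phi):
    """One-step prediction delta + phi_1 X_{t-1} + ... + phi_p X_{t-p}."""
    return delta + sum(c * history[-i] for i, c in enumerate(phi, start=1))


# Synthetic AR(2) data with delta = 1.0, phi = [0.5, -0.3] (illustrative only).
rng = random.Random(0)
x = [0.0, 0.0]
for _ in range(3000):
    x.append(1.0 + 0.5 * x[-1] - 0.3 * x[-2] + rng.gauss(0.0, 1.0))
delta_hat, phi_hat = fit_ar(x, 2)
print(delta_hat, phi_hat)  # close to 1.0 and [0.5, -0.3]
```

On a sensor node, only the p + 1 fitted numbers and the last p readings need to be kept to produce each new forecast.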

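In the moving average model, the current value is expressed as a linear combination of the random disturbances (prediction errors) of past periods. The MA(q) model of the sensor time series is

$$X_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q}$$

where $\mu$ is the mean of the series, $\theta_1, \ldots, \theta_q$ are the moving average coefficients, and $\varepsilon_t$ is the white-noise sequence defined above.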

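Combining AR and MA yields the autoregressive moving average model ARMA(p,q). It approximates a long AR polynomial by the ratio of two polynomials, so the total number of parameters p + q can be much smaller than the order a pure AR(p) model would need:

$$X_t = \delta + \sum_{i=1}^{p} \phi_i X_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}$$

where $\varepsilon_t$ is the noise sequence and $\delta$ is a constant. As in the AR(p) case, the noise sequence is white noise,

$$E(\varepsilon_t) = 0,\quad \operatorname{Var}(\varepsilon_t) = \sigma_\varepsilon^2,\quad E(\varepsilon_t \varepsilon_s) = 0 \ (s \neq t)$$

and the regressors are uncorrelated with the current noise term, $E(X_s \varepsilon_t) = 0$ for all $s < t$.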
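For reference, the ARIMA(p,d,q) model can be written compactly with the backshift operator $B$ (defined by $B X_t = X_{t-1}$). This is the standard textbook form, added as a restatement rather than a formula taken from the paper: $\delta$ is a constant, $\phi_i$ are the autoregressive coefficients, $\theta_j$ are the moving average coefficients, $\varepsilon_t$ is white noise, and $(1-B)^d$ applies d rounds of differencing.

```latex
\Big(1 - \sum_{i=1}^{p} \phi_i B^i\Big)(1 - B)^d X_t
  = \delta + \Big(1 + \sum_{j=1}^{q} \theta_j B^j\Big)\varepsilon_t
```

With d = 0 this expands to $X_t - \sum_i \phi_i X_{t-i} = \delta + \varepsilon_t + \sum_j \theta_j \varepsilon_{t-j}$, the ARMA(p,q) model; with d = 1 the same ARMA model is applied to the first differences $X_t - X_{t-1}$.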

3.2 Selection of the parameters of the sensor time series ARIMA(p,d,q) model

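The autocorrelation coefficient (ACF) of the sensor time series: for a given sensor time series $X_t$, the autocorrelation coefficient between $X_t$ and $X_{t-k}$ is

$$\rho_k = \frac{\operatorname{Cov}(X_t, X_{t-k})}{\operatorname{Var}(X_t)} = \frac{\gamma_k}{\gamma_0}$$

and it is estimated from n observations by the sample ACF

$$\hat\rho_k = \frac{\sum_{t=k+1}^{n} (X_t - \bar X)(X_{t-k} - \bar X)}{\sum_{t=1}^{n} (X_t - \bar X)^2}$$

The partial autocorrelation coefficient (PACF) of the sensor time series is the correlation between $X_t$ and $X_{t-k}$ after removing the influence of the intermediate values. Given the sensor time series $X_t$, the partial autocorrelation at lag k is the conditional correlation of $X_t$ and $X_{t-k}$ given $X_{t-1}, X_{t-2}, \ldots, X_{t-k+1}$:

$$\phi_{kk} = \operatorname{Corr}(X_t, X_{t-k} \mid X_{t-1}, \ldots, X_{t-k+1})$$

It can be computed from the ACF with the Durbin-Levinson recursion:

$$\phi_{11} = \rho_1,\qquad \phi_{kk} = \frac{\rho_k - \sum_{j=1}^{k-1} \phi_{k-1,j}\,\rho_{k-j}}{1 - \sum_{j=1}^{k-1} \phi_{k-1,j}\,\rho_j},\qquad \phi_{kj} = \phi_{k-1,j} - \phi_{kk}\,\phi_{k-1,k-j}\ (j = 1, \ldots, k-1)$$

where $\phi_{kk}$ is the partial autocorrelation coefficient, $\phi_{kj}$ are the intermediate coefficients of the recursion, and $\rho$ are the autocorrelation coefficients calculated above.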
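The sample ACF and the PACF can be computed with the standard library alone, as sketched below. The function names are hypothetical. The sample ACF divides every lag by the same total sum of squares (the usual convention), and the PACF uses the Durbin-Levinson recursion: φ_11 = ρ_1, φ_kk = (ρ_k − Σ φ_{k−1,j} ρ_{k−j}) / (1 − Σ φ_{k−1,j} ρ_j), then φ_kj = φ_{k−1,j} − φ_kk φ_{k−1,k−j}.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations rho_0..rho_max_lag (rho_0 is always 1)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def pacf_durbin_levinson(rho):
    """Partial autocorrelations phi_11..phi_KK from rho_0..rho_K (K = len(rho) - 1)."""
    pacf = []
    prev = []  # phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, len(rho)):
        num = rho[k] - sum(prev[j - 1] * rho[k - j] for j in range(1, k))
        den = 1.0 - sum(prev[j - 1] * rho[j] for j in range(1, k))
        phi_kk = num / den
        # phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}, then append phi_kk
        prev = [prev[j - 1] - phi_kk * prev[k - j - 1] for j in range(1, k)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# Theoretical ACF of an AR(1) process with phi = 0.6 is 0.6 ** k,
# so its PACF should be 0.6 at lag 1 and zero afterwards.
print(pacf_durbin_levinson([0.6 ** k for k in range(5)]))
```

For sensor data, the two functions are chained: `pacf_durbin_levinson(sample_acf(series, K))` yields the sample PACF up to lag K.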

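Selection of p and q in ARIMA(p,d,q)

|      | AR(p)                | MA(q)                | ARMA(p,q) |
|------|----------------------|----------------------|-----------|
| ACF  | tails off            | cuts off after lag q | tails off |
| PACF | cuts off after lag p | tails off            | tails off |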

Because the sensor time series values are random, the sample ACF and PACF do not cut off exactly as in theory; beyond the true order they oscillate around zero. Hypothesis tests are therefore used to determine q and p from the 95% confidence intervals of the sample ACF and PACF, respectively.

Bartlett test: for k > q, the sample autocorrelation coefficient approximately satisfies

$$\hat\rho_k \sim N\!\left(0, \frac{1}{n}\Big(1 + 2\sum_{j=1}^{q} \hat\rho_j^2\Big)\right)$$

Quenouille test: for k > p, the sample partial autocorrelation coefficient approximately satisfies

$$\hat\phi_{kk} \sim N\!\left(0, \frac{1}{n}\right)$$

The respective 95% confidence intervals are therefore

$$|\hat\rho_k| \le 1.96\sqrt{\frac{1}{n}\Big(1 + 2\sum_{j=1}^{q} \hat\rho_j^2\Big)},\qquad |\hat\phi_{kk}| \le \frac{1.96}{\sqrt{n}}$$

If the coefficients up to lag m lie clearly outside the confidence interval, almost 95% of the coefficients beyond lag m fall inside it, and the decay from lag m to lag m+1 is abrupt, the cut-off order is taken to be m. In this way q is determined from the ACF and p from the PACF of the sensor time series. The value of d is chosen as the order of differencing (first or second) that makes the sensor time series stationary.
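The cut-off rule can be sketched as below. It uses the Bartlett band 1.96·sqrt((1 + 2·Σ_{j≤m} ρ̂_j²)/n) for the ACF and the Quenouille band 1.96/sqrt(n) for the PACF, and returns the smallest m for which at least 95% of the later coefficients fall inside the band. This is a simplification: the "clearly outside before m" and "abrupt decay" parts of the rule are left to visual inspection, and the function name and parameters are hypothetical.

```python
import math


def cutoff_order(coeffs, n, max_order, bartlett, z=1.96, share=0.95):
    """Smallest m <= max_order such that at least `share` of the coefficients
    beyond lag m lie inside the z-band; None if no such m exists.

    coeffs[k - 1] is the sample ACF (bartlett=True) or PACF (bartlett=False)
    at lag k; n is the length of the series they were computed from.
    """
    for m in range(max_order + 1):
        if bartlett:
            # Bartlett: Var(rho_k) ~ (1 + 2 * sum_{j<=m} rho_j^2) / n for k > m
            bound = z * math.sqrt((1 + 2 * sum(c * c for c in coeffs[:m])) / n)
        else:
            # Quenouille: Var(phi_kk) ~ 1 / n for k > m
            bound = z / math.sqrt(n)
        tail = coeffs[m:]
        if tail and sum(abs(c) <= bound for c in tail) >= share * len(tail):
            return m
    return None


# Illustrative PACF values from a series of length 400 (band = 0.098):
print(cutoff_order([0.7, -0.4, 0.05, -0.03, 0.02, 0.01, -0.04, 0.06], 400, 5, bartlett=False))  # 2
```

With short tails the 95% share effectively requires every remaining coefficient to lie inside the band, so in practice the check is run over a few dozen lags.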

IV. SIMULATION AND EXPERIMENTAL VERIFICATION

The experimental data for this model come from Intel Lab's public dataset [25]. Intel Lab Data contains data collected from 54 sensors deployed in the Intel Berkeley Research lab between February 28th and April 5th, 2004. The sensor deployment locations are shown in Figure 3.

We select the temperature data of the sensor at coordinate 1 and predict its values dynamically. First, the sensor time series is tested for stationarity, since stationarity is a prerequisite for time series analysis. There are two definitions of a stationary time series: strict stationarity and wide-sense stationarity. Strict stationarity is very demanding: it requires the joint probability distribution of the sequence to remain unchanged under any shift in time. Environmental time series collected by sensors cannot meet this condition. Wide-sense stationarity, also called weak or second-order stationarity, requires only that the mean and variance be constant and that the autocovariance depend only on the lag between two time points.

Because the sensors report data very frequently, the temperature time series at coordinate 1 is downsampled to one sample every 30 minutes to simplify testing and simulation. The first-order and second-order differences of the downsampled series are shown in Figure 4, and part of the data is listed in Table I.
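The downsampling and differencing step can be sketched as follows. The paper does not say how readings within an interval are combined, so averaging each 30-minute bucket is an assumption, as are the function names and the use of timestamps in seconds (Intel Lab timestamps would first need converting). Empty buckets are skipped here; a real pipeline would need to fill or flag them, because a difference across a gap spans more than 30 minutes.

```python
from collections import defaultdict


def downsample_mean(readings, period_s=1800):
    """Average (timestamp_seconds, value) readings into fixed buckets.

    Bucket i covers [i * period_s, (i + 1) * period_s). Returns one mean per
    non-empty bucket, in time order; empty buckets are skipped, not filled.
    """
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[int(ts // period_s)].append(value)
    return [sum(buckets[b]) / len(buckets[b]) for b in sorted(buckets)]


def diff(series):
    """First-order difference X_t - X_{t-1}."""
    return [b - a for a, b in zip(series, series[1:])]


readings = [(0, 19.0), (600, 19.2), (1200, 19.4), (1800, 19.8), (3600, 20.6), (4000, 20.8)]
half_hourly = downsample_mean(readings)  # about [19.2, 19.8, 20.7]
first_order = diff(half_hourly)          # about [0.6, 0.9]
second_order = diff(first_order)         # about [0.3]
print(half_hourly, first_order, second_order)
```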

Figure 5 shows the line plot, frequency distribution, autocorrelation function, and partial autocorrelation function of the first-order difference of the sensor data.

Fig. 3. Sensor coordinate map.

Table I. Temperature sensor time series data at coordinate 1.

Figure 6 shows the same four plots (line plot, frequency distribution, autocorrelation function, and partial autocorrelation function) for the second-order difference of the sensor data.

Fig. 4. First-order and second-order differences of the sensor time series.

Fig. 5. Analysis of the first-order difference of the temperature sensor data.

Fig. 6. Analysis of the second-order difference of the temperature sensor data.

The augmented Dickey-Fuller (ADF) unit root test is used to test the stationarity of the sensor data at coordinate 1; the results are shown in Table II. The p-value of the unit root test statistic is far below 0.05, so the first-order differenced series of the sensor can be regarded as stationary, and no second-order differencing is required. The ARMA model is therefore built on the first-order differenced temperature data.
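The ADF test is normally run with a statistics library. To show what the statistic measures, the sketch below computes the simple (non-augmented) Dickey-Fuller t statistic with a constant: regress ΔX_t on X_{t-1} and test whether the slope γ is zero. Dropping the lagged-difference terms of the augmented test is a simplification made here, and -2.86 is the commonly tabulated approximate 5% critical value for the constant-only case; the function name is hypothetical.

```python
import math
import random


def dickey_fuller_t(series):
    """t statistic of gamma in  dX_t = alpha + gamma * X_{t-1} + e_t."""
    y = [b - a for a, b in zip(series, series[1:])]  # dX_t
    x = series[:-1]                                   # X_{t-1}
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    gamma = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    alpha = my - gamma * mx
    rss = sum((yi - alpha - gamma * xi) ** 2 for xi, yi in zip(x, y))
    se_gamma = math.sqrt(rss / (n - 2) / sxx)  # OLS standard error of gamma
    return gamma / se_gamma


CRITICAL_5PCT = -2.86  # approximate 5% critical value, constant-only case

rng = random.Random(1)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(500)]
random_walk = [0.0]
for _ in range(499):
    random_walk.append(random_walk[-1] + rng.gauss(0.0, 1.0))

print(dickey_fuller_t(white_noise) < CRITICAL_5PCT)  # True: unit root rejected
print(dickey_fuller_t(random_walk))                  # usually above -2.86
```

A statistic below the critical value (equivalently, a p-value below 0.05) rejects the unit root, which is the reasoning applied to the first-order differenced temperature series above.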

The Bayesian information criterion (BIC) and the Akaike information criterion (AIC) are used to select the orders p and q of the ARIMA(p,d,q) model of the sensor time series. Figure 7 shows a heat map of the BIC values for different orders p and q. Under both AIC and BIC, the optimal values of p and q are both 3.
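The grid search over (p, q) can be sketched as follows, using the Gaussian approximations AIC = n ln σ̂² + 2k and BIC = n ln σ̂² + k ln n, where σ̂² is the residual variance of a fitted model and k its number of estimated coefficients (additive constants dropped). The residual variances in the example are made-up numbers, not results from the paper, and the function names are hypothetical; in practice each value comes from fitting the corresponding model.

```python
import math


def information_criteria(resid_var, n, k):
    """(AIC, BIC) under a Gaussian likelihood, additive constants dropped."""
    base = n * math.log(resid_var)
    return base + 2 * k, base + k * math.log(n)


def select_order(resid_vars, n):
    """resid_vars maps (p, q) -> residual variance of that fitted model.
    Returns the (p, q) minimizing AIC and the one minimizing BIC."""
    scores = {}
    for (p, q), var in resid_vars.items():
        k = p + q + 1  # AR and MA coefficients plus the constant
        scores[(p, q)] = information_criteria(var, n, k)
    best_aic = min(scores, key=lambda pq: scores[pq][0])
    best_bic = min(scores, key=lambda pq: scores[pq][1])
    return best_aic, best_bic


# Made-up residual variances for a series of length 300:
variances = {(1, 1): 0.050, (2, 2): 0.046, (3, 3): 0.040, (4, 4): 0.0399}
print(select_order(variances, 300))  # ((3, 3), (3, 3))
```

The (4,4) model fits slightly better, but its extra coefficients cost more than they gain under both criteria; BIC's ln n penalty makes it the stricter of the two.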

Based on the sensor time series, we build an ARIMA(3,1,3) prediction model with orders p = 3, d = 1, and q = 3, train it on the training set, and apply the Ljung-Box test (Q test) to the residual sequence of the model. The relevant results of the training process are shown in Table III and Figure 8.
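The Ljung-Box statistic is Q = n(n+2) Σ_{k=1}^{h} r_k² / (n−k), where r_k is the sample autocorrelation of the residuals at lag k. Q is compared with the chi-square quantile at h − p − q degrees of freedom; a Q below that quantile (p-value above 0.05) means the residuals cannot be distinguished from white noise. The sketch below computes Q only; the paper does not state which lag h it used, and the function name is hypothetical.

```python
def ljung_box_q(residuals, h):
    """Ljung-Box Q statistic of the residuals over lags 1..h."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    c0 = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, h + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0  # sample ACF at lag k
        q += r_k * r_k / (n - k)
    return n * (n + 2) * q


# Strongly alternating residuals are far from white noise:
print(ljung_box_q([1.0, -1.0, 1.0, -1.0], 2))  # 7.5
```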

The trained ARIMA model is then used to predict the time series values of the test set. The training set consists of the temperature sensor data at coordinate 1 from 2004-02-28 00:00:00 to 2004-03-06 00:00:00, and the test set consists of the temperature time series from 2004-03-06 00:30:00 to 2004-03-07 00:00:00. Figure 9 plots the differenced values predicted by the model against the raw differenced values of the temperature sensor.

To evaluate the accuracy of the model, we randomly selected 10 time windows from the period between February 28th and April 5th, 2004, in which the data are concentrated. The training sets cover 2004.02.28 00:00:00 - 2004.03.06 00:00:00, 2004.03.04 00:00:00 - 2004.03.11 00:00:00, 2004.03.09 00:00:00 - 2004.03.16 00:00:00, 2004.03.24 00:00:00 - 2004.03.31 00:00:00, 2004.03.02 00:00:00 - 2004.03.09 00:00:00, 2004.03.07 00:00:00 - 2004.03.14 00:00:00, 2004.03.15 00:00:00 - 2004.03.22 00:00:00, 2004.03.01 00:00:00 - 2004.03.08 00:00:00, 2004.03.13 00:00:00 - 2004.03.20 00:00:00, and 2004.03.13 00:00:00 - 2004.03.20 00:00:00. A line chart comparing the ten predictions with the true values is shown in Figure 9. The mean, variance, maximum, and minimum of the ten prediction accuracy rates are shown in Table IV and Figure 10.
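The paper reports an accuracy rate per trial but does not define it. The sketch below assumes one common definition, the mean of 1 − |predicted − true| / |true| over a test window, and then summarizes repeated trials with the statistics reported in Table IV. Both the definition and the function names are assumptions; the relative-error form is only meaningful when no true value is zero, so it suits temperatures in degrees rather than the differenced values.

```python
import statistics


def prediction_accuracy(predicted, actual):
    """ASSUMED accuracy rate: mean of 1 - |pred - true| / |true| over a window."""
    ratios = [1.0 - abs(p - a) / abs(a) for p, a in zip(predicted, actual)]
    return sum(ratios) / len(ratios)


def summarize(accuracies):
    """Mean, sample variance, maximum and minimum over repeated trials."""
    return {
        "mean": statistics.mean(accuracies),
        "variance": statistics.variance(accuracies),
        "max": max(accuracies),
        "min": min(accuracies),
    }


print(prediction_accuracy([19.0, 21.0], [20.0, 20.0]))  # 0.95: each off by 5%
print(summarize([0.9, 0.95, 1.0]))
```

`statistics.variance` is the sample variance (divisor n − 1); `statistics.pvariance` would give the population variance if that is what Table IV reports.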

In IoE search, exact numerical accuracy is usually not required, and a certain error is tolerated. When a search request does not demand high precision, the results of the dynamic data prediction model can satisfy it; and because the prediction algorithm has low computational complexity, search requests can be processed efficiently under the resource constraints of the IoE.

Table II. ADF unit root test data.

Fig. 7. BIC heat map of different orders p and q.

Fig. 8. Analysis chart of the ARIMA model of the temperature sensor time series.

V. CONCLUSION

To accommodate the unique characteristics of the IoE environment, i.e., heterogeneity, multi-modality, and dynamics, we design a novel model for entity search in the IoE. To cope with the highly dynamic nature of the IoE, the time series values of IoE sensor data are predicted so as to meet the real-time requirements of users' IoE searches. The dynamic data prediction model assumes that the differenced time series reported by IoE sensor nodes is weakly stationary; this is a prerequisite for the model proposed in this paper, and if a sensor node's reported data do not satisfy it, a new dynamic data prediction algorithm must be designed. The dynamic data prediction algorithm proposed in this paper has low computational complexity and is suitable for IoE search requests under resource-constrained conditions.

Table III. Parameters of the ARIMA model of the temperature sensor time series.

Table IV. Prediction accuracy statistics of the model over ten random samples.

Fig. 9. Line chart of the model's predicted differenced values and the true differenced values of the temperature sensor.

Fig. 10. Line graph of the model's prediction accuracy over ten random samples.

ACKNOWLEDGMENT

This work is supported by the National Natural Science Foundation of China under Nos. 61572153, 61702220, 61702223, and U1636215, and by the National Key Research and Development Plan (Grant No. 2018YFB0803504).