Lu Yin, Xue Yongtao, Li Qingyuan, Wu Luocheng, Li Taosen, Yang Peipei, Zhu Hongbo
1 College of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
2 Key Lab of Broadband Wireless Communication and Sensor Network Technology (Nanjing University of Posts and Telecommunications), Ministry of Education, Nanjing 210003, China
3 Jiangsu Key Laboratory of Wireless Communications, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
4 Research Institute of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
5 Bell Honors School, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
6 School of Automation, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
Abstract: Traditional IoT systems suffer from high equipment management costs and difficulty in trustworthy data sharing caused by centralization. Blockchain provides a feasible research direction to solve these problems. The main challenge at this stage is to integrate the blockchain with resource-constrained IoT devices and to ensure that the data of the IoT system is credible. We provide a general framework for intelligent IoT data acquisition and sharing in an untrusted environment based on the blockchain, where gateways become Oracles. A distributed Oracle network based on a Byzantine fault-tolerant algorithm is used to provide trusted data for the blockchain and make intelligent IoT data trustworthy. An aggregation contract is deployed to collect data from the various Oracles and share the credible data with all on-chain users. We also propose a gateway data aggregation scheme based on the REST API event publishing/subscribing mechanism, which uses SQL to achieve flexible data aggregation. The experimental results show that the proposed scheme can alleviate the problem of limited performance of IoT equipment, make data reliable, and meet the diverse data needs on the chain.
Keywords: blockchain; data sharing; Internet of Things; Oracle
The rapid development of intelligent Internet of Things (IoT) applications has attracted increasing attention in the fields of agriculture, medical care, transportation and manufacturing [1]. The growth in the scale and complexity of IoT, together with constantly updated application scenarios, makes it more dynamic. However, problems have arisen in data security, device management, data storage, etc. For example, most existing intelligent IoT system architectures are centralized [2]: terminal devices with limited performance rely on cloud servers for large-scale computing and storage, such as identification, authentication, and data computing. Data is usually transmitted from the bottom layer to the top layer through a tree-like hierarchy and is finally received uniformly by the cloud server. This mode relies heavily on the performance of the centralized server. Once the server is attacked, the entire system will collapse. In addition, in the process of data aggregation, the crash of a single node invalidates all the data beneath it, leading to security issues such as data leakage in the system [3].
Blockchain is a distributed data structure maintained by a network of peer-to-peer nodes. It does not rely on a single centralized authority, allowing parties to complete transactions in an untrusted environment without third-party assistance. Representative blockchain systems include Bitcoin [4] and Ethereum [5]. Compared with Bitcoin, Ethereum uses smart contracts to additionally implement programmable computing on the blockchain. A smart contract is an automated script that can be triggered by data on the blockchain and cannot be changed once deployed. Its appearance makes it possible to implement blockchain applications [6].
An Oracle is a mechanism that writes information from outside the blockchain onto the chain, which solves the problem that a smart contract cannot obtain external data by itself. Unfortunately, many academic papers and business deployments of blockchain treat the Oracle problem in a biased or incomplete way [7]. In the real world, however, blockchain applications depend on Oracles, and an Oracle may reintroduce centralization. Therefore, it is particularly important to use a decentralized Oracle network to supply data to smart contracts.
At present, the application scenarios of IoT are usually closed. Each IoT device is coordinated by a centralized back-end server for data collection. Social trust will become a real bottleneck when IoT entities enter another system and message sharing involves multiple stakeholders. AlBreiki et al. discussed the concept of trust in Oracles in the blockchain ecosystem and compared the leading blockchain Oracle technologies and platforms in detail [8].
In recent years, a lot of work has been done on the application of blockchain to intelligent IoT. Lao et al. analyzed some popular blockchain applications, compared different consensus algorithms, and provided a traffic model for IoT blockchain systems [9]. Vonitsanos et al. summarized the latest developments, requirements and challenges in various fields for the application of blockchain systems to smart cities [10]. Khan et al. investigated the application of blockchain in IoT integration and security, listed a variety of technologies for integrating IoT data for processing and storage, and summarized IoT vulnerabilities in multiple aspects together with corresponding blockchain solutions [11].
Blockchain nodes place high demands on the computing performance and energy consumption of devices. Most IoT devices are heterogeneous and have limited performance, so they cannot meet the resource requirements for running blockchain nodes. Dorri et al. proposed deploying a high-resource device in each home to handle node requests; the node itself also maintains a private blockchain network for control and auditing of the home environment [12]. Tahmasebi et al. proposed using fog nodes to overcome the resource limitations of IoT devices, generating hash codes from the data stored in the database and sending them to the blockchain network to alleviate the limitation of transaction throughput [13]. Ye et al. considered the offloading roles of primary and replica nodes, the block size and the block interval in order to minimize consumption costs and maximize the transaction throughput of the blockchain system while ensuring data security and computational efficiency [14]. Le-Dang and Le-Ngoc proposed a scalable blockchain-based IoT device management solution, which uses a REST API based event publishing/subscribing mechanism to decouple IoT devices from the blockchain and reduce performance requirements for edge devices [15].
In terms of data storage and management, Wang [16], Rupasena [17] and Tulkinbekov [18] stored most of the on-chain data in external cloud servers. Kanade proposed a secure decentralized storage solution for user storage sharing [19]. Liu et al. integrated a decentralized blockchain network and a distributed storage network. The data generated by IoT devices was stored in the distributed storage nodes of the peer-to-peer network, and the data identifiers were stored in the blockchain [20]. Both Zhai [21] and [22] used the Chameleon hash algorithm so that the data stored on the chain could be dynamically updated. Wang et al. also proposed a concept of self-updating secret sharing, allowing users to update trapdoors and corresponding hash keys after editing the blockchain.
In terms of security and privacy, Cha et al. proposed a blockchain-connected gateway design and used the proposed signature scheme PDSS to encrypt user data so that the data stored in the gateway would not be leaked [23]. Salih Mohammed proposed an IoT data transmission protection framework called HFSDS-IoT, in which the proposed attack authentication mechanism (IDS) was used to filter malicious IP addresses [24]. Lu et al. proposed a two-level data aggregation scheme with a three-tier architecture. In the scheme, homomorphic encryption ensured the security of data transmission, and blockchain data was finally read through private key analysis in the cloud layer [25]. Mena et al. proposed configuring a whitelist on the gateway to filter access IP addresses and prevent basic devices from being attacked [26]. Zhou et al. proposed an optimized blockchain deployment mechanism [27]. They grouped all nodes into multiple consensus units. The nodes in each consensus unit together maintained a complete blockchain, while each node in a consensus unit stored only part of the full blockchain. In addition, they developed a chaotic genetic algorithm to dynamically adjust the optimal block allocation, which offers a good tradeoff between the length of blockchain to be stored and the level of security provided.
To sum up, research on the application of blockchain to the intelligent IoT has made great progress, but some problems remain. Firstly, regarding the performance bottleneck of edge devices, most studies have not considered application cost. Some solutions propose running blockchain nodes at the perception layer, which requires the existing infrastructure to be replaced and updated regularly and is therefore not a wise choice. Secondly, data provenance is critical to blockchain systems. If a large amount of false data is generated on the chain, users will lose trust in the system. Finally, the rapid growth in the scale of IoT has brought explosive growth in the amount of data, which greatly increases the storage pressure on blockchain nodes and the throughput pressure on the network. Off-chain data should be uploaded to the chain selectively so that the data on the chain is valuable.
In view of these challenges and the lack of research on the above problems, and considering the integration of blockchain with resource-constrained IoT devices and the credibility of IoT data, this paper provides a general framework for blockchain-based intelligent IoT data acquisition and sharing. To address the performance bottleneck of edge devices, a REST API is used to decouple blockchain nodes from IoT devices, so that IoT device administrators do not need to care about node storage and the other issues brought by blockchain nodes. The Chainlink Oracle network, whose mechanism is based on Byzantine fault tolerance, is used to ensure data credibility, and data sharing is achieved by setting up a public aggregation contract. To address the massive data in large-scale IoT, a SQL-based data aggregation scheme in the gateway is proposed. At the same time, we design a data structure so that the aggregation method can be determined by the data requester on the blockchain. The main innovations and contributions of this paper are as follows:
1. A general framework for intelligent IoT data acquisition and sharing in a blockchain-based and untrusted environment is proposed, which provides a credible data acquisition method and expands the data sources of on-chain contracts to achieve data sharing.
2. On the basis of the proposed framework, a gateway data aggregation scheme based on REST API event publishing/subscribing is developed to decouple the blockchain from IoT devices and to reduce both the interaction frequency between the gateway and the blockchain and the performance requirements for IoT devices.
3. We use a Raspberry Pi as a gateway to deploy the proposed scheme and test its feasibility. Experiments show that the proposed scheme can alleviate the problem of limited performance of IoT devices. At the same time, it makes the IoT data credible and meets the diverse data requirements on the chain.
In this section, we first introduce the general framework for intelligent IoT data acquisition based on blockchain in an untrusted environment, and then introduce the gateway-based aggregation scheme.
Figure 1 is a schematic diagram of the general framework for intelligent IoT data acquisition in a blockchain-based untrusted environment. There are five main parts in the architecture: users, data providers, blockchain networks, gateways, and sensor-like devices.
Figure 1. A general framework for intelligent IoT data acquisition and sharing in an untrusted environment based on blockchain.
In the context of the large-scale development of IoT, devices are heterogeneous and diverse. Generally, IoT communication can be divided into two categories: machine-to-human communication and machine-to-machine communication. Machine-to-human communication involves imaging devices, smartphones, switches, etc. Machine-to-machine communication involves monitoring devices, robotic arms, etc. In order to integrate existing infrastructure at low cost, the interface of the perception layer should support various communication protocols. As a result, data sources are more extensive, and IoT devices can even be automated and controlled through the blockchain.
Between the blockchain network and the perception layer, we use the EdgeX Foundry IoT edge computing framework to connect to the perception layer and obtain device data. EdgeX internally uses a message bus based on an event publishing/subscribing mechanism to reduce communication costs between devices and gateways. For blockchain networks, Chainlink [28] is an industry-leading decentralized Oracle network that can provide data sources for the chain while preserving the decentralization of the blockchain system. It internally uses multiple rounds of encrypted broadcast communication to obtain trusted data from multiparty data sources in an untrusted environment. Further, eKuiper is a tool with embedded SQL-based data analysis that is used for publishing/subscribing data from EdgeX Foundry.
Blockchain networks are divided into public chains, consortium chains and private chains. The framework proposed in this paper can be used with any of them. However, the choice of chain affects the throughput of the system and the cost of contract interaction [29]; consortium chains and private chains do not have cost issues. We give a detailed analysis in Section IV.
Data providers provide data source services to users on the blockchain network, such as topics for device data releases and commands for obtaining device data. They also ensure that gateway devices remain up and running. In addition, data providers need to update data sources to meet the needs of users on the chain.
Users are consumers of IoT data. They initiate a request to the blockchain network. The request is constructed according to the information provided by data providers. Data acquisition is completed by reading the state of an aggregation contract on the chain, which realizes data sharing.
Chainlink uses decentralized data sources and distributed Oracles to improve the credibility of data. Our scheme differs from the data request decentralization model defined by Chainlink. In a public chain environment such as Ethereum, because data has value, malicious individuals or organizations may corrupt a data source by forging it and then inject false data into the chain to obtain illegal profits. In our proposed scheme, however, the data sources are edge devices such as sensors and monitoring devices. These devices are generally affected only by the environment or by the device itself, and under existing social order and management their credibility can be guaranteed. Each gateway runs a Chainlink node as an Oracle and periodically collects data from sensors; the sensor sets under different gateways do not intersect. Finally, each gateway is connected to the Oracle network to provide data to an aggregation contract on the blockchain, as shown in Figure 2, which demonstrates the distributed data acquisition model of our proposed system.
Figure 2. Decentralized distribution mode of data requests at the gateway level.
In order to enable the gateway to respond to requests from the blockchain, we use a unified data structure. The data structure includes Type, ruleIdReq, redefine, and ruleRedefine. Type represents the requested data type, as shown in Figure 3. First, it can be deviceName. Each device has a metadata configuration file. By specifying the deviceName, data aggregation for devices with a unified configuration can be realized. Second, it can be deviceType. By specifying the deviceType, data aggregation for similar devices of different versions can be realized. Third, it can be deviceCommand. deviceCommand represents a subcommand of a device. For example, the registers of a temperature and humidity sensor contain information such as temperature, humidity, and baud rate. By configuring deviceCommand, you can obtain only the temperature from the temperature and humidity sensor.
Figure 3. Gateway unified data structure.
ruleIdReq represents the basic SQL-based aggregation rules preset by eKuiper in the gateway.
redefine determines whether custom aggregation rules are required. When the predefined rules in eKuiper cannot meet the user's needs, the user can set redefine to true and specify the aggregation function aggFunc and the window function windowFunc; the ruleIdReq parameter is then ignored.
aggFunc is an aggregation function. The eKuiper IoT data analysis tool has a series of built-in general aggregation algorithms, and eKuiper also supports external custom aggregation methods. Data providers can implement custom data aggregation by providing their own aggregation methods.
windowFunc is a window function. Similarly, eKuiper has built-in window functions that meet most needs, such as the tumbling window TUMBLINGWINDOW(SS, 10), which collects all the data received in each 10-second interval, where SS denotes seconds.
We developed the adapter using the scripting language JavaScript and used Algorithm 1 to parse the data structure.
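The parsing step can be illustrated with a minimal Python sketch. It is not the authors' Algorithm 1 or their JavaScript adapter; the class name, the nested layout of ruleRedefine, the stream name and the generated SQL shape are all assumptions made for illustration. Only the field names Type, ruleIdReq, redefine, ruleRedefine, aggFunc and windowFunc come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GatewayRequest:
    """Hypothetical mirror of the unified request structure described above."""
    type: str                             # "deviceName", "deviceType" or "deviceCommand"
    rule_id_req: str                      # ID of a rule preset in eKuiper
    redefine: bool = False                # True when a custom rule is requested
    rule_redefine: Optional[dict] = None  # assumed layout: {"aggFunc": ..., "windowFunc": ...}


def resolve_rule(req: GatewayRequest, field: str, stream: str) -> dict:
    """Return a reference to a preset rule, or the SQL text of a custom rule."""
    if not req.redefine:
        # Preset rule: ruleIdReq alone identifies the aggregation.
        return {"preset": True, "rule_id": req.rule_id_req}
    custom = req.rule_redefine or {}
    agg, window = custom.get("aggFunc"), custom.get("windowFunc")
    if not agg or not window:
        raise ValueError("redefine=True requires aggFunc and windowFunc")
    # Custom rule: ruleIdReq is ignored and a SQL statement is assembled instead.
    sql = f"SELECT {agg}({field}) AS value FROM {stream} GROUP BY {window}"
    return {"preset": False, "sql": sql}
```

With redefine set to true, aggFunc "avg" and windowFunc "TUMBLINGWINDOW(ss, 10)", the sketch produces a statement that averages the chosen field over 10-second windows; with redefine left false it simply returns the preset rule ID.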
Figure 4 illustrates the workflow of the proposed platform. Before the framework runs, data providers need to preconfigure gateways to access devices and provide certain data source information to users on the chain.
Figure 4. Workflow of the proposed platform.
2.3.1 Configuration of Bridge and Chainlink Task
The bridge configuration implements access to external addresses. The main configuration parameters of the bridge are the bridge name and the bridge URL (the address of the external data source to be accessed).
Chainlink monitors contract events. When a contract on the chain is called and triggers the corresponding event, the Chainlink node, which subscribes to that event, parses the parameters passed on the chain and triggers the predefined task process. As shown in Figure 5, the node subscribes to blockchain network event logs to obtain the predefined function parameters and parses them to obtain data packets containing data types, aggregation methods, etc. Then, the bridge sends the data packets to the external address to obtain data. Next, the node parses the data and decides whether to multiply it by a scaling factor, because smart contracts do not support decimals. After that, the node converts the data type and specifies a predefined function on the management contract that can change the state of the contract. Finally, the node specifies the address of the management contract and calls the function.
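The scaling step can be sketched as follows. This is an illustration only: the function names, the default factor of 100 and the rounding mode are assumptions, not values taken from the authors' task configuration.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_contract_integer(reading: str, times: int = 100) -> int:
    """Scale a decimal reading to an integer before it is written on chain.

    `times` is a hypothetical scaling factor; the contract's consumers must
    divide by the same factor when they read the value back.
    """
    scaled = Decimal(reading) * times
    return int(scaled.to_integral_value(rounding=ROUND_HALF_UP))


def from_contract_integer(value: int, times: int = 100) -> Decimal:
    """Recover the decimal reading from the stored integer."""
    return Decimal(value) / times
```

Using Decimal rather than float avoids binary rounding surprises when a reading such as "23.47" is multiplied by the factor.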
Figure 5. Chainlink node task flow.
2.3.2 The Gateway Regularly Collects Data
EdgeX supports most communication protocols, and all its internal data is published on the Redis Streams data bus. The data uploaded regularly by all devices under the gateway is stored on the bus in real time. eKuiper establishes rules from the parameters obtained from Chainlink nodes, then subscribes to the data bus to obtain the data and aggregate it.
2.3.3 Information Provided by Data Providers
Data providers need to provide data source information to all nodes on the chain, as shown in Table 1. The data source information includes deviceName, deviceType, and deviceCommand. In the built-in data bus of EdgeX, device data is uploaded to the data bus under a topic of the form "deviceName/deviceType/deviceCommand", where "#" acts as a wildcard that can match any value. dataTypes, windowFunc, and aggFunc are used for creating rules. The predefined rule ID is named in the format "deviceName_aggFunc_deviceCommand" or "deviceType_aggFunc_deviceCommand".
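The two naming conventions above can be made concrete with a small sketch. The helper names and the sample device identifiers are hypothetical, and the matcher follows the text's description of "#" as matching any single value rather than any particular message bus implementation.

```python
def bus_topic(device_name: str = "#", device_type: str = "#",
              device_command: str = "#") -> str:
    """Build a topic of the form deviceName/deviceType/deviceCommand."""
    return f"{device_name}/{device_type}/{device_command}"


def topic_matches(pattern: str, topic: str) -> bool:
    """Check a topic against a pattern in which '#' matches any one value."""
    p_parts, t_parts = pattern.split("/"), topic.split("/")
    return len(p_parts) == len(t_parts) and all(
        p == "#" or p == t for p, t in zip(p_parts, t_parts)
    )


def preset_rule_id(scope: str, agg_func: str, device_command: str) -> str:
    """Name a predefined rule; `scope` is a deviceName or a deviceType."""
    return f"{scope}_{agg_func}_{device_command}"
```

For instance, a pattern built with only a device type and a command selects the temperature readings of every device of that type, which corresponds to the deviceType style of request.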
Table 1. Data source information provided by the device administrator (example).
2.3.4 Complete Process
Initially, the user calls the contract function on the chain, using the data source and configuration information published by data providers, to trigger the event. Next, the Chainlink node, which subscribes to the event log on the chain, follows the process shown in Figure 5 and sends the data packet to the external adapter. Then, the adapter parses the parameter data packet and creates rules in eKuiper in the manner of Algorithm 1. After that, eKuiper returns the aggregated data to the adapter, which calls the callback function to return it to the node. The Chainlink node then follows the process shown in Figure 5 to parse the data and return it to the contract on the chain. Finally, the aggregation contract aggregates the data from the Oracles on multiple gateways to obtain the final value. Users call the aggregation contract to obtain the value and complete data acquisition.
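The final aggregation step can be sketched as below. The text does not state which function the aggregation contract applies, so the use of a median here is an assumption (it is a common choice because a minority of outliers cannot move it outside the range of honest values); the function name and the sample values are also hypothetical.

```python
from statistics import median_low


def aggregate_reports(values: list[int], f: int) -> int:
    """Combine integer reports from several gateway Oracles into one value.

    Assumes at most `f` reports are dishonest and requires at least 2f + 1
    reports, so that the median is bounded by honest values on both sides.
    """
    if len(values) < 2 * f + 1:
        raise ValueError("need at least 2f + 1 reports")
    # median_low keeps the result an integer, as on-chain values must be.
    return median_low(values)
```

With seven reports of which one is a wildly wrong outlier, the result stays among the honest readings.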
In this section, we conduct a deployment test of the proposed scheme. The hardware consists of a laptop PC (Intel(R) Core(TM) i7-7700HQ @ 2.8 GHz, 16 GB RAM) running Ubuntu 20.04 desktop and a Raspberry Pi 4B running Ubuntu MATE. Since the compilation of the Chainlink node is limited to CPUs of the AMD64 architecture, we start it on the PC. EdgeX Foundry, eKuiper, and the adapter run on the Raspberry Pi, which is connected to the B-TH-RS30 temperature and humidity sensor, as shown in Figure 6. The sensor uses the standard Modbus-RTU communication protocol to simulate real IoT scenarios.
Figure 6. Sensors and Raspberry Pi.
We use the Goerli Ethereum test network as the blockchain network. We connect through the Goerli API node provided by Alchemy and use the Hardhat framework and the Solidity programming language to deploy contracts. All the following implementations are done on the Goerli testnet. A large number of sensors is required to complete a data aggregation. Therefore, we start 10 simulation programs in place of real sensors.
To evaluate our proposed scheme,there are several aspects to be considered.
Maintaining blockchain nodes requires the nodes to participate in network consensus, which in turn requires matching computing power. We compare the proposed scheme with an Ethereum full node, as shown in Table 2. An Ethereum full node requires more data storage space and wider, more stable network bandwidth. The proposed scheme and the Ethereum full node have the same CPU core count requirement, but normal operation of an Ethereum full node requires constant interaction with the blockchain network. In our test, a 4-core processor and 8 GB of RAM were used to run an Ethereum full node. Its CPU and memory usage was always above 50%, and it took a long time to synchronize blockchain information at startup. This is an unnecessary consumption of hardware and time for edge devices, especially when the blockchain is already very large. We also considered interacting with Ethereum light nodes. Unlike full nodes, light nodes do not participate in network consensus and store only the hash tree root of transaction data, so they require only 400 MB or more of storage space. However, light nodes have a disadvantage: reading on-chain data depends on full nodes, so obtaining data from the blockchain is delayed by at least two transaction confirmation times. In addition, connecting to a full node through HTTP or WebSocket can achieve the same function as a light node. Therefore, we believe that it is unnecessary to run either full nodes or light nodes on edge devices.
Table 2. Configuration comparison between the proposed scheme and an Ethereum full node.
Compared with sensor devices running blockchain nodes and directly uploading data to the chain, the proposed scheme based on gateway data aggregation can effectively reduce the interaction frequency between gateway devices and blockchain nodes. We take the data volume of the medium-sized industrial IoT platform proposed by Wang [16] as a reference: it contains 50 groups of wireless sensor networks, and each group includes 100 sensor nodes. The amount of transaction data transmitted by a sensor is 120 to 180 bytes, which we take as 150 bytes on average. If the sensors participate in the maintenance of the blockchain network and each sensor uploads data once per second, 5,000 on-chain transactions are generated per second, occupying 750,000 bytes of block space per second (750 kB/s). Assume that the gas required for a basic transaction of an Ethereum account is 21,000. In the EIP-1559 proposal implemented by the London upgrade of the Ethereum network, the gas limit of a single block is set at 15 million gas and may not exceed 30 million gas. The generation time of each block is about 13 seconds, so the transactions per second (TPS) at the 15 million gas level is about 54, and the maximum TPS at the 30 million gas limit is about 109. As the scale of the scenario increases, the storage pressure and transaction processing pressure on the blockchain network also increase. If the blockchain is used as a public general-purpose network platform, it is clearly unreasonable to occupy such a large share of network throughput just for data uploading. In the above scenario, if the solution based on gateway data aggregation is adopted and each group of wireless sensor networks is configured with two IoT gateways as data aggregation points, only 100 interactions are required per second, occupying 15,000 bytes of block space per second (15 kB/s) and leaving more processing capacity for subsequent on-chain application development.
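The figures in this comparison can be reproduced with a few lines of arithmetic. The sketch below assumes, as the text implies, one upload per sensor per second and one upload per gateway per second; the constant and function names are introduced only for illustration.

```python
GROUPS = 50                 # wireless sensor network groups
SENSORS_PER_GROUP = 100     # sensor nodes per group
GATEWAYS_PER_GROUP = 2      # aggregation points per group in the proposed scheme
TX_BYTES = 150              # average transaction payload in bytes
BASIC_TX_GAS = 21_000       # gas for a basic Ethereum transaction
BLOCK_TIME_S = 13           # approximate block generation time in seconds


def tps(block_gas: int) -> float:
    """Basic transactions per second that fit in blocks of the given gas size."""
    return (block_gas // BASIC_TX_GAS) / BLOCK_TIME_S


# Every sensor uploads directly to the chain.
direct_tx_per_s = GROUPS * SENSORS_PER_GROUP          # 5000
direct_bytes_per_s = direct_tx_per_s * TX_BYTES       # 750000

# Only the gateways upload aggregated values.
gateway_tx_per_s = GROUPS * GATEWAYS_PER_GROUP        # 100
gateway_bytes_per_s = gateway_tx_per_s * TX_BYTES     # 15000

tps_at_15m = int(tps(15_000_000))                     # 54
tps_at_30m = int(tps(30_000_000))                     # 109
```

The direct-upload load of 5,000 transactions per second exceeds the roughly 109 TPS ceiling by a wide margin, while the gateway load of 100 interactions per second sits just under it.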
System throughput may be affected by the blockchain network, which is related to the consensus mechanism of the blockchain network [30]. The confirmation speed of transactions on the chain and the block generation speed affect the data acquisition speed of users on the chain. For example, the confirmed TPS of the Bitcoin system, a public chain based on Proof of Work (PoW), is 7, the TPS of Ethereum is 30 to 40, and the TPS of some consortium chains can reach 1000 or even higher. Suppose there are n Oracle machines, of which f are dishonest, and all Oracle machines have the same off-chain equipment performance. Then n and f satisfy the following relationship:
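The displayed relationship does not survive in this text. A plausible reading, assuming the standard Byzantine fault tolerance bound (an assumption, not a formula confirmed by the source), is:

```latex
% Assumed standard BFT bound: n Oracles tolerate at most f dishonest ones.
n \geq 3f + 1
```

Under this bound, for example, a network of n = 4 gateways tolerates f = 1 dishonest gateway.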
According to the on-chain aggregation algorithm based on Byzantine fault tolerance, at least one array containing 2f+1 data items and one array containing f+1 data items need to be received to obtain the data required by the user. In addition, one data request and one data response are included. Therefore, a total of (2f+1) + (f+1) + 2 = 3f+4 transactions need to be confirmed, and they are divided into four layers in time.
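The transaction count can be tabulated with a short sketch. The layer names and their ordering are assumptions made for illustration; only the sizes 2f+1, f+1, one request and one response come from the text.

```python
def transactions_per_request(f: int) -> dict:
    """Break down the transactions confirmed for one data request."""
    layers = {
        "request": 1,                 # the user's on-chain data request
        "first_reports": 2 * f + 1,   # array of 2f + 1 data items
        "second_reports": f + 1,      # array of f + 1 data items
        "response": 1,                # the final data response
    }
    layers["total"] = sum(layers.values())  # equals 3f + 4
    return layers
```

For f = 1 this gives 7 transactions in total, and for f = 3 it gives 13, in both cases spread over the four layers.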
Assume that G_B and G_x represent the gas limit of each block and the gas cost of each transaction, respectively. Therefore, the number of transactions N_B in each block can be expressed as:
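The displayed expression is missing from this text. From the definitions just given, it is presumably the ratio of the two quantities; the floor brackets below are an assumption, added because a block holds a whole number of transactions:

```latex
N_B = \left\lfloor \frac{G_B}{G_x} \right\rfloor
% Example with figures quoted earlier in the paper:
% G_B = 15{,}000{,}000 gas, G_x = 21{,}000 gas
% N_B = \lfloor 15{,}000{,}000 / 21{,}000 \rfloor = 714
```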
Thereby, the maximum throughput E(T) satisfies the following relationship:
where t_1 denotes the block generation time of the blockchain network, and N_p denotes the transaction pool size. Define δ as the round-up function. The time required to complete a data request is derived from formula (3).
System throughput can also be affected by the performance of IoT devices. We use a solution based on gateway aggregation to move the computing pressure from the chain down to the gateway node, which requires the gateway device to have a certain amount of computing power. The larger the amount of data, the longer the calculation takes. The delay of the sensor network is within a reasonable range, and the contract call on the chain is affected by the transaction confirmation speed of the blockchain network. The on-chain contract aggregation and off-chain report (OCR) protocols use multiple rounds of communication and consume a certain amount of time.
On-chain aggregation and OCR schemes extend the attribute of decentralization from on-chain to off-chain, trading time for credibility. A perfect Oracle does not exist [25]. In our proposed framework, the gateway plays the role of the Oracle, and the sensor acts as the data source. Assuming that n gateways are configured in a group of sensor networks, as long as there are no more than f dishonest gateways and n and f satisfy the condition of formula (4), the credibility of the data can be determined.
Traditional IoT systems often use real-time data collection to monitor the perception layer. For systems that use a blockchain as the hosting network, this significantly increases the storage and network bandwidth requirements of the nodes that maintain the blockchain. Currently, there are two problems with the application of public blockchain systems. First, the deployment of blockchain maintenance nodes is limited by the performance of edge devices. Second, there is a data trust issue with light nodes or proxy links such as sockets. The proposed scheme uses decentralized public chains as the bearer network and effectively solves the above problems. It extends the feature of decentralization off-chain. Data can be automatically collected and aggregated from the infrastructure to the blockchain, and the reliability of the data is ensured by Byzantine fault-tolerant mechanisms. Verification on the Ethereum test network shows the following. On the one hand, decoupling blockchain nodes from edge devices through a REST API reduces the network bandwidth requirements of edge devices by 75%. On the other hand, data aggregation at the gateway reduces the storage space required on edge devices by 70%. In addition, the delay of a data upload is mainly limited by the block generation speed of the blockchain network and is generally four times the block generation time. The block generation time of the Ethereum test network is about 12-13 seconds, so a data upload takes approximately 1 minute. However, if a blockchain network with a faster block generation speed is used, a data upload can be completed within seconds.
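The one-minute figure follows directly from the four-block rule of thumb stated above. As a worked estimate, where the symbol T_upload is introduced here only for illustration and t_1 is the block generation time:

```latex
T_{\text{upload}} \approx 4\,t_1, \qquad
t_1 \approx 12\text{--}13\ \text{s}
\;\Rightarrow\;
T_{\text{upload}} \approx 48\text{--}52\ \text{s}
```

On a network with a block time of about one second, the same rule would give an upload delay of only a few seconds.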
To address the limited performance of IoT devices and the credibility of blockchain data sources in the field of blockchain and IoT, we propose a general framework for acquiring IoT data based on blockchain. It uses a REST API-based publishing/subscribing mechanism to decouple IoT devices from blockchains and integrate existing infrastructure at low cost. Further, in order to cope with the huge amount of IoT data, a gateway-based data aggregation scheme is proposed, and a software interface adapter between Chainlink and eKuiper is defined. The proposed scheme is tested on the Ethereum Goerli testnet. The experimental results show that the proposed scheme can realize the credibility and sharing of IoT data. At the same time, it effectively reduces the amount of data on the chain and the interaction frequency between IoT devices and the blockchain. In addition, it can meet the diverse data needs on the chain.
ACKNOWLEDGEMENT
This work was supported by the open research fund of Key Lab of Broadband Wireless Communication and Sensor Network Technology (Nanjing University of Posts and Telecommunications), Ministry of Education (No. JZNY202114), and Postgraduate Research & Practice Innovation Program of Jiangsu Province (No. KYCX21-0734).