A multi-resource scheduling scheme of Kubernetes for IIoT

2022-06-27 00:28

ZHU Lin, LI Junjiang, LIU Zijie, and ZHANG Dengyin,2,*

1.Jiangsu Key Laboratory of Broadband Wireless Communication and Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing 210003, China; 2.School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing 210003, China

Abstract: With the rapid development of data applications in Industrial Internet of Things (IIoT) scenarios, resource scheduling in the IIoT environment has become an urgent problem. Owing to its strong scalability and compatibility, Kubernetes has been applied to resource scheduling in IIoT scenarios. However, its limited set of resource types, its default scheduling scoring strategy, and its lack of a delay control module limit its resource scheduling performance. To address these problems, this paper proposes a multi-resource scheduling (MRS) scheme of Kubernetes for IIoT. The MRS scheme dynamically balances resource utilization by taking both the requirements of tasks and the current system state into consideration. Experiments demonstrate the effectiveness of the MRS scheme in terms of delay control and resource utilization.

Keywords: Industrial Internet of Things (IIoT), Kubernetes, resource scheduling, time delay.

1.Introduction

In recent years, the Industrial Internet of Things (IIoT) has received widespread attention. IIoT focuses on applications such as sensor data collection, device control, and remote monitoring. Most of these applications have strict real-time requirements, and different applications place different demands on resources such as CPU, memory, bandwidth, and disk space. Cluster resource scheduling selects appropriate nodes for different application tasks so as to make the best use of cluster resources. It can be used to find suitable computing nodes for different IIoT applications, make full use of the computing resources in the Internet of Things, and reduce costs.

Docker container technology stands out among virtualization technologies for its simplified deployment, multi-environment support, fast startup, service orchestration, easy migration, and lightweight footprint, providing strong support for IIoT applications. Kubernetes is an open-source container management system originally developed by Google, whose design originated from Google's long-standing Borg system [1]. Kubernetes has also been applied to resource scheduling in IIoT scenarios [2]. It is now widely used, and its scheduling strategy has been studied extensively. Existing work can be roughly divided into three categories: containerized applications for heterogeneous clusters [3,4], dynamic scheduling of various resources [5−7], and task-aware scheduling based on task characteristics such as pod tags and task progress [8−10].

However, current multi-node resource scheduling strategies in cloud computing have not been fully optimized for IIoT scenarios and cannot yet meet the multi-resource balance and low-latency requirements of IIoT. In summary, these strategies have three problems. First, they do not take into account the resource requirements of specific tasks in the IIoT environment. The Kubernetes default resource scheduling (DRS) algorithm uses only CPU and memory to schedule computing-intensive tasks, whereas IIoT applications may have special requirements for network bandwidth, storage space and other resources. Second, they do not comprehensively consider the resource usage of both the worker node and the overall system. Some strategies consider only the performance and balance of individual worker nodes and use a greedy algorithm to select the node with the most abundant resources, which leads to a significant drop in system performance. Others consider only the performance and balance of the entire system and ignore the performance requirements of the tasks running on the node, which results in low node operating efficiency. Third, they do not consider the special delay requirements of IIoT applications. Many IIoT applications require very low processing delay to achieve industrial-level control, so pods should be deployed on nodes close to the IIoT production environment to obtain sufficiently low latency.

To solve these problems, this paper builds on the structure of the Kubernetes scheduler, optimizes two of its parts, node filtering and optimal scoring, and designs a general scheduling strategy for the IIoT environment. First, the nodes that satisfy the communication delay requirement are selected. Because IIoT tasks have strict delay requirements, we set corresponding labels for each pod, assign each node a geographic attribute, and filter nodes according to the delay requirements of the pods so that these requirements are met. Then, this paper performs multi-resource scheduling for the specific resource requirements of IIoT scenarios (including CPU, memory, network bandwidth, and disk). Finally, in the optimal scoring process, pod resource requirements and system performance status are considered together. The scheduler analyzes the specific resource requirements of each pod and derives dynamic weights from the proportions of those requirements. Because overall system performance changes over time, this paper uses the exponential moving average method to compute the system dynamic weights.

The rest of this paper is organized as follows. Section 2 introduces the Kubernetes scheduling mechanism and DRS. Section 3 proposes a multi-resource scheduling (MRS) scheme of Kubernetes for IIoT. Section 4 presents the experimental results and analysis, and Section 5 draws conclusions.

2.Kubernetes default scheduling scheme

2.1 Kubernetes default resource scheduling mode

A Kubernetes cluster consists of a master node and several worker nodes [11−13]. The master node is the core of the cluster: all commands and operations for the Kubernetes cluster are executed by it, it is responsible for scheduling and managing the entire cluster, and it is generally an independent server. The master node mainly runs kube-apiserver, kube-controller-manager, kube-scheduler and etcd. All other nodes in the cluster are worker nodes (also called slave nodes), which actually run the containers of business applications. The main components of a worker node are the kubelet, kube-proxy and the container runtime. The overall architecture of Kubernetes [14] is shown in Fig.1.

Fig.1 Kubernetes architecture

The overall scheduling process is as follows.

(i) First, the user submits a request to create a pod.

(ii) Then, the apiserver processes the user request, and etcd stores the pod data.

(iii) The scheduler detects the new pod through the watch mechanism of the apiserver and tries to bind it to a node.

(iv) Filter nodes. The filter traverses all nodes in the cluster and selects the nodes that meet the requirements according to specific preselection strategies. If no node satisfies the preselection rules, the pod remains pending until a suitable node appears in the cluster.

(v) Score nodes. Based on the preselected node list, the scheduler scores and ranks the candidate nodes according to the priority (preferred) strategies to obtain the optimal node.

(vi) Select node. The scheduler selects the node with the highest score and performs the binding operation, and the result is stored in etcd.

(vii) The kubelet on the selected node creates the pod based on the scheduling result.
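
The filter, score and select steps above can be condensed into a toy model of one scheduling cycle. The sketch below is a simplified illustration under stated assumptions, not kube-scheduler code: nodes and pods are plain Python objects, a filter is any predicate on (pod, node), a scorer returns a number, and "binding" merely subtracts the pod's requests from the node's free resources (the real scheduler instead posts a Binding object to the apiserver, after which the kubelet starts the pod). The names `Node`, `Pod`, `fits` and `schedule` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    name: str
    free: Dict[str, float]  # remaining capacity per resource, e.g. {"cpu": 2.0}


@dataclass
class Pod:
    name: str
    request: Dict[str, float]  # requested amount per resource


Filter = Callable[[Pod, Node], bool]
Scorer = Callable[[Pod, Node], float]


def fits(pod: Pod, node: Node) -> bool:
    """Example filter: the node has enough free capacity for every request."""
    return all(node.free.get(r, 0.0) >= amt for r, amt in pod.request.items())


def schedule(pod: Pod, nodes: List[Node], filters: List[Filter],
             scorers: List[Scorer]) -> Optional[Node]:
    # Step (iv): keep only the nodes that pass every filter.
    feasible = [n for n in nodes if all(f(pod, n) for f in filters)]
    if not feasible:
        return None  # the pod stays pending until a suitable node appears
    # Steps (v)-(vi): score each candidate and choose the highest total score.
    best = max(feasible, key=lambda n: sum(s(pod, n) for s in scorers))
    # Simplified "binding": reserve the requested resources on the chosen node.
    for r, amt in pod.request.items():
        best.free[r] -= amt
    return best
```

Keeping filters and scorers as plain lists mirrors the plug-in structure that the rest of this paper modifies: the delay-based filter of Section 3.1 would be appended to `filters`, and the MRS scores of Sections 3.2−3.4 would replace the default `scorers`.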

Based on the above analysis, the master node is responsible for cluster management and cluster scheduling. The cluster is mainly composed of worker nodes $n_1, n_2, \ldots, n_i$ for task processing, and each node is equipped with a set of resources (CPU, memory, network, disk). The pods to be scheduled form a workflow $p_1, p_2, \ldots, p_j$ in scheduling order, and the scheduler selects a suitable node from $n_1, \ldots, n_i$ for each pod in turn.

Related research shows that mainstream Kubernetes scheduling strategies fall into two categories: one focuses on the resource richness and resource balance of the node itself, and the other analyzes the performance of the entire system. However, both types select nodes too coarsely and do not account for the low-latency requirements of IIoT. Taking into account the delay constraints that apply after a pod is deployed, this paper proposes a scheduling strategy that starts from the resource requirements of the pod itself and jointly analyzes node and overall system performance. The strategy has two main parts. In the filtering stage, the default filtering step is optimized so that nodes are divided and managed according to delay. In the scoring stage, the strategy considers both the resource requirements of the pods to be deployed and the current resource usage of the cluster, and adaptively weighs the importance of CPU, memory, network bandwidth, and disk space so as to schedule computing resources reasonably.

2.2 Kubernetes DRS strategy

As described in Section 2.1, a Kubernetes cluster consists of a master node and several worker nodes. The master node is the core of the cluster: it executes all commands for the cluster, is responsible for scheduling and managing it, and is generally an independent server. All other nodes are worker nodes, which run the containers of business applications.

The Kubernetes scheduler finds a suitable node in the cluster on which to run each pod created by the user. The scheduling process is divided into two stages: the filtering (screening) stage and the scoring (optimization) stage.

The filtering stage calls a group of filtering algorithms, traverses all worker nodes in the cluster, and removes the nodes that do not meet the requirements, leaving the worker nodes on which the task can run; this completes the initial screening.

The scoring stage scores the nodes that passed filtering according to the optimization algorithms and selects the worker node with the highest score. The pod is then bound to that worker node, which deploys the containers that run the pod's task. The default Kubernetes optimization algorithm consists of two parts: least requested priority and balanced resource allocation.

(i) Least requested priority

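The algorithm selects the node with the least resource consumption in the cluster. For each node, it sums the CPU and memory required by the pods already running there and by the pod to be scheduled, computes the idle rate that would remain after placing the pod on the node, and multiplies the idle rate by 10 to obtain a score for each resource. The final score is the arithmetic mean of the two scores:

$$\mathrm{Score}_{\mathrm{CPU}} = \frac{\mathrm{Cpu}_{\mathrm{total}}^{N} - \mathrm{Cpu}_{\mathrm{total}}^{P}}{\mathrm{Cpu}_{\mathrm{total}}^{N}} \times 10, \qquad \mathrm{Score}_{\mathrm{Mem}} = \frac{\mathrm{Memory}_{\mathrm{total}}^{N} - \mathrm{Memory}_{\mathrm{total}}^{P}}{\mathrm{Memory}_{\mathrm{total}}^{N}} \times 10$$

$$\mathrm{Score}_{\mathrm{LRP}} = \frac{\mathrm{Score}_{\mathrm{CPU}} + \mathrm{Score}_{\mathrm{Mem}}}{2}$$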

Here, $\mathrm{Cpu}_{\mathrm{total}}^{N}$ and $\mathrm{Memory}_{\mathrm{total}}^{N}$ denote the total CPU and memory capacity of the candidate node, and $\mathrm{Cpu}_{\mathrm{total}}^{P}$ and $\mathrm{Memory}_{\mathrm{total}}^{P}$ denote the CPU and memory required by the running pods plus the pod to be scheduled. If $\mathrm{Score}_{\mathrm{CPU}}$ or $\mathrm{Score}_{\mathrm{Mem}}$ is negative, it is set to 0.
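
As a worked illustration of the least requested priority score, the following sketch implements the formulas described above on the legacy 0 to 10 scale. It is a simplified floating-point version for illustration only; the actual Kubernetes code works on integer quantities, and newer scheduler releases score on a 0 to 100 scale. The function and parameter names are hypothetical, chosen to mirror the symbols in the text.

```python
def least_requested_score(capacity: float, requested: float,
                          max_score: float = 10.0) -> float:
    """Idle rate left after placement, scaled to [0, max_score].

    `requested` is the total demand of the running pods plus the new pod.
    A negative result (over-commitment) is clamped to 0, as in the text.
    """
    if capacity <= 0:
        return 0.0
    return max((capacity - requested) / capacity * max_score, 0.0)


def least_requested_priority(cpu_total_n: float, mem_total_n: float,
                             cpu_total_p: float, mem_total_p: float) -> float:
    """Arithmetic mean of the CPU and memory idle-rate scores."""
    score_cpu = least_requested_score(cpu_total_n, cpu_total_p)
    score_mem = least_requested_score(mem_total_n, mem_total_p)
    return (score_cpu + score_mem) / 2
```

For a node with 4 CPU cores and 8 GiB of memory, where the running pods plus the new pod request 1 core and 6 GiB, the CPU score is 7.5, the memory score is 2.5, and the node's least requested priority score is 5.0.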

(ii) Balanced resources allocation

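The algorithm selects the node with the most balanced resource usage. It computes the CPU and memory usage rates produced by the pods already running on each node together with the pod to be scheduled, multiplies the absolute difference between the two rates by 10, and subtracts the result from 10 to obtain the balance score between CPU and memory:

$$\mathrm{Score}_{\mathrm{BRA}} = 10 - \left| \frac{\mathrm{Cpu}_{\mathrm{total}}^{P}}{\mathrm{Cpu}_{\mathrm{total}}^{N}} - \frac{\mathrm{Memory}_{\mathrm{total}}^{P}}{\mathrm{Memory}_{\mathrm{total}}^{N}} \right| \times 10$$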

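Finally, after both algorithms have scored the candidate nodes, the Kubernetes scheduler takes the weighted average of the two scores as the final score of each node:

$$\mathrm{Score} = \frac{\lambda_1 \mathrm{Score}_{\mathrm{LRP}} + \lambda_2 \mathrm{Score}_{\mathrm{BRA}}}{\lambda_1 + \lambda_2}$$

where $\mathrm{Score}_{\mathrm{LRP}}$ and $\mathrm{Score}_{\mathrm{BRA}}$ are the least requested priority and balanced resource allocation scores, and $\lambda_1$ and $\lambda_2$ are their weighting factors.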

By default, the weighting factor of the two algorithms is 1, and the node with the highest score is selected as the target node for scheduling the pod.
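
The balanced resource allocation score and the weighted combination can be sketched the same way. This is again a simplified illustration on the 0 to 10 scale with hypothetical names. It follows the weighted-average description above; the Kubernetes scheduling framework itself sums the weighted scores of its score plugins, which yields the same node ranking when the weights are fixed.

```python
def balanced_allocation_score(cpu_total_n: float, mem_total_n: float,
                              cpu_total_p: float, mem_total_p: float,
                              max_score: float = 10.0) -> float:
    """max_score minus the scaled absolute gap between CPU and memory usage rates."""
    cpu_fraction = cpu_total_p / cpu_total_n
    mem_fraction = mem_total_p / mem_total_n
    return max_score - abs(cpu_fraction - mem_fraction) * max_score


def default_final_score(score_lrp: float, score_bra: float,
                        w_lrp: float = 1.0, w_bra: float = 1.0) -> float:
    """Weighted average of the two scores; both weights default to 1."""
    return (w_lrp * score_lrp + w_bra * score_bra) / (w_lrp + w_bra)
```

For a node with 4 cores and 8 GiB, a total request of 1 core and 6 GiB gives usage rates of 0.25 and 0.75 and therefore a balance score of 5.0, whereas a request of 1 core and 2 GiB is perfectly balanced and scores 10.0.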

3.MRS scheme of Kubernetes for IIoT

In this section, we introduce our MRS strategy. It is based on the Kubernetes scheduling process and improves the fourth step (filtering) and the fifth step (scoring). Subsection 3.1 introduces the delay-based filtering step, and Subsections 3.2−3.4 introduce the MRS-based optimal scoring strategy for IIoT scenarios. MRS can be used in cluster scheduling systems in IIoT scenarios. It should be emphasized that the strategy is not limited to the four resource types CPU, memory, network bandwidth, and disk space. In practical applications, a variety of resources may need to be scheduled, and the strategy can still be applied after resource types are added or removed.

3.1 Delay-based node filtering in IIoT scenarios

Owing to the massive number of connections in the IIoT environment, a large amount of data floods into server nodes, and a single server cannot handle it. Most IIoT products and services are therefore supported by cloud platforms with greater computing power, which usually provide computing, network and storage services based on large-scale hardware resources. When a computing task is submitted to the cloud, the location of the pod is usually not considered, and no special consideration is given to the delay between the worker node and the IIoT work site. In the IIoT environment, however, low latency and fast response are often required to maintain and operate the IIoT system. A system that places all data storage, processing and response in the remote cloud is too slow for this purpose, and data communication between equipment and worker nodes cannot rely on the remote cloud alone.

To ensure low-latency communication between the node running a task and the IIoT working environment, we first conduct an experimental analysis of the response delay between sites using https://ping.chinaz.com. We select 102 servers across China and measure their response delays to three different nodes, namely $\mathrm{node}_{\mathrm{NJ}}$, $\mathrm{node}_{\mathrm{SH}}$, and $\mathrm{node}_{\mathrm{SZ}}$, located in Nanjing (Jiangsu), Shanghai, and Shenzhen (Guangdong), respectively. The response delay thermal diagram obtained from the experiment is shown in Fig.2.

Fig.2 Response delay thermal diagram

As shown in Fig.2, positions 1−20 on the X-axis represent 20 provinces in China; randomly distributed servers are selected in each province, 102 servers in total. The Y-axis shows the three nodes $\mathrm{node}_{\mathrm{NJ}}$, $\mathrm{node}_{\mathrm{SH}}$, and $\mathrm{node}_{\mathrm{SZ}}$ from bottom to top. The color depth of each grid cell represents the delay: the darker the color, the greater the delay. Using these three nodes as measurement targets, we compare and analyze the communication response delays. Fig.2 shows that, for a fixed node, the response delays of servers within the same province are usually very similar and fall within one range; the few outlying cells are usually caused by measurement error. Vertically, the delays from the three nodes to the same server depend on the distance between administrative units. $\mathrm{node}_{\mathrm{NJ}}$ has the lightest cell and the smallest delay at position 13, which corresponds to the servers in Jiangsu; $\mathrm{node}_{\mathrm{SH}}$ is lightest at position 5, which corresponds to Shanghai; and $\mathrm{node}_{\mathrm{SZ}}$ is lightest at position 18, which corresponds to Guangdong. For a fixed node, the communication response delay generally increases with the distance between administrative units: the closer the administrative units, the lower the delay.

From these response delay measurements, we obtain the response delay aggregation diagram shown in Fig.3.

The X-axis of Fig.3 corresponds to that of Fig.2 and covers the 102 servers in 20 provinces of China; the Y-axis shows the communication response delay from each server to the experimental node. Fig.3 shows that, apart from a few points, the response delays of servers in the same province are highly clustered and fall within a narrow numerical range.

Fig.3 Response delay aggregation diagram

Fig.2 and Fig.3 show that, for the same node, response delay is usually clustered by physical location, and the response delays of different sites in the same province usually lie in the same interval. Based on this property, region labels can be used to divide and filter worker nodes by delay, thereby controlling delay to suit the IIoT environment; this requires adjusting and optimizing Kubernetes. Specifically, we set a region label and a reasonable delay bound for each pod, set a region attribute for each node, and filter out the nodes whose response time exceeds the pod's delay requirement. Subsequent scheduling is performed within the group of nodes that meet the delay standard. In this paper, labels are attached to two types of resource objects, nodes and pods, to implement response delay control.

First, users can set the region label and delay requirement as needed when submitting a pod, and the cluster administrator sets the corresponding region attribute when registering worker nodes in batches.

Second, in the filtering stage, nodes are grouped by region according to the pod's custom delay requirement, and the routine filtering steps are then performed. Binding region labels to all pods and nodes realizes regional group management of nodes, which makes scheduling management flexible and convenient. The original and improved filtering strategies are compared in Fig.4 and Fig.5.

Fig.4 Kubernetes default scheduling scheme for node filtering

Fig.5 The proposed scheduling scheme for node filtering
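
The label-based filtering described above can be sketched as a filter that runs before the routine checks. This is a hypothetical illustration, not the paper's implementation: it assumes each node carries a `region` label, each pod carries its own `region` label and a delay budget `max_delay_ms`, and the scheduler holds a table of estimated delays between regions, for example built from measurements like those in Fig.2 and Fig.3. In a real cluster, the well-known node label `topology.kubernetes.io/region` together with `nodeSelector` or node affinity can restrict pods to regions, but enforcing a numeric delay budget would require a custom Filter plugin in the scheduling framework.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Estimated delay in milliseconds for each (pod region, node region) pair.
DelayTable = Dict[Tuple[str, str], float]


@dataclass
class Node:
    name: str
    labels: Dict[str, str] = field(default_factory=dict)


@dataclass
class Pod:
    name: str
    labels: Dict[str, str]
    max_delay_ms: float  # hypothetical per-pod delay budget


def delay_filter(pod: Pod, nodes: List[Node], delay_table: DelayTable) -> List[Node]:
    """Keep nodes whose region lies within the pod's delay budget.

    Nodes without a region label, or region pairs missing from the table,
    are excluded because their delay cannot be guaranteed.
    """
    src = pod.labels["region"]
    kept = []
    for node in nodes:
        dst = node.labels.get("region")
        estimate = delay_table.get((src, dst)) if dst is not None else None
        if estimate is not None and estimate <= pod.max_delay_ms:
            kept.append(node)
    return kept
```

The surviving nodes would then go through the routine filters and the scoring stage, so delay acts as a hard constraint while resource balance is optimized only among delay-compliant nodes.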

3.2 Scoring strategy based on pod resource requirements

Several studies have addressed scheduling based on the resource requirements of the pod itself [15−17]. The strategy in [9] simply divides jobs into three categories: data-intensive jobs, real-time jobs, and other jobs. For the more complex tasks in IIoT scenarios, however, such a three-way division of pod workflows is not enough. Therefore, building on the optimization of scheduling strategies in [18,19], the improved resource scheduling strategy starts from the pod itself and analyzes its resource requirements. Instead of simply categorizing jobs, it scores each node according to the pod's demand tendency: if the pod uses more of a certain type of resource, more consideration is given to the balanced distribution of that resource when selecting nodes.

Based on the above analysis, we first obtain the resource demand tendency of the pod from its resource demand. The first step is to compute the weight of each pod resource requirement in the IIoT environment by comparing it with a reference standard.

where $C_p, M_p, N_p, D_p$ denote the pod's demand for CPU, memory, network bandwidth and disk, respectively, and $C_s, M_s, N_s, D_s$ denote the reference standard requirements for CPU, memory, network bandwidth and disk, respectively.

The second step is to normalize each coefficient after obtaining the weight ratio of pod demand.

where $N_p$ is the normalization denominator of the pod resource weights and $w_C, w_M, w_N, w_D$ are the normalized coefficients of the corresponding resources, which yield the normalized weight coefficient vector of the pod resource demand:

In the third step, the score of this part is obtained from the normalized weight coefficient vector of the pod resource demand and the remaining resource ratio of each node:

Here, $R(C), R(M), R(N), R(D)$ denote the remaining usable amounts of CPU, memory, network bandwidth, and disk on each node; $S(C), S(M), S(N), S(D)$ denote the total amounts of these resources on each node; and $score_{pod}$ is the score of each node based on the pod resource requirements.
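
The three steps can be sketched as follows, assuming the simplest forms consistent with the definitions above: the step-one weight of each resource is the ratio of the pod's demand to the reference standard (for example $C_p/C_s$), step two divides each ratio by their sum, and step three sums the weighted remaining-resource ratios $R(\cdot)/S(\cdot)$ and scales the result to 0 to 10 to match the default scores. These forms, the scale factor, the equal-weight fallback for an all-zero demand, and all names are assumptions for illustration. The code calls the normalization denominator `denom` to keep it distinct from the network demand $N_p$.

```python
from typing import Dict

RESOURCES = ("cpu", "memory", "network", "disk")


def pod_weights(demand: Dict[str, float], reference: Dict[str, float]) -> Dict[str, float]:
    """Steps 1-2: demand/reference ratios, normalized so the weights sum to 1."""
    ratios = {r: demand[r] / reference[r] for r in RESOURCES}
    denom = sum(ratios.values())  # normalization denominator
    if denom == 0:
        # Assumed fallback: a pod that requests nothing weighs all resources equally.
        return {r: 1.0 / len(RESOURCES) for r in RESOURCES}
    return {r: ratios[r] / denom for r in RESOURCES}


def score_pod(weights: Dict[str, float], remaining: Dict[str, float],
              total: Dict[str, float], max_score: float = 10.0) -> float:
    """Step 3: weighted sum of remaining-resource ratios R(r)/S(r), scaled to [0, max_score]."""
    return max_score * sum(weights[r] * remaining[r] / total[r] for r in RESOURCES)
```

A CPU- and memory-heavy pod thus puts all of its weight on those two resources, so a node with plenty of spare CPU and memory outranks one whose spare capacity is mostly bandwidth or disk.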

3.3 Scoring strategy based on system performance

The second part of each node's score is determined by the overall performance of the system: nodes are scored according to system resource usage so as to measure overall resource balance. Intuitively, if the overall occupancy of a certain type of resource in the current system is very high, more consideration should be given to the balanced allocation of that resource during scheduling.

The first step is to analyze the resources of the entire system and compute the resource weights:

where $C_o, M_o, N_o, D_o$ denote the average total occupancy of CPU, memory, network bandwidth, and disk in the cluster, and $C_A, M_A, N_A, D_A$ denote the total CPU, memory, network bandwidth, and disk resources of the cluster.

Note that $C_o, M_o, N_o, D_o$ are not the current occupancies of the resources but exponentially weighted moving averages of their historical and current occupancies. Taking CPU usage as an example:
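
The smoothing of $C_o$ (and likewise $M_o$, $N_o$, $D_o$) can be sketched with the standard exponentially weighted moving average $\bar{x}_t = \beta \bar{x}_{t-1} + (1-\beta) x_t$, where $x_t$ is the current occupancy. The smoothing factor $\beta$ is not given in the text; the default of 0.8 below is an arbitrary placeholder, and seeding the average with the first sample is likewise an assumption.

```python
from typing import Iterable, List, Optional


def ema_update(previous: Optional[float], current: float, beta: float = 0.8) -> float:
    """One EWMA step: a larger beta gives more weight to history."""
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    if previous is None:
        return current  # seed the average with the first observation
    return beta * previous + (1.0 - beta) * current


def ema_series(samples: Iterable[float], beta: float = 0.8) -> List[float]:
    """Smoothed occupancy after each sample, e.g. cluster CPU usage over time."""
    smoothed: Optional[float] = None
    out: List[float] = []
    for x in samples:
        smoothed = ema_update(smoothed, x, beta)
        out.append(smoothed)
    return out
```

Because each step keeps most of the history, a short spike in CPU occupancy shifts the system weights only gradually, which helps avoid oscillating scheduling decisions.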

These coefficients are then normalized to obtain the normalized weight coefficient vector of the system resources.

In the third step, to keep system performance balanced, resources with higher occupancy rates are given correspondingly more weight in the proportion analysis, so that a node that is short of such a resource receives a lower score. Each node is therefore scored according to the system resource weight coefficients.
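
Under the same kind of assumptions as the pod-side sketch, the system-side score can be illustrated as follows: each resource's raw weight is its smoothed cluster occupancy divided by the cluster total (for example $C_o/C_A$), the weights are normalized to sum to 1, and each node is scored by the weighted sum of its remaining-resource ratios, so heavily occupied resources dominate the score. The exact equations, the 0 to 10 scale, and all names are assumptions, not the paper's definitions.

```python
from typing import Dict

RESOURCES = ("cpu", "memory", "network", "disk")


def system_weights(occupied: Dict[str, float],
                   cluster_total: Dict[str, float]) -> Dict[str, float]:
    """Normalized weights proportional to each resource's cluster-wide occupancy rate."""
    ratios = {r: occupied[r] / cluster_total[r] for r in RESOURCES}
    denom = sum(ratios.values())
    if denom == 0:
        return {r: 1.0 / len(RESOURCES) for r in RESOURCES}  # idle cluster: equal weights
    return {r: ratios[r] / denom for r in RESOURCES}


def score_system(weights: Dict[str, float], remaining: Dict[str, float],
                 total: Dict[str, float], max_score: float = 10.0) -> float:
    """Weighted sum of a node's remaining-resource ratios, scaled to [0, max_score]."""
    return max_score * sum(weights[r] * remaining[r] / total[r] for r in RESOURCES)
```

In a cluster whose CPU is far more occupied than its memory, a node with spare CPU therefore outranks an otherwise similar node with spare memory.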

3.4 Comprehensive scoring results

Finally, we combine the pod resource requirements and the overall performance of the system to comprehensively score each node.

Here, $score_{pod}$ and $score_{system}$ denote the node score based on pod resource requirements and the node score based on system performance, respectively.
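
One simple way to combine the two scores, assumed here purely for illustration, is a convex combination $score = \alpha \cdot score_{pod} + (1-\alpha) \cdot score_{system}$ with a hypothetical balance parameter $\alpha$, after which the node with the highest combined score is selected, as in the default scheduler.

```python
from typing import Dict, Tuple


def combined_score(score_pod: float, score_system: float, alpha: float = 0.5) -> float:
    """Convex combination of the pod-based and system-based node scores."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * score_pod + (1.0 - alpha) * score_system


def pick_node(scores: Dict[str, Tuple[float, float]], alpha: float = 0.5) -> str:
    """scores maps node name -> (score_pod, score_system); return the best node."""
    return max(scores, key=lambda name: combined_score(*scores[name], alpha=alpha))
```

Setting $\alpha$ closer to 1 favors the individual pod's demands, and setting it closer to 0 favors cluster-wide balance.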

4.Experiment

To verify the proposed container scheduling algorithm, we run simulations with the open-source ContainerCloudSim [20,21] cloud computing simulation framework from the University of Melbourne and with nodesim. For comparison, cluster resource imbalance is evaluated with CloudSim, while the delay experiments use nodesim. The framework is used to simulate a Kubernetes edge cloud with 50 nodes. The resource information of each node is shown in Table 1.

Table 1 Configurations of each node resource

Considering the diversity of needs in actual use, this paper constructs five types of resource requirements, biased respectively towards CPU usage, memory usage, network bandwidth usage, storage usage, and balanced use of all resources. These five demand types correspond to five types of pods.

For evaluation, we select three indicators: response delay, cluster resource imbalance, and reasonable scheduling rate.

(i) Response delay

Here, response delay is the time required for a message or packet to travel from the IIoT production environment to the worker node in the cloud to which the task is scheduled.

(ii) Cluster resource imbalance

For each node in the cluster, the standard deviation of its resource utilization rates reflects the resource balance of that node. Suppose the cluster has $N$ worker nodes and each node has $m$ kinds of resources, and let $U(i,r)$ denote the utilization rate of resource $r$ on node $i$. The standard deviation of the resource utilization rates of node $i$ is then expressed as

where $R = \{r_1, r_2, \ldots, r_m\}$ is the set of all resources.

Define the cluster resource imbalance as

where $\mathcal{N} = \{n_1, n_2, \ldots, n_N\}$ is the set of all nodes in the cluster.

The smaller the value of $\mathrm{std}_{\mathrm{AVE}}$, the more balanced the utilization of the different resources in the cluster, and the lower the probability that a single resource is exhausted while others remain idle. As a result, more containers can be deployed in a cluster of the same size, which improves the utilization efficiency of the cluster.
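
The two-level metric can be computed as below. The sketch assumes the population standard deviation (dividing by $m$) for each node; `util` is a hypothetical $N \times m$ list of utilization rates $U(i, r)$.

```python
from statistics import pstdev
from typing import List, Sequence


def node_std(utilization: Sequence[float]) -> float:
    """Population standard deviation of one node's utilization rates U(i, r)."""
    return pstdev(utilization)


def cluster_imbalance(util: List[Sequence[float]]) -> float:
    """std_AVE: mean of the per-node standard deviations over all N nodes."""
    if not util:
        raise ValueError("cluster has no nodes")
    return sum(node_std(row) for row in util) / len(util)
```

A node using 50% of every resource contributes 0, while a node at 20% CPU and 80% memory contributes 0.3, so a cluster of those two nodes has an imbalance of about 0.15.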

(iii) Reasonable rate of scheduling

After a pod is scheduled to node $i$, the scheduling is considered reasonable if none of the resources of node $i$ has reached its usage limit. The reasonable scheduling rate is the ratio of the number of reasonable schedulings to the total number of scheduling requests; if a strategy achieves reasonable scheduling every time, its reasonable scheduling rate is 1.
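
The reasonable scheduling rate reduces to a simple ratio. In this sketch, "has not reached its usage limit" is read literally as strictly below capacity for every resource, and `used_after` is the node's usage after the pod is placed; both the reading and the names are assumptions for illustration.

```python
from typing import Iterable, Mapping


def is_reasonable(used_after: Mapping[str, float], capacity: Mapping[str, float]) -> bool:
    """True if every resource on the node stays strictly below its capacity."""
    return all(used_after[r] < capacity[r] for r in capacity)


def reasonable_scheduling_rate(placements: Iterable[bool]) -> float:
    """Fraction of scheduling requests whose placement was reasonable."""
    results = list(placements)
    if not results:
        raise ValueError("no scheduling requests")
    return sum(results) / len(results)
```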

Based on the diversity and differentiation of application resource requirements in IIoT scenarios, this paper constructs pod resource requirements according to different resource-oriented applications and analyzes the response delay, cluster resource imbalance and reasonable rate of scheduling under different scheduling strategies.

Fig.6 compares the response delays of ten randomly selected pods after scheduling under the proposed region-label scheduling and the default method. The former sets a region label and a reasonable delay bound for each pod and a region attribute for each node, filters out the nodes whose response time exceeds the pod's delay requirement, and performs subsequent scheduling within the group of nodes that meet the delay standard. The latter uses the default filtering strategy and does not consider the particular delay requirements of IIoT. As Fig.6 shows, the region-based strategy, which selects worker nodes under a delay constraint, effectively shortens the response delay compared with the default method.

Fig.6 Response delay aggregation diagram

Table 2 shows the cluster resource scheduling results when 2 000 and 10 000 pods are to be scheduled, comparing the Kubernetes default scheduling method, the ant colony algorithm [22], and our proposed algorithm. The ant colony algorithm is currently one of the effective algorithms for scheduling cluster resources. To verify the improvement of the MRS strategy in resource imbalance and reasonable scheduling rate, we conduct a comparative experiment following the control variable principle: the first group uses DRS, the second group uses the ant colony algorithm, and the third group uses the proposed MRS strategy. After scheduling, CloudSim returns the scheduling result to the scheduler. Before the three groups of experiments start, the initial available resources of the worker nodes are set identically, and the scheduling performance of the three strategies is analyzed from the scheduling results. As Table 2 shows, when the number of pods is small, the three methods differ little. When many pods need to be scheduled, however, the gap between the two improved methods and the default method widens. Our method outperforms the Kubernetes default scheduling method and achieves a scheduling effect similar to that of the ant colony algorithm, with noticeably lower complexity. This indicates that dynamically scoring nodes by pod resource requirements and system performance status is effective, especially when the cluster is under load pressure.

Table 2 Cluster resource scheduling under different number of pods

To show how cluster scheduling efficiency changes dynamically, Fig.7 plots the cluster resource imbalance of the Kubernetes default scheduling algorithm and of our method for different numbers of pods to be scheduled.

Fig.7 Cluster resource imbalance between our method and the Kubernetes default method

Fig.7 shows that, overall, the cluster resource imbalance of both methods increases with load. When the number of pods is below 5 000, the cluster load is low and the difference in resource imbalance between MRS and DRS is small; under certain specific requirements, the imbalance of MRS may even be slightly higher than that of DRS. However, as the number of pods increases to 10 000, the gap between the improved method and the default method begins to widen. As the number of pods grows, the load pressure on each node increases, and the weights that MRS derives from its analysis of resource conditions begin to take effect, so that the resource imbalance of MRS becomes significantly lower than that of DRS. Because MRS considers the weight relationship among the four types of resources in the IIoT scenario, it reduces the possibility that one type of resource in the cluster is exhausted while large amounts of other resources remain idle, which greatly reduces the overall resource imbalance of the cluster. This shows that the proposed scoring method can dynamically adjust the importance of the various resources and thereby allocate resources more evenly. The default scheduling algorithm uses fixed weights and only two resources, CPU and memory, as scheduling references, which makes it difficult to balance resources under high load. In contrast, the proposed strategy jointly considers the importance and allocation of CPU, memory, bandwidth and storage, producing a more balanced allocation.

5.Conclusions

This paper proposes a new resource scheduling strategy tailored to the resource demands of IIoT. Compared with the Kubernetes default scheduling algorithm, it considers the delay requirements of IIoT scenarios and more resource types, including CPU, memory, network bandwidth, and disk storage space. To balance the importance of the various resources, it jointly considers each pod's demand for these resources and the overall resource utilization of the cluster, and schedules resources dynamically. If new services in practice require additional resource types, the method can still be applied. Experimental results show that the proposed method schedules cluster resources more evenly.