A Privacy-Based SLA Violation Detection Model for the Security of Cloud Computing

2017-04-09 05:53:06 Shengli Zhou, Lifa Wu, Canghong Jin
China Communications, 2017, Issue 9

Shengli Zhou*, Lifa Wu, Canghong Jin

1 Army Engineering University, Nanjing 210007, China

2 Information Department of Zhejiang Police College, Hangzhou 310000, China

3 Zhejiang University City College, Hangzhou 310000, China

* The corresponding author, email: 76933768@qq.com

I. INTRODUCTION

Cloud computing is a computing model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and other services) that can be rapidly provisioned and released with minimal management effort or cloud service provider (CSP) interaction [1]. An increasing amount of work is migrating from the desktop environment to the cloud [2]. However, recent security incidents, such as the leaking of information on 5 million Gmail accounts and the iCloud vulnerability that led to the disclosure of the personal data of celebrities and others, pose an ever-present threat to user privacy. By exploiting system vulnerabilities and elevating their own user privileges, hackers can perform unauthorized actions, for instance, reading other users’ data [3]. From this perspective, it is also particularly important to prevent privileged administrators in a cloud environment from carrying out illegal actions. One way to implement this would be for the cloud service provider to sign a service level agreement (SLA) that identifies the privacy requirements of the service consumer (SC). However, a traditional SLA is mostly enacted for the purpose of ensuring Quality of Service (QoS) and is seldom aimed at protecting the user’s privacy, and existing SLA management models, even when they include a security strategy, rely on character matching. In this paper, an SLA-based Privacy Protection Model (SLA-BPPM) is devised for the maintenance of the user’s privacy in a cloud-computing environment. It is based on a Markov decision process, which is used to model the users’ privacy requirements in the SLA and to derive the operations that the CSP is permitted to perform. Additionally, an operation of the CSP is specifically recognized as a threat to the SC when it falls outside the expected range. This model is unique among existing SLA-based models in that it is designed to prevent the CSP from violating the users’ privacy.

The remainder of this paper is organized as follows. Section 2 describes related work. Section 3 introduces the background and technology of Markov Decision Processes. Section 4 introduces the design of the SLA-BPPM. Section 5 analyzes the implementation of the model. Section 6 describes two experiments, which facilitate testing and analysis of the model. Finally, the last section concludes the paper and proposes further work.

The authors propose a privacy-based SLA violation detection model for cloud computing based on Markov decision process theory to supervise a cloud service provider (CSP) directly.

II. RELATED WORK

For cloud computing, most research on SLAs focuses on QoS. In situations where QoS is violated, the CSP will be penalized [4]. In [5], the authors transferred the concept of a Web Service Level Agreement (WSLA) to cloud computing, adapting the agreement to the particular conditions of cloud computing, and designed a service measurement module, a service state estimation module and a service management module. With these three modules, the QoS and level of security can be measured, and service management can be carried out based on the service status and the conditions under which the SLA is violated. In [6], the authors proposed a way to measure concerns about the security of QoS in cloud computing, with which users can judge the reliability of a CSP and choose a more suitable one if their requirements are not met.

In [7], the authors provided a comprehensive introduction to the field of Cloud monitoring. Several critical issues, such as the motivations for adopting monitoring strategies in Cloud computing and some major properties of the monitoring system, were discussed in this article. Moreover, the article also reviewed some commercial and open-source Cloud monitoring systems. In [8], an SLA violation detection infrastructure (DeSVi) was proposed to monitor SLA violations. This infrastructure was responsible for allocating resources for a requested service. At the same time, the resources were monitored by a new framework that is able to map low-level resource metrics to user-defined SLAs. Moreover, this model also detected possible SLA violations through knowledge databases. In addition, the optimal configuration of SLA parameters in this model was investigated in stress tests generated by two applications: (1) image rendering applications based on ray-tracing and (2) transactional web applications based on the well-known TPC-W benchmark. In [9], the authors developed a set of related SLA indicators based on the three major service formats of cloud computing, namely Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). These indicators can be monitored while the CSP is providing services. In [10][11], the authors undertook research into setting up a framework, known as LoM2HiS, that could map the monitored indicators of fundamental infrastructures to SLA clauses. This research also investigated the optimization of time intervals for supervision. In addition, they established a framework called DeSVi, which could detect situations in which the SLA is violated by the CSP. This framework allocated and deployed system resources for a virtual machine by utilizing LoM2HiS. It also implemented automatic status monitoring, where the SLA was deemed to be violated by the CSP when key indicators crossed a predefined threshold.

In [12], the authors devised a model that could automatically match SLA parameters, so that the CSP could also be selected automatically based on the content demanded by the SLA. In addition, this model enables users to build learning templates according to their own requirements and to set up SLA parameters that can be recognized easily. One weakness of the above models is that the SC’s benefits are affected even if the CSP is penalized after any violation of the requirements of the agreement. In [13], the authors formulated a threshold value, derived from the indicators established in the SLA, that is more rigorous than the real threshold value. In this way, the CSP can be regulated before it goes beyond the real threshold. However, this strategy presents another problem, which is that users may become dissatisfied with the service because of the frequency of apparent violations. The user may then leave the service, undermining the purpose of the SLA.

In [14], the authors presented an overview of SLA monitoring and made a comprehensive analysis of it in the context of trust maintenance in cloud computing. In [15], the authors devised a model that can update the trustworthiness of peer clients and statistically analyze data on providers without any intervention from a central entity. The prices in later transactions are negotiated based on the level of trustworthiness displayed during previous transactions. The model is also embedded into the Service Level Agreement negotiation and enforcement process, thus giving priority to trusted clients to minimize the consequences of low Quality of Service. In [16], the authors introduced a mechanism to calculate a service provider’s trustworthiness, based on its compliance with the promised SLA parameters. This model was simulated in a MATLAB program to demonstrate its applicability in a cloud environment. Other articles have also made useful contributions to SLA violation detection. Paper [17] proposed a violation detection system which implemented a re-negotiation process for SLAs to limit the over-provisioning of resources and thus achieved an optimal usage of resources. Paper [18] proposed a service-oriented monitoring system called Grid Eye, which has an extensible architecture and enables the prediction of overall resource usage. Paper [19] gave a general survey of the current state of SLA management in Cloud computing and introduced some performance measurement models for SLAs, giving their advantages and limitations. This was useful for obtaining an overview of the performance of existing measurement models.

In [20], the authors designed an automatic negotiation model for SLAs, which formulated a sound SLA based on a set of quality parameters for the CSP to meet, as well as the requirements of users. This aimed to maximize the interests of users and help them to make good decisions in choosing the right CSP for them. Our paper builds on the concept of the above model to improve the posterior (after-the-fact) checking used in SLA management, and thereby protects users’ privacy by means of an SLA. To this end, a model of the users’ privacy requirements in the SLA is built, and a set of operations that conform to these requirements is generated accordingly. In addition, this model attempts to anticipate the intentions of the CSP in order to permit operations that are beneficial for users and prohibit operations that would violate users’ privacy. Furthermore, the problem of excessive resource consumption associated with traditional encryption-based privacy protection mechanisms is avoided in our model, as is the risk of arbitrary operations by administrators.

III. TECHNOLOGICAL BACKGROUND

IV. MODEL DESIGN

An SLA forms a bridge between the CSP and users. Three elements of particular importance in its management process are parameter definition, reporting, and penalties for violating the rules. In combination with SLAs, the SLA-BPPM achieves a level of data and privacy protection that satisfies the demands of users. The major function modules in this model are the SLA parameter identification module, the optimized modeling module, the violation detection module and the CSP behavior monitoring module. The model also includes a database that stores the possible action set of the CSP. The structure of the model is depicted in figure 1.

SLA parameter identification: This module parses the SLA files to obtain the privacy requirements of the user. It then converts them into a form suitable for the modeling phase.

Optimized modeling: This is a core function module, which contains all possible actions by the CSP. Using an MDP, a model of the users’ privacy requirements is established and an action set is generated, where every state is associated with an action and a reward is linked with the execution of each possible action. When the terminal state is reached, the model finds the action set which earns the highest reward. This becomes the optimized action set that is allowed by the privacy requirements.

CSP behavior monitor: The actions of the CSP are monitored and collected in real-time while they are providing services. The CSP’s actions are recorded and delivered to the violation detection module.

Violation detection: The CSP’s behavior is compared against the optimized action set. The CSP is judged to be in violation of the users’ privacy requirements if its real-time actions conflict with the optimized action set. In this case, it will be penalized accordingly.

CSP action set: This contains the information needed for modeling. It stores all the operations that the CSP might take; the operations of administrators and users, whether authorized or unauthorized, are all included in this set.

Fig. 1 SLA-BPPM functional framework

Fig. 2 Metric type expansion

Fig. 3 Metric definition example

Fig. 4 Example of privacy-based SLA file definition

The SLA-BPPM should operate as a third party, independent of users and the CSP, although the cooperation of the CSP is needed to gain access to the CSP’s actions. In order to obtain more reliable services, the SLA model could be deployed by the CSP and managed by specialized personnel. Additionally, it is possible to observe the daily behavior of the CSP, which provides a way to ensure separation of powers. The detailed implementation of our model is presented in the following section.

V. MODEL IMPLEMENTATION

Human-machine interaction is implemented using an XML file, as is common in web services. Our paper adopts this format but leaves out the negotiation procedure between the users and the CSP.

In our model, the implementation of the modules “CSP behavior monitor” and “CSP action set” requires the support of the CSP. The action set can be obtained by employing social engineering measures or from the related set and operation feedback interfaces provided by the CSP. The remainder of this section describes the technological details of the three other modules.

5.1 Parameter identification

In web services, SLA documents are generally written using WSLA, as developed by IBM. This document defines the basic format and related parameters that ensure the appropriate QoS features. Our paper uses the format of this document and makes some extensions to incorporate protocols that implement privacy and security involving users and the CSP. In cloud computing, the major threats to users are: leakage of data, incomplete deletion of data, failure to isolate user data from others, illegal transferring of data, and the mining of data [11]. Focusing on the above security threats, we expand Metric and add some types for protecting users’ privacy, as shown in figure 2.

Users base their negotiation for the SLA on this definition, and the metric parameters for the SLA can also be defined here. As shown in figure 3, the example calls this Metric “PrivacyPromises” and shows that the CSP providing these services is “CSPProvider” and that the pledged promises are “NoLeakage” and “PermanentDelete.”

A complete document is shown in figure 4, which defines the providers, customers and optional third-party services. The parameter “sponsor” defines the object with optional third-party support. The supporter of services introduced in this model is the SLA_BPPM model, which enables the evaluation of services provided by the CSP. In this paper, privacy protection is seen as a service, which conforms to the service-oriented mechanism. The metric’s definition is cited in the service parameters, which indicates the standard by which to measure services. The parameter “source” refers to the provider of the parameter, which is also the provider of services. “Pull” represents the receiving party of this parameter, which is the optional third party.
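As an illustration of the parameter identification step, the following minimal sketch parses a small XML fragment and extracts the privacy promises. The element and attribute names (`SLA`, `Metric`, `Promise`, `name`, `source`) are assumptions made for this example and are not taken from WSLA or from figures 3 and 4; only the values “PrivacyPromises”, “CSPProvider”, “NoLeakage” and “PermanentDelete” come from the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical WSLA-style fragment. The tag and attribute names are
# assumptions for illustration; only the values are taken from the text.
SLA_XML = """
<SLA>
  <Metric name="PrivacyPromises" source="CSPProvider">
    <Promise>NoLeakage</Promise>
    <Promise>PermanentDelete</Promise>
  </Metric>
</SLA>
"""


def parse_privacy_requirements(xml_text):
    """Return {metric name: {"source": provider, "promises": [...]}}."""
    root = ET.fromstring(xml_text)
    requirements = {}
    for metric in root.iter("Metric"):
        promises = [p.text.strip() for p in metric.findall("Promise") if p.text]
        requirements[metric.get("name")] = {
            "source": metric.get("source"),
            "promises": promises,
        }
    return requirements


requirements = parse_privacy_requirements(SLA_XML)
```

The resulting dictionary is one possible “form suitable for the modeling phase”; a real deployment would follow the actual schema of the extended WSLA document.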

5.2 Optimized modeling

After defining the privacy requirements of users in the SLA, the SLA-BPPM attempts to build an environmental model for the operations that will be provided to users by the CSP. This model defines a set of states $S = \{s_0, s_1, \ldots, s_n\}$, which denotes the situation after an operation has been taken by the CSP. $s_0$ represents the initial state, before any operation has been executed on the user data. An operation set $A = \{a_1, a_2, \ldots, a_m\}$ denotes the possible operations that the CSP might take on the user data. When the user data is in state $s_i$, an operation $a_i \in A$ is carried out by the CSP, which causes the data to shift to $s_j$. The operations are executed sequentially by the CSP, and there is a particular probability that the CSP chooses an operation $a_i$ to move the system into the next state. In other words, the CSP’s operation on the user data is represented as a random process. The data state $s_j$ is determined solely by $s_i$ and operation $a_i$, with the states before $s_i$ having no effect. Given these characteristics, we choose an MDP to model the operations of the CSP, in order to obtain a decision that will satisfy the users’ privacy requirements. Based on this decision, we inspect the CSP’s operations on the user data to see if any violation has taken place, and regulate the CSP’s behavior in order to protect the users’ privacy.

With the adoption of the set of states S and action set A, we define an MDP $M = \langle S, A, P, R \rangle$. All symbols are the same as those described in section 3. As shown in figure 5, each circle denotes a state of the user data, and each arrow represents the shifting of the state after the implementation of operation a with probability p and reward r. Both the operation set and the state set in the model are discrete and finite sets, so the MDP shown in figure 5 should be regarded as a discrete MDP. In the interests of clarity, the intermediate states and operations, as well as the related subscripts, are omitted.

To calculate the optimized decision V, not only the already determined parameters S and A but also the transition probability P and the reward function R are needed. P denotes the probability that state s transitions when operation a is triggered, which is also the probability that the CSP applies that operation to the user data. For every state $s_i$, the SLA-BPPM records the number of times $s_i$ shifts to $s_j$ after operation $a_i$, which is denoted as $N_{a_i}(s_i, s_j)$. On the basis of $N_{a_i}(s_i, s_j)$, the probability can be calculated as the relative frequency

$$P_{a_i}(s_i, s_j) = \frac{N_{a_i}(s_i, s_j)}{\sum_{s_k \in S} N_{a_i}(s_i, s_k)}.$$
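The counting procedure described above can be sketched as follows. This is a minimal illustration that assumes the probability is estimated as a relative frequency over observed (state, operation, next state) triples; the state and operation names in the sample log are hypothetical.

```python
from collections import defaultdict


def estimate_transitions(observations):
    """Estimate P[(state, operation)][next_state] from observed triples."""
    counts = defaultdict(lambda: defaultdict(int))
    for state, operation, next_state in observations:
        counts[(state, operation)][next_state] += 1

    probabilities = {}
    for key, successors in counts.items():
        total = sum(successors.values())
        # Relative frequency of each successor state for this (state, operation).
        probabilities[key] = {s: n / total for s, n in successors.items()}
    return probabilities


# Hypothetical observation log: (state, operation, next state).
log = [
    ("s0", "read", "s1"),
    ("s0", "read", "s1"),
    ("s0", "read", "s2"),
    ("s1", "copy", "s2"),
]
P = estimate_transitions(log)
```

Here the operation `read` taken in `s0` led to `s1` in two of three observations, so its estimated probability is two thirds.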

In this model, the reward function indicates how well the CSP’s operations conform to the specific requirements of user data privacy. It is a variable that is highly relevant to users and can be determined from users’ feedback on the CSP’s operations.
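Once S, A, P and R are available, an optimized decision can be computed with a standard MDP solver. The sketch below uses value iteration, which is one common choice and is not necessarily the method used by the authors; the discount factor `gamma`, the tolerance `tol`, and the sample states, operations and rewards are all assumptions for illustration.

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-8):
    """Return (V, policy) for a finite MDP.

    P[(s, a)] maps next states to probabilities; R[(s, a, s2)] is the reward.
    States with no entry in P are treated as terminal.
    """
    def q_value(s, a, V):
        return sum(p * (R.get((s, a, s2), 0.0) + gamma * V[s2])
                   for s2, p in P[(s, a)].items())

    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            available = [a for a in actions if (s, a) in P]
            new_v = max((q_value(s, a, V) for a in available), default=0.0)
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            break

    # The optimized action for each non-terminal state is the one with the
    # highest expected reward under the converged values.
    policy = {}
    for s in states:
        available = [a for a in actions if (s, a) in P]
        if available:
            policy[s] = max(available, key=lambda a: q_value(s, a, V))
    return V, policy


# Hypothetical example: serving the request is rewarded, copying data is not.
states = ["s0", "s1", "s2"]
actions = ["serve", "copy"]
P = {("s0", "serve"): {"s1": 1.0}, ("s0", "copy"): {"s2": 1.0}}
R = {("s0", "serve", "s1"): 1.0, ("s0", "copy", "s2"): -1.0}
V, policy = value_iteration(states, actions, P, R)
```

In this toy example the only permitted operation in `s0` is `serve`, which plays the role of the optimized action set for that state.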

Fig. 5 Environment modeling

We define a quintuple $\mathit{privacy}_{a_i}(s_i, s_j)$ whose five components correspond to the 5 types of privacy requirement introduced in section 5.1. After users have completed a set of transactions through the CSP, feedback is generated on the level of compliance for each of the 5 types of privacy requirement. Each feedback value lies in the range [-2, 2]. For example, if a user gives feedback feed, then all related components are updated with the formula: privacy_{a_i}(s_i, s_j).noleak = privacy_{a_i}(s_i, s_j).noleak + feed.noleak. In addition, the related components of the reward function are calculated from these accumulated values.

The calculated reward is a proportion over the global condition: it represents the impact of the operation taken relative to the other operations.
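The feedback update and the proportional reward can be sketched as follows. Only the component name `noleak` and the feedback range [-2, 2] come from the text; the other four component names are hypothetical, and normalizing by the sum of absolute accumulated values is an assumption chosen here so that the proportion stays well defined when feedback is negative.

```python
# Only "noleak" appears in the text; the remaining names are hypothetical
# labels for the other four privacy requirements of section 5.1.
COMPONENTS = ("noleak", "delete", "isolate", "notransfer", "nomining")


def apply_feedback(privacy, key, feed):
    """Accumulate user feedback for the transition key = (s_i, a_i, s_j)."""
    if any(not -2 <= value <= 2 for value in feed.values()):
        raise ValueError("feedback values must lie in [-2, 2]")
    entry = privacy.setdefault(key, dict.fromkeys(COMPONENTS, 0.0))
    for name, value in feed.items():
        entry[name] += value


def reward_share(privacy, key, component):
    """Reward of one transition as a share of all accumulated feedback.

    Assumption: the global total is the sum of absolute values.
    """
    total = sum(abs(entry[component]) for entry in privacy.values())
    return privacy[key][component] / total if total else 0.0


privacy = {}
apply_feedback(privacy, ("s0", "read", "s1"), {"noleak": 2})
apply_feedback(privacy, ("s0", "read", "s1"), {"noleak": 1})
apply_feedback(privacy, ("s0", "copy", "s2"), {"noleak": -1})
read_reward = reward_share(privacy, ("s0", "read", "s1"), "noleak")
copy_reward = reward_share(privacy, ("s0", "copy", "s2"), "noleak")
```

With this feedback the `read` transition receives a share of 0.75 and the `copy` transition a share of -0.25.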

5.3 Violation detection

In practice, the CSP may choose a strategy of its own in preference to carrying out the optimal operation. There is a risk that the users’ privacy could be violated by the chosen operation. When the set of optimal strategies is compared with the set of operations actually executed by the CSP, three situations are possible. In the first, the CSP’s operations do not conform to the optimal strategies, which means that the CSP has seriously violated the users’ privacy requirement; in this situation, the CSP can be regarded as unreliable. In the second, the CSP has performed some additional operations on the user data, which indicates some risk of undermining the users’ privacy. In the last, the two sets agree, which indicates that both the requested operations and the users’ privacy protection have been accomplished. The model in this paper is able to decide whether the CSP has jeopardized the users’ privacy with a probability that is calculated objectively. Therefore, in practice, the results should be repeatable.
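The three-way comparison can be sketched with plain set operations. The mapping used here (equal sets mean compliance, a strict superset means additional risky operations, anything else means a serious violation) is an assumption about how the three situations relate to the two sets, and the operation names are hypothetical.

```python
def classify_behavior(optimal_ops, observed_ops):
    """Compare the CSP's observed operations with the optimized action set."""
    optimal, observed = set(optimal_ops), set(observed_ops)
    if observed == optimal:
        return "compliant"   # requested operations done, privacy protected
    if optimal < observed:
        return "risk"        # everything required, plus additional operations
    return "violation"       # required operations missing or replaced


optimal = {"read", "write"}
result_ok = classify_behavior(optimal, {"read", "write"})
result_risk = classify_behavior(optimal, {"read", "write", "copy"})
result_bad = classify_behavior(optimal, {"copy"})
```

A deployment could attach the penalty defined in the SLA to the `violation` outcome and a warning to the `risk` outcome.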

The CSP reliability analysis based on user behavior can be conducted according to the state transfer of a Markov model. The analysis process is designed as follows. First, the existing CSP operations are statistically evaluated, including the classification of the CSP’s operations and the number of operations in each category. At the same time, for each type of operation in each scenario, we estimate the probability of each specific operation that follows it. The Markov transfer matrix can then be used to calculate the next step, and since the Markov method has no after-effect (the memoryless property), the state of operation in step n relates only to the operation in step n-1. The expression is as follows:

$$N_i(n) = \sum_{j} P_{ij}(n-1)\, N_j(n-1) + V_i(n)$$

In the expression, N denotes the number of behaviors; P denotes, for step n-1, the ratio of transfer from operation j to operation i; and V denotes the abnormal operations from step n-1 to step n, such as exception interruptions. V is equal to zero when no exception value is taken into consideration.

Therefore, the number of actions in a certain step is determined by the number in the previous step and the transfer matrix between the two steps. In matrix form, with m operation categories:

$$\begin{pmatrix} N_1(n) \\ \vdots \\ N_m(n) \end{pmatrix} = \begin{pmatrix} P_{11} & \cdots & P_{1m} \\ \vdots & \ddots & \vdots \\ P_{m1} & \cdots & P_{mm} \end{pmatrix} \begin{pmatrix} N_1(n-1) \\ \vdots \\ N_m(n-1) \end{pmatrix} + \begin{pmatrix} V_1(n) \\ \vdots \\ V_m(n) \end{pmatrix}$$
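The one-step update described above, in which the counts at step n are obtained from the counts at step n-1, the transfer ratios P and the abnormal term V, can be sketched in a few lines. The sketch assumes that `P[i][j]` is the ratio of transfer from operation j to operation i, as stated in the text, so that each column of P sums to one; the numbers are hypothetical.

```python
def next_counts(P, counts, V=None):
    """Return the per-operation counts for step n given those of step n-1."""
    m = len(counts)
    if V is None:
        V = [0.0] * m  # no exceptional operations considered
    return [
        sum(P[i][j] * counts[j] for j in range(m)) + V[i]
        for i in range(m)
    ]


# Hypothetical two-operation example.
P = [[0.5, 0.25],
     [0.5, 0.75]]
step_n = next_counts(P, [200, 100])
step_n_with_exceptions = next_counts(P, [200, 100], V=[5.0, 0.0])
```

Because every column of P sums to one, the total number of operations is preserved from one step to the next whenever V is zero.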

The establishment of the transfer matrix is the core part of the entire model, and the build process can be divided into two steps. First, the users’ operation log is collected and statistically analyzed in order to obtain the correlation between users’ accounts, time, and operation types, and a Bayesian conditional probability model is used to obtain the transfer probability for each pair of steps. Secondly, owing to the uncertainty of users’ behavior, some random factors are introduced in different steps, making the matrix partially different each time.
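The first build step can be sketched as follows: the log is grouped by account, sorted by time, and consecutive operation pairs are counted and normalized into conditional probabilities. This is a minimal illustration that uses plain relative frequencies as the conditional probability estimate and omits the random factors of the second step; the log format `(account, timestamp, operation)` is an assumption.

```python
from collections import defaultdict


def transfer_matrix(log, operations):
    """Build matrix[i][j] = P(next operation is i | current operation is j)."""
    by_account = defaultdict(list)
    for account, timestamp, operation in log:
        by_account[account].append((timestamp, operation))

    index = {op: k for k, op in enumerate(operations)}
    m = len(operations)
    counts = [[0] * m for _ in range(m)]
    for events in by_account.values():
        events.sort()  # order each account's operations by time
        for (_, previous), (_, current) in zip(events, events[1:]):
            counts[index[current]][index[previous]] += 1

    matrix = [[0.0] * m for _ in range(m)]
    for j in range(m):
        column_total = sum(counts[i][j] for i in range(m))
        for i in range(m):
            if column_total:
                matrix[i][j] = counts[i][j] / column_total
    return matrix


# Hypothetical log entries: (account, timestamp, operation).
log = [
    ("u1", 1, "FullSearch"), ("u1", 2, "PathSearch"), ("u1", 3, "FullSearch"),
    ("u2", 1, "FullSearch"), ("u2", 2, "FullSearch"),
]
matrix = transfer_matrix(log, ["FullSearch", "PathSearch"])
```

The column convention matches the expression above, where P is the ratio of transfer from operation j to operation i.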

VI. EXPERIMENT AND ANALYSIS

In the experiment, the simulation is conducted on CloudSim, a cloud computing simulation tool developed at the University of Melbourne, which supports the modeling and simulation of large-scale cloud computing infrastructure. The simulation layer of CloudSim provides support for the configuration and simulation of a virtual data center, including virtual machines, memory, storage capacity and bandwidth interfaces. It can be used to study strategies for allocating host machines to virtual machines, and such a strategy can be implemented by extending some core virtual machine scheduling functions.

The experiment mainly tests the recognition of operations that undermine user privacy, together with the recognition accuracy. The simulation tests only the recognition function and ignores the model’s impact on the performance and resource consumption of the cloud platform. Since there is no similar publication on a related SLA model, our experimental results cannot be compared with those of other models.

6.1 Running environment building

This experiment is based on the CSP operation log of a cloud platform employed by a domestic Internet company. Owing to the secrecy and sparsity of the company’s business data, some parts of the analysis are simulated and amplified. Using a certain government cloud system, we record 6,600 user data items, and the tasks are divided into five types: user full-text retrieval, user path search, key staff query, key region query, and equipment factory query, which are named FullSearch, Path Search, Important Person Query, Key Areas Query and Manu Factory Query, respectively.

Each row and column in the matrix corresponds to one of the five behaviors listed in table 1. The Markov matrices built for the associated behavior of the average administrator and the senior administrator are as follows.

The two matrices shown above reveal an evident difference in transfer probability between different roles: for the CSP’s super administrator, the behavior correlation is higher than that of the average user. The probabilities of transferring between different operations also differ; for instance, the probability of transfer between a path query and a key person query is higher than that between other operations.

The 6,600 data items were evaluated by third-party business experts, who identified 193 problems in total; the items were labeled 0 or 1 accordingly. Based on this labeled data, a total of 1 million records from 10,000 users, with an average of 100 operations per user, were collected using simulation devices. We adjusted the data to magnify the proportion of abnormal behavior, producing 95% normal operations and 5% abnormal operations. As shown in section 5.1, users’ violation actions are divided into five types. Since the deletion operation is not contained in the users’ operation set, we use four types of violation for the classification, as shown in table 2. For users who exhibit one of the abnormal behaviors, we assign an associated role, and some actions can be shared by multiple roles.

6.2 Model prediction

Fig. 6 Convergence of the model

Table II Ratio of illegal behavior

Two kinds of experiment were conducted on the given simulated data. Following the actual operation procedure of Aliyun, the order of operations is randomly sorted after the similarity comparison. The details are as follows:

(1) Extract and classify the relevant Aliyun CSP operations and users.

(2) Sort each type of users’ operations based on time.

(3) Obtain the operating pattern by training on the manually labeled scenes.

(4) Set a scene label based on the pattern recognition result of the scene.

This completes the collection and analysis of the user’s normal behavior, while the user’s abnormal behavior is inserted randomly by the program, based on the behavior patterns listed in section 5.1.

Experiment I: This is used to inspect the convergence speed of the model and determine the number of user operations that are needed before the model results stabilize. Let the transition probability calculated before a certain operation be $P_1$ and the transition probability calculated after the operation be $P_2$. The vertical axis of the graphical representation is $|P_1 - P_2|$, and the horizontal axis is the number of transactions completed using the CSP. The value of $|P_1 - P_2|$ is gradually reduced as the number of transactions increases, approaching zero asymptotically. The fewer transactions this takes, the faster the model converges. The experimental process is shown in figure 6.
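The quantity plotted in Experiment I can be sketched as the absolute change in an estimated probability after each new observation. This minimal illustration tracks a single transition, encodes each observation as 1 if the transition occurred and 0 otherwise, and uses a hypothetical observation sequence; it is not the authors’ experimental code.

```python
def convergence_trace(outcomes):
    """Return |P1 - P2| after each observation, starting from the second."""
    trace = []
    hits = 0
    previous = None
    for n, outcome in enumerate(outcomes, start=1):
        hits += outcome
        current = hits / n  # running relative-frequency estimate
        if previous is not None:
            trace.append(abs(current - previous))
        previous = current
    return trace


# Hypothetical sequence: 1 means the tracked transition was observed.
trace = convergence_trace([1, 0, 1, 1])
```

For this sequence the estimates are 1, 1/2, 2/3 and 3/4, so the successive differences shrink from 1/2 to 1/6 to 1/12.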

Experiment II: This is used to check the accuracy of the reward function. When the system has stabilized, the user trades with the CSP, setting a privacy target in the transaction, for example that information must not be compromised. In this experiment the CSP can choose among m operations, of which n must be completed in the transaction and the other m - n are not necessary (they are included to better represent dangerous actions); a possible leak of user privacy may exist in these unnecessary operations. We then calculate the score of the reward function and determine whether the action is illegal, using a given threshold value of 0.3. By comparing the results with the violation data in the simulation, we obtain the relevant coverage, accuracy and F value.

Using random sampling, 1000 related users and their data are drawn from the one million records, and a test collection is produced from the regular behaviors and the labeled violation behaviors. We run 10 such tests, take the average values of accuracy and coverage, and calculate the F value, as shown in table 3.
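The evaluation step can be sketched as follows. The sketch assumes that an action is flagged as illegal when its reward score falls below the threshold of 0.3, that “accuracy” corresponds to precision and “coverage” to recall, and that the F value is their harmonic mean; the scores and labels are hypothetical.

```python
def evaluate(scores, labels, threshold=0.3):
    """Return (accuracy, coverage, f_value) for threshold-based detection.

    scores: reward score per action; labels: True if the action is a violation.
    Assumption: a score below the threshold is flagged as illegal.
    """
    predicted = [score < threshold for score in scores]
    tp = sum(1 for p, l in zip(predicted, labels) if p and l)
    fp = sum(1 for p, l in zip(predicted, labels) if p and not l)
    fn = sum(1 for p, l in zip(predicted, labels) if not p and l)
    accuracy = tp / (tp + fp) if tp + fp else 0.0
    coverage = tp / (tp + fn) if tp + fn else 0.0
    total = accuracy + coverage
    f_value = 2 * accuracy * coverage / total if total else 0.0
    return accuracy, coverage, f_value


# Hypothetical scores and ground-truth violation labels.
scores = [0.1, 0.2, 0.5, 0.25, 0.9]
labels = [True, True, True, False, False]
accuracy, coverage, f_value = evaluate(scores, labels)
```

Averaging these three values over the 10 sampled test collections would give figures comparable in form to those reported in table 3.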

As table 3 illustrates, the SLA-BPPM is able to predict different types of violations. The accuracy is highest for violation behavior involving data mining, which also has the best overall F value; this suggests that the model is practical in real-world use.

VII. CONCLUSION

The assessment of a CSP’s service quality is a hot topic in research on cloud computing platforms. In this paper, the SLA-BPPM is shown to be able to supervise the operation of CSPs by utilizing a Markov decision process to detect whether a CSP has violated the users’ privacy within a certain range of probability. Meanwhile, the CSP’s operations can be moderated using the model so as to avoid violations of users’ privacy. The experimental results indicate that the SLA-BPPM recognizes violation behavior well and is practical.

One limitation of this model is that it requires the cooperation of the CSP for its operation, and the users’ role settings also need to be determined beforehand. In future work, the modeling process could be carried out with a hidden Markov model, which may reduce the dependency on the CSP and suit more general running states of cloud computing.

This work was supported in part by National Natural Science Foundation of China (NSFC) under Grant U1509219 and 2017YFB0802900.

Table III The average accuracy, coverage and F value of all scenes

References

[1] X D WANG et al., “Research on Security of Virtualization on Cloud Computing,” Telecommunications Science, vol. 31, no. 6, 2015, pp. 8-25.

[2] LIU M H et al., “Research on Sensitive Data Protection Technology on Cloud Computing,” Telecommunications Science, vol. 30, no. 11, 2014, pp. 2-8.

[3] Feng DG et al., “Study on cloud computing security,” Journal of Software, vol. 22, no. 1, 2011, pp. 71-83.

[4] M. Alhamad et al., “SLA-Based Trust Model for Cloud Computing,” Proc. 2010 13th International Conference on Network-Based Information Systems, 2010, pp. 321-324.

[5] Patel, P. et al., “Service Level Agreement in Cloud Computing,” Proc. Cloud Workshops at OOPSLA, 2009, pp. 55-58.

[6] Zhang, J. et al., “A Petri-net based specification model for web services,” Proc. IEEE International Conference on Web Services, 2004, pp. 420-427.

[7] Giuseppe Aceto et al., “Cloud monitoring: A survey,” Computer Networks, vol. 57, no. 5, 2013, pp. 2093-2115.

[8] Vincent C Emeakaroha et al., “Towards autonomic detection of SLA violations in Cloud infrastructures,” Future Generation Computer Systems, vol. 28, no. 5, 2012, pp. 1017-1029.

[9] Alhamad, M. et al., “Conceptual SLA framework for cloud computing,” Proc. 2010 4th IEEE International Conference on Digital Ecosystems and Technologies (DEST), 2010, pp. 606-610.

[10] Emeakaroha, V.C. et al., “An architecture for detecting SLA violations in cloud computing infrastructures,” Proc. 2nd International ICST Conference on Cloud Computing, 2010, pp. 45-48.

[11] Emeakaroha et al., “Towards autonomic detection of SLA violations in cloud infrastructures,” Future Generation Computer Systems, vol. 228, no. 7, 2012, pp. 1017-1029.

[12] Redl, C. et al., “Automatic SLA Matching and Provider Selection in Grid and Cloud Computing Markets,” Proc. the 13th ACM/IEEE International Conference on Grid Computing, 2012, pp. 55-59.

[13] Brandic et al., “Advanced QoS Methods for Grid Workflows Based on Meta-Negotiations and SLA-Mappings,” Proc. 3rd Workshop on Workflows in Support of Large-Scale Science, 2008, pp. 56-60.

[14] Walayat et al., “Maintaining Trust in Cloud Computing through SLA Monitoring,” Proc. ICONIP, 2014, pp. 690-697.

[15] Mario Macías et al., “Analysis of a trust model for SLA negotiation and enforcement in cloud markets,” Future Generation Computer Systems, vol. 28, no. 5, 2013, pp. 1056-1072.

[16] Jagpreet Sidhu et al., “Compliance based trustworthiness calculation mechanism in cloud environment,” Procedia Computer Science, vol. 37, no. 5, 2014, pp. 439-446.

[17] S. Anithakumari et al., “Autonomic SLA Management in Cloud Computing Services,” Proc. SNDS, 2014, pp. 151-159.

[18] W. Fu et al., “A service-oriented grid monitoring system with improved forecasting algorithm,” Proc. 5th International Conference on Grid and Cooperative Computing Workshops, 2006, pp. 211-219.

[19] Mohammed Alhamad et al., “A Survey on SLA and Performance Measurement in Cloud Computing,” Proc. OTM, 2011, pp. 469-477.

[20] Linlin Wu et al., “Automated SLA Negotiation Framework for Cloud Computing,” Proc. 13th IEEE/ACM International Symposium on Cluster, 2013, pp. 13-16.

[21] MA Bin et al., “An Optimized Vertical Handoff Algorithm Based on Markov Process in Vehicle Heterogeneous Network,” China Communications, vol. 12, no. 4, 2015, pp. 106-116.