Dacheng Zhou,Hongchang Chen,Guozhen Cheng,Weizhen He,Lingshu Li
National Digital Switching System Engineering and Technological R&D Center,Zhengzhou,Henan 450002,China
Abstract: Based on diversification technology and a cross-validation mechanism, the N-variant system provides a secure service architecture that lets cloud providers protect cloud applications from attacks by executing multiple variants of a single piece of software in parallel and then checking the consistency of their behaviors. However, upgrading current Software as a Service (SaaS) applications to the N-variant system architecture is complex. Challenges arise from the inability of tenants to adjust the application architecture in the cloud environment, and from the difficulty for cloud service providers of implementing N-variant systems with existing API gateways. This paper proposes SecIngress, an API gateway framework, to overcome the difficulty of upgrading applications to the N-variant system in the cloud environment. We design a two-stage timeout processing method to reduce service latency and an Analytic Hierarchy Process Voting under the Metadata mechanism (AHPVM) to enhance voting accuracy. We implement a prototype in a testbed environment and analyze security and performance metrics before and after deploying the prototype to show the effectiveness of SecIngress. The results reveal that SecIngress enhances the reliability of cloud applications with acceptable performance degradation.
Keywords: N-variant system; API gateway; cloud security; analytic hierarchy process
Software as a Service (SaaS), as a service delivery model, enables software providers to conveniently use various IT services without building their own physical machines (PMs) and network infrastructure [1]. Companies are increasingly deploying their web-based applications in cloud environments for lower cost, easier management, higher scalability, etc. However, the complexity of traditional software functions leads to unavoidable backdoors and vulnerabilities [2]. Moreover, because resources in the cloud environment are managed centrally, it is inconvenient for cloud tenants to deploy traditional security products and technologies directly. The static nature of the execution environment in cloud data centers, the main threat to applications in cloud environments, gives adversaries the chance to discover vulnerabilities, find opportunities to exploit them, escalate privileges, and maintain a persistent presence over time [3].
An N-variant system is an approach based on redundancy and diversity that protects programs from attacks by executing multiple diversified variants in lockstep on identical input, monitoring the outputs of the variants, and checking their consistency to detect threats [4]. Diversification avoids vulnerabilities common to all variants, so the output of a variant whose vulnerability has been exploited by an attacker differs from that of the other variants with high probability. Cloud computing overprovisions resources for scaling, and such dormant or underutilized resources can be used, at least some of the time, to enhance the security of services [5,6]. Applying such resources to construct an N-variant system strikes a balance between security and resource consumption. Besides, thanks to container technology and convenient management in the cloud environment [7,8], software applications can be containerized as images, and variants can easily be replaced from read-only images when the monitor detects abnormal behavior, which enhances the resiliency of the N-variant system. However, adapting a typical cloud application to the N-variant system architecture while keeping its service continuously exposed is problematic for popular API gateways such as Kubernetes Ingress, HAProxy, and Kong. Without coordinating service latency and voting accuracy, cloud applications cannot attain service stability and reliability despite the additional resources invested in executing multiple variants.
In this paper, we propose SecIngress, an API gateway framework that secures cloud applications based on the N-variant system and exposes their services to users. Compared with existing schemes, SecIngress provides a generic framework for cloud providers to upgrade applications to the N-variant system in the cloud environment, and solves the problem that applications cannot adopt N-variants to improve security under the cloud feature of dynamic scaling. The aims of SecIngress include ensuring that users receive reliable responses and detecting abnormal behaviors of variants. Firstly, SecIngress receives HTTP requests from clients and redirects identical requests to multiple diversified variants. Secondly, SecIngress collects the variants' responses and, after voting, selects a reliable one to send back to the client. Thirdly, SecIngress decides, according to further analysis during the voting phase, whether to report abnormal software variants to the cloud manager. In this manner, we can detect abnormal variants and discard their output to interrupt the attack chain.
We implemented a prototype of SecIngress and built a testbed with a set of web application variants to evaluate the security gains and performance degradation of the prototype. We developed the prototype on NGINX, an open-source HTTP server and reverse proxy known for its high performance and low resource consumption. It is worth mentioning that Kubernetes Ingress is implemented based on NGINX, so the SecIngress prototype can be deployed and managed easily in cloud environments with the Kubernetes Ingress Controller. Based on the prototype, we first evaluated the security enhancements of SecIngress by deploying simple variants on our micro-benchmarks, varying the software running environment, e.g., the operating system (OS) and web containers. We tested them with a vulnerability scanner and penetration testing tools, and evaluated the security gains by comparing the number of vulnerabilities or alerts between a single variant behind Kubernetes Ingress and multiple variants behind SecIngress. Secondly, we used Apache Benchmark to evaluate the performance impact of employing SecIngress for the application variants. These experiments show that our prototype enhances the security of the service with acceptable performance degradation.
The contributions of this paper are as follows:
•We propose SecIngress, an API gateway framework, to expose service instances based on the N-variant system in cloud environments, providing a scheme for cloud providers to improve the availability and resiliency of cloud applications.
•We propose a two-stage timeout processing method for collecting the response messages of multiple variants with different execution efficiency, reducing the delay of the responses received by clients.
•We propose an Analytic Hierarchy Process Voting under Metadata mechanism (AHPVM) for identifying the consistency of responses, reducing the probability of false-positive voting results in the presence of many reasonable inconsistent elements.
•We implemented a prototype system of SecIngress on the basis of NGINX. Its technology stack is the same as that of K8s Ingress, so SecIngress can easily be taken over by the K8s Ingress controller in a container cloud environment.
The remainder of this paper is organized as follows: Section II introduces related work. Section III illustrates the motivation by analyzing several challenges and gives an overview of our scheme. Section IV describes the design of SecIngress, and Section V presents the voting mechanism, called AHPVM. Section VI briefly introduces the implementation of our prototype. Section VII evaluates the security enhancements and performance degradation of SecIngress, and Section VIII concludes the paper.
This section gives a brief introduction to N-variant systems [4] and related work. To our knowledge, N-variant systems can be regarded as the combination of the diversity technique and the redundancy technique in the field of software security. We introduce related work from this perspective.
A diversity technique replaces a component with a variant, which can be a server, programming language, operating system, or hardware, while the system provides functionality equivalent to its previous state [9–12]. [9] used a recovery mechanism to enhance system resilience by erratically changing the variants of a running program, whereby an extensive program can be divided into smaller components (e.g., cells or tasks). [10,11] introduced a diversity technique that aims to increase network service resiliency by deploying diversity on virtual servers, covering the OS, virtualization components, Web Servers (WS), and application software. [12] proposed a method that diversifies the programming language in different web application layers to avoid code and SQL injection attacks. The redundancy technique aims to enhance a system's reliability by increasing the number of component replicas [13–15]. [13] developed a redundancy approach for web servers, which defends against malicious code injection attacks on a web server using a self-protection model. [14] proposed an MTD technique that provides redundant web services to maximize system dependability. [4] adopted a redundancy technique for cyber-physical system environments by maintaining redundant network sessions, including the distribution inter-packet compared with other typical network sessions. [16] used the diversity of instruction set architectures and application binary interfaces to prevent memory corruption attacks.
An N-variant system consists of multiple diversified applications with identical behaviors [4]. During an attack [17], the divergence of the variants is detected, and the system can restart the variants to achieve both self-protection and self-healing [18,19]. Given an optimal combination of variants, a system running just two variants can defeat return-to-library, function pointer overwrite, and stack smashing attacks [4]. N-variant systems have also thwarted information leakage [20], partial overwrite [21], code injection, and code reuse attacks [22]. To evade detection, the attacker must devise a way to exploit vulnerabilities in each variant, and the exploit must either compromise all the variants simultaneously or compromise them in a way that does not affect their behavior [23].
Researchers have put forward various studies to improve the safety and adaptability of heterogeneous variant systems. [24] constructed heterogeneous virtual machine clusters containing multiple operating systems to avoid the fault propagation caused by a homogeneous environment. They also proposed a workflow scheduling method based on offensive and defensive game models that increases the heterogeneity of the execution environment and thereby improves the security of scientific workflows. [25] drew on heterogeneous execution methods and used a combination of multiple security strategies to improve the security of intermediate data in scientific workflows. They also proposed a heuristic solution to search for the optimal combination of multiple security strategies. [26] adopted diverse variants to enhance network survivability. Considering that variants share common vulnerabilities, the authors proposed a vulnerability-aware heterogeneous network device assignment scheme to improve the survivability of diverse variants. [27] used diverse variants to improve the security of OpenFlow switches and proposed correlation-aware dynamic instance switching to address the problem that heterogeneous instances also share common vulnerabilities.
In the cloud environment, software diversity is widely used to enhance cloud service security. MEERKATS [5] creates idle and changing target applications to confuse the attacker and keeps redundancy in case of server failures. The MEERKATS architecture integrates anomaly detection, data replication, and checkpointing to provide protection and restoration. DREME [6] defends against SQL injection attacks by using redundant database variants and diverse processes. Polinsky [28] proposed an n-m variant system to protect MediaWiki PHP applications that serve dynamic content from external SQL persistent storage in the cloud. However, securing applications based on the N-variant system remains a challenge because of the need to process the multiple data streams of the variants and the complexity of voting on L7 protocols. Our work aims to provide a proper API gateway framework that makes it easy to deploy secure applications based on the N-variant system's schema.
As the security of cloud applications is more fragile in an open and dynamic environment, the purpose of our work is to make it easy for cloud applications to use the N-variant system to enhance security. To effectively expose a service based on the N-variant system and ensure its security, we need to overcome the high delay of the N-variant system on the one hand and ensure the monitor's voting accuracy over multiple diversified responses on the other. To achieve these two main goals, we face the following challenges.
The first problem is the high delay of N-variant systems. Because the variants run in a distributed and independent manner, their execution times vary. The monitor needs to collect the output of all variants, which binds the system's output efficiency to that of the slowest variant. Consider the situation shown in Figure 1, in which there are three application variants and one request $r_q$ is processed by SecIngress. As described before, SecIngress needs to collect all the responses before voting, i.e., it needs to wait for three responses until $t_3$, as shown in Figure 2. However, the waiting time may be too long if any application fails, causing a large delay in the transaction. For example, if the third application variant is blocked, $t_3$ may be large, and the client has to wait no less than $t_3$ for the web page to respond. In general, N-variant technology increases the response time of cloud services, reducing their QoS.
Figure 1.The flow diagram of requests and responses.
Figure 2.The responses latencies of the variants.
The second problem is voting on the N-variant output. The protocol complexity of the application layer brings further challenges to voting on the content of the N-variant output. Taking the HTTP protocol as an example, long or short connections, chunked transmission, and the 304 cache mechanism increase the diversity of the variants' output. For example, as shown in Figure 3, the header fields of a simple web page differ when it runs on Apache Tomcat and on JBoss. Given such reasonable divergences between diversified applications, a simple string comparison would flag an application as abnormal, a result that cannot be trusted because both responses are in fact trustworthy. Voting correctly in the presence of reasonable inconsistencies is a problem we need to address.
Figure 3.The reasonable divergences of responses between JBoss and Tomcat.
Figure 4.The system architecture of SecIngress.
The third problem is the oscillation caused by the rotation of variants that are voted abnormal. Owing to the dynamic scalability of resources in cloud environments, a variant voted abnormal under attack can be taken offline and cleaned, and new variants can be quickly deployed in the cloud environment. Therefore, the cloud manager and the service agent need to exchange variant update information, which affects the continuity of services.
We address the first problem by designing a two-stage timeout processing method for collecting the response messages of multiple variants with different execution efficiency. Statistics from the actual working environment show that only a few variants lag significantly behind the others. Note that, with three variants, receiving two identical responses already satisfies the pass condition of the majority decision. We can therefore output the response message to the client first, so that the client does not wait too long. After that, another vote is taken once the remaining responses return, to check whether any abnormal response exists.
We address the second problem by extracting the metadata of the application layer protocol and using the analytic hierarchy process (AHP) [29–31] to compare the consistency of the variants' response messages along multiple dimensions. Through statistical analysis of a large number of response messages of normal variants, we first determine the approved metadata (maximum intersection) of the response messages of different variants. Based on these metadata and status code classifications, we can vote more accurately in the presence of many reasonable inconsistent elements.
We address the third problem by decoupling the control plane and the data plane of SecIngress. We moved the interaction between the system and the cloud manager to the control plane and designed a Message Broker (MB) that is responsible for communication and negotiation with the cloud manager. The MB prevents dynamic changes of application variants from disturbing the efficient data flow processing of SecIngress, ensuring the high-speed processing ability of the data plane without state maintenance.
Based on the above ideas, we propose the architecture of SecIngress in Figure 4. The workflow of SecIngress is as follows.
Figure 5.The connection management.
①Before the system starts working, the container images of all application variants need to be packaged offline and deployed to the container cloud.
②The cloud manager sends the IP addresses and service ports of the application variants to the Message Broker (MB), and the MB performs the initial configuration of the system.
③When clients' requests terminate at SecIngress, the Dispatcher Proxy (DP) module copies each request, encapsulates the data packets according to the configuration information, and adds a unique identifier to each request.
④The DP establishes connections with the backend variants and redirects the copied requests to the n diversified applications.
⑤Then, the Response Monitor (RM) module collects the responses of the multiple applications and classifies the response packets into one response group per request according to the inserted unique identifier.
⑥The RM analyzes each response group according to the voting mechanism (AHPVM) and selects one of the expected responses to send to the client.
⑦The RM detects abnormal responses and reports the compromised application variants to the MB.
⑧The MB, as a feedback agent running on the control plane of SecIngress and communicating with the cloud manager, sends the identities of the compromised variants to the cloud manager, which resets the abnormal variants.
We aim to propose a generic API gateway that can support multiple applications with various service types. Due to space limitations, we do not discuss data synchronization and consistency among the multiple variants of an application in this work.
The Dispatcher Proxy (DP) is the only interface for accessing the multiple diversified applications. It plays a central role in the SecIngress architecture. Suppose there are $n$ application variants and $n$ is an odd number, which means that the majority voting schema can be used. Initially, the DP receives the deployment information of the application variants from the Message Broker (MB) to configure its list of application variants, $A_s = \{a_1, a_2, \ldots, a_n\}$, which is dynamically updated whenever an application variant diverges in voting. When a request $r_q$ arrives at the DP, the DP duplicates $r_q$ into a set of copies $R_q = \{r_q^1, r_q^2, \ldots, r_q^n\}$, where $r_q^i$ is sent to the application replica $a_i \in A_s$. Besides, a unique identifier $\Gamma(r_q)$, usually a random combination of numbers and letters, is inserted into the HTTP header of each $r_q^i \in R_q$, both to classify responses when multiple users access the service concurrently and to serve as a unique tag with which the multiple applications synchronize key parameters, e.g., the identifying code. Moreover, if the application variants insert $\Gamma(r_q)$ into the response header before returning their responses, the DP can validate this tag to detect packet hijacking attacks between the DP and the set of diversified variants.
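The duplication and tagging step can be illustrated with a minimal sketch. It is not the NGINX-based prototype: the header name `X-SecIngress-Tag`, the dictionary layout of a request copy, and both function names are assumptions introduced here for illustration only.

```python
import uuid

TAG_HEADER = "X-SecIngress-Tag"  # hypothetical header name, not from the paper


def duplicate_request(headers, body, variants):
    """Return (tag, copies): one tagged copy of the request per variant."""
    tag = uuid.uuid4().hex  # random letters and digits, unique per request
    copies = []
    for variant in variants:
        tagged = dict(headers)  # copy so the caller's headers stay untouched
        tagged[TAG_HEADER] = tag
        copies.append({"variant": variant, "headers": tagged, "body": body})
    return tag, copies


def tag_is_valid(response_headers, expected_tag):
    """Check that a variant echoed the tag back unchanged."""
    return response_headers.get(TAG_HEADER) == expected_tag


original = {"Host": "app.example"}
tag, copies = duplicate_request(original, b"", ["a1", "a2", "a3"])
```

Every copy carries the same tag, so a response whose tag is missing or altered can be treated as a sign of tampering on the path between the DP and the variants.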
Meanwhile, as shown in Figure 5, the DP manages both the downstream connection, i.e., the connectivity between the client and the DP, and the upstream connections, i.e., the connectivity between the DP and the servers. When a client wants to access the application through a browser, it sends a TCP SYN packet to the DP, and the DP responds with a TCP SYN-ACK. Finally, the client sends a TCP ACK to the DP, and the TCP connection $c_0$ is established. The HTTP request $r_q$ is then sent to the DP over the connection $c_0$. After receiving $r_q$, the DP establishes a TCP connection with each application variant, i.e., $\{c_1, c_2, \ldots, c_n\}$, duplicates $r_q$ into the request replicas $\{r_q^1, r_q^2, \ldots, r_q^n\}$, and sends them to the application variants over $\{c_1, c_2, \ldots, c_n\}$ respectively. After that, the DP keeps the established TCP connections with the application variants open to wait for and collect the response of each variant, i.e., $\{r_s^1, r_s^2, \ldots, r_s^n\}$. Then a response $r_s$, selected by the Response Monitor, is sent to the client through the TCP connection $c_0$.
We introduce this module through its three components: response collecting, timeout processing, and response pretreatment.
Response collecting. As mentioned before, when multiple users access the service concurrently, the responses need to be classified before voting can proceed. Consider two users accessing SecIngress concurrently: two requests, $r_{q1}$ and $r_{q2}$, need to be duplicated into $R_{q1} = \{r_{q1}^1, \ldots, r_{q1}^n\}$ and $R_{q2} = \{r_{q2}^1, \ldots, r_{q2}^n\}$, which are distributed to the application variants at the same time. After the distribution operation described in Section 4.1, two response sets, $R_{s1} = \{r_{s1}^1, \ldots, r_{s1}^n\}$ and $R_{s2} = \{r_{s2}^1, \ldots, r_{s2}^n\}$, corresponding to $R_{q1}$ and $R_{q2}$, return to SecIngress at the same time. However, the responses of the application variants arrive at SecIngress through the NIC in a mixed, arbitrary order. Thus, it is necessary to classify them into $R_{s1}$ and $R_{s2}$, which is the prerequisite for voting. This process needs a distinguishing tag, and $\Gamma(r_q)$, inserted while processing the request, is a proper tag for this purpose if it is also inserted into the response header field. We use a hash pool to store the responses together with $\Gamma(r_q)$. When we receive a response, we first extract the tag $\Gamma(r_q)$ and use it as the hash key under which the response is stored in the hash pool. Note that each value in the hash pool is a linked list that can store $n$ responses. For example, the responses of the $j$-th user are stored in the linked list whose key is $\Gamma(r_{qj})$. Once all the responses of the $j$-th user are collected, which means that the linked list with key $\Gamma(r_{qj})$ contains $n$ nodes, the voting for $R_{sj}$ can be handled.
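The hash pool described above can be sketched as follows. This is a simplified illustration, not the prototype's data structure: a Python list stands in for the linked list, and the class name `ResponsePool` and the tags `t1` and `t2` are hypothetical.

```python
from collections import defaultdict


class ResponsePool:
    """Groups responses by request tag until all n variants have answered."""

    def __init__(self, n):
        self.n = n
        self.pool = defaultdict(list)  # tag -> responses collected so far

    def add(self, tag, response):
        """Store a response; return the full group once it has n entries."""
        self.pool[tag].append(response)
        if len(self.pool[tag]) == self.n:
            return self.pool.pop(tag)  # group complete, ready for voting
        return None


pool = ResponsePool(n=3)
# Responses of two concurrent requests arrive interleaved.
arrivals = [("t1", "r1a"), ("t2", "r2a"), ("t1", "r1b"),
            ("t2", "r2b"), ("t2", "r2c"), ("t1", "r1c")]
completed = []
for tag, resp in arrivals:
    group = pool.add(tag, resp)
    if group is not None:
        completed.append((tag, group))
```

In this run the group for `t2` completes first because its third response arrives before the third response for `t1`, which shows that voting order follows completion rather than request order.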
Timeout processing. Reasonable timeout processing is the basis for collecting responses, because responses do not return at the same time owing to the servers' processing delays and network latency.
Note that the basic assumption of the N-variant system is that only a small number of variants can be compromised by a given attack. Under the majority voting schema, a confident response therefore exists as soon as a majority of the responses are consistent. If $(n+1)/2$ consistent responses of the $n$ application variants have been collected, the confident response $r_s$ can be selected from those $(n+1)/2$ responses and sent back to the client. Thus, we design a two-stage timeout processing method. Specifically, when $(n+1)/2$ responses have returned, SecIngress votes on them in the first stage. If they are consistent, one of them is randomly selected and sent back to the client immediately rather than waiting for the other responses. If they are inconsistent, SecIngress votes again each time another response arrives until $(n+1)/2$ consistent responses are available, and the client then receives a response as well. When all responses have returned, SecIngress votes on them in the second stage to check whether there are abnormal responses. Besides, an application variant is regarded as abnormal if its response is not received within a predefined timeout period.
Under the two-stage timeout processing method, the client receives the response message as soon as possible. In the scenario shown in Figure 1 and Figure 2, it only needs to wait until $t_2$ under optimal conditions. The client then receives the response selected in the first stage rather than the response $r_s$ selected after all variants have answered, as in traditional timeout processing. Note that in the actual working environment a few of the attacked application variants have a high response delay. Therefore, the two-stage timeout processing method can improve the performance of the N-variant system.
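The two-stage procedure can be sketched in a few lines. This sketch makes two simplifying assumptions: responses are compared by plain equality instead of AHPVM, and the arrival list already ends at the timeout, so any variant absent from it is treated as timed out. The function name and return layout are hypothetical.

```python
from collections import Counter


def two_stage_vote(arrivals, variants):
    """arrivals: (variant, response) pairs in arrival order, up to the timeout.

    Returns (reply, replied_after, abnormal_variants).
    """
    quorum = (len(variants) + 1) // 2
    counts = Counter()
    reply, replied_after = None, None
    for seen, (_, response) in enumerate(arrivals, start=1):
        counts[response] += 1
        # Stage 1: answer the client as soon as a quorum of responses agrees.
        if reply is None and counts[response] >= quorum:
            reply, replied_after = response, seen
    # Stage 2: once collection has ended, flag the divergent variants.
    answered = {variant for variant, _ in arrivals}
    abnormal = {v for v, r in arrivals if r != reply}
    abnormal |= set(variants) - answered  # no response within the timeout
    return reply, replied_after, abnormal


variants = ["a1", "a2", "a3"]
fast = two_stage_vote([("a1", "ok"), ("a2", "ok"), ("a3", "bad")], variants)
slow = two_stage_vote([("a1", "ok"), ("a3", "bad"), ("a2", "ok")], variants)
timed_out = two_stage_vote([("a1", "ok"), ("a2", "ok")], variants)
```

With three variants the quorum is two, so in the `fast` case the client is answered after the second response while the divergent third response is still detected in the second stage. If no quorum is ever reached, this sketch returns no reply and flags every responding variant, a case the text does not specify.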
Response pretreatment. After the response collection phase, the last task before voting is to extract metadata from the responses, which is the key to making a correct decision under the reasonable divergences of application variants. Because software diversification is a prerequisite for enhancing software security in an N-variant system, it is inevitable that normal applications return responses that differ from one another in minor ways, such as additional blank spaces, different protocol conventions, and negligible marks. We regard these as reasonable divergences.
To overcome this challenge, we extract metadata from the response packets so that responses can be compared along multiple dimensions during voting. By analyzing the collected HTTP responses of the diversified applications, we obtain the valid metadata of an HTTP response through semantic partition and pruning, including self-adaptive segment partition, e.g., alignment, truncation, and complement. Based on the metadata obtained, we first classify the response by its status code to avoid interference from exceptional cases, such as the 304 cache mechanism and responses with abnormal status codes. All known exceptional cases are dealt with in corresponding ways, which are not the focus of this paper. Then the analytic hierarchy process voting under metadata (AHPVM), introduced in Section V, processes all regular responses during the voting phase.
Besides the primary functions of SecIngress, service registration and discovery also need to be designed as components of an API gateway. However, if the system has to interact with external devices in the course of dispatching requests and monitoring responses, too many interactions with the cloud manager sacrifice the performance of SecIngress. To overcome this problem and further improve the performance of SecIngress, we decouple the external interactions into the control plane and add the Message Broker (MB) module to take over service registration and discovery. The MB's fundamental role is to receive the deployment information, i.e., the IPs and ports, of the application variants, which is updated dynamically according to the voting results, and to send information about the malicious variants detected by the Response Monitor to the cloud manager for further scheduling. Through this feedback mechanism, the N-variant application service system achieves resilience.
Before SecIngress starts working, the MB first creates an instance of SecIngress after receiving a list of application variants, $A_s = \{a_1, a_2, \ldots, a_n\}$. While creating the instance, the MB writes the application list into a configuration file and then sends a semaphore to notify the main process of SecIngress to parse the configuration file and cache the variant list in memory. Thus, SecIngress only needs to check the configuration file when it receives the operating system semaphore, which happens at a low rate and reduces the performance cost for the main process. Note that the MB can create more than one instance at the same time if there are several applications, each with its own list of variants.
In addition to these basic service discovery features, when the monitor finds that the response of an application variant diverges from the others, the main process of SecIngress both updates the variant list in memory and notifies the MB of this variant through inter-process communication. The MB then delivers this message to the cloud manager for further scheduling. If the MB receives new variants from the cloud manager, it also updates the configuration file and notifies the main process by semaphore to update the variant list in its memory.
The voting mechanism works on the assumption that compromised variants are a minority of all variants. Therefore, the majority voting scheme can be used to select credible responses. Because the metadata of a response reveal its features along multiple dimensions, a multi-objective comprehensive evaluation is necessary to balance the vote results of all the metadata. This section introduces the multi-objective comprehensive evaluation based on the majority voting scheme.
The voting is a multi-objective comprehensive evaluation problem based on multiple criteria, i.e., the similarity of several metadata, and multiple options, i.e., the responses of the application variants. The analytic hierarchy process (AHP) [29–31] is a theory of measurement that translates the multi-objective comprehensive evaluation into a multi-criteria ranking. Figure 6 shows the voting problem under the AHP schema. Selecting a response to send back to the client is located at the goal layer. The criteria layer includes the $m$ metadata of the responses. The responses of the N variants are the alternatives of the decision.
Figure 6.The AHP model for voting.
Criteria selection. The first criterion is the Response Status Line, also called the start line of the response message. The Response Status Line consists of the HTTP Protocol Version, the Status Code, and the Reason Phrase, and reveals the overall state of the response message. For the response header fields, we only consider the self-defined unique identifier mentioned in Section IV. We insert it into the request header field when dispatching and require the application variant to carry it back in the response header field. Owing to this agreement, we regard the unique identifier as a criterion that reflects whether a man-in-the-middle attack has occurred. The Response Body contains the resource data requested by the client, and its length is a valid criterion that reflects whether the response contains additional content. However, the CRC checksum of the response body is also needed, considering the case in which the attacker replaces the response content with malicious content of equal length. Besides, the functions or labels in JavaScript or HTML that can be used to launch injection attacks are also important for the response message. Therefore, we count the number of the functions and labels listed in Table 1 and regard it as a decision criterion in the criteria layer of the AHP model. The HTTP metadata and their corresponding functions are shown in Table 2.
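A minimal sketch of extracting these five metadata from a raw HTTP response is given below. Several details are assumptions: the header name `x-secingress-tag` is hypothetical, `RISKY_PATTERN` is only a placeholder for the contents of Table 1 (which are not reproduced here), the sample `Server` values are illustrative, and chunked transfer encoding is not handled.

```python
import re
import zlib

TAG_HEADER = "x-secingress-tag"  # hypothetical header name (lower-cased)
# Placeholder for the functions/labels of Table 1, which are not listed here.
RISKY_PATTERN = re.compile(rb"<script|<iframe|eval\(|document\.write", re.I)


def extract_metadata(raw):
    """Split a raw HTTP response into the metadata compared during voting."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(b":")
        headers[name.strip().lower().decode()] = value.strip().decode()
    return {
        "status_line": lines[0].decode(),
        "tag": headers.get(TAG_HEADER),
        "body_length": len(body),
        "body_crc": zlib.crc32(body),
        "risky_count": len(RISKY_PATTERN.findall(body)),
    }


tomcat = b"HTTP/1.1 200 OK\r\nServer: Apache-Coyote/1.1\r\nX-SecIngress-Tag: abc\r\n\r\n<p>hi</p>"
jboss = b"HTTP/1.1 200 OK\r\nConnection: keep-alive\r\nX-SecIngress-Tag: abc\r\n\r\n<p>hi</p>"
injected = b"HTTP/1.1 200 OK\r\nX-SecIngress-Tag: abc\r\n\r\n<p>hi</p><script>eval(x)</script>"
```

The two benign responses differ in their header fields but yield identical metadata, whereas the injected response differs in body length, CRC, and the count of risky functions and labels.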
Table 1.The functions/labels in JavaScript or HTML that can be used to launch injection attacks.
Table 2.The voting criteria.
Table 3.The relative scores[31].
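Determining the weight vector of the metadata. To calculate the weights of the multiple criteria, a pairwise comparison matrix $P_{mm}$ is created as

$$P_{mm} = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1m} \\ p_{21} & p_{22} & \cdots & p_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ p_{m1} & p_{m2} & \cdots & p_{mm} \end{bmatrix}$$

where $m$ is the number of metadata dimensions considered, and each element $p_{ij}$ of the matrix $P_{mm}$ denotes the relative importance score of the $i$-th metadata dimension compared with the $j$-th metadata dimension. The elements $p_{ij}$ satisfy the relations $p_{ii} = 1$, $p_{ji} = 1/p_{ij}$, and $p_{ij} = p_{ik}/p_{jk}$, $\forall i, j, k \in Z_m$, $Z_m = \{1, 2, \ldots, m\}$. Generally, as described in Table 3, the value of $p_{ij}$ is quantified on a numerical scale from 1 to 9 or the inverse of 1 to 9, which reflects the empirical judgment of experts in a particular field.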
Table 4.The values of the Random Index(RI)[31].
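Once the comparison matrix $P_{mm}$ is obtained, we check its consistency by calculating its Consistency Index (CI) as

$$CI = \frac{\lambda_{\max} - m}{m - 1}$$

where $\lambda_{\max}$ is the maximum eigenvalue of $P_{mm}$ and can be obtained from

$$P_{mm}\,\mathbf{a} = \lambda_{\max}\,\mathbf{a}$$

The matrix $P_{mm}$ needs to be adjusted until it meets the condition $CI/RI < 0.1$, where the values of the Random Index (RI) are given in Table 4. Then the metadata weight vector $\mathbf{w}$ can be obtained by normalizing the eigenvector $\mathbf{a}$ corresponding to the maximum eigenvalue $\lambda_{\max}$:

$$w_i = \frac{a_i}{\sum_{j=1}^{m} a_j}, \quad i \in Z_m$$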
Obtaining and adaptively adjusting the weight vector. For unknown responses that have not yet been voted on, it is difficult to determine in advance, from the metadata characteristics alone, whether the weight of each judgment criterion is reasonable. However, these criteria are certainly valid if there is injection or tampering. Conversely, in a security test environment where there is no attack, the voting should not produce false positives. Under this condition, the criteria that produce false positives should have their weights reduced to improve the accuracy of the voting.
Based on the above analysis,we first set up a test environment where there is no attack,and obtain the false positive rate of each criterion in the voting under the condition of equal weight to determine the corresponding weight of each criterion.For example,since the content of the response body may contain the user token,which is a fixed-length random code,the CRC content of the package body may be voted abnormally while other criteria are normal.Therefore,to avoid false positives,the weight of C5 needs to be lower than C4.According to the 1-9 weight determination method mentioned above,we define the value of the element in the pairwise comparison matrix as
whereeidenote the false positive rate of criterioni.
The pairwise comparison matrix is obtained as
Then, we calculate that $\lambda_{max}=5.12$ and $a=[0.23\ 0.78\ 0.36\ 0.10\ 0.44]^T$. The results $CI=(5.12-5)/(5-1)=0.03$ and $CI/RI=0.03/1.12=0.027<0.1$ confirm the consistency of the pairwise comparison matrix $P_{5\times5}$. Normalizing $a$ then gives the weight vector of our criteria, $w=[0.12\ 0.41\ 0.19\ 0.05\ 0.23]^T$.
In addition, starting from the initially obtained weight vector, we also adaptively adjust the weights according to the data obtained in the test environment. Each time new variant applications are upgraded and tested, we can adjust the weights according to the false positive rate of each criterion. With enough training data, the weight vector becomes more effective.
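The arithmetic in this paragraph can be re-checked with a few lines. The sketch below only reuses the reported values of $\lambda_{max}$, $a$, and RI; it assumes that the weight vector is obtained by dividing each entry of $a$ by the sum of its entries, which reproduces the reported $w$ to two decimal places.

```python
lam_max, m, ri = 5.12, 5, 1.12          # values reported in the text
a = [0.23, 0.78, 0.36, 0.10, 0.44]      # reported eigenvector

ci = (lam_max - m) / (m - 1)            # consistency index
cr = ci / ri                            # must stay below 0.1
w = [round(x / sum(a), 2) for x in a]   # sum-normalized weights

print(round(ci, 2), round(cr, 3), w)
```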
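Calculating the response score matrix under the majority voting scheme. The response scores, which reflect the credibility of the responses along multiple dimensions, are defined as
$$S_{nm}=\begin{bmatrix}s_{11}&s_{12}&\cdots&s_{1m}\\s_{21}&s_{22}&\cdots&s_{2m}\\\vdots&\vdots&\ddots&\vdots\\s_{n1}&s_{n2}&\cdots&s_{nm}\end{bmatrix},$$
where $s_{ij}$ denotes the score of the $i$-th response under the $j$-th criterion. However, calculating the scores directly is difficult. We therefore first obtain the pairwise comparison matrix $R^{j}_{nn}$ for the $j$-th criterion, which represents the relative scores of all responses under that criterion. For example, the entry $r^{j}_{ir}$ represents the score of the $i$-th response compared to the $r$-th response under the $j$-th criterion. As in Table 3, a larger value of $r^{j}_{ir}$ means that the $i$-th response is more believable than the $r$-th response under the $j$-th criterion.
To avoid the inexactness of personal experience, the matrix $R^{j}_{nn}$ is derived from the majority voting scheme. In practice, the $j$-th metadata of the variants are compared with each other and classified into a major set $V^{j}_{maj}$, containing the $\alpha$ variants with consistent $j$-th metadata, and a minor set $V^{j}_{min}$, containing the other $\beta$ variants. The values of $\alpha$ and $\beta$ satisfy the constraint $\alpha+\beta=n$ with $\alpha>\beta$. A response belonging to $V^{j}_{maj}$ is more believable than one belonging to $V^{j}_{min}$, and the ratio of $\alpha$ to $\beta$ denotes the credibility. Furthermore, responses within $V^{j}_{maj}$ or within $V^{j}_{min}$ can be regarded as equally credible. Therefore, an entry $r^{j}_{ir}$ of $R^{j}_{nn}$ is defined as
$$r^{j}_{ir}=\begin{cases}\alpha/\beta,&i\in V^{j}_{maj},\ r\in V^{j}_{min}\\\beta/\alpha,&i\in V^{j}_{min},\ r\in V^{j}_{maj}\\1,&\text{otherwise.}\end{cases}$$
Note that there is no need to check the consistency of the matrix $R^{j}_{nn}$, because all of its entries are obtained at the same time under the majority voting scheme, which eliminates the drawback of subjective judgment in the AHP method. Then, to obtain the score matrix $S_{nm}$, each matrix $R^{j}_{nn}$ is first normalized to a matrix $\bar{R}^{j}_{nn}$ through
$$\bar{r}^{j}_{ir}=\frac{r^{j}_{ir}}{\sum_{k=1}^{n}r^{j}_{kr}},$$
and the entries of the score vector $s_j$, i.e., $s_j=[s_{1j},s_{2j},\ldots,s_{nj}]^T$, are obtained by normalizing the entries of each row of $\bar{R}^{j}_{nn}$ by
$$s_{ij}=\frac{1}{n}\sum_{r=1}^{n}\bar{r}^{j}_{ir}.$$
Finally, the score matrix $S_{nm}$ can be obtained as $S_{nm}=[s_1,s_2,\ldots,s_m]$.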
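The per-criterion scoring can be sketched as follows. The sketch rests on assumptions drawn from the description rather than from the prototype: a response in the major set is scored α/β against one in the minor set (and β/α the other way round), responses in the same set are scored 1, the matrix is normalized column-wise, and the score is the row average. The function names `criterion_scores` and `score_matrix` are hypothetical.

```python
from collections import Counter

def criterion_scores(values):
    """Scores of n responses under one criterion, given their metadata values."""
    n = len(values)
    # Major set: responses sharing the most common value (ties are not
    # resolved here; the text treats a missing majority as an exceptional case).
    major_value, alpha = Counter(values).most_common(1)[0]
    beta = n - alpha
    in_major = [v == major_value for v in values]
    R = [[1.0] * n for _ in range(n)]            # same-set comparisons score 1
    for i in range(n):
        for r in range(n):
            if in_major[i] and not in_major[r]:
                R[i][r] = alpha / beta           # major vs. minor
            elif in_major[r] and not in_major[i]:
                R[i][r] = beta / alpha           # minor vs. major
    col_sums = [sum(R[i][r] for i in range(n)) for r in range(n)]
    # Column-normalize, then average each row.
    return [sum(R[i][r] / col_sums[r] for r in range(n)) / n for i in range(n)]

def score_matrix(metadata):
    """metadata[i][j] is the value of criterion j for response i; returns S (n x m)."""
    n, m = len(metadata), len(metadata[0])
    cols = [criterion_scores([row[j] for row in metadata]) for j in range(m)]
    return [[cols[j][i] for j in range(m)] for i in range(n)]

scores = criterion_scores(["200 OK", "200 OK", "500 Error"])
S = score_matrix([("200 OK", 120), ("200 OK", 120), ("500 Error", 120)])
```

With two agreeing responses and one outlier, the agreeing responses each receive 0.4 and the outlier 0.2; when all responses agree, every response receives 1/n.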
Ranking global scores of responses. Based on the weight vector of the metadata and the response score matrix, the global score vector $g$ can be obtained from
$$g=S_{nm}\,w,$$
which consists of $n$ entries, i.e., $g=[g_1,g_2,\ldots,g_n]^T$, reflecting the final scores of the $n$ responses. We then rank the global scores in decreasing order to obtain the final scores $g'$, whose top-ranked responses are regarded as believable. One of them is selected randomly and returned to the client. In contrast, the other responses are marked as abnormal and reported to the cloud manager through the Message Broker so that the corresponding variants can be replaced.
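A minimal sketch of the ranking step follows. It assumes that the global score of a response is the weighted sum of its per-criterion scores and that the "believable" responses are those tied for the highest global score; the helper names and the example score matrix are hypothetical, while the weights are the ones reported earlier.

```python
import random

def global_scores(S, w):
    """g_i = sum_j S[i][j] * w[j] for each response i."""
    return [sum(s * wj for s, wj in zip(row, w)) for row in S]

def split_by_rank(g, tol=1e-9):
    """Return (trusted, abnormal) response indices; trusted = tied for the top score."""
    best = max(g)
    trusted = [i for i, x in enumerate(g) if best - x <= tol]
    abnormal = [i for i, x in enumerate(g) if best - x > tol]
    return trusted, abnormal

w = [0.12, 0.41, 0.19, 0.05, 0.23]   # weight vector reported in the text
# Hypothetical scores: responses 0 and 1 agree on all five criteria,
# response 2 deviates on the third and fourth criteria.
S = [[1 / 3, 1 / 3, 0.4, 0.4, 1 / 3],
     [1 / 3, 1 / 3, 0.4, 0.4, 1 / 3],
     [1 / 3, 1 / 3, 0.2, 0.2, 1 / 3]]
g = global_scores(S, w)
trusted, abnormal = split_by_rank(g)
chosen = random.choice(trusted)      # response returned to the client
```

Here responses 0 and 1 are trusted, and response 2 would be reported for replacement.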
It is rare but possible that majority consistency cannot be reached, for example when the number of responses collected is less than $(n+1)/2$ or when all variants behave differently, which invalidates the voting mechanism discussed in 5.1. Exception handling for the voting is therefore needed. Considering that threat tolerance varies across scenarios, we provide solutions to this problem at two different levels as follows.
Optimistic voting. Assume that the application scenario has a high threat tolerance. When the voting mechanism based on the majority voting scheme fails, a response is randomly selected from the variants' responses and sent to the client. Meanwhile, all variants are regarded as abnormal and reported to the cloud manager, which replaces them with variants that differ from the current ones to avoid the next abnormal transaction.
Pessimistic voting. Assume that the application scenario has a low threat tolerance, e.g., the system contains important data or executes critical tasks. Because attackers cannot be distinguished from typical users, no response can be sent back to the client if the reliability of the responses cannot be determined by voting. Meanwhile, all the variants need to be replaced or even reset.
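The two fallback levels can be summarized in a short sketch. The function `handle_no_majority`, its arguments, and the policy strings are hypothetical names chosen for illustration; the behaviour follows the two descriptions above.

```python
import random

def handle_no_majority(all_variants, responses, policy, rng=random):
    """Fallback when majority voting fails.

    all_variants: names of the deployed variants.
    responses: dict mapping variant name -> response that was collected.
    Returns (reply or None, variants to report for replacement).
    """
    to_replace = sorted(all_variants)        # every variant is treated as abnormal
    if policy == "optimistic":
        # High threat tolerance: answer with a randomly chosen response.
        pool = [responses[v] for v in sorted(responses)]
        reply = rng.choice(pool) if pool else None
    elif policy == "pessimistic":
        # Low threat tolerance: send nothing back to the client.
        reply = None
    else:
        raise ValueError("unknown policy: " + policy)
    return reply, to_replace

variants = ["tomcat", "jetty", "resin"]
collected = {"tomcat": "page-A", "jetty": "page-B", "resin": "page-C"}
opt_reply, opt_replace = handle_no_majority(variants, collected, "optimistic")
pes_reply, pes_replace = handle_no_majority(variants, collected, "pessimistic")
```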
We have implemented the SecIngress prototype based on NGINX, an open-source HTTP server and reverse proxy known for its high performance and low resource consumption. We now describe the key functions of SecIngress: the dispatcher module, the monitor module, and the messenger broker module, as shown in Figure 4.
The Dispatcher module duplicates each request into multiple copies and sends those copies to multiple application variants, which can be accomplished with the NGINX mirror module. The mirror module, which mirrors an original request by creating background sub-requests, is usually used to copy HTTP request traffic from users, for example to detect malicious traffic. In SecIngress, the main process of the proxy module forwards the request to one application replica, while the mirror module duplicates the user's request into $n-1$ sub-requests for the other application variants, ensuring that all $n$ application variants receive the same request input.
Responses to mirror sub-requests are usually ignored. Therefore, we slightly modified the response receiving module and the filter module of the original NGINX and developed a self-filter module, inserted into the filter module chain, to collect the responses of all application variants. The self-filter module blocks the responses and collects them in memory until all responses have returned or the timeout introduced in 4.2 expires. About 5000 lines of C code were added to the NGINX source code to implement the AHPVM along with the self-filter module.
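The prototype implements duplication and collection inside NGINX in C. Purely to illustrate the control flow, the sketch below fans one request out to several variants and collects whatever has returned when a deadline expires. It uses a single timeout rather than the two-stage method of 4.2, and the variant callables, names, and timings are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

def dispatch_and_collect(request, variants, timeout_s):
    """Send the same request to every variant and collect responses until the deadline.

    variants: dict mapping variant name -> callable(request) -> response.
    Returns (responses received in time, names of variants that were late).
    """
    pool = ThreadPoolExecutor(max_workers=len(variants))
    futures = {pool.submit(fn, request): name for name, fn in variants.items()}
    done, not_done = wait(futures, timeout=timeout_s)
    pool.shutdown(wait=False)                 # do not block on late variants
    responses = {futures[f]: f.result() for f in done if f.exception() is None}
    late = sorted(futures[f] for f in not_done)
    return responses, late

def slow_variant(request):
    time.sleep(0.6)                           # simulates a variant that misses the deadline
    return "late:" + request

variants = {
    "tomcat": lambda r: "ok:" + r,
    "jetty": lambda r: "ok:" + r,
    "resin": slow_variant,
}
responses, late = dispatch_and_collect("GET /index", variants, timeout_s=0.2)
```

In this run the two fast variants answer in time and the slow one is reported as late, so the vote would proceed with two responses.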
The Messenger Agent, developed in nearly 2000 lines of C code, plays the role of the control plane for SecIngress. It runs on the same machine as NGINX but in a separate process. It controls the SecIngress instance by modifying the NGINX configuration file, i.e., nginx.conf, and reloading the NGINX instance. Meanwhile, it reads abnormal replicas from shared memory and sends alerts about them to the cloud manager.
As shown in Figure 7, we set up a simple API gateway test environment. Our testbed uses a topology that consists of a test host, an internal switch, and four nodes of our private cloud, each equipped with a 2.50 GHz 64-bit Intel(R) Xeon(R) CPU E5-2680 v3 processor with 12 cores, 32 GB RAM, 2 TB disks, and four network interfaces with 1 Gbps network speed. One node acts as the proxy server and runs SecIngress and Kubernetes (K8s) Ingress separately, while the other nodes run the open-source web application Jpress with diversified web servers and operating systems.
Figure 7. The testbed of the experiment.
Since the SecIngress prototype uses the same technology stack as K8s Ingress, we compare the security gains and performance differences between SecIngress and K8s Ingress. Because of their different working modes, SecIngress works with three variants, while K8s Ingress works with only one variant. Therefore, the performance of SecIngress needs to be compared with the performance of the three sets of K8s Ingress. The comparison schemes are shown in Table 5.
Table 5. The comparison schemes.
Vulnerability scanning. We deployed Nmap with Vulscan, a module that turns Nmap into a vulnerability scanner, on the test host, Tester, and used it to scan for web application vulnerabilities. The results in Figure 8 show that the number of vulnerabilities discovered on the service instance exposed through SecIngress is 70.08%, 41.13%, and 54.09% lower than on the three service instances exposed through Kubernetes Ingress, respectively, an average reduction of 55.10%.
Figure 8. The vulnerabilities of Nmap scanning.
Figure 9. The threats of OWASP ZAP scanning.
Meanwhile, we noted that the vulnerability information obtained by scanning differs across vulnerability databases. For example, for the Exploit-DB and OpenVAS (Nessus) databases, the number of vulnerabilities discovered on the service instance exposed through SecIngress is higher than that discovered through Kubernetes Ingress. This shows that the variants deployed in our experimental environment are not completely heterogeneous. If most of the variants share the same vulnerability, the vulnerability "hiding" effect of SecIngress, which relies on the majority voting scheme, no longer works. This is why SecIngress exposes more vulnerabilities for some of the vulnerability databases. As the degree of heterogeneity of the variants increases and shared vulnerabilities are further reduced, SecIngress can achieve better security effects.
Penetration testing. We deployed the OWASP ZAP penetration test tool on the test host, Tester, and ran its Quick Start Automated Scan. The alert results in Figure 9 reveal that SecIngress reduced the number of threats by 41.70% on average. The service instance exposed through SecIngress has the fewest alerts and the fewest alert types among all tests. For example, among the instances exposed through Kubernetes Ingress, only the one running Jetty raises the SQL Injection alert, whereas those running Tomcat and Resin do not. Because of the majority voting, this SQL Injection alert is not exposed through SecIngress. The reason is that all penetration tests rely on the interaction of requests and responses to obtain system information and then determine effective attack methods. Because SecIngress cross-checks the response characteristics of multiple variants and eliminates the response content of the variants that are sensitive to a penetration testing method, the information probed by the attacker cannot be obtained through the API gateway, so the types of threats are reduced accordingly. The threats are not completely eliminated because several variants in the test environment share a common response feature and respond to the penetration method in the same way, so some response content can still pass the monitoring module under the majority judgment.
Figure 10. The performance comparison between SecIngress and K8s Ingress under different packet sizes.
Figure 11. The performance comparison between SecIngress and K8s Ingress with different request concurrency.
Note that we only analyzed the security gain of SecIngress for web applications diversified through heterogeneous web servers and operating systems. However, we expect the security gain to be no less than that observed in this experiment when other diversity technologies are used, such as rearranging memory [32], randomizing system calls [33], and randomizing the instruction set [34].
The purpose of this part is to evaluate the performance degradation of our prototype system. We mainly measure the performance degradation of SecIngress when exposing the services of variants under different resource sizes and different numbers of concurrent HTTP requests. This evaluation simulates the performance in an actual working environment with multiple users and multiple resource types. During performance testing, we also evaluated the CPU occupancy of SecIngress and Kubernetes Ingress running on the same proxy server under the same conditions.
It should be noted that our system performs comprehensive detection of potential threats, and the process of cross-checking the response content does not distinguish whether there is an attack or not. Therefore, regardless of whether the response content contains threatening content, the Response Monitor module runs a voting algorithm with the same time complexity, and the presence of an attack has a negligible impact on the performance of the system. This is also why we do not consider the attack scenario and only use response content in a general data format for the performance analysis.
Resource size impacts performance. The SecIngress prototype ran on a proxy server with three web applications on the backend, each of which contains web pages with resource sizes ranging from 2 KB to 10 KB. In this way, we simulated the presence of resource pages of different sizes on a general server to illustrate the performance of SecIngress when exposing the services of multiple variants. In the experiment, for each resource size, we set the concurrency to 100 and the total number of requests to 1000, sent the requests to SecIngress, and recorded the response delay and the throughput using Apache Benchmark. Then, we tested the response latency and throughput of Kubernetes Ingress exposing each variant individually. The experimental conditions were identical to those of SecIngress: Kubernetes Ingress ran on the same proxy server and exposed the variants introduced before.
In each group of tests, we recorded the CPU usage of the proxy node to evaluate resource occupancy. To evaluate the performance degradation intuitively, we restricted both SecIngress and Kubernetes Ingress to a single processor core. The experimental results in Figure 10 show that the latency increases, the throughput decreases, and the CPU utilization increases as the size of the resource increases. Compared with Kubernetes Ingress exposing a single variant's service, SecIngress exposing the service of multiple variants increased the response delay by 21.45%, decreased the throughput by 48.77%, and increased the CPU utilization by 33.72% on average. The main reason is that SecIngress processes three times the data volume of K8s Ingress, so its performance is reduced to a certain extent in all aspects. As the size of the data packet gradually increases, the load on SecIngress grows correspondingly and the performance loss gradually increases. It is foreseeable that, under the same configuration, the maximum processing capability of SecIngress is lower than that of K8s Ingress, which is the resource price paid for higher security in specific scenarios.
Figure 12. The performance comparison between SecIngress and K8s Ingress with different variant numbers.
Request concurrency impacts performance. As in the previous experiment, we configured three backend web applications containing 2 KB web pages. We then accessed the web services through SecIngress with concurrency values ranging from 200 to 1000 in increments of 200 while recording the delay and throughput. To keep the test results stable, in each experiment we set the total number of requests in Apache Benchmark to ten times the corresponding concurrency. Besides, we tested the performance of Kubernetes Ingress exposing a single variant under the corresponding conditions. In each experiment, we recorded the CPU usage.
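The concurrency sweep described above corresponds to a series of Apache Benchmark invocations, which the sketch below generates. It relies only on the standard `ab` options `-n` (total requests) and `-c` (concurrency); the target URL and the function name `ab_commands` are hypothetical, and the commands are only built, not executed.

```python
def ab_commands(url, concurrencies=range(200, 1001, 200), requests_per_client=10):
    """Build one `ab` command line per concurrency level.

    The total number of requests is ten times the concurrency, as in the text.
    """
    return [
        ["ab", "-n", str(c * requests_per_client), "-c", str(c), url]
        for c in concurrencies
    ]

# Hypothetical address of a 2 KB page exposed through the gateway.
commands = ab_commands("http://proxy.example/page-2kb.html")
for cmd in commands:
    print(" ".join(cmd))
```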
The experimental results in Figure 11 show that the delay and the CPU usage increase as the number of concurrent user requests rises. Overall, compared with using Kubernetes Ingress for service exposure, using SecIngress increased the latency by 28.53% on average, decreased the throughput by 30.45%, and increased the CPU usage by 31.41%. The reason is that, as concurrency increases, both SecIngress and K8s Ingress face greater network queuing delay and greater processing pressure, so the response delay and the CPU utilization increase significantly. Because SecIngress carries a multiplied processing load, its delay and CPU usage increase further.
In addition, we noticed that as concurrency increases, the throughput of SecIngress and K8s Ingress first increases and then decreases. The reason is that when concurrency is low, raising it rapidly increases the amount of data transmitted, so the throughput grows with concurrency. However, as concurrency increases further, the processing pressure on the system and the processing delay gradually increase, which hinders further growth of the throughput and even reduces it.
Performance impact of the number of variants. In practical applications, the number of variants should be no fewer than three. The greater the number of variants generated using different diversification technologies, the smaller the risk of exposing shared vulnerabilities. For this reason, we need to evaluate the impact of different numbers of variants on performance. In this part, we simply deployed multiple isomorphic variants to simulate this application scenario.
As in the previous experiments, we used Apache Benchmark to run additional tests on scenarios with different numbers of variants. The experimental results in Figure 12 show that the response latency increases, the throughput decreases, and the CPU usage increases as the number of variants increases. For each variant added, latency increases by 16.92%, throughput decreases by 33.57%, and CPU usage increases by 12.04% on average. The reason is that, as the number of variants increases, the amount of data that SecIngress needs to process and the number of connections it occupies both increase, so the response time increases and the throughput decreases. It should also be noted that the CPU occupancy does not increase significantly with the number of variants. The reason is that the computational complexity of our voting algorithm does not grow with the number of variants. Only the extra connection handles occupy part of the CPU, so the CPU occupancy does not increase significantly.
This work introduced SecIngress, an API gateway framework that secures cloud applications based on the N-variant system. We first reduced the performance degradation caused by waiting for multiple responses by designing a two-stage timeout processing method. We then improved the voting mechanism of the majority voting scheme by designing an analytic hierarchy process voting under the metadata mechanism. Finally, we implemented a prototype of SecIngress. Through the security gain evaluation and the performance degradation evaluation of our prototype, SecIngress demonstrated that it can practically decrease the rate of reachable vulnerabilities or threats with acceptable performance degradation.
The authors would like to thank the reviewers for their detailed reviews and constructive comments, which have helped improve this paper's quality. The research reported in this paper was supported by the Foundation of the National Natural Science Foundation of China (62072467), the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (61521003), and the Foundation of the National Natural Science Foundation of China (62002383).