Customizing Service Path Based on Polymorphic Routing Model in Future Networks

2019-07-24 09:28WanweiHuangChunfengDuJianweiZhangChanghaiWang
China Communications 2019年7期

Wanwei Huang,Chunfeng Du,Jianwei Zhang,*,Changhai Wang

1 School of Software,Zhengzhou University of Light Industry,Zhengzhou 450002,China

2 School of Computer and Communication Engineering,Zhengzhou University of Light Industry,Zhengzhou 450002,China

Abstract: The current Internet has evolved during the last decade to a global provider of diverse applications.However,the underlying structure of routing and addressing has not evolved in the same pace and is somewhat inflexible.How to provide diverse routing services,support emerging communication paradigms based on limited and definite network resources has become an urgent challenge.This paper investigates the adaptive matching between routing and application through network function decomposition and composition,and proposes a polymorphic routing model to support diverse applications and emerging communication paradigms.The model splits complex routing functions into its constituents,and derives customized routing mechanisms supporting various applications by composing the routing constituents.The derivation process is modeled as a Markov Decision Process (MDP),and a polymorphic derivation algorithm is also proposed to derive customized routing instances for diverse applications.The model enables the network to self-adjust routing services dynamically to adapt to the different requirements of applications,supports coexistence of multiple routing modes and communication paradigms,and provides a feasible solution for the network compatibility and evolvement.We describe the key design and demonstrate the feasibility of polymorphic derivation by simulations.We also present case studies that demonstrate key functionalities the polymorphic routing model enables.

Keywords: polymorphic; base state; routing model; MDP; reinforcement learning

I.INTRODUCTION

Although the current Internet has gained great success in providing storing and forwarding service for a large number of users,the single and fixed network layer has also become the Internet’s most challenging limitation.As the diversity of applications and communication paradigms (e.g.,service-based networking,content-based networking,etc.) continues to increase,and the demand for security greatly goes beyond the initial design,the applications require that the network can provide diverse functionalities and increase flexibility and adaptability to novel network functions.However,the single and fixed network layer cannot accommodate such adaptation and thus has limited the deployment of innovative applications.Moreover,the ossification of the network layer impedes the flexible deployment of new network functionalities (e.g.,routing algorithms,security services,etc.).

Facing the demands for diverse services,the conventional Internet service model (e.g.,addressing and routing) has become inadequate sue to its poor flexibility,adaptability,customizability,reliability,security and QoS (Quality of Service).The diversified applications and increasingly emerging communication paradigms demand a scalable,robust,flexible and adaptive Internet service infrastructure.The characteristics and service capabilities of the network are decided by the interconnecting and routing structure,which is the core of networks.By the physics principle that the function is decided by the structure,the key of solving the current problems in networking is directly enhancing the capability of Internet interconnecting and routing [1].Specifically,customizability,adaptability,security,performance,and reliability require support within the network.Therefore,it is imperative that the designs of the future network should consider how to integrate flexibility and adaptability into the network architecture from the perspective of routing.

Currently,the research on the future networks is progressing like a raging fire.However,these researches are mainly concerning on the network architectures,and hardly refer to the routing structure.We do not plan to present an exhaustive survey of all of the research efforts on the future Internet design.It is our willingness that we can present a clear current research situation on future Internet design from the perspective of routing structure.In the routing structure of future Internet,the current research can be classified into evolutionary and clean slate from the design philosophy.The former proposes patches to current Internet routing [2-9],whereas the latter argues that the Internet architecture should be redesigned fundamentally [10-18].

In the evolutionary research aspect,the authors in [2],proposed an overlay based Internet Indirection Infrastructure (i3) which acts as a communication abstraction supporting services such as multicast,anycast and mobility.To resolve the routing scalability,some proposed architectures [3,4,5] that try to separate identifier from location,and the corresponding identifier based routing mechanisms [6-9] have been proposed.To a certain extent,these patches to the Internet have eased the problems that the current Internet faces.However,these proposals cannot resolve the problems fundamentally.As a result,a clean-slate architecture design paradigm has been suggested by the research community to build the future Internet.In the program of FIA,the Named Data Networking (NDN) [10-15] project is intended to build a different model that enables the network to focus on “what” (contents) rather than “where” (addresses).The data are named instead of their location (IP addresses).Data become the first-class principal in NDN.The Expressive Internet Architecture (XIA) [16] is also one of the four projects from the FIA program,and aimed to build an expressive architecture embedded with intrinsic security.The MobilityFirst [17] project aims to address the cellular convergence trend motivated by the huge mobile population of 4 to 5 billion cellular devices,and provide mobile peer-to-peer(P2P) and infestation(delay-tolerant network,DTN) application services which offer robustness in case of link/network disconnection.The NEBULA [18] is another FIA project focused on building cloud-computing-centric network architecture.NEBULA aims to design the cloud service embedded with security and trustworthiness,high service availability and reliability,integration of data centers and routers,evolvability,and economic and regulatory viability.These architectures resolve some problems the current Internet facing form different aspects.However,these architectures are not flexible enough to adapt different requirements of various applications.The appearance of Software-Defined Networking (SDN) [19,20] and Network Function Virtualization (NFV) [21],makes flexible and customized network functions and services for various applications possible.

However,how to support increasingly emerging networking paradigms and diverse networking applications based on limited and definite network functions is still unknown.The solution of this problem need a fundamental shift in the networking routing structure:instead of limiting networks to provide fixed and single routing function,more dynamic and flexible routing model should be introduced into the network [22].Such routing model can make adaptive matching between routing and application based on the routing structure self-organization and routing functions self-adjustment,and provides diverse routing functions for diversified applications in the mixed network with the coexistence of various addressing and routing.

In this context,to promote the capacity of network transmission and routing service,we in this paper propose a polymorphic routing model,to support multiple network architectures,routing and addressing styles.The basic idea behind polymorphic routing model is as follows:to address the network routing simple and single function,we divide the network layer into base state layer and polymorphic layer in this model.The base state layer defines the basic routing and addressing capabilities,and is the set of the network diversified routing and addressing elements and capabilities.The polymorphic layer is the set of the specific routing instances which can satisfy the requirement of the specific application.The polymorphic routing model can derive diverse and customized routing and addressing functions,to dynamically promote the fundamental networking transmission and routing services by flexibly composing the different routing atomic capabilities in the base state layer.

The rest of this paper is structured as follows:in section II,we describe the polymorphic routing model in detail.In section III,we present a polymorphic derivation algorithm.In section IV,we describe our implementation of the polymorphic routing model from three aspects:framework,design,and principle.In section V,we present some experimental evaluation results and case studies.Finally,we conclude this paper in section VI.

II.POLYMORPHIC ROUTING MODEL

In current Internet,the requirements of applications are diverse and changeable.However,the network capability is definite and limited.How to provide diverse services for applications based on limited and definite network resources has become an urgent challenge.As the networking community is attempting to design the future Internet,we should consider how to design routing services which can dynamically adapt diverse and changeable applications when multiple network architectures coexist.From the perspective of the diverse requirements of applications,it is important to consider what choices are available within a network and how to pick a suitable combination for applications.

To address the issues discussed above,we first abstract and induce the various and changeable routing services.Then,the addressing and routing functions are decomposed as routing atomic capabilities which have comparatively unattached function and explicit interface.Finally,by providing different combination of the routing atomic capabilities along the data path,the network can form a specific routing protocol which satisfies the given quality of service and network dynamic behavior characteristic for a specific application.The combination of routing atomic capabilities used for communication determines the functionality,quality,and performance of the application.However,one of the major challenges of implementing such a network is how to decide the appropriate routing atomic capability composition for a specific communication requirement.

Fig.1.Polymorphic derivation model.

To efficiently implement routing atomic capability composition,we propose a polymorphic routing model which divides the routing structure into two layers:base state layer and polymorphic routing layer.As shown in figure 1,the base state layer is the set of routing atomic capabilities which satisfy all potential functions,security,quality,and performance requirements.The polymorphic routing layer realizes diverse polymorphic routing instances by routing atomic capabilities composition for various applications.Specifically,Polymorphic Routing (PR) are the routing instances which are built based on the diverse requirements of applications and dynamic network behaviors.

Polymorphic routing model can adapt to communication between a richer set of principals than many other architectures.The key of polymorphic routing model consists of three parts.The first is the description of routing service requirement,called routing service description (RSD),which expresses the requirements of applications desired byPRmodel.The second is base state routing which is the set of routing atomic capabilities that are routing basic elements and have explicit definition including name,type,input and output.The third is polymorphic derivation model,which models polymorphic derivation process as a MDP and is core of deriving polymorphic routing instances.We will focus on these aspects ofPRmodel in the following.

2.1 Routing service description

To describe the requirements of applications,we define the concept of routing service description.RSDis the description of the routing services based on the requirements of applications.RSDincludes security description,performance description and function description.The security description includes security levelSLand security classSC.The performance description includes priority requirementPQ,bandwidth requirementPB,delay requirementPDand packet loss requirementPL.The function description includes communication modeFM,routing typeFT,routing identifierFIand communication principalFH.

Based on the above definition,theRSDattributes space can be defined as follow.

According to the definition ofRSDattributes space,all possibleRSDs form the superset ofRSD,which is defined in (2).

For any applicationi,theRSDiis the subset of F,that is,RSDiÍ F.RSDican be expressed as follow.

Note that the value of any attribute can be set zero in (3),and the zero value represents the applications do not have this attribute requirement.The range of these attributes value can refer to the previous research fruits about the requirements of network applications.

2.2 Base state routing

The base state routing is the set of routing fundamental elements,which are called routing atomic capabilities.A routing atomic capabilities is an abstract resource that represents a capability of performing any function.The routing atomic capability,which has explicit definition including name,class,level,input and output,can be classified into different classes according to function,and has different levels in each class.Furthermore,a routing atomic capability may take a particular role in performing a routing.Assumerprepresents a routing atomic capability,andm,nis the class and level of the routing atomic capability respectively.The set of routing atomic capabilities,base state routing (BSR) can be defined as follow.

In (4),the routing atomic capabilities inBSRare exclusive,and cannot be replaced.We can derive polymorphic routing instance from the base state routing by selection and configuration of routing atomic capabilities.The process that the routing protocol is customized from base state routing to polymorphic routing instance can be described in (5).

WhereAis the class selection vector andBis the level selection vector.

The feasible polymorphic routing instances are defined as {PR1,PR2,…,PRi}.That is

whereKrepresents the number of feasible polymorphic routing instances.

2.3 Polymorphic derivation model

The Polymorphic routing instance is derived from base state routing.In the derivation process,which we called polymorphic derivation,the choice of next routing atomic capability is not only based on the demands of application,but also according to the output of previous routing atomic capability.And considering the polymorphic derivation process is a sequential decision making process,and has memoryless property,we use Markov decision process theory to model it.

As we use Markov decision process (MDP) to model polymorphic derivation,we first define MDP.In Ref.[23,24],a MDP is defined as:

Definition 1.(Markov Decision Process (MDP)).A MDP is a five-tuple MDP = {K,S,A(s),p(s’|s,a),r(s,a)},where

—K:Decision epochsK= {0,1,2,… ,Nπ(s)}.At each decision epoch,the process is in some statesÎS,and the decision maker may choose any actionaÎA(s).

—S:A finite set of states.Each element represents a state.When an agent arrives at a state,the agent can observe the complete state of the set.

—A(s):A finite set of actions.The set of available actions depends on the current statesÎS.

—p(s’|s,a):With current state s and action a,the process then evolves to a new states’according to a transition probability functionp(s’|s,a).

—r(s,a):Similarly,for any action that the decision maker chooses at each state,there is a corresponding rewardr(s,a).The goal of each decision is to maximize the expected total reward it can obtain during the decision making process.

Definition 2.(Decision rule and Policy) A decision ruleδt,for timet,determines a probability distribution of actions overA(s).It is the principle that guides taking action at any state.

A policy is a sequence of decision rules.A policy is denoted byπand takes the formπ= {δ1,δ2,…,δt,… }.Symbolπis used for both infinite and finite horizon problems.To obtain a finite horizon policy from an infinite horizon policy,its operation must be restricted to a finite number of time units.A policy tells how to determine actions for any time unit of the process.

The decision problem is to compute a policy that will maximize the value of some optimality criteria before the decision process begins.The criteria of Markov decision process used in this paper is finite horizon total expected reward.

1) Epochs,States and Actions

The whole polymorphic derivation process can be naturally divided into several epochs,and each epoch corresponds to one class of routing atomic capability in polymorphic derivation process.When the polymorphic derivation process evolves to one epoch,the polymorphic derivation engine should decide to select a concrete routing atomic capability,and the moment of making decision is called decision epoch.Here,we useTkto denote thekthdecision epoch withT0= 0.If there aremclasses routing atomic capabilities to be bound in a polymorphic derivation problem,each class corresponds to a state.The system state space is composed by the set of routing atomic capability classes.LetSbe the system state space.There areS= {s1,…,sk,…,sm},whereskrepresentskthclass of routing atomic capability in this polymorphic derivation process,and corresponds to theakin the class selection vectorA.sk= 1 represents this class is active and has been bound to a concrete routing atomic capability.At decision epochTk,and statesk,the actiona(sk) is selected from the set of all possible candidate routing atomic capabilitiesA(sk).A(sk) is action set and is all candidate routing atomic capabilities which have different levels in classsk.The action setA(sk) corresponds to the level selection vectorB.Then,a stationary policyπrepresents a mapping from states to actions,i.e.π:S→A.Based on the current statesk,the system that selects different actions will make a transition from its current state to a resulting statesk+1with different probabilitiesp(sk+1|sk,a(sk)).Herep(sk+1|sk,a(sk)) is the transition probability,that is ,under a concrete actiona(sk),the system transits from current stateskto next statesk+1withp(sk+1|sk,a(sk)),and the detail definition of the transition probability is stated in next section.

2) Transition Probabilities

The transition probability as defined in a MDP is Markovian,that is,the probability of reaching the next state depends only on the current state and action,and not on the history of earlier states.Inclusion of the transition probability allows MDPs to model and reason with non-deterministic (uncertain) actions.In the polymorphic derivation process,there are multiple different levels routing atomic capabilities in a class.But not every candidate routing atomic capability is an optimal or near optimal selection.So,we use matching degree to indicate the selected routing atomic capability matches the requirement to what extent.The definition of the matching degree (MD)can be stated as follow.

Wherexirepresents the attribute value of theithrouting atomic capability in the action set,andxreqandxmaxrespectively represent the requirement of application and maximum attribute value in the action set.

Based on (9),the transition probability can be defined as follow:

The definition of transition probability meets the requirement of a MDP,and indicates that the higher the matching degree,the bigger the transition probability.

3) Reward Functions

When the system chooses an actiona(s) in states,it receives an immediate rewardr(s,a).Reward is the key to make a decision in the MDP.As highlighted above,the polymorphic derivation is based on theRSDof applications.RSDis the requirement description of routing service which includes security description,performance description and function description.Thus the definition of reward function must also consider theRSDof applications.On the basis of the given analysis,the reward function can be stated as follows.

Whereω1,ω2,andω3are weighting factors for adjusting the proportion of three benefit functions in the reward function,andω1+ω2+ω3=1.fs(s,a),fp(s,a),andfo(s,a) represents the security benefit function,performance benefit function and function benefit function respectively.

The security benefit function is defined on the principle of the smaller difference between requirement and routing atomic capability the higher benefit.The goal is to reward the routing atomic capability that satisfies the security requirement and punish the routing atomic capability which does not satisfy the security requirement.fs(s,a) represents the benefit that the system gains in the term of satisfaction degree by choosing actionain states,and can be expressed as:

WhereSLreqdenotes the required security level of the applications,andSLmaxdenotes the maximum security level in the base state routing.

The performance benefit function is defined on the principle of the higher performance satisfaction degree the higher benefit.The performance description includes priority requirementPQ,bandwidth requirementPB,delay requirementPDand packet loss requirementPL.Therefore,the performance benefit functionfp(s,a) is the synthesis of all the performance requirements,and can be stated as:

WherePirepresents the observed value of theithperformance attribute,PmaxandPminrepresent the maximum and minimum values ofPifor all routing atomic capabilities.φiis the weighting factor ofPi.

The function benefit function is defined on the principle of the smaller the cost the higher benefit.The cost is the expense that achieving the function pays out.The function benefit functionfo(s,a) can be defined as:

In practice,the cost maybe is the processing delay of performing function.We can define the cost function according to the practice needs.The goal of polymorphic derivation problem is usually to find the solution which minimizes the cost.

III.POLYMORPHIC DERIVATION ALGORITHM

The polymorphic derivation process is modeled as a MDP which allows network to integrate multiple alternative routing atomic capabilities into a single polymorphic routing protocol.During the execution of a polymorphic derivation,the system can dynamically choose the optimal policy that would offer the best possible reward.The traditional method for computing the optimal policy of MDP is stochastic dynamic programming which needs all the knowledge of the MDP model,i.e.state transition probability matrix and reward function.When the complete MDP is known,the optimal policy can always be calculated theoretically.However,this is not true in practice.Firstly,we may not have complete knowledge about the state transition functions of the PDMDP (Polymorphic Derivation Markov Decision Process),as the results of routing atomic capability are not always predicable.Secondly,we may not have sufficient knowledge about the reward functions of the PD-MDP.Moreover,as the environment of a polymorphic derivation keeps changing,both the state transition functions and the reward functions change with time.

Due to the above issues,we choose to learn the optimal policy of a PD-MDP at runtime.In this section,we introduce a reinforcement learning scheme to orchestrate PD-MDP based polymorphic derivation.Reinforcement learning is primarily concerned with how to obtain the optimal policy when such a model is not known in advance.The system must interact with its environment directly to obtain information,by means of an appropriate algorithm,which can be processed to produce an optimal policy.

In this section,we first give a brief overview of a reinforcement learning called Q-learning.Following that,we show how to apply reinforcement learning to PD-MDP.

3.1 Q-learning

In reinforcement learning,the goal of the decision-maker is to learn a policy of the MDP that maximizes the expected sum of the reward.Because of the state transition probability function and reward function of the PD-MDP may not be known,the optimal policy cannot be calculated directly.It has to be learned through a trial-and-error process.LetVπ(s) denote the expected total reward between the first decision epoch and the termination epoch,given that the policyπis used with initial states.That is,

Based on (15),the PD-MDP optimization problem can be stated as follows:

Letπ* denote the optimal policy.The optimal policy is the policy that maximizesVπ(s) for all policiesπ.Therefore,

According to (16),the optimal equations of the PD-MDP are given by

Where the value functionukis the expected total reward from the decision epoch 0 to the decision epochk.The optimal value function is unique and can be defined as the solution to the simultaneous equation (16).Given the optimal value function,we can specify the optimal policy as:

One way to find an optimal policy is to find the optimal value function.Due to the state transition function and reward function of the PD-MDP may not be known,we cannot calculate the optimal policy π* directly.Q-learning algorithm [25] is usually used to obtain an optimal policy when the model is not known in advance.In order to understand Q-learning,we first give some additional notation.LetQ*(s,a) denote the expected cumulative reward of taking actionain states,then continuing by choosing actions optimally.Note thatV*(s) is the value of statesassuming the best action is taken initially,and soV*(s) = maxaQ*(s,a).Then,Q*(s,a) can be written

Algorithm 1.Q-learning algorithm.Initialize Q*(s,a);for each episode do s s← 0; for s s∉ rdo Choose as As () ()∈ based on ε-greedy policy; Execute a(s),observe reward r and new state s; Q sa Q sa ar Q sa Q sa*(,) *(,) [ max *(,) *(,)]← + + -γ a ; s s← ; end for end for

recursively as:

The Q function represents the best possible cumulative reward of taking actionaat states.Note also that,sinceV*(s) = maxaQ*(s,a),we haveπ*(s) =argmaxaQ*(s,a) as an optimal policy.

The recursive definition of Q function forms the basis of the Q-learning algorithm [25].Q-learning uses the Q function to simulate the cumulative reward.It starts with some initial values ofQ*(s,a),and updatesQ*(s,a) recursively using the actual reward received by the system in a trial-and-error process.The complete learning process is depicted in Algorithm.1.

In this algorithm,we first initializeQ*(s,a) in the beginning,and assume that there is a single initial states0and a set of terminal statesSrin a given MDP.Then,the learning process is performed recursively.In each episode,the learner begins from the initial states0,and selects a sequence of actions usingε-greedy policy.When the system reaches a terminal statesÎSr,the episode ends.After executing each action,the learner updatesQ*(s,a) using the following equation.

Wherer+ gmaxa’Q*(s’,a’) represents the immediate observed reward,andQ*(s,a) represents the previously observed reward.Note thatQ*(s,a) is not replaced completely by the immediate observed reward value.Instead,it only updates a certain portion of theQ*(s,a) value,which is quantified bya(0≤a≤1).ais called learning ratio,which is an important tuning factor in Q-Learning.Intuitively,the higher the learning ratio,the faster the learner will find the optimal policy.However,the higher the learning ratio,it is more likely that the learner be locked in a local optimal region.

3.2 Polymorphic derivation

The mechanism that applying the Q-learning algorithm to a PD-MDP based polymorphic routing model to determine the optimal or near-optimal polymorphic routing protocols is called polymorphic derivation.Rather than treating the Q-learning as a learning system,polymorphic derivation uses it directly as the execution engine.When the polymorphic routing model receives an application request,it begins executing polymorphic derivation using Q-learning algorithm.The model applies theε-greedy policy of the Q-learner to pick a routing service chain to execute.The complete execution process of a routing service chain is treated as an episode of the learning process.After each episode,theε-greedy policy and the Q-functions are updated based on the immediate observed reward.

The polymorphic derivation achieves self-adaptivity automatically by combining the executing and learning.When the resources or the requirements of applications change,the polymorphic derivation will change its policy based on its new observation of reward accordingly.The polymorphic derivation does not require prior knowledge about the attributes of the routing atomic capabilities,but is able to achieve the near-optimal policy by learning.

To apply Q-learning to a PD-MDP,an important issue is to define the reward function of the learning process.The ultimate objective of our mechanism is to derive dedicated routing protocol for customizing special service path.The definition of reward function has been given in section 2.In practice,the polymorphic routing model can obtain the information about the reward function by two ways.The first way is the requirement of applications that can be specified by the application providers,and we can compute the reward values according to the definition of the reward function.

The second way to access the requirement of applications is user feedback.The polymorphic routing model can allow application provider to rate the polymorphic derivation after each transaction.The rating is a kind of direct measure of the final reward received by the user.User feedback allows the model to capture some properties that cannot be directly quantified.Different from the specifiedRSDbased reward,users’ feedback can only be measured at the final step of an episode.Luckily,the polymorphic derivation using Q-learning is able to propagate the influence of the final reward to the intermediate routing atomic capabilities of the derivation.Even whenRSDare not used,user feedbacks can still allow the model to obtain a near-optimal policy.

IV.IMPLEMENTATION

In this section,we describe our implementation of the polymorphic routing model from three aspects:framework,design,and principle.

4.1 Framework

The framework of the polymorphic routing model adopts three layers structure which consists of administer plane,control plane,and data plane.The administer plane takes charge decision making of the polymorphic routing.The control plane takes charge executing the polymorphic routing.The data plane takes charge forwarding data.Three planes implement the polymorphic routing model coordinately through information exchange with each other.

The function of the administer plane is implemented in the control servers.On one hand,the administer plane should apperceive,maintain,and update the whole network states and nodes information which are the foundation of polymorphic addressing and routing.On the other hand,the administer plane should also achieve management and assignment of the identifiers which include locator identifier (LID),host identifier (HID),service identifier (SID),and content identifier (CID).The administer plane is the core layer and administrable center of the polymorphic routing model,which supervises and inspects the whole network states intelligently.The control plane executes the network routing function.It performs the concrete steps and actions which the administer plane commands and announces.On one hand,the control plane executes polymorphic derivation according to the administer plane.On the other hand,the control plane calculates and updates the routing information through network states which are apperceived from the administer plane.The data plane performs data forwarding,and achieves dynamic and flexible resources collocation on demand based on diversified addressing and routing.This plane implements the polymorphic routing functions which satisfy the given data transfer demand.

Figure 2 shows the framework of polymorphic routing model.When an application needs customized routing services,it should send a request which includes itsRSDto the service acceptor.The service acceptor will validate the request and execute semantic mapping.The polymorphic derivation engine will try to solve the requirement by composing the routing atomic capabilities.Then the execution engine will accept the corresponding workflow,and send the routing atomic capability specification to the atomic capability matching,which will find the most appropriate routing atomic capabilities and return the information to the execution engine.Finally,the execution engine invokes and executes each routing atomic capability,and gives birth to a polymorphic routing protocol which is used to route the application data.

As the diversified development of the networking applications,the network communication principals also present diverse trends.With the increase of content and service applications,the need of multiple network routing and addressing functions is gradually enhanced.Aiming at the network routing and addressing problem when multiple principals coexist,the polymorphic routing model builds diversified paths which satisfied the need of multiple principals according to the requirement of various applications and dynamic network behaviors.The architecture of the polymorphic routing model is depicted in figure 3.

The operation process of the polymorphic routing model is as follows:the reconfigurable routers run the polymorphic routing model to derive a specific polymorphic routing protocol according to the routing service description of the application.The polymorphic routing protocol calculates the routing table by the network topology and resource states.The next hop in the routing table connects together to form a service path which satisfies specific requirement of the application finally.Figure 4 depicts two route instances which satisfy security restriction and QoS restriction.

Fig.2.The framework of polymorphic routing model.

Fig.3.The architecture of polymorphic routing model.

In polymorphic routing model,the network function and behavior can be dynamically changed by the requirement of users through the process of polymorphic derivation.The polymorphic routing model may give birth to different polymorphic routing protocols or switch between different running states under one same polymorphic routing protocol.The mechanism of polymorphic derivation can build multiple service paths for different principals which need different routing protocols.The polymorphic routing model is the foundation of the resources isolation and assignment when diversified services coexist in future Internet.

4.2 Design

The foundation of the polymorphic routing model is the specific addressing and routing based on unified definition of identifier structure.The diversified addressing and routing which the polymorphic routing model holds out includes routing based on locator,routing based on identifier,routing based on content,and routing based on service.The new addressing and routing protocols can also be defined quickly according to the polymorphic routing model.The four styles of the identifier are defined as follows.

● Locator Identifier (LID),LIDis used in locator-centric network which corresponds to the traditional IP network.

● Host Identifier (HID),HIDis used in host-centric network.The network based on host-centric usesHIDas the communication principal to accomplish host interaction,content and service retrieve.

Fig.4.The instances of polymorphic routing model

● Service Identifier (SID),SIDis used in service-centric network.The addressing and routing based onSIDis a means which directly expresses communication intent.In this networking,SIDis the basic communication principal.

● Content Identifier (CID),CIDis used in content-centric network which focus on content retrieve.

Fig.5.The mechanism of polymorphic derivation.

Fig.6.The identifier format.

Table I.The allocation of the identifier space.

The polymorphic addressing is a specific addressing style by the specification of unified network packet format.The polymorphic routing model can derive multiple polymorphic addressing styles through unified packet format based on the aforementioned four addressing identifiers.The mechanism of polymorphic derivation is shown in figure 5.

The addresses adopt structured identifiers which include type and value based on unified base address format.We can produce multiple identifiers such asLID,HID,SIDandCIDby the definition of the type field.As shown in table 1,the allocation of the 128 bits identifier space references the IPv6 addresses which include type field and identifier value.The type field represents the type of the identifier,and the value field is the specific value of the identifier.To be compatible with IPv6,HID,SID,andCIDuse the optional field in IPv6.The type value ofHID,SIDandCIDare 010,011 and 100 respectively,and the other type values represent the identifier ofLID.The other preserving spaces can support the extension of new identifier types.The four identifier formats are shown in figure 6.

4.3 Principle

1) Principal and route announcement

The polymorphic routing protocols can be divided into two forms which include inter-domain polymorphic routing protocol and intra-domain polymorphic routing protocol.The polymorphic routing protocol dynamically establishes service paths which satisfy theRSDsof different applications.

Principal NotificationWhen hosts or servers join in the network,the hosts firstly announce the attributes and identifier names to the access routers.For the hosts,theHIDsof the hosts should announce to the network.For the service providers,theSIDsof the services should announce to the network.For the content providers,theCIDsof the contents should announce to the network.The access routers advertise the announced messages to the neighboring routers,and keep uniform topology information in inter-domain.All the routers compute the optimal routing path according to the polymorphic routing algorithms,and create routing table entries in corresponding routing tables.

The inter-domain routers compute and announce all the identifiers routes in the domain.The process of computing and announcing the routes is same forLID,HID,SIDandCID.The difference is that the topologies of different identifiers may be different.The header of the routing notification messages adopts routing identification field to distinguish the different the notification messages of different identifiers.Thus,the inter-domain routing calculations and notifications are more complex and frequent than traditional routing.When calculating the inter-domain routing path,we not only consider the path cost,but also consider the service requirements and security level.When announcing the route,the routers not only announce the link state information of itself,but also announce the service capability and security level which they can provide.The polymorphic routing model can dynamically build service path for different applications which have different service requirements and security levels according to the notified information.The intra-domain routers are responsible for exchanging and maintaining the intra-domain route information.To build a service path which meets the service requirement and security level in intra-domain,is to find intra-domain routers that satisfy the requirements.The maintenance and update of routing information are similar to the process in inter-domain.

2) Table entries maintenance

To support the coexistence of multiple principals,the routers need to support multiple Forwarding Information Bases (FIBs).The routers can be reconfigurable and scalable to support new principals.As shown in figure 7,the currently definedFIBsare as follows.

● Location Forwarding Information Base (LFIB)LFIBrecords the forwarding information reaching a specific location such as IP address.LFIBis equivalent to the traditional Forwarding Information Base.

● Host Forwarding Information Base (HFIB)HFIBrecords the forwarding information reaching a specific host.The size ofHFIBis equivalent to the number of hosts in network.

● Service Forwarding Information Base (SFIB)SFIBrecords the forwarding information reaching a specific service.The size ofSFIBis equivalent to the number of services in network.

● Content Forwarding Information Base (CFIB)CFIBrecords the forwarding information reaching a specific content.The size ofCFIBis equivalent to the number of contents which are published in network.

3) Instances

As shown in table 2,the polymorphic routing model can derive multiple addressing and routing patterns which support multiple principals based on unified addressing format.To detailedly explain the operation process of the addressing and routing,we take the service and host communication as examples to illustrate the polymorphic routing protocols in this subsection.

● service communication principal

The service communication refers to that the service requester send service request to service provider,and the service provider accomplishes service and get back the data to service requester.The process of communication is bidirectional.

Fig.7.The FIB structure.

Table II.Addressing and routing mode.

Fig.8.Service communication instance.

Fig.9.Host communication instance.

As shown in figure 8,hostHID1sends a service request packet,which encapsulatesSID1as destination identifier,andHID1as source identifier,to the network.The router R1 arbitrates ID based on ID type,and lookups forwarding information in SFIB.The intermediate routers implement the same operations and deliver the packet to the destination serverSID1.After receiving the service request,the server performs corresponding operation for the request,such as data compression and content search,and generates response packets which encapsulateHID1as destination identifier andSID1as source identifier.The router R5 arbitrates ID based on ID type,and lookups forwarding information inHFIB.The intermediate routers implement the same operations and deliver the packets to the destination hostHID1.Then,communication is terminated.

● Host communication principal

The host communication refers to the communication between host and host.The communication principal is host.The source host directly sends data to the destination host,and makes receiving data successfully as the communication purpose.The communication process is shown in figure 9.

As shown in figure 10,hostHID2sends a communication packet,which encapsulatesHID1as destination identifier,andHID2as source identifier,to the network.The router R1 arbitrates ID based on ID type,and lookups forwarding information inHFIB.The intermediate routers implement the same operations and deliver the packet to the destination hostHID1.After receiving the communication packet from hostHID2,the hostHID1generates response packets or data packets,which encapsulateHID2as destination identifier andHID1as source identifier,to the hostHID2.The communication process continues to communication terminated.

V.EXPERIMENT EVALUATION

In this section,we first evaluate the feasibility of supporting the dynamic polymorphic derivation and dynamic reconfigurable in the polymorphic routing model.Then,to evaluate polymorphic routing model support for future network requirements,we present three cases studies of supporting different communication principals that we implemented with the polymorphic routing model.

In evaluation of the feasibility,we evaluate the following three aspects of the polymorphic routing model:(i) Efficiency of polymorphic derivation:Can polymorphic routing model efficiently derive polymorphic routing protocols? We show that using reinforcement learning techniques,polymorphic routing model can efficiently derive polymorphic routing protocol which satisfies the requirement of applications.(ii) Delay of polymorphic derivation:How long time can polymorphic routing model accomplish the polymorphic derivation process? We show that the delay of polymorphic derivation is very short and acceptable for customizing service path.(iii) Dynamic adaptivity to changes:How does polymorphic routing model adapt to the changeable environment? We show that polymorphic routing

model can adapt well to change for the environment.

To simulate polymorphic derivation of polymorphic routing model,we randomly generated PD-MDP transition graphs.Each simulated PD-MDP has a signal initial state and a signal terminal state.The number of state nodes in a PD-MDP ranges between 10 and 50,and the number of optional next state for each state ranges between 1 and 4.The number of optional routing atomic capabilities in a state varies from 1 to 4.To better simulate the real state,we assign each routing atomic capability in a simulated PD-MDP graph with random reward values which follow normal distribution.To evaluate the dynamic adaptivity,we simulate the dynamic environment by varying the reward values of routing atomic capabilities periodically based on a certain frequency.In practical application,the reward values should be computed according to the reward function definition in section 4.

In the experiment,the Q-learning introduced in section 4 is applied to execute the simulated polymorphic derivation.Without special announcement,the discount factor g of the Q-learning is set to 0.8,the learning rate a is set to 0.2 and the greed factorεis set to 0.7.The number of the learning episode is set to 1000.Without loss of generality,each experiment is carried out ten times and makes the average values as results.Our experiments were performed on a 2.60GHz Intel Core4 PC with 4GB RAM.

5.1 Feasibility

1) Efficiency of Polymorphic derivation

Polymorphic routing model should achieve customizing routing protocols for different requirements of applications.In the experiments,we evaluate whether polymorphic routing model can derive polymorphic routing protocols by polymorphic derivation.We assume that the polymorphic routing model has given the knowledge about the RSD of the routing atomic capabilities,and let the Q-learner guide the polymorphic derivation to gradually reach the optimal policy.In this section,we conduct two sets of experiments to evaluate the efficiency of polymorphic derivation.

The first experiment is to evaluate the efficiency of polymorphic derivation with different learning rates.In the experiment,we fix the number of states of the simulated PDMAP to 10 and the number of optional next state for each state to 4.We evaluate the mean cumulative reward in different learning rates varying from 0.1 to 0.6.For each learning rate a,the learning was performed for 1000 episodes.Figure 10 shows the mean cumulative reward of each episode in different learning rate.From the figure,we can see that the mean cumulative rewards of allacan converge to near-optimal values.The results also show that a higher learning rate can accelerate the polymorphic derivation process.However,a higher learning rate is more likely to converge to a local optimality.

Fig.10.Efficiency of polymorphic derivation with different learning rates.

Fig.11.Efficiency of polymorphic derivation with different discount factors.

Fig.12.Convergence time with the number of states.

The second experiment is to evaluate the efficiency of polymorphic derivation with different discount factors.The discount factor controls the relative weighting of reward that happen sooner and those that happen later.The largera,the distant rewards are more important,and,typically,the more difficult the optimization problem.In the experiment,we fix the number of states of the simulated PD-MAP to 10 and the number of optional next state for each state to 4.Considering the tradeoff between speed and effectiveness,we set the learning rateato 0.2.We study the mean cumulative reward in different discount factors varying from o.4 to 0.8.For each discount factor g,the learning was performed for 1000 episodes.As shown in figure 11,the converged time varies with the discount factors.The larger g,the longer is the converged time.The results are consistent with the deductions of the theory.

2) Delay of Polymorphic derivation

The polymorphic derivation applies Q-learning to derive customized polymorphic routing protocols.In these experiments,we evaluate the delay of polymorphic derivation and compare with other methods such as value iteration and backward value iteration.In the first experiment,we fix the number of routing atomic capabilities of each state to 4 and vary the number of states in a PD-MDP graph from 10 to 40.We observe how fast the mean cumulative rewards converge to optimal value in three different methods.The results are shown in figure 12.From the figure,we can see that the convergence time increases polynomially with the number of states,and the convergence time of Q-learning is obviously lower than other two methods.In the same condition,because of the efficiency of Q-learning,the polymorphic derivation can be accomplished much faster.

In the second experiment,we fix the number of states to 10,and vary the number of alternative routing atomic capabilities in each state from 1 to 4.The results are shown in figure 13.From the figure,we can see that the convergence time of three methods almost linearly with the number of routing atomic capabilities.The results also show that the convergence time of Q-learning is excellent than other two methods because of the efficiency of Q-learning.

3) Dynamic adaptivity to changes

In the practical application,the environment is complex and changeable owing to the dynamically changeable resources.The polymorphic routing model should adapt to the changes of the environment.So,we also study the dynamic adaptivity of the polymorphic derivation.We implement two experiments to evaluate how well the polymorphic routing model adapts to the changes of the environment.

In the first experiment,we simulate the changes of the environment by changing the security and performance attributes of the routing atomic capabilities periodically.Support that the polymorphic routing model has initial knowledge about the routing atomic capabilities’ attributes,and let the attributes of the routing atomic capability change in three kinds of frequencies during the learning process.The frequencies are respectively set 1%,3% and 10% for every 100 episodes of learning.We observe how these changes would influence the effectiveness of polymorphic derivation.

The results show the growth of the mean cumulative reward during different changing frequencies in the figure 14.From the figure,we can see that by increasing the change rate we can delay polymorphic derivation to reach its optimal execution policy.However,the changes do not stop the optimization process.The execution policy is still being continually optimized when the learning goes on.

In the second experiment,we evaluate how initial knowledge of the routing atomic capabilities’ attributes can accelerate the polymorphic derivation process.We study the influence of initial knowledge in three cases.The first case is that all routing atomic capabilities’ attributes are known.The second case is that 50% of routing atomic capabilities’ attributes are known.The third case is that all routing atomic capabilities’ attributes are unknown.The initial method of Q-function is introduced in section 4.2 during the learning.The results are shown in figure 15.From the figure,we can see that the polymorphic derivation can be significantly speed by exploring the prior knowledge about the routing atomic capabilities.

5.2 Case studies

We have evaluated the feasibility of polymorphic routing model from the perspective of polymorphic derivation.In this section,we study the performance of polymorphic routing model for supporting applications by implementing and evaluating several application cases.While it is hard to quantitatively measure a model’s support for applications,we demonstrate that polymorphic routing model can retain many desirable features of the current Internet and subsume the benefits of other future networks.Through these studies,we show that polymorphic routing model can meet multiple communication requirements,accommodate network evolution and accelerate application performance.

Fig.13.Convergence time with the routing atomic capabilities in one state.

Fig.14.Influence of changes on polymorphic derivation.

Fig.16.The comparison of single content transfer delay.

To evaluate the performance of polymorphic routing model for supporting applications,we choose several typical applications such as content support,service migration and mobility support.To closely model realistic application behavior,we implement simple version of polymorphic routing model using Linux where support content and service principles,and we also built a simple simulation environment.The NSFNET which has 14 nodes and 21 links,is used as the simulation topology.Each link is set a bandwidth of 100Mbps,and the delay of link is 10ms.

1) Support content application

To evaluate the support for content applications,we study two scenarios:single content transfer delay and average network transfer delay.For the single transfer delay,we choose a node as content requester and a node as content provider.Then,we use tradition network routing which runs OSPF,and polymorphic routing model respectively to transmit content.Finally,we compare the single content transfer delay in the two contexts.For the average network transfer delay,we choose a node as content provider and other nodes as requesters.The arriving request datagram obeys Poisson process.The average sending rate is 10 datagrams in each second.The content provider stores 1000 contents.The chunk sequence is set to 1-1000,and the size of chunk is 1024 bytes.The probability distribution of content requests follows theZipfdistribution with parameter 10.Each simulation lasts 30s.Then,we respectively run tradition network routing and polymorphic routing model,and compare the average network transfer delay in the two contexts.

Figure 16 shows the single content transfer delay in two contexts.From the figure,we can see that the single content transfer delay in polymorphic routing model is obviously lower than the tradition network routing.In the tradition network,if we want to retrieve content,we have to request the web service where the content stores.So,the request contains two steps.The first step is to request web service.After the web service responding,we implement the second step to request content from the web.Conversely,the polymorphic routing model supports content principal,and we can directly request content based on content identifiers.This greatly reduces the delay of content request even though we have to resolute the identifier and locator mapping from the mapping system.

To accommodate evolution,the polymorphic routing model supports in-network cache and content communication principal.Figure 17 shows the average network transfer delay in two contexts.Form the figure,we can see that the average network transfer delay keeps straightly at 43.5ms in the traditional network.This is because network nodes do not support content cache and content communication principal in tradition network.Compared with the tradition network,the polymorphic routing model which uses its support cache and multiple principals,can reduce the average network transfer delay when there are lots of requests for same contents by 25%.

2) Support service application

As shown in the previous section,the polymorphic routing model supports service communication principal based onSID.The service migration is ever-present in service communication process.To evaluate our service migration support,we run service on virtual machine,which initially resides in a network node in the simulation topology,and later migrates to another network node.We apply existing virtual machine live migration technology to implement stateful migration of a running service.Figure 18 shows the timeline of service migration.The service requester makes continual requests to the service,who responds to these requests.When service migration begins,the service normally responds to the service request,but the underlying virtual machine starts copying its state to the new network node in the background.When most of states have been migrated,the service needs to be frozen to perform the final state transfer.From the figure,we can see that the frozen time lasts from 100ms to 392ms.In this period,service requests cannot be responded,and are cached to redirect to the new service node after migration over.After migrating to the new network node,the service can respond to the cached request packets,but new service requests are not responded.This is because thatSFIBshould be updated after service migration,and the service usesHFIBto respond to the cached requests.After updating theSFIB,service requester can communicate with the service provider based on updatedSFIB.

3) Support mobility

Fig.17.The comparison of average network transfer delay.

Fig.18.Service migration.

In this case,we study mobility support of the polymorphic routing model.The mobility support contains two situations:intra-domain and inter-domain.As shown in figure 19,the intra-domain handover process contains three steps.Firstly,new TR detects the attachment of the mobility node (MN).The detection delay isTr.Secondly,TR1 sends a update request to MS1 to register the mapping from MN’s ID onto TR1’s locator.The transfer delay of update request isTu.Finally,after receiving the update request,MS1 send a response message indicating that the mapping has been updated,and the transfer delay isTu.Therefore,the handover delay is calculated as follows:

Fig.19.Mobility handover process.

Fig.20.Mobility handover delay.

Now,we consider a handover of an MN from an old TR in a domain to a new TR in another domain.As shown in figure 19,the inter-domain handover process contains six steps.Firstly,the new TR detects the MN,and the detection delay isTr.Secondly,TR2 sends a registration request to MS1 to register the mapping from MN’s ID onto TR2’s locator.The transfer delay isTu.Thirdly,MS2 sends an update message to mapping system to update the mapping from MN’s ID onto MS2’ locator,and the update delay isTu.Fourthly,after updating the mapping,the mapping system sends update response messages to MS1 and MS2 respectively.The transfer delay isTu.Fifthly,MS1 sends an update message of the mapping from MN’s ID onto TR2’ locator to MS2,and the delay isTx.Finally,MS1 sends an update message to TR2 indicating that the update has accomplished.Therefore,the handover delay is calculated as follows:

In figure 20,we plot the handover delay whenTr= 30ms,Tx= 120ms,and Tu varies from 0 to 120ms.Notice that we setTr30ms since that the detection delay is fairly small in practice.We setTx120ms since theTxis the delay between two MSs in different domains and the average end to end delay in the current Internet is about 115ms.From figure 20,we observe that the intra-domain handover delay is between 30ms and 200ms whenTuvaries from 1 to 100ms.The intra-domain handover delay is fairly small.We also can see that the inter-domain handover delay is between 200ms and 600ms whenTuvaries from 1 to 100ms.Comparing with the handover delay of mobile IPv6,the delay is significantly smaller.

VI.CONCLUSION

In order to promote the capability of network transfer and routing service,we propose a new routing model,called polymorphic routing model which can dynamically derive customized routing protocols for diversified applications.The model splits complex routing function into its constituents and derives customized routing mechanisms by polymorphic derivation.To support efficient polymorphic derivation,we model the polymorphic derivation process as a MDP,apply reinforcement learning to solve an optimal or near-optimal derivation policy.The results from our experiments show that this approach is feasible and can derive customized routing protocols efficiently.Several cases studies also prove that the model enhances network flexibility,enables network innovation and has the potential to support and amalgamate diverse sets of ideas from other clean-slate designs.However,this being a first try to design future networks from the perspective of routing and addressing,the polymorphic routing model needs to be deeply studies and widely proved.These also are our next step works.The model presents an important step toward a flexible,customized and autonomic Internet of the future.

ACKNOWLEDGMENT

This work was supported in part by the Cernet Network (NGII20160103),National Natural Science Foundation of China,National Natural Science Foundation of China(No.61672471),Fundamental Research Funds for the He’nan Province University (No.17KYYWF0202),He’nan Province University science and technology innovation team(No.18IRTSTHN012) and Plan For Scientific Innovation Talent of Henan Province (No.184200510010) Zhengzhou University of Light Industry Doctoral Fund (2016BSJJ041) funding.