Ziyang XING ,Hui QI ,Xiaoqiang DI,3 ,Jinyao LIU ,Rui XU ,Jing CHEN,Ligang CONG
1Jilin Key Laboratory of Network and Information Security,Changchun 130022,China
2School of Computer Science and Technology,Changchun University of Science and Technology,Changchun 130022,China
3Information Center,Changchun University of Science and Technology,Changchun 130022,China
Abstract: With the reduction in manufacturing and launch costs of low Earth orbit satellites and the advantages of large coverage and high data transmission rates, satellites have become an important part of data transmission in air-ground networks. However, due to factors such as geographical location and people’s living habits, differences in users’ demand for multimedia data result in unbalanced network traffic, which may lead to network congestion and affect data transmission. In addition, in traditional satellite network transmission, network information acquisition converges slowly and global network information cannot be collected in a fine-grained manner, which hinders the calculation of optimal routes, so service quality requirements cannot be satisfied when multiple service requests are made. To address these problems, in this paper artificial intelligence technology is applied to the satellite network: a software-defined network is used to obtain the global network information and perceive network traffic, comprehensive decisions are made online through reinforcement learning, and the optimal routing strategy is updated in real time. Simulation results show that the proposed reinforcement learning algorithm has good convergence performance and strong generalizability. Compared with traditional routing, the throughput is 8% higher, and the proposed method has load balancing characteristics.
Key words: Software-defined network (SDN);Quick user datagram protocol Internet connection (QUIC);
Low Earth orbit (LEO) satellite networks provide uninterrupted data transmission to users in various Earth environments through full coverage. They have the advantages of wide coverage, low cost, and flexible deployment, and can be widely used (Jia et al., 2020). The current representative LEO satellite networks are Motorola’s iridium satellite mobile communication network, LQSS’s Globalstar mobile communication satellite system, and the satellite networks provided mainly by SpaceX, OneWeb, etc. As shown in Fig. 1, the iridium constellation can be used by ground and air users to communicate with other users without relying on the ground network, but the distribution of the Earth’s population is not uniform. Most of the world population is distributed in easily habitable areas, and the population in harsh areas (oceans, deserts, and mountains) is small. Users in different geographical locations have significant data-demand differences, which leads to unbalanced network traffic, network congestion, and even data transmission failure. In addition, the topology of LEO satellites changes dynamically, the constellation is large, and a global view of the network is difficult to obtain. Real-time traffic cannot be obtained in a fine-grained manner (Zhang et al., 2020), which affects optimal routing decisions and ultimately transmission performance.
Fig.1 Low Earth orbit satellite and multi-user communication architecture
In addition,the service demands of different users in the LEO satellite network are different.Most of the traditional routing algorithms are based on the minimum number of hops,as shown in Fig.2,and do not distinguish the type of service demand.Based exclusively on source Internet protocol (IP)and destination IP,selecting one path causes the data streams of all service demands to be mixed together;that is,the paths of all service demands between the source IP and the destination IP are basically the same,which increases the burden of some paths and easily causes network congestion.The unselected path is always in an idle state,and the link utilization rate is low.In summary,it is necessary to obtain the network state in real time and dynamically generate the optimal route according to the real-time service demand and network state,to ensure the service quality of users with different needs.
Fig.2 Multi-service demand routing: (a) different service demands are mixed together;(b)multiple services are demand-oriented (IP: Internet protocol)
Software-defined network (SDN) separates the control plane from the data plane,realizes function virtualization through programming,and can customize the functions to be implemented.The controller can manage the whole network in a unified manner,and the device hardware can perceive the global information.Using the multipath quick user datagram protocol Internet connection (MPQUIC)packet header extension to distinguish different services,deep reinforcement learning generates reward values through continuous interaction with the environment,makes dynamic decisions,and can achieve multi-service demand routing.In this study,SDNs and deep reinforcement learning are used to solve the above problems.
The main contributions of this paper are as follows:
1.A sketch-based network traffic analysis method based on an SDN has been proposed to realize fine-grained operations,such as classification of traffic and estimation of large or small flows.
2.We propose an artificial intelligence multiservice driven and traffic-aware routing scheme to convert different service demands and network states into reinforcement learning reward functions to achieve real-time dynamic routing.
3.To verify the multipath routing of the LEO satellite network based on service demand and traffic awareness,the iridium constellation is designed with a dynamic topology.After a large number of simulations,the throughput is improved by 8%.
The essence of multipath routing in LEO satellite networks is to select a suitable path from the source host to the destination host via the constellation,to allocate substreams according to the strategy,and to transmit data through multiple paths.Therefore,the overall transmission performance can be improved only when the selected path cooperates with the transmission subflow.Based on these issues,in this paper we present the research from the following aspects:
The routing in the current satellite network does not support serving demand for multiple services at the same time.When there are multiple service requests,the transport layer protocol needs to be modified to meet the demands of multi-service characteristics,and the service demands cannot be adapted.The current routing research is divided mainly into two categories.One routing research category concerns allocating a reasonable number of subflows to avoid the subflow collisions that form bottleneck links.For example,Gao et al.(2019)proposed a method for low bandwidth utilization of multipath transmission control protocol (MPTCP)links.QSMPS,an SDN multipath routing optimization mechanism aimed at improving quality of service (QoS),adjusts the number of subflows according to service demands,which can significantly improve throughput.SDNs have been standardized by IETF(https://www.rfc-editor.org/info/rfc7426).Many scholars have studied software-defined satellite networks using software-defined abstract network hardware devices,which are no longer constrained by traditional TCP/IP network physical devices or business logic,and eliminate the hardware architecture for the entire network transmission.Through programming,SDN can realize the basic functions and customize the network according to the actual service demand.Liu ZG et al.(2020) proposed a software-defined information-centric satellite network (SDICSN) based on an SDN and an information-centric network (ICN),which aimed at the problems of low routing efficiency and complex satellite network control process.The network architecture,based on the periodicity and predictability of the satellite network,reduced the time complexity of the routing algorithm.Li et al.(2022) proposed the joint routing and task placement (JRTP) algorithm to optimize the number of transmission paths in an integrated air-ground network.Through the task topology graph model,the route and task placement were jointly determined,and the route 
problem was transformed into the classic shortest path problem.The second routing research category focuses on whether the allocation data on the selected path are overloaded and cause congestion.For example,Wang et al.(2021)applied SDN to obtain the global state of the network and used traffic prediction based on the autoregressive moving average model.The method classifies the data of the real-time distribution network nodes of the link,improves the throughput,realizes the adaptation of the bandwidth required by different services to the distribution path,and avoids network congestion.In summary,the current research on SDNs in satellite network routing still focuses on traditional performance optimization,cannot cope with multi-service demands,lacks intelligent routing,and cannot obtain the best route in real time when the load changes significantly(unbalanced resource distribution).Due to high dynamics,limited bandwidth,and slow convergence of LEO satellite networks,traditional routing optimization algorithms need to rely on specific network environments and states,and their generalization capabilities are weak.
The multipath transmission protocols include MPTCP and MPQUIC. MPQUIC can implement more refined data flow operations (Rabitsch et al., 2018). The advantages of QUIC (https://www.rfc-editor.org/info/rfc9000) include 0-RTT, forward error correction, flexible congestion control, and connection migration. The unique connection ID in the connection mechanism is more suitable for satellite networks. Many scholars have studied the transmission performance of QUIC in 5G networks (Mogensen et al., 2019) and satellite networks (Yang WJ et al., 2021). Yang SY et al. (2018) analyzed the performance of QUIC in satellite networks and compared the performances of QUIC and TCP in LEO and geosynchronous Earth orbit (GEO). The experimental results showed that QUIC has better transmission performance than traditional TCP by virtue of the 0-RTT handshake advantage. Arfeen and Uddin (2020) applied QUIC to the air-sea network and compared the page-loading speed for different browsers. The experimental results showed that in most cases, the throughput of QUIC is better than those of TCP and SPDY. Shi et al. (2021) designed a QUIC-based MPDTP system according to the characteristics of satellite networks. MPDTP determines whether to send redundant data packets according to the time when the acknowledgement (ACK) returns data packets. Kuhn et al. (2020) summarized the advantages of QUIC in satellite network communication from the perspective of network operators. The new QUIC protocol can address many challenges in LEO satellite networks and has advantages in many aspects, such as periodic on-off of satellite links, bandwidth saving, multipath routing, and secure transmission (Bujari et al., 2020). Liu D et al. (2022) proposed a deep learning routing algorithm to predict a network topology to improve the QoS of satellite paths and to address the inability of traditional air-ground network routing schemes to meet the needs of heterogeneous services. Han et al. (2020), aiming at the high dynamics affecting the routing problem in
satellite networks,proposed a routing algorithm of deep reinforcement learning to obtain a subset of available routes,with lower routing cost and better anti-jamming performance.In addition,some scholars have applied reinforcement learning to save energy.Liu JH et al.(2021)found that in a giant constellation,the incorrect use of satellite batteries in the routing phase may increase energy consumption and quickly lead to node failure.A new deep reinforcement learning based high-efficiency and energy-saving routing protocol,DRL-ER,avoids the battery energy imbalance of constellations and can extend the lifespan of giant constellations.
In summary, some research results have been achieved on multipath routing in the current LEO satellite network. However, due to the periodic on-off of the constellation topology and unbalanced traffic distribution, the controller cannot realize fine-grained operation of the traffic, and the demand adaptability is also insufficient, so multipath routing cannot be achieved well, which affects the multipath transmission performance of MPQUIC. In this study, an SDN controller is used to obtain the state of the entire network, and a sketch is used to calculate the size of the data stream. The MPQUIC data packet header extension field distinguishes different services to deal with the above drawbacks.
The system architecture is shown in Fig.3,consisting of an SDN controller,some switches supporting OpenFlow,a client,and a server.The transport layer uses the MPQUIC protocol,the SDN controller manages the switches through the OpenFlow protocol,and all switches install count-min sketch(CMS).
Fig.3 System architecture (a client sends a request to the remote server,and the server will push data to a client with multiple paths by the request)
The SDN controller collects the basic information of each switch node (network topology,remaining bandwidth,etc.),deploys a demand-driven and traffic-aware routing algorithm based on deep reinforcement learning,and uses CMS to obtain the MPQUIC flow size.
When the client sends a request to the server, it distinguishes specific services by setting different service demand flags (multi-service flags in Fig. 3). The multi-service flags are stored in the extension field of the QUIC packet header. To achieve service-oriented transmission, when the client initiates a request to the server, the user can set three service demands on the client (low latency, high bandwidth, and unlimited service demand), which are marked as D, B, and O, respectively. When the user’s client multi-service flag is set to D, it means that the current service request requires low-latency transmission; when the multi-service flag is set to B, it means that the current service request requires high-bandwidth transmission; when the multi-service flag is O or the flag is not set, it means that there is no demand for bandwidth or delay, and round-robin transmission is used.
The structure of the QUIC data packet header is shown in Fig.4,where [Header Extensions] is a 16-bit extension used to store multipleServicesFlag.ConnectionID is the MPQUIC connection identifier,a globally unique 64-bit number generated by the client to identify the connection.SourceIP and destinationIP are the source IP and destination IP of this connection,respectively.FlowID is the number of data flows,used to identify different data streams in the same connection.PacketNumber is the data packet number,used to identify the number of data packets transmitted in a data stream.
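To make the header description above concrete, the following is a minimal Python sketch of packing and parsing the public fields. Only the 64-bit ConnectionID and the 16-bit extension are stated in the text; the field order, the 8-bit FlowID, the 32-bit PacketNumber, and the numeric codes chosen for the D, B, and O flags are all assumptions made for illustration, not the actual MPQUIC wire format.

```python
import struct
from dataclasses import dataclass

# Hypothetical numeric codes: the text names the flags D, B, and O
# but does not give their on-the-wire encoding.
FLAG_CODES = {"O": 0, "D": 1, "B": 2}
CODE_FLAGS = {code: flag for flag, code in FLAG_CODES.items()}

# Assumed layout (network byte order): 64-bit ConnectionID, 8-bit FlowID,
# 32-bit PacketNumber, 16-bit header extension carrying the flag.
HEADER_FORMAT = "!QBIH"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


@dataclass
class PublicHeader:
    connection_id: int
    flow_id: int
    packet_number: int
    service_flag: str = "O"

    def pack(self) -> bytes:
        """Serialize the public header fields into bytes."""
        return struct.pack(
            HEADER_FORMAT,
            self.connection_id,
            self.flow_id,
            self.packet_number,
            FLAG_CODES[self.service_flag],
        )

    @classmethod
    def unpack(cls, data: bytes) -> "PublicHeader":
        """Parse the public header; unknown flag codes fall back to O."""
        cid, fid, pn, code = struct.unpack(HEADER_FORMAT, data[:HEADER_SIZE])
        return cls(cid, fid, pn, CODE_FLAGS.get(code, "O"))


header = PublicHeader(0x1122334455667788, flow_id=3, packet_number=42,
                      service_flag="D")
raw = header.pack()
parsed = PublicHeader.unpack(raw)
```

Treating an unrecognized extension value as O mirrors the rule in the text that an unset flag means no bandwidth or delay demand.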
Fig.4 Location of the multi-service flag in QUIC packets
The above MPQUIC packet header fields are in a public and unencrypted state,and can be parsed and extracted by the controller to be used as input parameters for online learning of routing decisions(https://datatracker.ietf.org/doc/draft-xingalto-sdn-controller-aware-mptcp-mpquic/).
Knowing the size of the network data flow is extremely important for multipath routing. The current SDN controller link layer discovery protocol (LLDP) is not fine-grained enough to obtain network state information. The method of querying the number of packets through the REST API (https://floodlight.atlassian.net/wiki/spaces/floodlightcontroller/pages/1343539/Floodlight+REST+API) to obtain network state information requires a large number of flow table entries (Huo et al., 2022), so it cannot obtain network state information in larger-scale satellite networks. In the process of multipath transmission, the path is selected according to the transmission path bandwidth, round-trip delay, packet loss rate, and attributes of the transmission data stream (elephant stream, mouse stream, etc.). A sound routing algorithm should estimate the transmission performance of the path and the size of the data waiting to be transmitted, to avoid network congestion or path load imbalance caused by the mutual influence of elephant flows and mouse flows. Commonly used data flow measurement methods include machine learning, sketch, and so on. Because the data flow needs to be sensed in real time, the measurement method must occupy little memory, have low complexity and high reading speed, and measure the data flow rate with high accuracy.
A sketch is a hash-type data structure (Tang et al., 2019) that is widely used in network traffic measurement. It has the advantages of requiring less memory space, quick deployment, and highly accurate measurement results. The principle of the sketch’s measurement of data flow is shown in Fig. 5. Each of the d arrays consists of w counters (Ya et al., 2021). When the data flow arrives, the data packet ID is added to the corresponding position of the counter in the sketch after the hash function h(i) operation, from which the size of the data flow is obtained. A sketch is a method for estimating the frequency of an element in a data set. It uses a multi-dimensional array of counters. When an element is added, the hash function is used to calculate its position in the counter array and update the count. The disadvantage is that different elements will collide when the hash calculation gives them the same value. When we want to query the occurrence frequency of an element, we need only to use the hash calculation to find its corresponding counter. Because of possible collisions, there is a certain error in the sketch method of counting the occurrence frequency, and the result is never lower than the true value. The sketch statistical method is fast, requires less space, and has been widely used in data flow statistics, especially in statistics with large data volumes. Exact lookup structures (such as HashMap, binary search tree, and binary sort tree) are limited by the memory size, so they cannot complete data flow statistics well in satellite networks.
Fig.5 Sketch data flow measurement principle
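The measurement principle above can be illustrated with a minimal count-min sketch in Python. This is a textbook construction under stated assumptions, not the exact hash function of this paper: the row hashes take the common form ((a·key + b) mod p) mod w, and the depth, width, prime, and flow keys are hypothetical values.

```python
import random


class CountMinSketch:
    """Count-min sketch: `depth` rows of `width` counters, one hash per row."""

    def __init__(self, depth=4, width=1024, prime=2_147_483_647, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.prime = prime
        self.rows = [[0] * width for _ in range(depth)]
        # One (a, b) pair per row for the hash ((a*key + b) mod p) mod width.
        self.params = [(rng.randrange(1, prime), rng.randrange(prime))
                       for _ in range(depth)]

    def _index(self, row, key):
        a, b = self.params[row]
        return ((a * hash(key) + b) % self.prime) % self.width

    def update(self, key, count=1):
        """Add `count` to the key's counter in every row."""
        for r, counters in enumerate(self.rows):
            counters[self._index(r, key)] += count

    def query(self, key):
        """Return the minimum counter over all rows (never an underestimate)."""
        return min(counters[self._index(r, key)]
                   for r, counters in enumerate(self.rows))


# Hypothetical flows keyed by (connectionID, flowID).
cms = CountMinSketch()
elephant = (0xABCDEF, 1)
mouse = (0xABCDEF, 2)
for _ in range(1000):
    cms.update(elephant)
for _ in range(3):
    cms.update(mouse)
```

Taking the minimum across rows limits the effect of collisions: a colliding flow inflates the estimate only if it collides in every row, which is why the estimate can exceed but never fall below the true count.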
Liu LT et al.(2021)designed an FO-Sketch with hierarchical storage to address the difficulty of global traffic statistics and classification in cloud environments.Murua and Reviriego(2020)applied CMS to detect elephant flow and mouse flow.Because the LEO satellite network also has mixed flows such as elephant flow and mouse flow,we design a flow measurement framework (Fig.6) based on OpenSketch(Yu et al.,2013) and deploy CMS in all switches.Before the measurement,the controller sets up the switches as needed,and the switches implement the statistics of the data flow according to the algorithm and report the results to the controller.
Fig.6 Software-defined network (SDN) application sketch flow measurement framework
The whole process of calculating the data flow size in a sketch is shown in Fig.7.It consists of flow table entries,CMS,flow quotient filter,and so on.Flow table entries store the basic structure of the data flow,CMS counts the number of data packets in a data stream,and the flow quotient filter stores the statistical results of CMS,wherein the CMS steps for implementing data packet statistics are as follows:
Fig.7 Principle of calculating the MPQUIC data flow size in a sketch
Suppose that the structure of CMS is a two-dimensional array CSC[x, y] whose initial values are 0, where x is the number of rows and y is the number of columns. The statistical counters are counter[1, 1], counter[1, 2], ···, counter[x, y], and elements are mapped by the hash(element) function, of which there are x instances, namely, hash(element)_1, hash(element)_2, ···, hash(element)_x. The construction method of the hash(element) function is as follows:
where i and j are randomly generated integers in the range [0, x+y], “element” is the element to be calculated, p is a hash constant, and “mod” is the modulo operation. It can be seen from the above that the result of hash(element)_row is mapped in [1, x].
When a new element (row_time, value_time) arrives, after calculation by hash(element)_row, the statistical counter is updated to
When the element is updated,the query method is as follows:
The streaming quotient filter,similar to the bloom filter,can quickly retrieve the number of elements in a large-scale database.The fewer the hash functions used,the shorter the query time.When a certain element needs to be counted,its quotient and remainder are calculated and stored in the specified slot.When other new remainders are encountered in the future,it is necessary to compare only the size with the value stored in the slot.
When we want to query the size of the MPQUIC flow,we read only the results saved in the flow quotient filter according to the connectionID,flowID,and packetNumber.
To verify the performance of the sketch flow measurement in this study, the method in which the SDN controller collects the flow table through the REST API is compared with the OpenSketch method. As shown in Fig. 8a, the X axis is the number of switches, and the Y axis is the transmitted data. Because the controller REST API method of saving traffic needs to occupy many flow table entries, as the number of switches increases, the ability to count the number of data packets per unit time is reduced. In the sketch method, the performance of processing data packets is stable, and there is no performance degradation. When the number of switches reaches 300, the performance gap between these two methods is the largest. As shown in Fig. 8b, the X axis is the number of flows, which means the size of the data stream to be measured, and the Y axis is the runtime of the measurement scheme. The controller REST API method is limited by computational complexity and space, and its runtime is much higher than that of the sketch method. The runtime of the sketch method is stable and is less affected by the size of the data flow.
Fig.8 Comparison of the performance of non-sketch and sketch methods in SDN:(a)transmitted data;(b)runtime
In summary,the SDN sketch flow measurement in this study can realize tasks such as traffic statistics and classification in the LEO satellite network.
The routing decision online learning module is the core of the entire architecture. As shown in Fig. 9, routing decision online learning uses deep reinforcement learning to generate real-time optimal routing according to the network state information collected in the network state collection module. First, the agent extracts features through the convolution layer of the neural network according to the previously obtained state, and then maps them into the probability of a certain action by the fully connected layer. To reuse important previous experience, an experience replay mechanism is introduced. This mechanism breaks the correlation between consecutive samples, which would otherwise affect the training. Finally, the transmission path is output according to Algorithm 1, and the flow rules are sent to the switch in the SDN.
Fig.9 Deep deterministic policy gradient procedure for routing decisions: (a) online learning of routing decisions;(b) actor network and critic network
The process of achieving the optimal routing is a continuous control problem. Unlike a finite set of discrete actions, the continuous action space is a continuous set. The commonly used algorithm for dealing with discrete actions is deep Q-network (DQN), which outputs an n-dimensional vector of values for n actions, but DQN is not suitable for dealing with continuous control spaces, which are handled with a policy network instead. The principle of the discrete problem is shown in Fig. 10.
Fig.10 Deep Q-network(DQN)and policy network to solve discrete control problems: (a) DQN for discrete action space;(b) policy network for discrete action space
As shown in Fig. 11, the deep deterministic policy gradient (DDPG) consists of a policy network (called an actor) and a value network (called a critic). The actor generates action a according to state s, a=π(s;θ). The value network scores the action a, denoted as q(s,a;w). The value network has two inputs: one is the state s and the other is the action a. The output is an evaluation of a certain action, and the better the action, the greater the output value. DDPG retains the self-learning advantages of DQN.
Fig.11 Principle of the deep deterministic policy gradient
State: Define state s as composed of the following parts: connectionID, flowID, packetNumber, sourceIP, destinationIP, multipleServicesFlag, and remainingBandwidth:
1.connectionID is the connection number obtained when the controller parses the MPQUIC packet header, and is the unique number of a connection.
2.flowID is the data flow number obtained when the controller parses the MPQUIC data packet header,and is the unique number of the data flow in a connection.
3.packetNumber is the data packet number obtained when the controller parses the MPQUIC data packet header,and is the unique number of the data packet in a data stream in a connection.
4.sourceIP and destinationIP are source IP and destination IP,respectively.
5.multipleServicesFlag is a service demand flag set by the user on the client side.
6.remainingBandwidth is the current remaining bandwidth,obtained by the controller by collecting the CMS on the switch and storing it in the streaming quotient filter.
Action: Define action a as a set of selected paths, namely, pathList={p1,p2,...,pn}. The pathList is generated by the online learning module in the controller, and the flow rules are added to all switches in the network to achieve optimal routing.
Reward: Define the reward R as the feedback after an action is executed, and R_x is the value of the [Header Extensions] field of the MPQUIC packet header parsed by the controller, which is the value of the multi-service flag.
The meaning of each row in Algorithm 1 is as follows:
In lines 1–2, when the service demand is low delay (D), a route is generated based on the shortest path.
In lines 28–30, when the service demand is O or the flag is not set (the flag length is zero), a round-robin route is generated.
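The flag-driven branching described for Algorithm 1 can be sketched in Python as follows. This is a static stand-in for illustration only, not the learned DDPG policy. The text states that D maps to the shortest path and that O or an unset flag maps to round-robin; the handling of B is not reproduced in the text, so the widest-path rule (maximizing the bottleneck remaining bandwidth) used here is an assumption, as are the toy topology and bandwidth values.

```python
import heapq
from collections import deque
from itertools import cycle


def _walk_back(parent, dst):
    """Rebuild a path from a parent map, or return None if dst is unreachable."""
    if dst not in parent:
        return None
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def shortest_path(graph, src, dst):
    """Minimum-hop path by breadth-first search (flag D)."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in graph[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return _walk_back(parent, dst)


def widest_path(graph, src, dst):
    """Path with the largest bottleneck bandwidth (assumed rule for flag B)."""
    best = {src: float("inf")}
    parent = {src: None}
    heap = [(-float("inf"), src)]
    while heap:
        neg_bw, node = heapq.heappop(heap)
        if -neg_bw < best.get(node, 0):
            continue  # stale heap entry
        for nxt, bw in graph[node].items():
            cand = min(-neg_bw, bw)
            if cand > best.get(nxt, 0):
                best[nxt] = cand
                parent[nxt] = node
                heapq.heappush(heap, (-cand, nxt))
    return _walk_back(parent, dst)


def select_route(flag, graph, src, dst, round_robin):
    """Dispatch on the multi-service flag; O, unset, or unknown use round-robin."""
    if flag == "D":
        return shortest_path(graph, src, dst)
    if flag == "B":
        return widest_path(graph, src, dst)
    return next(round_robin)


# Hypothetical topology: node -> {neighbor: remaining bandwidth}.
graph = {
    "A": {"B": 10, "C": 100},
    "B": {"A": 10, "D": 10},
    "C": {"A": 100, "E": 100},
    "E": {"C": 100, "D": 100},
    "D": {"B": 10, "E": 100},
}
candidates = [["A", "B", "D"], ["A", "C", "E", "D"]]
```

In this toy topology, a D request takes the two-hop path through B, while a B request takes the longer path through C and E because its bottleneck bandwidth is higher.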
There are four neural networks in routing decision deep reinforcement learning,namely:
1.Actor network, which is responsible for updating the policy network and selecting an action a according to state s.
2.Critic network, which is responsible for updating the value network and calculating the current Q value.
3.Target actor network, which randomly samples one element from experience replay, takes the next state s′, and chooses its action a′.
4.Target critic network, which is responsible for calculating the target Q value.
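Among them, the target networks are updated by soft update; that is, the target Q network is slowly updated, and its formula is
\[ \omega^{-} \leftarrow \tau\omega + (1-\tau)\omega^{-}, \]
where τ is a relatively small number, ω is a parameter of the critic network, and ω⁻ is the corresponding parameter of the target critic network after the update.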
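The soft update can be sketched in a few lines of Python. The standard DDPG form is assumed here, in which each target parameter moves a fraction τ of the way toward the online critic parameter; the flat parameter lists and the value of τ are illustrative assumptions.

```python
def soft_update(target, online, tau=0.01):
    """Return target parameters moved a fraction tau toward the online ones."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online, target)]


# Hypothetical flat parameter vectors for the critic and its target copy.
online_params = [1.0, 2.0]
target_params = [0.0, 0.0]

one_step = soft_update(target_params, online_params, tau=0.1)

# Repeated soft updates let the target track the online network slowly.
tracked = target_params
for _ in range(200):
    tracked = soft_update(tracked, online_params, tau=0.1)
```

A small τ keeps the target Q value nearly fixed between steps, which is what stabilizes training compared with copying the parameters outright.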
Algorithm 1 is an off-policy algorithm. The inputs are the current topology and network state information, and the output is the selected path set. The action finally chosen is the one that is conducive to load balancing and has the best performance.
The iridium constellation is set in the STK software (https://stksteakhouse.com/),the satellite orbit parameters are exported in Table 1,the network topology is configured in IPMininet(https://ipmininet.readthedocs.io/en/latest/) according to the parameters,and the MPQUIC client and server are interconnected through the iridium constellation.Parameters such as transmission delay and throughput are used to evaluate the performance of the algorithm.
Table 1 Iridium constellation parameters
Computer: Intel® Core™ i5-12400F CPU @ 2.50 GHz with six processors, 16 GB memory. Operating system: Ubuntu 22.04.
MPQUIC version: QUIC-go v0.22.0(https://github.com/lucas-clemente/quic-go).
Controller:Floodlight v1.2(https://github.com/floodlight/floodlight) is deployed in the GEO.The operation period is equal to the rotation period of the Earth,it is stationary relative to the ground station,and the satellite state information in LEO can be obtained.
Analysis tool: Wireshark v3.6.3(https://www.wireshark.org) is used for network protocol analysis,routing analysis,and packet analysis.
Network traffic simulation tool: Mahimahi (http://mahimahi.mit.edu) is used to emulate network parameters in simulations and can record and replay operations.
The simulation ground stations are located in Jiamusi,China,in the Northern Hemisphere,and Mandurah,Australia,in the Southern Hemisphere.
According to the operation law of the iridium constellation, the satellites will be switched every 1 min. When the satellites are running, one satellite may correspond to multiple satellites at the same time. This work ignores satellites that are too far away from a given satellite and discusses only the one-to-one corresponding satellites. After the above analysis, the distance matrix D′ is converted into a visible matrix D.
When there is a link between satellites, it is recorded as 1; otherwise, it is recorded as 0. D is the satellite visible matrix composed of 0 and 1 derived from D′. The pseudocode of the dynamic switching link of each satellite in the iridium constellation is shown in Algorithm 2.
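The conversion from the distance matrix D′ to the 0/1 visible matrix D can be illustrated with a short Python sketch. The text does not state the exact criterion for a satellite being too far away, so a single range threshold is assumed here; the threshold and the distances are hypothetical values.

```python
def visibility_matrix(distance, max_range_km):
    """Mark a link as 1 when two distinct satellites are within range, else 0."""
    n = len(distance)
    return [
        [1 if i != j and distance[i][j] <= max_range_km else 0
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical pairwise distances (km) between three satellites.
distance = [
    [0, 2000, 6000],
    [2000, 0, 4500],
    [6000, 4500, 0],
]
visible = visibility_matrix(distance, max_range_km=5000)
```

Recomputing this matrix at each switching interval would yield the sequence of topologies that the dynamic link switching operates on.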
Under the dynamic network with periodic connect-disconnect in Section 4.1, the network throughput, server CPU utilization, algorithm convergence, load-bandwidth utilization, and load balancing are compared among the scheme in this study, minimum-hop routing (Chen et al., 2022), round-robin routing, and other schemes. To decrease the influence of other factors on the test results, the first 100 results of the test are averaged, and then every 10 consecutive results are averaged into one value and included in the statistics.
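One possible reading of this smoothing procedure is sketched below in Python: the first 100 results are kept and every 10 consecutive results are averaged into one value. This interpretation, the function name, and the sample data are assumptions; the text does not specify the procedure further.

```python
def block_means(samples, first_n=100, block=10):
    """Average consecutive blocks of `block` results over the first `first_n`."""
    kept = samples[:first_n]
    means = []
    for start in range(0, len(kept), block):
        chunk = kept[start:start + block]
        means.append(sum(chunk) / len(chunk))
    return means


# Hypothetical raw measurements: 200 results, of which the first 100 are used.
raw_results = list(range(200))
smoothed = block_means(raw_results)
```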
Without the scheme in this study,the default path selected by the LEO satellite network is based on the minimum number of hops,as shown in Fig.12a,and other paths are idle.After using the scheme in this study,multiple paths are allocated,as shown in Fig.12b,which improves the link utilization and realizes parallel transmission of multiple paths.
Fig. 12 Transmission path in the iridium constellation: (a) only one path is selected by the other routing schemes; (b) two paths are selected with the proposed method
4.2.1 Throughput
Throughput is the amount of data actually transmitted per unit time, and is an important indicator of network transmission performance. As given in Eq. (10), the higher the throughput, the stronger the transmission capacity:
The LEO satellite network is dynamic and time-varying, and links are periodically disconnected and reconnected. As shown in Fig. 13, the traditional routing algorithm performs poorly at 60, 120, and 180 s, where the throughput decreases due to link switching, while the algorithm in this study has the ability to perceive traffic and is less affected by dynamic network link switching. The algorithm in this study uses traffic perception and obtains state information of the overall network to comprehensively select the optimal route, which can reduce the influence of network dynamics and stabilize throughput. The throughput of the algorithm in this study is 8% higher than that of the minimum-hop algorithm in the transmission process.
Fig.13 Throughput of several schemes under a dynamic low Earth orbit network (References to color refer to the online version of this figure)
The path selected by the minimum-hop algorithm works well in a limited time.When the link is switched,the hop number needs to be recalculated,which will cause different subflows to converge,increase the burden of some paths,cause congestion,and reduce throughput (the red line in Fig.13).Round-robin routing has the same drawbacks as minimum-hop routing.Round-robin routing has no obvious advantages in dynamic networks and will also cause congestion and reduce throughput.Both minimum-hop routing and round-robin routing fluctuate due to link switching.
To further study the impact of the dynamic network formed by link switching on multipath routing, the HSR-CC algorithm (Xu and Ai, 2021) was included in the simulations for comparison. This algorithm uses DQN to deal with the data transmission in the air-ground network. The transmission protocol is MPTCP. Fig. 13 shows that HSR-CC can deal with the performance degradation caused by link switching in dynamic networks, but MPTCP is prone to head-of-line blocking, which affects transmission, and the overall throughput is not as good as that of the proposed method.
To examine the impact of network traffic awareness on transmission, the simulations compare the throughput for multiple clients with traffic awareness enabled and disabled, as shown in Fig. 14. Without traffic perception, the selection of the optimal path is affected by unbalanced traffic, resulting in low throughput. The HSR-CC algorithm cannot perceive network traffic, and its throughput decreases sharply as the number of clients increases. With traffic perception, throughput remains relatively stable and does not drop sharply, because traffic is allocated according to the idle bandwidth of each path, which avoids congestion.
Fig.14 Throughput of several schemes under a dynamic network
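The idea of allocating traffic according to the idle bandwidth of each path can be sketched as a proportional split. This is a simplified stand-in, not the reinforcement learning policy of the proposed method; the function name split_by_idle_capacity, the path names, and the numbers are assumptions made for illustration.

```python
def split_by_idle_capacity(demand, capacity, used):
    """Split `demand` across paths in proportion to each path's idle capacity.

    capacity and used map a path name to a rate in the same unit (e.g., Mb/s).
    The total allocated never exceeds the total idle capacity.
    """
    idle = {p: max(capacity[p] - used.get(p, 0.0), 0.0) for p in capacity}
    total_idle = sum(idle.values())
    if total_idle == 0:
        return {p: 0.0 for p in capacity}  # every path is already saturated
    sendable = min(demand, total_idle)
    return {p: sendable * idle[p] / total_idle for p in capacity}

# Hypothetical example: path p1 is nearly full, path p2 is mostly idle.
allocation = split_by_idle_capacity(
    demand=5.0,
    capacity={"p1": 10.0, "p2": 10.0},
    used={"p1": 8.0, "p2": 2.0},
)
print(allocation)
```

With these numbers the lightly loaded path receives four times the share of the heavily loaded one, so new traffic does not pile onto a path that is close to congestion.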
4.2.2 CPU utilization
The computing and storage resources in the satellite network are limited. Excessive consumption of computing resources may cause programs to be suspended and affect network transmission. The CPU of the MPQUIC server deals mainly with encryption, sending and receiving user datagram protocol (UDP) packets, and MPQUIC state maintenance (Langley et al., 2017). In the simulations, the CPU utilization within 100 s is analyzed, as shown in Fig. 15. The CPU utilization of the proposed method fluctuates little and levels off after about 80 s, which makes the method suitable for use in LEO satellite networks.
Fig.15 CPU utilization
4.2.3 Convergence
To verify the convergence of the algorithm, the proposed scheme, DQN (Wu et al., 2021; Oroojlooyjadid et al., 2022), and the HSR-CC algorithm are compared in terms of normalized cost over training steps, as shown in Fig. 16. Fig. 16 shows that the proposed algorithm converges faster and significantly reduces the training time. Among the three, the proposed method shows the best convergence.
4.2.4 Load-bandwidth utilization
Load-bandwidth utilization is an important indicator for measuring whether the network is congested during transmission (https://www.cisco.com/c/en/us/support/docs/ip/simple-network-management-protocol-snmp/8141-calculate-bandwidth-snmp.html). It is the ratio of the sending bandwidth to the actual bandwidth. The higher the load-bandwidth utilization, the lighter the network congestion. The calculation method is as follows:
where TotalByte is the size of the transmission data stream, Bandwidth_s is the stream sending bandwidth, and Bandwidth_o is the actual bandwidth occupied. As shown in Fig. 17, the X axis represents load intensity; the larger the value, the higher the load. The Y axis is the load-bandwidth utilization value. When the network load is low at first, the load-bandwidth utilization of all schemes is 98%. When the load increases, the load-bandwidth utilization begins to decrease. The utilization of minimum-hop routing decreases the fastest, indicating that the network is congested. The proposed method supports traffic awareness and can dynamically generate the optimal route. Comparing the load at which congestion sets in shows that the proposed method maintains the highest load-bandwidth utilization.
Fig.17 Load-bandwidth utilization comparison
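For background, the counter-based interface utilization calculation described in the Cisco note cited above can be sketched as follows. This is the generic SNMP calculation (change in octet counter, converted to bits, divided by elapsed time times interface speed); it is not claimed to be the paper's load-bandwidth utilization formula, and the function and parameter names are assumptions made here.

```python
def interface_utilization_percent(delta_octets, delta_seconds, if_speed_bps):
    """Utilization (%) of one direction of an interface over a polling interval.

    delta_octets:  change in the octet counter between two polls
    delta_seconds: time between the two polls, in seconds
    if_speed_bps:  interface speed in bits per second
    """
    if delta_seconds <= 0 or if_speed_bps <= 0:
        raise ValueError("delta_seconds and if_speed_bps must be positive")
    return delta_octets * 8 * 100 / (delta_seconds * if_speed_bps)

# Hypothetical poll: 1,250,000 octets in 10 s on a 10 Mb/s link.
utilization = interface_utilization_percent(1_250_000, 10, 10_000_000)
print(utilization)
```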
4.2.5 Load balancing
In the multipath transmission of the LEO satellite network, the links are complex, and the goal of load balancing is to make the traffic of the selected links as equal as possible to avoid congestion caused by overloading some paths. As shown in Fig. 18, the X axis is time, the Y axis is the number of the path selected at a given time, and the colors in the heatmap range from light to dark, indicating flow values in the path from small to large. The path traffic increases steadily without major fluctuations, indicating that transmission with the proposed method is stable. The maximum traffic difference between the two paths selected by the algorithm within 240 s is approximately 1 Mb, indicating that the proposed method achieves load balancing.
Fig.18 Traffic distribution of inter-satellite links (References to color refer to the online version of this figure)
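The balance check applied to Fig. 18, namely the maximum traffic difference between two selected paths over the observation window, can be sketched as follows. The function name max_traffic_gap and the sample values are hypothetical; they are not the measured data behind the figure.

```python
def max_traffic_gap(path_a, path_b):
    """Largest absolute traffic difference between two paths.

    path_a and path_b hold traffic samples (same unit) taken at the same times.
    """
    if len(path_a) != len(path_b):
        raise ValueError("both paths need the same number of samples")
    if not path_a:
        raise ValueError("at least one sample is required")
    return max(abs(a - b) for a, b in zip(path_a, path_b))

# Hypothetical per-interval traffic on two paths, in Mb.
samples_path_1 = [1.0, 2.0, 3.5, 4.0]
samples_path_2 = [1.5, 2.0, 2.5, 4.5]
gap = max_traffic_gap(samples_path_1, samples_path_2)
print(gap)
```

A small gap relative to the traffic carried suggests that neither path is being overloaded in favor of the other.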
This paper uses an SDN controller to obtain the state information of the low Earth orbit satellite network, the MPQUIC packet header extension field to distinguish different service demands, and CMS to measure the size of data streams (live, video, or others), and proposes a deep reinforcement learning multipath routing algorithm that dynamically generates real-time routing decisions. The algorithm addresses the network congestion caused by unbalanced traffic, which arises from the significant differences in users' data demands in low Earth orbit satellite networks. Simulations of a low Earth orbit satellite network constructed with STK show that the algorithm converges quickly, achieves high throughput (8% higher than traditional routing), and achieves load balancing. In future work, we will continue to study the routing characteristics of SDN-based information-centric networks in low Earth orbit satellite networks and study other efficient deep reinforcement learning algorithms for routing problems.
Contributors
Ziyang XING and Xiaoqiang DI designed the research. Hui QI processed the data. Ziyang XING drafted the paper. Jinyao LIU and Rui XU helped organize the paper. Jing CHEN and Ligang CONG revised the paper. Ziyang XING and Xiaoqiang DI finalized the paper.
Compliance with ethics guidelines
Ziyang XING, Hui QI, Xiaoqiang DI, Jinyao LIU, Rui XU, Jing CHEN, and Ligang CONG declare that they have no conflict of interest.
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Frontiers of Information Technology & Electronic Engineering, 2023, Issue 6