Zhao Zhenning, Fan Lihan, Chan Xiao
(Nanjing R&D Institute, ZTE Corporation, Nanjing 210012, China)
Abstract: Designing an Ethernet switch that can assure normal interaction of protocol packets between switches in a network environment with massive traffic is an important matter. Taking the L3 Ethernet switch based on an Application Specific Integrated Circuit (ASIC) as an example, this article analyzes several typical issues about packet receiving and sending by the CPU in a multi-process environment, including CPU load, software and hardware queue settings, and the communication mechanism between the CPU and the switch chip. This article gives solutions to these issues. The solutions are applicable to Network Processor (NP) issues as well.
In available Layer 3 (L3) Ethernet switch devices, the Layer 2 (L2) switching and L3 routing of message packets are implemented mainly by the switch chip and the Network Processor (NP). The CPU is not involved in the switching and routing process; its function is to manage and control the switch chip [1].
In such a case, the CPU load comes mainly from the regular protocol driver, the user configuration driver and the external event driver.
Among these drivers, the external event driver is the most random and unpredictable. Typical external events include port Up/Down state changes, reporting Media Access Control (MAC) address messages (including learning, aging, transferring and more) to upper layers, and the CPU receiving and sending packets via Direct Memory Access (DMA). Of these events, processing the packets the CPU has received through DMA is the most complicated. When data packets are passed from the bottom layer to upper-layer software, different protocols take different processing actions, which may involve packet sending, port manipulation, multi-table manipulation and more.
Therefore, solving the issues around the CPU's sending and receiving of packets helps guarantee normal interaction of upper-layer protocols and, accordingly, keeps the L3 Ethernet switch running stably.
The analyses of the seven issues about packet sending and receiving by the CPU presented here are all based on the CPU's typical packet sending and receiving mechanism: queues are set up at the CPU port, packet receiving is done through DMA, and a ring-type queue is used.
The CPU's packet-receiving rhythm is set based on the number of packets sent to the CPU per unit time, and this number is determined by the packet processing capability of the switch.
Figure 1 presents two typical ways of handling this issue: the CPU fetches packets from DMA at a constant rate, or the CPU fetches packets from DMA in a burst manner. In the figure, the upper limit on the number of packets sent to the CPU per unit time is assumed to be x per second.
(1) CPU Fetching Packets at a Constant Rate
When the CPU fetches packets from DMA at a constant rate, there is little impact on the CPU queue. Moreover, the CPU queue does not need a high buffering capability, so it is unnecessary for the queue to be long.
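As a rough illustration of constant-rate fetching, the sketch below assumes the fetch routine runs from a periodic timer with a hypothetical `ticks_per_sec` frequency and takes at most x / `ticks_per_sec` packets per tick, so the per-second total never exceeds x. The structure and function names are assumptions, not part of the original design.

```c
#include <stdint.h>

/* Hypothetical constant-rate fetcher: x packets/s spread evenly over timer ticks. */
struct const_rate_fetcher {
    uint32_t per_tick;    /* packets fetched per tick = x / ticks_per_sec */
};

static void fetcher_init(struct const_rate_fetcher *f, uint32_t x, uint32_t ticks_per_sec)
{
    if (ticks_per_sec == 0)
        ticks_per_sec = 1;               /* guard against division by zero */
    f->per_tick = x / ticks_per_sec;     /* integer division rounds down: stays below x */
    if (f->per_tick == 0)
        f->per_tick = 1;                 /* always make some progress */
}

/* Called once per timer tick; 'pending' is the number of packets waiting in the
   DMA ring. Returns how many packets the CPU takes during this tick. */
static uint32_t fetch_on_tick(const struct const_rate_fetcher *f, uint32_t pending)
{
    return pending < f->per_tick ? pending : f->per_tick;
}
```

Because the CPU drains the ring a little on every tick, the backlog stays small, which is why the text notes that a short CPU queue suffices in this mode.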
(2) CPU Fetching Packets in a Burst Manner
The hardware receiving queue on the ASIC-based switch chip side, together with the ring-type queue in DMA memory, gives the switch a certain buffering capability for the packets that the CPU fetches from DMA. This buffering capability can be used to extend the control period, to set the control granularity (i.e., the upper limit of packets that the CPU fetches in one control period), and to apply a mechanism, similar to degenerative feedback in circuits, that dynamically enables and disables the CPU's packet-fetching function. In this way, macro control of the CPU's packet-fetching rate is realized. Moreover, if the ASIC-based switch chip supports flow monitoring or shaping based on the Token Bucket Algorithm in the egress direction of the CPU port [2-3], and if the minimum monitoring and shaping thresholds satisfy the CPU's rate limit, this capability can also be used to control the rhythm of the CPU's packet fetching. This helps reduce the CPU load, and as a result the software processing is much simplified.
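The feedback-style burst control described above can be sketched as follows, under the assumption that `quota` is the control granularity (for example x multiplied by the control period) and that a hypothetical timer calls `burst_ctrl_new_period()` at the start of each period. Disabling fetching leaves the remaining packets in the hardware queue and DMA ring, which is exactly the buffering the paragraph relies on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical burst-mode controller with a per-period packet quota. */
struct burst_ctrl {
    uint32_t quota;        /* control granularity: packets allowed per control period */
    uint32_t fetched;      /* packets fetched so far in the current period */
    bool fetch_enabled;    /* models enabling/disabling the CPU's fetch function */
};

static void burst_ctrl_init(struct burst_ctrl *c, uint32_t quota)
{
    c->quota = quota;
    c->fetched = 0;
    c->fetch_enabled = true;
}

/* Fetch as many pending packets as the remaining quota allows. Once the quota is
   used up, fetching is disabled (negative feedback) and packets stay buffered. */
static uint32_t burst_fetch(struct burst_ctrl *c, uint32_t pending)
{
    if (!c->fetch_enabled)
        return 0;
    uint32_t room = c->quota - c->fetched;
    uint32_t n = pending < room ? pending : room;
    c->fetched += n;
    if (c->fetched >= c->quota)
        c->fetch_enabled = false;
    return n;
}

/* Start of a new control period: reset the count and re-enable fetching. */
static void burst_ctrl_new_period(struct burst_ctrl *c)
{
    c->fetched = 0;
    c->fetch_enabled = true;
}
```

A longer control period with a proportionally larger quota gives the same average rate but allows bigger bursts, which is why this mode leans on the buffering capacity of the queues.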
▲Figure 1. Two ways to control CPU's packet-receiving rate.
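The token-bucket rate limit mentioned for the CPU port's egress direction is normally a hardware feature of the switch chip. The software model below is only a sketch of the algorithm itself, assuming a hypothetical single-rate bucket measured in packets (not bytes) and a microsecond clock supplied by the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Software model of a single-rate token bucket counted in packets (assumed). */
struct token_bucket {
    uint64_t rate_pps;     /* refill rate, packets per second */
    uint64_t burst;        /* bucket depth: maximum stored tokens */
    uint64_t tokens;       /* tokens currently available */
    uint64_t last_us;      /* time up to which refill has been accounted, microseconds */
};

static void tb_init(struct token_bucket *tb, uint64_t rate_pps, uint64_t burst, uint64_t now_us)
{
    tb->rate_pps = rate_pps;
    tb->burst = burst;
    tb->tokens = burst;
    tb->last_us = now_us;
}

/* Returns true if a packet arriving at now_us may be passed to the CPU. */
static bool tb_admit(struct token_bucket *tb, uint64_t now_us)
{
    uint64_t add = (now_us - tb->last_us) * tb->rate_pps / 1000000u;
    if (add > 0) {
        uint64_t t = tb->tokens + add;
        tb->tokens = t > tb->burst ? tb->burst : t;
        /* advance only by the time converted into whole tokens, keeping the remainder */
        tb->last_us += add * 1000000u / tb->rate_pps;
    }
    if (tb->tokens == 0)
        return false;      /* over the limit: the packet is dropped or held */
    tb->tokens--;
    return true;
}
```

With `rate_pps` set to x and `burst` set to the control granularity, the bucket enforces the same average limit as the burst-mode controller, but does so per packet rather than per control period.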
If only the buffering capability of the CPU port is considered, a longer CPU port queue is preferred. However, the impact of the queue length on other functions and on performance must also be taken into account. The length planning of the CPU port queue varies between ASIC chips.
While processing a data packet, the CPU does not copy the entire packet but passes pointers as parameters. This approach is called zero copy, and it greatly improves the CPU's processing efficiency.
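A minimal illustration of zero copy: the hypothetical handler below receives only a pointer to a packet descriptor and rewrites one byte directly in the receive buffer. It assumes an untagged Ethernet frame carrying IPv4, where the TTL sits at byte 8 of the IPv4 header; the header checksum update a real stack would need is omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet descriptor: layers pass a pointer to it, never the bytes. */
struct pkt {
    uint8_t *data;   /* points into the DMA receive buffer */
    size_t len;
};

/* Decrement the IPv4 TTL in place (untagged Ethernet + IPv4 assumed;
   checksum update omitted for brevity). */
static int handle_ipv4_ttl(struct pkt *p)
{
    const size_t ttl_off = 14 + 8;   /* 14-byte Ethernet header + TTL offset in IPv4 */
    if (p->len <= ttl_off || p->data[ttl_off] == 0)
        return -1;
    p->data[ttl_off]--;              /* modified directly in the receive buffer */
    return 0;
}
```

Changing a field of fixed size like this is the easy case; the following paragraphs deal with the harder case where the packet length changes.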
However, zero copy reduces the flexibility of software processing to some extent. A problem accordingly emerges: if the protocol stack only needs to change the content of a packet, it can do so directly in the receiving buffer; but if it needs to delete or add certain fields (such as adding or deleting a layer of tag), that is, if the packet length has to change, there is a problem.
Adding or deleting fields changes the position of either the packet header or the packet tail. If the tail moves, the only requirement is that the total packet length does not exceed the buffer boundary, so the problem is a simple one. However, fields are usually added or deleted near the header, so moving the header is more efficient, and the protocol stack tends to move the header. In this case, the CPU must do some extra processing when allocating buffers. The process is as follows:
(1) When a packet is received, the header pointer must not point at the buffer boundary; instead, it is offset into the buffer to leave a certain margin in front of the header. In addition, a single buffer should have enough space for both the Maximum Transmission Unit (MTU) and this margin.
(2) When a packet is released, the buffer's initial pointer needs normalized processing (as shown in Figure 2).
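The two steps above can be sketched as follows. The buffer size, the margin (`HEADROOM`) and the pool layout are assumptions chosen for illustration. "Normalized processing" is interpreted here as recovering the buffer's initial pointer from a header pointer that may have moved, which is one plausible reading of the text rather than the authors' confirmed implementation.

```c
#include <stddef.h>
#include <stdint.h>

#define BUF_SIZE  2048u   /* one receive buffer: margin + MTU-sized frame (assumed) */
#define HEADROOM    64u   /* margin left in front of the header (assumed) */
#define NUM_BUFS     4u

/* Stand-in for the DMA buffer pool: NUM_BUFS contiguous buffers of BUF_SIZE bytes. */
static uint8_t pool[NUM_BUFS * BUF_SIZE];

struct pkt {
    uint8_t *data;   /* current header pointer; moves when fields are added/removed */
    size_t len;
};

/* Initial pointer of the buffer containing p (valid for any pointer inside it). */
static uint8_t *buf_start(const uint8_t *p)
{
    size_t off = (size_t)(p - pool);
    return pool + (off / BUF_SIZE) * BUF_SIZE;
}

/* Step (1): on receive, the header starts HEADROOM bytes past the buffer boundary. */
static int pkt_rx_init(struct pkt *p, unsigned idx, size_t frame_len)
{
    if (idx >= NUM_BUFS || frame_len == 0 || frame_len > BUF_SIZE - HEADROOM)
        return -1;
    p->data = pool + (size_t)idx * BUF_SIZE + HEADROOM;
    p->len = frame_len;
    return 0;
}

/* Add n bytes in front of the header (e.g. insert a tag) by moving the pointer back. */
static int pkt_push(struct pkt *p, size_t n)
{
    if ((size_t)(p->data - buf_start(p->data)) < n)
        return -1;                    /* not enough margin left */
    p->data -= n;
    p->len += n;
    return 0;
}

/* Remove n bytes from the front (e.g. strip a tag); at least one byte must remain. */
static int pkt_pull(struct pkt *p, size_t n)
{
    if (n >= p->len)
        return -1;
    p->data += n;
    p->len -= n;
    return 0;
}

/* Step (2): on release, normalize the moved header pointer back to the buffer's
   initial pointer before the buffer returns to the free pool / DMA ring. */
static uint8_t *pkt_release(struct pkt *p)
{
    uint8_t *start = buf_start(p->data);
    p->data = NULL;
    p->len = 0;
    return start;
}
```

Because the buffer boundary can always be recomputed from the header pointer, the protocol stack may push and pull headers freely without tracking where the buffer began.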
Currently, the external interrupts of the switch are mainly generated by the switch chip. The major external interrupts include DMA operations (such as receiving packets, completion of packet sending, and new address messages) and some error messages. If interrupt requests are too frequent, context switching between the Interrupt Service Routine (ISR) and other processes occupies a large amount of CPU time. If massive interrupt requests arrive continuously, the CPU stays busy, the various protocols do not get sufficient scheduling time, and as a result serious faults such as protocol state machine timeouts occur.
A polling mechanism can be adopted to avoid events being triggered at an uncontrollable frequency. The usual approach is to use a CPU timer to trigger the ISR that was originally triggered by an external interrupt. Since the interval between timer triggers is fixed, the execution frequency of the ISR is controlled.
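The sketch below models the timer-driven replacement for the external interrupt. It assumes a hypothetical event status word with read-and-clear semantics; the former ISR body runs only from the periodic timer callback, so its execution frequency is bounded by the timer interval regardless of how many events the chip raises in between.

```c
#include <stdint.h>

/* Hypothetical event bits the switch chip would otherwise signal by interrupt. */
#define EV_RX_PKT    (1u << 0)
#define EV_TX_DONE   (1u << 1)
#define EV_NEW_ADDR  (1u << 2)

struct poll_stats {
    uint32_t runs;      /* how many times the former ISR body executed */
    uint32_t handled;   /* distinct event types serviced */
};

/* Former ISR body: service each event type whose bit is set. */
static void service_events(uint32_t status, struct poll_stats *s)
{
    s->runs++;
    while (status) {
        status &= status - 1;   /* clear the lowest set bit */
        s->handled++;
    }
}

/* Periodic timer callback: service_events() runs at most once per interval,
   no matter how many events occurred since the last tick. */
static void timer_tick(volatile uint32_t *chip_status, struct poll_stats *s)
{
    uint32_t status = *chip_status;
    *chip_status = 0;           /* read-and-clear acknowledge (assumed register behavior) */
    if (status)
        service_events(status, s);
}
```

Repeated events of the same type coalesce into one status bit, which is what bounds the load, and also why a packet may wait up to one full timer interval before being seen.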
Compared with external interrupts, polling makes the rhythm controllable. The rhythm of external interrupts depends on the frequency of external events, which the CPU cannot control.
However, polling has an inherent defect: slow response, which cannot meet the requirements of functions that demand stronger real-time support. In addition, when the Ping command is used to test large packets on an L3 interface of the switch, the delay of a switch using polling mode is noticeably larger than that of a switch using interrupt mode.
If some mechanism can avoid continuous, massive interrupt requests, the CPU will not be overloaded, and the real-time advantage of interrupt mode can be retained as well.
The CPU's receiving of packets and fetching of MAC address messages are typical behaviors that generate massive interrupt events. Taking packet receiving as an example, the burst mode described in the earlier part, “CPU Load and Packet-receiving Rhythm Control,” switches DMA on and off according to real-time traffic, which makes the interrupt sources controllable. This kind of mechanism, similar to degenerative feedback, prevents continuous interrupt events from being sent to the CPU.
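A minimal model of this interrupt-plus-feedback idea (it resembles the approach Linux calls NAPI) is sketched below. The device state is simulated and all names are hypothetical: the ISR masks the receive interrupt source, deferred work drains at most `budget` packets per run, and the interrupt is unmasked only once the ring is empty.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simulated device: packets waiting in the DMA ring and the RX interrupt mask. */
struct rx_dev {
    uint32_t pending;        /* packets waiting in the DMA ring */
    bool irq_enabled;        /* RX interrupt source enabled */
    bool work_scheduled;     /* deferred processing requested */
    uint32_t irq_count;      /* interrupts actually delivered to the CPU */
};

/* ISR: mask the source first, so a burst of packets raises only one interrupt. */
static void rx_isr(struct rx_dev *d)
{
    d->irq_count++;
    d->irq_enabled = false;
    d->work_scheduled = true;
}

/* Hardware model: a packet arrives; an interrupt fires only if the source is enabled. */
static void packet_arrives(struct rx_dev *d)
{
    d->pending++;
    if (d->irq_enabled)
        rx_isr(d);
}

/* Deferred work in task context: process at most 'budget' packets. If the ring is
   drained, return to interrupt mode for low latency; otherwise stay masked and run
   again later. (Real drivers re-check the ring after unmasking to close the race.) */
static uint32_t rx_work(struct rx_dev *d, uint32_t budget)
{
    uint32_t n = d->pending < budget ? d->pending : budget;
    d->pending -= n;
    if (d->pending == 0) {
        d->work_scheduled = false;
        d->irq_enabled = true;
    }
    return n;
}
```

Under heavy traffic the CPU effectively polls with a bounded budget, and under light traffic each packet still triggers an immediate interrupt, combining the strengths of both modes.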
▲Figure 2. Normalized processing of a packet.
In general, polling is controllable but lacks real-time capability, while interrupts offer good real-time capability but make the interrupt sources difficult to control. Therefore, in the initial phase of system design, the specific requirements and the way the chip handles external events should be taken into account to determine which mode to adopt: interrupt, polling, or both.
The regular external events (interrupt events) that occur during the CPU's packet receiving and sending include receiving MAC address messages and completing MAC table manipulation.
Handling all kinds of interrupt events in one process artificially increases the coupling between events and the mutual restriction among them.
To make event processing flexible and to reduce mutual restriction among events in a multi-task operating system, each event type can start its own process, or events can be grouped into several processes according to how they are handled.
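One way to decouple event classes, sketched under assumptions: the short interrupt or poll routine only classifies an event and places it in a bounded per-class queue, and each queue is drained by its own task. The event classes, queue depth and function names below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical event classes, each served by its own task or process. */
enum ev_class { EV_PKT_RX, EV_PKT_TX_DONE, EV_MAC_ADDR, EV_PORT_STATE, EV_CLASS_COUNT };

#define QDEPTH 16u   /* power of two, so free-running indices wrap correctly */

struct ev { enum ev_class cls; uint32_t arg; };

/* One bounded FIFO per class: a flood of one event type fills only its own queue. */
struct ev_queue { struct ev buf[QDEPTH]; uint32_t head, tail; };

static struct ev_queue queues[EV_CLASS_COUNT];

/* Called from the short ISR or poll routine: classify and enqueue, nothing more. */
static bool ev_dispatch(struct ev e)
{
    struct ev_queue *q = &queues[e.cls];
    if (q->tail - q->head == QDEPTH)
        return false;                 /* this class is saturated; others unaffected */
    q->buf[q->tail++ % QDEPTH] = e;
    return true;
}

/* Loop body of the task that owns one event class. */
static bool ev_take(enum ev_class cls, struct ev *out)
{
    struct ev_queue *q = &queues[cls];
    if (q->head == q->tail)
        return false;
    *out = q->buf[q->head++ % QDEPTH];
    return true;
}
```

Separate queues also let each task run at its own priority, so a burst of MAC address messages cannot delay port state handling.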
For an ASIC-based switch, specific protocol packets require priority when being sent to the CPU via the DMA queue; therefore, certain mechanisms of the ASIC chip are used to assign protocol packets to designated port queues to ensure their prioritization. This is called protocol packet protection. CPU protection, in turn, prevents unnecessary data packets from taxing the CPU as far as possible.
The two necessary conditions for realizing protocol packet protection are as follows:
(1) The CPU port must support the Strict Priority (SP) or Weighted Round-Robin (WRR) scheduling algorithm.
(2) The switch chip needs a strong flow classification capability and must be able to assign different port queues to different flows.
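The two scheduling disciplines named in condition (1) can be sketched as below. The number of CPU port queues and the weights are assumptions; in practice the scheduler runs inside the switch chip, and this model only shows how each discipline picks the next queue to serve.

```c
#include <stdint.h>

#define NQ 4   /* number of CPU port queues (assumed); queue 3 = highest priority */

/* Strict Priority: always serve the highest-numbered non-empty queue. */
static int sp_select(const uint32_t backlog[NQ])
{
    for (int q = NQ - 1; q >= 0; q--)
        if (backlog[q] > 0)
            return q;
    return -1;
}

/* Weighted Round-Robin: each queue may send up to weight[q] packets per round. */
struct wrr {
    uint32_t weight[NQ];
    uint32_t credit[NQ];   /* initialize to weight[] */
    int cur;               /* queue currently being served */
};

static int wrr_select(struct wrr *w, const uint32_t backlog[NQ])
{
    for (int tries = 0; tries < 2 * NQ; tries++) {
        int q = w->cur;
        if (backlog[q] > 0 && w->credit[q] > 0) {
            w->credit[q]--;
            return q;
        }
        /* queue empty or out of credit: refill it for the next round and move on */
        w->credit[q] = w->weight[q];
        w->cur = (w->cur + 1) % NQ;
    }
    return -1;             /* all queues empty */
}
```

SP guarantees that protocol packets in the top queue always go first but can starve lower queues, whereas WRR guarantees every queue a share proportional to its weight.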
In system scheme design, the following measures can be used to fulfill both protocol packet protection and CPU protection:
(1) Packet-receiving and packet-sending paths should be as clear as possible.
(2) The Access Control List (ACL) of the ASIC chip should be fully used to match different types of protocol packets exactly, and matching on Layer 4 fields is required [4].
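To show why Layer 4 matching matters, here is a software sketch of an ACL-style classifier that maps CPU-bound packets to CPU port queues. The rule table and queue numbers are hypothetical examples; the protocol identifiers themselves (ARP EtherType 0x0806, OSPF IP protocol 89, BGP on TCP port 179, SNMP on UDP port 161) are standard values. Without the L4 port, a BGP session could not be distinguished from any other TCP traffic to the switch.

```c
#include <stddef.h>
#include <stdint.h>

/* Parsed header fields a CPU-bound packet is classified on. */
struct pkt_key {
    uint16_t ethertype;
    uint8_t  ip_proto;             /* meaningful when ethertype == 0x0800 */
    uint16_t l4_sport, l4_dport;
};

/* Hypothetical ACL entry: 0 in a match field means "don't care". */
struct acl_rule {
    uint16_t ethertype;
    uint8_t  ip_proto;
    uint16_t l4_port;              /* matched against source or destination port */
    int      queue;                /* CPU port queue assigned on match */
};

/* Example rules; queue numbers are assumptions (higher = more important). */
static const struct acl_rule acl[] = {
    { 0x0806, 0,  0,   2 },        /* ARP */
    { 0x0800, 89, 0,   3 },        /* OSPF (IP protocol 89) */
    { 0x0800, 6,  179, 3 },        /* BGP (TCP port 179) */
    { 0x0800, 17, 161, 1 },        /* SNMP (UDP port 161) */
};

#define DEFAULT_QUEUE 0            /* everything else: lowest-priority queue */

/* First matching rule wins, as in a typical ACL lookup. */
static int acl_classify(const struct pkt_key *k)
{
    for (size_t i = 0; i < sizeof acl / sizeof acl[0]; i++) {
        const struct acl_rule *r = &acl[i];
        if (r->ethertype && r->ethertype != k->ethertype) continue;
        if (r->ip_proto && r->ip_proto != k->ip_proto) continue;
        if (r->l4_port && r->l4_port != k->l4_sport && r->l4_port != k->l4_dport) continue;
        return r->queue;
    }
    return DEFAULT_QUEUE;
}
```

Unmatched traffic falls into the lowest queue, which, combined with SP or WRR scheduling, is what keeps ordinary data packets from crowding out protocol packets.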
Moreover, other CPU functions and the performance of the switch constrain the implementation of protocol packet protection and CPU protection.
In a multi-task operating system, events need to be handled in the shortest possible time to ensure that other tasks have adequate chances to be scheduled.
So, it is necessary to consider the execution efficiency of any algorithm used. Besides the algorithms themselves, frequent access to certain hardware also takes much time and affects execution efficiency, yet this is often neglected in practice.
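One common way to cut hardware access cost, sketched here as an assumption rather than the authors' method, is to keep a software shadow copy of configuration registers that only the CPU writes. Reads are then served from memory and the chip is touched only when a value actually changes. This is valid only for such CPU-owned registers; status registers and counters updated by the hardware must still be read from the chip. The register file below is simulated.

```c
#include <stdint.h>

/* Simulated switch-chip register file; each access stands for a slow bus transaction. */
#define NREGS 64u
static uint32_t hw_regs[NREGS];
static unsigned hw_access_count;

static uint32_t hw_read(unsigned reg)           { hw_access_count++; return hw_regs[reg]; }
static void     hw_write(unsigned reg, uint32_t v) { hw_access_count++; hw_regs[reg] = v; }

/* Shadow copy of CPU-owned configuration registers. */
static uint32_t shadow[NREGS];

/* Load the shadow once at start-up (one read per register). */
static void cfg_sync(void)
{
    for (unsigned r = 0; r < NREGS; r++)
        shadow[r] = hw_read(r);
}

/* Reads never touch the chip. */
static uint32_t cfg_read(unsigned reg) { return shadow[reg]; }

/* Writes reach the chip only when the value changes. */
static void cfg_write(unsigned reg, uint32_t v)
{
    if (shadow[reg] == v)
        return;
    shadow[reg] = v;
    hw_write(reg, v);
}
```

In an event handler that checks a configuration value on every packet, this turns thousands of bus reads into memory reads.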
With the development of Ethernet technology, the processing capability of switch chips and network processors has improved significantly, whereas the processing performance of the CPU in data switch devices lags behind. Moreover, the growing number of service types that data switch devices support places higher demands on the service traffic handled by the CPU. In this situation, limited CPU resources become a bottleneck for the development of Ethernet switches. Therefore, a prerequisite for the safe and stable operation of switch equipment is that the interfaces between the CPU, the switch chip and the network processor perform good buffer management, queue scheduling and traffic monitoring to ensure proper use of CPU resources. The CPU's sending and receiving of packets is one of the key subjects of current and future research and development of data switch devices.