A VLIW Architecture Stream Cryptographic Processor for Information Security

China Communications, 2019, No. 6

Longmei Nan*, Xuan Yang, Xiaoyang Zeng, Wei Li, Yiran Du, Zibin Dai, Lin Chen

1 ASIC & System State Key Laboratory of Fudan University, Shanghai 201203, China

2 Institute of Information Science and Technology, Zhengzhou 450001, China

3 Jiangnan Institute of Computing Technology, Wuxi 214083, China

Abstract: Stream ciphers are an important class of information security algorithms, so implementing them efficiently and flexibly is vital. Existing implementation platforms such as FPGAs, general-purpose processors (GPPs) and ASICs provide good support, but none achieves a satisfactory tradeoff between high processing speed and high flexibility: ASICs are fast but inflexible, GPPs are flexible but slow, and FPGAs offer both flexibility and speed but with very low resource utilization. This paper studies a stream cryptographic processor that can efficiently and flexibly implement a variety of stream cipher algorithms. By analyzing the structural model, processing characteristics and storage characteristics of stream ciphers, we present a reconfigurable VLIW-based stream cryptographic processor with special instructions and a separate/clustered storage structure oriented to stream cipher operations. The proposed instruction structure efficiently supports stream cipher processing with multiple data bit widths, parallelism among stream cipher operations of different bit widths, and parallelism between branch control and stream cipher processing, yielding high instruction-level parallelism. The separate/clustered special bit registers, general register files and key register files satisfy the storage requirements of cryptographic processing. As a result, the proposed processor can flexibly combine multiple basic stream cipher operations to implement complete stream cipher algorithms. The processor has been implemented in 0.18 µm CMOS technology; test results show that it reaches 200 MHz with a power consumption of 310 mW. Ten stream ciphers were implemented on the processor, and the key stream generation throughput of Grain-80, W7, MICKEY, ACHTERBAHN and Shrink is 100 Mbps, 66.67 Mbps, 66.67 Mbps, 50 Mbps and 800 Mbps, respectively. The test results show that the proposed processor achieves a good tradeoff between high performance and flexibility for stream ciphers.

Keywords: stream cipher; VLIW architecture processor; reconfigurable; application-specific instruction set

I. INTRODUCTION

Stream cipher algorithms are a critical data encryption technology in information security. They are widely used because they are simple to implement, encrypt and decrypt at high speed, and exhibit little error propagation during cipher transmission; examples include the A5 algorithm in European GSM communications and the E0 algorithm in Bluetooth. Their efficient and flexible implementation is therefore very important.

Stream cipher algorithms have traditionally been implemented on general-purpose processors, general-purpose programmable logic devices, or application-specific integrated circuits (ASICs).

A) Implementation on a general-purpose processor, such as a Digital Signal Processor (DSP) or a General Purpose Processor (GPP), realizes a given stream cipher algorithm in software. This method is highly flexible and can implement any stream cipher algorithm, but its performance is low because it lacks special instructions and reconfigurable hardware for stream ciphers; performance is worst for stream ciphers dominated by bit-level operations.

B) General-purpose programmable logic devices, represented by the Field Programmable Gate Array (FPGA), contain abundant interconnect resources and rich logic units, and can implement one or several stream cipher algorithms in a data-flow-driven manner. They offer high cryptographic processing performance and flexibility and are widely used for stream cipher processing. However, because FPGA logic units have a narrow bit width and the interconnect resources are overly abundant, implementing stream ciphers on FPGAs suffers from mismatched processing granularity, low resource utilization, complex placement and routing, large configuration data, long configuration time and high power consumption.

C) An ASIC-based implementation means that the developer designs a dedicated chip to process one specific stream cipher algorithm. This method provides the highest cryptographic processing performance, but once the chip is fabricated its function cannot be modified or further developed, so it is inflexible and hard to extend. In addition, developing a dedicated cryptographic chip involves many design stages, resulting in a long development cycle and high development cost.

In short, none of the methods above can easily achieve high flexibility and high efficiency at the same time. To address this problem, application-specific instruction-set processors (ASIPs) have been proposed, including Block Crypt-ASIP1 [1], Block Crypt-ASIP2 [2], Cryptonite [3], CryptoManiac [4] and TVLSI'10 [5]. ASIPs can perform complex combinations of multiple cryptographic operations using a specific instruction set. However, the functions of the basic execution units in ASIPs are fixed: for instance, the execution unit in Cryptonite targets only the DES algorithm, and the Crypt-ASIPs focus only on block ciphers. Because the operation modes and parameters of stream ciphers differ substantially from those of block ciphers, these ASIPs lack flexibility when switching between stream algorithms and parameters. Research on special instructions for bit-level stream cipher operations has gradually expanded and made some progress, but the conflict between stream cipher processing performance and flexibility has not been fundamentally resolved. This is the motivation for our study.

This work combines the advantages of ASIPs, reconfigurable computing and the very long instruction word (VLIW) architecture with the operational characteristics of stream ciphers, and investigates in depth the cryptographic architecture, the register file structure, an application-specific instruction set for stream cipher processing, and configurable hardware units that support the different operation modes and parameters of stream algorithms.

This paper studies a stream cryptographic processor which can efficiently and flexibly implement a variety of stream cipher algorithms.

II. PROCESSING CHARACTERISTICS FOR STREAM CIPHERS AND VLIW ARCHITECTURE

2.1 Analysis of processing characteristics for stream ciphers

The processing models of stream ciphers are complex and varied, but a basic model can be obtained by analyzing the different kinds of stream cipher operations and their processing characteristics. As shown in Fig. 1, the basic model of a stream cipher consists of an initial input transformation, a middle processing transformation and a final output transformation. The initial input transformation executes once. The middle processing transformation executes many times to generate the key stream bits. Every round of the middle processing transformation has the same structure: it includes the feedback computation, state update and internal control signal computation of each Feedback Shift Register (FSR), together with the nonlinear Boolean function (NLBF), arithmetic, logic, bit extraction and look-up-table substitution operations that generate key stream bits. After several rounds, the results are fed to the final output transformation, which performs the serial-to-parallel conversion of the key stream. Furthermore, the middle processing transformation can be divided into multiple branch operations, some of which can be executed in parallel. For example, the internal control signal computation, the feedback computation and the bit extraction can be executed in parallel, and then the FSR state updates and the NLBFs can be executed in parallel.
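The three-stage model above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any specific cipher: the `init` and `step` callbacks and the toy 16-bit register with arbitrary taps are hypothetical stand-ins for a real algorithm's initial input transformation and middle processing transformation.

```python
from typing import Callable, List

def generate_keystream(
    init: Callable[[bytes, bytes], List[int]],
    step: Callable[[List[int]], int],
    key: bytes,
    iv: bytes,
    n_bytes: int,
) -> bytes:
    """Three-stage model: initial input transformation (once), middle
    processing transformation (once per key stream bit), and final output
    transformation (serial-to-parallel packing into bytes)."""
    state = init(key, iv)              # initial input transformation, executed once
    out = bytearray()
    for _ in range(n_bytes):
        byte = 0
        for _ in range(8):
            bit = step(state)          # middle transformation: update state, emit one bit
            byte = (byte << 1) | bit   # final output: serial-to-parallel, first bit is MSB
        out.append(byte)
    return bytes(out)

# Hypothetical toy cipher, for illustration only (not a secure or published design).
def toy_init(key: bytes, iv: bytes) -> List[int]:
    bits = [(b >> i) & 1 for b in (key + iv)[:2] for i in range(8)]  # 16 state bits
    return bits if any(bits) else [1] + bits[1:]                      # avoid the all-zero state

def toy_step(state: List[int]) -> int:
    out = state[0]
    fb = state[0] ^ state[2] ^ state[3] ^ state[5]   # arbitrary feedback taps
    state.pop(0)
    state.append(fb)                                 # shift and insert the feedback bit
    return out

print(generate_keystream(toy_init, toy_step, b"\x12\x34", b"\x56", 4).hex())
```

Keeping the three stages as separate pieces mirrors the model: only `step` runs in the inner loop, which is the part the processor's special instructions aim to accelerate.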

As shown in Table I, 13 stream ciphers have been analyzed. Most of the data processed by stream ciphers are unsigned integers; no floating-point or other data types are used. The data widths of stream cipher operations vary widely, especially for the FSR operations listed in Table I, but they can be roughly classified into four classes: up to 32 bits, up to 64 bits, up to 128 bits and up to 256 bits. Widths of 256, 128, 64 and 32 bits are mainly used for FSR, NLBF and extraction operations, while widths below 32 bits are mainly used for dynamic extraction operations.

Furthermore, the parallel characteristics of stream cipher operations have also been analyzed in order to improve the efficiency of stream cipher processing. This analysis examines which operations can run in parallel during execution of an algorithm, while also taking the data processing bit width into account. It identifies the following kinds of parallelism in stream cipher processing.

Table I. Analysis of operation granularity in common stream cipher algorithms.

Fig. 1. The basic operation model of stream ciphers.

A) Parallel update of multiple FSRs: The main operation units of most stream cipher algorithms are built around FSRs, and most algorithms contain multiple FSRs. Generally, these FSRs must be updated simultaneously from the current state.

B) Parallel computation of FSR next-state information and key stream generation: For stream ciphers with a basic structure, both the FSR updates and the key stream output are functions of a few bits of the current FSR state, so they can be computed simultaneously.

C) Parallel computation of clock/structure control signals: For clock-controlled and variable-structure stream ciphers, each clock control or structure control signal is a function of a few bits of the current state of one or more FSRs, so the control signals can be computed in parallel.

D) Parallel computation of clock/structure control signals and key stream generation: For clock-controlled and variable-structure stream ciphers, both the clock/structure control signals and the key stream generating function are functions of bits of the current FSR state, so they can also be computed in parallel.
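Ways B) to D) all rest on one property: every feedback, control and output function reads only the current state. The Python sketch below illustrates this with two small registers whose taps and clock-control rule are hypothetical (not taken from any algorithm in Table I). It evaluates all four functions from one state snapshot and checks that the evaluation order cannot change the result, which is what allows hardware to compute them side by side in one cycle.

```python
from itertools import permutations
from typing import Dict, Iterable, Tuple

Bits = Tuple[int, ...]
State = Tuple[Bits, Bits]   # (register A bits, register B bits), index 0 = output end

# Each function reads only the current state snapshot (taps are hypothetical).
def feedback_a(s: State) -> int:
    return s[0][0] ^ s[0][3]

def feedback_b(s: State) -> int:
    return s[1][0] ^ s[1][2] ^ s[1][4]

def clock_ctrl_b(s: State) -> int:          # clock-control signal for register B
    return s[0][1] ^ s[1][1]

def keystream_bit(s: State) -> int:         # nonlinear output function
    return s[0][0] ^ s[1][0] ^ (s[0][2] & s[1][3])

FUNCS = {"fa": feedback_a, "fb": feedback_b, "cb": clock_ctrl_b, "z": keystream_bit}

def cycle(s: State, order: Iterable[str]) -> Tuple[State, int]:
    """One clock cycle: evaluate all functions on the same snapshot, then update."""
    r: Dict[str, int] = {name: FUNCS[name](s) for name in order}
    a, b = s
    new_a = a[1:] + (r["fa"],)                        # A is clocked every cycle
    new_b = (b[1:] + (r["fb"],)) if r["cb"] else b    # B is clocked only when cb == 1
    return (new_a, new_b), r["z"]

s0: State = ((1, 0, 1, 1, 0), (0, 1, 1, 0, 1, 0))
reference = cycle(s0, FUNCS)
assert all(cycle(s0, order) == reference for order in permutations(FUNCS))
```

Because no function reads a partially updated register, the four evaluations have no ordering constraint among them; only the final state update waits for their results.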

Fig. 2. The structure of a VLIW processor.

The data processing granularity and operation parallelism analyzed above guide the design of the stream cipher processor presented in the following sections.

2.2 Analysis of characteristics for VLIW architecture

In a typical VLIW processor, shown in Fig. 2, the compiler combines several instructions that can execute in parallel into a single very long instruction word. After the instruction word is fetched, each decoding unit decodes its own instruction segment and generates control signals for the corresponding processing unit, register file and memory access unit. These units then carry out data access, processing and register transfer operations, so that multiple operations are executed in parallel. This structure achieves high instruction-level parallelism with simple decoding hardware and control logic. The main reason is that, when the compiler combines instructions into a very long instruction word, it has already ensured that they have neither data dependences nor control dependences on one another. Because the instructions are independent and orthogonal, the decoding circuit is simple and a single control flow suffices.
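The compiler-side guarantee described above can be illustrated with a simple check. The Python sketch below uses a hypothetical operation record (name, destination registers, source registers; register names such as `t0` are made up) and conservatively accepts a bundle only if no slot writes a register that another slot reads or writes. A real VLIW compiler would also account for operation latencies and for read-before-write semantics within a bundle.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence

@dataclass(frozen=True)
class Op:
    name: str
    dst: FrozenSet[str]
    src: FrozenSet[str]

def op(name: str, dst: str = "", src: str = "") -> Op:
    """Build an Op from space-separated register lists (hypothetical syntax)."""
    return Op(name, frozenset(dst.split()), frozenset(src.split()))

def can_bundle(ops: Sequence[Op]) -> bool:
    """Conservative test: no op may write a register that another op in the
    same bundle reads (data dependence) or also writes (output dependence)."""
    for i, x in enumerate(ops):
        for j, y in enumerate(ops):
            if i != j and x.dst & (y.src | y.dst):
                return False
    return True

extract = op("EXTRACT", dst="t0", src="r0 r1")
nlbf = op("NLBF", dst="c0", src="t0")          # needs EXTRACT's result
lfsr_a = op("LFSR", dst="r0", src="r0 c0")
lfsr_b = op("LFSR", dst="r2", src="r2 c0")

print(can_bundle([extract, nlbf]))   # False: NLBF reads t0 written by EXTRACT
print(can_bundle([lfsr_a, lfsr_b]))  # True: the two updates touch different registers
```

Dependent operations such as EXTRACT followed by NLBF must go into consecutive bundles, which is why the W7 mapping in Section IV takes several instructions per key stream bit.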

VLIW instruction words, also known as instruction bundles, consist of a feature field (also called the instruction template) and multiple independent instruction slots (also called instruction segments), as shown in Fig. 3. The feature field describes the structure of the bundle: it indicates the position and execution mode of each instruction slot, and different bundle structures are identified by different feature field values. Each instruction slot holds a basic processing instruction that directs the corresponding functional unit to perform its operation. If such an operation is called an operator, each instruction slot selects from its own set of operators. For a specific instruction, the operator chosen in each slot only needs to be able to execute in parallel with the operators chosen in the other slots.

This detailed analysis of data processing granularity and parallelism shows that stream cipher processing involves coarse-grained data, offers ample room for exploiting parallelism, and requires relatively simple control logic. These properties match the VLIW instruction system's wide data paths, multi-channel parallel structure, simple instruction decoding and centralized control logic. The VLIW instruction system is therefore well suited to stream cipher processing, and this paper adopts the VLIW instruction system and architecture as the basis of the design.

Considering the parallelism of stream ciphers and their data processing granularity (up to 32, 64, 128 or 256 bits), we propose a VLIW processor architecture with four operation slots and one control slot, with each operation data path 64 bits wide. A single slot can thus perform one 64-bit stream cipher operation, two slots one 128-bit operation, and four slots one 256-bit operation; operations of different data widths can also be processed in parallel. These features are described in detail in the following sections.
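The mapping from operand width to operation slots can be written down directly. The Python sketch below rounds a width up to one of the four classes and derives the minimum number of 64-bit slots it occupies; the function names are illustrative. It captures only the minimum implied by width: the mappings in Section IV sometimes use a wider instruction than this minimum (for example, 256-bit NLFSR instructions for MICKEY's 100-bit registers).

```python
SLOT_BITS = 64   # width of one operation slot / data path
NUM_SLOTS = 4    # operation slots per long instruction word

def width_class(bits: int) -> int:
    """Round an operand width up to one of the four classes: 32, 64, 128 or 256 bits."""
    for cls in (32, 64, 128, 256):
        if bits <= cls:
            return cls
    raise ValueError(f"{bits}-bit operand exceeds the 256-bit maximum")

def slots_needed(bits: int) -> int:
    """Minimum number of 64-bit slots for an operand of the given width (ceiling division)."""
    return -(-width_class(bits) // SLOT_BITS)

def fits_in_bundle(widths: list) -> bool:
    """True if operations of these widths fit together in the four operation slots."""
    return sum(slots_needed(w) for w in widths) <= NUM_SLOTS

# W7's three LFSRs (38, 43, 47 bits) need one slot each; Grain-80's 80-bit LFSR
# and NLFSR need two slots each, so both updates fit in one bundle.
print([slots_needed(n) for n in (38, 43, 47)], fits_in_bundle([80, 80]))
```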

III. THE ARCHITECTURE OF STREAM CIPHER PROCESSOR

3.1 The stream diagram of processor

Fig. 3. VLIW processor instruction bundle.

Fig. 4. The architecture of reconfigurable VLIW stream cipher processor.

Based on the analysis above, a reconfigurable VLIW processor for stream cipher processing is proposed, as shown in Fig. 4. It consists of four main parts: the stream cipher instruction fetch unit, the stream cipher control unit, the stream cipher cluster execution units and the stream cipher register files. The instruction set is divided into five slots: four slots for stream cipher operations and one slot for branch control. The instruction fetch unit can issue four operation instructions and one branch control instruction per cycle, so four stream cipher operations and one branch control operation can execute in parallel.

The processor includes four cluster execution units, Bank1 to Bank4. Each bank handles one 64-bit stream cipher operation, so one 128-bit operation occupies two banks and one 256-bit operation occupies four banks. Each bank contains multiple reconfigurable stream cipher operation units and general operation units, such as the Linear Feedback Shift Register operation (LFSR); the Nonlinear Feedback Shift Register operation (NLFSR); the extract and insert operation (EXINT); the Nonlinear Boolean Function operation (NLBF); rotate, logical shift, immediate variable shift, register variable shift, constant shift and random shift operations (SHIFT); the three-variable logic operation (TVL); the two-variable modular operation (MASMN); multiplication in a Galois field (MGF); and look-up-table substitution (SBOX). Each operation also supports different processing modes. For example, the LFSR instructions support a basic mode, a serial-input mode and a clock-controlled mode, as well as different operand widths. To increase the performance of the LFSR instructions, parallelism is also exploited: a single instruction can perform a multi-step update of 1 to 8 steps.
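The multi-step LFSR update can be sketched with the state held as an integer bit vector. The Python model below assumes a Fibonacci-style LFSR whose output is bit 0 and whose feedback enters at bit n-1, with the tap mask as a configuration parameter; it is an illustration, not the processor's circuit. The k feedback bits of a k-step update are all computed from the current state, which is valid as long as the highest tap position is at most n-k. Under that assumption the k feedback bits are mutually independent, so hardware can compute them side by side; the serial version is included for comparison. The clock-controlled wrapper is a simplified illustration, and the serial-input mode is omitted.

```python
from typing import Tuple

MAX_STEPS = 8   # the LFSR instructions support 1 to 8 steps per instruction

def parity(x: int) -> int:
    return bin(x).count("1") & 1

def lfsr_serial(state: int, n: int, tap_mask: int, k: int) -> Tuple[int, int]:
    """k single steps: output bit 0, shift right, insert feedback at bit n-1."""
    outs = 0
    for j in range(k):
        outs |= (state & 1) << j
        fb = parity(state & tap_mask)
        state = (state >> 1) | (fb << (n - 1))
    return state, outs

def lfsr_multistep(state: int, n: int, tap_mask: int, k: int) -> Tuple[int, int]:
    """k steps at once; every feedback bit depends only on the current state."""
    if not 1 <= k <= MAX_STEPS:
        raise ValueError("step count must be between 1 and 8")
    if tap_mask.bit_length() - 1 > n - k:
        raise ValueError("highest tap must be at most n - k for a one-shot update")
    fbs = 0
    for j in range(k):                        # independent terms: parallel in hardware
        fbs |= parity((state >> j) & tap_mask) << j
    outs = state & ((1 << k) - 1)             # the k output bits are the k lowest state bits
    return (state >> k) | (fbs << (n - k)), outs

def lfsr_clock_controlled(state: int, n: int, tap_mask: int, ctrl: int, k: int = 1) -> int:
    """Simplified clock-controlled mode: advance only when the control bit is 1."""
    return lfsr_multistep(state, n, tap_mask, k)[0] if ctrl else state

TAPS = 0b101101   # hypothetical taps at bits 0, 2, 3 and 5 of a 16-bit register
print(lfsr_multistep(0xACE1, 16, TAPS, 8) == lfsr_serial(0xACE1, 16, TAPS, 8))
```

The tap-position condition is the usual reason multi-step (parallel) updates are bounded: once k exceeds the gap between the highest tap and the register end, later feedback bits depend on earlier ones and can no longer be computed in one shot.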

Executing these instructions requires corresponding hardware circuits, so reconfigurable technology is used to design the stream cipher operation units. Each reconfigurable hardware unit can be reconfigured into different operation modes under the control of configuration data, which includes the FSR length and taps, extract and insert positions, SBOX tables, the Galois field polynomial and so on.
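As an example of configuration data, the field polynomial of the MGF unit can be treated as a parameter. The Python sketch below is a standard shift-and-add multiplication in GF(2^m) reduced by a configurable polynomial; it models the idea in software, the function name is illustrative, and it does not describe the hardware circuit. The AES polynomial x^8+x^4+x^3+x+1 (0x11B) is used only because its test vectors are well known.

```python
def gf_mul(a: int, b: int, poly: int, m: int = 8) -> int:
    """Multiply a and b in GF(2^m) defined by the irreducible polynomial `poly`.

    `poly` includes its x^m term, e.g. 0x11B for x^8 + x^4 + x^3 + x + 1.
    Both operands must be smaller than 2^m.
    """
    result = 0
    while b:
        if b & 1:            # add (XOR) the current multiple of a
            result ^= a
        b >>= 1
        a <<= 1              # multiply a by x
        if a >> m:           # degree reached m: reduce modulo poly
            a ^= poly
    return result

# The same routine serves different fields simply by changing the configuration.
AES_POLY = 0x11B             # GF(2^8)
GF16_POLY = 0b10011          # x^4 + x + 1, GF(2^4)
print(hex(gf_mul(0x57, 0x83, AES_POLY)), gf_mul(2, 8, GF16_POLY, m=4))
```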

The control unit directs the execution of different stream cipher operations according to the user application. To reduce power dissipation, it also contains an operand isolation unit: only the hardware unit that is executing receives its input data from the register files, while the inputs of the other reconfigurable stream cipher units are held at zero.

3.2 The storage structure of the stream cipher processor

A) Analysis of storage characteristics for stream ciphers

In a VLIW architecture, supporting parallel instruction processing requires more functional units for parallel computing and a large register file for data storage and exchange. Analysis of stream cipher processing shows that the data occupying storage resources mainly include the initial key, the initial vector (IV), the FSR state sequences, intermediate results, memory units, constants used during computation, and the computed key stream. By usage, these data fall into two categories. a) Input/output/intermediate data: this category mainly includes the FSR state sequences, the initial key, the IV, constants and intermediate results. Each FSR stage occupies a single flip-flop, so the number of registers occupied by FSRs depends mainly on the number and lengths of the FSRs in the algorithm. b) Data stored in special memory units: stream cipher designs often use special bit registers to implement memory logic, which greatly improves resistance to linear and differential analysis. Memory logic means that the outputs of some functions in the algorithm not only affect the current state transition but are also stored to affect the next clock cycle and many subsequent ones. Such a dedicated bit-level memory unit may act before the feedback function, like the two 2-bit registers in the E0 algorithm, or after the feedback function in the key stream generation process, like the contraction unit in the DECIM algorithm; in both cases it must be updated in real time while the algorithm runs.
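Memory logic can be illustrated with the classic summation combiner, in which a one-bit carry register passes information from each clock cycle into the next. The Python sketch below is a generic two-input example chosen for illustration; it is not the exact E0 combiner (which, as noted above, keeps two 2-bit registers) or the DECIM contraction unit.

```python
from typing import List, Sequence

def summation_combiner(xs: Sequence[int], ys: Sequence[int]) -> List[int]:
    """Combine two bit streams with one bit of memory (the carry).

    Output z_t = x_t XOR y_t XOR c_t; the carry c_{t+1} = majority(x_t, y_t, c_t)
    is stored to influence the next cycle, so z_t depends on all earlier inputs.
    """
    carry = 0
    out = []
    for x, y in zip(xs, ys):
        out.append(x ^ y ^ carry)
        carry = (x & y) | (x & carry) | (y & carry)
    return out

def to_bits(value: int, width: int) -> List[int]:
    return [(value >> i) & 1 for i in range(width)]      # least significant bit first

def from_bits(bits: Sequence[int]) -> int:
    return sum(bit << i for i, bit in enumerate(bits))

# Read LSB first, the combiner output equals the integer sum of the inputs (mod 2^width).
print(from_bits(summation_combiner(to_bits(13, 8), to_bits(11, 8))))
```

The carry is exactly the kind of small, frequently updated state that motivates a dedicated bit register rather than a slot in the wide general register file.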

These two types of data differ greatly in usage and characteristics, and their storage capacity requirements are also very different. The stream cipher processor therefore has a separate general register file for wide data processing and a special bit register file for dedicated memory cell logic. In addition, to allow the processor to be extended to block ciphers, a sub-key register file is also added.

B) The storage structure of the stream cipher processor

Based on the storage characteristics of stream ciphers, this paper proposes the separate clustered register file scheme shown in Fig. 5. The general data register file stores the initial key, the initial vector and temporary results. Its structure differs from that of register files in general-purpose processors: to support multiple operations in parallel, it must allow parallel reads and writes. The general register file is therefore divided into four clusters, each with three read ports and two write ports, as shown in Fig. 5. Each cluster can thus serve an operation with three source operands and two destination operands, such as an LFSR operation (two sources hold the 64-bit current state words of one LFSR, one source holds 32-bit serial input data, and two destinations hold the 64-bit next state words of that LFSR). Because the general register file is accessed very frequently and its read/write performance is critical to the processor, its capacity is set to 32×64 bits.
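The port budget of the clustered general register file can be checked mechanically. In the Python sketch below, each operation is described by a hypothetical tuple (cluster index, number of register reads, number of register writes), and a bundle is accepted only if no cluster exceeds its three read ports and two write ports in that cycle.

```python
from collections import Counter
from typing import Iterable, Tuple

NUM_CLUSTERS = 4
READ_PORTS, WRITE_PORTS = 3, 2   # per cluster, per cycle

def ports_ok(ops: Iterable[Tuple[int, int, int]]) -> bool:
    """ops: (cluster, reads, writes) for each operation issued in one cycle."""
    reads: Counter = Counter()
    writes: Counter = Counter()
    for cluster, r, w in ops:
        if not 0 <= cluster < NUM_CLUSTERS:
            raise ValueError(f"no cluster {cluster}")
        reads[cluster] += r
        writes[cluster] += w
    return (all(n <= READ_PORTS for n in reads.values())
            and all(n <= WRITE_PORTS for n in writes.values()))

# The LFSR operation from the text: three sources in, two destinations out.
LFSR_64 = (3, 2)
print(ports_ok([(c, *LFSR_64) for c in range(NUM_CLUSTERS)]))   # one LFSR update per cluster
```

An LFSR update uses a cluster's full port budget, so a second operation on the same cluster in the same cycle would have to be scheduled elsewhere.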

The special bit register file stores the cascade information of NLBF operations, the internal feedback information of NLFSR operations, and the clock control information of clock-controlled LFSR operations. This information is small in volume but must be accessed very flexibly, so the special bit register file is designed as a single 1×8-bit register shared by all clusters. Each cluster can read an 8-bit source operand from it and write an 8-bit destination operand to it.

Finally, to extend support to more types of cryptographic algorithms such as block ciphers, a sub-key data register file is provided for reading and writing round keys in round operations. It is also divided into four clusters, with four read ports and four write ports. Each cluster takes two source operands from the general register file and one from the key register file. Because of the round key storage requirements of block ciphers, the capacity of the key register file is also 32×64 bits, as shown in Fig. 5.

In summary, this paper has studied the storage structure and access principles of the general data register file, the sub-key data register file and the special bit register file. Together, these register files effectively reduce the delay of loading and storing data and support the parallel instruction architecture.

Fig. 5. The separate cluster storage structure of stream cipher processor.

3.3 The specific instruction set of the stream cipher processor

The instruction set of the proposed processor includes three kinds of instructions: configuration instructions, stream cipher operation instructions and control instructions. Configuration instructions reconfigure the operations and parameters of stream ciphers, such as FSR length, tap positions, NLBF variable positions, extract and insert positions, SBOX tables and finite field polynomials. Control instructions implement program jumps, subroutine calls and returns. Stream cipher operation instructions implement the cryptographic operations of stream ciphers and fall into three categories: word instructions, each performing a 64-bit stream cipher operation; double-word instructions, each performing a 128-bit operation; and four-word instructions, each performing a 256-bit operation.

Fig. 6. The VLIW instruction format of stream cipher processor.

To increase speed, instruction-level parallelism is exploited by bundling stream cipher operation instructions and a control instruction into one long instruction word based on the VLIW system, as shown in Fig. 6. The long instruction word supports ten parallel processing modes: four 64-bit operations; four 64-bit operations with one control operation; two 128-bit operations; two 128-bit operations with one control operation; one 256-bit operation; one 256-bit operation with one control operation; two 64-bit operations with one 128-bit operation; two 64-bit operations and one 128-bit operation with one control operation; one 192-bit operation with one 64-bit operation; and one 192-bit operation and one 64-bit operation with one control operation.

This approach relies on hardware/software co-design: the compiler composes bundles from instructions without data dependences, so low hardware complexity, low resource usage and high speed can be achieved.
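The ten modes can be enumerated as the five ways of filling the four 64-bit operation slots, each with or without an instruction in the control slot. The Python sketch below lists them. The template-width computation at the end is an inference (the paper does not specify the width of the feature field), showing only that a 4-bit template would be enough to distinguish ten bundle structures.

```python
from itertools import product
from typing import List, Tuple

# Data-slot configurations from the text, as operation widths in bits.
DATA_CONFIGS: List[Tuple[int, ...]] = [
    (64, 64, 64, 64),   # four word instructions
    (128, 128),         # two double-word instructions
    (256,),             # one four-word instruction
    (64, 64, 128),      # two word instructions + one double-word instruction
    (192, 64),          # one 192-bit operation + one word instruction
]

def bundle_modes() -> List[Tuple[Tuple[int, ...], bool]]:
    """All (operation widths, uses control slot) combinations."""
    modes = []
    for widths, with_ctrl in product(DATA_CONFIGS, (False, True)):
        assert sum(widths) == 256          # every configuration fills all four 64-bit slots
        modes.append((widths, with_ctrl))
    return modes

modes = bundle_modes()
template_bits = (len(modes) - 1).bit_length()   # bits needed to number the modes 0..9
print(len(modes), template_bits)
```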

IV. INSTRUCTION MAPPING AND ANALYSIS OF STREAM CIPHER PROCESSOR

4.1 The instruction mapping for stream cipher algorithms

A) The instruction mapping for W7 algorithm

The W7 algorithm consists of three parts: three LFSRs of lengths 38, 43 and 47; clock control signals that govern the update of each LFSR; and the key stream computation. Fig. 7 shows the mapping of W7 onto our processor, which uses the hardware resources of all four clusters. a) One 256-bit extract instruction gathers all the tap variables needed to compute the clock control signals and the key stream. b) Four 64-bit NLBF instructions, executed in parallel, compute the clock control signal for each LFSR and the key stream bit from the extracted tap variables. c) Three 64-bit LFSR instructions, executed in parallel, update each LFSR under the clock control signals just computed. Hence 3 instructions are needed to produce 1 bit of W7 key stream.

W7 serves as a representative example here; the other algorithms can be mapped in a similar way.

B) The instruction mapping for Grain-80 algorithm

The Grain-80 algorithm consists of an LFSR, an NLFSR and a key stream generating function; the LFSR and NLFSR are both 80 bits long. Fig. 8 shows the mapping of Grain-80 onto our processor. a) One 256-bit NLBF instruction computes the key stream generating function, whose variables are taken from the current states of the LFSR and NLFSR. b) One 128-bit NLFSR instruction and one 128-bit LFSR instruction, executed in parallel, update the NLFSR and LFSR simultaneously.

Hence 2 instructions are needed to produce 1 bit of Grain-80 key stream.

C) The instruction mapping for MICKEY algorithm

The MICKEY algorithm also consists of three parts: two NLFSRs (NLFSR_R and NLFSR_S), each 100 bits long; structure control signals that govern the update of each NLFSR; and the key stream computation. Fig. 9 shows the mapping of MICKEY onto our processor. a) Three 64-bit NLBF instructions, executed in parallel, compute the structure control signals for each NLFSR and the key stream generating function. b) One 256-bit NLFSR instruction updates NLFSR_R. c) One 256-bit NLFSR instruction updates NLFSR_S. Hence 3 instructions are needed to produce 1 bit of MICKEY key stream.

D) The instruction mapping for Shrink algorithm

The Shrink algorithm consists of two parts: two LFSRs, each 128 bits long, and the extraction of the key stream. Fig. 10 shows the mapping of Shrink onto our processor. a) One 64-bit-to-8-bit dynamic extraction instruction extracts 8 bits of key stream. b) Two 128-bit LFSR instructions, executed in parallel, perform an 8-step update of each LFSR. Hence 2 instructions are needed to produce 8 bits of Shrink key stream.
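The extraction step can be modelled as a bit gather: the bits of one register's output are kept where the other register's output bit is 1, which is the rule of the classic shrinking generator. The paper does not spell out its Shrink variant, so the Python sketch below assumes that rule, uses small hypothetical 16-bit registers with arbitrary taps instead of 128-bit ones, and yields a variable number of key stream bits per 8-step update rather than reproducing the paper's exact 8-bits-per-iteration mapping. A bit-serial reference is included to show that the block-wise version produces the same sequence.

```python
from typing import Tuple

def lfsr_steps(state: int, n: int, tap_mask: int, k: int) -> Tuple[int, int]:
    """k steps of an n-bit Fibonacci LFSR; returns (new state, k output bits, first in bit 0)."""
    outs = 0
    for j in range(k):
        outs |= (state & 1) << j
        fb = bin(state & tap_mask).count("1") & 1
        state = (state >> 1) | (fb << (n - 1))
    return state, outs

def dynamic_extract(a: int, sel: int, width: int = 8) -> Tuple[int, int]:
    """Gather the bits of `a` at positions where `sel` is 1; returns (packed bits, count)."""
    packed = count = 0
    for i in range(width):
        if (sel >> i) & 1:
            packed |= ((a >> i) & 1) << count
            count += 1
    return packed, count

def shrink_blockwise(a: int, s: int, n: int, mask_a: int, mask_s: int, n_bits: int) -> int:
    """Advance both registers 8 steps at a time and extract the selected bits."""
    bits = nbits = 0
    while nbits < n_bits:
        a, a8 = lfsr_steps(a, n, mask_a, 8)    # the two 8-step updates are independent,
        s, s8 = lfsr_steps(s, n, mask_s, 8)    # so they can share one bundle
        got, cnt = dynamic_extract(a8, s8)
        bits |= got << nbits
        nbits += cnt
    return bits & ((1 << n_bits) - 1)

def shrink_serial(a: int, s: int, n: int, mask_a: int, mask_s: int, n_bits: int) -> int:
    """Bit-serial reference: output A's bit whenever S's bit is 1."""
    bits = nbits = 0
    while nbits < n_bits:
        a, abit = lfsr_steps(a, n, mask_a, 1)
        s, sbit = lfsr_steps(s, n, mask_s, 1)
        if sbit:
            bits |= abit << nbits
            nbits += 1
    return bits
```

Both masks should include bit 0 and both start states should be nonzero; the register then never reaches the all-zero state, so the selector keeps producing 1 bits and the loops terminate.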

Fig. 7. W7 algorithm mapping result in the stream cipher processor.

4.2 The comparison of instruction numbers with other processors

For comparison with other work, the stream ciphers were also implemented on IA32 and ARM processors.

Of the two general-purpose processors used for comparison, ARM has a RISC (Reduced Instruction Set Computer) architecture, whereas IA32 has a CISC (Complex Instruction Set Computer) architecture.

Fig. 9. MICKEY algorithm mapping result in the stream cipher processor.

Furthermore, high-end processors such as Intel Core i7 and Xeon use more advanced manufacturing processes to raise the clock frequency, and use multiple cores and multithreading to further improve performance. The stream cryptographic processor proposed here could likewise be fabricated in a more advanced process to raise its frequency, or extended to multi-core and multithreaded operation, but this was not possible under the available process technology and implementation environment.

In addition, unlike block ciphers, stream ciphers follow a one-time-pad-like principle, so the key stream generation rate matters more than the encryption or decryption speed. Once the initial key, the IV and other initial data are given, key stream generation proceeds sequentially and essentially requires no external data. It is therefore difficult to accelerate with pipelining or single-instruction multiple-data (SIMD) processing, which is a main difference between stream ciphers and block ciphers. Block ciphers can be accelerated by many such methods, but these methods are largely ineffective for stream ciphers because of their inherently sequential processing.

Therefore, for high-performance processors such as the Core i7, multi-core and multithreading have little effect on stream cipher performance; only the higher clock frequency enabled by advanced process technology helps. To obtain a relatively fair performance comparison, this paper adopts an evaluation method similar to that used for Li Wei's block cryptographic processor.

The comparison results are shown in Table II. For both the initialization process and the normal working process, our processor requires significantly fewer instructions. For example, the normal working process of Grain-80 takes 126 instructions on the IA32 processor but only 2 on our processor.

The root causes of this difference are as follows. A) Efficient special instructions for stream ciphers are provided. For example, the A5-2 initialization process takes about 3448 instructions on the IA32 processor but only 11 on our processor, because our processor provides FSR instructions that support serial input and multi-step updates. Likewise, ACHTERBAHN takes about 1442 instructions on the IA32 processor but only 4 on our processor, because our processor provides NLBF instructions. B) The VLIW structure exploits instruction-level parallelism to meet the parallel processing requirements of stream ciphers: a single instruction can carry parallel operations on different data widths together with a control branch. C) The separate clustered register file structure effectively reduces register access delay.

Fig. 10. Shrink algorithm mapping result in the stream cipher processor.

V. RESULT AND PERFORMANCE ANALYSIS

5.1 Analysis of performance

Based on the above analysis, a prototype test chip of the processor has been fabricated in a 0.18 µm, 1.8 V, six-metal-layer CMOS process. The chip information is summarized in Table III, and the key stream generation performance is shown in Table IV. The key stream generation throughput of Grain-80, W7, MICKEY, ACHTERBAHN and Shrink is 100 Mbps, 66.67 Mbps, 66.67 Mbps, 50 Mbps and 800 Mbps, respectively.
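The reported throughputs are consistent with a simple model in which one VLIW instruction issues per clock cycle without stalls: throughput = clock frequency × key stream bits per iteration / instructions per iteration. The Python sketch below reproduces the five figures from the 200 MHz clock and the instruction counts given in Section IV and in the comparison above (4 instructions for ACHTERBAHN). The one-instruction-per-cycle assumption and the 1 bit per iteration for ACHTERBAHN are inferences, not statements from the paper.

```python
from fractions import Fraction
from typing import Dict, Tuple

CLOCK_MHZ = 200

# algorithm: (key stream bits per iteration, VLIW instructions per iteration)
MAPPINGS: Dict[str, Tuple[int, int]] = {
    "Grain-80": (1, 2),
    "W7": (1, 3),
    "MICKEY": (1, 3),
    "ACHTERBAHN": (1, 4),
    "Shrink": (8, 2),
}

def throughput_mbps(bits: int, instructions: int, clock_mhz: int = CLOCK_MHZ) -> Fraction:
    """Key stream rate assuming one VLIW instruction per cycle and no stalls."""
    return Fraction(clock_mhz * bits, instructions)

for name, (bits, instrs) in MAPPINGS.items():
    print(f"{name}: {float(throughput_mbps(bits, instrs)):.2f} Mbps")
```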

5.2 Comparison of performance

Stream cipher structures are diverse, their operations vary, and no unified processing model exists. As a result, it is extremely difficult to develop unified special units and instructions for them. Few works have studied special instructions for stream ciphers systematically. Existing studies address block ciphers [1-6], the acceleration of a single stream cipher or a fixed set of stream ciphers [7-9], or the acceleration of a single stream cipher operation or a fixed set of such operations [10-12].

Table II. Comparison of instruction counts for stream cipher processing.

Table III. Stream cipher processor chip information.

Table IV. Key stream generation throughput of our processor.

Therefore, this paper compares the performance of the proposed stream cipher special-instruction processor with that of general-purpose processor and ASIC implementations. Fig. 11 shows the key generation rates of Grain-80, W7, MICKEY and Shrink on an Intel Pentium 4, in ASIC implementations, and on our processor. A general-purpose processor such as the Intel Pentium 4 has no special instructions or circuits for stream cipher acceleration, so it executes typical stream ciphers slowly. Our design is clearly much faster. ASIC implementations achieve high speed but offer little flexibility.

In detail, Grain-80 is a typical stream cipher with a basic structure. Although the general-purpose processor runs at a high frequency, its Grain-80 key stream generation rate is limited by the large number of instructions required. Our processor provides systematically designed special instructions, so its performance is nearly 6 times that of the general-purpose processor while flexibility is preserved.

W7 is a typical clock-controlled stream cipher. On the general-purpose processor, its key stream generation rate is only 6.58 Mbps, because extracting and computing the clock control signal variables consumes a large number of instructions. Our design supports four parallel NLBF operations within 64 bits. The clock control computation and the key control signal generation, together with the extraction operations they depend on, can be completed by one VLIW instruction. Therefore W7 needs only 3 instructions, and its performance is about 10 times that of the general-purpose processor (66.67 Mbps versus 6.58 Mbps) while flexibility is preserved.

Shrink is a typical shrinking and self-shrinking stream cipher. The LFSR operation instructions designed in this paper support an 8-step parallel update, and the extract instructions efficiently perform the shrinking (contraction) function. As a result, only two instructions are needed to produce 8 key bits. This greatly improves the efficiency of shrinking and self-shrinking algorithms, reaching up to 30 times the performance of the general-purpose processor. The remaining stream ciphers can be analyzed in the same way and are not described here.
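The shrinking (contraction) step amounts to a parallel bit extract: it keeps the bits of one sequence that are selected by the bits of another. The minimal Python model below shows such an extract on an 8-bit block, for both the shrinking generator and the self-shrinking generator. It assumes that an 8-step LFSR update has already produced the data and selector bits, and that the earliest bit is bit 0. The function names and the 8-bit block width are illustrative assumptions, not this processor's instruction set. Note that, in this model, the number of output bits per block is data-dependent, so each function also returns the count.

```python
def extract(data: int, mask: int, width: int = 8) -> tuple[int, int]:
    """Software model of a parallel bit extract.

    Bits of `data` whose corresponding `mask` bit is 1 are packed, LSB first,
    into the low end of the result. Returns (packed_bits, bit_count).
    """
    out, n = 0, 0
    for i in range(width):
        if (mask >> i) & 1:
            out |= ((data >> i) & 1) << n
            n += 1
    return out, n


def shrink_8(a_bits: int, s_bits: int) -> tuple[int, int]:
    """Shrinking generator on one 8-step block: keep a[i] wherever s[i] == 1."""
    return extract(a_bits, s_bits, 8)


def self_shrink_8(x_bits: int) -> tuple[int, int]:
    """Self-shrinking generator on 8 LFSR bits: for each pair (x[2i], x[2i+1]),
    output x[2i+1] when x[2i] == 1."""
    sel, _ = extract(x_bits, 0b01010101, 8)  # x0, x2, x4, x6
    dat, _ = extract(x_bits, 0b10101010, 8)  # x1, x3, x5, x7
    return extract(dat, sel, 4)


print(shrink_8(0b10110010, 0b11110000))  # → (11, 4)
print(self_shrink_8(0b11010011))         # → (5, 3)
```

In software, the loop in `extract` costs several instructions per bit. A dedicated extract unit can perform the whole selection at once, which is the kind of saving the text attributes to the Shrink mapping.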

Because of time and experimental platform constraints, this paper does not include a detailed power consumption test, but a brief estimate can be made. The proposed design consumes more power than an ASIC implementation, because flexibility requires additional resources. On the other hand, it consumes less power than a general-purpose implementation, because it uses fewer resources and has no complex wiring or interconnection.
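A rough energy-per-bit figure can be obtained by combining the 310 mW chip power reported for the prototype with the throughputs in Table IV. The sketch below is only indicative. It assumes the full 310 mW is drawn while each cipher runs at its stated rate, whereas this paper does not report per-algorithm power. The helper name is hypothetical.

```python
CHIP_POWER_MW = 310.0  # reported power consumption of the prototype

# Key stream throughputs reported for the prototype, in Mbps
THROUGHPUT_MBPS = {
    "Grain-80": 100.0,
    "W7": 66.67,
    "MICKEY": 66.67,
    "ACHTERBAHN": 50.0,
    "Shrink": 800.0,
}


def energy_nj_per_bit(power_mw: float, throughput_mbps: float) -> float:
    """Energy per key stream bit in nJ.

    mW / Mbps = (1e-3 J/s) / (1e6 bit/s) = 1e-9 J/bit = 1 nJ/bit.
    """
    return power_mw / throughput_mbps


for name, tp in THROUGHPUT_MBPS.items():
    print(f"{name:<11}{energy_nj_per_bit(CHIP_POWER_MW, tp):6.2f} nJ/bit")
```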

Fig. 11. Comparison of key generation rates for typical stream ciphers.

VI. CONCLUSIONS

This paper presents a reconfigurable stream cryptographic processor for information security that achieves a tradeoff between high-speed processing and flexibility. A reconfigurable VLIW architecture for stream ciphers is derived from an analysis of their operation modes, data lengths and processing characteristics. From an analysis of the basic operations and storage characteristics of stream ciphers, the paper proposes a separate and clustered register stack, designs reconfigurable cryptographic hardware units, and presents an application-specific instruction set for stream ciphers. The processor operates at a maximum frequency of 200 MHz and consumes 310 mW. The key stream generation throughputs of Grain-80, W7, MICKEY, ACHTERBAHN and Shrink are 100 Mbps, 66.67 Mbps, 66.67 Mbps, 50 Mbps and 800 Mbps, respectively. The comparison results demonstrate that the proposed stream processor achieves a good tradeoff between high performance and flexibility.