Symbol Detection Based on Temporal Convolutional Network in Optical Communications

2022-02-16 05:51, Yingzhe Luo, Jianhao Hu
China Communications, 2022, Issue 1

Yingzhe Luo, Jianhao Hu

Key Laboratory of Science and Technology on Communication, University of Electronic Science and Technology of China, Chengdu 611731, China

Abstract: Deep learning (DL) is one of the fastest developing areas in artificial intelligence, and it has recently been studied and applied in computer vision, autonomous driving, automatic speech recognition, and communication. This paper uses DL to design a symbol detection algorithm for the receiver in optical communication systems. The proposed DL-based method is implemented by a non-causal temporal convolutional network (ncTCN), a convolutional neural network suited to sequence processing. We adopt three methods to train the network for multiple signal-to-noise ratios (SNRs) of the AWGN channel. Furthermore, we apply two nonlinear activation functions to the proposed ncTCN and compare their noise robustness. Without loss of generality, we apply the ncTCN-based receiver to a 16-ary quadrature amplitude modulation (16QAM) optical communication system in the simulation experiments. According to the experimental results, the proposed method obtains a bit error rate (BER) performance gain over conventional receivers.

Keywords: deep learning; optical communications; quadrature amplitude modulation; symbol detection

I. INTRODUCTION

Neural networks (NNs) have been a hot topic in recent years and have received extensive attention in computer vision, automatic speech recognition, natural language processing, etc. Owing to their powerful learning ability, NNs have been increasingly studied and applied in communication, in both the network layer and the physical layer. In the network layer, NNs have been applied to real-time network slicing for the 5G core and 5G radio access network [1], to data center interconnection to improve the accuracy of fault location in the scenario of massive alarm sets [2], to burst traffic prediction and scheduling, which addresses the traffic problem caused by diverse and heterogeneous user demand and adapts to the complex structure of hybrid electrical/optical switched data center networks [3], and to the traffic grooming problem in elastic optical networks, where services are groomed while minimizing energy consumption [4]. In the physical layer, NNs have been applied to signal processing in communication systems, such as channel coding [5], channel estimation [6], equalization [7], demodulation [8, 9], channel decoding [7, 10], and even the entire communication link [11-16]. These methods use neural networks to optimize the performance of the communication system. As a result of wide and in-depth study in academia and industry, many highly efficient platforms have been provided for neural networks, including the tensor processing unit of Google, the vision processing unit of Intel, and the neural network processing unit of Huawei. Thus, neural networks for digital signal processing in communication systems can be implemented on highly efficient hardware. In this paper, we focus on the physical layer in optical communications.

In optical communication, the signal propagates through optical fiber. As the transmission speed and distance increase, the effect of dispersion becomes serious and limits the transmission capacity. Chromatic dispersion (CD) is the major dispersion for single-polarization multiplexing transmission, especially in standard single-mode fiber (SSMF) [17]. CD causes different spectral components to travel at different group velocities, which broadens the signal pulse during propagation. As the transmission distance increases, the time delay and the pulse spread increase, and adjacent pulses overlap, which causes inter-symbol interference (ISI) in the received signal [18]. ISI degrades the bit error performance and needs to be compensated in communication systems.

There have been various NN-based schemes and studies on dispersion compensation. The studies in [13-16] proposed end-to-end methods that model the entire communication link as an NN. Although they can guarantee a globally optimal working status for the entire system, they did not consider the bandwidth property of the signal transmission, and the gradient cannot be back-propagated accurately when the channel is noisy, distorting, or time-varying. As a result, many schemes retain the conventional transmitter and focus on NN-based receiver design. The studies in [19, 20] applied a fully-connected NN to the receiver for equalization. However, these neural networks had a limited number of units in the input layer, so the input signal sequence had to be divided into several blocks before entering the network. Moreover, these methods were not applied to higher-order modulation systems such as quadrature amplitude modulation (QAM). Other studies proposed NN-assisted methods to further improve the equalization performance. These methods place a fully-connected NN after the equalizer to complement the dispersion compensation and symbol detection of conventional algorithms, such as FFE+NN [21] and NN+BCJR [22]. Nonetheless, in neither of these methods does the NN implement both equalization and symbol detection together. The study in [23] proposed a symbol detection algorithm based on a bi-directional long short-term memory (BiLSTM) network consisting of a forward LSTM and a backward LSTM; it can therefore capture both historical and future information and was proposed for ISI. Although LSTM has been a popular scheme for sequence modeling, it suffers from the vanishing gradient problem, it cannot be computed in parallel because its forward pass must be executed in temporal order, and by default it must be fed with data containing all historical information, which prevents precise control of the input information.

Based on the current state of NNs for ISI compensation, this paper adopts the non-causal temporal convolutional network (ncTCN) [24] to realize symbol detection from the baseband signal in optical communications. The ncTCN is a fully convolutional neural network for sequence prediction. Unlike in BiLSTM, the convolution operations can be performed in parallel, which makes the ncTCN faster to compute than BiLSTM. In addition, the ncTCN is equipped with batch normalization and residual connections, which avoid the vanishing gradient and learning degradation problems, respectively. Moreover, like BiLSTM, the ncTCN can be fed with sequences of arbitrary length because of its sliding 1-dimensional convolution kernels, and it returns a sequence with the same length as the input sequence. Last, the ncTCN can be used for classification tasks since the Softmax function can be placed in the output layer. Therefore, this paper uses the ncTCN to implement symbol detection from the uncoded baseband signal, which merges the equalization and detection stages of the receiver. The proposed method is depicted in Figure 1. The main contributions of our work are listed as follows.

Figure 1. The optical communication systems and ncTCN method conceptual model.

Figure 2. The architecture of the proposed method. The leftmost panel depicts the overall composition of the ncTCN. The second panel from the left shows the composition of a residual block. The two panels on the right show a dilated convolution operation for an example in which the kernel size is 7 and the dilation rate is 2; the rightmost panel depicts the details of the 1-dimensional convolution operation.

• We analyze the theoretical feasibility of using the ncTCN to detect signals distorted by ISI, and we propose a scheme that adopts the ncTCN to jointly realize equalization and detection in optical communication. In simulation, the ncTCN obtained a performance gain over conventional equalizer-based methods.

• Considering the optical-fiber channel with time-varying SNR, we use three methods to train the ncTCN: the mixed-SNR training method, the each-SNR training method, and the transfer learning method. We applied them in our simulation experiments and verified that all three are feasible.

• Based on the difference between the definitions of the rectified linear unit (ReLU) and the parametric rectified linear unit (PReLU), we conjecture that PReLU leads to better noise robustness than ReLU. The simulation supports this conjecture: PReLU brought higher noise robustness than ReLU under Gaussian noise.

The rest of this paper is organized as follows. Section II introduces the construction of the data set and the ncTCN architecture in detail. Section III introduces the three training methods. Section IV presents the experimental results and analysis. Section V draws conclusions and looks forward to future work.

II. MODEL SETUP

2.1 Data Set

According to [25], the CD's transfer function and impulse response are defined in (1) and (2),

where D is the dispersion coefficient, which describes the temporal broadening of the pulse caused by CD. The coefficients c, λ, z, and ω denote the speed of light, wavelength, transmission distance, and angular frequency, respectively.
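For reference, the form of the CD transfer function and impulse response that is commonly given in the coherent-receiver literature, written with the same symbols D, c, λ, z, and ω, is sketched below. That (1) and (2) follow exactly this form and sign convention is an assumption, since conventions for the sign of the phase term differ between sources.

```latex
\begin{align}
G(z,\omega) &= \exp\!\left(-j\,\frac{D\lambda^{2}z}{4\pi c}\,\omega^{2}\right), \\
g(z,t) &= \sqrt{\frac{c}{jD\lambda^{2}z}}\;\exp\!\left(j\,\frac{\pi c}{D\lambda^{2}z}\,t^{2}\right).
\end{align}
```

In this form the fiber acts as an all-pass filter with a quadratic phase in ω, so the impulse response spreads in time as D, z, or the signal bandwidth grows.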

Suppose x(t) denotes the constellation signal, x(t) ∈ {0, 1, ..., M-1}, where M denotes the QAM order; s(t) and r(t) denote the transmitted and received baseband signals, respectively; and y(t) denotes the received signal. The transmission process can then be described as follows,

where Δf denotes the frequency offset in synchronization, n_c(t) denotes the band-limited Gaussian noise, and g_r(t) denotes the square-root raised-cosine roll-off filter used in this article. As mentioned in Section I, CD causes adjacent pulses to overlap, leading to ISI in the received signal. Suppose the sequence h_k denotes the channel sequence and n_ck denotes the band-limited Gaussian noise sequence; the received sequence can then be described as follows,

where y_k ∈ y(n) and y(n) is sampled from y(t). We then separate the real and imaginary parts of the sequence y(n) into two vectors y_r and y_i and concatenate them into a matrix Y = [y_r, y_i] ∈ R^(L×2), where L denotes the length of y(n), y_r = [R(y_0), ..., R(y_(L-1))]⊤, and y_i = [I(y_0), ..., I(y_(L-1))]⊤.
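As a rough illustration of the discrete ISI model and of the real-valued input matrix Y, the sketch below convolves a short symbol sequence with a channel sequence and then splits the result into real and imaginary columns. The tap values, the symbol values, and the function names `apply_isi` and `to_real_matrix` are hypothetical and are not taken from the paper; noise is omitted so that the numbers can be checked by hand.

```python
def apply_isi(symbols, taps):
    """Return y_k = sum_i h_i * s_(k-i), truncated to the input length."""
    received = []
    for k in range(len(symbols)):
        acc = 0j
        for i, h in enumerate(taps):
            if k - i >= 0:  # symbols before the start of the frame count as zero
                acc += h * symbols[k - i]
        received.append(acc)
    return received


def to_real_matrix(y):
    """Stack real and imaginary parts as the two columns of an L x 2 matrix."""
    return [[v.real, v.imag] for v in y]


# Hypothetical 16QAM points and a two-tap channel (illustrative values only).
symbols = [1 + 1j, -1 + 3j, 3 - 1j]
taps = [1.0, 0.5j]

y = apply_isi(symbols, taps)
Y = to_real_matrix(y)
print(Y)  # each row is [Re(y_k), Im(y_k)]
```

Each row of `Y` corresponds to one received sample, so a frame of length L yields an L×2 input regardless of the modulation order.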

2.2 Architecture

The temporal neural network was proposed for solving sequence modeling problems; its objective can be described briefly by the following formula [26],

Specifically, the 1-dimensional convolution kernels slide through the input sequence for feature extraction, and an output sequence with the same length as the input sequence is returned. Generally, the prediction is implemented by a nonlinear function such as Softmax or Sigmoid. The ncTCN is a temporal neural network equipped with the dilated non-causal convolution operation, described as follows,

where m denotes the receptive field or the range of feature extraction. Thus, the convolution operation takes both the past and future information of a given input sample into account. We therefore believe this sequence modeling method is appropriate for recovering messages from the ISI-distorted signal. In this article, the ncTCN is composed of a feature extractor and a symbol classifier. Suppose x̂ denotes the estimated decimal symbol and x̂ ∈ {0, 1, ..., M-1}, where M denotes the QAM order. The feature extractor extracts information and features from the baseband signal and produces the intermediate feature. The symbol classifier then maps the feature into the symbol space. Specifically, the fully connected layer calculates the probability distribution over the symbols from the intermediate feature using the Softmax function. This can be described by the following formula,

The estimator then returns the decimal symbol that has the maximum probability, described as follows,
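The Softmax classification and the maximum-probability decision described above can be sketched as follows. This is a minimal sketch for a single time step: the logit values stand in for the output of the fully connected layer, and the names `softmax` and `detect_symbol` are illustrative rather than taken from the paper.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution over the symbols."""
    peak = max(logits)  # subtracting the maximum keeps exp() from overflowing
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def detect_symbol(logits):
    """Return the decimal symbol index with the highest probability."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical logits for a 4-symbol alphabet (16QAM would have 16 entries).
example_logits = [0.1, 2.0, -1.0, 0.5]
example_probs = softmax(example_logits)
print(detect_symbol(example_logits))  # 1
```

Because Softmax is monotonic, the detected symbol equals the index of the largest logit; the probabilities matter mainly for the training loss.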

An appropriate receptive field 2m+1 better extracts the information within the ISI range, and it is determined by the kernel size, network depth, and dilation rate. In this paper, the feature extractor consists of 3 residual blocks, each of which contains 2 dilated convolution layers, 2 batch normalization layers, and nonlinear activations. The hyper-parameters and diagram of the proposed method are shown in Table 1 and Figure 2, respectively. Next, the above components are described in detail.

Table 1. Hyper-parameters of the proposed ncTCN.

Table 2. Simulation parameters.

Non-causal Dilated Convolutions. Given the received signal defined in Eq. (4), we consider the non-causal convolution operation a feasible way to extract information and features from the received signal and to implement sequence modeling as in Eq. (6). Because the proposed method does not require channel state information, the kernel size should be chosen carefully. When the ISI is severe, the receptive field needs to be enlarged, and so the kernel size needs to increase too. Dilated convolution is a technique that enlarges the receptive field while keeping the kernel size fixed by inserting zeros into the kernels [27]. Thus, a size-limited convolution operation with a certain dilation rate attends to both local and wide-range context while extracting information. The insertion is controlled by the dilation rate. Generally, the dilation rate grows as the layers get deeper in the ncTCN, and it grows exponentially. In this paper, the dilation rate follows d = 2^i, where d denotes the dilation rate and i denotes the i-th residual block. Moreover, the kernel size in this paper is 7. The non-causal dilated convolution is depicted in Figure 2.
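To make the operation concrete, the sketch below implements a single-channel non-causal dilated convolution with zero padding and estimates the resulting receptive field. It is a minimal sketch under stated assumptions: the block index i is assumed to start at 0 (so d = 1, 2, 4), both convolution layers in a block are assumed to share the block's dilation rate, and the function names are illustrative. Table 1 may specify different values.

```python
def noncausal_dilated_conv1d(x, kernel, dilation):
    """y[t] = sum_j w[j] * x[t + (j - centre) * dilation], with zero padding.

    The window is centred on t, so past and future samples are both used and
    the output has the same length as the input. As is usual in neural
    networks, the kernel is applied without flipping.
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel size must be odd for a centred window")
    centre = len(kernel) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + (j - centre) * dilation
            if 0 <= idx < len(x):  # samples outside the frame count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations, convs_per_block=2):
    """Number of input samples that influence one output sample."""
    field = 1
    for d in dilations:
        field += convs_per_block * (kernel_size - 1) * d
    return field


x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(noncausal_dilated_conv1d(x, [1.0, 1.0, 1.0], dilation=2))  # [4.0, 6.0, 9.0, 6.0, 8.0]

# Kernel size 7 and three residual blocks, assuming i = 0, 1, 2.
print(receptive_field(7, [2 ** i for i in range(3)]))  # 85
```

Under these assumptions the receptive field is 2m+1 = 85 samples, i.e. m = 42 on each side; if the block index started at 1 instead, the same formula would give 169.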

Figure 3. Plots of the two nonlinear activations, ReLU and PReLU.

Nonlinear Activation. The rectified linear unit (ReLU) [28] is a common nonlinear activation function that can avoid the vanishing gradient problem. However, the input sequence is corrupted by Gaussian noise, and ReLU sets negative inputs to zero, as shown in Figure 3a, which may result in information loss and in unstable and inaccurate inference. As a result, this paper adopts the parametric rectified linear unit (PReLU) [29], which is depicted in Figure 3b. It does not discard negative inputs because it has a learnable slope parameter a. Therefore, we believe PReLU can extract features from the data more fully and thus improve the noise robustness of the ncTCN. In this paper, the initial value of a is 0.25.
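The difference between the two activations can be seen in a few lines. The sketch below uses the standard definitions of ReLU and PReLU with the initial slope a = 0.25 given in the text; in a real network a would be updated during training, which is not shown here.

```python
def relu(x):
    """ReLU: negative inputs are set to zero."""
    return x if x > 0 else 0.0


def prelu(x, a=0.25):
    """PReLU: negative inputs are scaled by the slope a instead of discarded."""
    return x if x > 0 else a * x


samples = [-2.0, -0.4, 0.0, 1.5]
print([relu(v) for v in samples])   # [0.0, 0.0, 0.0, 1.5]
print([prelu(v) for v in samples])  # [-0.5, -0.1, 0.0, 1.5]
```

With ReLU, the inputs -2.0 and -0.4 become indistinguishable, whereas PReLU keeps their ordering and relative size.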

Residual Connection. As mentioned above, a deeper ncTCN helps to enlarge the receptive field. However, premature saturation usually occurs in the deep layers during training, resulting in performance deterioration. As a result, the ncTCN adopts the residual connection [30] architecture to guarantee the performance. A residual block, depicted in Figure 2, includes two convolution layers, two batch normalization layers, and two nonlinear activation functions. In this article, three residual blocks constitute the feature extractor.

III. TRAINING METHOD

In the simulation, we conduct experiments in which the SNR of the Gaussian noise channel ranges from 6 dB to 12 dB. For this multi-SNR environment, this paper proposes three training methods for the ncTCN. The data set is built differently for each training method. We use frames with a length of 1024 to constitute the data set.
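One way to generate a noisy frame at each target SNR is sketched below. Several points are assumptions rather than details from the paper: SNR is taken as average signal power over total complex noise power, the SNR grid is taken in 1 dB steps, and noise is added directly to the symbols instead of to the filtered waveform. The function names are hypothetical.

```python
import math
import random


def noise_std_per_component(snr_db, signal_power):
    """Standard deviation of the real (or imaginary) part of complex AWGN."""
    noise_power = signal_power / (10 ** (snr_db / 10))
    return math.sqrt(noise_power / 2)  # power is split evenly between I and Q


def add_awgn(frame, snr_db, rng):
    """Add complex Gaussian noise so that the frame has the requested SNR."""
    power = sum(abs(s) ** 2 for s in frame) / len(frame)
    sigma = noise_std_per_component(snr_db, power)
    return [s + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for s in frame]


FRAME_LENGTH = 1024  # frame length stated in the text
rng = random.Random(0)
levels = (-3, -1, 1, 3)  # 16QAM amplitude levels per axis
frame = [complex(rng.choice(levels), rng.choice(levels)) for _ in range(FRAME_LENGTH)]

# One noisy copy of the frame per SNR value, 6 dB to 12 dB in assumed 1 dB steps.
noisy = {snr: add_awgn(frame, snr, rng) for snr in range(6, 13)}
print(sorted(noisy))  # [6, 7, 8, 9, 10, 11, 12]
```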

3.1 Conventional Training Methods

The conventional training methods comprise two methods. The first is the each-SNR training method, in which the ncTCN is trained separately for each SNR environment. When the training phase for one SNR environment is completed, the model parameters of the ncTCN are reinitialized for the training phase of the next SNR environment. The training, validation, and test data sets contain 7,000, 1,400, and 21,000 frames, respectively.

The second is the mixed-SNR training method, in which the ncTCN is trained once for all SNR values. In the training phase, all SNR values are evenly distributed in the training and validation data sets. In the test phase, the ncTCN is tested for each SNR value. The training, validation, and test data sets contain 1,000, 200, and 3,000 frames for each SNR environment, respectively.
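The data set sizes of the two conventional methods can be compared with a short sketch. It assumes the SNR grid runs from 6 dB to 12 dB in 1 dB steps (seven values), which the text does not state explicitly; under that assumption the mixed-SNR totals coincide with the each-SNR sizes. The names are illustrative.

```python
import random

SNRS_DB = list(range(6, 13))  # assumed 1 dB grid: 6, 7, ..., 12

# Frame counts stated in the text.
EACH_SNR_FRAMES = {"train": 7000, "validation": 1400, "test": 21000}
MIXED_FRAMES_PER_SNR = {"train": 1000, "validation": 200, "test": 3000}


def mixed_snr_labels(split, rng):
    """SNR label of every frame in a mixed-SNR split, evenly distributed."""
    labels = [snr for snr in SNRS_DB for _ in range(MIXED_FRAMES_PER_SNR[split])]
    rng.shuffle(labels)  # interleave the SNR values within the split
    return labels


train_labels = mixed_snr_labels("train", random.Random(0))
print(len(train_labels))      # 7000
print(train_labels.count(6))  # 1000

for split, per_snr in MIXED_FRAMES_PER_SNR.items():
    print(split, per_snr * len(SNRS_DB), EACH_SNR_FRAMES[split])
```

With seven SNR values, one mixed-SNR model sees as many frames in total as a single each-SNR model, but only one seventh of them at any given SNR.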

3.2 Transfer Learning Method

Because the model parameters are reinitialized rather than inherited from the previous training phase in the each-SNR training method, training inevitably takes a long time. Transfer learning is a training method that extends the experience obtained in one training task to other training tasks. To address this shortcoming of the each-SNR method, this paper also applies transfer learning for training. First, the ncTCN is trained in the highest-SNR environment, and the model parameters are saved when the training phase is completed. When training is conducted for the other, lower SNR values, the saved model parameters are loaded as the initial parameters of the ncTCN. The construction and scale of the data set are the same as in the each-SNR method.
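The contrast between the each-SNR schedule and the transfer learning schedule can be sketched as follows. Only the control flow is shown: `train_fn` and `init_fn` are hypothetical stand-ins for a real training routine and a parameter initializer, and the toy versions below merely record which SNR values a model has been trained on.

```python
def each_snr_training(snrs_db, train_fn, init_fn):
    """Train one model per SNR, each starting from fresh parameters."""
    return {snr: train_fn(snr, init_fn()) for snr in snrs_db}


def transfer_learning_training(snrs_db, train_fn, init_fn):
    """Train at the highest SNR first, then start every lower SNR from it."""
    ordered = sorted(snrs_db, reverse=True)
    base = train_fn(ordered[0], init_fn())  # saved highest-SNR parameters
    models = {ordered[0]: base}
    for snr in ordered[1:]:
        models[snr] = train_fn(snr, dict(base))  # load a copy as the initial state
    return models


# Toy stand-ins so that the schedule can be inspected without a real network.
def init_fn():
    return {"trained_at": []}


def train_fn(snr_db, params):
    return {"trained_at": params["trained_at"] + [snr_db]}


models = transfer_learning_training(range(6, 13), train_fn, init_fn)
print(models[12])  # {'trained_at': [12]}
print(models[6])   # {'trained_at': [12, 6]}
```

As written, every lower-SNR model starts from the highest-SNR parameters rather than from the model of the neighbouring SNR, which is how the procedure in the text reads.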

IV. SIMULATION RESULTS AND ANALYSIS

In this section, we conduct experiments to examine the feasibility of our method in optical communication. We simulated a 40-km SSMF with a CD of 17 ps/nm/km and a transmission wavelength of 1550 nm. We also considered CD error or CD variation in the SSMF by assuming that the CD value follows a Gaussian distribution with a mean of 17 and a standard deviation of 0.01. Without loss of generality, this paper applies the ncTCN to 16QAM transmission systems with 42-Gbps and 100-Gbps data rates and a 1-kHz frequency offset. The simulation parameters are listed in Table 2. This paper builds the ncTCN-based receiver with the TensorFlow [31] library and the conventional receivers on the MATLAB platform.
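A back-of-the-envelope estimate shows how much ISI these parameters imply. The sketch below uses the common approximation that the pulse spread is D·z·Δλ with spectral width Δλ ≈ λ²·B/c, and it assumes that the signal bandwidth B equals the symbol rate and that the quoted data rates are raw single-polarization bit rates at 4 bits per 16QAM symbol. These assumptions and the function name are illustrative and not taken from the paper.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def cd_spread_in_symbols(bit_rate, bits_per_symbol, d_ps_nm_km, length_km, wavelength_nm):
    """Approximate CD-induced pulse spread, expressed in symbol periods."""
    symbol_rate = bit_rate / bits_per_symbol  # baud
    wavelength_m = wavelength_nm * 1e-9
    # Spectral width in nm for a signal bandwidth equal to the symbol rate.
    spectral_width_nm = wavelength_m ** 2 * symbol_rate / SPEED_OF_LIGHT * 1e9
    spread_ps = d_ps_nm_km * length_km * spectral_width_nm
    symbol_period_ps = 1e12 / symbol_rate
    return spread_ps / symbol_period_ps


for rate in (42e9, 100e9):
    spread = cd_spread_in_symbols(rate, 4, 17.0, 40.0, 1550.0)
    print(f"{rate / 1e9:.0f} Gbps: about {spread:.2f} symbol periods")
```

Under these assumptions the spread is roughly 0.6 symbol periods at 42 Gbps and roughly 3.4 symbol periods at 100 Gbps, so the higher-rate system faces considerably stronger ISI over the same fiber.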

In the first experiment, we trained the proposed ncTCN using the three training methods with the CD fixed at 17 ps/nm/km and a data rate of 42 Gbps. The purpose is to verify the symbol detection performance of the ncTCN, the effectiveness of the three training methods, and the noise robustness resulting from the nonlinear activation. We calculated the confidence interval of the simulation results at a confidence level of 0.95. The result is shown in Figure 4. First, the three training methods yield almost the same BER performance at the fixed CD value. Second, PReLU brings better noise robustness than ReLU, which is consistent with [32]. This is because PReLU does not lose the information in negative inputs (shown in Figure 3), so it can extract more valuable features from the noisy data and perform more robustly.
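The text does not say how the confidence interval is computed. One common choice for a simulated BER is the normal-approximation (Wald) interval for a binomial proportion, sketched below; treating bit errors as independent and using z = 1.96 for a 0.95 confidence level are assumptions of this sketch, and the function name is hypothetical.

```python
import math


def ber_confidence_interval(bit_errors, total_bits, z=1.96):
    """Normal-approximation confidence interval for a measured BER."""
    ber = bit_errors / total_bits
    half_width = z * math.sqrt(ber * (1 - ber) / total_bits)
    return max(0.0, ber - half_width), min(1.0, ber + half_width)


# Hypothetical counts: 100 bit errors observed in one million bits.
low, high = ber_confidence_interval(100, 1_000_000)
print(f"BER = 1.0e-04, 95% interval = [{low:.2e}, {high:.2e}]")
```

The interval narrows as more bits are simulated, which is why low BER values need long test sets before differences between methods become meaningful.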

Figure 4. BER performance of the proposed method. The data rate is 42 Gbps and the CD is 17 ps/nm/km. “Each”, “Mixed”, and “TL” in the legend denote the three training methods; the brackets that follow indicate the nonlinear activation used in the ncTCN. The confidence level is 0.95.

Figure 5. BER performance comparison of the ncTCN-based method and the traditional equalizer-based methods. The nonlinear activation function of the ncTCN is PReLU. CD = 17 ps/nm/km. “AWGN” in the legend denotes the theoretical AWGN BER performance. “Each”, “Mixed”, and “TL” in the legend denote the three training methods; the brackets that follow indicate the nonlinear activation in the proposed ncTCN.

Figure 6. BER performance of the proposed method when the CD is sampled randomly around the center value of 17 ps/nm/km. “AWGN” in the legend denotes the theoretical AWGN BER performance. The numbers in the legend indicate the CD values (16.92, 16.94, 16.97, 17.02, and 17.06 ps/nm/km). (a) BER performance of the ReLU-based ncTCN; (b) BER performance of the PReLU-based ncTCN.

In the second experiment, we applied feed-forward equalizer (FFE) and decision feedback equalizer (DFE) based receivers to the system, with the CD and data rate still 17 ps/nm/km and 42 Gbps, respectively, and took their performance as the baselines. The FFE has 25 taps for equalization and 15 taps for reference; the step size is 0.006 for the LMS algorithm and the forgetting factor is 0.99 for the RLS algorithm. The DFE has 25 taps for both the feed-forward filter and the feedback filter, and the step size for the LMS algorithm and the forgetting factor for the RLS algorithm are 0.003 and 0.99, respectively. Both equalizers use 2000 symbols for training. Their computational complexity is listed in Table 3, where L denotes the input sequence length and n denotes the order of the exponential function. All computations are for real numbers. Although the computational complexity of our method is higher than that of the FFE and DFE equalizers, our method achieves better performance than FFE and DFE and can be deployed on dedicated NN acceleration platforms with high computational efficiency. Therefore, our method is worth trying in optical communication systems. We trained the PReLU-based ncTCN again using the three training methods. The result is shown in Figure 5.
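For readers unfamiliar with the baseline, a minimal LMS-trained feed-forward equalizer is sketched below. It uses the 25 taps, the step size of 0.006, and the 2000 training symbols quoted in the text, but everything else is an assumption: the two-tap test channel, the noise-free QPSK-like symbols, the centre-spike initialization, and the function name are illustrative, and the "15 taps for reference" setting is not modeled. It is not the MATLAB receiver used in the paper.

```python
import random


def lms_ffe_train(received, desired, num_taps=25, step=0.006):
    """Train a symbol-spaced FFE with LMS; return taps and squared errors."""
    half = num_taps // 2
    weights = [0j] * num_taps
    weights[half] = 1 + 0j  # centre-spike start: the equalizer passes r[k] through
    squared_errors = []
    for k in range(len(desired)):
        # Window centred on sample k; indices outside the frame count as zero.
        window = [received[k + half - j] if 0 <= k + half - j < len(received) else 0j
                  for j in range(num_taps)]
        output = sum(w * x for w, x in zip(weights, window))
        error = desired[k] - output
        # Complex LMS update: w <- w + step * error * conj(x).
        weights = [w + step * error * x.conjugate() for w, x in zip(weights, window)]
        squared_errors.append(abs(error) ** 2)
    return weights, squared_errors


rng = random.Random(1)
symbols = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(2000)]
channel = [1.0, 0.4]  # hypothetical ISI channel, not the CD channel of the paper
received = [sum(h * symbols[k - i] for i, h in enumerate(channel) if k - i >= 0)
            for k in range(len(symbols))]

taps, sq_err = lms_ffe_train(received, symbols)
first = sum(sq_err[:200]) / 200
last = sum(sq_err[-200:]) / 200
print(f"mean squared error: first 200 symbols {first:.3e}, last 200 symbols {last:.3e}")
```

The squared error falls as the taps adapt, which is the behaviour the training symbols are meant to produce before the equalizer is switched to data.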

Table 3. Calculation complexity.

Next, we consider the time-varying situation or measurement error of CD and test the ncTCN's performance. In the third experiment, we simulated this situation by randomly sampling five CD values from the range [16.99, 17.01]. We then tested the ReLU- and PReLU-based ncTCN trained by the mixed-SNR and transfer learning methods. This experiment was again conducted in the 42-Gbps system. The result is shown in Figure 6. The ncTCN maintains a stable BER performance in the random-CD situation, which indicates that the proposed method is feasible for optical fiber with varying CD. In addition, the PReLU-based ncTCN performs closer to the theoretical AWGN performance than the ReLU-based ncTCN.

In the last experiment, we applied our method to the 100-Gbps transmission system on optical fiber with a CD of 17 ps/nm/km. We again calculated the confidence interval of the simulation results at a confidence level of 0.95. The result is shown in Figure 7. It indicates that the three training methods yield noticeably different performance: the mixed-SNR training method outperforms the other two. Moreover, the PReLU-based ncTCN again has better noise robustness than the ReLU-based ncTCN.

Figure 7. BER performance of the proposed method when the data rate is 100 Gbps. CD = 17 ps/nm/km. “AWGN” in the legend denotes the theoretical AWGN BER performance. “Each”, “Mixed”, and “TL” in the legend denote the three training methods; the brackets that follow indicate the nonlinear activation in the proposed ncTCN. The confidence level is 0.95.

V. CONCLUSION

Because of the impact of chromatic dispersion in optical communication, this paper applied the ncTCN to the receiver of 42-Gbps and 100-Gbps 16QAM transmission systems to detect decimal symbols from the baseband signal distorted by inter-symbol interference. Given the Gaussian noise, we proposed the each-SNR, mixed-SNR, and transfer learning methods for training. In order to improve the noise robustness of the ncTCN, we applied the PReLU nonlinear activation. After conducting several experiments to examine the feasibility of our method, we draw the following conclusions. First, all three training methods gave the ncTCN better performance than the baselines, and in the 100-Gbps system the three methods performed differently. Given the results from the 42-Gbps and 100-Gbps systems, a combination of the mixed-SNR and transfer learning methods may be an ideal training method in practical communication scenarios. Second, the PReLU-based ncTCN was more robust to noise than the ReLU-based ncTCN, which confirms our analysis. Last, owing to random CD values in the training data set, our method was also robust to time-varying CD and CD measurement error. Based on this work, we will focus on the 100-Gbps transmission system and study ncTCN-based soft demodulation; we will also address higher-order modulation and more practical optical fiber conditions.

ACKNOWLEDGEMENT

This work was supported by the National Key Research and Development Plan (2018YFB1801500) and the Manned Space Pre-research Project (No. 060501).