Predicting LTE Throughput Using Traffic Time Series
Xin Dong1, Wentao Fan1, and Jun Gu2
(1. Beijing University of Posts and Telecommunications, Beijing 100876, China; 2. ZTE Corporation, Shanghai 201203, China)
Throughput prediction is essential for congestion control and LTE network management. In this paper, the autoregressive integrated moving average (ARIMA) model and the exponential smoothing model are used to predict throughput in a single cell and in a whole region of an LTE network. The experimental results show that the two models perform differently in the two scenarios. The ARIMA model is better than the exponential smoothing model for predicting throughput on weekdays in a whole region. The exponential smoothing model is better than the ARIMA model for predicting throughput on weekends in a whole region and for predicting throughput in a single cell. In both scenarios, throughput prediction based on traffic time series can lead to more efficient resource management and better QoS.
ARIMA; exponential smoothing method; throughput prediction
In recent years, users have increasingly accessed the Internet from a variety of applications, regardless of geographic location. This has resulted in an exponential increase in wireless traffic. In 2012, global wireless data traffic grew 70 percent year on year [1]. Mobile network operators therefore have to make use of limited resources to meet ever-increasing traffic demands. To plan and run networks efficiently, it is important to understand the statistical characteristics of data traffic by analyzing real traffic.
In [2], the authors use throughput measured in a real-world cellular network to statistically model the time-varying throughput per cell and the distribution of instantaneous throughput across cells. The proposed statistical models can be used to simulate the time-varying and location-varying throughput of cells. In [3], the authors analyze several widely accepted throughput performance indicators in LTE. Their analysis is based on counters and call traces from a live network. However, neither [2] nor [3] describes a scenario in which throughput in a whole region changes over time. In [4], the authors estimate throughput using a formula that expresses the behavior of TCP throughput. In contrast, we treat throughput data as a time series that can be predicted from data measured in the past.
In this paper, we consider two practical scenarios: a whole region and a single cell. In the first scenario, we build an ARIMA model and an exponential smoothing model and determine which is better for predicting downlink throughput on weekdays and on weekends in a whole region. In the second scenario, the traffic load in a single cell is uncertain and varies over time. We construct a model for predicting the instantaneous downlink throughput in a single cell of a large urban cellular network.
2.1 Data Description
Our data set includes records of Internet downloads and uploads in Hong Kong. The data was collected from 1352 cell sites across the city over 21 days between February and March 2014. Each session record includes the downlink and uplink throughput, a timestamp, and a cell ID. Each cell ID is also associated with the GPS coordinates of the corresponding cell. In this paper, LTE throughput is modeled as a time series and then predicted using an ARIMA model and the exponential smoothing method.
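As a rough illustration of how session records of this kind could be turned into hourly time series, the following sketch buckets downlink throughput by cell and hour and then combines a set of cells into one regional series. The tuple layout `(timestamp, cell_id, downlink, uplink)`, the function names, and the choice to sum rather than average within a bucket are all assumptions made for illustration; the paper does not specify its preprocessing.

```python
from collections import defaultdict
from datetime import datetime


def hourly_downlink(records):
    """Sum downlink throughput per (cell_id, hour) bucket.

    `records` is an iterable of (timestamp, cell_id, downlink, uplink)
    tuples. This field layout is hypothetical.
    """
    totals = defaultdict(float)
    for ts, cell_id, downlink, _uplink in records:
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        totals[(cell_id, hour)] += downlink
    return dict(totals)


def region_series(hourly, cell_ids):
    """Combine the hourly series of the given cells into one regional series.

    Returns a list of (hour, total) pairs sorted by hour.
    """
    region = defaultdict(float)
    for (cell_id, hour), value in hourly.items():
        if cell_id in cell_ids:
            region[hour] += value
    return sorted(region.items())


if __name__ == "__main__":
    sample = [
        (datetime(2014, 2, 20, 9, 5), "A", 1.0, 0.1),
        (datetime(2014, 2, 20, 9, 40), "A", 2.0, 0.1),
        (datetime(2014, 2, 20, 9, 10), "B", 4.0, 0.2),
    ]
    print(region_series(hourly_downlink(sample), {"A", "B"}))
```

If the measured quantity is a rate rather than a volume, averaging within each bucket may be more appropriate than summing.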
2.2 Time Series Analysis
Time series data is an important class of data. Any attribute whose value changes as a function of time can be considered time series data. Such data may derive from the atmosphere, commodity production, geography, sensors, the stock market, or inventory control. The throughput data in an LTE network can also be viewed as a time series. Time series prediction is based on the idea that historical data describing past behavior can be used to predict future behavior.
2.2.1 ARIMA Model
The autoregressive integrated moving average (ARIMA) model was introduced by Box and Jenkins [5]. ARIMA(p, d, q) is an autoregressive moving average (ARMA) model applied to differenced time series data. The original time series is differenced d times to make the data stationary. A stationary time series can be modeled as an ARMA model of order (p, q), where p is the order of the AR process and q is the order of the MA process. In an ARMA model, the current value of the time series is given by:

$$y_t = a_1 y_{t-1} + a_2 y_{t-2} + \cdots + a_p y_{t-p} + e_t + b_1 e_{t-1} + b_2 e_{t-2} + \cdots + b_q e_{t-q}$$

where $y_{t-1}, y_{t-2}, \ldots, y_{t-p}$ are the data at past time points, $e_{t-1}, e_{t-2}, \ldots, e_{t-q}$ are the errors at past time points, $e_t$ is the present error (ARMA assumes this error is Gaussian-distributed), $a_1, a_2, \ldots, a_p$ are the AR coefficients, and $b_1, b_2, \ldots, b_q$ are the MA coefficients [6].
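To make the roles of the AR and MA terms concrete, here is a minimal sketch of a one-step ARMA(p, q) prediction. It assumes the common form in which the current value is the AR-weighted sum of past values plus the MA-weighted sum of past errors, with the unknown present error replaced by its expected value of zero. The plus sign on the MA terms, the function name, and the argument layout are assumptions; estimating the coefficients is outside the scope of this sketch.

```python
def arma_one_step(past_values, past_errors, ar_coeffs, ma_coeffs):
    """One-step ARMA(p, q) prediction with the present error set to zero.

    past_values[-1] is y_{t-1}, past_values[-2] is y_{t-2}, and so on.
    past_errors follows the same convention for e_{t-1}, e_{t-2}, ...
    ar_coeffs is [a_1, ..., a_p] and ma_coeffs is [b_1, ..., b_q].
    """
    p, q = len(ar_coeffs), len(ma_coeffs)
    if len(past_values) < p or len(past_errors) < q:
        raise ValueError("not enough history for the requested orders")
    # AR part: a_1 * y_{t-1} + ... + a_p * y_{t-p}
    ar_part = sum(a * past_values[-i] for i, a in enumerate(ar_coeffs, start=1))
    # MA part: b_1 * e_{t-1} + ... + b_q * e_{t-q}
    ma_part = sum(b * past_errors[-i] for i, b in enumerate(ma_coeffs, start=1))
    return ar_part + ma_part


if __name__ == "__main__":
    # ARMA(2, 1) with made-up coefficients and history.
    print(arma_one_step([1.0, 2.0], [0.5], [0.5, 0.25], [0.4]))
```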
ARIMA(p, d, q) modeling involves three steps: making the data stationary, identifying suitable values for the model order, and predicting the time series from the fitted model.
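The first step, making the data stationary, is the "integrated" part of ARIMA. The sketch below shows d-fold differencing and, for the d = 1 case only, how forecasts of the differenced series can be converted back to the original scale. The function names are hypothetical, and the sketch leaves out order selection and coefficient fitting.

```python
def difference(series, d=1):
    """Apply first-order differencing d times."""
    out = list(series)
    for _ in range(d):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out


def undifference(last_value, diff_forecasts):
    """Convert forecasts of a once-differenced series back to levels.

    `last_value` is the last observed value of the original series.
    Only the d = 1 case is handled here.
    """
    levels = []
    level = last_value
    for step in diff_forecasts:
        level += step  # each forecast is a predicted change
        levels.append(level)
    return levels


if __name__ == "__main__":
    y = [1, 4, 9, 16]
    print(difference(y, d=1))
    print(undifference(y[-1], [9, 11]))
```

Each differencing pass shortens the series by one point, so a series of length n leaves n - d points for ARMA fitting.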
2.2.2 Exponential Smoothing Model
Exponential smoothing is a trend-analysis and prediction method based on the moving average method. It has three main variants (single, secondary, and cubic exponential smoothing) that differ in the number of smoothing passes [7], [8]. The most common of these is secondary exponential smoothing, given by:
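One widely used textbook form of secondary exponential smoothing is Brown's linear method, in which the series is smoothed twice with the same coefficient α and the two smoothed values are combined into a level and a trend. The sketch below implements that form as an assumption; it is not necessarily the exact formulation used in this paper. Initializing both smoothed values with the first observation is also an assumption.

```python
def brown_double_smoothing(series, alpha, horizon=1):
    """Forecast with Brown's double (secondary) exponential smoothing.

    Returns forecasts for 1..horizon steps past the end of `series`.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if not series:
        raise ValueError("series must not be empty")
    # Both smoothed values start from the first observation (an assumption).
    s1 = s2 = float(series[0])
    for y in series[1:]:
        s1 = alpha * y + (1 - alpha) * s1   # first smoothing pass
        s2 = alpha * s1 + (1 - alpha) * s2  # second pass, applied to s1
    level = 2 * s1 - s2
    trend = alpha / (1 - alpha) * (s1 - s2)
    return [level + trend * m for m in range(1, horizon + 1)]


if __name__ == "__main__":
    print(brown_double_smoothing([0, 10], alpha=0.5, horizon=2))
```

A larger α weights recent observations more heavily, which is why values such as 0.6 track fast-changing series more closely than values such as 0.1.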
2.3 Metrics
Root-mean-square error (RMSE) and R-squared are used to determine how well a model fits. RMSE is the square root of the mean squared difference between the model's predictions and the real values, i.e., the standard deviation of the residuals. Its unit of measure is the same as that of the original data. The RMSE is given by [10]:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2}$$

where $n$ is the number of predicted points, $y_t$ is the measured value at time $t$, and $\hat{y}_t$ is the corresponding predicted value.
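As a small worked illustration, RMSE in its standard form (the square root of the mean squared difference between measured and predicted values) can be computed as follows. The function name and the argument order are illustrative choices.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Errors of 3 and 4 give sqrt((9 + 16) / 2).
    print(rmse([0.0, 0.0], [3.0, 4.0]))
```

Because the errors are squared before averaging, a few large misses (for example, at abrupt throughput changes) raise RMSE more than many small ones.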
R-squared [11] is the square of the correlation between the measured (empirical) value and the predicted value. A higher R-squared means a better-fitting model. The maximum R-squared value is 1. When the time series contains seasonal trends, the stationary R-squared statistic is a better indicator than the ordinary R-squared statistic.
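The ordinary R-squared described above, the squared correlation between measured and predicted values, can be sketched as follows. This covers only the ordinary statistic; the stationary variant is not implemented here, and the function name is an illustrative choice.

```python
def r_squared(actual, predicted):
    """Squared Pearson correlation between measured and predicted values."""
    n = len(actual)
    if n != len(predicted) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov * cov / (var_a * var_p)


if __name__ == "__main__":
    print(r_squared([1, 2, 3, 4], [1, 3, 2, 4]))
```

Note that a squared correlation is insensitive to a constant offset or scale error in the predictions, which is one reason to report RMSE alongside it.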
In this paper, we use stationary R-squared as the evaluation index for data with obvious seasonal trends. We use RMSE as the evaluation index for data with no obvious seasonal trend, such as throughput data from a single cell.
Here, we analyze two practical scenarios. In the first scenario, cells are grouped into regions, and the throughput of an entire region is predicted. In the second scenario, the throughput of a single cell is predicted from historical data.
The reason for creating these two scenarios is that network operators are constantly constructing, adjusting, and optimizing their networks, so single-cell throughput prediction alone is not enough. If a new cell is built next to cell A, then the throughput of cell A is bound to change, and the earlier data for cell A can no longer be used. Therefore, the first scenario is proposed. Knowing the network throughput in advance helps improve QoS.
3.1 Throughput Prediction for a Whole Region
We first investigate how downlink throughput in a whole region changes over time. Fig. 1 shows the mean throughput in a region on weekdays and weekends. The weekday mean throughput was obtained by averaging the throughput in the whole region over 10 consecutive weekdays, and the weekend mean throughput was obtained by averaging the throughput over two consecutive weekends (four days). For both weekdays and weekends, the mean throughput in the whole region was at its lowest at 05:00. On weekdays, the mean throughput peaked at 09:00 and 19:00. On weekends, throughput peaked at 13:00. We divided the throughput in the whole region into weekdays and weekends for further statistical analysis.
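The averaging described above can be sketched as follows: a flat hourly series covering several whole days is folded into one mean daily profile, from which the lowest and highest hours can be read off. The function names and the `hours_per_day` parameter are illustrative assumptions.

```python
def mean_daily_profile(hourly_values, hours_per_day=24):
    """Average a flat hourly series over whole days into one daily profile."""
    if not hourly_values or len(hourly_values) % hours_per_day:
        raise ValueError("series must cover a whole number of days")
    days = len(hourly_values) // hours_per_day
    return [
        sum(hourly_values[d * hours_per_day + h] for d in range(days)) / days
        for h in range(hours_per_day)
    ]


def trough_hour(profile):
    """Index of the hour with the lowest mean throughput."""
    return min(range(len(profile)), key=profile.__getitem__)


def peak_hour(profile):
    """Index of the hour with the highest mean throughput."""
    return max(range(len(profile)), key=profile.__getitem__)


if __name__ == "__main__":
    # Two toy "days" of three "hours" each, for readability.
    profile = mean_daily_profile([1, 2, 3, 3, 4, 5], hours_per_day=3)
    print(profile, trough_hour(profile), peak_hour(profile))
```

`peak_hour` returns only the single global maximum; finding two separate peaks, such as a morning and an evening peak, would need a local-maximum search instead.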
To analyze the throughput on weekdays, we used the hourly data of ten consecutive weekdays. Five days of this data were used for modeling, and the other five days were used to determine the accuracy of the prediction.
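A hold-out split of this kind keeps the time order intact: the leading days are used for fitting and the trailing days for checking the forecast. The following sketch assumes a flat list with one value per hour; the function name is hypothetical.

```python
def split_by_days(hourly_values, train_days, hours_per_day=24):
    """Split an hourly series into leading training days and trailing test days."""
    cut = train_days * hours_per_day
    if not 0 < cut < len(hourly_values):
        raise ValueError("train_days must leave data on both sides of the split")
    return hourly_values[:cut], hourly_values[cut:]


if __name__ == "__main__":
    ten_days = list(range(240))  # placeholder values, one per hour
    train, test = split_by_days(ten_days, train_days=5)
    print(len(train), len(test))
```

Unlike a random split, this never lets the model see data from after the period it is asked to predict.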
▲Figure 1. Weekday and weekend mean throughput in a whole region over 24 hours.
Fig. 2 shows that the real throughput on weekdays in the whole region is seasonal. Therefore, we use the ARIMA(2, 0, 1) model and exponential smoothing with α = 0.600 to predict throughput on weekdays in the whole region. Although there are gaps between the measured and predicted throughput in the whole region, the predictions of both models are highly accurate. The ARIMA model is more accurate in the valleys of the real throughput curve, which occur at around 05:00, 11:00, and 15:00 every weekday.
Table 1 shows the degree of fit statistics for the prediction models. Both the fit of the curve and the stationary R-squared statistic indicate that the ARIMA model is better than the exponential smoothing model for predicting throughput on weekdays in a whole region.
To study the throughput on weekends, we used hourly throughput data from two consecutive weekends. Two days of this data were used for modeling, and the other two days were used to determine the accuracy of the prediction.
The prediction models for weekend throughput in a whole region are ARIMA(1, 0, 2) and exponential smoothing with α = 0.500. Fig. 3 shows the weekend throughput in a whole region predicted separately by the ARIMA model and the exponential smoothing model. The throughput predicted using the exponential smoothing model is closer to the actual throughput than that predicted using the ARIMA model. The degree of fit statistics in Table 2 support this. Hence, the exponential smoothing method is better for predicting weekend throughput in a whole region.
▲Figure 2. ARIMA model and exponential smoothing model for predicting the throughput on weekdays in a whole region.
▼Table 1. Degree of fit statistics for models used to predict throughput on weekdays in a whole region
▲Figure 3. ARIMA model and exponential smoothing model for predicting the throughput on weekends in a whole region.
▼Table 2. Degree of fit statistics for models used to predict throughput on weekends in a whole region
3.2 Throughput Prediction for a Single Cell
A single-cell traffic time series is highly unpredictable and has no obvious seasonal trend. Even within the same cell, throughput changes greatly from day to day. Although there are gaps between the real and predicted throughput curves, a time series model for a single cell is still useful for network optimization. Here, we use the throughput data of an LTE network over eight consecutive days. Seven days of this data are used for modeling, and the remaining day is used to determine how well the model fits.
The stationary R-squared statistic is usually used as an evaluation index when the time series contains seasonal trends. Because there is no significant seasonal trend in the throughput of a single cell, we use RMSE as the evaluation index.
Fig. 4 shows the throughput prediction for a single cell. The prediction models are ARIMA(1, 1, 1) and exponential smoothing with α = 0.100. Fig. 4 shows that neither model accurately predicts abrupt changes of throughput in the single cell. The exponential smoothing model is slightly more accurate between 17:00 and 23:00. Table 3 shows the accuracy statistics of the two models.
▲Figure 4. ARIMA model and exponential smoothing method for predicting the throughput of a single cell in an LTE network.
▼Table 3. RMSE of the prediction model (single-cell throughput)
▲Figure 5. RMSE statistics for throughput prediction in 100 cells.
We chose 100 cells at random and modeled each of them. We then obtained the RMSE statistics for these cells. Fig. 5 shows the distribution of RMSE for prediction using the ARIMA model and the exponential smoothing model in the 100 cells. The RMSE of the exponential smoothing method is mainly distributed between 0 and 0.3, and that of the ARIMA model is mainly distributed above 0.3. In general, the exponential smoothing model is better for predicting throughput in a single cell.
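A comparison across many cells like this one can be summarized with two simple statistics: the share of cells whose RMSE falls below a threshold, and the number of cells in which one model beats the other. The sketch below uses hypothetical function names, and the numbers in the usage example are made up for illustration; they are not results from this paper.

```python
def share_below(rmse_values, threshold):
    """Fraction of cells whose RMSE is strictly below the threshold."""
    if not rmse_values:
        raise ValueError("rmse_values must not be empty")
    return sum(1 for v in rmse_values if v < threshold) / len(rmse_values)


def count_wins(rmse_a, rmse_b):
    """Number of cells where model A has a strictly lower RMSE than model B."""
    if len(rmse_a) != len(rmse_b):
        raise ValueError("both models need one RMSE per cell")
    return sum(1 for a, b in zip(rmse_a, rmse_b) if a < b)


if __name__ == "__main__":
    # Made-up per-cell RMSE values for two models over four cells.
    smoothing = [0.1, 0.2, 0.4, 0.25]
    arima = [0.35, 0.5, 0.3, 0.45]
    print(share_below(smoothing, 0.3), share_below(arima, 0.3))
    print(count_wins(smoothing, arima))
```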
In this paper, LTE throughput is modeled as a time series, and future values of the traffic time series are predicted using the ARIMA model and the exponential smoothing model. Using these time series models, we studied throughput in both a single cell and a whole region within an LTE network. When studying throughput in a whole region, we considered weekdays and weekends separately because their throughput patterns were different. The ARIMA model is better than exponential smoothing for predicting throughput on weekdays in a whole region, and the exponential smoothing model is much better than the ARIMA model for predicting throughput on weekends in a whole region. Exponential smoothing is more accurate than the ARIMA model for predicting throughput in a single cell. Throughput prediction based on time series models can be used in the design, management, planning, and optimization of networks.
[1] Cisco, “Cisco visual networking index: Global mobile data traffic forecast update, 2012-2017,” Cisco White Paper, 2013.
[2] E. Nan, X. Chu, W. Guo, and J. Zhang, “User data traffic analysis for 3G cellular networks,” in Proc. CHINACOM, Guilin, China, Aug. 2013, pp. 469-472. doi: 10.1109/ChinaCom.2013.6694641.
[3] V. Buenestado, J. Ruiz-Aviles, M. Toril, et al., “Analysis of throughput performance statistics for benchmarking LTE networks,” IEEE Communications Letters, vol. 18, no. 9, pp. 1607-1610, Sept. 2014.
[4] M. Mirza, J. Sommers, P. Barford, and X. Zhu, “A machine learning approach to TCP throughput prediction,” in Proc. SIGMETRICS, New York, USA, 2007, pp. 97-108.
[5] G. E. P. Box and G. M. Jenkins, Time Series Analysis: Forecasting and Control, 2nd ed. San Francisco, CA: Holden-Day, 1976.
[6] C. Babu and B. Reddy, “Predictive data mining on average global temperature using variants of ARIMA models,” in Proc. ICAESM, Tamil Nadu, India, 2012, pp. 256-260.
[7] Q. Chen and X. Li, System Engineering-Theory and Practice. Beijing, China: National Defense Industry Press, 2009.
[8] X. Shang, W. Lin, and Y. Tang, “Development and application of a combined water quality prediction model based on exponential smoothing and GM(1,1),” Environmental Science & Technology, vol. 34, no. 1, pp. 191-195, May 2011. doi: 10.3969/j.issn.1003-6504.2011.01.046.
[9] W. Sun and R. Yang, Economic Forecast. Beijing, China: Agricultural University Press, 2005.
[10] R. J. Hyndman and A. B. Koehler, “Another look at measures of forecast accuracy,” International Journal of Forecasting, vol. 22, pp. 679-688, 2006.
[11] J. R. Taylor, An Introduction to Error Analysis: The Study of Uncertainties in Physical Measurements. Mill Valley, USA: Univ. Science Books, 1996.
Manuscript received: 2015-07-31
Biographies
Xin Dong (dongxin2014@gmail.com) is pursuing her master's degree in telecommunications at Beijing University of Posts and Telecommunications (BUPT). Her research interests include data mining and time series analysis. She has previously researched the prediction of traffic flow time series.
Wentao Fan (ffantastic@126.com) is pursuing his master's degree in telecommunications at BUPT. His research interests include data mining, and network analysis and optimization based on mobile devices. He has researched the prediction of traffic flow time series using the SVR method.
Jun Gu (gu.jun@zte.com.cn) is a chief engineer of 4G radio network planning at ZTE Corporation. He has 10 years of research and field experience in network principles, standardization, simulation, algorithm design, and planning and optimization.