New Method for Computer Identification Through Electromagnetic Radiation

2018-10-23 08:05 Jun Shi, Zhujun Zhang, Yangyang Li, Rui Wang, Hao Shi and Xile Li

Computers Materials & Continua, 2018, Issue 10

Jun Shi, Zhujun Zhang, Yangyang Li, Rui Wang, Hao Shi and Xile Li

Abstract: The electromagnetic waves emitted from devices can be a source of information leakage and can cause electromagnetic compatibility (EMC) problems. Electromagnetic radiation signals from computer displays can be a security risk if they are intercepted and reconstructed. In addition, the leaks may reveal hardware information about the computer, which is more important for some attackers, protectors and security inspection workers. In this paper, we propose a statistical distribution based algorithm (SD algorithm) to extract eigenvalues from electromagnetic radiate video signals, and then classify computers using Bayesian and SVM classifiers. In our experimental environment, the algorithm identifies computers automatically and accurately through their electromagnetic radiation.

Keywords: Computer security, information security, compromising emanations, electromagnetic interference, signal source identification, SVM.

1 Introduction

Computer displays emit electromagnetic waves, and eavesdroppers can intercept these waves and reconstruct the displayed information [Kuhn (2006); Sekiguchi and Seto (2013); Elibol, Sarac and Erer (2012)]. This is a potential information security threat because sensitive information can be stolen from a distance without any network connection. In addition, electromagnetic emanations leak hardware information about the computer itself, which is more important for some attackers. For example, attackers can find and lock onto a target computer if a recognition algorithm lets them identify that individual machine. For protectors, such a recognition algorithm helps prevent information from leaking. Moreover, security inspection workers need not check a specified computer in an anechoic chamber: they can check it in an office environment and examine each received emanation to determine whether the compromising emanations belong to the specified computer.

In 2003, Markus Kuhn demonstrated that different graphics produce different electromagnetic radiation signals [Kuhn (2003)]. He also analyzed the electromagnetic radiation signals of different LCD TV sets and found that the signals vary considerably between devices. This conclusion was based on the reconstruction of the display image [Kuhn (2013)]. Mo et al. [Mo, Lu and Zhang (2012)] covered some aspects of detecting and recognizing electric and electronic equipment by its electromagnetic emission profile. Their approach was to compare the original video signal spectrum, measured on the RED channel, with the emissions intercepted from the computer. However, in a practical non-cooperative attack scenario, the RED channel of the tested computer cannot be connected to the attacker's devices. Besides, they did not give specific measured features or recognition results. Another article on computer recognition is Mo et al. [Mo, Lu, Zhang et al. (2013)], which proposed a method to identify computer display electromagnetic emissions based on a support vector machine (SVM). However, they did not analyze why electromagnetic emissions from computers vary between devices, and their method needs a large amount of training data.

In this paper, we propose a statistical distribution based algorithm (SD algorithm) to extract eigenvalues from electromagnetic radiate video signals, and then classify computers using Bayesian and SVM classifiers. In our experimental environment, the algorithm identifies computers automatically and accurately through their electromagnetic radiation.

2 Modeling of electromagnetic radiate video signal

Electromagnetic radiate video signal in time domain can be represented as [Elibol, Sarac and Erer (2012)]:

The vertical synchronization frequency ranges from 40 Hz to 86 Hz, while the horizontal synchronization frequency ranges from 30 kHz to 115 kHz. In addition, the pixel frequency ranges from 31.5 MHz to 297 MHz [Elibol, Sarac and Erer (2012)]. Pixel signals change with the display image, so they cannot reflect the internal features of computers. The horizontal synchronization signals are therefore the most suitable for proving that differences exist among computers, and this paper models the waveform of the horizontal synchronization signal.

A horizontal synchronization signal is periodic and there is a blank in each line. Considering this periodicity, we model the horizontal synchronization signal as shown in Fig. 1.

Figure 1: Model of horizontal synchronization signal

In Fig. 1, T is the period of the horizontal synchronization signal, A is the amplitude of the signal, and τ is the scan time of each line. τr and τf are the pulse width of each signal.

where b is the blank time of each line.

According to the principle of Fourier transform, the frequency spectrum of this signal can be represented as:

If τr = τf, the single-sided (positive frequency only) spectrum is:

It can be seen that, if τ = T/2, the formula (6) simplifies to formula (8).

When n is even, formula (7) is equal to 0. This means that when τ = T/2, there are no even harmonics. In addition, it is easy to prove that the closer τ is to T/2, the smaller the even harmonics are relative to the odd harmonics. Thus, the ratio between the scan time of each line and the period of the horizontal synchronization signal influences the variation trend of the harmonics. The ratio can be represented as

In practice, because these emissions are unintentional, both the scan time of each line and the period of the horizontal synchronization signal vary considerably between devices due to different production processes. Thus, the variation trend (or shape) of the harmonics of the electromagnetic radiate video signal spectrum varies between computers. This paper proposes a new algorithm to describe that variation trend.
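The dependence of the harmonics on the ratio τ/T can be checked numerically. The sketch below does not reproduce the paper's own formulas, which are missing from this copy. It assumes the standard textbook envelope for a periodic trapezoidal pulse train with equal rise and fall times, c_n = 2A(τ/T)·|sinc(nτ/T)|·|sinc(nτr/T)| with sinc(x) = sin(πx)/(πx). The function name, the 48 kHz line rate and the 1% edge time are hypothetical.

```python
import numpy as np

def harmonic_envelope(A, T, tau, tau_r, n_max):
    """One-sided harmonic amplitudes of a periodic trapezoidal pulse train
    with equal rise and fall times (assumed textbook result)."""
    n = np.arange(1, n_max + 1)
    # np.sinc(x) is sin(pi*x)/(pi*x), so the pi factors are already included.
    width_term = np.abs(np.sinc(n * tau / T))
    edge_term = np.abs(np.sinc(n * tau_r / T))
    return 2.0 * A * (tau / T) * width_term * edge_term

T = 1.0 / 48e3  # hypothetical line period (48 kHz horizontal sync)
half = harmonic_envelope(A=1.0, T=T, tau=0.5 * T, tau_r=0.01 * T, n_max=8)
other = harmonic_envelope(A=1.0, T=T, tau=0.8 * T, tau_r=0.01 * T, n_max=8)

for n, (h, o) in enumerate(zip(half, other), start=1):
    print(f"n={n}: tau=T/2 -> {h:.4f}   tau=0.8T -> {o:.4f}")
```

With τ = T/2 the even-indexed entries of `half` vanish to within floating-point error, while `other` keeps non-zero even harmonics, so the harmonic shape changes with τ/T.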

3 Algorithm

3.1 Basics of wavelet transform

The discrete wavelet transform (DWT) is a discretization of the continuous wavelet transform (CWT) obtained by sampling particular wavelet coefficients. The CWT W(a, b) is sampled by letting a = 2^{-l} and b = m·2^{-l}, where l indexes the discrete dilation and m indexes the discrete translation. The DWT of a signal f(t) is given by

The DWT [Soon, Koh, Yeo et al. (1997)] has advantages such as ease of implementation and less computation time compared with time-domain processing. The signal is decomposed into approximation and detail coefficients: the approximation coefficients carry the low-frequency information and the detail coefficients carry the high-frequency information. Approximation coefficients are obtained by passing the signal through a low-pass filter and a dyadic downsampler. Detail coefficients are obtained by passing the signal through a high-pass filter and a dyadic downsampler.
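The filter-and-downsample description can be illustrated with a single decomposition level of the Haar wavelet. This is a minimal sketch, assuming the orthonormal Haar filters and an even-length input; the function name `haar_dwt` is hypothetical, and the sign convention of the detail coefficients differs between libraries.

```python
import numpy as np

def haar_dwt(x):
    """One level of the orthonormal Haar DWT.

    Returns (approximation, detail), each half the length of the input.
    """
    x = np.asarray(x, dtype=float)
    if x.size % 2:               # drop the last sample if the length is odd
        x = x[:-1]
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)  # low-pass filter, then downsample by 2
    detail = (even - odd) / np.sqrt(2.0)  # high-pass filter, then downsample by 2
    return approx, detail

approx, detail = haar_dwt([4.0, 2.0, 5.0, 5.0])
print(approx, detail)
```

Because the transform is orthonormal, the energy of the input equals the combined energy of the approximation and detail coefficients.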

3.2 Statistical distribution based algorithm (SD algorithm)

In this sub-section, a statistical distribution based algorithm (SD algorithm) is proposed. We use the wavelet transform because the wavelet coefficients describe the variation trend of the harmonics of the electromagnetic radiate video signal spectrum. We analyze the statistical distribution of the wavelet coefficients by calculating their histogram and fitting several candidate distributions. The fit results of the different distributions are given in Fig. 2. It can be observed that the exponential distribution fits the histogram best.

Thus, the first step of the algorithm is to calculate the signal power spectrum.

Secondly, we calculate the wavelet coefficients of the signal power spectrum. We choose the two-tap Haar wavelet transform to implement our algorithm due to its simplicity.

where DWT follows Eq. (5), X is the signal power spectrum, and the result is the set of wavelet coefficients.

Thirdly, we perform Maximum Likelihood Estimation (MLE) of the exponential distribution. The MLE of the exponential distribution parameter is given in formula (10), and we need to calculate μ of the wavelet coefficients.
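For an exponential distribution, the maximum likelihood estimate of the mean is the sample mean. The sketch below shows that estimator together with a hypothetical end-to-end feature function. The periodogram-style power spectrum and the use of coefficient magnitudes are assumptions made for illustration, since this copy does not show how the paper computes them; the names `exponential_mean_mle` and `sd_feature` are also hypothetical.

```python
import numpy as np

def exponential_mean_mle(samples):
    """MLE of the mean of an exponential distribution: the sample mean."""
    samples = np.asarray(samples, dtype=float)
    if samples.size == 0 or np.any(samples < 0):
        raise ValueError("expected a non-empty array of non-negative values")
    return samples.mean()

def sd_feature(signal):
    """Hypothetical pipeline: power spectrum -> Haar detail coefficients -> mu."""
    signal = np.asarray(signal, dtype=float)
    spectrum = np.abs(np.fft.rfft(signal)) ** 2 / signal.size  # periodogram (assumption)
    spectrum = spectrum[: spectrum.size // 2 * 2]              # trim to even length
    detail = (spectrum[0::2] - spectrum[1::2]) / np.sqrt(2.0)  # one-level Haar detail
    return exponential_mean_mle(np.abs(detail))                # magnitudes (assumption)

# Sanity check on synthetic data: the estimate should be close to the true mean 2.0.
rng = np.random.default_rng(0)
mu_hat = exponential_mean_mle(rng.exponential(scale=2.0, size=20000))
print(mu_hat)
```

The estimator reduces each received signal to a single scalar μ, which is what the classifiers described next take as input.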

Then, to automate recognition, a Bayesian classifier and an SVM classifier are used.

The training data consist of 400 signals from four computers of three models: Think Center, DELL OPTIPLEX GX520 and DELL OPTIPLEX 7020. This choice samples computers of different brands as well as computers of the same brand. Moreover, to analyze the individual characteristics of computers, we used two computers of the same model, DELL OPTIPLEX 7020, hereafter called DELL OPTIPLEX 7020-1 and DELL OPTIPLEX 7020-2. Each computer generated 100 signals.

For the Bayesian classifier, the conditional probability density functions (PDFs) of μ are obtained from the training data and shown in Fig. 3. The algorithm decides to which computer the observed signal belongs: an observed signal is assigned to a computer only if that computer's conditional PDF f is the largest among all computers.

where μO represents μ of the observed signal and Ci represents the computer source. The conditional PDFs of μ, given the class label Ci, can be obtained from the training data as given in Fig. 3.

In conclusion, the algorithm can be divided into the following steps:

(1) Calculate the signal power spectrum.

(2) Calculate the wavelet coefficients of signal power spectrum.

(3) Calculate μ of the wavelet coefficients.

(4) Look up the conditional PDF f(μ|Ci) obtained from the training data. Calculate f(μ|C1), f(μ|C2) and f(μ|C3).

(5) Compare f(μ|C1), f(μ|C2) and f(μ|C3), and find the maximum.

(6) The maximum of f(μ = μO|Ci) corresponds to the right computer source.
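Steps (4) to (6) amount to evaluating each class-conditional PDF at the observed μ and picking the largest. The sketch below is one possible realization, not the paper's implementation: it estimates f(μ|Ci) with a Gaussian kernel density estimate, and the bandwidth, class labels and synthetic training values are all hypothetical.

```python
import numpy as np

def kde_pdf(train_mu, x, bandwidth):
    """Gaussian kernel estimate of f(x | Ci) from one class's training features."""
    train_mu = np.asarray(train_mu, dtype=float)
    z = (x - train_mu) / bandwidth
    norm = train_mu.size * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * z ** 2).sum() / norm

def classify(mu_obs, training, bandwidth=0.05):
    """Return the label whose conditional PDF is largest at mu_obs, plus all scores."""
    scores = {label: kde_pdf(mu, mu_obs, bandwidth) for label, mu in training.items()}
    return max(scores, key=scores.get), scores

# Synthetic stand-in for training features (100 values of mu per computer).
rng = np.random.default_rng(1)
training = {
    "C1": rng.normal(1.0, 0.05, 100),
    "C2": rng.normal(1.3, 0.05, 100),
    "C3": rng.normal(1.6, 0.05, 100),
}
label, scores = classify(1.28, training)
print(label, scores)
```

With equal class priors, choosing the largest conditional PDF coincides with choosing the largest posterior probability.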

Figure 2: Fit results of statistical property

Figure 3: Conditional probability density functions (PDFs) of μ

For the SVM classifier, the advantage of selecting an SVM is that it can map multi-dimensional feature input to a high-dimensional kernel space, which is more conducive to classification.

An SVM tries to find a hyperplane based on the following optimization criterion [Hastie, Tibshirani and Friedman (2001)].

where the margin is inversely proportional to the norm of the hyperplane's weight vector. Thus, minimizing that norm is equivalent to maximizing the margin. Solving this quadratic problem gives the hyperplane parameter as follows:

where S is the set of support vectors for both classes, and αk is a trained weight on the corresponding support vector. Based on this solution, one can classify an arbitrary new input x using

The entire formulation can be generalized to a nonlinear case. This generalization can be accomplished by mapping the samples to a certain high-dimensional space H:

In such a high-dimensional space, usually called the feature space, the originally overlapping data could become linearly separable. Constructing a separating hyperplane in that space yields a nonlinear decision boundary in the input space [Kim, Park, Toh et al. (2010)]. However, since the dimensionality of this new feature space could be very high (possibly infinite), a direct data mapping often becomes intractable. Nevertheless, by adopting a kernel function, the nonlinear SVM can be formulated in a tractable manner without explicitly carrying out the mapping into the feature space:

We still use μ of the wavelet coefficients calculated by formula (10) to construct a set of feature vectors, which are input into an SVM classifier for computer identification.
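The kernel form of the decision rule can be sketched without any explicit mapping into the feature space. The example below evaluates the sign of a weighted sum of kernel values over the support vectors for a two-class case. It assumes an RBF kernel, and the support vectors, weights, bias and gamma are hand-picked hypothetical values rather than the output of training; handling four computers would additionally need a multi-class scheme such as one-vs-one.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """K(a, b) = exp(-gamma * ||a - b||^2)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))

def svm_decide(x, support_vectors, labels, alphas, bias, gamma=50.0):
    """Return +1 or -1 from the sign of sum_k alpha_k * y_k * K(x_k, x) + bias."""
    score = sum(a * y * rbf_kernel(sv, x, gamma)
                for sv, y, a in zip(support_vectors, labels, alphas))
    return 1 if score + bias >= 0 else -1

# Hypothetical one-dimensional feature vectors (mu values) acting as support vectors.
support_vectors = [[1.0], [1.3]]
labels = [-1, +1]
alphas = [1.0, 1.0]
bias = 0.0

pred_low = svm_decide([1.05], support_vectors, labels, alphas, bias)
pred_high = svm_decide([1.25], support_vectors, labels, alphas, bias)
print(pred_low, pred_high)
```

Only kernel evaluations between the input and the support vectors are needed, which is why the high-dimensional mapping never has to be computed.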

4 Experimental results

In this section, the proposed algorithm is applied to experimental data and the results and analysis are given.

The four computers used here are Think Center, DELL OPTIPLEX GX520, DELL OPTIPLEX 7020-1 and DELL OPTIPLEX 7020-2. The measurement setup is shown in Fig. 4. The resolution of the computer display was set to 1024×768. A log-periodic antenna (ZN30505E) designed for 30-3000 MHz was placed in front of the tested computer at the same height as the center of the computer display. Note that we placed the antenna 1 m to 10 m from the tested computer to obtain signals. The performance of the algorithm at different antenna distances is presented in Section 5.

Figure 4: Measurement setup for data collection

The antenna is connected to a data collector, which can be a data acquisition card, a digital oscilloscope or a spectrum analyzer. A spectrum analyzer was used here. Theoretically, the noise received here is white noise, which can be attributed to external noise sources as well as internal noise of the data collector, such as the noise figure of some filters, mixers, and semiconductors [Song and Yook (2015)]. Additionally, the antenna receives environmental white noise along with many other man-made noises. As for the sample frequency, according to the VESA standard, the pixel frequency ranges from 31.5 MHz to 297 MHz. When the resolution of the computer is 1024×768, the pixel frequency ranges from 44.9 MHz to 94.5 MHz. Because these video interface signals include harmonics of the fundamental signal frequency, we chose 500 MHz as the sample frequency.

To evaluate the SD algorithm, the four computers displayed the same image, which was filled with the letter “H”. Fig. 5 shows the sub-band power spectra of the emanations of the four computers. It can be seen that the variation trends of the spectrum harmonics differ among the computers.

Figure 5: Sub-band power spectra of the emanations of the four computers (a) Think Center (b) DELL OPTIPLEX GX520 (c) DELL OPTIPLEX 7020-1 (d) DELL OPTIPLEX 7020-2

We tested 400 sets of received signals, 100 sets per computer. Note that the test data here are different from the training data used in Section 3. The recognition result of the SD algorithm using the Bayesian classifier is shown in Tab. 1. The definitions of POD and FAR are:

where “Positive” labels the location that the detector judges as the true computer, and “Negative” labels the location that the detector judges as the wrong computer. In Tab. 1, True Positive (TP), False Negative (FN), False Positive (FP), True Negative (TN) and FAR of the data are summarized.
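The two metrics can be computed directly from the four counts. The sketch below assumes the common definitions POD = TP / (TP + FN) and FAR = FP / (FP + TN); the paper's own formulas are not reproduced in this copy and may differ. The counts in the example are made-up illustration values, not the results of Tab. 1.

```python
def pod_far(tp, fn, fp, tn):
    """Assumed definitions: POD = TP / (TP + FN), FAR = FP / (FP + TN)."""
    pod = tp / (tp + fn) if (tp + fn) else float("nan")
    far = fp / (fp + tn) if (fp + tn) else float("nan")
    return pod, far

# Hypothetical counts for one computer: 100 of its own signals, 300 from the others.
pod, far = pod_far(tp=95, fn=5, fp=6, tn=294)
print(f"POD = {pod:.3f}, FAR = {far:.3f}")
```

Computing the pair once per computer and averaging gives a single summary figure for the whole test set.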

The recognition result of the SD algorithm using the SVM classifier is shown in Fig. 6. It can be seen that the SD algorithm has a higher POD with the SVM classifier than with the Bayesian classifier.

Figure 6: Recognition result of SD algorithm by using the classifier of the SVM

Table 1: Recognition result of SD algorithm by using the classifier of the Bayesian

5 Algorithm comparison

Sun et al. [Sun, Shi, Wei et al. (2016)] proposed an SP algorithm based on the spectral centroid to identify computer electromagnetic emissions, so we compare the SP algorithm with our proposed SD algorithm in this section.

We compared the performance of the SP algorithm with that of the SD algorithm using the SVM classifier, with both algorithms using the same amount of training data (400 sets of received emanation signals). The experimental procedure and the test data are the same as those in Section 4 that produced Tab. 1. The measurement setup is shown in Fig. 4. The four computers tested were Think Center, DELL OPTIPLEX GX520, DELL OPTIPLEX 7020-1 and DELL OPTIPLEX 7020-2.

Figure 7: POD and SAR of the SD and SP algorithm as a function of antenna distance

The performance of the SP and SD algorithms at different antenna distances is presented in Fig. 7. The antenna distance ranges from 1 m to 10 m. Note that the POD and SAR in Fig. 7 are the average values of the PODs and SARs of the four computers. It can be observed that the accuracy of both the SP and SD algorithms decreases as the antenna distance increases. The SD algorithm performs better than the SP algorithm.

6 Conclusion

This paper proposed a new algorithm for recognizing computers through electromagnetic radiate video signals: a statistical distribution based algorithm (SD algorithm). Using the algorithm, we can automatically and accurately identify computers through electromagnetic radiate video signals in our experimental environment. In addition, a performance comparison with the method proposed in Sun et al. [Sun, Shi, Wei et al. (2016)] at different antenna distances indicates that the SD algorithm is more robust.

The proposed method of identifying displays has practical significance. First, it matters for the reconstruction of compromising emanations. After identification, attackers can lock onto the target computer and reconstruct only the image of the target display, especially when they intercept information in big organizations where many different computers are used. Secondly, in the same scenario, protectors can selectively protect the computers with high compromising emanations rather than protecting all computers indiscriminately.

To prevent computer identification, special EMC (Electromagnetic Compatibility) measures can be taken, such as shielding the computer and its cables. These measures can substantially reduce the compromising emanations and thereby decrease the signal-to-noise ratio of the received signals. The next step of our work will be to combine our method with other signal processing methods to obtain more accurate results in low signal-to-noise ratio conditions.

Acknowledgement: This work was supported by the Innovation Foundation of China Academy of Electronics and Information Technology (Grant No. 17109701). This work was also supported by the Innovation Fund of CETC (Grant No. 16105501) and the Joint Fund of CETC (Grant No. 20166141B08020101).
