Application of Big Data Technology in Evaluation of Operating Status of High-pressure Hydrogenation Heat Exchanger

2018-10-22

China Petroleum Processing and Petrochemical Technology, 2018, Issue 3

Li Lujie; Liu Xu; Du Rui; Xu Peng

(1. International School, Beijing University of Posts and Telecommunications, Beijing 100876; 2. Institute of Network Technology, Beijing University of Posts and Telecommunications, Beijing 100876)

Abstract: The high-pressure hydrogenation heat exchanger is a key piece of refinery equipment, but it is prone to leakage caused by ammonium salt corrosion, so evaluating its operating status is very important. To improve on the traditional methods for evaluating the operating status of hydrogenation heat exchangers, this paper proposes a new evaluation method based on big data. To address the noisy data common in industry, an automated noisy interval detection algorithm is proposed. To deal with the voluminous and largely unrelated dimensions of the sensor parameters, a key parameter selection algorithm based on the Pearson correlation coefficient is proposed. Finally, the paper presents a system health scoring algorithm based on PCA (Principal Component Analysis) to assist site operators in assessing the health of hydrogenation heat exchangers. Evaluating the operating status of the hydrorefining heat exchange unit with big data technology helps operators grasp the status of the industrial system more accurately and has positive guiding significance for early warning of failures.

Key words: hydrogenation; heat exchanger; big data; state assessment

1 Introduction

In recent years, petrochemical industrial systems have grown toward large-scale, automated, intelligence-based, and low-energy-consumption operation. However, once the production system fails, the entire equipment or system can run out of control, affecting the normal operation and economic efficiency of the enterprise and even leading to casualties, which have a negative impact on the enterprise. Therefore, real-time monitoring and operating status assessment of petrochemical production processes and equipment, eliminating potential failures, and preventing major accidents have become urgent problems in the petrochemical industry[1].

Traditional petrochemical production generally uses monitoring equipment to watch the production process and ensure production safety. However, most of these are traditional industrial control systems: they only raise threshold alarms for failures that have already occurred, and they cannot take preventive measures at the onset of a failure, when abnormalities in the production process could still be discovered and forecasting and early warning could be implemented.

With the rapid development of science and technology, especially computer and information management technology, more and more variables in chemical production equipment can be measured, processed, and monitored. In this context, this paper aims to design and implement a hydrogenation heat exchanger operating status evaluation system based on big data. The system establishes a monitoring model from the collected historical data, fully mines the information hidden in the process data, and analyzes the operation of the production process, so that the evaluation of the operating status can assist the fault early warning system.

2 Conditions for Using Big Data Technology in Assessing the Operating Status of High Pressure Hydrogenation Unit

The theoretical research on the evaluation of the operating status of the petrochemical industry originated from the theory proposed by Dr. Beard in 1971 of using analytical redundancy instead of hardware redundancy[2]. In the 1970s, with the development of evaluation algorithms such as detection filters and generalized likelihood ratios, operational status assessment theory and its applications developed rapidly.

Industrial operating status assessment methods fall into three categories: methods based on mathematical models, methods based on process knowledge, and data-driven methods[3]. In recent years, modern industry has been continuously developing toward large-scale and more complicated systems. In today's large-scale systems, on the one hand, a mathematical model-based approach cannot capture every detail of the complex mechanism model; on the other hand, monitoring based on process knowledge requires much complex and profound expertise and long-term accumulated experience, which lies beyond the scope of a general engineer's knowledge and makes complicated control processes hard to handle. The data-driven method does not rely on an accurate mathematical model; it depends only on the historical data obtained in industrial production to establish a monitoring model, and it has therefore received extensive attention. Currently, most companies generate and store ever more data on operations, equipment, and processes every day. These data divide into data collected under normal conditions and data collected under certain fault conditions, covering all aspects of the process, and they provide the basis for data-driven operating status assessment methods.

Data-driven operating state assessment methods include statistical analysis methods, signal analysis methods, and quantitative knowledge-based data analysis methods. Among the statistical analysis methods are the operating state assessment techniques based on control charts[4], on PCA[5], and on PLS[6].

Methods based on signal analysis mainly consist of operational state assessment methods based on the wavelet transform and the S transform. Quantitative knowledge-based methods do not require quantitative mathematical models; instead, artificial intelligence techniques are used to implement operational state assessments. Typical examples are methods based on artificial neural networks[7-8], on support vector machines[9], and on fuzzy logic[10]. Although methods based on artificial neural networks have achieved good results in many fields, in the petrochemical field the data have many dimensions, the data type is homogeneous, and few fault samples are available for training, so the statistical analysis method is superior to the neural network-based method here.

3 Big Data Technology Solution in High Pressure Hydrogenation Unit Operating Status Assessment

The method for evaluating the operating status of hydrorefining equipment based on big data technology is designed from two angles: model development and system construction. In model development, based on statistical analysis, the distribution of system parameters under steady state is analyzed, the time points to be evaluated are compared with the health model one by one, and finally the system health score is estimated by the operating state assessment algorithm. In system construction, a data analysis pipeline is built that meets the needs of data acquisition, storage, analysis, and display, so as to handle the large volume and periodic nature of industrial data.

3.1 Big data model development

The big data model mainly uses statistical methods to assess the operating status. Before running the operating state assessment algorithm, two challenges of industrial data need to be addressed: noisy intervals caused by human operations, and the large number of sensors generating irrelevant dimensions. In the assessment itself, the main problem to solve is that the output indicators of existing mathematical models are not intuitive and do not take the temporal correlation of the data into account.

3.1.1 Automatic noisy interval detection algorithm

In view of fluctuations in sensor data caused by human adjustments and other point failures, an automatic noisy interval detection algorithm based on a sliding window is used to diagnose abnormal intervals automatically, and by collating with log files, the noisy intervals are distinguished from the true fault zones of concern.

Abnormal intervals are first detected by a sliding window algorithm, shown in Figure 1. For each detection interval of length L1, two preceding periods L2 and L3 are taken as reference intervals, and a statistical index is calculated for the L1, L2, and L3 intervals. The ratios of the statistical indicator of L1 to those of L2 and L3 are then compared with a threshold; if they exceed it, the detection interval is considered abnormal. After an abnormal interval is recorded, the entire window slides backward by L1 steps.

Figure 1 Noisy detection sliding interval

After the abnormal intervals are detected with the sliding window, a joint analysis with the operation log and the running log is used to distinguish abnormalities caused by leaks from anomalies caused by human operation. The noisy data should then be cleaned to avoid their impact on subsequent modeling.
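The sliding-window detection step can be sketched as follows. This is only an illustration of the idea: the choice of standard deviation as the statistical index, the window lengths, and the threshold are assumptions, not values from the paper.

```python
import numpy as np

def detect_noisy_intervals(series, l1=10, l2=30, l3=30, threshold=3.0):
    """Flag detection windows of length l1 whose variability jumps
    relative to the two preceding reference windows l2 and l3.
    A sketch of the sliding-window idea; the statistical index
    (standard deviation) and the threshold are assumptions."""
    series = np.asarray(series, dtype=float)
    flagged = []
    start = l2 + l3                 # first position with full reference history
    while start + l1 <= len(series):
        ref2 = series[start - l2 - l3 : start - l3]   # reference interval L2
        ref3 = series[start - l3 : start]             # reference interval L3
        win  = series[start : start + l1]             # detection interval L1
        # Guard against a perfectly flat reference interval.
        ref_std = max(np.std(ref2), np.std(ref3), 1e-9)
        if np.std(win) / ref_std > threshold:
            flagged.append((start, start + l1))
        start += l1                 # slide the whole window by L1 steps
    return flagged
```

The flagged intervals would then be cross-checked against the operation and running logs before being cleaned, as the text describes.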

3.1.2 Related parameters selection algorithm based on Pearson correlation coefficient

For the problem that the acquired sensor data have voluminous dimensions, most of which are irrelevant to the fault, a parameter selection method based on the Pearson correlation coefficient is used to select the fault-related dimensions as model input. The Pearson coefficient describes the degree of correlation between two random variables and ranges from -1 to 1: the closer the coefficient of two random variables is to -1 or 1, the greater the correlation; the closer it is to 0, the smaller the correlation. For samples (x_i, y_i), i = 1, …, n, it is defined as:

r = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / ( √(Σ_{i=1}^{n} (x_i − x̄)²) · √(Σ_{i=1}^{n} (y_i − ȳ)²) )

The system works on the acquired 122-dimensional parameters. The algorithm first fixes three key parameters directly related to the leakage fault, viz. the pressure, temperature, and instantaneous flow in the heat exchanger, and then calculates the absolute value of the Pearson correlation coefficient between each of the remaining 119 dimensions and the key parameters. The absolute values are sorted by size, and the dimensions accounting for the first 90% of the total absolute correlation are kept as relevant parameters, so that 22 fault-related dimensions are finally obtained.
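A minimal sketch of this selection step is shown below. It assumes the "first 90%" rule means a cumulative cutoff on the sorted absolute correlations, and it scores each non-key dimension by its strongest correlation with any key parameter; both readings are assumptions, not the authors' exact procedure.

```python
import numpy as np

def select_related_params(data, key_cols, keep_frac=0.9):
    """Rank non-key sensor dimensions by their maximum absolute Pearson
    correlation with the key parameters, then keep the dimensions that
    account for the first `keep_frac` of the total absolute correlation.
    `data` is an (n_samples x n_dims) matrix."""
    data = np.asarray(data, dtype=float)
    other_cols = [c for c in range(data.shape[1]) if c not in key_cols]
    scores = {}
    for c in other_cols:
        # max |r| of this dimension against any key parameter
        scores[c] = max(abs(np.corrcoef(data[:, c], data[:, k])[0, 1])
                        for k in key_cols)
    ranked = sorted(other_cols, key=lambda c: scores[c], reverse=True)
    total = sum(scores[c] for c in ranked)
    kept, acc = [], 0.0
    for c in ranked:
        if acc >= keep_frac * total:    # cumulative 90% cutoff reached
            break
        kept.append(c)
        acc += scores[c]
    return kept
```

In the paper's setting, `data` would hold the 122 dimensions and `key_cols` the indices of pressure, temperature, and instantaneous flow.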

3.1.3 Operational status assessment algorithm based on principal component analysis

The algorithm first decomposes the model input by principal component analysis and uses the Hotelling's T-squared and Q statistical tests for health diagnosis. However, the statistical test method has two disadvantages. First, the Hotelling's T-squared and Q values are two independent indicators; when they disagree, especially when one is high and the other low, they cannot directly determine the health of the system. Second, the Hotelling's T-squared and Q statistics only examine the distribution of system indicators at a specific moment and do not consider how the system changes over time. Therefore, the statistical test results require further analysis. On this basis, a health diagnosis algorithm combining the fluctuation of the health index with the day's historical health values is proposed. The overall algorithm flow is shown in Figure 2.

Figure 2 Operational status evaluation algorithm flow

Principal Component Analysis (PCA) is a main method of fault detection and diagnosis in process industry control. The main idea of PCA is to map a high-dimensional space into a low-dimensional one while retaining as much of the high-dimensional information as possible. Assuming that the current data set X contains m dimensions, the observations at n time points are represented by the matrix X ∈ R^(n×m).

Through a transformation matrix P, the matrix X can be linearly transformed into a scoring matrix T:

T = XP

In the scoring matrix T, the column vectors of T are the score vectors, also called the principal components; P is called the load matrix, and the column vectors of P are called the load vectors. The load matrix can be obtained by eigendecomposition of the covariance matrix of X. Assume that the sample covariance matrix of X (with columns mean-centered) is:

Σ = X^T X / (n − 1)

Since the covariance matrix of the sample matrix is a symmetric square matrix, the eigenvectors p1, p2, …, pm and the corresponding eigenvalues λ1 ≥ λ2 ≥ … ≥ λm can be obtained via its eigendecomposition:

Σ = P Λ P^T, where Λ = diag(λ1, λ2, …, λm)

In the formula, P = [p1, p2, …, pm] is the required load matrix, and P is a unit orthogonal matrix, that is:

P^T P = P P^T = I

It is not difficult to obtain the score matrix T:

T = XP, with score vectors t_i = X p_i (i = 1, 2, …, m)

When the error allows, a k-dimensional principal component scoring matrix T̂ (k < m) replaces the m-dimensional scoring matrix T, giving the principal components of X. How many principal components are retained depends on the percentage of the accumulated variance of the retained portion in the total variance. In PCA, the cumulative variance contribution rate is calculated from the eigenvalues of the sample covariance matrix:

CPV(k) = Σ_{i=1}^{k} λ_i / Σ_{i=1}^{m} λ_i
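The retention rule can be sketched in a few lines; the variance targets used in the test are illustrative choices, since the paper does not state its cutoff.

```python
import numpy as np

def n_components_for_variance(eigvals, target=0.85):
    """Smallest k whose leading eigenvalues explain at least `target`
    of the total variance (cumulative variance contribution rate)."""
    ratios = np.cumsum(sorted(eigvals, reverse=True)) / np.sum(eigvals)
    # searchsorted finds the first index where the cumulative ratio
    # reaches the target; +1 converts the index to a component count.
    return int(np.searchsorted(ratios, target) + 1)
```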

The matrix X can finally be decomposed into the following two parts:

X = X̂ + E

where X̂ is the principal component matrix of X and E is the noise matrix of X; they are calculated as:

X̂ = T̂ P̂^T,  E = X − T̂ P̂^T

After the principal component analysis of data set X, the data distribution of the model is obtained. Statistically, hypothesis testing of a statistical indicator can determine whether the current process conforms to the model's data distribution. For PCA, this testing is usually performed with two statistics, Hotelling's T-squared and Q. The Hotelling's T-squared statistic shows the extent to which each tested data point deviates from the model in trend and magnitude of change, and it is calculated from the projection of the test data in the principal component subspace. The Q statistic indicates the distance of the tested data point from the model space, calculated from the projection of the test data in the residual space. The larger either value, the less the current test data conform to the principal component model.

The Hotelling's T-squared statistic for the observation data x_i at the i-th time is:

T²_i = x̂_i^T P̂ Λ̂⁻¹ P̂^T x̂_i

Here, Λ̂ is the diagonal matrix consisting of the first k eigenvalues, P̂ is the principal load matrix, and x̂_i is the normalized input.

The Q statistic for the observation data x_i at the i-th time is:

Q_i = ‖(I − P̂ P̂^T) x̂_i‖²

For each observation point, calculating the Hotelling's T-squared and Q values quantitatively represents how far the observation deviates from the main model, both in trend and in amplitude.
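The T-squared and Q computation described above can be sketched as follows. This is the textbook formulation of PCA monitoring, not the authors' exact implementation.

```python
import numpy as np

def fit_pca_monitor(X, k):
    """Fit a PCA monitoring model on normal operating data X (n x m).
    Returns the pieces needed for the T-squared and Q statistics."""
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    Xn = (X - mu) / sigma                      # normalize each dimension
    cov = np.cov(Xn, rowvar=False)             # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    P = eigvecs[:, order[:k]]                  # principal load matrix P-hat
    lam = eigvals[order[:k]]                   # leading eigenvalues (Lambda-hat)
    return mu, sigma, P, lam

def t2_q(x, mu, sigma, P, lam):
    """Hotelling's T-squared and Q for one observation x (length m)."""
    xn = (x - mu) / sigma                      # normalized input x-hat
    t = P.T @ xn                               # projection onto the PCs
    t2 = float(np.sum(t**2 / lam))             # T^2 = t' Lambda^-1 t
    resid = xn - P @ t                         # residual-space part
    q = float(resid @ resid)                   # Q = squared prediction error
    return t2, q
```

An observation lying on the correlation structure of the training data yields a small Q, while one that breaks the structure yields a large Q even if each variable is individually in range.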

The per-time-point indicators obtained from PCA with the Hotelling's T-squared and Q tests take no account of historical data, and the two indicators may be inconsistent. To solve these problems, the algorithm further processes the calculated Hotelling's T-squared and Q statistics. First, it maps the Hotelling's T-squared value and the Q value from [0, +∞) to [0, 100]. At any moment, the calculation formula for the mapped health value is as follows:

where M is the health threshold; Hotelling's T-squared and Q values greater than this threshold are smoothed by a sigmoid-like function. The two mapped values are then combined into an integrated value by weighting:

For the health calculation, the algorithm also considers the effect of the variance of the health values over a one-day period:

Here, x is the input value at the current time, std is the standard deviation of the health values over the most recent day, and α is a weight coefficient. To make the system more cautious in assessing health, a health value smoothing strategy is added:

Here, health(x) is the final score, his(x_t) is the health value at the current observation time, his(x_{t-1}) is the health value at the previous observation time, and λ is a weight coefficient. With the smoothing strategy, the system's health score tends to decay rapidly on abnormality and recover only slowly afterwards, which helps the on-site operator pay careful attention to the system status.
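Since the exact mapping and smoothing formulas are not reproduced here, the following is one plausible realization of the described behaviour: a sigmoid-like decay above a health threshold M, and asymmetric smoothing that lets drops pass through immediately while damping recoveries. All constants, and the asymmetry rule itself, are assumptions.

```python
import math

def map_to_health(value, M=10.0, steepness=0.3):
    """Map a statistic in [0, inf) to a health score in [0, 100].
    Values at or below the threshold M keep full health; above it a
    sigmoid-like curve decays continuously toward 0. M and the
    steepness are illustrative, not the paper's fitted values."""
    if value <= M:
        return 100.0
    return 200.0 / (1.0 + math.exp(steepness * (value - M)))

def smooth_health(current, previous, lam=0.7):
    """Asymmetric smoothing matching the paper's 'rapid decay,
    slow recovery' behaviour (the asymmetry rule is our reading)."""
    if current < previous:
        return current                          # decay immediately
    return lam * previous + (1 - lam) * current  # recover slowly
```

With these definitions, a sudden T-squared or Q spike pulls the score down at once, while a return to normal raises it only gradually over several observation steps.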

3.2 Big data system construction

The big data evaluation system is built on the big data model algorithms. The architecture is shown in Figure 3 and is divided into three subsystems: the data acquisition and storage subsystem, the data scheduling and analysis subsystem, and the data query and display subsystem. The data acquisition and storage subsystem is responsible for data source acquisition and persistence; the data scheduling and analysis subsystem is the core of the big data module and is responsible for assessing and analyzing the system operating status from the raw data; the data display subsystem is responsible for reading the analysis results and provides a visual assessment presentation for field operators.

3.2.1 Data acquisition and storage subsystem construction

The acquisition and storage subsystem mainly provides data sources and data persistence for subsequent analysis. It is divided into three parts: the data acquisition module, the HBase agent module, and the HBase database. The function of each module is as follows:

Figure 3 Big data system architecture

Data acquisition module: The data acquisition module periodically reads data from various heterogeneous databases, uses time as the key to combine multiple rows of data, and sends the sensors' collected data to the HBase proxy server through an HTTP request.

HBase agent module: The HBase agent module provides a RESTful HTTP interface to HBase for the other modules and performs authentication. When writing to HBase, it preprocesses the data to comply with the platform's data format specification; when reading, it requests the data directly from HBase.

HBase database: HBase is a distributed, column-oriented database. Storing the data by column and indexing it by timestamp effectively suits the characteristics of hydrorefining big data, which arrive continuously over time and have many parameters.

3.2.2 Data scheduling analysis subsystem construction

The data scheduling and analysis subsystem encapsulates the big data model into specific computing tasks that preprocess and analyze the raw data, and it periodically executes these tasks on demand. It is divided into an analysis module and a scheduling module, whose functions are as follows:

Analysis module: Based on the big data model described in Section 3.1, the data processing flow is abstracted into three computing tasks: data preprocessing and cleaning, health model generation, and health score calculation. Each computing task uses HBase as its data source and implements the corresponding big data model.

Scheduling module: For the periodically collected raw data, the preprocessing and cleaning task and the health score calculation task are executed regularly, and according to production needs, the health model generation task is executed to update the system health model.

3.2.3 Data display query subsystem construction

The data query and display subsystem obtains the data of the underlying modules and displays the health status. It includes a data display module that reads the system's health status at specific moments from HBase and visualizes it. The display interface is shown in Figure 4. The query interface visually shows the current comprehensive assessment score, the recent trend of the system health score, the recent Q trend, and the recent Hotelling's T-squared trend. In addition, based on the contribution ratio of each dimension to the principal component analysis, the system highlights detection points that deviate greatly from the normal range and displays each parameter's recent monitoring index.

4 Conclusions

Figure 4 Query display interface

This article discusses how to use big data technology to warn of leaks in a high-pressure hydrogenation heat exchanger. Owing to the complexity of the internal chemical processes of the heat exchanger and the difficulty of inspecting the equipment, using big data for early fault warning has great advantages over traditional approaches based on experience and process knowledge. For fault early warning on high-pressure heat exchangers, a sliding window algorithm for automatic detection of abnormal intervals was first discussed. Then, given the many sensors in the chemical production system and the high dimensionality of the parameters to be analyzed, a key-parameter selection method based on the Pearson correlation coefficient was proposed. Finally, a PCA-based statistical test was chosen to assess the health status of the system, and to address the inconsistency of the evaluation indices and their lack of relevance to historical health data in the traditional statistical test, an operating status assessment algorithm was designed that allows a more intuitive and comprehensive assessment of system health.

The assessment of the operating status of hydrorefining heat exchange devices based on big data technology helps operating personnel grasp the status of industrial systems more accurately and has positive guiding significance for early warning of failures, thereby reducing the possibility of accidents, improving the economics of industrial systems, and enhancing the social benefits.

Acknowledgement: This work is supported by the National Natural Science Foundation of China (U1534201), the open project of Science and Technology on Communication Networks Laboratory, and the National Key Research and Development Program of China (2016QY01W0200).