Sequential Modeling of Process Batch Response Based on Functional Principal Component Analysis

2023-08-06 07:08
Control and Instruments in Chemical Industry, 2023, Issue 4

LIU Yang-yang, LIU Fei

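Abstract   This paper addresses the problem of modeling how the batch response of a biochemical batch process depends on its operating conditions, and proposes a sequential modeling strategy that combines design of experiments with functional principal component analysis (FPCA). First, B-spline basis function smoothing converts the discrete batch response sequences into continuous response function curves. Next, FPCA yields the mean curve, the principal component functions, and the principal component scores of the response functions. Finally, a Kriging model relating the principal component scores to the operating conditions is built to predict the scores for any operating condition within the experimental region, which gives a model of the batch response as a function of the operating conditions. To improve prediction accuracy, the model is updated iteratively by sequential design according to an improved convergence condition. Simulated experiments on a biochemical reaction network verify the effectiveness of the strategy, and the results show that it offers good data visualization and model interpretability.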

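Key words   functional principal component analysis; sequential design; batch response; design of experiments; Kriging model; biochemical batch process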

CLC number TP274.2   Document code A   Article ID 1000-3932(2023)04-0439-08

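In industrial practice, the mechanisms of many biochemical batch processes are poorly understood or the processes are too complex, which makes mechanistic modeling difficult and the resulting optimization problems hard to solve; data-driven models are therefore a viable alternative [1]. Response surface methodology (RSM) combined with design of experiments (DoE) is a data-driven approach that supports both modeling and optimization [2], and it is widely used in biochemical analysis and pharmaceutical research [3]. However, RSM can only relate the operating conditions to the response at a single time point, usually the end of the batch. Modeling the entire batch response as a function of the operating conditions is more valuable. Moreover, as automated experimentation platforms become widespread, parallel experiments can generate batch data within a short time, which has further encouraged research on batch response modeling.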
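For reference, the second-order RSM model mentioned here relates a response at one time point, written below for the final time $t_f$, to $k$ coded operating conditions $x_1,\dots,x_k$. This is the standard textbook form with illustrative symbols, not an equation taken from this paper:

```latex
y(t_f) = \beta_0 + \sum_{i=1}^{k}\beta_i x_i + \sum_{i=1}^{k}\beta_{ii} x_i^2 + \sum_{i=1}^{k-1}\sum_{j=i+1}^{k}\beta_{ij} x_i x_j + \varepsilon
```

Because every coefficient $\beta$ is a constant, such a model describes only one time point; covering a whole batch would require a separate fit at each time, which is the limitation that motivates batch response modeling.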

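Reference [4] generalized RSM into the dynamic response surface methodology (DRSM): by introducing time-dependent shifted Legendre polynomials (SLP) into the coefficients of the response surface model, it turns the RSM coefficients, which describe only one time point, into time-varying coefficients that represent the whole batch. WANG Z et al. and DONG Y et al. each proposed improvements [5,6] to suppress the local oscillations of the model caused by small deviations in the estimated high-order SLP terms, and extended the range of applications of DRSM [7]. Reference [8] used an improved DRSM to model a pyridone cyclization reaction, and reference [9] proposed a batch response modeling workflow based on a semi-parametric model and applied it to the analysis of a chemoselective methyl ester hydrolysis. Gaussian processes [10] and machine learning [11,12] can also be used for batch response modeling.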
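To make the DRSM idea concrete, the sketch below evaluates shifted Legendre polynomials $\tilde P_r(\tau)=P_r(2\tau-1)$ on normalized time $\tau\in[0,1]$ and uses them to build time-varying coefficients $\beta(\tau)=\sum_r\gamma_r\tilde P_r(\tau)$ for a first-order model. It assumes time has already been scaled to [0, 1]; the model order, the coefficient values, and the helper names are illustrative and not taken from [4].

```python
import numpy as np
from numpy.polynomial import legendre


def shifted_legendre(r, tau):
    """Shifted Legendre polynomial P~_r(tau) = P_r(2*tau - 1) for tau in [0, 1]."""
    coeffs = np.zeros(r + 1)
    coeffs[r] = 1.0  # select the degree-r Legendre polynomial
    return legendre.legval(2.0 * np.asarray(tau, dtype=float) - 1.0, coeffs)


def time_varying_coefficient(gamma, tau):
    """beta(tau) = sum_r gamma_r * P~_r(tau): one DRSM-style time-varying coefficient."""
    return sum(g * shifted_legendre(r, tau) for r, g in enumerate(gamma))


def drsm_linear_response(x, gammas, tau):
    """Hypothetical first-order DRSM: y(tau, x) = beta_0(tau) + sum_i beta_i(tau) * x_i.

    gammas has shape (1 + n_factors, R + 1): row 0 holds the SLP coefficients of the
    intercept, row i those of coded factor x_i.
    """
    y = time_varying_coefficient(gammas[0], tau)
    for i, xi in enumerate(x, start=1):
        y = y + time_varying_coefficient(gammas[i], tau) * xi
    return y
```

For instance, `drsm_linear_response(np.array([0.5]), np.array([[0.5, 0.3, -0.1], [0.2, 0.1, 0.0]]), np.linspace(0, 1, 5))` gives the response trajectory for one coded factor at $x_1=0.5$; in DRSM the $\gamma$ values would be estimated by least squares from the batch data.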

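The methods above treat the batch response as a discrete data sequence from the production process. In this paper, the batch response is instead treated as a whole and represented as a continuous response function curve, that is, as functional data [13]. Functional principal component analysis (FPCA) is the main tool for analyzing functional data. FIDALEO M used a face-centered central composite design to plan experiments and applied FPCA to build a functional model between the batch response of a stirred ball mill and its operating conditions, thereby determining the design space of the operating conditions [14]. In that work, FPCA applied to the batch responses yields the mean curve, the principal component functions, and the principal component scores, and RSM is then used to build second-order polynomial models that predict the principal component scores from the operating conditions. However, when the batch response is strongly nonlinear and the experimental region is complex, a more accurate and more flexible modeling method is needed. In addition, if the model obtained from a single experimental design does not reach the desired accuracy, a way to improve it further is required.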
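For reference, the three quantities that FPCA produces can be written in the standard form (see e.g. [16]) for $N$ smoothed batch responses $x_i(t)$ on a time interval $\mathcal T$. The symbols are chosen for illustration and may differ from the paper's own notation:

```latex
\begin{aligned}
\mu(t) &= \frac{1}{N}\sum_{i=1}^{N} x_i(t), \qquad
v(s,t) = \frac{1}{N}\sum_{i=1}^{N}\bigl[x_i(s)-\mu(s)\bigr]\bigl[x_i(t)-\mu(t)\bigr],\\
\int_{\mathcal T} v(s,t)\,\phi_k(t)\,\mathrm{d}t &= \lambda_k\,\phi_k(s), \qquad
\int_{\mathcal T} \phi_k(t)\,\phi_l(t)\,\mathrm{d}t = \delta_{kl},\\
\xi_{ik} &= \int_{\mathcal T} \bigl[x_i(t)-\mu(t)\bigr]\,\phi_k(t)\,\mathrm{d}t, \qquad
x_i(t) \approx \mu(t) + \sum_{k=1}^{K} \xi_{ik}\,\phi_k(t).
\end{aligned}
```

Here $\mu(t)$ is the mean curve, the eigenfunctions $\phi_k(t)$ of the covariance function $v(s,t)$ are the principal component functions with eigenvalues $\lambda_1\ge\lambda_2\ge\cdots$, and $\xi_{ik}$ are the scores; $K$ is usually chosen so that the retained components explain most of the total variance $\sum_k\lambda_k$.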

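Therefore, this paper uses a more accurate Kriging model to predict the principal component scores and combines it with sequential design based on the maximum mean squared error (MSE) criterion [15], so that new experiments are run in regions where the current model predicts poorly, improving the accuracy of the resulting model. An improved curve-fitting metric and the MSE together form the convergence condition of the sequential design. Simulated experiments in which FPCA is used to sequentially build a model of the product concentration in a biochemical reaction network verify the effectiveness of the proposed method.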
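The maximum-MSE criterion can be stated with the standard ordinary Kriging formulas (e.g. [19]). Assuming a constant-mean Kriging model fitted to $n$ design points, with correlation matrix $\mathbf R$, observed scores $\mathbf y$, and correlation vector $\mathbf r(\mathbf x)$ between a new point and the design points, the predictor, its MSE, and the next design point chosen from a candidate set $\mathcal C$ are:

```latex
\begin{aligned}
\hat{\mu} &= \frac{\mathbf{1}^{\mathsf T}\mathbf{R}^{-1}\mathbf{y}}{\mathbf{1}^{\mathsf T}\mathbf{R}^{-1}\mathbf{1}}, \qquad
\hat{\sigma}^2 = \frac{(\mathbf{y}-\mathbf{1}\hat{\mu})^{\mathsf T}\mathbf{R}^{-1}(\mathbf{y}-\mathbf{1}\hat{\mu})}{n},\\
\hat{y}(\mathbf{x}) &= \hat{\mu} + \mathbf{r}(\mathbf{x})^{\mathsf T}\mathbf{R}^{-1}(\mathbf{y}-\mathbf{1}\hat{\mu}),\\
s^2(\mathbf{x}) &= \hat{\sigma}^2\left[1-\mathbf{r}(\mathbf{x})^{\mathsf T}\mathbf{R}^{-1}\mathbf{r}(\mathbf{x})+\frac{\bigl(1-\mathbf{1}^{\mathsf T}\mathbf{R}^{-1}\mathbf{r}(\mathbf{x})\bigr)^2}{\mathbf{1}^{\mathsf T}\mathbf{R}^{-1}\mathbf{1}}\right],\\
\mathbf{x}_{\text{new}} &= \arg\max_{\mathbf{x}\in\mathcal{C}} s^2(\mathbf{x}).
\end{aligned}
```

With several retained principal components, each score has its own Kriging model, so the per-score MSEs must be combined (for instance summed) before taking the maximum; the summation is an illustrative assumption.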

1 FPCA Modeling Based on the Kriging Model

1.1 Functional Principal Component Analysis
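The two steps described in the abstract, B-spline smoothing of the discrete batch data and FPCA of the resulting curves, can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: an unpenalized least-squares fit on a clamped cubic B-spline basis with equally spaced knots, and integrals approximated by the trapezoidal rule on a fine grid, so that FPCA reduces to a weighted SVD. The function names (`bspline_basis`, `clamped_knots`, `smooth_batches`, `fpca`) are hypothetical, and the knot placement or any roughness penalty used by the authors is not taken from the paper.

```python
import numpy as np


def bspline_basis(t, knots, degree=3):
    """Evaluate every B-spline basis function at points t (Cox-de Boor recursion).

    `knots` is the full clamped knot vector; result shape is (len(t), len(knots) - degree - 1).
    """
    t = np.asarray(t, dtype=float)
    n_span = len(knots) - 1
    # Degree 0: indicator of each half-open knot span [k_j, k_{j+1}).
    B = np.zeros((t.size, n_span))
    for j in range(n_span):
        if knots[j] < knots[j + 1]:
            B[:, j] = (t >= knots[j]) & (t < knots[j + 1])
    # Close the last non-empty span on the right so that t == knots[-1] is covered.
    last = np.nonzero(knots[:-1] < knots[1:])[0][-1]
    B[t == knots[-1], last] = 1.0
    # Raise the degree one step at a time.
    for d in range(1, degree + 1):
        B_next = np.zeros((t.size, n_span - d))
        for j in range(n_span - d):
            left = knots[j + d] - knots[j]
            right = knots[j + d + 1] - knots[j + 1]
            if left > 0:
                B_next[:, j] += (t - knots[j]) / left * B[:, j]
            if right > 0:
                B_next[:, j] += (knots[j + d + 1] - t) / right * B[:, j + 1]
        B = B_next
    return B


def clamped_knots(a, b, n_interior, degree=3):
    """Clamped knot vector on [a, b] with n_interior equally spaced interior knots (assumption)."""
    inner = np.linspace(a, b, n_interior + 2)
    return np.concatenate([np.full(degree, a), inner, np.full(degree, b)])


def smooth_batches(t_obs, Y, knots, t_grid, degree=3):
    """Unpenalized least-squares B-spline fit of each batch (row of Y), evaluated on t_grid."""
    coef, *_ = np.linalg.lstsq(bspline_basis(t_obs, knots, degree), Y.T, rcond=None)
    return (bspline_basis(t_grid, knots, degree) @ coef).T  # shape (n_batches, len(t_grid))


def fpca(X, t_grid):
    """Discretized FPCA: mean curve, eigenfunctions (rows), eigenvalues and scores.

    Integrals over time are approximated with trapezoidal weights on t_grid.
    """
    t_grid = np.asarray(t_grid, dtype=float)
    h = np.diff(t_grid)
    w = np.zeros(len(t_grid))
    w[:-1] += h / 2
    w[1:] += h / 2
    sw = np.sqrt(w)
    mu = X.mean(axis=0)
    U, S, Vt = np.linalg.svd((X - mu) * sw, full_matrices=False)
    phi = Vt / sw                # sum_t w_t * phi_k(t) * phi_l(t) = delta_kl
    scores = U * S               # xi_ik = integral of (x_i - mu) * phi_k
    eigvals = S**2 / X.shape[0]  # eigenvalues of the 1/N sample covariance operator
    return mu, phi, eigvals, scores
```

For example, with `Y` holding one batch per row sampled at times `t_obs`, `fpca(smooth_batches(t_obs, Y, clamped_knots(0.0, 1.0, 8), t_grid), t_grid)` returns the mean curve, the eigenfunctions (rows of `phi`), the eigenvalues, and the scores; keeping the first $K$ rows of `phi` and columns of the scores gives the truncated expansion.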

1.2 Predicting the Principal Component Scores
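The score-prediction step can be illustrated with one small ordinary Kriging model per retained score, followed by reconstruction of the whole batch curve from the predicted scores. In this sketch the Gaussian correlation function, the fixed correlation parameter `theta` (normally estimated by maximum likelihood), and the tiny `nugget` added for numerical stability are all assumptions; `gauss_corr`, `OrdinaryKriging`, and `predict_batch_curve` are hypothetical names, not an existing library API.

```python
import numpy as np


def gauss_corr(A, B, theta):
    """Gaussian correlation R_ij = exp(-sum_d theta_d * (a_id - b_jd)^2)."""
    diff = A[:, None, :] - B[None, :, :]
    return np.exp(-np.sum(theta * diff**2, axis=-1))


class OrdinaryKriging:
    """Ordinary Kriging with a fixed, user-supplied theta (assumption: not estimated here)."""

    def __init__(self, X, y, theta, nugget=1e-10):
        self.X, self.theta = np.asarray(X, float), np.asarray(theta, float)
        y = np.asarray(y, float)
        n = len(y)
        R = gauss_corr(self.X, self.X, self.theta) + nugget * np.eye(n)
        self.Rinv = np.linalg.inv(R)
        ones = np.ones(n)
        self.denom = ones @ self.Rinv @ ones
        self.mu = ones @ self.Rinv @ y / self.denom  # GLS estimate of the constant mean
        resid = y - self.mu
        self.alpha = self.Rinv @ resid
        self.sigma2 = resid @ self.alpha / n         # process variance estimate

    def predict(self, Xnew):
        """Return the Kriging prediction and its mean squared error at each row of Xnew."""
        r = gauss_corr(np.asarray(Xnew, float), self.X, self.theta)  # shape (m, n)
        yhat = self.mu + r @ self.alpha
        Rinv_r = r @ self.Rinv               # row i equals (R^{-1} r_i)^T since R is symmetric
        u = 1.0 - Rinv_r.sum(axis=1)         # 1 - 1^T R^{-1} r
        mse = self.sigma2 * (1.0 - np.sum(Rinv_r * r, axis=1) + u**2 / self.denom)
        return yhat, np.maximum(mse, 0.0)    # clip tiny negative round-off


def predict_batch_curve(models, mu, phi, Xnew):
    """Batch response curves at new operating conditions: mu(t) + sum_k xi_k(x) * phi_k(t)."""
    xi_hat = np.column_stack([m.predict(Xnew)[0] for m in models])  # shape (m, K)
    return mu + xi_hat @ phi[: len(models)]
```

Here `mu` and `phi` are the mean curve and eigenfunctions on a common time grid, and `models[k]` is trained on the k-th score of each design point; the MSE returned by `predict` is the quantity a maximum-MSE sequential design evaluates at candidate points.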

2 FPCA Sequential Modeling Algorithm
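The overall loop of a maximum-MSE sequential design (fit the models, evaluate the predicted MSE over a set of candidate operating conditions, run a new experiment where it is largest, and repeat until a stopping rule holds) can be sketched generically. The callbacks `run_experiment`, `fit_models`, and `converged` are hypothetical placeholders: `fit_models` would wrap the smoothing, FPCA, and per-score Kriging steps, and `converged` stands in for the combined curve-fit/MSE convergence condition described in the introduction, whose exact form is not reproduced here.

```python
import numpy as np


def max_mse_sequential_design(run_experiment, fit_models, candidates, X0,
                              max_iter=10, converged=None):
    """Sequential design with the maximum-MSE criterion (generic sketch).

    run_experiment(x) -> batch data observed at operating condition x (hypothetical callback)
    fit_models(X, data) -> function mapping candidate rows to predicted MSE values
                           (e.g. summed over per-score Kriging models; an assumption)
    converged(mse) -> bool, optional stand-in for the paper's convergence condition
    """
    candidates = np.asarray(candidates, dtype=float)
    X = [np.asarray(x, dtype=float) for x in X0]
    data = [run_experiment(x) for x in X]
    for _ in range(max_iter):
        mse = fit_models(np.array(X), data)(candidates)  # refit, then predict MSE everywhere
        if converged is not None and converged(mse):
            break
        x_new = candidates[int(np.argmax(mse))]           # where the model is least certain
        X.append(x_new)
        data.append(run_experiment(x_new))                # run the new (simulated) experiment
    return np.array(X), data
```

For a dry run, `fit_models` can return a stand-in such as the distance from each candidate to its nearest design point; with 11 candidates on [0, 1] and an initial design at 0, the loop then adds 1.0 and then 0.5, the space-filling behavior expected when the predicted MSE grows away from observed points.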

3 Modeling Example: Biochemical Reaction Network

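The FPCA sequential modeling approach is applied to a simulated reaction network containing 10 species. The network has 8 independent reactions, of which reactions 1 and 4 are reversible; the kinetic equations and parameters are given in reference [6]. The relationships among the species are shown in Figure 2, where numbers denote reactions and circles denote species: blue marks reactants, gray intermediates, orange by-products, and green the target product.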

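In summary, by combining DoE with the FPCA sequential modeling algorithm, the batch product concentration can be predicted for any operating condition within the experimental region of the biochemical reaction network, which verifies the effectiveness of the proposed modeling strategy.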

4 Conclusion

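Combining DoE with FPCA, this paper proposes a method for sequentially building models of process batch responses. Simulated experiments on modeling species concentrations in a biochemical reaction network verify its effectiveness. The resulting model offers good data visualization and interpretability, accurately predicts the batch response at untested operating conditions within the experimental region, and can be applied to online monitoring, control, and optimization of biochemical processes.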

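The operating conditions considered in this work are time-invariant. Future work will extend the method to build batch response models for processes with time-varying operating conditions.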

References

[1] GEORGAKIS C.Design of dynamic experiments:A data-driven methodology for the optimization of time-varying processes[J].Industrial & Engineering Chemistry Research,2013,52(35):12369-12382.

[2] BAS D,BOYACI I H.Modeling and optimization I:Usability of response surface methodology[J].Journal of Food Engineering,2007,78(3):836-845.

[3] HANRAHAN G,LU K.Application of factorial and response surface methodology in modern experimental design and optimization[J].Critical Reviews in Analytical Chemistry,2006,36(3-4):141-151.

[4] KLEBANOV N,GEORGAKIS C.Dynamic response surface models:A data-driven approach for the analysis of time-varying process outputs[J].Industrial & Engineering Chemistry Research,2016,55(14):4022-4034.

[5] WANG Z,GEORGAKIS C.New dynamic response surface methodology for modeling nonlinear processes over semi-infinite time horizons[J].Industrial & Engineering Chemistry Research,2017,56(38):10770-10782.

[6] DONG Y,GEORGAKIS C,MUSTAKIS J,et al.Constrained version of the dynamic response surface methodology for modeling pharmaceutical reactions[J].Industrial & Engineering Chemistry Research,2019,58(30):13611-13621.

[7] DONG Y,GEORGAKIS C,SANTOS-MARQUES J,et al.Dynamic response surface methodology using Lasso regression for organic pharmaceutical synthesis[J].Frontiers of Chemical Science and Engineering,2022,16(2):221-236.

[8] JURICA J A,MCMULLEN J P.Automation Technologies to Enable Data-Rich Experimentation:Beyond Design of Experiments for Process Modeling in Late-Stage Process Development[J].Organic Process Research & Development,2021,25(2):282-291.

[9] WANG K,HAN L,MUSTAKIS J,et al.Kinetic and data-driven reaction analysis for pharmaceutical process development[J].Industrial & Engineering Chemistry Research,2019,59(6):2409-2421.

[10] TANG Q,LAU Y B,HU S,et al.Response surface methodology using Gaussian processes:Towards optimizing the trans-stilbene epoxidation over Co2+-NaX catalysts[J].Chemical Engineering Journal,2010,156(2):423-431.

[11] DOMAGALSKI N R,MACK B C,TABORA J E.Analysis of design of experiments with dynamic responses[J].Organic Process Research & Development,2015,19(11):1667-1682.

[12] WILSON Z T,SAHINIDIS N V.The ALAMO approach to machine learning[J].Computers & Chemical Engineering,2017,106:785-795.

[13] RAMSAY J O.When the data are functions[J].Psychometrika,1982,47(4):379-396.

[14] FIDALEO M.Functional data analysis and design of experiments as efficient tools to determine the dynamical design space of food and biotechnological batch processes[J].Food and Bioprocess Technology,2020,13(6):1035-1047.

[15] CROMBECQ K,LAERMANS E,DHAENE T.Efficient space-filling and non-collapsing sequential design strategies for simulation-based modeling[J].European Journal of Operational Research,2011,214(3):683-696.

[16] RAMSAY J O,SILVERMAN B W.Functional Data Analysis[M].2nd ed.New York:Springer New York,2005.

[17] BETZ W,PAPAIOANNOU I,STRAUB D.Numerical methods for the discretization of random fields by means of the Karhunen-Loève expansion[J].Computer Methods in Applied Mechanics and Engineering,2014,271:109-129.

[18] RAMSAY J O,DALZELL C J.Some tools for functional data analysis[J].Journal of the Royal Statistical Society:Series B (Methodological),1991,53(3):539-561.

[19] SACKS J,WELCH W J,MITCHELL T J,et al.Design and analysis of computer experiments[J].Statistical Science,1989,4(4):409-423.

[20] MONTGOMERY D C.Design and analysis of experiments[M].9th ed.Arizona:John Wiley & Sons,2017.

(Received: 2022-10-21; revised: 2023-01-10)

Sequential Modeling of Process Batch Response Based on Functional Principal Component Analysis

LIU Yang-yang, LIU Fei

(MOE Key Laboratory of Advanced Control for Light Industry Processes, Jiangnan University)

Abstract   Combined with design of experiments, a sequential modeling strategy based on functional principal component analysis (FPCA) was proposed for modeling the batch response of biochemical batch processes as a function of their operating conditions. First, B-spline basis function smoothing was adopted to transform the discrete batch response sequences into continuous response function curves; then, FPCA was employed to obtain the mean curve, principal component functions, and principal component scores of the response functions; finally, a Kriging model between the principal component scores and the operating conditions was constructed to predict the scores for any operating condition in the experimental region, thereby establishing a model of the batch response with respect to the operating conditions. To improve the prediction accuracy of the model, sequential design was used to update it according to an improved convergence condition. The effectiveness of the proposed strategy was verified by simulated experiments on a biochemical reaction network, and the simulation results show that the strategy has good data visualization and model interpretation ability.

Key words   functional principal component analysis, sequential design, batch response, experiment design, Kriging model, biochemical batch process