Data-driven and physical-based identification of partial differential equations for multivariable system

2022-06-16 04:33WenboCaoWeiweiZhang

Wenbo Cao, Weiwei Zhang

School of aeronautics, Northwestern Polytechnical University, Xi’an 710072, China

Keywords:Partial differential equation identification Data-driven Multivariable system Dimensional analysis

ABSTRACT Data-driven partial differential equation identification is a potential breakthrough to solve the lack of physical equations in complex dynamic systems.However, existing equation identification methods still cannot effectively identify equations from multivariable complex systems.In this work, we combine phys- ical constraints such as dimension and direction of equation with data-driven method, and successfully identify the Navier-Stocks equations from the flow field data of Karman vortex street.This method pro- vides an effective approach to identify partial differential equations of multivariable complex systems.

With the rapid development of sensors, computing power and data storage, the cost of data acquisition and computation has been plummeting, and the rapid development of machine learning pro- vides reliable tools for discovering the potential laws from big data [1] .Data-driven methods have been used to solve various problems in different fields.In recent years, data-driven differential equa- tion identification has become a new way to solve the lack of equa- tions in complex physical systems.This method identifies the gov- erning equations of the system from the observed data instead of deriving from physical laws.

Symbolic regression and sparse regression are two common techniques for differential equation identification.The symbolic re- gression [2,3] firstly realized the identification for ordinary differ- ential equation from the data by genetic programming [4] .How- ever, the method is time-consuming and easily overfitting.The sparse identification of nonlinear dynamics method (SINDy) [5] is a recent breakthrough, which uses sparse regression and Pareto analysis to discover differential equations from a large of com- bined potential dynamic models.SINDy has been widely used and developed in many fields [6-13] , one of the most significant ap- plications of which is the PDE functional identification of nonlin- ear dynamics (PDE-FIND) [14] .In order to accurately match the data, this method adopted spares regression to select the terms from an over-complete library which containing many nonlinear and partial derivative terms by sparse regression.Through PDE- FIND, the potential PDEs were simply, effectively and directly ob- tained.Similar methods are also proposed independently in Ref.[15] .This approach has also been developed to handle sparse and noisy data [16-18] .Convolutional neural networks and symbolic networks [19] are also used to identify PDE, which can make long term predictions flow field.If the PDE structure is known, the Gaussian process [20,21] and the physical information neural net- work [22] can be used to identify the scalar parameters in the PDE.

In addition, there is another different approach for the data- driven discovery of governing equations.This approach aims to approximate the evolution operator of the underlying equa- tions rather than to identify the interpretable equations.This ap- proach does not need a priori knowledge about the unknown sys- tem, and avoids the error caused by the derivative calculation of noise data, and the resulting predictive model can be directly used to predict the future behavior of the system.This approach has been applied to the modeling of ODE [23-25] and PDE [26,27] .

In this paper, we focus on PDE identification method, which aims to recovery the equations exactly.Most of the PDE identi- fication methods need to predefine a candidate library contain- ing a large number of partial differential terms.Due to the lack of in-depth research for building a reasonable and concise library, most researches consider the library as the combinations of lin- ear terms, nonlinear terms and partial derivatives.However, once there are multiple variables in the system, the candidate library will contain not only the derivatives and nonlinear terms of each variable, but also a large number of cross-nonlinear terms between variables, which makes the candidate library too large to iden- tify equations.DLGA-PDE [28] is used to solve the PDE identifi- cation problem with incomplete candidate library, but the validity for multivariable complex systems remains to be verified.In this work, we proposed principles of building candidate library based on physical laws.This method can build a concise and physically complete candidate library and provide a new idea for PDE identi- fication of multivariable complex systems.

In the existing research, the candidate library is usually consid- ered as combinations of linear term, nonlinear terms and partial derivatives.To guarantee the completeness, the candidate library needs to contain as many terms as possible.However, the size of candidate library will be too large to identify the PDEs if there are multiple variables in the system.Therefore, we proposed the fol- lowing principles based on physics to build the candidate library.

All terms in one physical equation should have the same di- mension [29] .Therefore, the dimensions of each term in the can- didate library should be consistent.The method of using dimen- sion as constraints has been used in [30] , which eliminated terms with different dimensions from the predefined candidate library.However, the candidate library is not determined by solving di- mensional equation, which may lead to the incompleteness.

The principle of dimensional consistency ensures that each term in the candidate library has the same basic attributes, such as length, acceleration, energy and so on.Further, the direction should be same for vectors such as acceleration.For example, in the Navier-Stokes (NS) equations, the termutmay appear in the same equation asuux, but notvux.Becauseutanduuxare both accelerations in thexdirection, andvuxis the acceleration in theydirection, whereuandvare the velocities in thexandydirec- tions, respectively, and the subscript denotes the partial derivative inxort.(The method of calculating the direction of each term is shown in Appendix A .)

Almost all the laws of physics in nature have obvious mathe- matical symmetry.The scalar equation is usually space symmet- rical, which means that a two-dimensional scalar equation must contain bothuxandvyor not at the same time.Therefore, the two symmetric termsuxandvycan be combined as one termux+vyin the candidate library.In fact, this is consistent with its tensor form ∇ ·V.

PDE-FIND can be used to identify the PDE after the candidate library is determined by the above principles.The following is a brief introduction to PDE-FIND.For more information about it, please refer to Ref.[14] .PDE-FIND assumes that PDE can be ex- pressed as linear combinations of terms in an overcomplete can- didate library.For example, a general form of nonlinear PDE can be

whereuis the solution of the problem under consideration and the subscript denotes the partial derivative inxort.Nindicates a nonlinear operator.For one-dimensional equations, an alternative candidate library is as follows

In this case, PDE can be expressed as a linear combination of terms in the library, that is,ut=φξT, whereφis the library, and vectorξis the coefficients of each term in the library corresponding to PDE.The vectorξuniquely determines PDE, as an example, the Burgers’ equationut+uux−0.03uxx= 0 can be expressed asut=φ[0,0,0,−1,0,0.03,0,0,0,0,0] T .Therefore, the PDE of the system is determined once the vectorξis obtained from the experimental data.

For the given experimental data of a physical field, the values of andφat many time-space positions are calculated and arranged as column vectors respectively, and Eq.(3) is obtained.Since every point in the field should satisfyut=φξT, the PDE of the field can be obtained by solvingξ.

A key assumption is that PDE consists of only a few of terms, so the vectorξis sparse and can be solved by the least absolute shrinkage and selection operator (LASSO) [31] .However, previous studies have shown that this method often cannot get the correct solution when there are high correlations between columns of the matrix has.An alternative method is sequentially thresholded least squares (STLS) [32] .In STLS, the least square solution of the equa- tion is obtained firstly, and then a hard threshold is performed on the regression coefficients.This process is repeated recursively on the remaining nonzero coefficients.The algorithm has the advan- tages of high computational efficiency and fast convergence.After that, the sequential threshold ridge regression is proposed to better overcome the problem of strong correlation of partial differential terms [14] .In this paper, since our goal is to build the candidate library, for convenience, we only use STLS to solve the vectorξ.

Navier-Stokes (NS) equations describes the flow of fluid, which is derived from three conservation laws of physics: the mass con- servation, the momentum theorem and the energy conservation.It is widely used in fluid mechanics.In this section, NS equations are taken as an example to study the PDE identification method of multivariable complex system.

NS equations are complex nonlinear PDEs with multiple vari- ables, including the densityρ, the velocity (u,v), the temperatureT, the gas viscosity coefficientμ, the heat conduction coefficientkand the constant specific heat capacityCV.In the identification process,ρ,u,v,p,Tare regarded as variables whileμ,k,CVas con- stants.In addition, the ideal gas equationp=ρRTis introduced.Therefore, the two-dimensional NS equations (Only the continuity equation and momentum equation are shown) are as follows

Suppose that we already know that the fluid system is re- lated to the five variables ofρ,u,v,p,Tand the three constants ofμ,k,CV, and the measured values of the whole flow field ofρ,u,v,p,Tand the values ofμ,k,CVhave been obtained.The tar- get is to identify the PDEs from the data.

In the identification of a single variable system like Eq.(3) ,utis regarded as the left term of PDE, so only one candidate library is required to identify the coefficients of spatial derivative terms.However, for multivariable systems, the corresponding candidate libraries should be built for all possible time derivative terms, such asρt,ut,pt,(ρut),(ρu)t,.In order to avoid repetition, iden- tification is only performed for equations withρtand(ρu)t.

First of all, the dimensional analysis is performed to find out all the terms having the same dimension ofρt.One ofuandvwere considered because their dimensions are same, and so isx,y.The dimensional equation is as follows

where [f] indicates the dimension off.By solving this equation, we can getρuxanduρxwhich are the same terms as the di- mension ofρt.Note that since onlyu,xbut notv,yis taken into account in dimensional analysis,ρuxrepresents four termsρux,ρvy,ρuy,ρvx, and we call these four terms the similar terms ofρux.(For more information for solving Eq.(7) , see Appendix B)

Second, the direction of each term is calculated.Every term in the library must be a scalar becauseρtis a scalar.ρuxandρvyare scalars in the similar terms ofρux, anduρx,vρyare scalars in the similar terms ofuρx.Therefore, there are four termsρux,ρvy,uρx,vρywhich satisfy both the dimensional principle and the directional principle.

Thirdly, we combine the symmetric terms in the above four terms.Finally, the candidate library ofρtis obtained as [ρ(∇ ·V),V·(∇ρ)] .

Similar to the above process, the dimensional equation is as fol- lows

By solving this equation, the termsu2ρx,ρuux,ρCVTx,TCVρx,μuxxwith the same dimension as(ρu)tare obtained.

Every term in the library must be a vector in the x direction like(ρu)t.The directions of all terms similar tou2ρx,ρuux,ρCVTx,TCVρx,μuxxare calculated and checked.Finally, the candidate li- brary of(ρu)tis [u2ρx,v2ρx,uvρy,ρuux,ρvuy,ρuvy,ρvvx,ρCVTx,TCVρx,μuxx,μuyy,μvxy].

The physically complete candidate libraries ofρtand(ρu)tare obtained.It can be found that the candidate libraries are very con- cise and contain all terms of the PDE, which fully illustrates the importance of considering physical principles when building a li- brary for the real physical system.

The wake region of Karmen vortex street atMa= 0.5,Re= 500 is selected to collect the data in the wake region (Fig.1) where is a periodic vortex shedding at the trailing edge of the cylinder.The flow field data of 200 consecutive moments is collected and used to identify equations.

Fig. 1. Karmen vortex street velocity contour.The black wireframe indicates the data collection region (Uniform grid: 20 0 *10 0).

All terms in Eqs.(9) and (10) are calculated from the flow field data, and then the coefficientsξcan be solved by STLM.The identification results (Table 1) show that the PDE structures are correctly identified.The coefficient error is very small except for the viscous termsμuxxandμvxy.The errors of the viscous terms may be related to the numerical discretization errors of the higherderivatives.

Table 1 NS equations identification results (Ma= 0.5, Re= 500).(a) Continuity equation; (b) Momentum equation.

In this example, theReis set as 60 because the flow field out- side the boundary layer is almost inviscid according to the bound- ary layer theory [29] .Only when the Reynolds number is low enough, the viscosity of the flow outside the boundary layer is suf- ficient to identify the viscous terms.TheMais set as 0.5 because the flow is considered to be incompressible whenMa<0.3 .For in- compressible flow, the densityρis constant so that Eq.(5) can be simplified to

The partial derivative of Eq.(11) with respect toxcan be written as

Addingaμtimes Eq.(12) through Eq.(6) yields

where a is an arbitrary constant.Thus, Eq.(13) is a special form of NS equations when the flow is incompressible, which shows that the identification result for(ρu)tis not unique for the incompress- ible flow.

Table 2 shows the identification results of three data collection regions (shown in Fig.1) atMa= 0.2,Re= 60 .It can be found that the coefficients ofμuxxandμvxyare all approximately satisfied with Eq.(13) , even though the coefficients are different from each other.Therefore, these identification results are all acceptable.

Table 2 Identification results of NS equations (Ma= 0.2, Re= 60)

Data-driven methods are considered as the fourth paradigm of scientific discovery, which have aroused widespread concerns in different fields.A natural problem is how to introduce physics knowledge into the data-driven approaches.In this work, we com- bine the physical principles with the data-driven PDE identifica- tion method to obtain a simple and physically complete candidate library, and successfully identify the NS equations from the flow field data of Karman vortex street.As far as we know, this is the first time to identify PDE from a complex physical system contain- ing so many physical quantities, which is important for the PDE identification of multivariable complex systems.

The main limitation of data-driven PDE identification method is the difficulty of obtaining the complete measurement data from a complex physical system.Therefore, we think the important ap- plication of this method is to discover equation from the calcu- lated data of multivariable complex systems, such as identifying the macroscopic equation from the data calculated by the Molec- ular Simulation method [ 33 ], or identifying the turbulence model from the data by Direct Numerical Simulations.The significance of our work is to demonstrate the feasibility of the approach.As the next step of our work, we will try to identify the turbulence model from the data of the Direct Numerical Simulations.

Declaration of competing interest

The authors declare that they have no known competing finan- cial interests or personal relationships that could have appeared to influence the work reported in this paper.

Declaration of Competing Interest

The authors declare no conflict of interest.

Acknowledgements

This work was supported by the National Natural Science Foun- dation of China (No.92152301).

Appendix A.The direction of the terms in the library

Every term in the library is composed of the basic variablesρ,u,v,p,T,μ,k,CV,x,y, whereρ,p,T,μ,k,CVare scalars,u,xare vectors inxdirection, andv,yare vectors inydirection.The terms in the library are the dot product or cross product of the basic variables, so their directions can be directly obtained.For exam- ple, the termuvcan only be the cross product ofuandvasuv=ui×vj=uvk, because their dot product is zero.The direction ofuvis inzdirection.Equation (A1) gives the directions of some examples.

Appendix B.Dimensional analysis

In order to obtain all the terms consistent with the dimension of the time derivative terms, we use dimensional analysis to solve Eqs.(7) and (8) .

1) Equation (B1) can be obtained by substituting the dimension of the variable into Eq.(7) .

To solve Eq.(B1), we make some additional assumptions: de- pendent variables can only be put in the numerator, while x and y in the denominator.In fact, the dependent variable in the denomi- nator can always be eliminated by multiplying itself on both sides of the equation.For example,ut= 1/ρ(ux+uy)can be written asρut=(ux+uy), which can be identified by building a candidate li- brary withρutas the time derivative.In addition,x,ycan only be in the form like∂x,∂xx, because the laws of physics are unrelated to the choice of the coordinate system.Therefore, Eq.(B2) can be obtained.

There are only two sets of integer solutions for Eq.(B2)

Becauseμis considered as a constant, the partial differential terms which having same dimension ofρtareρuxanduρx.2) By substituting the dimension of each variable into Eqs.(8), (B4) and (B5) can be got.

There are four sets of integer solutions of Eq.(B5)

Considering thatp=ρRT,P/xcan be replaced by(ρCVT)/x, therefore, the partial differential terms with dimensions(ρu)tareu2ρx,ρuux,ρCVTx,TCVρx,μuxx.