A Jackson Inequality for Kernel Function Approximation in Learning Theory

Posted 2023-02-19

应用数学 (Applied Mathematics), 2023, Issue 4

TIAN Mingdang (田明党), SHENG Baohuai (盛宝怀)

(Department of Economic Statistics, Zhejiang Yuexiu University, Shaoxing 312000, China)

Abstract: We view kernel function spaces from the perspective of differential operators and discuss the kernel function approximation problem with the classical Fourier transform. We define a modulus of smoothness with the Fourier multiplier operators associated with the semigroup of operators and show that it is equivalent to a K-functional defined with a given kernel-based differential operator, with which we provide a classical Jackson-type inequality to describe the decay of the best kernel function approximation. We show that if the differential operator is the Riesz potential operator or the Bessel potential operator, then the decay can be bounded with the modulus of smoothness defined by a convolutional operator. In particular, we give an upper bound estimate for the best approximation by a reproducing kernel Hilbert space.

Key words: Jackson-type inequality; K-functional; Modulus of smoothness; Reproducing kernel Hilbert space; Riesz potential operator; Poisson kernel; Learning theory

1. Introduction

Kernel function approximation problems belong to the mathematical foundations of statistical learning and artificial intelligence [1-2], among which is the following approximation problem [3-4]:

Let $(B, \|\cdot\|)$ be a Banach space and $(H, \|\cdot\|_H)$ be a dense subspace with embedding relation

where $k > 0$ is a given constant independent of $b$. Given $f \in B$, what is the convergence rate of the function

A typical example is when $B$ is the square integrable function space [5]. Let $X$ be a complete metric space and $\mu$ be a Borel measure on $X$. Denote by $L^2_\mu(X)$ the Hilbert space consisting of (real) square integrable functions with the inner product

If (1.4) holds, then we have the following embedding inequality

We give two examples.

Example 1 The Riesz potential operator [7]

is an integral operator associated with the kernel $K(x, y) = \|x - y\|^{\alpha - d}$.

Example 2 The Bessel kernel [7]

satisfies, for $\alpha > 0$, $G_\alpha(x) \in L^1(\mathbb{R}^d)$. The Bessel potential operator is defined as

For a given $\alpha > 0$, $B_\alpha(f)$ is a kernel-based integral operator with $K(x, y) = G_\alpha(x - y)$.
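For orientation, the two examples can be written out in one standard normalization. The displays below assume the convention of Stein's "Singular Integrals", with $\hat f(\xi) = \int_{\mathbb{R}^d} f(x) e^{-2\pi i x \cdot \xi}\,dx$; the constant $\gamma(\alpha)$ and the Fourier convention are assumptions and may differ from the normalization the authors use.

```latex
% Riesz potential, 0 < \alpha < d (assumed normalization)
I_\alpha f(x) = \frac{1}{\gamma(\alpha)} \int_{\mathbb{R}^d} \frac{f(y)}{\|x - y\|^{d - \alpha}}\,dy,
\qquad
\gamma(\alpha) = \frac{\pi^{d/2}\, 2^{\alpha}\, \Gamma(\alpha/2)}{\Gamma\big((d - \alpha)/2\big)},
\qquad
\widehat{I_\alpha f}(\xi) = (2\pi \|\xi\|)^{-\alpha}\, \hat f(\xi).

% Bessel potential, \alpha > 0
B_\alpha f = G_\alpha * f,
\qquad
\widehat{G_\alpha}(\xi) = \big(1 + 4\pi^2 \|\xi\|^2\big)^{-\alpha/2},
\qquad
\|G_\alpha\|_{L^1(\mathbb{R}^d)} = 1.
```

In this form both operators are Fourier multipliers. The Bessel symbol is bounded by 1, while the Riesz symbol blows up at $\xi = 0$.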

It is known that, to describe the decay of the best approximation by algebraic polynomials of order $\le n$

where $P_n$ denotes the set of all algebraic polynomials of order $\le n$, one establishes the classical Jackson-type inequality [8]
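In one common textbook form, the quantities just described read as follows. The interval $[-1, 1]$, the uniform norm and the range of $n$ are assumptions made for illustration, not necessarily the setting of [8].

```latex
E_n(f) = \inf_{p \in P_n} \|f - p\|_{C[-1,1]},
\qquad
\Delta_h^r f(x) = \sum_{j=0}^{r} (-1)^{r-j} \binom{r}{j} f(x + jh),

\omega_r(f, t) = \sup_{0 < h \le t}\ \sup_{x,\; x + rh \,\in\, [-1,1]} \big|\Delta_h^r f(x)\big|,

E_n(f) \le C_r\, \omega_r\!\Big(f, \frac{1}{n}\Big), \qquad n \ge r,
```

where $C_r$ depends only on $r$. The inequality converts smoothness of $f$, measured by $\omega_r$, into a rate of decay for $E_n(f)$.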

To describe the decay of the best approximation by entire functions of exponential type $\sigma$, one needs the Jackson inequality [9]

as $\sigma \to +\infty$, where $\nu$ are non-negative integers and $0 \le \nu$

It is known that the tools used in [8] are K-functionals, moduli of smoothness and their equivalence. These theories are also established in real $H^p$ spaces. To describe the convergence rates for the approximation of Bochner-Riesz means in $H^p$ spaces, LU [10] defines a K-functional associated with the Riesz potential operator and establishes an equivalence relation, which (in the case of $p = 2$) is

where the K-functional $D_r(f, t)_{L^2(\mathbb{R}^d)}$ is defined as

where $H^{2,r} = \{g \in L^2(\mathbb{R}^d) : I_{-r} g \in L^2(\mathbb{R}^d)\}$. The modulus of smoothness $\omega_r(f, t)_{L^2(\mathbb{R}^d)}$ is defined by

In the present paper we shall provide a modified equivalence relation similar to (1.15) with the help of Fourier multipliers, and with it show a Jackson inequality describing the best approximation (1.7). Replacing the Riesz potential operators $I_{-r}$ in (1.16) with the general operators we have the following $r$-th K-functional

Throughout the paper, we shall write $A = O(B)$ if there exists a constant $C > 0$ such that $A \le CB$. We write $A \sim B$ if $A = O(B)$ and $B = O(A)$. Also we denote by $\mathbb{N}$ the set of non-negative integers.

2. K-Functionals and Moduli of Smoothness

The Schwartz space $S(\mathbb{R}^d)$ consists of all infinitely differentiable functions $f$ on $\mathbb{R}^d$ such that

for all multi-indices $\eta$ and $\beta$, i.e., $f$ and all its derivatives are required to be rapidly decreasing. For $f \in S(\mathbb{R}^d)$ we define its Fourier transform as

Let $\phi \in L^2(\mathbb{R}^d)$ be positive definite, i.e., for all finite sets of distinct points $x_1, x_2, \cdots, x_n \in \mathbb{R}^d$, the matrix $(\phi(x_j - x_k))_{j,k=1,2,\cdots,n}$ is strictly positive definite. By Bochner's Theorem (see Chapter 2 of [15]) we know that $\phi$ is positive definite if and only if

with $d\mu(y)$ being a nonnegative finite-valued Borel measure on $\mathbb{R}^d$. To simplify the derivation we make an assumption on the kernels discussed.

Assumption 2 The kernels $K(x, y)$ discussed in this section take the form of

where $\lambda(\xi) > 0$ and there is an even function $\varphi \in L^1(\mathbb{R}^d)$ such that $\lambda^{1/2}(\xi) = \hat{\varphi}(\xi)$.
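The strict positive definiteness recalled before Assumption 2 can be checked numerically on a small example. The sketch below is illustrative only: the Gaussian $\phi(x) = e^{-x^2}$ and the sample points are assumptions chosen for the demonstration, and the helper names (`gaussian`, `kernel_matrix`, `cholesky`) are hypothetical. It relies on the fact that a symmetric matrix admits a Cholesky factorization with positive diagonal exactly when it is strictly positive definite.

```python
import math


def gaussian(x):
    """phi(x) = exp(-x^2), a standard positive definite function on R (illustrative choice)."""
    return math.exp(-x * x)


def kernel_matrix(phi, points):
    """Build the matrix (phi(x_j - x_k))_{j,k} for a list of real points."""
    return [[phi(xj - xk) for xk in points] for xj in points]


def cholesky(a):
    """Return lower-triangular L with L L^T = a, or None if a pivot is not positive,
    i.e. if a is not (numerically) strictly positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return None
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


# Distinct sample points (assumed for illustration).
points = [0.0, 1.0, 2.0, 3.0, 4.0]
gram = kernel_matrix(gaussian, points)
factor = cholesky(gram)
print("strictly positive definite:", factor is not None)
```

For these points the matrix is strictly diagonally dominant, so the factorization succeeds. A matrix such as `[[1.0, -3.0], [-3.0, 1.0]]` makes `cholesky` return `None`.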

Under this assumption, the integral operator (1.3) becomes

We have an embedding inequality

where $k = \|\varphi\|_{L^1(\mathbb{R}^d)}$. In fact,

is the heat kernel [14].
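As a hedged illustration of how the heat kernel could fit a condition of the kind in Assumption 2, the sketch below works in one dimension. It assumes the standard Gauss-Weierstrass normalization $W_t(x) = (4\pi t)^{-1/2} e^{-x^2/(4t)}$ and the Fourier convention $\hat f(\xi) = \int f(x) e^{-2\pi i x \xi}\,dx$; neither is taken from the source. Under these assumptions the multiplier is $\lambda(\xi) = e^{-4\pi^2 t \xi^2} > 0$, and its square root is the transform of the even, integrable function $W_{t/2}$. The function names are hypothetical.

```python
import math


def heat_kernel(x, t):
    """One-dimensional Gauss-Weierstrass kernel W_t(x) = (4 pi t)^(-1/2) exp(-x^2 / (4 t))."""
    return math.exp(-x * x / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)


def fourier_transform_even(g, xi, half_width, steps):
    """Trapezoidal approximation of the integral of g(x) exp(-2 pi i x xi) over
    [-half_width, half_width] for an even, real g. For such g the imaginary part
    vanishes, so only the cosine part is integrated."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * g(x) * math.cos(2.0 * math.pi * x * xi)
    return total * h


t, xi = 0.5, 0.3  # sample time and frequency (assumed for illustration)

# Multiplier lambda(xi): numerical transform of W_t versus the closed form.
numeric = fourier_transform_even(lambda x: heat_kernel(x, t), xi, 20.0, 8000)
exact = math.exp(-4.0 * math.pi ** 2 * t * xi ** 2)

# Square root of the multiplier: transform of W_{t/2}.
root = fourier_transform_even(lambda x: heat_kernel(x, t / 2.0), xi, 20.0, 8000)

print(abs(numeric - exact) < 1e-9)
print(abs(root - math.sqrt(exact)) < 1e-9)
```

Both comparisons agree to within the quadrature tolerance, consistent with $\lambda^{1/2} = \hat{\varphi}$ for $\varphi = W_{t/2}$ in this assumed setting.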

Proposition 2.1 If a kernel $K(x, y)$ satisfies Assumption 2, then

To show (2.7) we need a lemma.

The infinitesimal generator $E$ is given by

whenever the limit exists. $D(E)$ is the domain of $E$. Then we have the following lemma.

Lemma 2.1 (Theorem 5.1 of [16]) Let $T(t)$ satisfy (2.8), (2.9) and (2.10),

and there exists a positive constant $N$ independent of $t$ and $T(t)$ such that

Then forandt>0 there holds

We now show Proposition 2.1.

Collecting (2.14), (2.15) and (2.13), we have (2.7).

and by (36) on p. 134 of [7] we have

By Proposition 2.1 we have a corollary.

Corollary 2.1 There holds

In particular, for $\alpha = 4$ we have a corollary.

Corollary 2.2 There holds

The Riesz potential operator then has the expression

Since $I_\alpha$ is not an $(L^2, L^2)$ type operator (see Chapter 3 in [10]), the embedding inequality (2.5) does not hold. However, the set of functions defined by

is a subset of $L^2(\mathbb{R}^d)$.

For $f \in L^2(\mathbb{R}^d)$ and an integer $r \ge 1$ we define a K-functional

is the $d$-dimensional Poisson kernel, i.e.,

Corollary 2.3 Let $r \ge 1$ be an integer. Then

Proof Replace the $\lambda(\xi)$ in (2.6) with $(2\pi\|\xi\|)^{-2}$.
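For reference, the $d$-dimensional Poisson kernel mentioned before Corollary 2.3 has the following standard form. It is written under the assumed Fourier convention $\hat f(\xi) = \int_{\mathbb{R}^d} f(x) e^{-2\pi i x \cdot \xi}\,dx$, and the normalization may differ from the authors' display.

```latex
P_t(x) = \frac{c_d\, t}{\big(t^2 + \|x\|^2\big)^{(d+1)/2}},
\qquad
c_d = \frac{\Gamma\big((d+1)/2\big)}{\pi^{(d+1)/2}},
\qquad
\widehat{P_t}(\xi) = e^{-2\pi t \|\xi\|}, \quad t > 0.
```

With this convention the Poisson integrals $P_t * f$ form a semigroup whose multiplier $e^{-2\pi t \|\xi\|}$ is built from the same symbol $2\pi\|\xi\|$ that appears in the Riesz potential.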

To make a combination of Mercer kernels with function translations we add a new assumption:

We point out here that functions $\phi(t)$ satisfying Assumption 3 exist. For example:

Let $\phi \in L^2(\mathbb{R}^d)$ satisfy Assumption 3. Then

Take $K^*(\phi, x, y) = \phi(x - y)$. Define

and equip it with an inner product as

Then $H_{K^*(\phi)}$ is an RKHS associated with the kernel $K^*(\phi, x, y)$ (see e.g. [12, 18]).

Moreover, there is a constant $c > 0$ such that

3. The Jackson-type Inequality

Theorem 3.1 Let $K(x, y)$ satisfy Assumption 2. Then there is a constant $C > 0$ such that for any $0 < \nu$

Taking $\nu = r - 1$ in (3.1), we have a corollary.

Corollary 3.1 Let $K(x, y)$ satisfy Assumption 2. Then there is a constant $C > 0$ such that

Corollary 3.3 There is a constant $C > 0$ such that for any $0 \le \nu$

Corollary 3.4 Let $\phi$ satisfy Assumption 3. Then there is a constant $C > 0$ such that

Proof Replace the $\lambda(\xi)$ in the proof of Proposition 3.1 with $\hat{\phi}(\xi)$ and take $r = 1$.