Xiaoli Yin, Chunming Li, and Yuan Zhang
Abstract—Many effective optimization algorithms require partial derivatives of the objective function, yet some optimization problems have objective functions for which no derivatives exist. In earlier studies, some search directions were obtained under the quadratic hypothesis of the objective function. Based on derivatives, the quadratic function assumption, and directional derivatives, computational formulas for numerical first-order partial derivatives, second-order partial derivatives, and second-order mixed partial derivatives were constructed. Based on the coordinate transformation relation, a set of orthogonal vectors in the fixed coordinate system was established according to the optimization direction. A numerical algorithm was proposed, taking the second-order approximation direction as an example, and a large-stepsize numerical algorithm based on coordinate transformation was also proposed. Several algorithms were validated on the unconstrained optimization of the two-dimensional Rosenbrock objective function. The numerical second-order approximation direction with numerical mixed partial derivatives showed good results: its computational cost is 0.2843% of that without the second-order mixed partial derivatives. When the local coordinate system is rotated through 360°, because the objective function is more complex than a quadratic function, the optimization direction obtained by replacing analytic partial derivatives with numerical directional derivatives varies over a range of 103.05°. Because theoretical error exists in the numerical negative gradient direction, the computation with coordinate transformation is 94.71% less than that without it. If there is no theoretical error in the numerical negative gradient direction, or in the large-stepsize numerical optimization algorithm based on coordinate transformation, the sawtooth phenomenon occurs. Taking more than one point for each numerical mixed partial derivative does not improve the optimization results. The numerical direction based on the quadratic hypothesis only requires that the objective function can be evaluated; it does not require differentiability, and truncation error and rounding error need not be considered. Thus, the application scope of many optimization methods is extended.
OPTIMIZATION methods are widely used in various research fields and play an important role in scientific and technological problems [1]–[7]. At present, research on optimization methods is mainly divided into multi-objective [8]–[10], intelligence, genetic algorithm [10], [11], and clustering [12] methods, among others. Classical multidimensional optimization methods are also studied [13].
Many efficient optimization methods need derivative and partial derivative information of the objective function, such as the linear fitting derivative method, high-order second approximate fixed-point method, tangent intersection point method, negative gradient direction method, zigzag-line negative gradient direction method (blind-walking method), second approximate direction method [14], linear fitting gradient method [15], various conjugate direction methods [16]–[18], the optimal available direction method (feasible direction method), the half-step method [19], and so on. These methods are not usable for optimization problems whose objective function is not differentiable. Numerical differentiation algorithms [20] may allow these methods to work.
Comparing the above algorithms, the second approximate direction is the best, while the negative gradient direction is the most commonly used. The negative gradient direction is the fastest locally descending direction, but it usually deviates from the direction toward the extreme point over the whole design space. Conjugate directions constructed on the basis of the negative gradient direction have better optimization efficiency [21], [22].
The numerical differential algorithm can not only determine the numerical partial derivative, but also obtain the numerical directional derivative in the general direction.
A review of the teaching contents of courses such as “mechanical optimum design” or “optimization methods”, of the research literature on optimization methods, and of classical optimization methods shows that they are all proposed under the quadratic objective function hypothesis and then applied to general objective functions. Although there are optimization directions based on finite differences [23], the truncation error and rounding error are emphasized while the quadratic hypothesis is neglected, owing to the fitting effect on the objective function. Likewise, when each derivative in a system of differential equations is represented by its finite difference approximation [24], the quadratic hypothesis is ignored. Research based on the quadratic hypothesis and the fitting method is more general.
Classical optimization methods were not proposed to deal with large-scale matrices. However, some large-scale sparse matrix problems borrow optimization ideas, such as the graph coloring problem [25] and methods for solving nonlinear equations [26]. Although [25] deals with an optimization direction based on finite differences, it is far removed from the research in this paper.
According to the characteristics of the optimization method, the numerical partial derivative and the numerical directional derivative of the objective function are constructed by using the numerical differentiation algorithm under the quadratic function assumption. On this basis, the optimization effect of the numerical second approximate direction and the local-coordinate negative gradient direction is studied.
Optimization methods with a definite optimization direction are put forward under the unimodal assumption. An optimization method based on the quadratic function is universal; there is no universal significance in studying higher-order objective functions. In the design space, the objective function is assumed to be quadratic in any direction, so there are three undetermined coefficients. Take the current point x(k) as the center, and take points x(k−1) and x(k+1) at a micro step Δs on either side. Assume that the objective function values at the three points are f(k−1), f(k) and f(k+1), respectively. Δs should be determined reasonably according to the computer's numerical precision and the termination condition value. Three equations can be obtained from the above three points; thus the three coefficients are obtained, so that the hypothesized quadratic function is known. The partial derivatives at points x(k−1), x(k) and x(k+1) are as follows:
The above three formulas are consistent with the results obtained by the Lagrange interpolation function method. The second partial derivatives at the three points are equal.
Equation (4) is consistent with the Newton central difference formula. The numerical gradient and the numerical second partial derivative at the point x(k) can be obtained from the above four formulas, but the second mixed partial derivative cannot be obtained.
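As an illustration of the numerical derivative formulas above, the following C sketch evaluates the first and second derivatives along one coordinate by the standard central-difference forms that (2)–(4) reduce to under the quadratic hypothesis; the objective fobj, the point x, and the micro step ds are placeholder names chosen here, not quantities from the original text.

#include <stdio.h>

/* Illustrative one-dimensional section of an objective function. */
static double fobj(double x) { return (x - 2.0)*(x - 2.0)*(x - 2.0)*(x - 2.0) + x; }

/* Numerical first derivative at x from the three points x - ds, x, x + ds. */
static double d1_central(double (*f)(double), double x, double ds)
{
    return (f(x + ds) - f(x - ds)) / (2.0 * ds);
}

/* Numerical second derivative at x; under the quadratic hypothesis it is the
   same at all three points. */
static double d2_central(double (*f)(double), double x, double ds)
{
    return (f(x + ds) - 2.0 * f(x) + f(x - ds)) / (ds * ds);
}

int main(void)
{
    double x = 1.0, ds = 1e-4;   /* micro step chosen from machine precision and the termination value */
    printf("f'(x)  ~ %.6f\n", d1_central(fobj, x, ds));
    printf("f''(x) ~ %.6f\n", d2_central(fobj, x, ds));
    return 0;
}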
The directional derivative is the rate of change of a function along a certain direction, and it is a generalization of the concept of the partial derivative. Let s be the direction from x(k) to x(k+1), where s can be expressed as follows:
The gradient function of the objective function at point x(k) is as follows:
The directional derivative can be regarded as the dot product of the gradient vector and the unit direction vector [27], equal to the length of the projection of the vector ∇f(x) onto s.
where cos(∇f(x), s) is the cosine of the angle between ∇f(x) and s, and k is the serial number of the optimization step. The minute step along s is decomposed into minute steps along the N coordinate axes. Then, according to the definition of the partial derivative, (7) can be obtained.
According to the definition of the directional derivative as a rate of change, the numerical directional derivative at x(m), the midpoint of x(k+1) and x(k), is as follows:
If x(k+1) is sufficiently close to x(k), the numerical directional derivative at point x(k) can be expressed approximately by (8). Calculate the objective function value f(m) at point x(m); then the numerical directional derivative at point x(k) is as follows:
A linear equation in the gradient vector can be obtained from (7) and (8). By searching for N points near the current point x(0) and establishing the above equations, a set of N-ary linear equations can be obtained, and the numerical gradient of the objective function can be solved.
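To make this construction concrete, the following C sketch builds one equation of the form s·g = (numerical directional derivative) for each of N probe points near x(0), as in (7) and (8), and solves the resulting linear system for the numerical gradient g by Gaussian elimination. The example objective, the probe directions (taken here along the coordinate axes, which makes the coefficient matrix diagonal, as noted in the following paragraph), and the step ds are illustrative assumptions.

#include <stdio.h>
#include <math.h>

#define N 2

/* Illustrative two-variable objective (not the example used later in the paper). */
static double fobj(const double x[N]) { return 3.0*x[0]*x[0] + x[1]*x[1] + x[0]*x[1]; }

int main(void)
{
    double x0[N] = {1.0, 2.0}, ds = 1e-4;
    double A[N][N], b[N], g[N];

    /* One probe point per equation: direction vector and numerical directional derivative. */
    for (int j = 0; j < N; ++j) {
        double xp[N];
        for (int i = 0; i < N; ++i) xp[i] = x0[i];
        xp[j] += ds;                              /* probe point x(k+1) near x(0) */
        double dist = ds;
        b[j] = (fobj(xp) - fobj(x0)) / dist;      /* numerical directional derivative, cf. (8) */
        for (int i = 0; i < N; ++i) A[j][i] = (xp[i] - x0[i]) / dist;  /* unit direction */
    }

    /* Gaussian elimination with partial pivoting to solve A g = b. */
    for (int k = 0; k < N; ++k) {
        int p = k;
        for (int r = k + 1; r < N; ++r) if (fabs(A[r][k]) > fabs(A[p][k])) p = r;
        for (int c = 0; c < N; ++c) { double t = A[k][c]; A[k][c] = A[p][c]; A[p][c] = t; }
        double t = b[k]; b[k] = b[p]; b[p] = t;
        for (int r = k + 1; r < N; ++r) {
            double m = A[r][k] / A[k][k];
            for (int c = k; c < N; ++c) A[r][c] -= m * A[k][c];
            b[r] -= m * b[k];
        }
    }
    for (int k = N - 1; k >= 0; --k) {
        g[k] = b[k];
        for (int c = k + 1; c < N; ++c) g[k] -= A[k][c] * g[c];
        g[k] /= A[k][k];
    }
    printf("numerical gradient: (%.4f, %.4f)\n", g[0], g[1]);
    return 0;
}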
The numerical negative gradient direction obtained from (10) has some theoretical error because of the approximate directional derivative in (8). According to the quadratic function hypothesis, the linear equations can be obtained from (7) and (9).
According to the above method of taking points, the coefficient matrix of (10) and (11) is a real diagonal matrix; it is non-singular and its rank is N. It is almost impossible for the objective function value at each point to equal that at point x(0), so (10) and (11) have a unique nonzero solution. In the same way, if the first partial derivatives of the objective function are known, N points can be found near the current point x(0) and the above N-ary linear equations can be obtained, so that the second partial derivative matrix of the objective function can be solved by the least squares method.
Fig. 1. The point drawing for obtaining mixed partial derivative by numerical differentiation.
Then the second mixed partial derivative is obtained from the following equation.
According to the quadratic function hypothesis, the relationship between the two coordinates has six undetermined coefficients, and the five points in Fig. 1 cannot determine these coefficients, so one numerical point must be added. If we take point f on the bisector of the two coordinate axes, then according to (2), at the midpoint g of segment cf the partial derivative with respect to xi is as follows:
The numerical second mixed partial derivative is derived from the assumption that the objective function is quadratic, so the results of (13) and (14) should be the same.
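As a hedged illustration, the following C sketch computes a numerical second mixed partial derivative by the common four-point cross difference; for an exactly quadratic objective this reproduces the mixed partial exactly, so under the quadratic hypothesis it yields the same value as (13) and (14), whose exact point layout follows Fig. 1 and is not reproduced here. The objective and step size are placeholders.

#include <stdio.h>

#define N 2

/* Illustrative quadratic objective; its mixed partial d2f/dx1dx2 is exactly 0.5. */
static double fobj(const double x[N]) { return 3.0*x[0]*x[0] + x[1]*x[1] + 0.5*x[0]*x[1]; }

/* Numerical second mixed partial derivative d2f/dxi dxj by the four-point cross difference. */
static double d2_mixed(double (*f)(const double *), const double x[N], int i, int j, double ds)
{
    double a[N], b[N], c[N], d[N];
    for (int k = 0; k < N; ++k) a[k] = b[k] = c[k] = d[k] = x[k];
    a[i] += ds; a[j] += ds;
    b[i] += ds; b[j] -= ds;
    c[i] -= ds; c[j] += ds;
    d[i] -= ds; d[j] -= ds;
    return (f(a) - f(b) - f(c) + f(d)) / (4.0 * ds * ds);
}

int main(void)
{
    double x0[N] = {1.0, 2.0};
    printf("d2f/dx1dx2 ~ %.6f\n", d2_mixed(fobj, x0, 0, 1, 1e-3));  /* exact value 0.5 */
    return 0;
}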
The parameters in (3), (4), (13) and (14) are not tuned to the specific objective function; they are fixed, accurate and unchanged. Since theoretical error analysis is of no value here, there is no need to discuss truncation error and rounding error.
If a local coordinate system is established in the design space, the partial derivative of the objective function can hardly be obtained, but the numerical partial derivative can be obtained using the numerical differentiation algorithm just as in the global coordinate system. After optimizing in a certain direction, it is better to search again in an orthogonal direction. An orthogonal direction vector group is set up whose last direction is the optimization direction. The orthogonal vector group can be regarded as the coordinate vector group of the local coordinate system, and this coordinate vector group forms the coordinate transformation matrix A between the two coordinate systems. If the origin of the local coordinate system is x(0) in the fixed coordinate system, then a point x′ in the local coordinate system is expressed in the global coordinate system as follows:
The vector s′ in the local coordinate system is expressed in the global coordinate system as follows:
Set the previous optimization direction S as the last vector αN. Let the largest element of S (the element with the largest absolute value) be element i; in the N×N identity matrix it corresponds to column vector i. Taking the remaining column vectors of the identity matrix in order as the other vectors, we obtain the following linearly independent vectors
The modified Schmidt orthogonalization method is used to perform the orthogonalization [28], [29]. We use the following definitions and normalize the vectors with the 2-norm
Here, k is an undetermined coefficient. According to the orthogonality between vectors, the following formula can be obtained. We then normalize the vectors with the 2-norm
The square matrix composed of the unit vectors β1, β2, ..., βN is the coordinate transformation matrix of (15) and (16).
The orthogonal vector group can also make S the last coordinate axis of the local coordinate system after the coordinate transformation, so that the last column elements of A are obtained according to (16). Then, according to the type of coordinate transformation, the other elements of A can be obtained.
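The following C sketch shows one way to assemble such a transformation matrix A: the previous direction S and the identity columns selected as described above are orthogonalized by modified Gram-Schmidt, normalized with the 2-norm, and stored as columns of A with the unit vector along S placed last, so that a local point x′ maps to x = x(0) + A x′ as in (15). The ordering details are an assumption of this sketch rather than a verbatim transcription of the formulas in the text.

#include <stdio.h>
#include <math.h>

#define N 3

/* Build an orthonormal transformation matrix A whose last column points along S. */
static void build_A(const double S[N], double A[N][N])
{
    double v[N][N];          /* v[0] = S, v[1..N-1] = selected identity columns */
    int imax = 0;
    for (int i = 1; i < N; ++i) if (fabs(S[i]) > fabs(S[imax])) imax = i;
    for (int i = 0; i < N; ++i) v[0][i] = S[i];
    int r = 1;
    for (int c = 0; c < N; ++c) {
        if (c == imax) continue;                 /* skip the identity column replaced by S */
        for (int i = 0; i < N; ++i) v[r][i] = (i == c) ? 1.0 : 0.0;
        ++r;
    }
    /* Modified Gram-Schmidt with 2-norm normalization. */
    for (int k = 0; k < N; ++k) {
        double nrm = 0.0;
        for (int i = 0; i < N; ++i) nrm += v[k][i] * v[k][i];
        nrm = sqrt(nrm);
        for (int i = 0; i < N; ++i) v[k][i] /= nrm;
        for (int j = k + 1; j < N; ++j) {
            double dot = 0.0;
            for (int i = 0; i < N; ++i) dot += v[j][i] * v[k][i];
            for (int i = 0; i < N; ++i) v[j][i] -= dot * v[k][i];
        }
    }
    /* Store as columns of A; the unit vector along S becomes the last column. */
    for (int i = 0; i < N; ++i) {
        for (int k = 1; k < N; ++k) A[i][k - 1] = v[k][i];
        A[i][N - 1] = v[0][i];
    }
}

int main(void)
{
    double S[N] = {1.0, 2.0, -1.0}, A[N][N];
    build_A(S, A);
    /* A point x' in the local frame maps to x = x(0) + A x' in the fixed frame, cf. (15). */
    for (int i = 0; i < N; ++i)
        printf("%8.4f %8.4f %8.4f\n", A[i][0], A[i][1], A[i][2]);
    return 0;
}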
For a non-quadratic objective function, the optimization direction obtained from the numerical gradient and second partial derivative matrix constructed as in 1), 2) and 3) above differs between different local coordinate systems.
The second approximate direction points directly to the extreme point of a quadratic objective function, so such a numerical direction is a good choice. Another new algorithm is as follows:
In order to make full use of the optimization effect obtained at the detection points, after each detection the best point is taken as the current point x, and the original x is taken as the reserve point x(i) (i is the coordinate axis serial number). If neither probe point is good enough, the step size is halved and probing continues until the step size is less than the termination condition value; the next-best point is then taken as the reserve point x(i). From the current point (the best point), detection then proceeds along the next coordinate axis. After sequential detection along the N coordinate axes, the numerical negative gradient or the numerical second approximation direction is constructed. One optimization step is completed after a one-dimensional optimization, and the process repeats until the current point x is close enough to the reserve point x(i).
Because of the large distance between detection points, the numerical first and second partial derivatives have large errors. Instead of calculating the mixed partial derivatives and performing a one-dimensional optimization in the new direction, the backward half-step searching algorithm [30], based on the blind path-finding idea, is used to find better points according to the current step size. The optimization scheme is shown in Fig. 2.
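A much simplified C sketch of this scheme is given below: it probes ±step along each coordinate axis, keeps the best probe as the new current point, halves the step when nothing improves, and then marches along the resulting numerical negative-gradient-like direction while the objective keeps decreasing. The reserve points, the backward half-step search of [30], and the second-approximation direction are omitted, so this is only an assumed skeleton of Fig. 2, not the full algorithm.

#include <stdio.h>
#include <math.h>

#define N 2

static double fobj(const double x[N])        /* standard 2-D Rosenbrock form, assumed */
{
    double a = x[1] - x[0]*x[0], b = 1.0 - x[0];
    return 100.0*a*a + b*b;
}

int main(void)
{
    double x[N] = {-1.2, 1.0}, step = 0.5, eps = 1e-6;
    while (step > eps) {
        double fx = fobj(x), dir[N] = {0.0};
        int improved = 0;
        /* Probe +/-step along each coordinate axis from the current point. */
        for (int i = 0; i < N; ++i) {
            double xp[N], xm[N];
            for (int k = 0; k < N; ++k) { xp[k] = x[k]; xm[k] = x[k]; }
            xp[i] += step; xm[i] -= step;
            double fp = fobj(xp), fm = fobj(xm);
            dir[i] = -(fp - fm) / (2.0 * step);          /* numerical negative gradient component */
            if (fp < fx)      { x[i] += step; fx = fp; improved = 1; }
            else if (fm < fx) { x[i] -= step; fx = fm; improved = 1; }
        }
        /* March along the numerical direction while it keeps improving. */
        double nrm = 0.0;
        for (int i = 0; i < N; ++i) nrm += dir[i]*dir[i];
        nrm = sqrt(nrm);
        if (nrm > 0.0) {
            double xt[N];
            for (;;) {
                for (int i = 0; i < N; ++i) xt[i] = x[i] + step * dir[i] / nrm;
                if (fobj(xt) >= fx) break;
                for (int i = 0; i < N; ++i) x[i] = xt[i];
                fx = fobj(x);
                improved = 1;
            }
        }
        if (!improved) step *= 0.5;                      /* halve the step, cf. the text */
    }
    printf("x* ~ (%.4f, %.4f), f* ~ %.3e\n", x[0], x[1], fobj(x));
    return 0;
}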
Because of the continuity of the feasible region, the initial point can be chosen arbitrarily. If the step size is too large, the optimization direction may deviate too much from the negative gradient direction and the optimization effect would not be good; if the step size is too small, it causes a sawtooth phenomenon because it is too close to the negative gradient direction. Therefore, the initial step size should be determined according to the specific optimization problem, and a better value can be determined by trial calculation.
A. The Two-Dimensional Example
The two-dimensional Rosenbrock function is a quartic function, and it is one of the most difficult examples with which to test an optimization algorithm. Consider the unconstrained optimization problem whose objective function is as follows:
The gradient of the objective function is as follows:
The second partial derivative matrix of the objective function is as follows:
The objective function consists of two components, each of which has a minimum value of zero. A system of two quadratic equations in two variables can be obtained, and the extreme points and extreme values can be obtained as follows:
Fig. 2. The program diagram of large step numerical algorithm based on coordinate transformation.
Take the initial point and the micro step length as follows:
The objective function value, numerical gradient and second partial derivative matrix at this point are as follows:
The relative errors between the numerical solutions obtained from (2), (4), (13), and (14) and the analytical solutions are all less than 10−5.
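This check can be reproduced in a few lines of C. The sketch below assumes the standard two-dimensional Rosenbrock form f = 100(x2 − x1²)² + (1 − x1)² and an illustrative initial point and micro step (the paper's exact values are not reproduced in this extract), and compares the analytic gradient and second partial derivative matrix with the central-difference and cross-difference values.

#include <stdio.h>
#include <math.h>

/* Standard 2-D Rosenbrock form (assumed). */
static double f(double x1, double x2)
{
    double a = x2 - x1*x1, b = 1.0 - x1;
    return 100.0*a*a + b*b;
}

int main(void)
{
    double x1 = -1.2, x2 = 1.0, h = 1e-4;   /* illustrative initial point and micro step */

    /* Analytic gradient and second partial derivative matrix of the standard form. */
    double g1 = -400.0*x1*(x2 - x1*x1) - 2.0*(1.0 - x1);
    double g2 =  200.0*(x2 - x1*x1);
    double h11 = 1200.0*x1*x1 - 400.0*x2 + 2.0, h12 = -400.0*x1, h22 = 200.0;

    /* Numerical values from the central differences (2)-(4) and the cross difference. */
    double n1  = (f(x1+h, x2) - f(x1-h, x2)) / (2.0*h);
    double n2  = (f(x1, x2+h) - f(x1, x2-h)) / (2.0*h);
    double n11 = (f(x1+h, x2) - 2.0*f(x1, x2) + f(x1-h, x2)) / (h*h);
    double n22 = (f(x1, x2+h) - 2.0*f(x1, x2) + f(x1, x2-h)) / (h*h);
    double n12 = (f(x1+h, x2+h) - f(x1+h, x2-h) - f(x1-h, x2+h) + f(x1-h, x2-h)) / (4.0*h*h);

    printf("gradient rel. err.: %.2e %.2e\n", fabs((n1-g1)/g1), fabs((n2-g2)/g2));
    printf("Hessian  rel. err.: %.2e %.2e %.2e\n",
           fabs((n11-h11)/h11), fabs((n12-h12)/h12), fabs((n22-h22)/h22));
    return 0;
}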
In this paper, the numerical second approximate direction algorithm and the large-step numerical algorithm based on coordinate transformation are implemented in the C language. The blind searching method [31] is used for the one-dimensional optimization.
B. The Test of Numerical Second Approximate Direction Method
The optimal direction is obtained by constructing the numerical diagonal second partial derivative matrix from (4). After one-dimensional optimization is performed 3955 times, the objective function is calculated 156 517 times. The optimal point and optimal solution are obtained as follows:
The optimization process is shown in Fig. 3, solid line 1.
Fig. 3. The seeking process of the numerical second-order approximate direction method.
The optimal direction is obtained by constructing the numerical symmetric second partial derivative matrix from (4), (13), and (14). After one-dimensional optimization is performed 11 times, the objective function is calculated 445 times, and the optimal point and solution are obtained as follows:
With the numerical second mixed partial derivatives, the optimization computation is 0.2843% of that without them. The optimization process is shown by the gem-shaped points and dashed line 2 in Fig. 3. The result is similar to that of the analytical second approximate direction method.
A local coordinate system is established for each optimization step. After one-dimensional optimization is performed 244 times, the objective function is called 9884 times, and the optimal point and solution are obtained as follows:
Its optimization process is shown in Fig. 3, dotted line 3. The coordinate transformation is not suitable here.
C. The Test of the Constructing Search Direction in Local Coordinate System
In order to test the influence of the coordinate transformation on the numerical optimization direction, a local coordinate system is established with x(0) as the origin, and the local coordinate system is rotated counterclockwise through 360°.
The numerical directional derivative in the fixed coordinate system is used as the numerical partial derivative in the local coordinate system to construct the optimization direction. During the above rotation, the trajectory of the numerical optimization direction in the local coordinate system is shown by the plus points and dashed line 1 in Fig. 4. It is not evenly distributed, indicating that it is not a fixed direction. In the fixed coordinate system, the trajectory is also not a fixed direction, but varies within a range of 103.05°, as shown by the asterisk points and dotted line 2 in Fig. 4. The calculated results are consistent with the analyses in Section 5.
Fig. 4. Numerical direction trajectories in the rotation process of local coordinate system.
D. Test of the Large Step Numerical Optimization Algorithm
The initial step size, the minimum initial step length, and the termination condition value are as follows:
In the first optimization round, the following points are detected along the N coordinate axes:
According to (27), the analytic gradient of the objective function at this point is as follows:
The numerical gradient obtained by (9) is as follows:
The difference between them is quite large. Then we obtain the N (here N = 2) midpoints, and the numerical gradient obtained by (11) is as follows:
Using (9), after 1437 detection rounds, the objective function is called 89 155 times, and the optimal point and solutions are obtained as follows:
The optimization process is shown in Fig. 5, dotted line 1. Adopting (11), the objective function must be called 87 065 times in 1726 detection rounds.
Fig. 5. The search process of the large step numerical algorithm based on coordinate transformation.
Then the numerical optimization direction of (4) and (21) is as follows:
After 1430 detection rounds, we call the objective function 70 621 times to get the optimal point and solution.
The optimization process is shown in Fig. 5, solid line 2.
The numerical gradient is obtained by means of the coordinate transformation of Section 5 and (9). After 144 detection rounds, we call the objective function 4717 times to obtain the optimal point and solution
The optimization process is shown by dashed line 3 in Fig. 5. Compared with the black dotted line, the computation is reduced by 94.71%. Due to the large step size, the numerical negative gradient direction obtained by (9) has a large error. If the numerical direction is rotated by 90°, then after 243 detection rounds the objective function is called 7872 times, and the optimal point and solution are obtained as follows:
The computation is not large, which is consistent with the analysis in Section 7.3. In the approach process, the extreme point is skipped over several times rather than being approached step by step. When the numerical negative gradient direction is constructed by (11), compared with dotted line 1, the optimal solution is 2.4764×10−15 after 14 975 detection rounds and 944 941 calls to the objective function. There is a very serious zigzag phenomenon because the direction is too close to the analytic negative gradient direction.
E. The Three-Dimensional Example
The three-dimensional Rosenbrock function is one of the most difficult multi-dimensional examples with which to test an optimization algorithm. Consider the unconstrained optimization problem whose objective function is as follows:
The gradient of the objective function is as follows:
The second partial derivative matrix of the objective function is as follows:
The objective function consists of four components, each of which has a minimum value of zero. A system of three quadratic equations in three variables can be obtained, and the extreme points and extreme values can be obtained as follows. We take the initial point as follows:
Adopting the analytic second-order approximation direction, after 16 rounds of one-dimensional optimization, the objective function is calculated 614 times, and the optimal point and solution are obtained as follows:
Adopting the numerical second-order approximation direction without calculating the numerical mixed partial derivatives, after 446 rounds of one-dimensional optimization, the objective function is calculated 20 038 times, and the optimal point and solution are obtained as follows:
Using (13) or (14) to calculate the numerical mixed partial derivatives, after 16 rounds of one-dimensional optimization, the objective function is calculated 614 times, and the optimal point and solution are obtained as follows:
Taking more than one point for each mixed partial derivative, which is then computed using (13) or (14), after 16 rounds of one-dimensional optimization the objective function is calculated 660 times, and the optimal point and solution are obtained as follows:
The mathematical deduction in this paper is concise and easy to understand. A lot of tedious deduction processes are omitted. The key derivation steps and principles are explained in the text and it is similar to many classical numerical differentiation formulas (difference formula).
The purpose of this paper is to broaden the scope of existing optimization methods, so what should be tested is the closeness between the numerical and analytical solutions. If the two are close enough, the optimization effect of the original optimization method is not affected. The Rosenbrock function is quartic, with a curved canyon. At the bottom of the canyon the contours of the objective function are nearly parallel, and the function shows no quadratic characteristics at all. The optimization methods with a definite optimization direction are all based on the quadratic objective function hypothesis, and some optimization algorithms cannot obtain an effective direction at the bottom of the valley. Therefore, this example is one of the most difficult examples with which to test an optimization algorithm. The successful verification of the numerical example is sufficient to prove the optimization effect of the numerical optimization direction.
Conjugate direction methods need a large enough dimension N to expand in the optimization space, and examples with more than three dimensions are needed to verify their optimization effect. The research in this paper does not involve the verification of conjugate directions. Two-dimensional examples can not only illustrate the problem, but also visually represent the optimization process. Even if the numerical algorithm of the conjugate direction were studied, a three-dimensional numerical example would not be necessary, because if the analytical direction is not good, the numerical direction will not be good either.
For multimodal objective functions, different initial points may have different optimization results. For a single peak objective function, the optimization algorithm with a certain direction will not yield different optimal points because of the different initial points. The initial point of the example in this paper is taken on a flat slope on the side of the canyon, which is far from the extreme value point. It not only checks the speed of searching to the canyon bottom, but also checks the effect of approaching the extreme value point at the canyon bottom.
In this paper, we compare different algorithms using the amount of computation and the optimization results, and we show that it is not necessary to use an elegant intelligent algorithm. This algorithm is not comparable with evolutionary or swarm intelligence optimization algorithms. The latter are optimization methods based on other theories and do not require gradient information (as is also true of the Powell method [32]); however, they have no concept of an optimization direction: they seek order in disorder and explore necessity in randomness. Such algorithms have obvious advantages for multi-modal feasible regions, but for single-modal feasible regions they are not as effective as algorithms with a definite direction. The numerical direction constructed by the numerical method is not more accurate than that obtained by the analytical method, and both are suitable for a single-peak feasible region.
Through the verification of several numerical algorithms with two-dimensional Rosenbrock objective function optimization, the following conclusions can be drawn:
1) The numerical formulas for calculating the first, second and second mixed partial derivatives have high accuracy. The numerical first partial derivatives agree with the results obtained by constructing Lagrange interpolation functions; thus, these numerical formulas are correct. Because it is difficult to obtain second partial derivatives, optimization methods that only need the gradient of the objective function have been explored, and this paper continues the research in this area. At present, some research directions involve using the numerical direction to improve the accuracy of optimization; the research based on the quadratic hypothesis also extends this line of research.
2) The numerical second approximate direction method with numerical second mixed partial derivatives is the best.
3) The optimization direction obtained in a local coordinate system has strong randomness, which the large-step numerical algorithm requires.
4) Based on the quadratic hypothesis and the fitting method, the numerical algorithms proposed in this paper allow the derivatives, gradients, second partial derivatives and second mixed partial derivatives of the objective function to remain unavailable analytically, which makes the optimization methods that require this information more widely applicable. In many engineering fields the objective functions of optimization problems are not differentiable; this study provides more optimization methods for solving such problems.
5) Because constrained optimization problems are often encountered in control processes, future research on constrained optimization methods may become the focus of our research.