Unconstrained Optimization


The unconstrained optimization problem is central to the development of optimization software. Constrained optimization algorithms are often extensions of unconstrained algorithms, while nonlinear least squares and nonlinear equation algorithms tend to be specializations. In the unconstrained optimization problem, we seek a local minimizer of a real-valued function, f(x), where x is a vector of real variables. In other words, we seek a vector, x*, such that
f(x*) <= f(x) for all x close to x*.

Global optimization algorithms try to find an x* that minimizes f over all possible vectors x. This is a much harder problem to solve. We do not discuss it here because, at present, no efficient algorithm is known for performing this task. For many applications, local minima are good enough, particularly when the user can draw on his/her own experience and provide a good starting point for the algorithm.

Newton's method gives rise to a wide and important class of algorithms that require computation of the gradient vector

and the Hessian matrix,

Although the computation or approximation of the Hessian can be a time-consuming operation, there are many problems for which this computation is justified. We describe algorithms in which the user supplies the Hessian explicitly before moving on to a discussion of algorithms that don't require the Hessian.

Newton's method forms a quadratic model of the objective function around the current iterate . The model function is defined by

.

In the basic Newton method, the next iterate is obtained from the minimizer of . When the Hessian matrix, , is positive definite, the quadratic model has a unique minimizer that can be obtained by solving the symmetric linear system:

The next iterate is then

Convergence is guaranteed if the starting point is sufficiently close to a local minimizer x* at which the Hessian is positive definite. Moreover, the rate of convergence is quadratic, that is,

for some positive constant .

In most circumstances, however, the basic Newton method has to be modified to achieve convergence.

Versions of Newton's method are implemented in the following software packages:

BTN, GAUSS, IMSL, LANCELOT, NAG, OPTIMA, PORT 3, PROC NLP, TENMIN, TN, TNPACK, UNCMIN, and VE08.

The NEOS Server also has an unconstrained minimization facility to solve these problems remotely over the Internet.

These codes obtain convergence when the starting point is not close to a minimizer by using either a line-search or a trust-region approach.

The line-search variant modifies the search direction to obtain another a downhill, or descent direction for f. It then tries different step lengths along this direction until it finds a step that not only decreases f, but also achieves at least a small fraction of this direction's potential.

The trust-region variant uses the original quadratic model function, but they constrain the new iterate to stay in a local neighborhood of the current iterate. To find the step, then, we have to minimize the quadratic subject to staying in this neighborhood, which is generally ellipsoidal in shape.

Line-search and trust-region techniques are suitable if the number of variables is not too large, because the cost per iteration is of order . Codes for problems with a large number of variables tend to use truncated Newton methods , which usually settle for an approximate minimizer of the quadratic model.

So far, we have assumed that the Hessian matrix is available, but the algorithms are unchanged if the Hessian matrix is replaced by a reasonable approximation. Two kinds of methods use approximate Hessians in place of the real thing:

The first possibility is to use difference approximations to the exact Hessian. We exploit the fact that each column of the Hessian can be approximated by taking the difference between two instances of the gradient vector evaluated at two nearby points. For sparse Hessians, we can often approximate many columns of the Hessian with a single gradient evaluation by choosing the evaluation points judiciously.

Quasi-Newton Methods build up an approximation to the Hessian by keeping track of the gradient differences along each step taken by the algorithm. Various conditions are imposed on the approximate Hessian. For example, its behavior along the step just taken is forced to mimic the behavior of the exact Hessian, and it is usually kept positive definite.

Finally, we mention two other approaches for unconstrained problems that are not so closely related to Newton's method:

Nonlinear conjugate gradient methods are motivated by the success of the linear conjugate gradient method in minimizing quadratic functions with positive definite Hessians. They use search directions that combine the negative gradient direction with another direction, chosen so that the search will take place along a direction not previously explored by the algorithm. At least, this property holds for the quadratic case, for which the minimizer is found exactly within just n iterations. For nonlinear problems, performace is problematic, but these methods do have the advantage that they require only gradient evaluations and do not use much storage.

The nonlinear Simplex method (not to be confused with the simplex method for linear programming) requires neither gradient nor Hessian evaluations. Instead, it performs a pattern search based only on function values. Because it makes little use of information about f, it typically requires a great many iterations to find a solution that is even in the ballpark. It can be useful when f is nonsmooth or when derivatives are impossible to find, but it is unfortunately often used when one of the algorithms above would be more appropriate.


Up To:

* Continuous Optimization.

Down To:

* Nonlinear Least Squares

* Nonlinear Equations

* Global Optimization

* Nondifferentiable Optimization


treesig.gif (5961 bytes)

[ OTC Home Page | NEOS Guide | NEOS Server | Optimization Tree ]


Updated 28 March 1996