Quasi-Newton or variable metric methods can be used when the Hessian
matrix is difficult or time-consuming to evaluate. Instead of obtaining an estimate of the
Hessian matrix at a single point, these methods gradually build up an approximate Hessian
matrix by using gradient information from some or all of the previous iterates
visited by the algorithm. Given the
current iterate
, and the approximate Hessian matrix
at
, the linear system
is solved to generate a direction
. The next iterate is then found by performing a line
search along
and setting
. The
question is then: How can we use the function and gradient information from points
and
to improve the quality of the
approximate Hessian matrix
? In other words, how do we obtain the new approximate Hessian
matrix
from the previous
approximation
?
The key to this question depends on what is sometimes called the fundamental theorem of integral calculus. If we define
then this theorem implies that
The matrix in braces can be interpreted as the average of the Hessian matrix on the
line segment
. This result states
that when this matrix is multiplied by the vector
, the resulting vector is
. In view of these observations, we can make
mimic the behavior of
by enforcing the quasi-Newton
condition
![]()
This condition can be satisfied by making a simple low-rank update to
. The most
commonly used family of updates is the Broyden class of rank-two updates, which have the
form
where
and
The choice
gives the
Broyden-Fletcher-Goldfarb-Shanno update, which practical experience, and some theoretical
analysis, has shown to be the method of choice in most circumstances. The
Davidon-Fletcher-Powell update, which was proposed earlier, is obtained by setting
. These two update formulae are known
universally by their initials BFGS and DFP, respectively.
Updates in the Broyden class remain positive definite as long as
. Although the previous condition holds
automatically if
is strictly
convex, it can be enforced for all functions by requiring that
satisfy the curvature condition. Some codes avoid
enforcing the curvature condition by skipping the update if
.
GAUSS , IMSL , MATLAB , NAG(FORTRAN) , NAG(C) , OPTIMA , and PROC NLP implement quasi-Newton
methods. These codes differ in the choice of update (usually BFGS), line-search procedure,
and the way in which
is stored and updated. We can update
by either updating the Cholesky
decomposition of
or by updating the inverse of
. In either case, the cost of
updating the search direction by solving the system
is on the order of
operations. Updating the Cholesky factorization is
widely regarded as more reliable, while updating the inverse of
is less complicated. Indeed, if
we define
![]()
then a BFGS update of
is equivalent to the following update of
:
When we store
explicitly, the direction
is obtained from the
matrix-vector product
.
The availability of quasi-Newton methods renders steepest-descent methods obsolete. Both types of algorithms require only first derivatives, and both require a line search. The quasi-Newton algorithms require slightly more operations to calculate an iterate and somewhat more storage, but in almost all cases, these additional costs are outweighed by the advantage of superior convergence.
At first glance, quasi-Newton methods may seem unsuitable for large
problems because the approximate Hessian matrices and inverse Hessian matrices are
generally dense. This is not the case: The explicit storage of
or
as
matrices is not necessary. For example, the above
expression for the BFGS update of
makes it clear that we can compute
![]()
if we know the initial matrix
; the subsequent update vectors
; and their inner products
for
. If
is chosen to be a diagonal
matrix, the necessary information can be stored in about
words of memory. Limited-memory quasi-Newton
methods make use of these ideas to cut down on storage for large problems. They store only
the
and
vectors from the previous few iterates (typically, five)
and compute the vector
by a recursion that requires roughly 16
operations. The LBFGS code is an implementation of the
limited-memory BFGS algorithms. The codes M1QN2 and M1QN3 are the same as LBFGS, except
that they allow the user to specify a preconditioning technique.
Up To:
![]()
[ OTC Home Page | NEOS Guide | NEOS Server | Optimization Tree ]
Updated 28 March 1996