codegen[GRADIENT] - compute the Gradient of a Maple procedure
Calling Sequence

GRADIENT(F)
GRADIENT(F, X)
Parameters
F  -  Maple procedure
X  -  list of symbols (parameters of F)

Description
• The first argument F is a Maple procedure that computes a function of x1,x2,...,xn. The GRADIENT command outputs a new procedure G which, when executed at given values of x1,x2,...,xn, returns the vector of partial derivatives of F w.r.t. x1,...,xn at those values. This is known as automatic differentiation. It often leads to a more efficient computation than symbolic differentiation, that is, what you would obtain from linalg[grad]. For example, given

F := proc(x,y) local t;
    t := exp(-x);
    y*t+t
end proc;

The output of G := GRADIENT(F); is the Maple procedure
G := proc(x, y) local df, t;
    t := exp(-x);
    df := array(1 .. 1);
    df[1] := y + 1;
    return -df[1]*exp(-x), t
end proc

When G is called with values for x and y, the output is the vector of partial derivatives of F evaluated at those values. G can subsequently be optimized with optimize(G) so that the exponential is computed only once, yielding
proc(x, y)
local df, t;
    t := exp(-x);
    df := array(1 .. 1);
    df[1] := y + 1;
    return -df[1]*t, t
end proc

Derivatives can be obtained for any function that can be represented as a computation built by composing arithmetic operations and mathematical functions, including procedures that contain loops and make subroutine calls.
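For instance, a minimal sketch of a procedure containing a loop (the names F1 and G1 are illustrative; whether the reverse mode applies here is subject to the limitations noted below):

F1 := proc(x) local s, i;
    s := 1;
    for i to 3 do s := s*x + 1 end do;    # computes x^3 + x^2 + x + 1
    s
end proc:
G1 := GRADIENT(F1);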
The remaining arguments to GRADIENT are optional; they are described below. There are a number of limitations on the Maple procedure F that the GRADIENT command can handle; these are also discussed below.

• By default, GRADIENT computes the partial derivatives of F w.r.t. all formal parameters of F. The optional argument X, a list of symbols, specifies the formal parameters w.r.t. which the derivatives are taken. For example, the call GRADIENT(F,[x]); computes the partial derivative of F with respect to the formal parameter x only. The result is

proc(x, y)
local df, t;
    t := exp(-x);
    df := array(1 .. 1);
    df[1] := y + 1;
    return -df[1]*exp(-x)
end proc

The procedure F may not refer to its parameters using args[i]. For example, you may not define F as follows:
F := proc() local t;
    t := exp(-args[1]);
    args[2]*t+t
end proc;

Nor is it possible to differentiate w.r.t. an array of parameters. For example, you cannot presently define F as
F := proc(x::array(1..2)) local t;
    t := exp(-x[1]);
    x[2]*t+t
end proc;

and compute the gradient w.r.t. the array x.
• Two algorithms are supported: the so-called forward and reverse modes. By default, GRADIENT tries to use the reverse mode, since it usually leads to more efficient code. If it is unable to use the reverse mode, the forward mode is used. The user may specify which algorithm is used by giving the optional argument mode=forward or mode=reverse.
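For example, with the procedure F defined above, a mode can be requested explicitly (a sketch; the names Gf and Gr are illustrative):

Gf := GRADIENT(F, mode=forward);    # force the forward mode
Gr := GRADIENT(F, mode=reverse);    # request the reverse mode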
• The advantage of the reverse mode is that the cost of computing the gradient is usually lower than with the forward mode, especially when the number of partial derivatives n is large. It is also cheaper than symbolic differentiation (Maple's diff command) and than divided differences. Specifically, if the cost of computing F is m arithmetic operations, then the cost of computing the gradient of F is O(m n) for the forward mode and O(m+n) for the reverse mode. However, the best result is usually obtained by pre-optimizing the input procedure F and post-optimizing the output procedure of GRADIENT. Also, before applying GRADIENT it is best to split up long products in F; this can be done with the codegen[split] command.
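A sketch of this pipeline, assuming the codegen package has been loaded with with(codegen) so that the short names are available; the ordering of the steps is one reasonable choice:

F := split(F):        # break long products into binary operations
F := optimize(F):     # pre-optimize the input procedure
G := GRADIENT(F):     # generate the gradient procedure
G := optimize(G):     # post-optimize the generated code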
• The vector of partial derivatives is, by default, returned as a sequence. The optional argument result_type=list, result_type=array, or result_type=seq specifies that the vector of derivatives returned by G is to be a Maple list, array, or sequence, respectively. For example, the call GRADIENT(F,result_type=array); yields

proc(x, y)
local df, grd, t;
    t := exp(-x);
    df := array(1 .. 1);
    df[1] := y + 1;
    grd := array(1 .. 2);
    grd[1] := -df[1]*exp(-x);
    grd[2] := t;
    return grd
end proc

• The optional argument function_value=true causes GRADIENT to compute the function value at the same time as the gradient. The function value is returned as the first entry of the vector. For example, the call GRADIENT(F,function_value=true); yields

proc(x, y) local df, t;
    t := exp(-x);
    df := array(1 .. 1);
    df[1] := y + 1;
    return y*t + t, -df[1]*exp(-x), t
end proc

• See codegen[HESSIAN] and codegen[JACOBIAN] for routines that compute Hessians and Jacobians using automatic differentiation of functions represented by Maple procedures. The derivative operator D in Maple can also be used to compute the derivative of a Maple procedure in one variable.
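For instance, a minimal sketch of D applied to a one-variable operator (the operator p is purely illustrative):

p := x -> x*exp(-x):
dp := D(p);    # dp(x) computes exp(-x) - x*exp(-x)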
• The command with(codegen,GRADIENT) allows the use of the abbreviated form of this command.
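For example, after loading the short form, GRADIENT can be called directly (a sketch, reusing the procedure F from above):

with(codegen, GRADIENT):
G := GRADIENT(F);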
Examples
> F := proc(x,y) local t; t := x*y; x+t-y*t; end proc;
This next example illustrates pre- and post-optimization. The gradient is computed with respect to phi and omega only. Since the torus procedure returns three values, the Gradient is really the Jacobian matrix of partial derivatives.
> torus := proc(phi,omega,R,r) local x,y,z;
      x := cos(phi)*(R+r*cos(omega));
      y := sin(phi)*(R+r*cos(omega));
      z := r*sin(omega);
      [x,y,z]
  end proc:
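A sketch of one plausible sequence of calls for this example (the name J is illustrative):

> torus := optimize(torus):
> J := GRADIENT(torus, [phi,omega]):
> J := optimize(J):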
We redo the same example but this time using the makeproc command to first build the torus procedure, returning an array instead of a list.
> torus := optimize( makeproc(A,[phi,omega,R,r]) );
This example shows that local procedures are handled.
> f := proc(x) local s, t;
      s := proc(x) local e; e := exp(x); (e-1/e)/2 end proc;
      t := s(x^2);
      1-x*t+x^2*t;
  end proc:
This final example illustrates the need for breaking up large products.
> F := proc(u,v,w,x,y,z) u*v*w*x*y*z end proc;
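A sketch of one way to proceed, assuming the codegen package has been loaded so that split, GRADIENT, optimize, and cost are available by their short names:

> F := split(F):       # rewrite the long product as binary products
> G := GRADIENT(F):
> G := optimize(G):
> cost(G);             # compare the operation count of the result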