侧边栏壁纸
博主头像
qiql博主等级

水能载舟,亦可赛艇

  • 累计撰写 31 篇文章
  • 累计创建 28 个标签
  • 累计收到 4 条评论

目 录CONTENT

文章目录

🌈有限元分析学习笔记(二):最速下降法与共轭梯度法

qiql
2022-07-14 / 0 评论 / 1 点赞 / 452 阅读 / 9,788 字

最速下降法

定理 1

考虑线性方程组:

Ax=b,  A∈Rn×n,  A对称正定\bold A \bold x = \bold b,\;A \in \bold R^{n\times n},\;A对称正定

和二次泛函

φ(x)=xTAx−2bTx\varphi(\bold x) = \bold x^T\bold A \bold x-2\bold b^T\bold x

有定理 1:设 A\bold A 对称正定,求方程组 Ax=b\bold A \bold x = \bold b 的解等价与求二次泛函 φ(x)\varphi (x) 的极小值点。

即求 x∗∈Rn\bold x^* \in \bold R^n 使得 φ(x∗)=min⁡x∈Rφ(x)\varphi(\bold x^*) = \mathop{\min}\limits_{x \in R} \varphi(\bold x)

证明:

必要性

设 x∗\bold x^* 是 φ(x∗)\varphi(\bold x^*) 在RnR^n 上的极小值点,则 grad(φ(x∗))=∇φ(x)=0grad(\varphi (\bold x^*)) = \nabla \varphi (\bold x)=0

grad(φ(x∗))=(∂φ∂x1∂φ∂x2⋮∂φ∂xn)grad(\varphi(\bold x^*)) = \begin{pmatrix} \frac{\partial \varphi}{\partial x_1}\\ \frac{\partial \varphi}{\partial x_2}\\ \vdots\\ \frac{\partial \varphi}{\partial x_n}\\ \end{pmatrix}

下面开始求 φ(x)\varphi(x) :

φ(x)=xTAx−2bTx=[x1,  x2,  ⋯ ,  xn][a11a12⋯a1na21a22⋯a2n⋮⋮⋮an1an2⋯ann][x1x2⋮xn]−2[b1,  b2,  ⋯ ,  bn][x1x2⋮xn]=[∑i=1nai1xi,  ∑i=2nai2xi,  ⋯ ,  ∑i=nnainxi][x1x2⋮xn]−2∑i=1nbixi=∑i=1nai1xix1+∑i=1nai2xix2+  ⋯+∑i=1nainxixn−2∑i=1nbixi=∑j=1n∑i=1naijxixj−2∑i=1nbixi\begin{aligned} \varphi(\bold x) & = \bold x^T\bold A \bold x -2\bold b^T \bold x \\&= \begin{bmatrix} x_1,\;x_2,\;\cdots,\;x_n \end{bmatrix} \begin{bmatrix} a_{11}&a_{12}&\cdots&a_{1n}\\ a_{21}&a_{22}&\cdots&a_{2n}\\ \vdots&\vdots&&\vdots\\ a_{n1}&a_{n2}&\cdots&a_{nn} \end{bmatrix} \begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix} -2 \begin{bmatrix} b_1,\;b_2,\;\cdots,\;b_n \end{bmatrix} \begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix}\\ & = \begin{bmatrix} \sum\limits_{i=1}^na_{i1}x_i,\;\sum\limits_{i=2}^na_{i2}x_i,\;\cdots,\;\sum\limits_{i=n}^na_{in}x_i \end{bmatrix} \begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix} -2\sum\limits_{i=1}^nb_ix_i\\ & = \sum\limits_{i=1}^na_{i1}x_ix_1+\sum\limits_{i=1}^na_{i2}x_ix_2+\;\cdots+\sum\limits_{i=1}^na_{in}x_ix_n-2\sum\limits_{i=1}^nb_ix_i\\ & = \sum\limits_{j=1}^n\sum\limits_{i=1}^na_{ij}x_ix_j-2\sum\limits_{i=1}^nb_ix_i\\ \end{aligned}

求偏导:

∂φ(x)∂xk=ak1x1+ak2x2+ak,k−1xk−1+∂φ(xk)xk+ak,k+1xk+1+⋯+ak,nxn−2bk\frac{\partial \varphi(\bold x)}{\partial x_k}=a_{k1}x_1 + a_{k2}x_2 + a_{k,k-1}x_{k-1} + \frac{\partial \varphi(x_k)}{x_k} +a_{k,k+1}x_{k+1} + \cdots + a_{k,n}x_n - 2b_k

上式中,把 ∂φ(xk)xk\frac{\partial \varphi(x_k)}{x_k} 单独拎出来计算:

∂φ(xk)xk=∂(∑i=1naikxixk)xk=∑i=1,i≠knaikxi+2akkxk\frac{\partial \varphi(x_k)}{x_k} = \frac{\partial\left(\sum\limits_{i=1}^na_{ik}x_ix_k\right)}{x_k}= \sum\limits_{i=1,i\neq k}^na_{ik}x_i + 2a_{kk}x_k

又由于 A\bold A 矩阵是对称的,即 A=AT,  aki=aik\bold A = \bold A^T ,\; a_{ki} = a_{ik}
整理一下,得到:

∂φ(x)∂xK=2∑i=1nakixi−2bk\frac{\partial \varphi(\bold x)}{\partial x_K}=2\sum\limits_{i=1}^na_{ki}x_i - 2b_k

那么,

grad(φ(x∗))=(∂φ∂x1∂φ∂x2⋮∂φ∂xn)=2(∑i=1na1ixi∑i=1na2ixi⋮∑i=1nanixi)−2(b1b2⋮bn)=2(Ax−b)=−2r  ,  (r=b−Ax)grad(\varphi(\bold x^*)) = \begin{pmatrix} \frac{\partial \varphi}{\partial x_1}\\ \frac{\partial \varphi}{\partial x_2}\\ \vdots\\ \frac{\partial \varphi}{\partial x_n}\\ \end{pmatrix} = 2\begin{pmatrix} \sum\limits_{i=1}^na_{1i}x_i\\ \sum\limits_{i=1}^na_{2i}x_i\\ \vdots\\ \sum\limits_{i=1}^na_{ni}x_i\\ \end{pmatrix} -2 \begin{pmatrix} b_1\\ b_2\\ \vdots\\ b_n\\ \end{pmatrix} =2(\bold A\bold x - \bold b) = -2\bold r\;, \; (r = \bold b - \bold A\bold x)

所以,如果 Ax∗=b\bold A\bold x^* = \bold b ,那么,x∗\bold x^* 就是 Ax=b\bold A \bold x = \bold b 的解

充分性

若 Ax∗=b\bold A\bold x^* = \bold b ,则 ∀y∈Rn\forall \bold y\in \bold R^n ,有

φ(x∗+y)=(x∗+y)TA(x∗+y)−2bT(x∗+y)=x∗TAx∗+x∗TAy+yTAx∗+yTAy−2bTx∗−2bTy=x∗TAx∗+bTy+yTb+yTAy−2bTx∗−2bTy            (AT=A,  Ax∗=b,所以x∗TAy=x∗TATy=Ax∗T=bTy)=x∗TAx∗+yTAy−2bTx∗            (bTy和yTb是相等的,都是数字,所以可以跟后面的−2bTy  消掉)=φ(x∗)+yTAy⏟一定为非负数≥φ(x∗)\begin{aligned} \varphi(\bold x^*+\bold y) &=(\bold x^*+\bold y)^T\bold A (\bold x^*+\bold y)-2\bold b^T(\bold x^*+\bold y)\\ & = \bold {x^*}^T\bold A \bold x^* + \bold {x^*}^T\bold A\bold y + \bold y^T\bold A \bold x^*+\bold y^T\bold A\bold y-2\bold b^T\bold x^* - 2\bold b^T\bold y\\ & = \bold {x^*}^T\bold A \bold x^* + \bold b^T\bold y + \bold y^T\bold b+\bold y^T\bold A\bold y-2\bold b^T\bold x^* - 2\bold b^T\bold y\;\; \\&\;\;\;\;(\bold A^T = \bold A, \; \bold A\bold x^* = \bold b,所以\bold {x^*}^T\bold A\bold y = \bold {x^*}^T\bold A^T\bold y = {\bold A\bold x^*}^T\bold =\bold b^T\bold y)\\ & = \bold {x^*}^T\bold A \bold x^* + \bold y^T\bold A\bold y-2\bold b^T\bold x^*\;\; \\&\;\;\;\;(\bold b^T\bold y和\bold y^T\bold b是相等的,都是数字,所以可以跟后面的- 2\bold b^T\bold y \;消掉)\\ & = \varphi(\bold x^*) \underbrace{ + \bold y^T\bold A\bold y }_{一定为非负数} \geq \varphi(\bold x^*) \end{aligned}

∴x∗\therefore x^* 使得 φ(x)\varphi(\bold x) 达到最小

证毕

计算思路

通过前面的定理,那么用最速下降法求方程组就相当于求二次泛函的极小值点,求这个极小值点的方式类似于盲人下山

(1)对给定的初始向量 x0\bold x_0 ,确定一个下山方向 P0\bold P_0 ,沿着直线 x=x0+αP0\bold x = \bold x_0 + \alpha \bold P_0 寻找点 x1=x0+α0P0\bold x_1 = \bold x_0 + \alpha_0 \bold P_0 使得 φ(x0+α0P0)≤φ(x0+αP0),  α∈R\varphi(\bold x_0+\alpha_0\bold P_0) \leq \varphi(\bold x_0+\alpha \bold P_0),\;\alpha \in \bold R
(2)类似地,找点 x2=x1+α1P1\bold x_2 = \bold x_1+\alpha_1 \bold P_1,使得 φ(x2)=φ(x1+α1P1)≤φ(x1+αP1),  α∈R\varphi(\bold x_2)=\varphi(\bold x_1+\alpha_1\bold P_1) \leq \varphi(\bold x_1+\alpha \bold P_1),\;\alpha \in \bold R
     …
(k)找点 xk=xk−1+αk−1Pk−1\bold x_k = \bold x_{k-1} + \alpha_{k-1}\bold P_{k-1} ,使得 φ(xk)=φ(xk−1+αk−1Pk−1)≤φ(xk−1+αPk−1),  α∈R\varphi(\bold x_k)=\varphi(\bold x_{k-1}+\alpha_{k-1}\bold P_{k-1}) \leq \varphi(\bold x_{k-1}+\alpha \bold P_{k-1}),\;\alpha \in \bold R
     …
  直到 ∣∣rk∣∣=∣∣b−Axk∣∣≤tol||\bold r_k|| = ||\bold b - \bold A \bold x_k || \leq tol ,toltol 是能够容许的误差,通常为一个极小的值

在计算过程中,需要确定步长 αk\alpha_k 和搜索方向 Pk\bold P_k
首先确定搜索方向,应该是 φ(x)\varphi(\bold x) 减少速度最快的方向,也就是负梯度方向:Pk=rk=b−Axk\bold P_k = \bold r_k = \bold b - \bold A \bold x_k
然后确定步长 αk\alpha_k ,使得 f(αk)=min⁡αf(α)=min⁡αφ(xk+αPk)=φ(xk+αkPk)f(\alpha_k) = \mathop{\min}\limits_{\alpha}f(\alpha)=\mathop{\min}\limits_{\alpha}\varphi(x_k+\alpha P_k)=\varphi(\bold x_k+\alpha_k \bold P_k)
所以 αk\alpha_k 应为满足 f′(α)=0f'(\alpha) = 0 的解
下面确定 αk,  Pk\alpha_k,\;\bold P_k

f(α)=φ(xk+αPk)=(xk+αPk)TA(xk+αPk)−2bT(xk+αPk)=xkTAxk+αxkTAPk+αPkTAxk⏟这两项其实是一样的,可以用转置的定义推导得到+α2PkTAPk−2bTxk−2αbTPk=xkTAxk+2αxkTAPk‾+α2PkTAPk−2bTxk−2αbTPk‾=xkTAxk−2αxkTAPk+α2PkTAPk−2bTxk          (rk=b−Axk,  2αxkTAPk−2αbTPk=2α((Axk)T−bT)Pk=−2αrkTpk  )=α2PkTAPk−2αrkTpk+φ(xk)\begin{aligned} f(\alpha) & = \varphi(\bold x_k + \alpha \bold P_k) = (\bold x_k + \alpha\bold P_k)^T\bold A (\bold x_k+\alpha \bold P_k)-2\bold b^T(\bold x_k+\alpha \bold P_k)\\ & = \bold x_k^T \bold A \bold x_k \underbrace{ +\alpha \bold x_k^T\bold A \bold P_k + \alpha \bold P_k^T\bold A \bold x_k }_{这两项其实是一样的,可以用转置的定义推导得到} +\alpha^2\bold P_k^T\bold A \bold P_k -2\bold b^T\bold x_k - 2\alpha\bold b^T\bold P_k\\ & = \bold x_k^T\bold A\bold x_k \underline{+ 2\alpha \bold x_k^T\bold A \bold P_k} + \alpha^2\bold P_k^T\bold A \bold P_k -2\bold b^T\bold x_k\underline{- 2\alpha\bold b^T\bold P_k}\\ & = \bold x_k^T\bold A\bold x_k -2\alpha\bold x_k^T\bold A\bold P_k+\alpha^2\bold P_k^T\bold A \bold P_k-2\bold b^T\bold x_k\\ &\;\;\;\;\;(\bold r_k = \bold b - \bold A \bold x_k ,\; 2\alpha \bold x_k^T\bold A \bold P_k-2\alpha\bold b^T\bold P_k = 2\alpha((\bold A\bold x_k)^T-\bold b^T)\bold P_k=-2\alpha \bold r_k^T\bold p_k\;)\\ & =\alpha^2\bold P_k^T\bold A \bold P_k-2\alpha \bold r_k^T\bold p_k+\varphi(\bold x_k) \end{aligned}

解方程:    f′(α)=2αPkTAPk−2rkTPk=0得:    αk=rkTPkPkTAPk又由于:  Pk=rk=b−Axk∴    αk=rkTrkPkTAPk\begin{aligned} 解方程& : \;\;f'(\alpha) = 2\alpha \bold P_k^T\bold A \bold P_k - 2\bold r_k^T\bold P_k= 0\\ 得& : \;\;\alpha_k=\frac{\bold r_k^T\bold P_k}{\bold P_k^T\bold A\bold P_k}\\ 又由于&: \;\bold P_k = \bold r_k = \bold b-\bold A\bold x_k\\ \therefore&\;\;\alpha_k = \frac{\bold r_k^T\bold r_k}{\bold P_k^T\bold A\bold P_k} \end{aligned}

验证不等式 φ(xk+1)=φ(xk+αkPk)<φ(xk)\varphi(\bold x_{k+1}) = \varphi(\bold x_k+\alpha_k\bold P_k) < \varphi(\bold x_k) 是否满足

φ(xk+1−φ(xk))=φ(xk+αkPk)−φ(xk)=(xk+αkPk)TA(xk+αkPk)−2bT(xk+αkPk)−(xkTAxk−2bTxk)=xkTAxk+αkxkTAPk+αkPkTAxk+αk2PkTAPk−2bTxk−2αkbTPk−xkTAxk+2bTxk=αkxkTAPk+αkPkTAxk+αk2PkTAPk−2αkbTPk=2αkxkTAPk+αk2PkTAPk−2αkbTPk=αk2PkTAPk−2αkrkTPk=(rkTrk)2(PkTAPk)2PkTAPk−2rkTrkPkTAPkrkTPk=(rkTrk)2PkTAPk−2(rkTrk)2PkTAPk=−(rkTrk)2PkTAPk<0    (只要  rkTPk≠0)\begin{aligned} \varphi(\bold x_{k+1}-\varphi(\bold x_k)) & = \varphi(\bold x_k+\alpha_k\bold P_k) - \varphi(\bold x_k)\\ & = (\bold x_k+\alpha_k\bold P_k)^T\bold A(\bold x_k+\alpha_k\bold P_k)-2\bold b^T(\bold x_k+\alpha_k\bold P_k) - (\bold x_k^T\bold A\bold x_k-2\bold b^T\bold x_k)\\ & = \bold x_k^T \bold A \bold x_k +\alpha_k \bold x_k^T\bold A \bold P_k + \alpha_k \bold P_k^T\bold A \bold x_k +\alpha_k^2\bold P_k^T\bold A \bold P_k -2\bold b^T\bold x_k - 2\alpha_k\bold b^T\bold P_k - \bold x_k^T\bold A\bold x_k + 2\bold b^T\bold x_k\\ & = \alpha_k \bold x_k^T\bold A \bold P_k + \alpha_k \bold P_k^T\bold A \bold x_k +\alpha_k^2\bold P_k^T\bold A \bold P_k - 2\alpha_k\bold b^T\bold P_k\\ & = 2\alpha_k \bold x_k^T\bold A \bold P_k+\alpha_k^2\bold P_k^T\bold A \bold P_k - 2\alpha_k\bold b^T\bold P_k\\ & = \alpha_k^2\bold P_k^T\bold A \bold P_k - 2\alpha_k\bold r_k^T\bold P_k\\ & = \frac{(\bold r_k^T\bold r_k)^2}{(\bold P_k^T\bold A \bold P_k)^2}\bold P_k^T\bold A\bold P_k - 2\frac{\bold r_k^T\bold r_k}{\bold P_k^T\bold A \bold P_k}\bold r_k^T\bold P_k\\ & = \frac{(\bold r_k^T\bold r_k)^2}{\bold P_k^T\bold A \bold P_k} - 2\frac{(\bold r_k^T\bold r_k)^2}{\bold P_k^T\bold A \bold P_k}\\ & =- \frac{(\bold r_k^T\bold r_k)^2}{\bold P_k^T\bold A \bold P_k} < 0\;\;(只要\;\bold r_k^T\bold P_k \neq 0) \end{aligned}

若 rk=0,则:φ(xk+1=φxk),即解为  xk\bold r_k = 0 , 则:\varphi(\bold x_{k+1} = \varphi{\bold x_k}), 即解为\;\bold x_k

一些向量的性质

PkTPk+1=0\bold P_k^T \bold P_{k+1} = 0

其中,  Pk+1=b−Axk+1=b−A(xk+αkPk)=b−Axk−αkAPk=rk−αkAPk其中,\;\bold P_{k+1} = \bold b-\bold A\bold x_{k+1}=\bold b-\bold A(\bold x_k+\alpha_k\bold P_k) = \bold b-\bold A\bold x_k-\alpha_k\bold A\bold P_k = \bold r_k-\alpha_k\bold A\bold P_k
那么:PkTPk+1=PkTrk−PkTAPkrkTrkrkTArk=rkTrk−rkTrk=0那么:\bold P_k^T \bold P_{k+1} = \bold P_k^T\bold r_k - \bold P_k^T\bold A\bold P_k\frac{\bold r_k^T\bold r_k}{\bold r_k^T\bold A \bold r_k}=\bold r_k^T\bold r_k-\bold r_k^Tr_k = 0
这一性质的几何含义:如果在二维上,表现为两次下降的方向是垂直的(正交的)

具体计算步骤

     步0:任选初始点 x0\bold x_0 ,置 k=0k = 0
     步1:计算 rk=b−Axk\bold r_k = \bold b-\bold A \bold x_k (下降方向为负梯度方向)
     步2:若 rk=0\bold r_k = 0 ,则停止
     步3:计算 αk=rkTrkrkTArk\alpha_k = \frac{\bold r_k^T \bold r_k}{\bold r_k^T \bold A \bold r_k} % (步长因子)
     步4:计算 xk+1=xk+αkPk\bold x_{k+1} = \bold x_k + \alpha_k \bold P_k (新的迭代点)
     步5:k=k+1k = k+1 ,转步1

补充

定理2:设 A∈Rn×n\bold A \in \bold R^{n\times n} 对称正定,A\bold A 的特征值为:

λn≥λn−1≥⋯≥λ1>0\lambda_n\geq\lambda_{n-1}\geq\cdots \geq \lambda_1 > 0

设 {xk}\{x_k\} 为由求解对称正定方程组 Ax=b\bold A \bold x = \bold b 的最速下降法得到的向量序列,则有:

∣∣xk−x∗∣∣A≤(λn−λ1λn+λ1)k∣∣x0−x∗∣∣A||\bold x_k-\bold x^*||_A\leq\left( \frac{\lambda_n-\lambda_1}{\lambda_n+\lambda_1} \right)^k||\bold x_0-\bold x^*||_A

     其中:Ax∗=b\bold A\bold x^* = \bold b
     定义:设 A∈Rn×n\bold A \in \bold R^{n\times n} 对称正定,则称 ∣∣x∣∣A=(xTAx)12||\bold x||_A = (\bold x^T \bold A \bold x)^{\frac{1}{2}} 为 Rn\bold R^n 上的 A−范数\bold A- 范数
     注:可以用范数定义证明 ∣∣∗∣∣A||*||_A 是范数
     从定理的结论上看,当 λnλ1≫1\frac{\lambda_n}{\lambda_1} \gg 1 时,迭代序列收敛速度很慢
     缺点:从局部上看,最速下降法的每一步迭代都是最优的,但是从整体上看却未必,为获得全局最优的迭代方法,需要采用共轭梯度法

共轭梯度法

推导

(1)∀  x0∈Rn,  r0=b−Ax0\forall\;\bold x_0\in\bold R^n,\;\bold r_0=\bold b-\bold A\bold x_0 ,仍取 P0=r0\bold P_0 = \bold r_0 。x0\bold x_0 沿 P0\bold P_0 方向作直线搜索
     使得 φ(x0+α0P0)=min⁡αφ(x0+αP0),  α∈R\varphi(\bold x_0+\alpha_0\bold P_0) = \mathop{\min}\limits_{\alpha}\varphi(\bold x_0+\alpha \bold P_0),\;\alpha \in \bold R 证明方法与最速下降法类似
     解之,得 α0=r0Tr0r0TAr0>0\alpha_0 = \frac{\bold r_0^T\bold r_0}{\bold r_0^T\bold A \bold r_0} > 0 【 通过求f′(α)=0f'(\alpha) = 0 中的 α\alpha 得到 】
     ∴x1=x0+α0P0,  r1=b−Ax1\therefore \bold x_1=\bold x_0 + \alpha_0\bold P_0,\;\bold r_1=\bold b - \bold A\bold x_1
     注:P0Tr1=P0T(b−Ax0−α0AP0)=P0Tr0−α0P0TAP0=r0=P00\bold P_0^T \bold r_1 = \bold P_0^T(\bold b - \bold A\bold x_0 - \alpha_0\bold A\bold P_0) = \bold P_0^T\bold r_0 - \alpha_0\bold P_0^T\bold A\bold P_0 \xlongequal{\bold r_0 = \bold P_0} \bold 0
          r0Tr1=P0Tr1=0  (∗)\bold r_0^T\bold r_1 = \bold P_0^T\bold r_1 = 0 \;(*)

(2)从第二步起,下山方向就不再选取负梯度方向,而是在二维平面 π2={x=x1+ξr1+ηP0  ∣  ξ,η∈R}\pi_2 = \{\bold x=\bold x_1+\xi\bold r_1 + \eta\bold P_0\;|\;\xi,\eta\in \bold R\} 内找出使函数下降最快的方向作为新的下山方向 P1\bold P_1 和新的步长 α1\alpha_1 。

  • 先求 P1\bold P_1 :

φ(ξ,η)=φ(x1+ξr1+ηP0)=(x1+ξr1+ηP0)TA(x1+ξr1+ηP0)−2bT(x1+ξr1+ηP0)=x1TAξr1+ξr1TAx1+ξ2r1TAr1+ξr1TAηP0+ηP0TAξr1−2bTξr1\begin{aligned} \varphi(\xi,\eta) & = \varphi(\bold x_1+\bold \xi\bold r_1+\eta\bold P_0) \\ & = (\bold x_1 + \xi\bold r_1 + \eta\bold P_0)^T\bold A(\bold x_1 + \xi\bold r_1 + \eta\bold P_0) - 2\bold b^T(\bold x_1 + \xi\bold r_1 + \eta\bold P_0)\\ & = \bold x_1^T \bold A \xi\bold r_1+\xi\bold r_1^T\bold A\bold x_1+\xi^2\bold r_1^T\bold A \bold r_1 + \xi\bold r_1^T\bold A \eta\bold P_0+ \eta P_0^T\bold A\xi\bold r_1-2\bold b^T\xi\bold r_1 \end{aligned}

     对其求偏导,得到

φ(ξ,η)ξ∂φ(ξ,η)∂ξ=2(ξr1TAr1+ηr1TAP0−r1Tr1)∂φ(ξ,η)∂ξ=2(ξr1TAP0+ηP0TAP0)\begin{aligned} \frac{\varphi(\xi,\eta)}{\xi} \frac{\partial\varphi(\xi,\eta)}{\partial\xi} & = 2(\xi\bold r_1^T\bold A\bold r_1+\eta\bold r_1^T\bold A \bold P_0-\bold r_1^T\bold r_1)\\ \frac{\partial\varphi(\xi,\eta)}{\partial\xi} & = 2(\xi\bold r_1^T\bold A\bold P_0+\eta\bold P_0^T\bold A \bold P_0) \end{aligned}

     令(19)(20)为 0 ,即得 φ\varphi 在 π2\pi_2 内的唯一极小值解

x~=x1+ξ0r1+η0P0\widetilde{\bold x} = \bold x_1 + \xi_0\bold r_1 + \eta_0\bold P_0

     其中:ξ0,  η0  \xi_0,\;\eta_0\; 满足(19)=(20)= 0
     即求方程组:{  ξr1TAr1+ηr1TAP0=r1Tr1  ξr1TAP0+ηP0TAP0\begin{cases} \;\xi\bold r_1^T\bold A\bold r_1+\eta\bold r_1^T\bold A \bold P_0=\bold r_1^T\bold r_1\\ \;\xi\bold r_1^T\bold A\bold P_0+\eta\bold P_0^T\bold A \bold P_0 \end{cases} 【 如果这里直接求解 ξ0,  η0\xi_0,\;\eta_0 比较困难,可以先求出 β0=η0ξ0=−r1TAP0P0TAP0\beta_0=\frac{\eta_0}{\xi_0} = -\frac{\bold r_1^T\bold A \bold P_0}{\bold P_0^T \bold A \bold P_0} ,得出方向,再求方向的系数 】

     (21)中, 若 r1≠0  则  ξ0≠0r_1 \neq 0 \;则\;\xi_0\neq 0 ,故可取 x2=x1+α1P1\bold x_2 = \bold x_1 + \alpha_1 \bold P_1 【 P1=r1+β0P0\bold P_1 = \bold r_1 + \beta_0\bold P_0 作为新的方向】

  • 再求 α1\alpha_1

     因 α1\alpha_1 也满足 φ(x1+α1P1)=min⁡αφ(x1+αP1),  α∈R\varphi(\bold x_1+\alpha_1\bold P_1) = \mathop{\min}\limits_{\alpha}\varphi(\bold x_1+\alpha \bold P_1),\;\alpha \in \bold R
     故,得到: α1=r1TP1P1TAP1\alpha_1 = \frac{\bold r_1^T\bold P_1}{\bold P_1^T\bold A \bold P_1}
     由 P1=r1+β0P0\bold P_1 = \bold r_1 + \beta_0 \bold P_0 ,得 α1=r1T+β0P0P1TAP1=r1Tr1P1TAP1\alpha_1 = \frac{\bold r_1^T+\beta_0\bold P_0}{\bold P_1^T\bold A \bold P_1} = \frac{\bold r_1^T\bold r_1}{\bold P_1^T \bold A \bold P_1} 【 由 (∗)(*),r1\bold r_1 和 P0\bold P_0 垂直,即 r1TP0=(P0Tr1)T=0\bold r_1^T \bold P_0 = (\bold P_0^T\bold r_1)^T = 0 】

     综上:x2=x1+α1P1,  P1=r1+β0P0,  r2=b−Ax2\bold x_2 = \bold x_1 +\alpha_1\bold P_1,\;\bold P_1 = \bold r_1+\beta_0\bold P_0,\;\bold r_2 = \bold b - \bold A\bold x_2

一些向量的性质

P1TAP0=(r1+β0P0)TAP0=r1TAP0+β0P0TAP0=r1TAP0−r1TAP0P0TAP0P0TAP0=0\bold P_1^T\bold A \bold P_0 = (\bold r_1 + \beta_0 \bold P_0)^T \bold A \bold P_0 = \bold r_1^T\bold A \bold P_0 + \beta_0\bold P_0^T\bold A \bold P_0 = \bold r_1^T\bold A \bold P_0 - \frac{\bold r_1^T\bold A \bold P_0}{\bold P_0^T\bold A \bold P_0}\bold P_0^T\bold A\bold P_0 = 0

P0Tr2=P0T(b−Ax1⏟r1−α1AP1)=P0Tr1−α1P0TAP1=0\bold P_0^T \bold r_2 = \bold P_0^T(\underbrace{\bold b -\bold A \bold x_1}_{\bold r_1} - \alpha_1 \bold A \bold P_1) = \bold P_0^T \bold r_1 - \alpha_1\bold P_0^T\bold A \bold P_1 = 0

P1Tr2=P1T(b−Ax1−α1AP1)=P1Tr1−α1P1TAP1=P1Tr1−r1Tr1P1TAP1P1TAP1=P1Tr1−r1Tr1=(r1+β0P0)Tr1−r1Tr1=β0P0Tr1=β0⋅0=0\begin{aligned} \bold P_1^T \bold r_2 = \bold P_1^T(\bold b-\bold A\bold x_1-\alpha_1\bold A\bold P_1) & = \bold P_1^T \bold r_1 - \alpha_1 \bold P_1^T\bold A\bold P_1\\& = \bold P_1^T\bold r_1 - \frac{\bold r_1^T\bold r_1}{\bold P_1^T\bold A\bold P_1}\bold P_1^T\bold A \bold P_1 \\& =\bold P_1^T\bold r_1 - \bold r_1^T\bold r_1 \\ & =(\bold r_1+\beta_0\bold P_0)^T\bold r_1 - \bold r_1^T \bold r_1\\& =\beta_0\bold P_0^T \bold r_1 = \beta_0\cdot 0 = 0\end{aligned}

r0Tr2=P0Tr2=0\bold r_0^T \bold r_2 = \bold P_0^T\bold r_2 = 0

r1Tr2=r1T(b−Ax1−α1AP1)=r1T(r1−α1AP1)=r1Tr1−r1Tα1AP1=r1Tr1−α1r1TAP1=r1Tr1−r1TP1P1TAP1r1TAP1=r1Tr1−r1TP1(r1+β0P0)TAP1r1TAP1=r1Tr1−r1TP1r1TAP1+β0P0TAP1r1TAP1=r1Tr1−r1Tr1r1TAP1r1TAP1=0\begin{aligned} \bold r_1^T\bold r_2 = \bold r_1^T(\bold b - \bold A \bold x_1 - \alpha_1\bold A \bold P_1) & = \bold r_1^T(r_1-\alpha_1\bold A \bold P_1) \\&= \bold r_1^T \bold r_1-\bold r_1^T\bold \alpha_1\bold A\bold P_1 \\ &= \bold r_1^T \bold r_1 - \alpha_1\bold r_1^T\bold A \bold P_1\\& =\bold r_1^T \bold r_1-\frac{\bold r_1^T\bold P_1}{\bold P_1^T\bold A\bold P_1}\bold r_1^T\bold A\bold P_1\\&= \bold r_1^T \bold r_1-\frac{\bold r_1^T\bold P_1}{(\bold r_1+\beta_0\bold P_0)^T\bold A\bold P_1}\bold r_1^T\bold A\bold P_1\\&= \bold r_1^T \bold r_1-\frac{\bold r_1^T\bold P_1}{\bold r_1^T\bold A\bold P_1 + \beta_0\bold P_0^T\bold A\bold P_1}\bold r_1^T\bold A\bold P_1 \\&= \bold r_1^T \bold r_1-\frac{\bold r_1^T\bold r_1}{\bold r_1^T\bold A\bold P_1}\bold r_1^T\bold A\bold P_1 \\& = 0 \end{aligned}

(3)若 r2≠0\bold r_2 \neq 0 ,则重复第二步,总结为:

xk+1=xk+αkPkPk=rk+βk−1Pk−1βk−1=−rkTAPk−1Pk−1TAPk−1=rkTrk−1rk−1Trk−1αk=rkTrkPkTAPkrk+1=b−Axk+1=rk−αkAPk\begin{aligned} \bold x_{k+1} & = \bold x_k+\alpha_k \bold P_k\\ \bold P_k & = \bold r_k + \beta_{k-1}\bold P_{k-1}\\ \beta_{k-1} & = -\frac{\bold r_k^T\bold A\bold P_{k-1}}{\bold P_{k-1}^T\bold A\bold P_{k-1}} = \frac{\bold r_k^T\bold r_{k-1}}{\bold r_{k-1}^T\bold r_{k-1}}\\ \bold \alpha_k & = \frac{\bold r_k^T\bold r_k}{\bold P_{k}^T\bold A\bold P_k} \\ \bold r_{k+1} & = \bold b - \bold A\bold x_{k+1} = \bold r_k - \alpha_k\bold A \bold P_k \end{aligned}

如此,便得到共轭梯度法的计算步骤

计算步骤

推导出来的计算步骤如下:

x0=初值\bold x_0 = 初值
r0=b−Ax0;  k=0\bold r_0 = \bold b - \bold A\bold x_0 ;\;k = 0
while    rk≠0【实际上,应该用范数是否小于设定值来判断while \;\;\bold r_k \neq 0 【实际上,应该用范数是否小于设定值来判断
     k=k+1k = k + 1
     if  k=1    P0=r0if \; k = 1 \;\; \bold P_0 = \bold r_0
     elseelse
         βk−2=rk−1Trk−1rk−2Trk−2\beta_{k-2} = \frac{\bold r_{k-1}^T\bold r_{k-1}}{\bold r_{k-2}^T\bold r_{k-2}}
         Pk−1=rk−1+βk−2Pk−2\bold P_{k-1} = \bold r_{k-1}+\beta_{k-2}\bold P_{k-2}
     endend
     αk−1=rk−1Trk−1Pk−1TAPk−1\alpha_{k-1} = \frac{\bold r_{k-1}^T\bold r_{k-1}}{\bold P_{k-1}^T\bold A \bold P_{k-1}}
     xk=xk−1+αk−1Pk−1\bold x_k = \bold x_{k-1}+\alpha_{k-1}\bold P_{k-1}
     rk=rk−1−αk−1APk−1\bold r_k = \bold r_{k-1} - \alpha_{k-1}\bold A\bold P_{k-1}
endend
x=xk\bold x = \bold x_k

但是在编写程序过程中,应当按照如下方式编写:

x0=初值\bold x_0 = 初值
r0=b−Ax0;  k=0\bold r_0 = \bold b - \bold A\bold x_0 ;\;k = 0
while    ∣∣rk∣∣≥tol    (tol为一设定的很小的值)while \;\;||\bold r_k||\geq tol \;\;(tol为一设定的很小的值)
     k=k+1k = k + 1
         if  k=1    P0=r0if \; k = 1 \;\; \bold P_0 = \bold r_0
     elseelse
         βk−2=rk−1Trk−1rk−2Trk−2\beta_{k-2} = \frac{\bold r_{k-1}^T\bold r_{k-1}}{\bold r_{k-2}^T\bold r_{k-2}}
         Pk−1=rk−1+βk−2Pk−2\bold P_{k-1} = \bold r_{k-1}+\beta_{k-2}\bold P_{k-2}
     endend
     temp=APk−1temp = \bold A \bold P_{k-1}
     αk−1=rk−1Trk−1Pk−1T⋅temp\alpha_{k-1} = \frac{\bold r_{k-1}^T\bold r_{k-1}}{\bold P_{k-1}^T\cdot temp}
     xk=xk−1+αk−1Pk−1\bold x_k = \bold x_{k-1}+\alpha_{k-1}\bold P_{k-1}
     rk=rk−1−αk−1⋅temp\bold r_k = \bold r_{k-1} - \alpha_{k-1}\cdot temp
endend
x=xk\bold x = \bold x_k

补充

定理3:由共轭梯度法得到的向量组 {ri}\{\bold r_i\} 和 {Pi}\{\bold P_i\} 具有下面的性质:

(1)PITrj=0,  0≤i<j≤k\bold P_I^T\bold r_j = 0,\;0\leq i < j \leq k
(2)riTrj=0,  i≠j,  0≤i,j≤k\bold r_i^T\bold r_j = 0,\;i \neq j,\; 0 \leq i,j\leq k
(3)PiTAPj=0,  i≠j,  0≤i,j≤k\bold P_i^T\bold A \bold P_j = 0,\;i \neq j,\;0\leq i,j\leq k
(4)span{r0,  r1,  ⋯ ,  rk}=span{P0,  P1,  ⋯ ,  Pk}=K~(A,  r0,  k+1)span\{\bold r_0,\;\bold r_1,\;\cdots,\;\bold r_k\} = span\{\bold P_0,\;\bold P_1,\;\cdots,\;\bold P_k\} = \widetilde{K}(\bold A,\;\bold r_0,\;k+1)
K~(A,  r0,  k+1)=span{r0,  Ar0,  ⋯ ,  Akr0}通常称为krylov子空间\widetilde{K}(\bold A,\;\bold r_0,\;k+1) = span\{\bold r_0,\;\bold A\bold r_0,\;\cdots,\;\bold A^k\bold r_0\} 通常称为krylov子空间