AI Basics (1): Gradient, Jacobian Matrix and Hessian Matrix
This article is from the WeChat official account "科文路". You are welcome to follow and interact. Please credit the source when reposting.
Gradient, Jacobian matrix and Hessian matrix
Over the past two weeks I have sat on the interviewer's side of several interviews. My biggest takeaway is that fresh graduates have serious gaps in their basic math knowledge, so I decided to go over the AI fundamentals that I consider important.
It is written in English (mine is only so-so), which should also help you get familiar with the terminology.
1 Gradient
The gradient of $f$ is defined as the unique vector field whose dot product with any unit vector $\mathbf{v}$ at each point $x$ is the directional derivative of $f$ along $\mathbf{v}$. That is,
$$\big(\nabla f(x)\big)\cdot \mathbf{v} = D_{\mathbf{v}} f(x)$$
e.g., in a Cartesian coordinate system, the derivative along the $i$-th coordinate direction is exactly the component along the $i$-th axis:
$$\nabla f = \frac{\partial f}{\partial x_1} e_1 + \cdots + \frac{\partial f}{\partial x_n} e_n$$
Note the relationship to the derivative: for a scalar-valued $f$, the gradient is the derivative (a row vector) written as a column vector, i.e. its transpose.
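To make this concrete, here is a minimal sketch of my own (not from the original post) using JAX; the function `f`, the point `x`, and the direction `v` below are made-up examples.

```python
import jax
import jax.numpy as jnp

# A scalar-valued function of several variables (made-up example).
def f(x):
    return jnp.sum(x ** 2) + jnp.sin(x[0])

x = jnp.array([1.0, 2.0, 3.0])
v = jnp.array([1.0, 0.0, 0.0])
v = v / jnp.linalg.norm(v)               # make it a unit vector

grad_f = jax.grad(f)(x)                  # ∇f(x), one component per coordinate

# Directional derivative D_v f(x), two ways:
# 1) dot product with the gradient, as in the definition above
dd_from_grad = jnp.dot(grad_f, v)
# 2) forward-mode JVP along v
_, dd_from_jvp = jax.jvp(f, (x,), (v,))

print(grad_f)                            # [2*1 + cos(1), 4, 6]
print(dd_from_grad, dd_from_jvp)         # the two values agree
```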
2 Jacobian matrix
The importance of the Jacobian matrix is that it gives the best linear approximation of a differentiable function near a given point. In this sense, the Jacobian plays the role of the derivative for a multivariate function.
The Jacobian of a vector-valued function in several variables generalizes the gradient of a scalar-valued function in several variables, which in turn generalizes the derivative of a scalar-valued function of a single variable. If $f$ is differentiable at a point $p$ in $\mathbb{R}^n$, then its differential is represented by $J_f(p)$. In this case, the linear transformation represented by $J_f(p)$ is the best linear approximation of $f$ near the point $p$, in the sense that
$$f(x) - f(p) = J_f(p)(x - p) + o(\|x - p\|) \quad (\text{as } x \to p)$$
This approximation specializes to the approximation of a scalar function of a single variable by its Taylor polynomial of degree one, namely
$$f(x) - f(p) = f'(p)(x - p) + o(x - p) \quad (\text{as } x \to p)$$
The Jacobian matrix represents the differential of $f$ at every point where $f$ is differentiable.
$$\mathbf{J} =
\begin{bmatrix}
\dfrac{\partial \mathbf{f}}{\partial x_1} & \cdots & \dfrac{\partial \mathbf{f}}{\partial x_n}
\end{bmatrix}
=
\begin{bmatrix}
\dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial f_m}{\partial x_1} & \cdots & \dfrac{\partial f_m}{\partial x_n}
\end{bmatrix}$$
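As a quick sanity check (again my own sketch, with a made-up vector-valued `f`), `jax.jacobian` produces exactly the matrix above, and the first-order expansion $f(x) \approx f(p) + J_f(p)(x - p)$ can be verified numerically:

```python
import jax
import jax.numpy as jnp

# A vector-valued function f: R^3 -> R^2 (made-up example).
def f(x):
    return jnp.array([x[0] * x[1], jnp.sin(x[2]) + x[0]])

p = jnp.array([1.0, 2.0, 0.5])
J = jax.jacobian(f)(p)                   # 2x3 matrix of partial derivatives
print(J.shape)                           # (2, 3)

# Best-linear-approximation property: the error is o(|x - p|).
dx = 1e-3 * jnp.array([1.0, -1.0, 2.0])
exact  = f(p + dx)
linear = f(p) + J @ dx
print(jnp.linalg.norm(exact - linear))   # tiny compared to |dx|
```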
3 Hessian
$$\mathbf {H}(f(\mathbf {x}))= \mathbf {J}(\nabla f(\mathbf {x}))^T$$
$$\mathbf{H} =
\begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1\,\partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1\,\partial x_n} \\
\dfrac{\partial^2 f}{\partial x_2\,\partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2\,\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f}{\partial x_n\,\partial x_1} & \dfrac{\partial^2 f}{\partial x_n\,\partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{bmatrix}$$
In other words, we take a linear approximation (the Jacobian) of the field of change directions ($\nabla f$); this arguably captures how the rate of change itself changes, i.e. the curvature.
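The relation $\mathbf{H}(f(\mathbf{x})) = \mathbf{J}(\nabla f(\mathbf{x}))^T$ can also be checked directly; the sketch below uses another made-up scalar `f` of my own. For a twice continuously differentiable $f$ the Hessian is symmetric, so the transpose is harmless.

```python
import jax
import jax.numpy as jnp

# A scalar-valued function (made-up example).
def f(x):
    return x[0] ** 2 * x[1] + jnp.exp(x[2]) * x[0]

x = jnp.array([1.0, 2.0, 0.0])

H_direct      = jax.hessian(f)(x)                  # built-in Hessian
H_as_jacobian = jax.jacobian(jax.grad(f))(x)       # Jacobian of the gradient

print(jnp.allclose(H_direct, H_as_jacobian))       # True
print(jnp.allclose(H_direct, H_direct.T))          # symmetric (Schwarz's theorem)
```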
~~
Since you've read this far, why not follow the daily posts from "科文路" and join the conversation~
To send appreciation, head to the menu of the official account~~
At least leave a like before you go~