\[ \tag{46} \frac{\partial \det \left( \mathbf{Y} \right)}{\partial x}=\,\,\det \left( \mathbf{Y} \right) Tr\left[ \mathbf{Y}^{-1}\frac{\partial \mathbf{Y}}{\partial x} \right] \]
Formula \((46)\) is actually Jacobi’s formula. 1
Analogy in functions
For a differentiable function \(f: D\subseteq R\rightarrow R\), for all \(x\) in some neighborhood of \(a\), \(f\) can be written as: 2 \[f(x)=f(a)+f^{\prime}(a) (x−a)+R(x−a) \] and, \(L(x)=f(a)+f^{\prime}(a)(x−a)\) is the best affine approximation of the function \(f\) at \(a\).
or, the idea could be expressed in other way: \[f(x+\epsilon)=f(x)+f^{\prime}(x) \epsilon +R\epsilon \]
It comes from Taylor aproximation at \(x+\epsilon\): \[f(x+\epsilon)=f(x)+f^{\prime}(x) \epsilon +f^{\prime\prime}(x) \epsilon^2 /2 + \cdots \]
Lemma1 3
\[ \det \left( \mathbf{I}+\epsilon \mathbf{A} \right) =\,\,1+\epsilon Tr\left( \mathbf{A} \right) +O\left( \epsilon^2 \right) \]
Let \(A_1,A_2, \cdot,A_N\) be the column vectors of the matrix \(A\). Let \(e_1,e_2, \cdot,e_N\) be the standard basis; note that these basis vectors form the columns of the identity matrix \(I\). Then we recall that the determinant is an alternating multi-linear map on the column space.
\[det(I+ϵA)=det(e_1+ϵA_1,e_2+ϵA_2,…,e_N+ϵA_N) \\ =det(e_1,e_2,…,e_N)+\epsilon \left\{ det(A_1,e_2,…,e_N)+det(e_1,A_2,…,e_N) +\cdots \\ +det(e_1,e_2,…,A_N) \right\} + O(\epsilon^2)\]
The first term is just the determinant of the identity matrix which is 1. The term proportional to ϵ is a sum of expressions like \(det(e_1,e_2,…,A_j,…,e_N)\) where the j’th column of the identity matrix is replaced with the j’th column of A. Expanding the determinant along the j’th row we see that \(det(e_1,e_2,…,A_j,…,e_N)=A_{jj}\).
\[det(I+ϵA)=1+ϵ\sum_{j=1}^N A_{jj}+O(ϵ^2)=1+ϵTr(A)+O(ϵ^2)\]
Particularly when \(n=2\), \[\begin{align} \det \left( I+\epsilon A \right) &=\det \left( \begin{matrix}{} 1+\varepsilon a_{11}& \varepsilon a_{12}\\ \varepsilon a_{21}& 1+\varepsilon a_{22}\\ \end{matrix} \right) \,\, \\ &=\,\,1+\varepsilon \left( a_{11}+a_{22} \right) +\varepsilon ^2\left( a_{11}a_{22}-a_{12}a_{21} \right) \,\,\\ &=\,\,1+\varepsilon Tr\left( A \right) +\varepsilon ^2\det \left( A \right) \end{align}\]
Lemma 2. 4
\[ det^{\prime}(I)=\mathrm {Tr} \]
where \(det^{\prime}(I)=Tr\) is the differential of \({\displaystyle \det }\)
This equation means that the differential of \({\displaystyle \det }\), evaluated at the identity matrix, is equal to the trace. The differential \({\displaystyle \det '(I)}\) is a linear operator that maps an n × n matrix to a real number.
Using the definition of a directional derivative together with one of its basic properties for differentiable functions, we have \[\begin{equation} \operatorname{det}^{\prime}(I)(T)=\nabla_{T} \operatorname{det}(I)=\lim _{\varepsilon \rightarrow 0} \frac{\operatorname{det}(I+\varepsilon T)-\operatorname{det} I}{\varepsilon} \\ = lim_{\varepsilon \rightarrow 0} \frac{1+ϵTr(T)+O(ϵ^2)-1}{\varepsilon} \\ = Tr(T) \end{equation}\]
Alternative proof of lemma 2: 5
\(det\) is a function \(M_{n×n}→R\) where \(M_{n×n}\) is the space of \(n×n\) square matrices. Therefore, a matrix is the equivalent of a point for real functions. The best linear approximation to \(det\) near the identity is given by: \[det(\mathbf{I}+\mathbf{M})=det(\mathbf{I})+d(det(\mathbf{I}))M+R(\mathbf{I},\mathbf{M})\] \[\underset{||\mathbf{M} ||\rightarrow 0}{lim}\frac{R(\mathbf{I},\mathbf{M})}{||\mathbf{M} ||}=0\]
\(det^{\prime}(I)=\mathrm {Tr}\) is equivalent to the following:
\[\begin{equation} \left.\frac{\mathrm{d}}{\mathrm{d} t}[\operatorname{det}(\mathbf{I}+t \mathbf{B})]\right|_{t=0}=\operatorname{Tr}(\mathbf{B}) \end{equation}\]
Lemma 3 6
For an invertible matrix \(A\), we have:\[\begin{equation} \operatorname{det}^{\prime}(A)(T)=\operatorname{det} A \operatorname{tr}\left(A^{-1} T\right) \end{equation}\]
proof: 7
Remember that, if \(f:E→F\) is a differentiable map, a way to compute \(df(a)(v)\) is to find a curve \(γ:R→E\) with \(γ(0)=a\) and \(γ′(0)=v\), and then \(df(a)(v)=\frac{d}{dt}|_0f(γ(t))\) (this is the chain rule). Here, find a curve \(γ:R→M_n(R)\) with \(γ(0)=A\) and \(γ′(0)=T\). Then note that
\[\begin{equation} \begin{aligned} d \operatorname{det}(A)(T)=\left.\frac{d}{d t}\right|_{0}(\operatorname{det}(\gamma(t))) &=\left.\frac{d}{d t}\right|_{0}\left(\operatorname{det}\left(A A^{-1} \gamma(t)\right)\right) \\ =\left.\operatorname{det}(A) \frac{d}{d t}\right|_{0}\left(\operatorname{det}\left(A^{-1} \gamma(t)\right)\right) &=\operatorname{det}(A) d \operatorname{det}(\operatorname{I})\left(A^{-1} T\right) \end{aligned} \end{equation}\] (since \(t↦A^{−1}γ(t)\) is a curve which is \(I\) in 0 and which the derivative is \(A^{−1}T\) in 0).