121x Filetype PDF File size 0.21 MB Source: paulklein.ca
MatrixCalculus-NotesontheDerivativeofaTrace JohannesTraa This write-up elucidates the rules of matrix calculus for expressions involving the trace of a function of a matrix X: f =tr£g X ¤ . (1) ( ) Wewouldliketotakethederivativeof f withrespecttoX: ∂f =? . (2) ∂X One strategy is to write the trace expression as a scalar using index notation, take the derivative, and re-write in matrix form. An easier way is to reduce the problem to one or moresmallerproblemswheretheresultsforsimplerderivativescanbeapplied. It’sbrute- forcevsbottom-up. MATRIX-VALUEDDERIVATIVE Thederivativeofascalar f withrespecttoamatrixX∈RM×N canbewrittenas: 1 ∂f ∂f ··· ∂f ∂X11 ∂X12 ∂X1N ∂f ∂f ∂f ∂f ∂X ∂X ··· ∂X = 21 22 2N (3) ∂X . . . . . . .. . . . . ∂f ∂f ··· ∂f ∂XM1 ∂XM2 ∂XMN So, the result is the same size as X. MATRIXANDINDEXNOTATION It is useful to be able to convert between matrix notation and index notation. For example, theproductABhaselements: [AB]ik =XAijBjk , (4) j andthematrixproductABCT haselements: £ T¤ X £ T¤ X X XX ABC il = j Aij BC jl = j Aij k BjkClk = j k AijBjkClk . (5) FIRST-ORDER DERIVATIVES EXAMPLE1 Considerthisexample: f =tr AXB . (6) [ ] Wecanwritethisusingindexnotationas: f =X AXB =XXA XB =XXA XX B =XXXA X B . (7) [ ]ii i j [ ]ji i j jk ki i j jk ki i i j i j k i j k Takingthederivativewithrespectto Xjk,weget: ∂f =XA B = BA . (8) ∂X i j ki [ ]kj jk i The result has to be the same size as X, so we know that the indices of the rows and columnsmustbe j andk,respectively. This means we have to transpose the result above towritethederivativeinmatrixformas: ∂tr AXB [ ] T T ∂X =A B . (9) 2 EXAMPLE2 Similarly, we have: £ T ¤ XXX f =tr AX B = i j k AijXkjBki , (10) sothatthederivativeis: ∂f =XA B = BA , (11) ∂X i j ki [ ]kj kj i TheXtermappearsin(10)withindiceskj,soweneedtowritethederivativeinmatrix formsuchthatk istherowindexand j isthecolumnindex. Thus,wehave: £ T ¤ ∂tr AX B =BA . (12) ∂X MULTIPLE-ORDER Nowconsideramorecomplicatedexample: £ T¤ f =tr AXBXC (13) =XXXXXAijXjkBklXlmCim . (14) i j k l m ThederivativehascontributionsfrombothappearancesofX. TAKE1 Inindexnotation: ∂f XXX £ T ¤ ∂Xjk = i l m AijBklXlmCim = BXC A kj , (15) ∂f XXX £ T ¤ ∂Xlm = i j k AijXjkBklCim = C AXB ml . (16) Transposingappropriatelyandsummingthetermstogether,wehave: £ T¤ ∂tr AXBXC T T T T T T ∂X =A CX B +B X A C . (17) TAKE2 Wecanskipthistediousprocessbyapplying(9)foreachappearanceofX: £ T¤ £ T¤ ∂tr AXBXC ∂tr[AXD] ∂tr EXC T T T ∂X = ∂X + ∂X =A D +E C . (18) 3 where D=BXCT andE=AXB. Sowejustevaluatethematrixderivativeforeachappear- anceofXassumingthateverythingelseisaconstant(includingotherX’s). Toseewhythis rule is useful, consider the following beast: £ T T ¤ f =tr AXX BCX XC . (19) Wecanimmediatelywritedownthederivativeusing(9)and(12): £ T T ¤ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ∂tr AXX BCX XC T T T T T T T T T T = A X BCX XC + BCX XC AX + XC AXX BC + AXX BCX C ∂X ( ) ( ) ( ) ( ) (20) T T T T T T T T T T T =AC X XC B X+BCX XCAX+XCAXX BC+XC B XX A C . (21) FROBENIUSNORM TheFrobeniusnormshowsupwhenwehaveanoptimizationprobleminvolvingamatrix factorization andwewanttominimizeasum-of-squareserrorcriterion: Ã ! XX X 2 £ ¤ 2 T f = X − W H =kX−WHk =tr X−WH X−WH . (22) ik i j jk F ( )( ) i k j Wecanworkwiththeexpressioninindexnotation, but it’s easier to work directly with matricesandapplytheresultsderivedearlier. Supposewewanttofindthederivativewith respecttoW. Expandingthematrixouterproduct,wehave: £ T¤ £ T T¤ £ T¤ £ T T¤ f =tr XX −tr XH W −tr WHX +tr WHH W . (23) Applying(9)and(12),weeasilydeducethat: ∂tr£ X−WH X−WHT¤ ( )( ) T T ∂W =−2XH +2WHH . (24) LOG Considerthistraceexpression: £ T ¤ XX µXX ¶ f =tr V log AXB = V log A X B . (25) ( ) i j im mn nj i j m n Takingthederivativewithrespectto Xmn,weget: ∂f XX Vij XX µ V ¶ ∂Xmn = i j PPAimXmnBnjAimBnj = i j Aim AXB ijBnj . (26) m n Thus: 4
no reviews yet
Please Login to review.