Matrix Calculus Pdf 171186 | Matrix Calculus

Partial capture of text on file.
                              MatrixCalculus-NotesontheDerivativeofaTrace
                                                                   JohannesTraa
                       This write-up elucidates the rules of matrix calculus for expressions involving the trace
                    of a function of a matrix X:
                                                                    f =tr£g X ¤ .                                                    (1)
                                                                                (   )
                       Wewouldliketotakethederivativeof f withrespecttoX:
                                                                         ∂f =? .                                                     (2)
                                                                         ∂X
                       One strategy is to write the trace expression as a scalar using index notation, take the
                    derivative, and re-write in matrix form. An easier way is to reduce the problem to one or
                    moresmallerproblemswheretheresultsforsimplerderivativescanbeapplied. It’sbrute-
                    forcevsbottom-up.
                    MATRIX-VALUEDDERIVATIVE
                    Thederivativeofascalar f withrespecttoamatrixX∈RM×N canbewrittenas:
                                                                                                                                       1
                                                                                                           ∂f                ∂f         ···           ∂f     
                                                                                                          ∂X11             ∂X12                    ∂X1N 
                                                                                                                                                             
                                                                                                                                                             
                                                                                                           ∂f                ∂f                       ∂f     
                                                                                              ∂f          ∂X               ∂X           ···        ∂X        
                                                                                                     = 21                       22                      2N                                                             (3)
                                                                                                                                                             
                                                                                              ∂X           .                   .         .             .     
                                                                                                           .                   .           ..          .     
                                                                                                           .                   .                       .     
                                                                                                                                                             
                                                                                                           ∂f                ∂f         ···           ∂f     
                                                                                                             ∂XM1           ∂XM2                    ∂XMN
                                     So, the result is the same size as X.
                                MATRIXANDINDEXNOTATION
                                It is useful to be able to convert between matrix notation and index notation. For example,
                                theproductABhaselements:
                                                                                                           [AB]ik =XAijBjk ,                                                                                             (4)
                                                                                                                               j
                                andthematrixproductABCT haselements:
                                                            £           T¤           X £ T¤ X X                                                                XX
                                                              ABC il = j Aij BC                                 jl = j Aij k BjkClk = j k AijBjkClk .                                                                    (5)
                                FIRST-ORDER DERIVATIVES
                                EXAMPLE1
                                Considerthisexample:
                                                                                                                 f =tr AXB .                                                                                             (6)
                                                                                                                              [          ]
                                     Wecanwritethisusingindexnotationas:
                                                 f =X AXB =XXA XB =XXA XX B =XXXA X B .                                                                                                                                  (7)
                                                              [          ]ii                      i j [       ]ji                      i j           jk     ki                           i j     jk     ki
                                                          i                          i     j                              i     j            k                         i    j     k
                                     Takingthederivativewithrespectto Xjk,weget:
                                                                                                     ∂f       =XA B = BA                                    .                                                            (8)
                                                                                                   ∂X                       i j    ki       [       ]kj
                                                                                                         jk          i
                                     The result has to be the same size as X, so we know that the indices of the rows and
                                columnsmustbe j andk,respectively. This means we have to transpose the result above
                                towritethederivativeinmatrixformas:
                                                                                                           ∂tr AXB
                                                                                                                   [          ]         T T
                                                                                                                   ∂X            =A B .                                                                                  (9)
                                                                                                                                                                                                                            2
                                EXAMPLE2
                                Similarly, we have:
                                                                                                       £       T ¤ XXX
                                                                                          f =tr AX B = i j k AijXkjBki ,                                                                                               (10)
                                sothatthederivativeis:
                                                                                                     ∂f       =XA B = BA                                    ,                                                          (11)
                                                                                                   ∂X                       i j    ki       [       ]kj
                                                                                                         kj          i
                                     TheXtermappearsin(10)withindiceskj,soweneedtowritethederivativeinmatrix
                                formsuchthatk istherowindexand j isthecolumnindex. Thus,wehave:
                                                                                                                     £       T ¤
                                                                                                            ∂tr AX B =BA .                                                                                             (12)
                                                                                                                      ∂X
                                MULTIPLE-ORDER
                                Nowconsideramorecomplicatedexample:
                                                                                                      £                T¤
                                                                                         f =tr AXBXC                                                                                                                   (13)
                                                                                            =XXXXXAijXjkBklXlmCim .                                                                                                    (14)
                                                                                                  i     j    k     l    m
                                     ThederivativehascontributionsfrombothappearancesofX.
                                TAKE1
                                Inindexnotation:
                                                                                  ∂f            XXX                                                  £           T ¤
                                                                                ∂Xjk = i l m AijBklXlmCim = BXC A kj ,                                                                                                 (15)
                                                                                  ∂f            XXX                                                  £ T              ¤
                                                                               ∂Xlm = i j k AijXjkBklCim = C AXB ml .                                                                                                  (16)
                                     Transposingappropriatelyandsummingthetermstogether,wehave:
                                                                                             £                 T¤
                                                                                     ∂tr AXBXC                              T        T T              T T T
                                                                                                 ∂X                  =A CX B +B X A C .                                                                                (17)
                                TAKE2
                                Wecanskipthistediousprocessbyapplying(9)foreachappearanceofX:
                                                                           £                T¤                                             £          T¤
                                                                  ∂tr AXBXC                            ∂tr[AXD]                   ∂tr EXC                          T      T         T
                                                                               ∂X                 =            ∂X             +             ∂X              =A D +E C .                                                (18)
                                                                                                                                                                                                                            3
                                where D=BXCT andE=AXB. Sowejustevaluatethematrixderivativeforeachappear-
                                anceofXassumingthateverythingelseisaconstant(includingotherX’s). Toseewhythis
                                rule is useful, consider the following beast:
                                                                                                                    £         T            T        ¤
                                                                                                       f =tr AXX BCX XC .                                                                                              (19)
                                     Wecanimmediatelywritedownthederivativeusing(9)and(12):
                             £          T            T        ¤                ¡                         ¢        ¡                   ¢                         ¡                   ¢      ¡                         ¢
                     ∂tr AXX BCX XC                                        T        T           T          T                 T                                             T                          T           T T            T
                                                                 = A             X BCX XC + BCX XC AX + XC AXX BC + AXX BCX                                                                                                 C
                                       ∂X                            ( )                                                                 (      )      (      )                                                           ( )
                                                                                                                                                                                                                       (20)
                                                                          T T            T T                       T                                  T                   T T            T T T
                                                                =AC X XC B X+BCX XCAX+XCAXX BC+XC B XX A C .                                                                                                           (21)
                                FROBENIUSNORM
                                TheFrobeniusnormshowsupwhenwehaveanoptimizationprobleminvolvingamatrix
                                factorization andwewanttominimizeasum-of-squareserrorcriterion:
                                                                         Ã                                    !
                                                              XX                        X                       2                                        £                                          ¤
                                                                                                                                            2                                                    T
                                                      f =                   X − W H                                =kX−WHk =tr X−WH X−WH                                                                .              (22)
                                                                               ik                 i j     jk                                F              (                )(                  )
                                                                i    k                    j
                                     Wecanworkwiththeexpressioninindexnotation, but it’s easier to work directly with
                                matricesandapplytheresultsderivedearlier. Supposewewanttoﬁndthederivativewith
                                respecttoW. Expandingthematrixouterproduct,wehave:
                                                                               £       T¤             £       T       T¤            £            T¤             £             T      T¤
                                                                  f =tr XX                  −tr XH W −tr WHX +tr WHH W                                                                      .                          (23)
                                     Applying(9)and(12),weeasilydeducethat:
                                                                              ∂tr£ X−WH X−WHT¤
                                                                                        (                 )(                 )                        T                     T
                                                                                                     ∂W                              =−2XH +2WHH .                                                                     (24)
                                LOG
                                Considerthistraceexpression:
                                                                                  £ T                         ¤      XX                       µXX                                    ¶
                                                                     f =tr V log AXB =                                          V log                       A X B                         .                            (25)
                                                                                                 (          )                      i j                         im mn nj
                                                                                                                      i     j                    m n
                                     Takingthederivativewithrespectto Xmn,weget:
                                                            ∂f             XX                         Vij                                          XX µ V ¶
                                                         ∂Xmn = i j PPAimXmnBnjAimBnj = i j Aim AXB ijBnj .                                                                                                          (26)
                                                                                          m n
                                     Thus:
                                                                                                                                                                                                                            4
The words contained in this file might help you see if this file matches what you are looking for:

...Matrixcalculus notesonthederivativeofatrace johannestraa this write up elucidates the rules of matrix calculus for expressions involving trace a function x f tr g wewouldliketotakethederivativeof withrespecttox one strategy is to expression as scalar using index notation take derivative and re in form an easier way reduce problem or moresmallerproblemswheretheresultsforsimplerderivativescanbeapplied it sbrute forcevsbottom valuedderivative thederivativeofascalar withrespecttoamatrixx rm n canbewrittenas xn xm xmn so result same size matrixandindexnotation useful be able convert between example theproductabhaselements ik xaijbjk j andthematrixproductabct haselements t xx abc il aij bc jl k bjkclk aijbjkclk first order derivatives considerthisexample axb wecanwritethisusingindexnotationas xxa xb b xxxa ii i ji jk ki takingthederivativewithrespectto xjk weget xa ba kj has we know that indices rows columnsmustbe andk respectively means have transpose above towritethederivativeinmatrixforma...
Related files

Share

Help

Related files

Share

Share to social media

Help

Login Area