jagomart
digital resources
picture1_Matrix Calculus Pdf 171186 | Matrix Calculus


 121x       Filetype PDF       File size 0.21 MB       Source: paulklein.ca


File: Matrix Calculus Pdf 171186 | Matrix Calculus
matrixcalculus notesonthederivativeofatrace johannestraa this write up elucidates the rules of matrix calculus for expressions involving the trace of a function of a matrix x f tr g x 1 wewouldliketotakethederivativeof ...

icon picture PDF Filetype PDF | Posted on 26 Jan 2023 | 2 years ago
Partial capture of text on file.
                              MatrixCalculus-NotesontheDerivativeofaTrace
                                                                   JohannesTraa
                       This write-up elucidates the rules of matrix calculus for expressions involving the trace
                    of a function of a matrix X:
                                                                    f =tr£g X ¤ .                                                    (1)
                                                                                (   )
                       Wewouldliketotakethederivativeof f withrespecttoX:
                                                                         ∂f =? .                                                     (2)
                                                                         ∂X
                       One strategy is to write the trace expression as a scalar using index notation, take the
                    derivative, and re-write in matrix form. An easier way is to reduce the problem to one or
                    moresmallerproblemswheretheresultsforsimplerderivativescanbeapplied. It’sbrute-
                    forcevsbottom-up.
                    MATRIX-VALUEDDERIVATIVE
                    Thederivativeofascalar f withrespecttoamatrixX∈RM×N canbewrittenas:
                                                                                                                                       1
                                                                                                           ∂f                ∂f         ···           ∂f     
                                                                                                          ∂X11             ∂X12                    ∂X1N 
                                                                                                                                                             
                                                                                                                                                             
                                                                                                           ∂f                ∂f                       ∂f     
                                                                                              ∂f          ∂X               ∂X           ···        ∂X        
                                                                                                     = 21                       22                      2N                                                             (3)
                                                                                                                                                             
                                                                                              ∂X           .                   .         .             .     
                                                                                                           .                   .           ..          .     
                                                                                                           .                   .                       .     
                                                                                                                                                             
                                                                                                           ∂f                ∂f         ···           ∂f     
                                                                                                             ∂XM1           ∂XM2                    ∂XMN
                                     So, the result is the same size as X.
                                MATRIXANDINDEXNOTATION
                                It is useful to be able to convert between matrix notation and index notation. For example,
                                theproductABhaselements:
                                                                                                           [AB]ik =XAijBjk ,                                                                                             (4)
                                                                                                                               j
                                andthematrixproductABCT haselements:
                                                            £           T¤           X £ T¤ X X                                                                XX
                                                              ABC il = j Aij BC                                 jl = j Aij k BjkClk = j k AijBjkClk .                                                                    (5)
                                FIRST-ORDER DERIVATIVES
                                EXAMPLE1
                                Considerthisexample:
                                                                                                                 f =tr AXB .                                                                                             (6)
                                                                                                                              [          ]
                                     Wecanwritethisusingindexnotationas:
                                                 f =X AXB =XXA XB =XXA XX B =XXXA X B .                                                                                                                                  (7)
                                                              [          ]ii                      i j [       ]ji                      i j           jk     ki                           i j     jk     ki
                                                          i                          i     j                              i     j            k                         i    j     k
                                     Takingthederivativewithrespectto Xjk,weget:
                                                                                                     ∂f       =XA B = BA                                    .                                                            (8)
                                                                                                   ∂X                       i j    ki       [       ]kj
                                                                                                         jk          i
                                     The result has to be the same size as X, so we know that the indices of the rows and
                                columnsmustbe j andk,respectively. This means we have to transpose the result above
                                towritethederivativeinmatrixformas:
                                                                                                           ∂tr AXB
                                                                                                                   [          ]         T T
                                                                                                                   ∂X            =A B .                                                                                  (9)
                                                                                                                                                                                                                            2
                                EXAMPLE2
                                Similarly, we have:
                                                                                                       £       T ¤ XXX
                                                                                          f =tr AX B = i j k AijXkjBki ,                                                                                               (10)
                                sothatthederivativeis:
                                                                                                     ∂f       =XA B = BA                                    ,                                                          (11)
                                                                                                   ∂X                       i j    ki       [       ]kj
                                                                                                         kj          i
                                     TheXtermappearsin(10)withindiceskj,soweneedtowritethederivativeinmatrix
                                formsuchthatk istherowindexand j isthecolumnindex. Thus,wehave:
                                                                                                                     £       T ¤
                                                                                                            ∂tr AX B =BA .                                                                                             (12)
                                                                                                                      ∂X
                                MULTIPLE-ORDER
                                Nowconsideramorecomplicatedexample:
                                                                                                      £                T¤
                                                                                         f =tr AXBXC                                                                                                                   (13)
                                                                                            =XXXXXAijXjkBklXlmCim .                                                                                                    (14)
                                                                                                  i     j    k     l    m
                                     ThederivativehascontributionsfrombothappearancesofX.
                                TAKE1
                                Inindexnotation:
                                                                                  ∂f            XXX                                                  £           T ¤
                                                                                ∂Xjk = i l m AijBklXlmCim = BXC A kj ,                                                                                                 (15)
                                                                                  ∂f            XXX                                                  £ T              ¤
                                                                               ∂Xlm = i j k AijXjkBklCim = C AXB ml .                                                                                                  (16)
                                     Transposingappropriatelyandsummingthetermstogether,wehave:
                                                                                             £                 T¤
                                                                                     ∂tr AXBXC                              T        T T              T T T
                                                                                                 ∂X                  =A CX B +B X A C .                                                                                (17)
                                TAKE2
                                Wecanskipthistediousprocessbyapplying(9)foreachappearanceofX:
                                                                           £                T¤                                             £          T¤
                                                                  ∂tr AXBXC                            ∂tr[AXD]                   ∂tr EXC                          T      T         T
                                                                               ∂X                 =            ∂X             +             ∂X              =A D +E C .                                                (18)
                                                                                                                                                                                                                            3
                                where D=BXCT andE=AXB. Sowejustevaluatethematrixderivativeforeachappear-
                                anceofXassumingthateverythingelseisaconstant(includingotherX’s). Toseewhythis
                                rule is useful, consider the following beast:
                                                                                                                    £         T            T        ¤
                                                                                                       f =tr AXX BCX XC .                                                                                              (19)
                                     Wecanimmediatelywritedownthederivativeusing(9)and(12):
                             £          T            T        ¤                ¡                         ¢        ¡                   ¢                         ¡                   ¢      ¡                         ¢
                     ∂tr AXX BCX XC                                        T        T           T          T                 T                                             T                          T           T T            T
                                                                 = A             X BCX XC + BCX XC AX + XC AXX BC + AXX BCX                                                                                                 C
                                       ∂X                            ( )                                                                 (      )      (      )                                                           ( )
                                                                                                                                                                                                                       (20)
                                                                          T T            T T                       T                                  T                   T T            T T T
                                                                =AC X XC B X+BCX XCAX+XCAXX BC+XC B XX A C .                                                                                                           (21)
                                FROBENIUSNORM
                                TheFrobeniusnormshowsupwhenwehaveanoptimizationprobleminvolvingamatrix
                                factorization andwewanttominimizeasum-of-squareserrorcriterion:
                                                                         Ã                                    !
                                                              XX                        X                       2                                        £                                          ¤
                                                                                                                                            2                                                    T
                                                      f =                   X − W H                                =kX−WHk =tr X−WH X−WH                                                                .              (22)
                                                                               ik                 i j     jk                                F              (                )(                  )
                                                                i    k                    j
                                     Wecanworkwiththeexpressioninindexnotation, but it’s easier to work directly with
                                matricesandapplytheresultsderivedearlier. Supposewewanttofindthederivativewith
                                respecttoW. Expandingthematrixouterproduct,wehave:
                                                                               £       T¤             £       T       T¤            £            T¤             £             T      T¤
                                                                  f =tr XX                  −tr XH W −tr WHX +tr WHH W                                                                      .                          (23)
                                     Applying(9)and(12),weeasilydeducethat:
                                                                              ∂tr£ X−WH X−WHT¤
                                                                                        (                 )(                 )                        T                     T
                                                                                                     ∂W                              =−2XH +2WHH .                                                                     (24)
                                LOG
                                Considerthistraceexpression:
                                                                                  £ T                         ¤      XX                       µXX                                    ¶
                                                                     f =tr V log AXB =                                          V log                       A X B                         .                            (25)
                                                                                                 (          )                      i j                         im mn nj
                                                                                                                      i     j                    m n
                                     Takingthederivativewithrespectto Xmn,weget:
                                                            ∂f             XX                         Vij                                          XX µ V ¶
                                                         ∂Xmn = i j PPAimXmnBnjAimBnj = i j Aim AXB ijBnj .                                                                                                          (26)
                                                                                          m n
                                     Thus:
                                                                                                                                                                                                                            4
The words contained in this file might help you see if this file matches what you are looking for:

...Matrixcalculus notesonthederivativeofatrace johannestraa this write up elucidates the rules of matrix calculus for expressions involving trace a function x f tr g wewouldliketotakethederivativeof withrespecttox one strategy is to expression as scalar using index notation take derivative and re in form an easier way reduce problem or moresmallerproblemswheretheresultsforsimplerderivativescanbeapplied it sbrute forcevsbottom valuedderivative thederivativeofascalar withrespecttoamatrixx rm n canbewrittenas xn xm xmn so result same size matrixandindexnotation useful be able convert between example theproductabhaselements ik xaijbjk j andthematrixproductabct haselements t xx abc il aij bc jl k bjkclk aijbjkclk first order derivatives considerthisexample axb wecanwritethisusingindexnotationas xxa xb b xxxa ii i ji jk ki takingthederivativewithrespectto xjk weget xa ba kj has we know that indices rows columnsmustbe andk respectively means have transpose above towritethederivativeinmatrixforma...

no reviews yet
Please Login to review.