The Matrix Cookbook
http://matrixcookbook.com
Kaare Brandt Petersen Michael Syskind Pedersen
Version: November 15, 2012
Introduction
What is this? These pages are a collection of facts (identities, approximations, inequalities, relations, ...) about matrices and matters relating to them. It is collected in this form for the convenience of anyone who wants a quick desktop reference.
Disclaimer: The identities, approximations and relations presented here were obviously not invented but collected, borrowed and copied from a large amount of sources. These sources include similar but shorter notes found on the internet and appendices in books; see the references for a full list.
Errors: Very likely there are errors, typos, and mistakes for which we apologize and would be grateful to receive corrections at cookbook@2302.dk.
It's ongoing: The project of keeping a large repository of relations involving matrices is naturally ongoing and the version will be apparent from the date in the header.
Suggestions: Your suggestion for additional content or elaboration of some topics is most welcome at cookbook@2302.dk.
Keywords: Matrix algebra, matrix relations, matrix identities, derivative of determinant, derivative of inverse matrix, differentiate a matrix.
Acknowledgements: We would like to thank the following for contributions and suggestions: Bill Baxter, Brian Templeton, Christian Rishøj, Christian Schröppel, Dan Boley, Douglas L. Theobald, Esben Hoegh-Rasmussen, Evripidis Karseras, Georg Martius, Glynne Casteel, Jan Larsen, Jun Bin Gao, Jürgen Struckmeier, Kamil Dedecius, Karim T. Abou-Moustafa, Korbinian Strimmer, Lars Christiansen, Lars Kai Hansen, Leland Wilkinson, Liguo He, Loïc Thibaut, Markus Froeb, Michael Hubatka, Miguel Barão, Ole Winther, Pavel Sakov, Stephan Hattinger, Troels Pedersen, Vasile Sima, Vincent Rabaud, Zhaoshui He. We would also like to thank The Oticon Foundation for funding our PhD studies.
Contents

1 Basics 6
  1.1 Trace 6
  1.2 Determinant 6
  1.3 The Special Case 2x2 7
2 Derivatives 8
  2.1 Derivatives of a Determinant 8
  2.2 Derivatives of an Inverse 9
  2.3 Derivatives of Eigenvalues 10
  2.4 Derivatives of Matrices, Vectors and Scalar Forms 10
  2.5 Derivatives of Traces 12
  2.6 Derivatives of vector norms 14
  2.7 Derivatives of matrix norms 14
  2.8 Derivatives of Structured Matrices 14
3 Inverses 17
  3.1 Basic 17
  3.2 Exact Relations 18
  3.3 Implication on Inverses 20
  3.4 Approximations 20
  3.5 Generalized Inverse 21
  3.6 Pseudo Inverse 21
4 Complex Matrices 24
  4.1 Complex Derivatives 24
  4.2 Higher order and non-linear derivatives 26
  4.3 Inverse of complex sum 27
5 Solutions and Decompositions 28
  5.1 Solutions to linear equations 28
  5.2 Eigenvalues and Eigenvectors 30
  5.3 Singular Value Decomposition 31
  5.4 Triangular Decomposition 32
  5.5 LU decomposition 32
  5.6 LDM decomposition 33
  5.7 LDL decompositions 33
6 Statistics and Probability 34
  6.1 Definition of Moments 34
  6.2 Expectation of Linear Combinations 35
  6.3 Weighted Scalar Variable 36
7 Multivariate Distributions 37
  7.1 Cauchy 37
  7.2 Dirichlet 37
  7.3 Normal 37
  7.4 Normal-Inverse Gamma 37
  7.5 Gaussian 37
  7.6 Multinomial 37
  7.7 Student's t 37
  7.8 Wishart 38
  7.9 Wishart, Inverse 39
8 Gaussians 40
  8.1 Basics 40
  8.2 Moments 42
  8.3 Miscellaneous 44
  8.4 Mixture of Gaussians 44
9 Special Matrices 46
  9.1 Block matrices 46
  9.2 Discrete Fourier Transform Matrix, The 47
  9.3 Hermitian Matrices and skew-Hermitian 48
  9.4 Idempotent Matrices 49
  9.5 Orthogonal matrices 49
  9.6 Positive Definite and Semi-definite Matrices 50
  9.7 Single-entry Matrix, The 52
  9.8 Symmetric, Skew-symmetric/Antisymmetric 54
  9.9 Toeplitz Matrices 54
  9.10 Transition matrices 55
  9.11 Units, Permutation and Shift 56
  9.12 Vandermonde Matrices 57
10 Functions and Operators 58
  10.1 Functions and Series 58
  10.2 Kronecker and Vec Operator 59
  10.3 Vector Norms 61
  10.4 Matrix Norms 61
  10.5 Rank 62
  10.6 Integral Involving Dirac Delta Functions 62
  10.7 Miscellaneous 63
A One-dimensional Results 64
  A.1 Gaussian 64
  A.2 One Dimensional Mixture of Gaussians 65
B Proofs and Details 66
  B.1 Misc Proofs 66
Notation and Nomenclature

A           Matrix
A_ij        Matrix indexed for some purpose
A_i         Matrix indexed for some purpose
A^ij        Matrix indexed for some purpose
A^n         Matrix indexed for some purpose or the n.th power of a square matrix
A^{-1}      The inverse matrix of the matrix A
A^+         The pseudo inverse matrix of the matrix A (see Sec. 3.6)
A^{1/2}     The square root of a matrix (if unique), not elementwise
(A)_ij      The (i, j).th entry of the matrix A
A_ij        The (i, j).th entry of the matrix A
[A]_ij      The ij-submatrix, i.e. A with i.th row and j.th column deleted
a           Vector (column-vector)
a_i         Vector indexed for some purpose
a_i         The i.th element of the vector a
a           Scalar
Re(z)       Real part of a scalar
Re(z)       Real part of a vector
Re(Z)       Real part of a matrix
Im(z)       Imaginary part of a scalar
Im(z)       Imaginary part of a vector
Im(Z)       Imaginary part of a matrix
det(A)      Determinant of A
Tr(A)       Trace of the matrix A
diag(A)     Diagonal matrix of the matrix A, i.e. (diag(A))_ij = delta_ij A_ij
eig(A)      Eigenvalues of the matrix A
vec(A)      The vector-version of the matrix A (see Sec. 10.2.2)
sup         Supremum of a set
||A||       Matrix norm (subscript if any denotes what norm)
A^T         Transposed matrix
A^{-T}      The inverse of the transposed and vice versa, A^{-T} = (A^{-1})^T = (A^T)^{-1}
A^*         Complex conjugated matrix
A^H         Transposed and complex conjugated matrix (Hermitian)
A o B       Hadamard (elementwise) product
A (x) B     Kronecker product
0           The null matrix. Zero in all entries.
I           The identity matrix
J^ij        The single-entry matrix, 1 at (i, j) and zero elsewhere
Sigma       A positive definite matrix
Lambda      A diagonal matrix
1 Basics

(AB)^{-1} = B^{-1} A^{-1}            (1)
(ABC...)^{-1} = ...C^{-1} B^{-1} A^{-1}            (2)
(A^T)^{-1} = (A^{-1})^T            (3)
(A + B)^T = A^T + B^T            (4)
(AB)^T = B^T A^T            (5)
(ABC...)^T = ...C^T B^T A^T            (6)
(A^H)^{-1} = (A^{-1})^H            (7)
(A + B)^H = A^H + B^H            (8)
(AB)^H = B^H A^H            (9)
(ABC...)^H = ...C^H B^H A^H            (10)

1.1 Trace

Tr(A) = sum_i A_ii            (11)
Tr(A) = sum_i lambda_i,   lambda_i = eig(A)            (12)
Tr(A) = Tr(A^T)            (13)
Tr(AB) = Tr(BA)            (14)
Tr(A + B) = Tr(A) + Tr(B)            (15)
Tr(ABC) = Tr(BCA) = Tr(CAB)            (16)
a^T a = Tr(a a^T)            (17)

1.2 Determinant

Let A be an n x n matrix.
det(A) = prod_i lambda_i,   lambda_i = eig(A)            (18)
det(cA) = c^n det(A),   if A in R^{n x n}            (19)
det(A^T) = det(A)            (20)
det(AB) = det(A) det(B)            (21)
det(A^{-1}) = 1/det(A)            (22)
det(A^n) = det(A)^n            (23)
det(I + u v^T) = 1 + u^T v            (24)

For n = 2:
det(I + A) = 1 + det(A) + Tr(A)            (25)
For n = 3:
det(I + A) = 1 + det(A) + Tr(A) + (1/2)Tr(A)^2 - (1/2)Tr(A^2)            (26)
For n = 4:
det(I + A) = 1 + det(A) + Tr(A) + (1/2)Tr(A)^2 - (1/2)Tr(A^2) + (1/6)Tr(A)^3 - (1/2)Tr(A)Tr(A^2) + (1/3)Tr(A^3)            (27)
For small epsilon, the following approximation holds
det(I + epsilon A) ~= 1 + det(A) + epsilon Tr(A) + (1/2)epsilon^2 Tr(A)^2 - (1/2)epsilon^2 Tr(A^2)            (28)

1.3 The Special Case 2x2

Consider the matrix A
A = [ A_11  A_12 ]
    [ A_21  A_22 ]

Determinant and trace
det(A) = A_11 A_22 - A_12 A_21            (29)
Tr(A) = A_11 + A_22            (30)

Eigenvalues
lambda^2 - lambda Tr(A) + det(A) = 0
lambda_1 = (Tr(A) + sqrt(Tr(A)^2 - 4 det(A)))/2,   lambda_2 = (Tr(A) - sqrt(Tr(A)^2 - 4 det(A)))/2
lambda_1 + lambda_2 = Tr(A),   lambda_1 lambda_2 = det(A)

Eigenvectors
v_1 proportional to [ A_12 ; lambda_1 - A_11 ],   v_2 proportional to [ A_12 ; lambda_2 - A_11 ]

Inverse
A^{-1} = (1/det(A)) [  A_22  -A_12 ]
                    [ -A_21   A_11 ]            (31)
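The following is a minimal NumPy sketch (not part of the original text) that checks the 2x2 relations (29)-(31) numerically; the matrix values are arbitrary example numbers.

```python
import numpy as np

# Check the 2x2 determinant, trace, eigenvalue and inverse formulas (29)-(31).
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

detA = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]          # eq (29)
trA  = A[0, 0] + A[1, 1]                              # eq (30)

# Roots of lambda^2 - Tr(A) lambda + det(A) = 0
disc = np.sqrt(trA**2 - 4 * detA)
lam1, lam2 = (trA + disc) / 2, (trA - disc) / 2

# Explicit inverse, eq (31)
Ainv = np.array([[ A[1, 1], -A[0, 1]],
                 [-A[1, 0],  A[0, 0]]]) / detA

assert np.isclose(detA, np.linalg.det(A))
assert np.isclose(trA, np.trace(A))
assert np.allclose(sorted([lam1, lam2]), sorted(np.linalg.eigvals(A).real))
assert np.allclose(Ainv, np.linalg.inv(A))
print("2x2 identities verified")
```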
2 Derivatives

This section is covering differentiation of a number of expressions with respect to a matrix X. Note that it is always assumed that X has no special structure, i.e. that the elements of X are independent (e.g. not symmetric, Toeplitz, positive definite). See section 2.8 for differentiation of structured matrices. The basic assumptions can be written in a formula as
dX_kl/dX_ij = delta_ik delta_lj            (32)
that is for e.g. vector forms,
[dx/dy]_i = dx_i/dy,   [dx/dy]_i = dx/dy_i,   [dx/dy]_ij = dx_i/dy_j
The following rules are general and very useful when deriving the differential of an expression [19]:
d(A) = 0   (A is a constant)            (33)
d(alpha X) = alpha dX            (34)
d(X + Y) = dX + dY            (35)
d(Tr(X)) = Tr(dX)            (36)
d(XY) = (dX)Y + X(dY)            (37)
d(X o Y) = (dX) o Y + X o (dY)            (38)
d(X (x) Y) = (dX) (x) Y + X (x) (dY)            (39)
d(X^{-1}) = -X^{-1}(dX)X^{-1}            (40)
d(det(X)) = Tr(adj(X) dX)            (41)
d(det(X)) = det(X) Tr(X^{-1} dX)            (42)
d(ln det(X)) = Tr(X^{-1} dX)            (43)
dX^T = (dX)^T            (44)
dX^H = (dX)^H            (45)

2.1 Derivatives of a Determinant

2.1.1 General form
d det(Y)/dx = det(Y) Tr[ Y^{-1} dY/dx ]            (46)
sum_k (d det(X)/dX_ik) X_jk = delta_ij det(X)            (47)
d^2 det(Y)/dx^2 = det(Y) [ Tr( Y^{-1} d(dY/dx)/dx ) + Tr( Y^{-1} dY/dx ) Tr( Y^{-1} dY/dx ) - Tr( (Y^{-1} dY/dx)(Y^{-1} dY/dx) ) ]            (48)
2.1.2 Linear forms
d det(X)/dX = det(X) (X^{-1})^T            (49)
sum_k (d det(X)/dX_ik) X_jk = delta_ij det(X)            (50)
d det(AXB)/dX = det(AXB) (X^{-1})^T = det(AXB) (X^T)^{-1}            (51)
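As an illustration (not part of the original text), the linear-form result (49) can be checked entry-wise by central finite differences; the matrix is an arbitrary random example.

```python
import numpy as np

# Finite-difference check of eq (49): d det(X)/dX = det(X) * (X^{-1})^T.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))

analytic = np.linalg.det(X) * np.linalg.inv(X).T

eps = 1e-6
numeric = np.zeros_like(X)
for i in range(4):
    for j in range(4):
        Xp, Xm = X.copy(), X.copy()
        Xp[i, j] += eps
        Xm[i, j] -= eps
        numeric[i, j] = (np.linalg.det(Xp) - np.linalg.det(Xm)) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-5)
print("eq (49) verified numerically")
```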
2.1.3 Square forms
If X is square and invertible, then
d det(X^T A X)/dX = 2 det(X^T A X) X^{-T}            (52)
If X is not square but A is symmetric, then
d det(X^T A X)/dX = 2 det(X^T A X) A X (X^T A X)^{-1}            (53)
If X is not square and A is not symmetric, then
d det(X^T A X)/dX = det(X^T A X) ( A X (X^T A X)^{-1} + A^T X (X^T A^T X)^{-1} )            (54)

2.1.4 Other nonlinear forms
Some special cases are (see [9, 7])
d ln det(X^T X)/dX = 2 (X^+)^T            (55)
d ln det(X^T X)/dX^+ = -2 X^T            (56)
d ln |det(X)|/dX = (X^{-1})^T = (X^T)^{-1}            (57)
d det(X^k)/dX = k det(X^k) X^{-T}            (58)
2.2 Derivatives of an Inverse

From [27] we have the basic identity
dY^{-1}/dx = -Y^{-1} (dY/dx) Y^{-1}            (59)
from which it follows
d(X^{-1})_kl/dX_ij = -(X^{-1})_ki (X^{-1})_jl            (60)
d(a^T X^{-1} b)/dX = -X^{-T} a b^T X^{-T}            (61)
d det(X^{-1})/dX = -det(X^{-1}) (X^{-1})^T            (62)
d Tr(A X^{-1} B)/dX = -(X^{-1} B A X^{-1})^T            (63)
d Tr((X + A)^{-1})/dX = -((X + A)^{-1}(X + A)^{-1})^T            (64)

From (32) we have the following result: Let A be an n x n invertible square matrix, W be the inverse of A, and J(A) an n x n-variate and differentiable function with respect to A; then the partial differentials of J with respect to A and W satisfy
dJ/dA = -A^{-T} (dJ/dW) A^{-T}
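A small numerical sketch (not from the original text) of the basic identity (59), using the illustrative parametrisation Y(x) = A + x B with arbitrary example matrices:

```python
import numpy as np

# Finite-difference check of eq (59): d(Y^{-1})/dx = -Y^{-1} (dY/dx) Y^{-1}.
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3)) + 3 * np.eye(3)   # keep Y well conditioned
B = rng.standard_normal((3, 3))
Y = lambda x: A + x * B

x0, eps = 0.5, 1e-6
dY_dx = B
analytic = -np.linalg.inv(Y(x0)) @ dY_dx @ np.linalg.inv(Y(x0))
numeric = (np.linalg.inv(Y(x0 + eps)) - np.linalg.inv(Y(x0 - eps))) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-5)
print("eq (59) verified numerically")
```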
2.3 Derivatives of Eigenvalues

d/dX sum eig(X) = d/dX Tr(X) = I            (65)
d/dX prod eig(X) = d/dX det(X) = det(X) X^{-T}            (66)
If A is real and symmetric, lambda_i and v_i are distinct eigenvalues and eigenvectors of A (see (276)) with v_i^T v_i = 1, then [33]
d lambda_i = v_i^T d(A) v_i            (67)
d v_i = (lambda_i I - A)^+ d(A) v_i            (68)

2.4 Derivatives of Matrices, Vectors and Scalar Forms

2.4.1 First Order
d(x^T a)/dx = d(a^T x)/dx = a            (69)
d(a^T X b)/dX = a b^T            (70)
d(a^T X^T b)/dX = b a^T            (71)
d(a^T X a)/dX = d(a^T X^T a)/dX = a a^T            (72)
dX/dX_ij = J^ij            (73)
d(XA)_ij/dX_mn = delta_im (A)_nj = (J^mn A)_ij            (74)
d(X^T A)_ij/dX_mn = delta_in (A)_mj = (J^nm A)_ij            (75)
2.4.2 Second Order

d/dX_ij sum_klmn X_kl X_mn = 2 sum_kl X_kl            (76)
d(b^T X^T X c)/dX = X(b c^T + c b^T)            (77)
d((Bx + b)^T C (Dx + d))/dx = B^T C (Dx + d) + D^T C^T (Bx + b)            (78)
d(X^T B X)_kl/dX_ij = delta_lj (X^T B)_ki + delta_kj (BX)_il            (79)
d(X^T B X)/dX_ij = X^T B J^ij + J^ji B X,   (J^ij)_kl = delta_ik delta_jl            (80)
See Sec 9.7 for useful properties of the single-entry matrix J^ij.
d(x^T B x)/dx = (B + B^T) x            (81)
d(b^T X^T D X c)/dX = D^T X b c^T + D X c b^T            (82)
d/dX (Xb + c)^T D (Xb + c) = (D + D^T)(Xb + c) b^T            (83)
Assume W is symmetric, then
d/ds (x - As)^T W (x - As) = -2 A^T W (x - As)            (84)
d/dx (x - s)^T W (x - s) = 2 W (x - s)            (85)
d/ds (x - s)^T W (x - s) = -2 W (x - s)            (86)
d/dx (x - As)^T W (x - As) = 2 W (x - As)            (87)
d/dA (x - As)^T W (x - As) = -2 W (x - As) s^T            (88)
As a case with complex values the following holds
d|a - x^H b|^2/dx = -2 b (a - x^H b)^*            (89)
This formula is also known from the LMS algorithm [14].

2.4.3 Higher-order and non-linear

d(X^n)_kl/dX_ij = sum_{r=0}^{n-1} (X^r J^ij X^{n-1-r})_kl            (90)
For proof of the above, see B.1.3.
d/dX a^T X^n b = sum_{r=0}^{n-1} (X^r)^T a b^T (X^{n-1-r})^T            (91)
d/dX a^T (X^n)^T X^n b = sum_{r=0}^{n-1} [ X^{n-1-r} a b^T (X^n)^T X^r + (X^r)^T X^n a b^T (X^{n-1-r})^T ]            (92)
See B.1.3 for a proof.
Assume s and r are functions of x, i.e. s = s(x), r = r(x), and that A is a constant; then
d(s^T A r)/dx = (ds/dx)^T A r + (dr/dx)^T A^T s            (93)
d/dx [ (Ax)^T(Ax) / ((Bx)^T(Bx)) ] = d/dx [ x^T A^T A x / (x^T B^T B x) ]            (94)
   = 2 A^T A x / (x^T B^T B x) - 2 (x^T A^T A x) B^T B x / (x^T B^T B x)^2            (95)
2.4.4 Gradient and Hessian

Using the above we have for the gradient and the Hessian
f = x^T A x + b^T x            (96)
df/dx = (A + A^T) x + b            (97)
d^2 f/(dx dx^T) = A + A^T            (98)

2.5 Derivatives of Traces

Assume F(X) to be a differentiable function of each of the elements of X. It then holds that
d Tr(F(X))/dX = f(X)^T
where f(.) is the scalar derivative of F(.).
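A quick numerical illustration (not from the original text) of this trace rule, using F(X) = X^2 so that the scalar derivative is f(x) = 2x and the rule gives 2X^T (which is also eq (106) below); example values are arbitrary.

```python
import numpy as np

# Check d Tr(F(X))/dX = f(X)^T for F(X) = X^2, i.e. the derivative is 2 X^T.
rng = np.random.default_rng(2)
X = rng.standard_normal((3, 3))

analytic = 2 * X.T

eps = 1e-6
numeric = np.zeros_like(X)
for i in range(3):
    for j in range(3):
        Xp, Xm = X.copy(), X.copy()
        Xp[i, j] += eps
        Xm[i, j] -= eps
        numeric[i, j] = (np.trace(Xp @ Xp) - np.trace(Xm @ Xm)) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-5)
print("trace rule verified for F(X) = X^2")
```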
2.5.1 First Order

d Tr(X)/dX = I            (99)
d Tr(XA)/dX = A^T            (100)
d Tr(AXB)/dX = A^T B^T            (101)
d Tr(A X^T B)/dX = B A            (102)
d Tr(X^T A)/dX = A            (103)
d Tr(A X^T)/dX = A            (104)
d Tr(A (x) X)/dX = Tr(A) I            (105)
2.5.2 Second Order

d Tr(X^2)/dX = 2 X^T            (106)
d Tr(X^2 B)/dX = (XB + BX)^T            (107)
d Tr(X^T B X)/dX = BX + B^T X            (108)
d Tr(B X X^T)/dX = BX + B^T X            (109)
d Tr(X X^T B)/dX = BX + B^T X            (110)
d Tr(X B X^T)/dX = X B^T + X B            (111)
d Tr(B X^T X)/dX = X B^T + X B            (112)
d Tr(X^T X B)/dX = X B^T + X B            (113)
d Tr(A X B X)/dX = A^T X^T B^T + B^T X^T A^T            (114)
d Tr(X^T X)/dX = d Tr(X X^T)/dX = 2 X            (115)
d Tr(B^T X^T C X B)/dX = C^T X B B^T + C X B B^T            (116)
d Tr(X^T B X C)/dX = B X C + B^T X C^T            (117)
d Tr(A X B X^T C)/dX = A^T C^T X B^T + C A X B            (118)
d Tr[(A X B + C)(A X B + C)^T]/dX = 2 A^T (A X B + C) B^T            (119)
d Tr(X (x) X)/dX = d Tr(X)Tr(X)/dX = 2 Tr(X) I            (120)
See [7].

2.5.3 Higher Order

d Tr(X^k)/dX = k (X^{k-1})^T            (121)
d Tr(A X^k)/dX = sum_{r=0}^{k-1} (X^r A X^{k-r-1})^T            (122)
d Tr[B^T X^T C X X^T C X B]/dX = C X X^T C X B B^T + C^T X B B^T X^T C^T X + C X B B^T X^T C X + C^T X X^T C^T X B B^T            (123)
2.5.4 Other

d Tr(A X^{-1} B)/dX = -(X^{-1} B A X^{-1})^T = -X^{-T} A^T B^T X^{-T}            (124)
Assume B and C to be symmetric, then
d Tr[(X^T C X)^{-1} A]/dX = -(C X (X^T C X)^{-1}) (A + A^T) (X^T C X)^{-1}            (125)
d Tr[(X^T C X)^{-1} (X^T B X)]/dX = -2 C X (X^T C X)^{-1} X^T B X (X^T C X)^{-1} + 2 B X (X^T C X)^{-1}            (126)
d Tr[(A + X^T C X)^{-1} (X^T B X)]/dX = -2 C X (A + X^T C X)^{-1} X^T B X (A + X^T C X)^{-1} + 2 B X (A + X^T C X)^{-1}            (127)
See [7].
d Tr(sin(X))/dX = cos(X)^T            (128)

2.6 Derivatives of vector norms

2.6.1 Two-norm
d ||x - a||_2 / dx = (x - a)/||x - a||_2            (129)
d/dx (x - a)/||x - a||_2 = I/||x - a||_2 - (x - a)(x - a)^T/||x - a||_2^3            (130)
d ||x||_2^2 / dx = d (x^T x)/dx = 2 x            (131)

2.7 Derivatives of matrix norms

For more on matrix norms, see Sec. 10.4.

2.7.1 Frobenius norm
d ||X||_F^2 / dX = d Tr(X X^H)/dX = 2 X            (132)
See (248). Note that this is also a special case of the result in equation (119).

2.8 Derivatives of Structured Matrices

Assume that the matrix A has some structure, i.e. symmetric, Toeplitz, etc. In that case the derivatives of the previous section do not apply in general. Instead, consider the following general rule for differentiating a scalar function f(A):
df/dA_ij = sum_kl (df/dA_kl)(dA_kl/dA_ij) = Tr[ (df/dA)^T dA/dA_ij ]            (133)
The matrix differentiated with respect to itself is in this document referred to as the structure matrix of A and is defined simply by
dA/dA_ij = S^ij            (134)
If A has no special structure we have simply S^ij = J^ij, that is, the structure matrix is simply the single-entry matrix. Many structures have a representation in single-entry matrices, see Sec. 9.7.6 for more examples of structure matrices.
2.8.1 The Chain Rule

Sometimes the objective is to find the derivative of a matrix which is a function of another matrix. Let U = f(X); the goal is to find the derivative of the function g(U) with respect to X:
dg(U)/dX = dg(f(X))/dX            (135)
The Chain Rule can then be written the following way:
dg(U)/dx_ij = sum_{k=1}^{M} sum_{l=1}^{N} (dg(U)/du_kl)(du_kl/dx_ij)            (136)
Using matrix notation, this can be written as:
dg(U)/dX_ij = Tr[ (dg(U)/dU)^T dU/dX_ij ]            (137)
2.8.2 Symmetric

If A is symmetric, then S^ij = J^ij + J^ji - J^ij o J^ij and therefore
df/dA = [df/dA] + [df/dA]^T - diag[df/dA]            (138)
That is, e.g. ([5]):
d Tr(AX)/dX = A + A^T - (A o I), see (142)            (139)
d det(X)/dX = det(X) (2 X^{-1} - (X^{-1} o I))            (140)
d ln det(X)/dX = 2 X^{-1} - (X^{-1} o I)            (141)

2.8.3 Diagonal

If X is diagonal, then ([19]):
d Tr(AX)/dX = A o I            (142)
2.8.4 Toeplitz

Like symmetric matrices and diagonal matrices, Toeplitz matrices also have a special structure which should be taken into account when computing the derivative with respect to a matrix with Toeplitz structure:

d Tr(AT)/dT = d Tr(TA)/dT =
 [ Tr(A)                       Tr([A^T]_{n1})              Tr([[A^T]_{n1}]_{n-1,2})   ...    A_{n1}                 ]
 [ Tr([A^T]_{1n})              Tr(A)                       Tr([A^T]_{n1})             ...    ...                    ]
 [ Tr([[A^T]_{1n}]_{2,n-1})    ...                         ...                        ...    Tr([[A^T]_{n1}]_{n-1,2}) ]
 [ ...                         ...                         ...                        ...    Tr([A^T]_{n1})         ]
 [ A_{1n}                      ...                         Tr([[A^T]_{1n}]_{2,n-1})   Tr([A^T]_{1n})   Tr(A)        ]
 == alpha(A)            (143)

As can be seen, the derivative alpha(A) also has a Toeplitz structure. Each value in the diagonal is the sum of all the diagonal values in A; the values in the diagonals next to the main diagonal equal the sum of the diagonal next to the main diagonal in A^T; and so on. This result is only valid for the unconstrained Toeplitz matrix. If the Toeplitz matrix also is symmetric, the same derivative yields
d Tr(AT)/dT = d Tr(TA)/dT = alpha(A + A^T) - alpha(A) o I            (144)
3 Inverses

3.1 Basic

3.1.1 Definition
The inverse A^{-1} of a matrix A in C^{n x n} is defined such that
A A^{-1} = A^{-1} A = I,            (145)
where I is the n x n identity matrix. If A^{-1} exists, A is said to be nonsingular. Otherwise, A is said to be singular (see e.g. [12]).

3.1.2 Cofactors and Adjoint
The submatrix of a matrix A, denoted by [A]_ij, is the (n-1) x (n-1) matrix obtained by deleting the ith row and the jth column of A. The (i, j) cofactor of a matrix is defined as
cof(A, i, j) = (-1)^{i+j} det([A]_ij)            (146)
The matrix of cofactors can be created from the cofactors
cof(A) = [ cof(A, 1, 1)  ...  cof(A, 1, n) ]
         [ ...            cof(A, i, j)  ... ]
         [ cof(A, n, 1)  ...  cof(A, n, n) ]            (147)
The adjoint matrix is the transpose of the cofactor matrix
adj(A) = (cof(A))^T            (148)

3.1.3 Determinant
The determinant of a matrix A in C^{n x n} is defined as (see [12])
det(A) = sum_{j=1}^{n} (-1)^{j+1} A_{1j} det([A]_{1j})            (149)
       = sum_{j=1}^{n} A_{1j} cof(A, 1, j)            (150)
For the case of 2 x 2 matrices, see section 1.3.

3.1.4 Construction
The inverse matrix can be constructed, using the adjoint matrix, by
A^{-1} = (1/det(A)) adj(A)            (151)
3.1.5 Condition number
The condition number of a matrix c(A) is the ratio between the largest and the smallest singular value of a matrix (see Section 5.3 on singular values),
c(A) = d_max / d_min            (152)
The condition number can be used to measure how singular a matrix is. If the condition number is large, it indicates that the matrix is nearly singular. The condition number can also be estimated from the matrix norms. Here
c(A) = ||A|| ||A^{-1}||,            (153)
where ||.|| is a norm such as e.g. the 1-norm, the 2-norm, the infinity-norm or the Frobenius norm (see Sec 10.4 for more on matrix norms).
The 2-norm of A equals sqrt(max eig(A^H A)) [12, p. 57]. For a symmetric matrix, this reduces to ||A||_2 = max |eig(A)| [12, p. 394]. If the matrix is symmetric and positive definite, ||A||_2 = max eig(A). The condition number based on the 2-norm thus reduces to
||A||_2 ||A^{-1}||_2 = max eig(A) max eig(A^{-1}) = max eig(A) / min eig(A).            (154)
3.2 Exact Relations

3.2.1 Basic
(AB)^{-1} = B^{-1} A^{-1}            (155)

3.2.2 The Woodbury identity
The Woodbury identity comes in many variants. The latter of the two can be found in [12]:
(A + C B C^T)^{-1} = A^{-1} - A^{-1} C (B^{-1} + C^T A^{-1} C)^{-1} C^T A^{-1}            (156)
(A + U B V)^{-1} = A^{-1} - A^{-1} U (B^{-1} + V A^{-1} U)^{-1} V A^{-1}            (157)
If P, R are positive definite, then (see [30])
(P^{-1} + B^T R^{-1} B)^{-1} B^T R^{-1} = P B^T (B P B^T + R)^{-1}            (158)

3.2.3 The Kailath Variant
(A + B C)^{-1} = A^{-1} - A^{-1} B (I + C A^{-1} B)^{-1} C A^{-1}            (159)
See [4], page 153.

3.2.4 Sherman-Morrison
(A + b c^T)^{-1} = A^{-1} - (A^{-1} b c^T A^{-1}) / (1 + c^T A^{-1} b)            (160)
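The following sketch (not part of the original text) verifies the Woodbury identity (156) and the Sherman-Morrison formula (160) on arbitrary random matrices:

```python
import numpy as np

# Numerical check of Woodbury (156) and Sherman-Morrison (160).
rng = np.random.default_rng(3)
n, k = 5, 2
A = rng.standard_normal((n, n)) + n * np.eye(n)
C = rng.standard_normal((n, k))
B = rng.standard_normal((k, k)) + k * np.eye(k)
Ai = np.linalg.inv(A)

# (A + C B C^T)^{-1} = A^{-1} - A^{-1} C (B^{-1} + C^T A^{-1} C)^{-1} C^T A^{-1}
lhs = np.linalg.inv(A + C @ B @ C.T)
rhs = Ai - Ai @ C @ np.linalg.inv(np.linalg.inv(B) + C.T @ Ai @ C) @ C.T @ Ai
assert np.allclose(lhs, rhs)

# (A + b c^T)^{-1} = A^{-1} - (A^{-1} b c^T A^{-1}) / (1 + c^T A^{-1} b)
b, c = rng.standard_normal(n), rng.standard_normal(n)
lhs = np.linalg.inv(A + np.outer(b, c))
rhs = Ai - (Ai @ np.outer(b, c) @ Ai) / (1 + c @ Ai @ b)
assert np.allclose(lhs, rhs)
print("Woodbury and Sherman-Morrison verified")
```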
3.2.5 The Searle Set of Identities
The following set of identities can be found in [25], page 151:
(I + A^{-1})^{-1} = A (A + I)^{-1}            (161)
(A + B B^T)^{-1} B = A^{-1} B (I + B^T A^{-1} B)^{-1}            (162)
(A^{-1} + B^{-1})^{-1} = A (A + B)^{-1} B = B (A + B)^{-1} A            (163)
A - A (A + B)^{-1} A = B - B (A + B)^{-1} B            (164)
A^{-1} + B^{-1} = A^{-1} (A + B) B^{-1}            (165)
(I + A B)^{-1} = I - A (I + B A)^{-1} B            (166)
(I + A B)^{-1} A = A (I + B A)^{-1}            (167)

3.2.6 Rank-1 update of inverse of inner product
Denote A = (X^T X)^{-1} and assume X is extended to include a new column vector in the end, X~ = [X v]. Then [34]
(X~^T X~)^{-1} =
 [ A + (A X^T v v^T X A^T)/(v^T v - v^T X A X^T v)    -(A X^T v)/(v^T v - v^T X A X^T v) ]
 [ -(v^T X A^T)/(v^T v - v^T X A X^T v)                1/(v^T v - v^T X A X^T v)          ]            (168)

3.2.7 Rank-1 update of Moore-Penrose Inverse
The following is a rank-1 update for the Moore-Penrose pseudo-inverse of real valued matrices and a proof can be found in [18]. The matrix G is defined below:
Using the notation
AcdT A G
168
169 170 171 172 173
vn
wm
1dTAc Ac
ATd
IAAc IAATd
the solution is given as six different cases, depending on the entities w,
m, and . Please note, that for any column vector v it holds that v
vT vT v1vT . The solution is: v2
Case1of6: Ifw 0andm 0. Then
GvwmT nTmT w
1 vwT1 mnT mwT w2 m2 m2w2
Case2of6: Ifw0andm 0and0. Then GvvAmT nT
1 vvTA 1 mnT v2 m2
174 175
176 177
Case3of6: Ifw0and0. Then
1v2 m2 T
GmvTAv2m22mv
Case4of6: Ifw 0andm0and0. Then GAnnvw
1AnnT 1vwT n2 w2
Case5of6: Ifm0and0. Then
1w2 n2 T
ATvn 178
G AnwT n2w2 2Anvwn Case6of6: Ifw0andm0and0. Then
179 180
181
182 183
184
185
holds
186
187
GvvAAnnvAnvn
1 vvTA 1 AnnT vTAnvnT
3.3 Implication on Inverses

If (A + B)^{-1} = A^{-1} + B^{-1} then A B^{-1} A = B A^{-1} B            (184)
See [25].

3.3.1 A PosDef identity
Assume P, R to be positive definite and invertible, then
(P^{-1} + B^T R^{-1} B)^{-1} B^T R^{-1} = P B^T (B P B^T + R)^{-1}            (185)
See [30].
3.4 Approximations

The following identity is known as the Neumann series of a matrix, which holds when |lambda_i| < 1 for all eigenvalues lambda_i:
(I - A)^{-1} = sum_{n=0}^{infinity} A^n            (186)
which is equivalent to
(I + A)^{-1} = sum_{n=0}^{infinity} (-1)^n A^n            (187)
When |lambda_i| < 1 for all eigenvalues lambda_i, it holds that A^n -> 0 for n -> infinity, and the following approximations hold
(I - A)^{-1} ~= I + A + A^2            (188)
(I + A)^{-1} ~= I - A + A^2            (189)
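A minimal numerical sketch (not from the original text) of (186) and (188); the example matrix is rescaled so that its spectral radius is below one:

```python
import numpy as np

# Illustrate the Neumann series (186) and the truncation (188).
rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))
A *= 0.3 / np.max(np.abs(np.linalg.eigvals(A)))   # spectral radius 0.3

exact = np.linalg.inv(np.eye(4) - A)

partial = np.zeros((4, 4))
term = np.eye(4)
for _ in range(50):            # sum_{n=0}^{49} A^n
    partial += term
    term = term @ A

print("50-term series error :", np.max(np.abs(partial - exact)))
print("I + A + A^2   error  :", np.max(np.abs(np.eye(4) + A + A @ A - exact)))
```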
The following approximation is from [22] and holds when A is large and symmetric
A - A (I + A)^{-1} A ~= I - A^{-1}            (190)
If sigma^2 is small compared to Q and M then
(Q + sigma^2 M)^{-1} ~= Q^{-1} - sigma^2 Q^{-1} M Q^{-1}            (191)
Proof:
(Q + sigma^2 M)^{-1} = (Q Q^{-1} Q + sigma^2 M Q^{-1} Q)^{-1}            (192)
 = ((I + sigma^2 M Q^{-1}) Q)^{-1}            (193)
 = Q^{-1} (I + sigma^2 M Q^{-1})^{-1}            (194)
 ~= Q^{-1} - sigma^2 Q^{-1} M Q^{-1}            (195)
This can be rewritten using the Taylor expansion:
Q^{-1} (I + sigma^2 M Q^{-1})^{-1} = Q^{-1} (I - sigma^2 M Q^{-1} + (sigma^2 M Q^{-1})^2 - ...)            (196)
 ~= Q^{-1} - sigma^2 Q^{-1} M Q^{-1}            (197)
3.5 Generalized Inverse

3.5.1 Definition
A generalized inverse matrix of the matrix A is any matrix A^- such that (see [26])
A A^- A = A            (198)
The matrix A^- is not unique.

3.6 Pseudo Inverse

3.6.1 Definition
The pseudo inverse (or Moore-Penrose inverse) of a matrix A is the matrix A^+ that fulfils
I    A A^+ A = A
II   A^+ A A^+ = A^+
III  A A^+   symmetric
IV   A^+ A   symmetric
The matrix A^+ is unique and does always exist. Note that in case of complex matrices, the symmetric condition is substituted by a condition of being Hermitian.
3.6.2 Properties

Assume A^+ to be the pseudo-inverse of A, then (see [3] for some of them)
(A^+)^+ = A            (199)
(A^T)^+ = (A^+)^T            (200)
(A^H)^+ = (A^+)^H            (201)
(A^*)^+ = (A^+)^*            (202)
(A^H A)^+ = A^+ (A^H)^+            (203)
(A A^H)^+ = (A^H)^+ A^+            (204)
(cA)^+ = (1/c) A^+            (205)
A^+ = (A^T A)^+ A^T            (206)
A^+ = A^T (A A^T)^+            (207)
(A^T A)^+ = A^+ (A^T)^+            (208)
(A A^T)^+ = (A^T)^+ A^+            (209)
A^+ = (A^H A)^+ A^H            (210)
A^+ = A^H (A A^H)^+            (211)
(A^H A)^+ = A^+ (A^H)^+            (212)
(A A^H)^+ = (A^H)^+ A^+            (213)
(AB)^+ = (A^+ A B)^+ (A B B^+)^+            (214)
f(A^H A) - f(0) I = A^+ [ f(A A^H) - f(0) I ] A            (215)
f(A A^H) - f(0) I = A [ f(A^H A) - f(0) I ] A^+            (216)
where A in C^{n x m}.

Assume A to have full rank, then
(A A^+)(A A^+) = A A^+            (217)
(A^+ A)(A^+ A) = A^+ A            (218)
Tr(A A^+) = rank(A A^+)   (See [26])            (219)
Tr(A^+ A) = rank(A^+ A)   (See [26])            (220)

For two matrices it holds that
(A B)^+ = (A^+ A B)^+ (A B B^+)^+            (221)
(A (x) B)^+ = A^+ (x) B^+            (222)
3.6.3 Construction

Assume that A has full rank, then

A  n x n   Square   rank(A) = n   =>   A^+ = A^{-1}
A  n x m   Broad    rank(A) = n   =>   A^+ = A^T (A A^T)^{-1}
A  n x m   Tall     rank(A) = m   =>   A^+ = (A^T A)^{-1} A^T

The so-called "broad version" is also known as right inverse and the "tall version" as the left inverse.
Assume A does not have full rank, i.e. A is n x m and rank(A) = r < min(n, m). The pseudo inverse A^+ can be constructed from the singular value decomposition A = U D V^T, by
A^+ = V_r D_r^{-1} U_r^T            (223)
where U_r, D_r, and V_r are the matrices with the degenerated rows and columns deleted. A different way is this: there do always exist two matrices C (n x r) and D (r x m) of rank r, such that A = C D. Using these matrices it holds that
A^+ = D^T (D D^T)^{-1} (C^T C)^{-1} C^T            (224)
See [3].
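A short sketch (not part of the original text) constructing A^+ via the truncated SVD of eq (223) for a deliberately rank-deficient random matrix, and checking the Moore-Penrose conditions I-IV:

```python
import numpy as np

# Build A^+ from the SVD (eq 223) and compare with numpy's pinv.
rng = np.random.default_rng(5)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 3))   # rank 2

U, d, Vt = np.linalg.svd(A, full_matrices=False)
r = np.sum(d > 1e-10 * d[0])                  # numerical rank
Ur, Dr, Vr = U[:, :r], np.diag(d[:r]), Vt[:r, :].T

A_plus = Vr @ np.linalg.inv(Dr) @ Ur.T        # eq (223)

assert np.allclose(A_plus, np.linalg.pinv(A))
assert np.allclose(A @ A_plus @ A, A)                 # I
assert np.allclose(A_plus @ A @ A_plus, A_plus)       # II
assert np.allclose((A @ A_plus).T, A @ A_plus)        # III
assert np.allclose((A_plus @ A).T, A_plus @ A)        # IV
print("pseudo-inverse construction verified")
```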
4 Complex Matrices

The complex scalar product r = pq can be written as
[ Re r ]   [ Re p   -Im p ] [ Re q ]
[ Im r ] = [ Im p    Re p ] [ Im q ]            (225)

4.1 Complex Derivatives

In order to differentiate an expression f(z) with respect to a complex z, the Cauchy-Riemann equations have to be satisfied [7]:
df(z)/dz = d Re(f(z))/d Re z + i d Im(f(z))/d Re z            (226)
and
df(z)/dz = -i d Re(f(z))/d Im z + d Im(f(z))/d Im z            (227)
or in a more compact form:
df(z)/d Im z = i df(z)/d Re z.            (228)
A complex function that satisfies the Cauchy-Riemann equations for points in a region R is said to be analytic in this region R. In general, expressions involving complex conjugate or conjugate transpose do not satisfy the Cauchy-Riemann equations. In order to avoid this problem, a more generalized definition of complex derivative is used [24, 6]:
Generalized Complex Derivative:
df(z)/dz = (1/2) ( df(z)/d Re z - i df(z)/d Im z ).            (229)
Conjugate Complex Derivative:
df(z)/dz^* = (1/2) ( df(z)/d Re z + i df(z)/d Im z ).            (230)
The Generalized Complex Derivative equals the normal derivative when f is an analytic function. For a non-analytic function such as f(z) = z^*, the derivative equals zero. The Conjugate Complex Derivative equals zero when f is an analytic function. The Conjugate Complex Derivative has e.g. been used by [21] when deriving a complex gradient.
Notice:
dfzfzifz. 231 dz Rz Iz
Complex Gradient Vector: If f is a real function of a complex vector z, then the complex gradient vector is given by ([14], p. 798)
grad f(z) = 2 df(z)/dz^* = df(z)/d Re z + i df(z)/d Im z.            (232)
Complex Gradient Matrix: If f is a real function of a complex matrix Z, then the complex gradient matrix is given by [2]
grad f(Z) = 2 df(Z)/dZ^* = df(Z)/d Re Z + i df(Z)/d Im Z.            (233)
These expressions can be used for gradient descent algorithms.

4.1.1 The Chain Rule for complex numbers

The chain rule is a little more complicated when the function of a complex u = f(x) is non-analytic. For a non-analytic function, the following chain rule can be applied [7]
dg(u)/dx = (dg/du)(du/dx) + (dg/du^*)(du^*/dx)            (234)
Notice, if the function is analytic, the second term reduces to zero, and the function is reduced to the normal well-known chain rule. For the matrix derivative of a scalar function g(U), the chain rule can be written the following way:
dg(U)/dX = Tr[ (dg(U)/dU)^T dU/dX ] + Tr[ (dg(U)/dU^*)^T dU^*/dX ].            (235)
X X X 4.1.2 Complex Derivatives of Traces
If the derivatives involve complex numbers, the conjugate transpose is often in volved. The most useful way to show complex derivative is to show the derivative with respect to the real and the imaginary part separately. An easy example is:
TrXTrXHI 236 RX RX
iTrXiTrXHI 237 IX IX
Since the two results have the same sign, the conjugate complex derivative 230 should be used.
TrXTrXT I 238 RX RX
iTrXiTrXTI 239 IX IX
Here, the two results have different signs, and the generalized complex derivative 229 should be used. Hereby, it can be seen that 100 holds even if X is a complex number.
240 241
IX
TrAXH RX
A iTrAXHA
242 243
244 245
246
247
TrAXRX
AT iTrAXAT
IX
TrXXHTrXHX2RX RX RX
iTrXXHiTrXHXi2IX IX IX
By inserting 244 and 245 in 229 and 230, it can be seen that
TrXXHX X
TrXXHX X
Since the function TrXXH is a real function of the complex matrix X, the complex gradient matrix 233 is given by
TrXXH2TrXXH2X 248 X
4.1.3 Complex Derivative Involving Determinants
Here, a calculation example is provided. The objective is to find the derivative of detXHAX with respect to XCmn. The derivative is found with respect to the real part and the imaginary part of X, by use of 42 and 37, detXH AX can be calculated as see App. B.1.4 for details
detXHAX1 detXHAXi detXHAX X 2 RX IX
detXH AXXH AX1 XH AT and the complex conjugate derivative yields
249
250
251 252
detXHAXX
1 detXHAXi detXHAX 2 RX IX
detXH AXAXXH AX1 4.2 Higher order and nonlinear derivatives
AxHAx xHAHAx x BxH Bx x xH BH Bx
2 AHAx 2xHAHAxBHBx xH BBx xH BH Bx2
4.3 Inverse of complex sum
Given real matrices A, B find the inverse of the complex sum AiB. Form the auxiliary matrices
EAtB 253 FBtA, 254
and find a value of t such that E1 exists. Then
AiB1
1itEiF1 2551itEFE1 F1iEFE1 F1 FE1 256
1itEFE1 F1 IiFE1
EFE1F1ItFE1itIFE1EFE1 F1 ItFE1
iEFE1F1tIFE1
257 258
259
5 Solutions and Decompositions

5.1 Solutions to linear equations

5.1.1 Simple Linear Regression
Assume we have data (x_n, y_n) for n = 1, ..., N and are seeking the parameters a, b in R such that y_i ~= a x_i + b. With a least squares error function, the optimal values for a, b can be expressed using the notation
x = (x_1, ..., x_N)^T,   y = (y_1, ..., y_N)^T,   1 = (1, ..., 1)^T in R^{N x 1}
and
R_xx = x^T x,   R_x1 = x^T 1,   R_11 = 1^T 1,   R_yx = y^T x,   R_y1 = y^T 1
as
[ a ]   [ R_xx  R_x1 ]^{-1} [ R_yx ]
[ b ] = [ R_x1  R_11 ]      [ R_y1 ]            (260)

5.1.2 Existence in Linear Systems
Assume A is n x m and consider the linear system
A x = b            (261)
Construct the augmented matrix B = [A b]; then

Condition                        Solution
rank(A) = rank(B) = m            Unique solution x
rank(A) = rank(B) < m            Many solutions x
rank(A) < rank(B)                No solutions x

5.1.3 Standard Square
Assume A is square and invertible, then
A x = b   =>   x = A^{-1} b            (262)

5.1.4 Degenerated Square
Assume A is n x n but of rank r < n. In that case, the system A x = b is solved by
x = A^+ b
where A^+ is the pseudo-inverse of the rank-deficient matrix, constructed as described in section 3.6.3.
5.1.5 Cramer's rule
The equation
A x = b,            (263)
where A is square and det(A) is non-zero, has exactly one solution x. The ith element in x can be found as
x_i = det(B) / det(A),            (264)
where B equals A, but the ith column in A has been substituted by b.

5.1.6 Overdetermined Rectangular
Assume A to be n x m, n > m (tall) and rank(A) = m, then
A x = b   =>   x = (A^T A)^{-1} A^T b = A^+ b            (265)
that is, if there exists a solution x at all! If there is no solution the following can be useful:
A x = b   =>   x_min = A^+ b            (266)
Now x_min is the vector x which minimizes ||A x - b||^2, i.e. the vector which is "least wrong". The matrix A^+ is the pseudo-inverse of A. See [3].
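A short sketch (not from the original text) comparing the normal-equation solution, the pseudo-inverse solution of (265)-(266), and NumPy's least-squares solver on an arbitrary tall full-rank system:

```python
import numpy as np

# x_min = (A^T A)^{-1} A^T b = A^+ b minimises ||Ax - b||^2.
rng = np.random.default_rng(6)
A = rng.standard_normal((8, 3))
b = rng.standard_normal(8)

x_normal = np.linalg.inv(A.T @ A) @ A.T @ b     # normal equations
x_pinv   = np.linalg.pinv(A) @ b                # pseudo-inverse
x_lstsq  = np.linalg.lstsq(A, b, rcond=None)[0] # reference solver

assert np.allclose(x_normal, x_pinv)
assert np.allclose(x_normal, x_lstsq)
print("least-squares residual:", np.linalg.norm(A @ x_normal - b))
```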
5.1.7 Underdetermined Rectangular
Assume A is n x m, n < m ("broad") and rank(A) = n.
A x = b   =>   x_min = A^T (A A^T)^{-1} b            (267)
The equation has many solutions x. But x_min is the solution which minimizes ||A x - b||^2 and also the solution with the smallest norm ||x||^2. The same holds for a matrix version: Assume A is n x m, X is m x n and B is n x n, then
A X = B   =>   X_min = A^+ B            (268)
The equation has many solutions X. But X_min is the solution which minimizes ||A X - B||^2 and also the solution with the smallest norm ||X||^2. See [3].
Similar but different: Assume A is square n x n and the matrices B_0, B_1 are n x N, where N > n; then, if B_0 has maximal rank,
A B_0 ~= B_1   =>   A_min = B_1 B_0^T (B_0 B_0^T)^{-1}            (269)
where A_min denotes the matrix which is optimal in a least square sense. An interpretation is that A is the linear approximation which maps the column vectors of B_0 into the column vectors of B_1.
5.1.8 Linear form and zeros
A x = 0,  for all x   =>   A = 0            (270)

5.1.9 Square form and zeros
If A is symmetric, then
x^T A x = 0,  for all x   =>   A = 0            (271)
5.1.10 The Lyapunov Equation
A X + X B = C            (272)
vec(X) = (I (x) A + B^T (x) I)^{-1} vec(C)            (273)
See Sec 10.2.1 and 10.2.2 for details on the Kronecker product and the vec operator.

5.1.11 Encapsulating Sum
sum_n A_n X B_n = C            (274)
vec(X) = ( sum_n B_n^T (x) A_n )^{-1} vec(C)            (275)
See Sec 10.2.1 and 10.2.2 for details on the Kronecker product and the vec operator.
5.2 Eigenvalues and Eigenvectors

5.2.1 Definition
The eigenvectors v_i and eigenvalues lambda_i are the ones satisfying
A v_i = lambda_i v_i            (276)

5.2.2 Decompositions
For matrices A with as many distinct eigenvalues as dimensions, the following holds, where the columns of V are the eigenvectors and (D)_ij = delta_ij lambda_i,
A V = V D            (277)
For defective matrices A, which are matrices that have fewer distinct eigenvalues than dimensions, the following decomposition, called the Jordan canonical form, holds
A V = V J            (278)
where J is a block diagonal matrix with the blocks J_i = lambda_i I + N. The matrices J_i have dimensionality equal to the number of identical eigenvalues equal to lambda_i, and N is a square matrix of the same size with 1 on the super diagonal and zero elsewhere.
It also holds that for all matrices A there exist matrices V and R such that
A V = V R            (279)
where R is upper triangular with the eigenvalues lambda_i on its diagonal.

5.2.3 General Properties
Assume that A in R^{n x m} and B in R^{m x n},
eig(AB) = eig(BA)            (280)
rank(A) = r   =>   at most r non-zero lambda_i            (281)
5.2.4 Symmetric
Assume A is symmetric, then
V V^T = I   (i.e. V is orthogonal)            (282)
lambda_i in R   (i.e. lambda_i is real)            (283)
Tr(A^p) = sum_i lambda_i^p            (284)
eig(I + cA) = 1 + c lambda_i            (285)
eig(A - cI) = lambda_i - c            (286)
eig(A^{-1}) = lambda_i^{-1}            (287)
For a symmetric, positive matrix A,
eig(A^T A) = eig(A A^T) = eig(A) o eig(A)            (288)

5.2.5 Characteristic polynomial
The characteristic polynomial for the matrix A is
0 = det(A - lambda I)            (289)
  = lambda^n - g_1 lambda^{n-1} + g_2 lambda^{n-2} - ... + (-1)^n g_n            (290)
Note that the coefficients g_j for j = 1, ..., n are the n invariants under rotation of A. Thus, g_j is the sum of the determinants of all the submatrices of A taken j rows and columns at a time. That is, g_1 is the trace of A, and g_2 is the sum of the determinants of the n(n-1)/2 sub-matrices that can be formed from A by deleting all but two rows and columns, and so on; see [17].
5.3 Singular Value Decomposition

Any n x m matrix A can be written as
A = U D V^T,            (291)
where
U = eigenvectors of A A^T          (n x n)
D = sqrt(diag(eig(A A^T)))         (n x m)
V = eigenvectors of A^T A          (m x m)            (292)

5.3.1 Symmetric Square decomposed into squares
Assume A to be n x n and symmetric. Then
A = V D V^T,            (293)
where D is diagonal with the eigenvalues of A, and V is orthogonal and the eigenvectors of A.

5.3.2 Square decomposed into squares
Assume A in R^{n x n}. Then
A = V D U^T,            (294)
where D is diagonal with the square root of the eigenvalues of A A^T, V is the eigenvectors of A A^T and U^T is the eigenvectors of A^T A.
5.3.3 Square decomposed into rectangular
Assume VDUT0 then we can expand the SVD of A into
D0UT AVV0DUT, 295
where the SVD of A is AVDUT .
5.3.4 Rectangular decomposition I
Assume A is n x m, V is n x n, D is n x n, U^T is n x m
A = V D U^T,            (296)
where D is diagonal with the square root of the eigenvalues of A A^T, V is the eigenvectors of A A^T and U^T is the eigenvectors of A^T A.

5.3.5 Rectangular decomposition II
Assume A is n x m, V is n x m, D is m x m, U^T is m x m
A = V D U^T            (297)

5.3.6 Rectangular decomposition III
Assume A is n x m, V is n x n, D is n x m, U^T is m x m
A = V D U^T,            (298)
where D is diagonal with the square root of the eigenvalues of A A^T, V is the eigenvectors of A A^T and U^T is the eigenvectors of A^T A.

5.4 Triangular Decomposition

5.5 LU decomposition

Assume A is a square matrix with non-zero leading principal minors, then
A = L U            (299)
where L is a unique unit lower triangular matrix and U is a unique upper triangular matrix.

5.5.1 Cholesky decomposition
Assume A is a symmetric positive definite square matrix, then
A = U^T U = L L^T,            (300)
where U is a unique upper triangular matrix and L is a lower triangular matrix.
5.6 LDM decomposition
Assume A is a square matrix with nonzero leading principal minors1, then
ALDMT 301
where L, M are unique unit lower triangular matrices and D is a unique diagonal matrix.
5.7 LDL decompositions
The LDL decomposition are special cases of the LDM decomposition. Assume A is a nonsingular symmetric definite square matrix, then
ALDLT LTDL 302 where L is a unit lower triangular matrix and D is a diagonal matrix. If A is
also positive definite, then D has strictly positive diagonal entries.
1If the matrix that corresponds to a principal minor is a quadratic upperleft part of the larger matrix i.e., it consists of matrix elements in rows and columns from 1 to k, then the principal minor is called a leading principal minor. For an n times n square matrix, there are n leading principal minors. 31
or alternatively as
Mijxixixjxj MxmxmT
6.1.3 Third moments
The matrix of third centralized momentsin some contexts referred coskewnessis defined using the notation
as
m3xixixjxjxkxk ijk
3 3 3 M3m::1 m::2 m::n
6 Statistics and Probability 6.1 Definition of Moments
Assume xRn1 is a random variable 6.1.1 Mean
The vector of means, m, is defined by
mixi
6.1.2 Covariance
The matrix of covariance M is defined by
303
304 305
to as
306
307
where : denotes all elements within the given index. M3 can alternatively be expressed as
M3 xmxmT xmT 6.1.4 Fourth moments
The matrix of fourth centralized momentsin some contexts referred cokurtosisis defined using the notation
308
to as 309
310
311
m4xixixjxjxkxkxlxl ijkl
Mm4 m4 m4 m4 m4 m4 m4 m4 m44 ::11 ::21 ::n1 ::12 ::22 ::n2 ::1n ::2n ::nn
or alternatively as
M4 xmxmT xmT xmT
as
6.2 Expectation of Linear Combinations
6.2.1 Linear Forms
Assume X and x to be a matrix and a vector of random variables. Then see
See 26
EAxbEAxExb
6.2.2 Quadratic Forms
Assume A is symmetric, cEx and coordinates xi are independent, have the and denote adiagA. Then See 26
Amb Am mb
EAXBCAEXBC VarAxAVarxAT
312 313 314
315 316 317
CovAx, ByACovx, yBT Assume x to be a stochastic vector with mean m, then see 7
ExT AxTrAcT Ac
VarxT Ax22TrA242cT A2c43cT Aa432aT a
Also, assume x to be a stochastic vector with mean m, and covariance M. see 7
318 319
Then
320 321 322 323 324 325
326 327 328 329 330
EAxaBxbTExxTE xaT x E xT axTE AxAxTExaxaT
EAxaTBxbExT xExT AxEAxT AxExaTxa
See 7.
AMBTAmaBmbT MmmT
MmmTa
aTMmmT
AMmmT AT MmamaT
TrAMBTAmaTBmb TrMmTm
TrAMmT Am
TrAMAT AmT Am TrMmaTma
Varx. Assume also that all same central moments 1,2,3,4
6.2.3 Cubic Forms
Assume x to be a stochastic vector with independent coordinates, mean m, covariance M and central moments v3Exm3. Then see 7
EAxaBxbTCxc
ExxT xEAxaAxaTAxa
EAxabTCxcDxdT
6.3 Weighted Scalar Variable
AdiagBT Cv3
TrBMCT Ama
AMCT Bmb
AMBT AmaBmbTCmc
v3 2MmTrMmTmm
AdiagAT Av3
2AMAT AxaAxaTAma TrAMAT Ama
AxabTCMDT CmcDmdT AMCT AmaCmcTbDmdT bTCmcAMDT AmaDmdT
Assume xRn1 is a random variable, wRn1 is a vector of constants and y is the linear combination ywT x. Assume further that m, M2, M3, M4 denotes the mean, covariance, and central third and fourth moment matrix of the variable x. Then it holds that
ywT m yy2wT M2w
yy3wTM3ww yy4wTM4www
331 332 333 334
7 Multivariate Distributions 7.1 Cauchy
The density function for a Cauchy distributed vector tRP1, is given by
1Pdet12 pt, P2 2
335
12 1tT1t1P2
whereis the location,is positive definite, anddenotes the gamma func
tion. The Cauchy distribution is a special case of the Studentt distribution.
7.2 Dirichlet
The Dirichlet distribution is a kind of inverse distribution compared to the multinomial distribution on the bounded continuous variate xx1,,xP 16, p. 44
7.3 Normal
The normal distribution is also known as a Gaussian distribution. See sec. 8.
7.4 NormalInverse Gamma 7.5 Gaussian
See sec. 8.
7.6 Multinomial
If the vector n contains counts, i.e. n_i in {0, 1, 2, ...}, then the discrete multinomial distribution for n is given by
336
337
whereis the location, the scale matrixis symmetric, positive definite,is the degrees of freedom, anddenotes the gamma function. For 1, the Studentt distribution becomes the Cauchy distribution see sec 7.1.
P P
pp p1
pxP p xp pp
n!d d Pna,n ani, ni n
n1!nd! i ii
where ai are probabilities, i.e. 0ai1 and i ai1. 7.7 Students t
The density of a Studentt distributed vector tRP1, is given by
Pdet12 pt, , P2 2
2 11tT1tP 2
7.8 Wishart
7.7.1 Mean
7.7.2 Variance
7.7.3 Mode
7
Et,
covt 2
MULTIVARIATE DISTRIBUTIONS
The notion mode meaning the position of the most probable value modet
7.7.4 Full Matrix Version
If instead of a vector tRP1 one has a matrix TRPN, then the Studentt
distribution for T is
pTM,,,NP2P Pp12
pM,m
7.8.1 Mean
1
2mP2PP14 P 1m1p
p2
detm2 detMmP 12
1 1exp2 Tr M
EMm
p1
,
1
2
338
339
340
p12det2 detN2
det 1TM1TMTP 2341 where M is the location,is the rescaling matrix,is positive definite,is
the degrees of freedom, anddenotes the gamma function.
7.8 Wishart
The central Wishart distribution for MRP P , M is positive definite, where m can be regarded as a degree of freedom parameter 16, equation 3.8.1 8, section 2.5,11
342
343
7.9 Wishart, Inverse
The normal Inverse Wishart distribution for MRP P , M is positive defi nite, where m can be regarded as a degree of freedom parameter 11
pM,m
7.9.1 Mean
1
2mP2PP14 P 1m1p
p2
detm2 detMmP 12
1 1 exp2 TrM
EM 1 mP1
344
345
8 Gaussians

8.1 Basics

8.1.1 Density and normalization
The density of x ~ N(m, Sigma) is
p(x) = (1/sqrt(det(2 pi Sigma))) exp[ -(1/2)(x - m)^T Sigma^{-1} (x - m) ]            (346)
Note that if x is d-dimensional, then det(2 pi Sigma) = (2 pi)^d det(Sigma).
Integration and normalization
int exp[ -(1/2)(x - m)^T Sigma^{-1} (x - m) ] dx = sqrt(det(2 pi Sigma))            (347)
int exp[ -(1/2) x^T Sigma^{-1} x + m^T Sigma^{-1} x ] dx = sqrt(det(2 pi Sigma)) exp[ (1/2) m^T Sigma^{-1} m ]            (348)
int exp[ -(1/2) x^T A x + c^T x ] dx = sqrt(det(2 pi A^{-1})) exp[ (1/2) c^T A^{-1} c ]            (349)
If X = [x_1 x_2 ... x_n] and C = [c_1 c_2 ... c_n], then
int exp[ -(1/2) Tr(X^T A X) + Tr(C^T X) ] dX = sqrt(det(2 pi A^{-1}))^n exp[ (1/2) Tr(C^T A^{-1} C) ]            (350)
The derivatives of the density are
dp(x)/dx = -p(x) Sigma^{-1} (x - m)            (351)
d^2 p/(dx dx^T) = p(x) ( Sigma^{-1} (x - m)(x - m)^T Sigma^{-1} - Sigma^{-1} )            (352)
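A small numerical sketch (not from the original text) of the density (346) and its gradient (351), checked by finite differences on arbitrary example values:

```python
import numpy as np

# Finite-difference check of eq (351): dp(x)/dx = -p(x) Sigma^{-1} (x - m).
rng = np.random.default_rng(8)
d = 3
m = rng.standard_normal(d)
L = rng.standard_normal((d, d))
S = L @ L.T + d * np.eye(d)          # a positive definite covariance
Si = np.linalg.inv(S)

def p(x):
    z = x - m
    return np.exp(-0.5 * z @ Si @ z) / np.sqrt(np.linalg.det(2 * np.pi * S))

x = rng.standard_normal(d)
analytic = -p(x) * Si @ (x - m)      # eq (351)

eps = 1e-6
numeric = np.array([(p(x + eps * e) - p(x - eps * e)) / (2 * eps)
                    for e in np.eye(d)])

assert np.allclose(analytic, numeric, atol=1e-8)
print("Gaussian density gradient verified")
```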
8.1.2 Marginal Distribution
Assume xNx,where
xa a a c
then
x x
Tbbcb
pxaNxa a, a
pxbNxb b, b
8.1.3 Conditional Distribution Assume xNx,where
xa a a
c T
352
bbcb
1x
a a cb b b353
a a c 1Tc b
355
356
357
then
pxxN, a b x a a a
trix, see 9.1.5 for details.
8.1.4 Linear combination
Assume xNmx,x and yNmy,y then
AxBycNAmx Bmy c,AxAT ByBT 8.1.5 Rearranging Means
T 1x
b b c a a a 354
pxbxaNxb b,b
Note, that the covariance matrices are the Schur complement of the block ma
T 1bbcac
det2AT 1A1 1 NAxm, det2 NxA
m, A
T
1 1 A
If A is square and invertible, it simplifies to
NAxm, 1 NxA1m, AT 1A1
detA 8.1.6 Rearranging into squared form
If A is symmetric, then
1xTAxbTx1xA1bTAxA1b 1bTA1b
1TrXT AXTrBT X1TrXA1BT AXA1B1TrBT A1B 222
222
8.1.7 Sum of two squared forms
In vector formulation assuming 1 , 2 are symmetric
1xm T1xm2111
1xm T1xm2222
1xm T1xm C 2ccc
111 c12
m1111m1mc121122
358 359 360
361 362
C1mT 1mT 11111m1m 363 21122121122
1mT 1mmT 1m364 2111222
365 366 367
368 369
C1Tr1M1M T 1111M1M21122121122
In a trace formulation assuming 1,2 are symmetric 1TrXM T1XM
2111 1TrXM T1XM
2222
1TrXM T1XM C
2ccc M1111M1M
111 c12
c121122
1TrMT 1MMT 1M2111222
8.1.8 Product of gaussian densities Let Nxm, denote a density of x, then
Nxm1, 1Nxm2, 2ccNxmc, c ccNm1 m2, 12
370
371
1 1 T 1
det212 exp 2m1m2 12 m1111m1m
m1m2
c121122 111
c12
but note that the product is not normalized as a density of x.
8.2 Moments
8.2.1 Mean and covariance of linear forms First and second moments. Assume xN m,
Exm
Covx, xVarxExxT ExExT ExxT mmT As for any other distribution is holds for gaussians that
E AxAE x VarAxAVarxAT
CovAx, ByACovx, yBT
372
373
374 375 376
8.2.2 Mean and variance of square forms
Mean and variance of square forms: Assume xN m,
377 378
379 380
381
382
ExmTAxmIf 2I and A is symmetric, then
mTAATAATm mmTAmmTrA
ExxTExT Ax VarxT Ax
mmT
TrAmT Am
TrAAAT
VarxT Ax24TrA242mT A2m Assume xN0,2I and A and B to be symmetric, then
CovxT Ax, xT Bx24TrAB
8.2.3 Cubic forms
Assume x to be a stochastic vector with independent coordinates, mean m and
covariance M
ExbTxxTmbTMmmTMmmTbmT
8.2.4
bT mMmmTMean of Quartic Forms
383
ExxTxxTExxTAxxT
2mmT2 mTmmmT
TrmmTmmTAATmmT mTAmmmTTrAmmT 2Tr24mT mTrmT m2 TrABBTmTAATBBTm TrAmT AmTrBmT Bm
ExT xxT xExTAxxTBx
EaT xbT xcT xdT x
aTmmTbcTmmTd
aT mmT cbT mmT d aTmmTdbTmmTc2aTmbTmcTmdTm
EAxaBxbTCxcDxdT
ABT AmaBmbTCDT CmcDmdT
ACT AmaCmcTBDT BmbDmdT BmbTCmcADT AmaDmdT TrBCTADT AmaDmdT
EAxaTBxbCxcTDxdTrACT DDT CBT
AmaTBBmbTACTDmdDTCmc TrABT AmaT BmbTrCDT CmcT Dmd
See 7.
8.2.5 Moments
Exkmk
384 385
Covx
8.3 Miscellaneous
k
kk kmkmTkmkmTk
k k
8.3.1 Whitening
Assume x ~ N(m, Sigma), then
z = Sigma^{-1/2} (x - m) ~ N(0, I)            (386)
Conversely, having z ~ N(0, I) one can generate data x ~ N(m, Sigma) by setting
x = Sigma^{1/2} z + m ~ N(m, Sigma)            (387)
Note that Sigma^{1/2} means the matrix which fulfils Sigma^{1/2} Sigma^{1/2} = Sigma, and that it exists and is unique since Sigma is positive definite.
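A minimal sketch (not from the original text) of the whitening transform (386); Sigma^{1/2} is obtained from the eigendecomposition of the (arbitrary example) covariance:

```python
import numpy as np

# Whitening: z = Sigma^{-1/2}(x - m) should have (approximately) identity covariance.
rng = np.random.default_rng(9)
d, N = 3, 200_000
m = np.array([1.0, -2.0, 0.5])
L = rng.standard_normal((d, d))
Sigma = L @ L.T + np.eye(d)

w, V = np.linalg.eigh(Sigma)
Sigma_isqrt = V @ np.diag(1 / np.sqrt(w)) @ V.T     # unique PD inverse square root

x = rng.multivariate_normal(m, Sigma, size=N)       # samples of N(m, Sigma)
z = (x - m) @ Sigma_isqrt.T                         # eq (386), applied row-wise

print("sample covariance of z (should be close to I):")
print(np.round(np.cov(z.T), 2))
```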
8.3.2 The Chi-Square connection
Assume x ~ N(m, Sigma) and x to be n-dimensional, then
z = (x - m)^T Sigma^{-1} (x - m) ~ chi^2_n            (388)
where chi^2_n denotes the Chi square distribution with n degrees of freedom.
8.3.3 Entropy
Entropy of a D-dimensional gaussian:
H(x) = - int N(m, Sigma) ln N(m, Sigma) dx = ln sqrt(det(2 pi Sigma)) + D/2            (389)
8.4 Mixture of Gaussians
388
389
386
8.4.1 Density
The variable x is distributed as a mixture of gaussians if it has the density
K 1 1 T1
k det2k exp 2xmk k xmk 390 where k sum to 1 and the k all are positive definite.
px
k1
8.4.2 Derivatives
Defining psk kNsk,k one get
ln ps jNsj,j
N,lnjNsj,j
j kkskkj
ln ps
391
392
393
394
395
jNsj,j 1 N,
kkskkj
jNsj,j
N,lnjNsj,j
j kkskkj jNsj,j1
k kNsk,k j sj
lnps jNsj,j
N,lnjNsj,j
j kkskkj jNsj,j 1 1 1
T 1
k kNsk,k 2 jj sjsj j 396
But k and k needs to be constrained.
Assuming the dimensions of the blocks matches we have
A11 A12B11 B12 A11B11 A12B21 A11B12 A12B22AABBABABABAB
9.1.2
The Determinant
The determinant can be expressed as by the use of CA A A1A
397 398
399 400
det A The Inverse
1 11122221
CA A A1A
2 22211112
as
9.1.3
The inverse can be expressed as by the use of
CA A A1A
9 Special Matrices 9.1 Block matrices
Let Aij denote the ijth block of A. 9.1.1 Multiplication
21 22 21 22 21 11 22 21
21 12
22 22
A11 A12
AdetA22detC1detA11detC2 22
as
9.1.4
CA A A1A
2 22211112
AA1 C1 A1AC1 11121 11122
21
1 11122221
A21 A22 C1 A21 A1 C1 2 11 2
A1A1A C1A 11 11 12 2
A1A C1 22 21 1
A1 C1A 21 11 1
A1 12 22
A1 12 22
A1A1A
22 22 21 1
C1A
Block diagonal
For block diagonal matrices we have
A1101 A111 0
401 402
0A0A1 22 22
det
A11 0
0 A
detA11detA22
22
9.1.5 Schur complement

Regard the matrix
[ A11  A12 ]
[ A21  A22 ]
The Schur complement of block A11 of the matrix above is the matrix (denoted C2 in the text above)
A22 - A21 A11^{-1} A12
The Schur complement of block A22 of the matrix above is the matrix (denoted C1 in the text above)
A11 - A12 A22^{-1} A21
Using the Schur complement, one can rewrite the inverse of a block matrix as
[ A11  A12 ]^{-1}   [ I              0 ] [ (A11 - A12 A22^{-1} A21)^{-1}   0        ] [ I   -A12 A22^{-1} ]
[ A21  A22 ]      = [ -A22^{-1} A21  I ] [ 0                               A22^{-1} ] [ 0    I            ]
The Schur complement is useful when solving linear systems of the form
[ A11  A12 ] [ x1 ]   [ b1 ]
[ A21  A22 ] [ x2 ] = [ b2 ]
which has the following equation for x1:
(A11 - A12 A22^{-1} A21) x1 = b1 - A12 A22^{-1} b2
When the appropriate inverses exist, this can be solved for x1, which can then be inserted in the equation for x2 to solve for x2.
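A small sketch (not from the original text) of this block elimination on an arbitrary well-conditioned example:

```python
import numpy as np

# Solve [A11 A12; A21 A22][x1; x2] = [b1; b2] via the Schur complement of A22.
rng = np.random.default_rng(10)
n1, n2 = 3, 2
A = rng.standard_normal((n1 + n2, n1 + n2)) + (n1 + n2) * np.eye(n1 + n2)
b = rng.standard_normal(n1 + n2)
A11, A12, A21, A22 = A[:n1, :n1], A[:n1, n1:], A[n1:, :n1], A[n1:, n1:]
b1, b2 = b[:n1], b[n1:]

A22i = np.linalg.inv(A22)
S = A11 - A12 @ A22i @ A21                   # Schur complement of A22
x1 = np.linalg.solve(S, b1 - A12 @ A22i @ b2)
x2 = A22i @ (b2 - A21 @ x1)

assert np.allclose(A @ np.concatenate([x1, x2]), b)
print("block system solved via the Schur complement")
```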
9.2 Discrete Fourier Transform Matrix, The

The DFT matrix is an N x N symmetric matrix W_N, where the (k, n)th element is given by
W_N^{kn} = e^{-j 2 pi k n / N}            (403)
Thus the discrete Fourier transform (DFT) can be expressed as
X(k) = sum_{n=0}^{N-1} x(n) W_N^{kn}            (404)
Likewise the inverse discrete Fourier transform (IDFT) can be expressed as
x(n) = (1/N) sum_{k=0}^{N-1} X(k) W_N^{-kn}            (405)
The DFT of the vector x = [x(0), x(1), ..., x(N-1)]^T can be written in matrix form as
X = W_N x,            (406)
where X = [X(0), X(1), ..., X(N-1)]^T. The IDFT is similarly given as
x = W_N^{-1} X.            (407)
Some properties of W_N exist:
W_N^{-1} = (1/N) W_N^*            (408)
W_N W_N^* = N I            (409)
W_N^* = W_N^H            (410)
If W_N = e^{-j 2 pi / N}, then [23]
W_N^{m + N/2} = -W_N^m            (411)
Notice, the DFT matrix is a Vandermonde Matrix.
The following important relation between the circulant matrix and the discrete Fourier transform (DFT) exists
T_C = W_N^{-1} ( I o (W_N t) ) W_N,            (412)
where t = [t_0, t_1, ..., t_{n-1}]^T is the first row of T_C.
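The following short sketch (not from the original text) builds W_N explicitly and checks (406) against NumPy's FFT as well as the inverse relation (408); N and the test vector are arbitrary:

```python
import numpy as np

# Build the DFT matrix of eq (403) and verify eqs (406) and (408).
N = 8
k = np.arange(N)
W = np.exp(-2j * np.pi * np.outer(k, k) / N)        # W[k, n] = e^{-j 2 pi k n / N}

x = np.random.default_rng(11).standard_normal(N)
assert np.allclose(W @ x, np.fft.fft(x))            # eq (406)
assert np.allclose(np.linalg.inv(W), W.conj() / N)  # eq (408)
print("DFT matrix relations verified")
```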
9.3 Hermitian Matrices and skewHermitian
A matrix ACmn is called Hermitian if AHA
For real valued matrices, Hermitian and symmetric matrices are equivalent.
Note that
where B, C are hermitian, then
ABiC
A is HermitianxHAxR, xCn1 A is HermitianeigAR
413 414
AAH AAH B 2 , C 2i
9.3.1 SkewHermitian
A matrix A is called skewhermitian if
AAH
For real valued matrices, skewHermitian and skewsymmetric matrices are
equivalent.
A HermitianA skewHermitianA skewHermitian
iA is skewhermitian xH AyxH AH y, eigAi, R
x, y
415 416 417
9.4 Idempotent Matrices
A matrix A is idempotent if
Idempotent matrices A and B, have the following properties
An IA
AH
IAH If ABBA rankA AIA
AA
fsItAIAfsAfst
Note that AI is not necessarily idempotent. 9.4.1 Nilpotent
0 IAA0
A matrix A is nilpotent if
A nilpotent matrix has the following property:
A, forn1,2,3, is idempotent
is idempotent
is idempotent
418 419 420 421 422 423 424 425 426 427
428
429
AAA
ABTrA
is idempotent
A20
fsItAIfstAfs
9.4.2 Unipotent
A matrix A is unipotent if
AAI
A unipotent matrix has the following property:
fsItAIAfstIAfst2 9.5 Orthogonal matrices
A square matrix Q is orthogonal if and only if Q^T Q = Q Q^T = I,
and then Q has the following properties
Its eigenvalues are placed on the unit circle.
Its eigenvectors are unitary, i.e. have length one.
The inverse of an orthogonal matrix is orthogonal too.
Basic properties for the orthogonal matrix Q
Q1QT QTQ
QQTI
QTQI detQ1
9.5.1 OrthoSym
A matrix Q which simultaneously is orthogonal and symmetric is called an
orthosym matrix 20. Hereby
QTQI QQT
430 431
432 433
The powers of an orthosym matrix are given by the following rule k 11k 11k1
22
9.5.2 OrthoSkew
A matrix which simultaneously is orthogonal and antisymmetric is called an orthoskew matrix 20. Hereby
Q2 I 2 Q1coskI1coskQ
QH QI
QQH
The powers of an orthoskew matrix are given by the following rule
434 435
436 437
k
ikik ikik
2 Ii 2 Q
coskIsinkQ 22
Q
9.5.3 Decomposition
A square matrix A can always be written as a sum of a symmetric A and an
antisymmetric matrix A
AA A
438
439 440
9.6 Positive Definite and Semidefinite Matrices
9.6.1 Definitions
A matrix A is positive definite if and only if
xTAx0, x0 A matrix A is positive semidefinite if and only if
xT Ax0, x
Note that if A is positive definite, then A is also positive semidefinite.
9.6.2 Eigenvalues
The following holds with respect to the eigenvalues:
A pos. def.eigAAH 0
2 H 441
A pos. semidef.eigAA 0 2
9.6.3 Trace
The following holds with respect to the trace:
A pos. def.TrA0
A pos. semidef.TrA0
442
9.6.4 Inverse
If A is positive definite, then A is invertible and A1 is also positive definite.
9.6.5 Diagonal
If A is positive definite, then Aii0, i
9.6.6 Decomposition I
The matrix A is positive semidefinite of rank rthere exists a matrix B of
rank r such that ABBT
The matrix A is positive definitethere exists an invertible matrix B such
that ABBT
9.6.7 Decomposition II
Assume A is an nn positive semidefinite, then there exists an nr matrix B of rank r such that BT ABI.
9.6.8 Equation with zeros
Assume A is positive semidefinite, then XT AX0AX0
9.6.9 Rank of product
Assume A is positive definite, then rankBABT rankB
9.6.10 Positive definite property
If A is nn positive definite and B is rn of rank r, then BABT is positive
definite.
9.6.11 Outer Product
If X is nr, where nr and rankXn, then XXT is positive definite.
9.6.12 Small pertubations
If A is positive definite and B is symmetric, then AtB is positive definite for
sufficiently small t.
9.6.13 Hadamard inequality
If A is a positive definite or semidefinite matrix, then
detA Aii i
See 15, pp.477
9.6.14 Hadamard product relation
Assume that PAAT and QBBT are semi positive definite matrices, it
then holds that
PQRRT
where the columns of R are constructed as follows: rij 1NAaibj , for i1, 2, , NA and j1, 2, , NB . The result is unpublished, but reported by Pavel Sakov and Craig Bishop.
9.7 Singleentry Matrix, The
9.7.1 Definition
The singleentry matrix JijRnn is defined as the matrix which is zero everywhere except in the entry i, j in which it is 1. In a 44 example one might have
0000
J230 0 1 0 443
0 0 0 00000
The singleentry matrix is very useful when working with derivatives of expres sions involving matrices.
9.7.2 Swap and Zeros AssumeAtobenmandJij tobemp
AJij0 0 Ai 0 444
i.e. an np matrix of zeros with the i.th column of A in place of the j.th column. Assume A to be nm and Jij to be pn
0.
0
JijAAj
0
..
0
445
i.e. an pm matrix of zeros with the j.th row of A in the placed of the i.th row.
9.7.3 Rewriting product of elements AkiBjlAeieTj Bkl
AikBljAT eieTj BT klAikBjlAT eieTj BklAkiBljAeieTj BT kl
AJij Bkl AT JijBT kl
AT JijBkl AJijBT kl
446 447 448 449
9.7.4 Properties of the Singleentry Matrix
If ij
If ij
JijJijJij JijJijTJij
JijJij0 JijJijTJii
JijT JijTJij JijT JijJij
JijT JijT0 JijT JijJjj
9.7.5 The Singleentry Matrix in Scalar Expressions Assume A is nm and J is mn, then
TrAJij TrJij AAT ij Assume A is nn, J is nm and B is mn, then
450
451 452 453
454 455
456
457
458
TrAJij B
TrAJj i B TrAJij Jij B
AT BT ij
BAij
diagAT BT ij
Assume A is nn, Jij is nm B is mn, then
xT AJij BxAT xxT BT ij
xT AJij Jij BxdiagAT xxT BT ij 9.7.6 Structure Matrices
The structure matrix is defined by
If A has no special structure then
If A is symmetric then
SijJij
Sij Jij Jji JijJij
A SijAij
9.8 Symmetric, SkewsymmetricAntisymmetric
9.8.1 Symmetric
The matrix A is said to be symmetric if
AAT 459 Symmetric matrices have many important properties, e.g. that their eigenvalues
are real and eigenvectors orthogonal.
9.8.2 SkewsymmetricAntisymmetric
The antisymmetric matrix is also known as the skew symmetric matrix. It has the following property from which it is defined
AAT 460 Hereby, it can be seen that the antisymmetric matrices always have a zero
diagonal. The nn antisymmetric matrices also have the following properties. detAT detA1n detA 461
detAdetA0, if n is odd 462 The eigenvalues of an antisymmetric matrix are placed on the imaginary axis
and the eigenvectors are unitary.
9.8.3 Decomposition
A square matrix A can always be written as a sum of a symmetric A and an
antisymmetric matrix A
Such a decomposition could e.g. be
AAT AAT
A22AA
9.9 Toeplitz Matrices
463
464
AA A
A Toeplitz matrix T is a matrix where the elements of each diagonal is the same. In the nn square case, it has the following structure:
t11 t12t1n t0 t1tn1t . t .
T21 1 . .. .. . .. ..
465
A Toeplitz matrix is persymmetric. If a matrix is persymmetric or orthosym metric, it means that the matrix is symmetric about its northeastsouthwest diagonal antidiagonal 12. Persymmetric matrices is a larger class of matri ces, since a persymmetric matrix not necessarily has a Toeplitz structure. There
t12. ..t1 tn1t21 t11 tn1t1 t0
are some special cases of Toeplitz matrices. The symmetric Toeplitz matrix is given by:
9.9.1 Properties of Toeplitz Matrices
t0 t1tn1 t .
T 1
. . . t1
tn1t1 t0 t0 t1 tn1
t .TCn1
. .. .. . . . t1
t1tn1 t0 The upper triangular Toeplitz matrix:
t0 t1 tn1
The circular Toeplitz matrix:
. .. ..
466
467
468
469
0 . TU,
00 t0 and the lower triangular Toeplitz matrix:
t0 00t .
. t1
TL1. .. ..
0 tn1t1 t0
The Toeplitz matrix has some computational advantages. The addition of two Toeplitz matrices can be done with On flops, multiplication of two Toeplitz matrices can be done in On ln n flops. Toeplitz equation systems can be solved in On2 flops. The inverse of a positive definite Toeplitz matrix can be found in On2 flops too. The inverse of a Toeplitz matrix is persymmetric. The product of two lower triangular Toeplitz matrices is a Toeplitz matrix. More information on Toeplitz matrices and circulant matrices can be found in 13, 7.
9.10 Transition matrices
A square matrix P is a transition matrix, also known as stochastic matrix or probability matrix, if
0Pij 1, Pij 1 j
The transition matrix usually describes the probability of moving from state i to j in one step and is closely related to markov processes. Transition matrices
have the following properties
Probij in 1 stepPij Probij in 2 stepsP2 ij Probij in k stepsPk ij
If all rows are identicalPnP
P,is called invariant
whereis a socalled stationary probability vector, i.e., 0i1 and i i1.
9.11 Units, Permutation and Shift
9.11.1 Unit vector
Let eiRn1 be the ith unit vector, i.e. the vector which is zero in all entries
except the ith at which it is 1. 9.11.2 Rows and Columns
i.throwofAeTiA j.th column of AAej
9.11.3 Permutations
Let P be some permutation matrix, e.g.
475 476
477
478
479
e T2P1 0 0e2 e1 e3 eT1
001 eT3 For permutation matrices it holds that
and that
PPT I AP Ae2 Ae1 Ae3
e T2 APA eT1 A
e T3 A
example by
0 1 0
That is, the first is a matrix which has columns of A but in permuted sequence and the second is a matrix which has the rows of A but in the permuted se quence.
9.11.4 Translation, Shift or Lag Operators
Let L denote the lag or translation or shift operator defined on a 44
0000
L1 0 0 0 480
0 1 0 00010
470 471 472 473 474
i.e. a matrix of zeros with one on the subdiagonal, Liji,j1. With some signal xt for t1,,N, the n.th power of the lag operator shifts the indices, i.e.
Lnxt0 for t1,..,n 481 xtn for tn1,,N
A related but slightly different matrix is the recurrent shifted operator defined on a 44 example by
0001
1000 0100 0010
L
i.e. a matrix defined by Liji,j1i,1j,dimL. On a signal x it has the
effect
Lnxt xt, t tn modN1 483
That is, L is like the shift operator L except that it wraps the signal as if it was periodic and shifted substituting the zeros with the rear end of the signal. Note that L is invertible and orthogonal, i.e.
L1 LT 9.12 Vandermonde Matrices
A Vandermonde matrix has the form 15
1v v2 vn1
484
485
111
1v v2 vn1 222
V . . . . . .
1 v v2vn1 nnn
The transpose of V is also said to a Vandermonde matrix. The determinant is given by 29
detVvi vj 486 ij
482
10 10.1
10.1.1
Functions and Operators
Functions and Series
Finite Series
Xn IXI1 IXX2 Xn1 487
10.1.2 Taylor Expansion of Scalar Function

Consider some scalar function $f(x)$ which takes the vector $x$ as an argument. This we can Taylor expand around $x_0$

$$f(x) \cong f(x_0) + g(x_0)^T (x - x_0) + \frac{1}{2}(x - x_0)^T H(x_0)(x - x_0) \qquad (488)$$

where

$$g(x_0) = \left.\frac{\partial f(x)}{\partial x}\right|_{x_0}, \qquad H(x_0) = \left.\frac{\partial^2 f(x)}{\partial x \partial x^T}\right|_{x_0}$$

10.1.3 Matrix Functions by Infinite Series
As for analytical functions in one dimension, one can define a matrix function for square matrices X by an infinite series
$$f(X) = \sum_{n=0}^{\infty} c_n X^n \qquad (489)$$
assuming the limit exists and is finite. If the coefficients $c_n$ fulfil $\sum_n c_n x^n < \infty$, then one can prove that the above series exists and is finite, see [1]. Thus for any analytical function $f(x)$ there exists a corresponding matrix function $f(X)$ constructed by the Taylor expansion. Using this one can prove the following results:
1) A matrix $A$ is a zero of its own characteristic polynomial [1]:

$$p(\lambda) = \det(\lambda I - A) = \sum_n c_n \lambda^n \;\Rightarrow\; p(A) = 0 \qquad (490)$$

2) If $A$ is square it holds that [1]

$$A = U B U^{-1} \;\Rightarrow\; f(A) = U f(B) U^{-1} \qquad (491)$$

3) A useful fact when using power series is that

$$A^n \rightarrow 0 \quad \text{for} \quad n \rightarrow \infty \qquad \text{if} \quad |A| < 1 \qquad (492)$$
10.1.4 Identity and commutations
It holds for an analytical matrix function $f(X)$ that

$$f(AB)A = A f(BA) \qquad (493)$$
see B.1.2 for a proof.
10.1.5 Exponential Matrix Function
In analogy to the ordinary scalar exponential function, one can define exponential and logarithmic matrix functions:
$$e^A \equiv \sum_{n=0}^{\infty} \frac{1}{n!} A^n = I + A + \frac{1}{2}A^2 + \cdots \qquad (494)$$
$$e^{-A} \equiv \sum_{n=0}^{\infty} \frac{1}{n!} (-1)^n A^n = I - A + \frac{1}{2}A^2 - \cdots \qquad (495)$$
$$e^{tA} \equiv \sum_{n=0}^{\infty} \frac{1}{n!} (tA)^n = I + tA + \frac{1}{2}t^2 A^2 + \cdots \qquad (496)$$
$$\ln(I + A) \equiv \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n} A^n = A - \frac{1}{2}A^2 + \frac{1}{3}A^3 - \cdots \qquad (497)$$

Some of the properties of the exponential function are [1]

$$e^A e^B = e^{A+B} \qquad \text{if} \quad AB = BA \qquad (498)$$
$$(e^A)^{-1} = e^{-A} \qquad (499)$$
$$\frac{d}{dt} e^{tA} = A e^{tA} = e^{tA} A, \qquad t \in \mathbb{R} \qquad (500)$$
$$\frac{d}{dt} \operatorname{Tr}(e^{tA}) = \operatorname{Tr}(A e^{tA}) \qquad (501)$$
$$\det(e^A) = e^{\operatorname{Tr}(A)} \qquad (502)$$
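As a quick numerical sanity check (not part of the Cookbook), the sketch below verifies Eqs. (498), (499) and (502) with SciPy's matrix exponential; the random test matrix and the commuting choice $B = 2A + I$ are arbitrary assumptions.

\begin{verbatim}
# Sketch only: a few properties of the matrix exponential.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
A = 0.5 * rng.standard_normal((4, 4))

assert np.isclose(np.linalg.det(expm(A)), np.exp(np.trace(A)))   # Eq. (502)
assert np.allclose(np.linalg.inv(expm(A)), expm(-A))             # Eq. (499)

B = 2.0 * A + np.eye(4)            # commutes with A, so Eq. (498) applies
assert np.allclose(expm(A) @ expm(B), expm(A + B))
\end{verbatim}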
10.1.6 Trigonometric Functions

$$\sin(A) \equiv \sum_{n=0}^{\infty} \frac{(-1)^n A^{2n+1}}{(2n+1)!} = A - \frac{1}{3!}A^3 + \frac{1}{5!}A^5 - \cdots \qquad (503)$$
$$\cos(A) \equiv \sum_{n=0}^{\infty} \frac{(-1)^n A^{2n}}{(2n)!} = I - \frac{1}{2!}A^2 + \frac{1}{4!}A^4 - \cdots \qquad (504)$$

10.2 Kronecker and Vec Operator
10.2.1 The Kronecker Product

The Kronecker product of an $m \times n$ matrix $A$ and an $r \times q$ matrix $B$ is an $mr \times nq$ matrix, $A \otimes B$, defined as

$$A \otimes B = \begin{bmatrix} A_{11}B & A_{12}B & \cdots & A_{1n}B \\ A_{21}B & A_{22}B & \cdots & A_{2n}B \\ \vdots & & & \vdots \\ A_{m1}B & A_{m2}B & \cdots & A_{mn}B \end{bmatrix} \qquad (505)$$
The Kronecker product has the following properties (see [19])
$$A \otimes (B + C) = A \otimes B + A \otimes C \qquad (506)$$
$$A \otimes B \neq B \otimes A \qquad \text{in general} \qquad (507)$$
$$A \otimes (B \otimes C) = (A \otimes B) \otimes C \qquad (508)$$
$$(\alpha_A A) \otimes (\alpha_B B) = \alpha_A \alpha_B (A \otimes B) \qquad (509)$$
$$(A \otimes B)^T = A^T \otimes B^T \qquad (510)$$
$$(A \otimes B)(C \otimes D) = AC \otimes BD \qquad (511)$$
$$(A \otimes B)^{-1} = A^{-1} \otimes B^{-1} \qquad (512)$$
$$(A \otimes B)^+ = A^+ \otimes B^+ \qquad (513)$$
$$\operatorname{rank}(A \otimes B) = \operatorname{rank}(A)\operatorname{rank}(B) \qquad (514)$$
$$\operatorname{Tr}(A \otimes B) = \operatorname{Tr}(A)\operatorname{Tr}(B) = \operatorname{Tr}(\Lambda_A \otimes \Lambda_B) \qquad (515)$$
$$\det(A \otimes B) = \det(A)^{\operatorname{rank}(B)} \det(B)^{\operatorname{rank}(A)} \qquad (516)$$
$$\{\operatorname{eig}(A \otimes B)\} = \{\operatorname{eig}(B \otimes A)\} \qquad \text{if } A, B \text{ are square} \qquad (517)$$
$$\{\operatorname{eig}(A \otimes B)\} = \{\operatorname{eig}(A)\operatorname{eig}(B)^T\} \qquad \text{if } A, B \text{ are symmetric and square} \qquad (518)$$
$$\operatorname{eig}(A \otimes B) = \operatorname{eig}(A) \otimes \operatorname{eig}(B) \qquad (519)$$
Where $\{\lambda_i\}$ denotes the set of values $\lambda_i$, that is, the values in no particular order or structure, and $\Lambda_A$ denotes the diagonal matrix with the eigenvalues of $A$.
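A small numerical sketch (not part of the Cookbook) of some of these rules, e.g. Eqs. (511), (512), (515) and (516), using numpy.kron; the random $3 \times 3$ and $2 \times 2$ test matrices are arbitrary.

\begin{verbatim}
# Sketch only: checking a few Kronecker product rules numerically.
import numpy as np

rng = np.random.default_rng(1)
A, B = rng.standard_normal((3, 3)), rng.standard_normal((2, 2))
C, D = rng.standard_normal((3, 3)), rng.standard_normal((2, 2))

# (A x B)(C x D) = AC x BD  and  (A x B)^{-1} = A^{-1} x B^{-1}
assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))
assert np.allclose(np.linalg.inv(np.kron(A, B)),
                   np.kron(np.linalg.inv(A), np.linalg.inv(B)))

# det(A x B) = det(A)^rank(B) det(B)^rank(A)  and  Tr(A x B) = Tr(A) Tr(B)
assert np.isclose(np.linalg.det(np.kron(A, B)),
                  np.linalg.det(A) ** 2 * np.linalg.det(B) ** 3)
assert np.isclose(np.trace(np.kron(A, B)), np.trace(A) * np.trace(B))
\end{verbatim}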
10.2.2 The Vec Operator
The vec-operator applied on a matrix $A$ stacks the columns into a vector, i.e. for a $2 \times 2$ matrix

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \qquad \operatorname{vec}(A) = \begin{bmatrix} A_{11} \\ A_{21} \\ A_{12} \\ A_{22} \end{bmatrix}$$

Properties of the vec-operator include (see [19])

$$\operatorname{vec}(AXB) = (B^T \otimes A)\operatorname{vec}(X) \qquad (520)$$
$$\operatorname{Tr}(A^T B) = \operatorname{vec}(A)^T \operatorname{vec}(B) \qquad (521)$$
$$\operatorname{vec}(A + B) = \operatorname{vec}(A) + \operatorname{vec}(B) \qquad (522)$$
$$\operatorname{vec}(\alpha A) = \alpha \cdot \operatorname{vec}(A) \qquad (523)$$
$$a^T X B X^T c = \operatorname{vec}(X)^T (B \otimes c\,a^T)\operatorname{vec}(X) \qquad (524)$$

See B.1.1 for a proof for Eq. (524).
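A brief, hedged illustration (not part of the Cookbook) of Eqs. (520) and (521). Note that vec() stacks columns, which corresponds to Fortran ('F') ordering in NumPy; the helper function and the test sizes are assumptions made only for this sketch.

\begin{verbatim}
# Sketch only: vec(AXB) = (B^T kron A) vec(X) and Tr(A^T B) = vec(A)^T vec(B).
import numpy as np

def vec(M):
    return M.reshape(-1, order="F")    # stack the columns of M

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((2, 3))
X = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 5))

assert np.allclose(vec(A @ X @ C), np.kron(C.T, A) @ vec(X))   # Eq. (520)
assert np.isclose(np.trace(A.T @ B), vec(A) @ vec(B))          # Eq. (521)
\end{verbatim}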
10.3 Vector Norms
10.3.1 Examples
$$\|x\|_1 = \sum_i |x_i| \qquad (525)$$
$$\|x\|_2^2 = x^H x \qquad (526)$$
$$\|x\|_p = \left(\sum_i |x_i|^p\right)^{1/p} \qquad (527)$$
$$\|x\|_\infty = \max_i |x_i| \qquad (528)$$

Further reading in e.g. [12, p. 52].

10.4 Matrix Norms

10.4.1 Definitions

A matrix norm is a mapping which fulfils

$$\|A\| \geq 0 \qquad (529)$$
$$\|A\| = 0 \;\Leftrightarrow\; A = 0 \qquad (530)$$
$$\|cA\| = |c|\,\|A\|, \qquad c \in \mathbb{R} \qquad (531)$$
$$\|A + B\| \leq \|A\| + \|B\| \qquad (532)$$

10.4.2 Induced Norm or Operator Norm
An induced norm is a matrix norm induced by a vector norm by the following
$$\|A\| \equiv \sup\{\|Ax\| \,:\, \|x\| = 1\} \qquad (533)$$
where $\|\cdot\|$ on the left side is the induced matrix norm, while $\|\cdot\|$ on the right side denotes the vector norm. For induced norms it holds that
$$\|I\| = 1 \qquad (534)$$
$$\|Ax\| \leq \|A\|\,\|x\|, \qquad \text{for all } A, x \qquad (535)$$
$$\|AB\| \leq \|A\|\,\|B\|, \qquad \text{for all } A, B \qquad (536)$$

10.4.3 Examples
$$\|A\|_1 = \max_j \sum_i |A_{ij}| \qquad (537)$$
$$\|A\|_2 = \sqrt{\max \operatorname{eig}(A^H A)} \qquad (538)$$
$$\|A\|_p = \Big(\max_{\|x\|_p = 1} \|Ax\|_p\Big) \qquad (539)$$
$$\|A\|_\infty = \max_i \sum_j |A_{ij}| \qquad (540)$$
$$\|A\|_F = \sqrt{\sum_{ij} |A_{ij}|^2} = \sqrt{\operatorname{Tr}(A A^H)} \qquad \text{(Frobenius)} \qquad (541)$$
$$\|A\|_{\max} = \max_{ij} |A_{ij}| \qquad (542)$$
$$\|A\|_{KF} = \|\operatorname{sing}(A)\|_1 \qquad \text{(Ky Fan)} \qquad (543)$$

where $\operatorname{sing}(A)$ is the vector of singular values of the matrix $A$.

10.4.4 Inequalities

E. H. Rasmussen has in yet unpublished material derived and collected the following inequalities. They are collected in a table as below, assuming $A$ is an $m \times n$ matrix and $d = \operatorname{rank}(A)$:
$$\begin{array}{c|cccccc}
 & \|A\|_{\max} & \|A\|_1 & \|A\|_\infty & \|A\|_2 & \|A\|_F & \|A\|_{KF} \\
\hline
\|A\|_{\max} & & 1 & 1 & 1 & 1 & 1 \\
\|A\|_1 & m & & m & \sqrt{m} & \sqrt{m} & \sqrt{m} \\
\|A\|_\infty & n & n & & \sqrt{n} & \sqrt{n} & \sqrt{n} \\
\|A\|_2 & \sqrt{mn} & \sqrt{n} & \sqrt{m} & & 1 & 1 \\
\|A\|_F & \sqrt{mn} & \sqrt{n} & \sqrt{m} & \sqrt{d} & & 1 \\
\|A\|_{KF} & \sqrt{mnd} & \sqrt{nd} & \sqrt{md} & d & \sqrt{d} & \\
\end{array}$$

which are to be read as, e.g.,

$$\|A\|_2 \leq \sqrt{m}\,\|A\|_\infty \qquad (544)$$

10.4.5 Condition Number
The 2-norm of $A$ equals $\sqrt{\max(\operatorname{eig}(A^T A))}$ [12, p. 57]. For a symmetric, positive definite matrix, this reduces to $\max(\operatorname{eig}(A))$. The condition number based on the 2-norm thus reduces to
$$\|A\|_2\,\|A^{-1}\|_2 = \max(\operatorname{eig}(A))\,\max(\operatorname{eig}(A^{-1})) = \frac{\max(\operatorname{eig}(A))}{\min(\operatorname{eig}(A))}. \qquad (545)$$
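As a hedged numerical illustration (not part of the Cookbook), the sketch below checks the 2-norm and the condition number of Eq. (545) against NumPy's built-in routines; the random test matrix is arbitrary.

\begin{verbatim}
# Sketch only: 2-norm and 2-norm condition number.
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))

two_norm = np.sqrt(np.max(np.linalg.eigvalsh(A.T @ A)))
assert np.isclose(two_norm, np.linalg.norm(A, 2))

cond = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)
assert np.isclose(cond, np.linalg.cond(A, 2))
\end{verbatim}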
10.5 Rank
10.5.1 Sylvester's Inequality

If $A$ is $m \times n$ and $B$ is $n \times r$, then

$$\operatorname{rank}(A) + \operatorname{rank}(B) - n \;\leq\; \operatorname{rank}(AB) \;\leq\; \min(\operatorname{rank}(A), \operatorname{rank}(B)) \qquad (546)$$

10.6 Integral Involving Dirac Delta Functions
Assuming A to be square, then
$$\int p(s)\,\delta(x - As)\,ds = \frac{1}{\det(A)}\,p_s(A^{-1}x) \qquad (547)$$

Assuming $A$ to be "underdetermined", i.e. "tall", then

$$\int p(s)\,\delta(x - As)\,ds = \begin{cases} \dfrac{1}{\sqrt{\det(A^T A)}}\,p_s(A^+ x) & \text{if } x = AA^+ x \\[2mm] 0 & \text{elsewhere} \end{cases} \qquad (548)$$

See [9].
10.7 Miscellaneous
For any A it holds that
$$\operatorname{rank}(A) = \operatorname{rank}(A^T) = \operatorname{rank}(AA^T) = \operatorname{rank}(A^T A) \qquad (549)$$
It holds that
$$A \text{ is positive definite} \;\Leftrightarrow\; \exists B \text{ invertible, such that } A = BB^T \qquad (550)$$
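A short illustrative sketch (not part of the Cookbook) of Eqs. (549) and (550): the rank equalities for a random tall matrix, and a positive definite matrix factored as $BB^T$ via the Cholesky decomposition (one possible choice of invertible $B$). The sizes are arbitrary assumptions.

\begin{verbatim}
# Sketch only: rank equalities and the factorization A = B B^T.
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 3))                  # rank 3 (almost surely)
assert np.linalg.matrix_rank(A) == np.linalg.matrix_rank(A.T @ A) == 3

S = A.T @ A + np.eye(3)                          # symmetric positive definite
B = np.linalg.cholesky(S)                        # lower triangular, invertible
assert np.allclose(B @ B.T, S)
\end{verbatim}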
A One-dimensional Results

A.1 Gaussian

A.1.1 Density

$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \qquad (551)$$

A.1.2 Normalization

$$\int e^{-\frac{(s-\mu)^2}{2\sigma^2}}\, ds = \sqrt{2\pi\sigma^2} \qquad (552)$$
$$\int e^{-(ax^2 + bx + c)}\, dx = \sqrt{\frac{\pi}{a}} \exp\left[\frac{b^2 - 4ac}{4a}\right] \qquad (553)$$

$$\int e^{c_2 x^2 + c_1 x + c_0}\, dx = \sqrt{\frac{\pi}{-c_2}} \exp\left[\frac{c_1^2 - 4 c_2 c_0}{-4 c_2}\right] \qquad (554)$$

A.1.3 Derivatives

$$\frac{\partial p(x)}{\partial \mu} = p(x)\,\frac{(x - \mu)}{\sigma^2} \qquad (555)$$
$$\frac{\partial \ln p(x)}{\partial \mu} = \frac{(x - \mu)}{\sigma^2} \qquad (556)$$
$$\frac{\partial p(x)}{\partial \sigma} = p(x)\,\frac{1}{\sigma}\left[\frac{(x - \mu)^2}{\sigma^2} - 1\right] \qquad (557)$$
$$\frac{\partial \ln p(x)}{\partial \sigma} = \frac{1}{\sigma}\left[\frac{(x - \mu)^2}{\sigma^2} - 1\right] \qquad (558)$$
A.1.4 Completing the Squares

$$c_2 x^2 + c_1 x + c_0 = -a(x - b)^2 + w$$

$$a = -c_2, \qquad b = -\frac{1}{2}\frac{c_1}{c_2}, \qquad w = c_0 - \frac{c_1^2}{4 c_2}$$

or

$$c_2 x^2 + c_1 x + c_0 = -\frac{1}{2\sigma^2}(x - \mu)^2 + d$$

$$\mu = \frac{-c_1}{2 c_2}, \qquad \sigma^2 = \frac{-1}{2 c_2}, \qquad d = c_0 - \frac{c_1^2}{4 c_2}$$

A.1.5 Moments

If the density is expressed by

$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \qquad \text{or} \qquad p(x) = C \exp(c_2 x^2 + c_1 x) \qquad (559)$$

then the first few basic moments are
$$\langle x \rangle = \mu = \frac{-c_1}{2 c_2}$$
$$\langle x^2 \rangle = \sigma^2 + \mu^2 = \frac{-1}{2 c_2} + \left(\frac{c_1}{2 c_2}\right)^2$$
$$\langle x^3 \rangle = 3\sigma^2\mu + \mu^3 = \frac{c_1}{(2 c_2)^2}\left[3 - \frac{c_1^2}{2 c_2}\right]$$
$$\langle x^4 \rangle = \mu^4 + 6\mu^2\sigma^2 + 3\sigma^4 = \left(\frac{c_1}{2 c_2}\right)^4 + 6\left(\frac{c_1}{2 c_2}\right)^2 \frac{-1}{2 c_2} + 3\left(\frac{1}{2 c_2}\right)^2$$

and the central moments are

$$\langle (x - \mu) \rangle = 0$$
$$\langle (x - \mu)^2 \rangle = \sigma^2 = \frac{-1}{2 c_2}$$
$$\langle (x - \mu)^3 \rangle = 0$$
$$\langle (x - \mu)^4 \rangle = 3\sigma^4 = 3\left(\frac{1}{2 c_2}\right)^2$$
A kind of pseudo-moments (unnormalized integrals) can easily be derived as
$$\int \exp(c_2 x^2 + c_1 x)\, x^n\, dx = Z\,\langle x^n \rangle = \sqrt{\frac{\pi}{-c_2}} \exp\left[\frac{c_1^2}{-4 c_2}\right]\langle x^n \rangle \qquad (560)$$
From the uncentralized moments one can derive other entities like
$$\langle x^2 \rangle - \langle x \rangle^2 = \sigma^2 = \frac{-1}{2 c_2}$$
$$\langle x^3 \rangle - \langle x^2 \rangle\langle x \rangle = 2\sigma^2\mu = \frac{2 c_1}{(2 c_2)^2}$$
$$\langle x^4 \rangle - \langle x^2 \rangle^2 = 2\sigma^4 + 4\mu^2\sigma^2 = \frac{2}{(2 c_2)^2}\left[1 - \frac{2 c_1^2}{2 c_2}\right]$$

A.2 One Dimensional Mixture of Gaussians

A.2.1 Density and Normalization

$$p(s) = \sum_{k=1}^{K} \frac{\rho_k}{\sqrt{2\pi\sigma_k^2}} \exp\left(-\frac{(s - \mu_k)^2}{2\sigma_k^2}\right) \qquad (561)$$
A.2.2 Moments
A useful fact of MoG is that

$$\langle x^n \rangle = \sum_k \rho_k \langle x^n \rangle_k \qquad (562)$$
where $\langle \cdot \rangle_k$ denotes average with respect to the $k$'th component. We can calculate the first four moments from the densities
$$p(x) = \sum_k \rho_k \frac{1}{\sqrt{2\pi\sigma_k^2}} \exp\left(-\frac{(x - \mu_k)^2}{2\sigma_k^2}\right) \qquad (563)$$
$$p(x) = \sum_k \rho_k C_k \exp\left(c_{k2} x^2 + c_{k1} x\right) \qquad (564)$$
as
$$\langle x \rangle = \sum_k \rho_k \mu_k = \sum_k \rho_k \left[\frac{-c_{k1}}{2 c_{k2}}\right]$$
$$\langle x^2 \rangle = \sum_k \rho_k (\sigma_k^2 + \mu_k^2) = \sum_k \rho_k \left[\frac{-1}{2 c_{k2}} + \left(\frac{c_{k1}}{2 c_{k2}}\right)^2\right]$$
$$\langle x^3 \rangle = \sum_k \rho_k (3\sigma_k^2\mu_k + \mu_k^3) = \sum_k \rho_k \left[\frac{c_{k1}}{(2 c_{k2})^2}\left(3 - \frac{c_{k1}^2}{2 c_{k2}}\right)\right]$$
$$\langle x^4 \rangle = \sum_k \rho_k (\mu_k^4 + 6\mu_k^2\sigma_k^2 + 3\sigma_k^4) = \sum_k \rho_k \left[\left(\frac{1}{2 c_{k2}}\right)^2\left(\left(\frac{c_{k1}^2}{2 c_{k2}}\right)^2 - 6\frac{c_{k1}^2}{2 c_{k2}} + 3\right)\right]$$

If all the Gaussians are centered, i.e. $\mu_k = 0$ for all $k$, then

$$\langle x \rangle = 0, \qquad \langle x^3 \rangle = 0$$
$$\langle x^2 \rangle = \sum_k \rho_k \sigma_k^2 = \sum_k \rho_k \left[\frac{-1}{2 c_{k2}}\right]$$
$$\langle x^4 \rangle = \sum_k \rho_k\, 3\sigma_k^4 = \sum_k \rho_k\, 3\left[\frac{-1}{2 c_{k2}}\right]^2$$

From the uncentralized moments one can derive other entities like

$$\langle x^2 \rangle - \langle x \rangle^2 = \sum_{k,k'} \rho_k \rho_{k'} \left[\mu_k^2 + \sigma_k^2 - \mu_k \mu_{k'}\right]$$
$$\langle x^3 \rangle - \langle x^2 \rangle\langle x \rangle = \sum_{k,k'} \rho_k \rho_{k'} \left[3\sigma_k^2\mu_k + \mu_k^3 - (\sigma_k^2 + \mu_k^2)\mu_{k'}\right]$$
$$\langle x^4 \rangle - \langle x^2 \rangle^2 = \sum_{k,k'} \rho_k \rho_{k'} \left[\mu_k^4 + 6\mu_k^2\sigma_k^2 + 3\sigma_k^4 - (\sigma_k^2 + \mu_k^2)(\sigma_{k'}^2 + \mu_{k'}^2)\right]$$
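A hedged numerical illustration (not part of the Cookbook) of the fact $\langle x^n \rangle = \sum_k \rho_k \langle x^n \rangle_k$: analytic mixture moments compared with Monte Carlo estimates. The mixture parameters, sample size and tolerances are arbitrary choices for this sketch.

\begin{verbatim}
# Sketch only: mixture moments as rho-weighted component Gaussian moments.
import numpy as np

rho   = np.array([0.3, 0.7])
mu    = np.array([-1.0, 2.0])
sigma = np.array([0.5, 1.5])

m1 = np.sum(rho * mu)
m2 = np.sum(rho * (sigma**2 + mu**2))
m4 = np.sum(rho * (mu**4 + 6 * mu**2 * sigma**2 + 3 * sigma**4))

rng = np.random.default_rng(5)
k = rng.choice(2, size=2_000_000, p=rho)    # pick a component per sample
x = rng.normal(mu[k], sigma[k])             # sample from that component

assert np.isclose(x.mean(), m1, atol=0.01)
assert np.isclose(np.mean(x**2), m2, atol=0.05)
assert np.isclose(np.mean(x**4), m4, atol=0.5)
\end{verbatim}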
A.2.3 Derivatives
Defining $p(s) = \sum_k \rho_k\,\mathcal{N}(s|\mu_k, \sigma_k^2)$ we get for a parameter $\theta_j$ of the $j$'th component

$$\frac{\partial \ln p(s)}{\partial \theta_j} = \frac{\rho_j\,\mathcal{N}(s|\mu_j, \sigma_j^2)}{\sum_k \rho_k\,\mathcal{N}(s|\mu_k, \sigma_k^2)}\,\frac{\partial \ln\big(\rho_j\,\mathcal{N}(s|\mu_j, \sigma_j^2)\big)}{\partial \theta_j} \qquad (565)$$

that is,

$$\frac{\partial \ln p(s)}{\partial \rho_j} = \frac{\rho_j\,\mathcal{N}(s|\mu_j, \sigma_j^2)}{\sum_k \rho_k\,\mathcal{N}(s|\mu_k, \sigma_k^2)}\,\frac{1}{\rho_j} \qquad (566)$$

$$\frac{\partial \ln p(s)}{\partial \mu_j} = \frac{\rho_j\,\mathcal{N}(s|\mu_j, \sigma_j^2)}{\sum_k \rho_k\,\mathcal{N}(s|\mu_k, \sigma_k^2)}\,\frac{(s - \mu_j)}{\sigma_j^2} \qquad (567)$$

$$\frac{\partial \ln p(s)}{\partial \sigma_j} = \frac{\rho_j\,\mathcal{N}(s|\mu_j, \sigma_j^2)}{\sum_k \rho_k\,\mathcal{N}(s|\mu_k, \sigma_k^2)}\,\frac{1}{\sigma_j}\left[\frac{(s - \mu_j)^2}{\sigma_j^2} - 1\right] \qquad (568)$$

Note that $\rho_k$ must be constrained to be proper ratios. Defining the ratios by $\rho_j = e^{r_j}/\sum_k e^{r_k}$, we obtain

$$\frac{\partial \ln p(s)}{\partial r_j} = \sum_l \frac{\partial \ln p(s)}{\partial \rho_l}\,\frac{\partial \rho_l}{\partial r_j} \qquad \text{where} \qquad \frac{\partial \rho_l}{\partial r_j} = \rho_l(\delta_{lj} - \rho_j) \qquad (569)$$
B Proofs and Details

B.1 Misc Proofs

B.1.1 Proof of Equation (524)

The following proof is the work of Florian Roemer. Note that the vectors and matrices below can be complex and the notation $X^H$ is used for transpose and conjugated, while $X^T$ is only transpose of the complex matrix.
Define the row vector $y = a^H X B$ and the column vector $z = X^H c$. Then

$$a^T X B X^T c = y z = z^T y^T$$

Note that $y$ can be rewritten as $\operatorname{vec}(y)^T$ which is the same as

$$\operatorname{vec}(\operatorname{conj}(y))^H = \operatorname{vec}(a^T \operatorname{conj}(X) \operatorname{conj}(B))^H$$

where conj means complex conjugated. Applying the vec rule for linear forms (Eq. 520), we get

$$y = \big[(B^H \otimes a^T)\operatorname{vec}(\operatorname{conj}(X))\big]^H = \operatorname{vec}(X)^T (B \otimes \operatorname{conj}(a))$$

where we have also used the rule for transpose of Kronecker products. For $y^T$ this yields $(B^T \otimes a^H)\operatorname{vec}(X)$. Similarly we can rewrite $z$ which is the same as $\operatorname{vec}(z^T) = \operatorname{vec}(c^T \operatorname{conj}(X))$. Applying again Eq. (520), we get

$$z = \big(I \otimes c^T\big)\operatorname{vec}(\operatorname{conj}(X))$$

where $I$ is the identity matrix. For $z^T$ we obtain $\operatorname{vec}(X)^T (I \otimes c)$. Finally, the original expression is $z^T y^T$ which now takes the form

$$\operatorname{vec}(X)^H (I \otimes c)(B^T \otimes a^H)\operatorname{vec}(X)$$

the final step is to apply the rule for products of Kronecker products and by that combine the Kronecker products. This gives

$$\operatorname{vec}(X)^H (B^T \otimes c\,a^H)\operatorname{vec}(X)$$

which is the desired result.
B.1.2 Proof of Equation 493
For any analytical function $f(X)$ of a matrix argument $X$, it holds that

$$f(AB)A = \left(\sum_{n=0}^{\infty} c_n (AB)^n\right) A = \sum_{n=0}^{\infty} c_n (AB)^n A = \sum_{n=0}^{\infty} c_n A (BA)^n = A \sum_{n=0}^{\infty} c_n (BA)^n = A f(BA)$$

B.1.3 Proof of Equation (91)

Essentially we need to calculate
$$\begin{aligned}
\frac{\partial (X^n)_{kl}}{\partial X_{ij}} &= \frac{\partial}{\partial X_{ij}} \sum_{u_1,\ldots,u_{n-1}} X_{k,u_1} X_{u_1,u_2} \cdots X_{u_{n-1},l} \\
&= \sum_{u_1,\ldots,u_{n-1}} \Big( \delta_{k,i}\delta_{u_1,j}\, X_{u_1,u_2} \cdots X_{u_{n-1},l} + X_{k,u_1}\delta_{u_1,i}\delta_{u_2,j} \cdots X_{u_{n-1},l} + \cdots + X_{k,u_1} X_{u_1,u_2} \cdots \delta_{u_{n-1},i}\delta_{l,j} \Big) \\
&= \sum_{r=0}^{n-1} (X^r)_{ki}\,(X^{n-1-r})_{jl} \\
&= \sum_{r=0}^{n-1} \big(X^r J^{ij} X^{n-1-r}\big)_{kl}
\end{aligned}$$
Using the properties of the single entry matrix found in Sec. 9.7.4, the result
follows easily.
B.1.4 Details on Eq. 571
$$\begin{aligned}
\partial \det(X^H A X) &= \det(X^H A X)\operatorname{Tr}\big[(X^H A X)^{-1}\,\partial(X^H A X)\big] \\
&= \det(X^H A X)\operatorname{Tr}\big[(X^H A X)^{-1}\big(\partial(X^H) A X + X^H \partial(A X)\big)\big] \\
&= \det(X^H A X)\Big(\operatorname{Tr}\big[(X^H A X)^{-1}\partial(X^H) A X\big] + \operatorname{Tr}\big[(X^H A X)^{-1} X^H \partial(A X)\big]\Big) \\
&= \det(X^H A X)\Big(\operatorname{Tr}\big[A X (X^H A X)^{-1}\partial(X^H)\big] + \operatorname{Tr}\big[(X^H A X)^{-1} X^H A\,\partial(X)\big]\Big)
\end{aligned}$$

First, the derivative is found with respect to the real part of $X$

$$\begin{aligned}
\frac{\partial \det(X^H A X)}{\partial \Re X} &= \det(X^H A X)\left(\operatorname{Tr}\Big[A X (X^H A X)^{-1}\frac{\partial X^H}{\partial \Re X}\Big] + \operatorname{Tr}\Big[(X^H A X)^{-1} X^H A\,\frac{\partial X}{\partial \Re X}\Big]\right) \\
&= \det(X^H A X)\Big(A X (X^H A X)^{-1} + \big((X^H A X)^{-1} X^H A\big)^T\Big)
\end{aligned}$$

Through the calculations, (100) and (240) were used. In addition, by use of (241), the derivative is found with respect to the imaginary part of $X$

$$\begin{aligned}
i\,\frac{\partial \det(X^H A X)}{\partial \Im X} &= i\,\det(X^H A X)\left(\operatorname{Tr}\Big[A X (X^H A X)^{-1}\frac{\partial X^H}{\partial \Im X}\Big] + \operatorname{Tr}\Big[(X^H A X)^{-1} X^H A\,\frac{\partial X}{\partial \Im X}\Big]\right) \\
&= \det(X^H A X)\Big(A X (X^H A X)^{-1} - \big((X^H A X)^{-1} X^H A\big)^T\Big)
\end{aligned}$$

Hence, the derivative yields

$$\frac{\partial \det(X^H A X)}{\partial X} = \frac{1}{2}\left[\frac{\partial \det(X^H A X)}{\partial \Re X} - i\,\frac{\partial \det(X^H A X)}{\partial \Im X}\right] = \det(X^H A X)\big((X^H A X)^{-1} X^H A\big)^T$$

and the complex conjugate derivative yields

$$\frac{\partial \det(X^H A X)}{\partial X^*} = \frac{1}{2}\left[\frac{\partial \det(X^H A X)}{\partial \Re X} + i\,\frac{\partial \det(X^H A X)}{\partial \Im X}\right] = \det(X^H A X)\, A X (X^H A X)^{-1}$$

Notice, for real $X$, $A$, the sum of (249) and (250) is reduced to (54).

Similar calculations yield

$$\frac{\partial \det(X A X^H)}{\partial X} = \frac{1}{2}\left[\frac{\partial \det(X A X^H)}{\partial \Re X} - i\,\frac{\partial \det(X A X^H)}{\partial \Im X}\right] = \det(X A X^H)\big(A X^H (X A X^H)^{-1}\big)^T \qquad (570)$$

and

$$\frac{\partial \det(X A X^H)}{\partial X^*} = \frac{1}{2}\left[\frac{\partial \det(X A X^H)}{\partial \Re X} + i\,\frac{\partial \det(X A X^H)}{\partial \Im X}\right] = \det(X A X^H)\,(X A X^H)^{-1} X A \qquad (571)$$
References
[1] Karl Gustav Andersson and Lars-Christer Boiers. Ordinära differentialekvationer. Studentlitteratur, 1992.
[2] Jörn Anemüller, Terrence J. Sejnowski, and Scott Makeig. Complex independent component analysis of frequency-domain electroencephalographic data. Neural Networks, 16(9):1311-1323, November 2003.
[3] S. Barnett. Matrices: Methods and Applications. Oxford Applied Mathematics and Computing Science Series. Clarendon Press, 1990.
[4] Christopher Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
[5] Robert J. Boik. Lecture notes: Statistics 550. Online, April 22, 2002. Notes.
[6] D. H. Brandwood. A complex gradient operator and its application in adaptive array theory. IEE Proceedings, 130(1):11-16, February 1983. Parts F and H.
[7] M. Brookes. Matrix Reference Manual, 2004. Website May 20, 2004.
[8] K. Conradsen. En introduktion til statistik. IMM lecture notes, 1984.
[9] Mads Dyrholm. Some matrix results, 2004. Website August 23, 2004.
[10] F. A. Nielsen. Formula. Neuro Research Unit and Technical University of Denmark, 2002.
[11] A. B. Gelman, J. S. Carlin, H. S. Stern, and D. B. Rubin. Bayesian Data Analysis. Chapman and Hall / CRC, 1995.
[12] Gene H. Golub and Charles F. van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, 3rd edition, 1996.
[13] Robert M. Gray. Toeplitz and circulant matrices: A review. Technical report, Information Systems Laboratory, Department of Electrical Engineering, Stanford University, Stanford, California 94305, August 2002.
[14] Simon Haykin. Adaptive Filter Theory. Prentice Hall, Upper Saddle River, NJ, 4th edition, 2002.
[15] Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, 1985.
[16] K. V. Mardia, J. T. Kent, and J. M. Bibby. Multivariate Analysis. Academic Press Ltd., 1979.
[17] Mathpages on "Eigenvalue Problems and Matrix Invariants", http://www.mathpages.com/home/kmath128.htm
[18] Carl D. Meyer. Generalized inversion of modified matrices. SIAM Journal of Applied Mathematics, 24(3):315-323, May 1973.
[19] Thomas P. Minka. Old and new matrix algebra useful for statistics, December 2000. Notes.
[20] Daniele Mortari. Ortho-Skew and Ortho-Sym Matrix Trigonometry. John Lee Junkins Astrodynamics Symposium, AAS 03-265, May 2003. Texas A&M University, College Station, TX.
[21] L. Parra and C. Spence. Convolutive blind separation of non-stationary sources. In IEEE Transactions Speech and Audio Processing, pages 320-327, May 2000.
[22] Kaare Brandt Petersen, Jiucang Hao, and Te-Won Lee. Generative and filtering approaches for overcomplete representations. Neural Information Processing - Letters and Reviews, vol. 8(1), 2005.
[23] John G. Proakis and Dimitris G. Manolakis. Digital Signal Processing. Prentice-Hall, 1996.
[24] Laurent Schwartz. Cours d'Analyse, volume II. Hermann, Paris, 1967. As referenced in [14].
[25] Shayle R. Searle. Matrix Algebra Useful for Statistics. John Wiley and Sons, 1982.
[26] G. Seber and A. Lee. Linear Regression Analysis. John Wiley and Sons, 2002.
[27] S. M. Selby. Standard Mathematical Tables. CRC Press, 1974.
[28] Inna Stainvas. Matrix algebra in differential calculus. Neural Computing Research Group, Information Engineering, Aston University, UK, August 2002. Notes.
[29] P. P. Vaidyanathan. Multirate Systems and Filter Banks. Prentice Hall, 1993.
[30] Max Welling. The Kalman Filter. Lecture Note.
[31] Wikipedia on minors: "Minor (linear algebra)", http://en.wikipedia.org/wiki/Minor_(linear_algebra)
[32] Zhaoshui He, Shengli Xie, et al. Convolutive blind source separation in frequency domain based on sparse representation. IEEE Transactions on Audio, Speech and Language Processing, 15(5):1551-1563, July 2007.
[33] Karim T. Abou-Moustafa. On Derivatives of Eigenvalues and Eigenvectors of the Generalized Eigenvalue Problem. McGill Technical Report, October 2010.
[34] Mohammad Emtiyaz Khan. Updating Inverse of a Matrix When a Column is Added/Removed. CS, UBC, February 27, 2008.
Index
Antisymmetric, 54 Block matrix, 46
Chain rule, 15 Cholesky decomposition, 32 Cokurtosis, 34 Coskewness, 34
Condition number, 62 Cramer's Rule, 29
Derivative of a complex matrix, 24 Derivative of a determinant, 8 Derivative of a trace, 12
Derivative of an inverse, 9 Derivative of symmetric matrix, 15 Derivatives of Toeplitz matrix, 16 Dirichlet distribution, 37
Eigenvalues, 30
Eigenvectors, 30
Exponential Matrix Function, 59
Gaussian, conditional, 40 Gaussian, entropy, 44
Gaussian, linear combination, 41 Gaussian, marginal, 40
Gaussian, product of densities, 42 Generalized inverse, 21
Hadamard inequality, 52 Hermitian, 48
Idempotent, 49
Kronecker product, 59
LDL decomposition, 33 LDM decomposition, 33 Linear regression, 28 LU decomposition, 32 Lyapunov Equation, 30
Moore-Penrose inverse, 21 Multinomial distribution, 37
Nilpotent, 49
Norm of a matrix, 61 Norm of a vector, 61
Normal-Inverse Gamma distribution, 37 Normal-Inverse Wishart distribution, 39
Orthogonal, 49
Power series of matrices, 58 Probability matrix, 55 Pseudoinverse, 21
Schur complement, 41, 47
Single entry matrix, 52
Singular Value Decomposition (SVD), 31
Skew-Hermitian, 48
Skew-symmetric, 54 Stochastic matrix, 55 Student's t, 37
Sylvester's Inequality, 62 Symmetric, 54
Taylor expansion, 58 Toeplitz matrix, 54 Transition matrix, 55 Trigonometric functions, 59
Unipotent, 49
Vandermonde matrix, 57 Vec operator, 59, 60
Wishart distribution, 38 Woodbury identity, 18