In doing exercise 1.6.10 in Linear Algebra and Its Applications I was reminded of the general issue of multiplying block matrices, including block diagonal matrices. This also came up in exercise 1.4.24, which I answered without necessarily fully understanding the problem. I therefore wanted to go back and try to prove various claims about block matrices, starting with the following:
Let $A$ and $B$ be an $m$ by $n$ matrix and an $n$ by $p$ matrix respectively. Suppose we partition $A$ into submatrices with $r$ row partitions and $s$ column partitions:

$$A = \begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1s} \\ A_{21} & A_{22} & \cdots & A_{2s} \\ \vdots & \vdots & \ddots & \vdots \\ A_{r1} & A_{r2} & \cdots & A_{rs} \end{bmatrix}$$

and we partition $B$ into submatrices with $s$ row partitions and $t$ column partitions:

$$B = \begin{bmatrix} B_{11} & B_{12} & \cdots & B_{1t} \\ B_{21} & B_{22} & \cdots & B_{2t} \\ \vdots & \vdots & \ddots & \vdots \\ B_{s1} & B_{s2} & \cdots & B_{st} \end{bmatrix}$$

Show that the $m$ by $p$ matrix product $AB$ can be described as a block matrix with $r$ row partitions and $t$ column partitions, with submatrices computed as follows:

$$(AB)_{ij} = \sum_{k=1}^{s} A_{ik} B_{kj}, \qquad 1 \le i \le r, \quad 1 \le j \le t$$
Proof: (Note that I’m sure a better proof of this exists, not to mention better notation. However this is the best I can come up with.)
We define a block matrix within $A$ as follows: First, we divide the $m$ rows of $A$ into $r$ partitions, where $1 \le r \le m$. The partition is done using a vector $\alpha = (\alpha_0, \alpha_1, \ldots, \alpha_r)$ of nonnegative integer values defined as follows:

$$\alpha_0 = 0, \qquad \alpha_r = m, \qquad \alpha_{i-1} < \alpha_i \text{ for } 1 \le i \le r$$

This can be interpreted as follows: $\alpha_{i-1} + 1$ is the first row in the $i$th partition, $\alpha_i$ is the last row in the $i$th partition, and $\alpha_i - \alpha_{i-1}$ is the number of rows in the $i$th partition.
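As a concrete illustration (this numerical sketch is mine, not part of the original exercise; the matrix and partition values are arbitrary), here is how such a partitioning vector slices the rows of a matrix in NumPy:

```python
import numpy as np

# Hypothetical example: m = 5 rows split into r = 2 partitions by
# alpha = (alpha_0, alpha_1, alpha_2) = (0, 2, 5),
# so that alpha_0 = 0, alpha_r = m, and alpha_{i-1} < alpha_i.
m, r = 5, 2
alpha = [0, 2, 5]

A = np.arange(m * 4).reshape(m, 4)  # any m-row matrix will do

for i in range(1, r + 1):
    # Rows alpha_{i-1}+1 through alpha_i (1-based) are rows
    # alpha[i-1] through alpha[i]-1 in 0-based NumPy indexing.
    block = A[alpha[i - 1]:alpha[i], :]
    print(f"partition {i}: rows {alpha[i-1]+1}..{alpha[i]}, "
          f"{alpha[i] - alpha[i-1]} rows")
    print(block)
```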
We then divide the $n$ columns of $A$ into $s$ partitions, where $1 \le s \le n$, using a vector $\beta = (\beta_0, \beta_1, \ldots, \beta_s)$ of nonnegative integer values defined as follows:

$$\beta_0 = 0, \qquad \beta_s = n, \qquad \beta_{k-1} < \beta_k \text{ for } 1 \le k \le s$$

The vector $\beta$ has a similar interpretation to the vector $\alpha$: $\beta_{k-1} + 1$ is the first column in the $k$th partition, $\beta_k$ is the last column in the $k$th partition, and $\beta_k - \beta_{k-1}$ is the number of columns in the $k$th partition.
We then define $r$ times $s$ submatrices of $A$, with a given submatrix $A_{ik}$ (where $1 \le i \le r$ and $1 \le k \le s$) defined in terms of the entries of $A$ as follows:

$$A_{ik} = \begin{bmatrix} a_{\alpha_{i-1}+1,\,\beta_{k-1}+1} & \cdots & a_{\alpha_{i-1}+1,\,\beta_k} \\ \vdots & \ddots & \vdots \\ a_{\alpha_i,\,\beta_{k-1}+1} & \cdots & a_{\alpha_i,\,\beta_k} \end{bmatrix}$$

so that the $(u, v)$ entry of $A_{ik}$ has the value

$$(A_{ik})_{uv} = a_{\alpha_{i-1}+u,\,\beta_{k-1}+v}, \qquad 1 \le u \le \alpha_i - \alpha_{i-1}, \quad 1 \le v \le \beta_k - \beta_{k-1}$$

Note that when $r = s = 1$ we have $\alpha = (0, m)$ and $\beta = (0, n)$, so that

$$A_{11} = A$$

In other words, in this case $A$ has only a single submatrix, namely $A$ itself.

Also note that when $r = m$ and $s = n$ we have $\alpha = (0, 1, \ldots, m)$ and $\beta = (0, 1, \ldots, n)$, so that each submatrix $A_{ik}$ is a 1 by 1 matrix. We then have

$$(A_{ik})_{11} = a_{ik}$$

In other words, in this case $A$ has $m$ times $n$ submatrices, each containing a single entry of $A$.
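To make the submatrix definition concrete, here is a minimal sketch (my own, with arbitrary example values) that extracts the blocks $A_{ik}$ from $\alpha$ and $\beta$ and checks the entry formula $(A_{ik})_{uv} = a_{\alpha_{i-1}+u,\,\beta_{k-1}+v}$:

```python
import numpy as np

# Hypothetical example values: a 5 by 6 matrix A partitioned by
# alpha = (0, 2, 5) (rows) and beta = (0, 1, 4, 6) (columns).
m, n = 5, 6
alpha = [0, 2, 5]    # r = 2 row partitions
beta = [0, 1, 4, 6]  # s = 3 column partitions
r, s = len(alpha) - 1, len(beta) - 1

A = np.arange(1, m * n + 1).reshape(m, n)

def submatrix(M, rows, cols, i, k):
    """Return the (i, k) block of M (i and k are 1-based block indices)."""
    return M[rows[i - 1]:rows[i], cols[k - 1]:cols[k]]

for i in range(1, r + 1):
    for k in range(1, s + 1):
        A_ik = submatrix(A, alpha, beta, i, k)
        # Check (A_ik)_{uv} = a_{alpha_{i-1}+u, beta_{k-1}+v}
        # (indices are 1-based in the proof, hence the -1 shifts for NumPy).
        for u in range(1, alpha[i] - alpha[i - 1] + 1):
            for v in range(1, beta[k] - beta[k - 1] + 1):
                assert A_ik[u - 1, v - 1] == A[alpha[i - 1] + u - 1,
                                               beta[k - 1] + v - 1]
print("entry formula checked for all blocks")
```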
We next define a block matrix within $B$ as follows: First, we divide the $n$ rows of $B$ into $s$ partitions, using the same vector $\beta$ used to partition the columns of $A$.

(Note: It is important that the rows of $B$ be partitioned in the exact same manner as the columns of $A$, and not just that the number of partitions $s$ be the same. Otherwise we won't be able to multiply submatrices of $A$ by the corresponding submatrices of $B$, because the number of columns in a submatrix of $A$ won't necessarily match the number of rows in the corresponding submatrix of $B$. Unfortunately some references don't make this clear, including at one point the block matrix article at Wikipedia.)
We then divide the $p$ columns of $B$ into $t$ partitions, where $1 \le t \le p$, using a vector $\gamma = (\gamma_0, \gamma_1, \ldots, \gamma_t)$ of nonnegative integer values defined as follows:

$$\gamma_0 = 0, \qquad \gamma_t = p, \qquad \gamma_{j-1} < \gamma_j \text{ for } 1 \le j \le t$$

Again, the vector $\gamma$ has the same interpretation as the vectors $\alpha$ and $\beta$: $\gamma_{j-1} + 1$ is the first column in the $j$th partition, $\gamma_j$ is the last column in the $j$th partition, and $\gamma_j - \gamma_{j-1}$ is the number of columns in the $j$th partition.
We then define $s$ times $t$ submatrices of $B$, with a given submatrix $B_{kj}$ (where $1 \le k \le s$ and $1 \le j \le t$) defined in terms of the entries of $B$ as follows:

$$B_{kj} = \begin{bmatrix} b_{\beta_{k-1}+1,\,\gamma_{j-1}+1} & \cdots & b_{\beta_{k-1}+1,\,\gamma_j} \\ \vdots & \ddots & \vdots \\ b_{\beta_k,\,\gamma_{j-1}+1} & \cdots & b_{\beta_k,\,\gamma_j} \end{bmatrix}$$

As in the case of $A$, if $s = t = 1$ then there is only a single submatrix of $B$, namely $B$ itself, and if $s = n$ and $t = p$ then each submatrix contains a single entry of $B$. The $(u, v)$ entry of $B_{kj}$ has the value

$$(B_{kj})_{uv} = b_{\beta_{k-1}+u,\,\gamma_{j-1}+v}, \qquad 1 \le u \le \beta_k - \beta_{k-1}, \quad 1 \le v \le \gamma_j - \gamma_{j-1}$$
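The note above about using the same vector $\beta$ for both the columns of $A$ and the rows of $B$ is exactly what makes the block products conformable. A quick check (again my own sketch, with arbitrary example values) makes the point:

```python
import numpy as np

# Hypothetical example: A is 5 by 6, B is 6 by 4. The columns of A and the
# rows of B are both partitioned by the same vector beta.
alpha = [0, 2, 5]    # row partition of A
beta = [0, 1, 4, 6]  # column partition of A AND row partition of B
gamma = [0, 3, 4]    # column partition of B

A = np.random.default_rng(0).normal(size=(5, 6))
B = np.random.default_rng(1).normal(size=(6, 4))

def block(M, rows, cols, i, j):
    return M[rows[i - 1]:rows[i], cols[j - 1]:cols[j]]

# Because both partitions use beta, every A_{ik} has exactly as many
# columns as B_{kj} has rows, so A_{ik} @ B_{kj} is always defined.
for i in range(1, len(alpha)):
    for j in range(1, len(gamma)):
        for k in range(1, len(beta)):
            A_ik = block(A, alpha, beta, i, k)
            B_kj = block(B, beta, gamma, k, j)
            assert A_ik.shape[1] == B_kj.shape[0]
print("all block products are conformable")
```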
Finally, consider the $m$ by $p$ matrix $C = AB$ with entries

$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$$

We can partition the rows of $C$ into $r$ partitions according to the same vector $\alpha$ used to partition the rows of $A$, and partition the columns of $C$ into $t$ partitions according to the same vector $\gamma$ used to partition the columns of $B$. We then have $r$ times $t$ submatrices $C_{ij}$ (where $1 \le i \le r$ and $1 \le j \le t$), with the $(u, v)$ entry of $C_{ij}$ having the value

$$(C_{ij})_{uv} = c_{\alpha_{i-1}+u,\,\gamma_{j-1}+v} = \sum_{l=1}^{n} a_{\alpha_{i-1}+u,\,l}\, b_{l,\,\gamma_{j-1}+v}$$
Now consider the following matrix:

$$D_{ij} = \sum_{k=1}^{s} A_{ik} B_{kj}$$

for some $i$ and $j$ where $1 \le i \le r$ and $1 \le j \le t$.

First, note that all the submatrices $A_{ik}$ have $\alpha_i - \alpha_{i-1}$ rows and $\beta_k - \beta_{k-1}$ columns, and all the submatrices $B_{kj}$ have $\beta_k - \beta_{k-1}$ rows and $\gamma_j - \gamma_{j-1}$ columns. (This follows from the definitions of $\alpha$, $\beta$, and $\gamma$.)

Since the number of columns in the submatrices $A_{ik}$ is the same as the number of rows in the submatrices $B_{kj}$, matrix multiplication is possible, and the resulting product matrices $A_{ik} B_{kj}$ will have $\alpha_i - \alpha_{i-1}$ rows and $\gamma_j - \gamma_{j-1}$ columns. Since $D_{ij}$ is the sum of the product matrices, it too will have $\alpha_i - \alpha_{i-1}$ rows and $\gamma_j - \gamma_{j-1}$ columns, the same number of rows and columns as $C_{ij}$ (based on the partitioning of $C$ as described above).
Now consider the $(u, v)$ entry in $D_{ij}$. It will consist of the sum of all the $(u, v)$ entries in the product matrices $A_{ik} B_{kj}$. More formally,

$$(D_{ij})_{uv} = \sum_{k=1}^{s} (A_{ik} B_{kj})_{uv}$$

Since the number of columns in the submatrices $A_{ik}$ and the number of rows in the submatrices $B_{kj}$ is $\beta_k - \beta_{k-1}$, we then have

$$(A_{ik} B_{kj})_{uv} = \sum_{w=1}^{\beta_k - \beta_{k-1}} (A_{ik})_{uw} (B_{kj})_{wv} = \sum_{w=1}^{\beta_k - \beta_{k-1}} a_{\alpha_{i-1}+u,\,\beta_{k-1}+w}\, b_{\beta_{k-1}+w,\,\gamma_{j-1}+v}$$

so that

$$(D_{ij})_{uv} = \sum_{k=1}^{s} \sum_{w=1}^{\beta_k - \beta_{k-1}} a_{\alpha_{i-1}+u,\,\beta_{k-1}+w}\, b_{\beta_{k-1}+w,\,\gamma_{j-1}+v}$$

Note that the first inner sum (when $k = 1$) starts at column index $\beta_0 + 1 = 1$ (because $\beta_0 = 0$ by definition), that each subsequent inner sum starts at a value of $\beta_{k-1} + 1$, one greater than the value $\beta_{k-1}$ at which the previous inner sum ended, and that the last inner sum (when $k = s$) ends at $n$ (because $\beta_s = n$ by definition). We thus conclude that the index $l = \beta_{k-1} + w$ takes on all values from 1 to $n$, and only those values, so that we can rewrite the above as

$$(D_{ij})_{uv} = \sum_{l=1}^{n} a_{\alpha_{i-1}+u,\,l}\, b_{l,\,\gamma_{j-1}+v}$$
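The re-indexing step above, that the concatenated inner sums run over each column index $1$ through $n$ exactly once, can be sanity-checked numerically. The following sketch (mine, with an arbitrary example partition) simply lists the indices $\beta_{k-1} + w$ and confirms they cover $1, \ldots, n$ with no gaps or repeats:

```python
# Hypothetical example: n = 6 columns partitioned by beta = (0, 1, 4, 6).
n = 6
beta = [0, 1, 4, 6]
s = len(beta) - 1

# Collect the index beta_{k-1} + w for every k and every w in the inner sum.
indices = [beta[k - 1] + w
           for k in range(1, s + 1)
           for w in range(1, beta[k] - beta[k - 1] + 1)]

# The concatenated inner sums cover 1..n exactly once, in order.
assert indices == list(range(1, n + 1))
print(indices)
```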
Since $C_{ij}$ and $D_{ij}$ have the same number of rows and columns (as discussed above), and since the $(u, v)$ entry of $D_{ij}$ is equal to the $(u, v)$ entry of $C_{ij}$ for all $u$ and $v$, we conclude that $C_{ij} = D_{ij}$, and from the definition of $D_{ij}$ that

$$(AB)_{ij} = C_{ij} = \sum_{k=1}^{s} A_{ik} B_{kj}$$

which was what we set out to prove.
Note that if $r = s = t = 1$ then

$$A_{11} = A, \qquad B_{11} = B$$

so that the above equation reduces to $(AB)_{11} = AB = A_{11} B_{11}$.

On the other hand, if $r = m$, $s = n$, and $t = p$ then we have

$$(A_{ik})_{11} = a_{ik}, \qquad (B_{kj})_{11} = b_{kj}$$

so that the equation

$$(AB)_{ij} = \sum_{k=1}^{s} A_{ik} B_{kj}$$

is essentially equivalent to the definition of ordinary matrix multiplication,

$$(AB)_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$$
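For readers who want to see the result in action, here is a short numerical check (my own sketch; the sizes and partitioning vectors are arbitrary) that assembles $AB$ block by block, using the formula just proved, and compares it against the product computed directly:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical example sizes and partitioning vectors.
m, n, p = 5, 6, 4
alpha = [0, 2, 5]    # r = 2 row partitions of A (and of AB)
beta = [0, 1, 4, 6]  # s = 3 column partitions of A / row partitions of B
gamma = [0, 3, 4]    # t = 2 column partitions of B (and of AB)

A = rng.normal(size=(m, n))
B = rng.normal(size=(n, p))

def block(M, rows, cols, i, j):
    return M[rows[i - 1]:rows[i], cols[j - 1]:cols[j]]

# Assemble AB blockwise: (AB)_{ij} = sum over k of A_{ik} B_{kj}.
AB_blockwise = np.zeros((m, p))
for i in range(1, len(alpha)):
    for j in range(1, len(gamma)):
        D_ij = sum(block(A, alpha, beta, i, k) @ block(B, beta, gamma, k, j)
                   for k in range(1, len(beta)))
        AB_blockwise[alpha[i - 1]:alpha[i], gamma[j - 1]:gamma[j]] = D_ij

assert np.allclose(AB_blockwise, A @ B)
print("blockwise product matches the direct product")
```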
UPDATE: Since originally writing this post I slightly changed the definition of a partitioning vector such as $\alpha$. First, I clarified that $\alpha$ contained nonnegative integer entries. Second, I used the condition $\alpha_{i-1} < \alpha_i$ for $1 \le i \le r$ instead of $\alpha_i < \alpha_j$ for $i < j$, because I think it better corresponds to the intuitive idea of how a partition would be defined.
Note that we can derive the old condition from the new one: Suppose $\alpha_{i-1} < \alpha_i$ for $1 \le i \le r$, and we have $i$ and $j$ with $i < j$. Let $d = j - i$. We have $\alpha_i < \alpha_j$ when $d = 1$. (This is the new condition that we are assuming is true.) Now suppose $\alpha_i < \alpha_j$ for some $d = j - i \ge 1$. We have $\alpha_j < \alpha_{j+1}$ based on the new condition (assuming that $j + 1 \le r$ so that $\alpha_{j+1}$ has a value), so $\alpha_i < \alpha_j$ implies that $\alpha_i < \alpha_{j+1}$.

By induction we then have $\alpha_i < \alpha_j$ for all $j > i$ for which $\alpha_j$ has a value. Since $j \le r$ and $\alpha_j$ has a value (since $\alpha_r$ does) we then have $\alpha_i < \alpha_j$. Thus $\alpha_{i-1} < \alpha_i$ for $1 \le i \le r$ implies that for any $i$ and $j$ we have $\alpha_i < \alpha_j$ if $i < j$.
Also, since $i - 1 < i$, the old condition that $\alpha_i < \alpha_j$ for $i < j$ implies the new condition that $\alpha_{i-1} < \alpha_i$ for $1 \le i \le r$. Thus the old and new conditions are equivalent, and the choice between them in defining a partitioning vector is simply a matter of convenience.
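A tiny sketch (my own; the example vectors are made up) makes the equivalence easy to play with: checking adjacent entries is the same as checking every pair.

```python
from itertools import combinations

def new_condition(alpha):
    """alpha_{i-1} < alpha_i for every adjacent pair of entries."""
    return all(alpha[i - 1] < alpha[i] for i in range(1, len(alpha)))

def old_condition(alpha):
    """alpha_i < alpha_j for every pair of indices with i < j."""
    return all(alpha[i] < alpha[j]
               for i, j in combinations(range(len(alpha)), 2))

# Made-up partitioning vectors, some valid and some not.
for alpha in ([0, 2, 5], [0, 1, 4, 6], [0, 3, 3, 6], [0, 4, 2, 6]):
    assert new_condition(alpha) == old_condition(alpha)
    print(alpha, new_condition(alpha))
```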
Beautiful, clear explanation. In many books the block matrix multiplication is given without a proof or as an exercise. This is what I was searching for.
Thanks.
I’m glad you found this useful. I was *very* frustrated that I couldn’t find a good proof online of how to multiply block matrices. Similar experiences prompted me to write the posts on multiplying block diagonal matrices and partitioning matrices into block diagonal form.
Thank you! It was useful to see this technical detail at least somewhere. You saved me a lot of time.
This is awesome. Everything makes sense line by line. It just can’t be better.
Thank you very much. I was stuck on proving this.