In doing exercise 1.6.10 in Linear Algebra and Its Applications I was reminded of the general issue of multiplying block matrices, including diagonal block matrices. This also came up in exercise 1.4.24 as well, which I answered without necessarily fully understanding the problem. I therefore wanted to go back and try to prove various claims about block matrices, starting with the following:

Let and be an by matrix and a by matrix respectively. Suppose we partition into submatrices with row partitions and column partitions:

and we partition into submatrices with row partitions and column partitions:

Show that the by matrix product can be described as a block matrix with row partitions and column partitions

with submatrices computed as follows:

Proof: (Note that I’m sure a better proof of this exists, not to mention better notation. However this is the best I can come up with.)

We define a block matrix within as follows: First, we divide the rows of into partitions, where . The partition is done using a vector of nonnegative integer values defined as follows:

This can be interpreted as follows: is the first row in the partition, is the last row in the partition, and is the number of rows in the partition.

We then divide the columns of into partitions, where , using a vector of nonnegative integer values defined as follows:

The vector has a similar interpretation to the vector : is the first column in the partition, is the last column in the partition, and is the number of columns in the partition.

We then define times submatrices of , with a given submatrix defined in terms of the entries of as follows:

so that the entry of has the value

Note that when we have

so that

In other words, in this case has only a single submatrix, namely itself.

Also note that when and we have

so that

We then have

In other words, in this case has times submatrices, each containing a single entry of .

We next define a block matrix within as follows: First, we divide the rows of into partitions, using the same vector used to partition the columns of .

(Note: It is important that the rows of be partitioned in the exact same manner as the columns of , and not just that the number of partitions be the same. Otherwise we won’t be able to multiply submatrices of by the corresponding submatrices of , because the number of columns in a submatrix of won’t necessarily match the number of rows in the corresponding submatrix of . Unfortunately some references don’t make this clear, including at one point the block matrix article at Wikipedia.)

We then divide the columns of into partitions, where , using a vector of nonnegative integer values defined as follows:

Again, the vector has the same interpretation as the vectors and : is the first column in the partition, is the last column in the partition, and is the number of columns in the partition.

We then define times submatrices of , with a given submatrix defined in terms of the entries of as follows:

As in the case of , if then there is only a single submatrix of , namely itself, and if and then each submatrix contains a single entry of . The entry of has the value

Finally, consider the by matrix with entries

We can partition the rows of into partitions according to the same vector used to partition the rows of , and partition the columns of into partitions according to the same vector used to partition the columns of . We then have

with the entry of having the value

Now consider the following matrix:

for some and where and .

First, note that all the submatrices have rows and columns, and all the submatrices have rows and columns. (This follows from the definitions of , , and .)

Since the number of columns in the submatrices is the same as the number of rows in the submatrices , matrix multiplication is possible, and the resulting product matrices will have rows and columns. Since is the sum of the product matrices, it too will have rows and columns, the same number of rows and columns as (based on the partitioning of as described above).

Now consider the entry in . It will consist of the sum of all the entries in the product matrices. More formally,

Since the number of columns in the submatrices and the number of rows in the submatrices is , we then have

so that

Note that the first inner sum (when ) starts at (because by definition), that each subsequent inner sum starts at a value of one greater than the last, and that the last inner sum (when ) ends at (because ). We thus conclude that takes on all values from 1 to , and only those values, so that we can rewrite the above as

Since and have the same number of rows and columns (as discussed above), and since the entry of is equal to the entry of for all and , we conclude that , and from the definition of that

which was what we set out to prove.

Note that if then

so that the above equation reduces to .

On the other hand, if , , and then we have

so that the equation

is essentially equivalent to

UPDATE: Since originally writing this post I slightly changed the definition of a partitioning vector . First, I clarified that contained nonnegative integer entries. Second, I used the condition instead of for , because I think it better corresponds to the intuitive idea of how a partition would be defined.

Note that we can derive the old condition from the new one:

Suppose for and we have with . Let . We have when . (This is the new condition that we are assuming is true.) Now suppose for some . We have based on the new condition (assuming that so that has a value), so implies that .

By induction we then have for all for which has a value. Since and has a value (since does) we then have . Thus for implies that for any and we have if .

Also, since , the old condition that for implies the new condition that . Thus the old and new conditions are equivalent, and the choice between them in defining a partitioning vector is simply a matter of convenience.

Beautiful, clear explanation. In many books the block matrix multiplication is given without a proof or as an exercise. This is what I was searching for.

Thanks.

I’m glad you found this useful. I was *very* frustrated that I couldn’t find a good proof online of how to multiply block matrices. Similar experiences prompted me to write the posts on multiplying block diagonal matrices and partitioning matrices into block diagonal form.

Thank you! It was useful to see this technical detail at least somewhere. You saved me a lot of time.

This is awesome. Everything makes sense line by line. It just can’t be better.

Thank you very much. I was stuck on proving this.