Multiplying block matrices

While doing exercise 1.6.10 in Linear Algebra and Its Applications I was reminded of the general issue of multiplying block matrices, including block diagonal matrices. The same issue came up in exercise 1.4.24, which I answered without necessarily fully understanding the problem. I therefore wanted to go back and try to prove various claims about block matrices, starting with the following:

Let A and B be an m by p matrix and a p by n matrix, respectively. Suppose we partition A into submatrices with q row partitions and s column partitions:

A = \begin{bmatrix} A_{11}&A_{12}&\cdots&A_{1s} \\ A_{21}&A_{22}&\cdots&A_{2s} \\ \vdots&\vdots&\ddots&\vdots \\ A_{q1}&A_{q2}&\cdots&A_{qs} \end{bmatrix}

and we partition B into submatrices with s row partitions and r column partitions:

B = \begin{bmatrix} B_{11}&B_{12}&\cdots&B_{1r} \\ B_{21}&B_{22}&\cdots&B_{2r} \\ \vdots&\vdots&\ddots&\vdots \\ B_{s1}&B_{s2}&\cdots&B_{sr} \end{bmatrix}

Show that the m by n matrix product C = AB can be described as a block matrix with q row partitions and r column partitions:

C = \begin{bmatrix} C_{11}&C_{12}&\cdots&C_{1r} \\ C_{21}&C_{22}&\cdots&C_{2r} \\ \vdots&\vdots&\ddots&\vdots \\ C_{q1}&C_{q2}&\cdots&C_{qr} \end{bmatrix}

with submatrices computed as follows:

C_{\alpha \beta} = \sum_{\gamma = 1}^{s} A_{\alpha \gamma} B_{\gamma \beta}
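
Before proceeding to the proof, here is a quick numerical sanity check of the claim. This is my own NumPy sketch, with arbitrary sizes and partition boundaries; the vectors u, w, and v hold the partition boundaries in the same way as in the proof below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions and random test matrices (arbitrary choices).
m, p, n = 5, 7, 4
A = rng.standard_normal((m, p))
B = rng.standard_normal((p, n))

# Partition boundaries: u splits the m rows of A, w splits the shared
# dimension p, and v splits the n columns of B.
u = [0, 2, 5]      # q = 2 row partitions
w = [0, 3, 4, 7]   # s = 3 partitions of the shared dimension
v = [0, 1, 4]      # r = 2 column partitions

q, s, r = len(u) - 1, len(w) - 1, len(v) - 1
C = A @ B

for alpha in range(q):
    for beta in range(r):
        # C_{alpha beta} should equal the sum over gamma of
        # A_{alpha gamma} B_{gamma beta}.
        block = sum(A[u[alpha]:u[alpha + 1], w[g]:w[g + 1]] @
                    B[w[g]:w[g + 1], v[beta]:v[beta + 1]]
                    for g in range(s))
        assert np.allclose(block, C[u[alpha]:u[alpha + 1], v[beta]:v[beta + 1]])
```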

Proof: (Note that I’m sure a better proof of this exists, not to mention better notation. However, this is the best I can come up with.)

We define a block matrix within A as follows: First, we divide the m rows of A into q partitions, where 1 \le q \le m. The partition is done using a vector u of nonnegative integer values defined as follows:

u = \begin{bmatrix} u_1&u_2&\cdots&u_{q+1} \end{bmatrix}

u_1 = 0, u_{q+1} = m

u_i < u_{i+1}, 1 \le i \le q

This can be interpreted as follows: u_i + 1 is the first row in the i^{th} partition, u_{i+1} is the last row in the i^{th} partition, and u_{i+1} - u_i \ge 1 is the number of rows in the i^{th} partition.
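
For example (numbers of my own choosing), with m = 5 and q = 2 we could take u = \begin{bmatrix} 0&2&5 \end{bmatrix}, putting rows 1 and 2 in the first partition and rows 3 through 5 in the second. A minimal Python sketch of the conditions on u:

```python
def is_partition_vector(u, total):
    """Check that u_1 = 0, u_{q+1} = total, and u_i < u_{i+1}."""
    return (u[0] == 0 and u[-1] == total
            and all(u[i] < u[i + 1] for i in range(len(u) - 1)))

u = [0, 2, 5]   # q = 2 partitions of m = 5 rows
assert is_partition_vector(u, 5)

# Partition i (1-based) has u[i] - u[i-1] rows; here the sizes are [2, 3].
sizes = [u[i] - u[i - 1] for i in range(1, len(u))]
```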

We then divide the p columns of A into s partitions, where 1 \le s \le p, using a vector w of nonnegative integer values defined as follows:

w = \begin{bmatrix} w_1&w_2&\cdots&w_{s+1} \end{bmatrix}

w_1 = 0, w_{s+1} = p

w_i < w_{i+1}, 1 \le i \le s

The vector w has a similar interpretation to the vector u: w_i + 1 is the first column in the i^{th} partition, w_{i+1} is the last column in the i^{th} partition, and w_{i+1} - w_i \ge 1 is the number of columns in the i^{th} partition.

We then define q times s submatrices of A, with a given submatrix defined in terms of the entries of A as follows:

A_{\alpha \beta} = \begin{bmatrix} a_{u_\alpha+1, w_\beta+1}&a_{u_\alpha+1,w_\beta + 2}&\cdots&a_{u_\alpha+1,w_{\beta+1}} \\ a_{u_\alpha+2, w_\beta+1}&a_{u_\alpha+2,w_\beta+2}&\cdots&a_{u_\alpha+2,w_{\beta+1}} \\ \vdots&\vdots&\ddots&\vdots \\ a_{u_{\alpha+1}, w_\beta+1}&a_{u_{\alpha+1},w_\beta+2}&\cdots&a_{u_{\alpha+1},w_{\beta+1}} \end{bmatrix}

so that the (i, j) entry of A_{\alpha \beta} has the value

(A_{\alpha \beta})_{ij} = a_{u_\alpha+i,w_\beta+j}
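
In zero-based array terms, A_{\alpha \beta} is just a contiguous slice of A. The following sketch (the function name is my own) takes \alpha and \beta as 1-based indices to match the notation above:

```python
import numpy as np

def submatrix(A, u, w, alpha, beta):
    """Return A_{alpha beta}: rows u_alpha+1 .. u_{alpha+1} and columns
    w_beta+1 .. w_{beta+1} in 1-based terms, i.e. a zero-based slice."""
    return A[u[alpha - 1]:u[alpha], w[beta - 1]:w[beta]]

A = np.arange(12).reshape(3, 4)
# Rows 2..3 and columns 1..2 of A (1-based): [[4, 5], [8, 9]].
print(submatrix(A, [0, 1, 3], [0, 2, 4], 2, 1))

# The 1-based (i, j) entry of the block is A[u[alpha-1] + i - 1, w[beta-1] + j - 1],
# matching (A_{alpha beta})_{ij} = a_{u_alpha + i, w_beta + j}.
```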

Note that when q = s = 1 we have

u = \begin{bmatrix} 0&m \end{bmatrix}, w = \begin{bmatrix} 0&p \end{bmatrix}

so that

A_{11} = \begin{bmatrix} a_{u_1+1,w_1+1}&a_{u_1+1,w_1 + 2}&\cdots&a_{u_1+1,w_2 } \\ a_{u_1 + 2, w_1+1}&a_{u_1 + 2,w_1 + 2}&\cdots&a_{u_1 + 2,w_2} \\ \vdots&\vdots&\ddots&\vdots \\ a_{u_2, w_1+1}&a_{u_2,w_1+2}&\cdots&a_{u_2,w_2} \end{bmatrix}

= \begin{bmatrix} a_{11}&a_{12}&\cdots&a_{1p} \\ a_{21}&a_{22}&\cdots&a_{2p} \\ \vdots&\vdots&\ddots&\vdots \\ a_{m1}&a_{m2}&\cdots&a_{mp} \end{bmatrix} = A

In other words, in this case A has only a single submatrix, namely A itself.

Also note that when q = m and s = p we have

u = \begin{bmatrix} 0&1&\cdots&m \end{bmatrix}, w = \begin{bmatrix} 0&1&\cdots&p \end{bmatrix}

so that

u_i = i-1, w_j = j-1

We then have

A_{\alpha \beta} = \begin{bmatrix} a_{u_\alpha+1, w_\beta+1}&\cdots&a_{u_\alpha+1,w_{\beta+1}} \\ \vdots&\ddots&\vdots \\ a_{u_{\alpha+1}, w_\beta+1}&\cdots&a_{u_{\alpha+1},w_{\beta+1}} \end{bmatrix}

= \begin{bmatrix} a_{(\alpha-1)+1,(\beta-1)+1}&\cdots&a_{(\alpha-1)+1,(\beta+1)-1} \\ \vdots&\ddots&\vdots \\ a_{(\alpha+1)-1,(\beta-1)+1}&\cdots&a_{(\alpha+1)-1,(\beta+1)-1} \end{bmatrix} = \begin{bmatrix} a_{\alpha \beta} \end{bmatrix}

In other words, in this case A has m times p submatrices, each containing a single entry of A.

We next define a block matrix within B as follows: First, we divide the p rows of B into s partitions, using the same vector w used to partition the columns of A.

(Note: It is important that the rows of B be partitioned in the exact same manner as the columns of A, and not just that the number of partitions s be the same. Otherwise we won’t be able to multiply submatrices of A by the corresponding submatrices of B, because the number of columns in a submatrix of A won’t necessarily match the number of rows in the corresponding submatrix of B. Unfortunately some references don’t make this clear, including at one point the block matrix article at Wikipedia.)
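
To make the failure mode concrete, here is a small hypothetical example of my own: the columns of A and the rows of B are each split into s = 2 pieces, but at different boundaries, so the corresponding sub-products are not even defined.

```python
import numpy as np

A = np.ones((4, 6))
B = np.ones((6, 3))

cols_A = [0, 2, 6]   # columns of A split as 2 + 4
rows_B = [0, 4, 6]   # rows of B split as 4 + 2 -- same s, different boundaries

A11 = A[:, cols_A[0]:cols_A[1]]   # 4 x 2
B11 = B[rows_B[0]:rows_B[1], :]   # 4 x 3
try:
    A11 @ B11                     # inner dimensions 2 and 4 do not match
except ValueError as e:
    print("shape mismatch:", e)
```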

We then divide the n columns of B into r partitions, where 1 \le r \le n, using a vector v of nonnegative integer values defined as follows:

v = \begin{bmatrix} v_1&v_2&\cdots&v_{r+1} \end{bmatrix}

v_1 = 0, v_{r+1} = n

v_i < v_{i+1}, 1 \le i \le r

Again, the vector v has the same interpretation as the vectors u and w: v_i + 1 is the first column in the i^{th} partition, v_{i+1} is the last column in the i^{th} partition, and v_{i+1} - v_i \ge 1 is the number of columns in the i^{th} partition.

We then define s times r submatrices of B, with a given submatrix defined in terms of the entries of B as follows:

B_{\alpha \beta} = \begin{bmatrix} b_{w_\alpha+1 ,v_\beta+1}&b_{w_\alpha+1,v_\beta+2}&\cdots&b_{w_\alpha+1,v_{\beta+1}} \\ b_{w_\alpha+2, v_\beta+1}&b_{w_\alpha+2,v_\beta+2}&\cdots&b_{w_\alpha+2,v_{\beta+1}} \\ \vdots&\vdots&\ddots&\vdots \\ b_{w_{\alpha+1}, v_\beta+1}&b_{w_{\alpha+1},v_\beta+2}&\cdots&b_{w_{\alpha+1},v_{\beta+1}} \end{bmatrix}

As in the case of A, if s = r = 1 then there is only a single submatrix of B, namely B itself, and if s = p and r = n then each submatrix contains a single entry of B. The (i, j) entry of B_{\alpha \beta} has the value

(B_{\alpha \beta})_{ij} = b_{w_\alpha+i,v_\beta+j}

Finally, consider the m by n matrix C = AB with entries

c_{ij} = \sum_{k=1}^p a_{ik}b_{kj}

We can partition the rows of C into q partitions according to the same vector u used to partition the rows of A, and partition the columns of C into r partitions according to the same vector v used to partition the columns of B. We then have

C_{\alpha \beta} = \begin{bmatrix} c_{u_\alpha+1, v_\beta+1}&c_{u_\alpha+1,v_\beta + 2}&\cdots&c_{u_\alpha+1,v_{\beta+1}} \\ c_{u_\alpha+2, v_\beta+1}&c_{u_\alpha+2,v_\beta+2}&\cdots&c_{u_\alpha+2,v_{\beta+1}} \\ \vdots&\vdots&\ddots&\vdots \\ c_{u_{\alpha+1}, v_\beta+1}&c_{u_{\alpha+1},v_\beta+2}&\cdots&c_{u_{\alpha+1},v_{\beta+1}} \end{bmatrix}

with the (i, j) entry of C_{\alpha \beta} having the value

(C_{\alpha \beta})_{ij} = c_{u_\alpha+i,v_\beta+j}

Now consider the following matrix:

D_{\alpha \beta} = \sum_{\gamma = 1}^{s} A_{\alpha \gamma} B_{\gamma \beta}

for some \alpha and \beta where 1 \le \alpha \le q and 1 \le \beta \le r.

First, note that all the submatrices A_{\alpha \gamma} have u_{\alpha+1} - u_\alpha rows and w_{\gamma+1} - w_\gamma columns, and all the submatrices B_{\gamma \beta} have w_{\gamma+1} - w_\gamma rows and v_{\beta+1} - v_\beta columns. (This follows from the definitions of u, w, and v.)

Since the number of columns in the submatrices A_{\alpha \gamma} is the same as the number of rows in the submatrices B_{\gamma \beta}, matrix multiplication is possible, and the resulting product matrices will have u_{\alpha+1} - u_\alpha rows and v_{\beta+1} - v_\beta columns. Since D_{\alpha \beta} is the sum of the product matrices, it too will have u_{\alpha+1} - u_\alpha rows and v_{\beta+1} - v_\beta columns, the same number of rows and columns as C_{\alpha \beta} (based on the partitioning of C as described above).

Now consider the (i, j) entry in D_{\alpha \beta}. It is the sum of the (i, j) entries of the individual product matrices. More formally,

(D_{\alpha \beta})_{ij} = \sum_{\gamma = 1}^{s} (A_{\alpha \gamma} B_{\gamma \beta})_{ij}, 1 \le i \le u_{\alpha+1} - u_\alpha, 1 \le j \le v_{\beta+1} - v_\beta

Since the number of columns in the submatrices A_{\alpha \gamma} and the number of rows in the submatrices B_{\gamma \beta} are both equal to w_{\gamma+1} - w_\gamma, we then have

(A_{\alpha \gamma} B_{\gamma \beta})_{ij} = \sum_{l=1}^{w_{\gamma+1} - w_\gamma} (A_{\alpha \gamma})_{il} (B_{\gamma \beta})_{lj}

= \sum_{l=1}^{w_{\gamma+1} - w_\gamma} a_{u_\alpha+i,w_\gamma+l}b_{w_\gamma+l,v_\beta+j} = \sum_{k=w_\gamma+1}^{w_{\gamma+1}} a_{u_\alpha+i,k}b_{k,v_\beta+j}

so that

(D_{\alpha \beta})_{ij} = \sum_{\gamma = 1}^{s} \sum_{k=w_\gamma+1}^{w_{\gamma+1}} a_{u_\alpha+i,k}b_{k,v_\beta+j}

Note that the first inner sum (when \gamma = 1) starts at k = 1 (because w_1 = 0 by definition), that each subsequent inner sum starts at a value of k one greater than the value at which the previous sum ended, and that the last inner sum (when \gamma = s) ends at k = p (because w_{s+1} = p). We thus conclude that k takes on all values from 1 to p, and only those values, so that we can rewrite the above as

(D_{\alpha \beta})_{ij} = \sum_{k=1}^{p} a_{u_\alpha+i,k}b_{k,v_\beta+j} = c_{u_\alpha+i,v_\beta+j} = (C_{\alpha \beta})_{ij}
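
To make the regrouping of the index k concrete, consider a small example (numbers of my own choosing): if p = 5, s = 2, and w = \begin{bmatrix} 0&2&5 \end{bmatrix}, then

\sum_{\gamma = 1}^{2} \sum_{k=w_\gamma+1}^{w_{\gamma+1}} a_{u_\alpha+i,k}b_{k,v_\beta+j} = \sum_{k=1}^{2} a_{u_\alpha+i,k}b_{k,v_\beta+j} + \sum_{k=3}^{5} a_{u_\alpha+i,k}b_{k,v_\beta+j} = \sum_{k=1}^{5} a_{u_\alpha+i,k}b_{k,v_\beta+j}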

Since D_{\alpha \beta} and C_{\alpha \beta} have the same number of rows and columns (as discussed above), and since the (i, j) entry of D_{\alpha \beta} is equal to the (i, j) entry of C_{\alpha \beta} for all i and j, we conclude that C_{\alpha \beta} = D_{\alpha \beta}, and from the definition of D that

C_{\alpha \beta} = \sum_{\gamma = 1}^{s} A_{\alpha \gamma} B_{\gamma \beta}

which was what we set out to prove.

Note that if q = s = r = 1 then

A_{11} = A, B_{11} = B, C_{11} = C

so that the above equation reduces to C = AB.

On the other hand, if q = m, s = p, and r = n then we have

A_{\alpha \beta} = \begin{bmatrix} a_{\alpha \beta} \end{bmatrix}

B_{\alpha \beta} = \begin{bmatrix} b_{\alpha \beta} \end{bmatrix}

C_{\alpha \beta} = \begin{bmatrix} c_{\alpha \beta} \end{bmatrix}

so that the equation

C_{\alpha \beta} = \sum_{\gamma = 1}^{s} A_{\alpha \gamma} B_{\gamma \beta}

is essentially equivalent (since s = p in this case) to

c_{\alpha \beta} = \sum_{\gamma = 1}^{p} a_{\alpha \gamma} b_{\gamma \beta}

which is the usual formula for the entries of the matrix product C = AB.

UPDATE: Since originally writing this post I have slightly changed the definition of a partitioning vector v. First, I clarified that v contains nonnegative integer entries. Second, I used the condition v_i < v_{i+1} instead of v_i < v_j for i < j, because I think it corresponds better to the intuitive idea of how a partition would be defined.

Note that we can derive the old condition from the new one:

Suppose v_i < v_{i+1} for 1 \le i \le q, and consider some v_j with j > i. Let l = j - i \ge 1. We have v_i < v_{i+k} when k = 1; this is just the new condition, which we are assuming to be true. Now suppose v_i < v_{i+k} for some k \ge 1. Based on the new condition we have v_{i+k} < v_{(i+k)+1} (assuming i + k \le q, so that v_{(i+k)+1} has a value), so v_i < v_{i+k} implies that v_i < v_{i+(k+1)}.

By induction we then have v_i < v_{i+k} for all k \ge 1 for which v_{i+k} has a value. Since l = j - i \ge 1 and v_{i+l} has a value (since v_j = v_{i+l} does), we then have v_i < v_{i+l} = v_j. Thus v_i < v_{i+1} for 1 \le i \le q implies that for any v_i and v_j we have v_i < v_j if j > i.

Also, since i+1 > i, the old condition that v_i < v_j for j > i implies the new condition that v_i < v_{i+1}. Thus the old and new conditions are equivalent, and the choice between them in defining a partitioning vector is simply a matter of convenience.
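
For what it's worth, the equivalence of the two conditions is also easy to confirm by brute force over small integer vectors; here is a quick sketch of mine:

```python
from itertools import combinations, product

def new_cond(v):
    """v_i < v_{i+1} for all consecutive pairs."""
    return all(v[i] < v[i + 1] for i in range(len(v) - 1))

def old_cond(v):
    """v_i < v_j for every pair with i < j."""
    return all(v[i] < v[j] for i, j in combinations(range(len(v)), 2))

# The two conditions agree on every length-4 vector with entries in 0..3.
for v in product(range(4), repeat=4):
    assert new_cond(v) == old_cond(v)
```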


5 Responses to Multiplying block matrices

  1. Alex says:

    Beautiful, clear explanation. In many books the block matrix multiplication is given without a proof or as an exercise. This is what I was searching for.
    Thanks.

    • hecker says:

      I’m glad you found this useful. I was *very* frustrated that I couldn’t find a good proof online of how to multiply block matrices. Similar experiences prompted me to write the posts on multiplying block diagonal matrices and partitioning matrices into block diagonal form.

  2. Vadim says:

    Thank you! It was useful to see this technical detail at least somewhere. You saved me a lot of time.

  3. Martin says:

    This is awesome. Everything makes sense line by line. It just can’t be better.

  4. Harsh Bhardwaj says:

    Thank you very much. I was stuck on proving this.
