## Multiplying block matrices

While doing exercise 1.6.10 in *Linear Algebra and Its Applications* I was reminded of the general issue of multiplying block matrices, including block diagonal matrices. The same issue came up in exercise 1.4.24, which I answered without necessarily fully understanding the problem. I therefore wanted to go back and try to prove various claims about block matrices, starting with the following:

Let $A$ and $B$ be an $m$ by $p$ matrix and a $p$ by $n$ matrix, respectively. Suppose we partition $A$ into submatrices with $q$ row partitions and $s$ column partitions:

$A = \begin{bmatrix} A_{11}&A_{12}&\cdots&A_{1s} \\ A_{21}&A_{22}&\cdots&A_{2s} \\ \vdots&\vdots&\ddots&\vdots \\ A_{q1}&A_{q2}&\cdots&A_{qs} \end{bmatrix}$

and we partition $B$ into submatrices with $s$ row partitions and $r$ column partitions:

$B = \begin{bmatrix} B_{11}&B_{12}&\cdots&B_{1r} \\ B_{21}&B_{22}&\cdots&B_{2r} \\ \vdots&\vdots&\ddots&\vdots \\ B_{s1}&B_{s2}&\cdots&B_{sr} \end{bmatrix}$

Show that the $m$ by $n$ matrix product $C = AB$ can be described as a block matrix with $q$ row partitions and $r$ column partitions

$C = \begin{bmatrix} C_{11}&C_{12}&\cdots&C_{1r} \\ C_{21}&C_{22}&\cdots&C_{2r} \\ \vdots&\vdots&\ddots&\vdots \\ C_{q1}&C_{q2}&\cdots&C_{qr} \end{bmatrix}$

with submatrices computed as follows:

$C_{\alpha \beta} = \sum_{\gamma = 1}^{s} A_{\alpha \gamma} B_{\gamma \beta}$

Proof: (Note that I’m sure a better proof of this exists, not to mention better notation. However, this is the best I can come up with.)

We define a block matrix within $A$ as follows: First, we divide the $m$ rows of $A$ into $q$ partitions, where $1 \le q \le m$. The partition is done using a vector $u$ of nonnegative integer values defined as follows:

$u = \begin{bmatrix} u_1&u_2&\cdots&u_{q+1} \end{bmatrix}$

$u_1 = 0, u_{q+1} = m$

$u_i < u_{i+1}, 1 \le i \le q$

This can be interpreted as follows: $u_i + 1$ is the first row in the $i^{th}$ partition, $u_{i+1}$ is the last row in the $i^{th}$ partition, and $u_{i+1} - u_i \ge 1$ is the number of rows in the $i^{th}$ partition.
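As a concrete illustration (not part of the original post), here is a small Python sketch that converts a partition vector such as $u$ into the explicit 1-based (first, last) row ranges described above; the helper name `partition_ranges` is my own invention.

```python
def partition_ranges(u):
    """Given a partition vector u = [u_1, ..., u_{q+1}] with u_1 = 0 and
    strictly increasing entries, return the 1-based (first, last) index
    pair for each of the q partitions."""
    return [(u[i] + 1, u[i + 1]) for i in range(len(u) - 1)]

# Partition m = 5 rows into q = 2 partitions of sizes 2 and 3.
u = [0, 2, 5]
print(partition_ranges(u))  # [(1, 2), (3, 5)]
```

Each pair `(u[i] + 1, u[i + 1])` is exactly the "first row" and "last row" of the $i^{th}$ partition as described above, and the partition sizes sum to $m$.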

We then divide the $p$ columns of $A$ into $s$ partitions, where $1 \le s \le p$, using a vector $w$ of nonnegative integer values defined as follows:

$w = \begin{bmatrix} w_1&w_2&\cdots&w_{s+1} \end{bmatrix}$

$w_1 = 0, w_{s+1} = p$

$w_i < w_{i+1}, 1 \le i \le s$

The vector $w$ has a similar interpretation to the vector $u$: $w_i + 1$ is the first column in the $i^{th}$ partition, $w_{i+1}$ is the last column in the $i^{th}$ partition, and $w_{i+1} - w_i \ge 1$ is the number of columns in the $i^{th}$ partition.

We then define $q$ times $s$ submatrices of $A$, with a given submatrix defined in terms of the entries of $A$ as follows:

$A_{\alpha \beta} = \begin{bmatrix} a_{u_\alpha+1, w_\beta+1}&a_{u_\alpha+1,w_\beta + 2}&\cdots&a_{u_\alpha+1,w_{\beta+1}} \\ a_{u_\alpha+2, w_\beta+1}&a_{u_\alpha+2,w_\beta+2}&\cdots&a_{u_\alpha+2,w_{\beta+1}} \\ \vdots&\vdots&\ddots&\vdots \\ a_{u_{\alpha+1}, w_\beta+1}&a_{u_{\alpha+1},w_\beta+2}&\cdots&a_{u_{\alpha+1},w_{\beta+1}} \end{bmatrix}$

so that the $(i, j)$ entry of $A_{\alpha \beta}$ has the value

$(A_{\alpha \beta})_{ij} = a_{u_\alpha+i,w_\beta+j}$
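In NumPy terms (my own illustration, not from the original post), the submatrix $A_{\alpha \beta}$ is just a slice of $A$, and the entry formula above can be checked directly against NumPy's 0-based indexing; the helper `block` and the specific partition vectors here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 4, 6
A = rng.integers(0, 10, size=(m, p))

u = [0, 1, 4]      # q = 2 row partitions (sizes 1 and 3)
w = [0, 2, 3, 6]   # s = 3 column partitions (sizes 2, 1, 3)

def block(M, rows, cols, alpha, beta):
    """Return the submatrix M_{alpha beta}; alpha and beta are 1-based,
    as in the post, while NumPy slicing is 0-based."""
    return M[rows[alpha - 1]:rows[alpha], cols[beta - 1]:cols[beta]]

# Check (A_{alpha beta})_{ij} = a_{u_alpha + i, w_beta + j} (all 1-based),
# translated to 0-based indices for NumPy.
alpha, beta = 2, 1
blk = block(A, u, w, alpha, beta)
for i in range(1, blk.shape[0] + 1):
    for j in range(1, blk.shape[1] + 1):
        assert blk[i - 1, j - 1] == A[u[alpha - 1] + i - 1, w[beta - 1] + j - 1]
```

The off-by-one translation is the only subtlety: the post's 1-based entry $a_{u_\alpha+i,\,w_\beta+j}$ becomes `A[u[alpha-1] + i - 1, w[beta-1] + j - 1]` in 0-based NumPy.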

Note that when $q = s = 1$ we have

$u = \begin{bmatrix} 0&m \end{bmatrix}, w = \begin{bmatrix} 0&p \end{bmatrix}$

so that

$A_{11} = \begin{bmatrix} a_{u_1+1,w_1+1}&a_{u_1+1,w_1 + 2}&\cdots&a_{u_1+1,w_2 } \\ a_{u_1 + 2, w_1+1}&a_{u_1 + 2,w_1 + 2}&\cdots&a_{u_1 + 2,w_2} \\ \vdots&\vdots&\ddots&\vdots \\ a_{u_2, w_1+1}&a_{u_2,w_1+2}&\cdots&a_{u_2,w_2} \end{bmatrix}$

$= \begin{bmatrix} a_{11}&a_{12}&\cdots&a_{1p} \\ a_{21}&a_{22}&\cdots&a_{2p} \\ \vdots&\vdots&\ddots&\vdots \\ a_{m1}&a_{m2}&\cdots&a_{mp} \end{bmatrix} = A$

In other words, in this case $A$ has only a single submatrix, namely $A$ itself.

Also note that when $q = m$ and $s = p$ we have

$u = \begin{bmatrix} 0&1&\cdots&m \end{bmatrix}, w = \begin{bmatrix} 0&1&\cdots&p \end{bmatrix}$

so that

$u_i = i-1, w_j = j-1$

We then have

$A_{\alpha \beta} = \begin{bmatrix} a_{u_\alpha+1, w_\beta+1}&\cdots&a_{u_\alpha+1,w_{\beta+1}} \\ \vdots&\ddots&\vdots \\ a_{u_{\alpha+1}, w_\beta+1}&\cdots&a_{u_{\alpha+1},w_{\beta+1}} \end{bmatrix}$

$= \begin{bmatrix} a_{(\alpha-1)+1,(\beta-1)+1}&\cdots&a_{(\alpha-1)+1,(\beta+1)-1} \\ \vdots&\ddots&\vdots \\ a_{(\alpha+1)-1,(\beta-1)+1}&\cdots&a_{(\alpha+1)-1,(\beta+1)-1} \end{bmatrix} = \begin{bmatrix} a_{\alpha \beta} \end{bmatrix}$

In other words, in this case $A$ has $m$ times $p$ submatrices, each containing a single entry of $A$.

We next define a block matrix within $B$ as follows: First, we divide the $p$ rows of $B$ into $s$ partitions, using the same vector $w$ used to partition the columns of $A$.

(Note: It is important that the rows of $B$ be partitioned in the exact same manner as the columns of $A$, and not just that the number of partitions $s$ be the same. Otherwise we won’t be able to multiply submatrices of $A$ by the corresponding submatrices of $B$, because the number of columns in a submatrix of $A$ won’t necessarily match the number of rows in the corresponding submatrix of $B$. Unfortunately some references don’t make this clear, including at one point the block matrix article at Wikipedia.)

We then divide the $n$ columns of $B$ into $r$ partitions, where $1 \le r \le n$, using a vector $v$ of nonnegative integer values defined as follows:

$v = \begin{bmatrix} v_1&v_2&\cdots&v_{r+1} \end{bmatrix}$

$v_1 = 0, v_{r+1} = n$

$v_i < v_{i+1}, 1 \le i \le r$

Again, the vector $v$ has the same interpretation as the vectors $u$ and $w$: $v_i + 1$ is the first column in the $i^{th}$ partition, $v_{i+1}$ is the last column in the $i^{th}$ partition, and $v_{i+1} - v_i \ge 1$ is the number of columns in the $i^{th}$ partition.

We then define $s$ times $r$ submatrices of $B$, with a given submatrix defined in terms of the entries of $B$ as follows:

$B_{\alpha \beta} = \begin{bmatrix} b_{w_\alpha+1 ,v_\beta+1}&b_{w_\alpha+1,v_\beta+2}&\cdots&b_{w_\alpha+1,v_{\beta+1}} \\ b_{w_\alpha+2, v_\beta+1}&b_{w_\alpha+2,v_\beta+2}&\cdots&b_{w_\alpha+2,v_{\beta+1}} \\ \vdots&\vdots&\ddots&\vdots \\ b_{w_{\alpha+1}, v_\beta+1}&b_{w_{\alpha+1},v_\beta+2}&\cdots&b_{w_{\alpha+1},v_{\beta+1}} \end{bmatrix}$

As in the case of $A$, if $s = r = 1$ then there is only a single submatrix of $B$, namely $B$ itself, and if $s = p$ and $r = n$ then each submatrix contains a single entry of $B$. The $(i, j)$ entry of $B_{\alpha \beta}$ has the value

$(B_{\alpha \beta})_{ij} = b_{w_\alpha+i,v_\beta+j}$

Finally, consider the $m$ by $n$ matrix $C = AB$ with entries

$c_{ij} = \sum_{k=1}^p a_{ik}b_{kj}$

We can partition the rows of $C$ into $q$ partitions according to the same vector $u$ used to partition the rows of $A$, and partition the columns of $C$ into $r$ partitions according to the same vector $v$ used to partition the columns of $B$. We then have

$C_{\alpha \beta} = \begin{bmatrix} c_{u_\alpha+1, v_\beta+1}&c_{u_\alpha+1,v_\beta + 2}&\cdots&c_{u_\alpha+1,v_{\beta+1}} \\ c_{u_\alpha+2, v_\beta+1}&c_{u_\alpha+2,v_\beta+2}&\cdots&c_{u_\alpha+2,v_{\beta+1}} \\ \vdots&\vdots&\ddots&\vdots \\ c_{u_{\alpha+1}, v_\beta+1}&c_{u_{\alpha+1},v_\beta+2}&\cdots&c_{u_{\alpha+1},v_{\beta+1}} \end{bmatrix}$

with the $(i, j)$ entry of $C_{\alpha \beta}$ having the value

$(C_{\alpha \beta})_{ij} = c_{u_\alpha+i,v_\beta+j}$

Now consider the following matrix:

$D_{\alpha \beta} = \sum_{\gamma = 1}^{s} A_{\alpha \gamma} B_{\gamma \beta}$

for some $\alpha$ and $\beta$ where $1 \le \alpha \le q$ and $1 \le \beta \le r$.

First, note that all the submatrices $A_{\alpha \gamma}$ have $u_{\alpha+1} - u_\alpha$ rows and $w_{\gamma+1} - w_\gamma$ columns, and all the submatrices $B_{\gamma \beta}$ have $w_{\gamma+1} - w_\gamma$ rows and $v_{\beta+1} - v_\beta$ columns. (This follows from the definitions of $u$, $w$, and $v$.)

Since the number of columns in the submatrices $A_{\alpha \gamma}$ is the same as the number of rows in the submatrices $B_{\gamma \beta}$, matrix multiplication is possible, and the resulting product matrices will have $u_{\alpha+1} - u_\alpha$ rows and $v_{\beta+1} - v_\beta$ columns. Since $D_{\alpha \beta}$ is the sum of the product matrices, it too will have $u_{\alpha+1} - u_\alpha$ rows and $v_{\beta+1} - v_\beta$ columns, the same number of rows and columns as $C_{\alpha \beta}$ (based on the partitioning of $C$ as described above).
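The dimension bookkeeping in the last two paragraphs can be checked mechanically. The following NumPy sketch (my own illustration, with arbitrarily chosen partition vectors) verifies that every product $A_{\alpha \gamma} B_{\gamma \beta}$ is defined and has the same shape as $C_{\alpha \beta}$:

```python
import numpy as np

u = [0, 2, 5]       # q = 2 row partitions of A and C (m = 5)
w = [0, 3, 4, 7]    # s = 3 shared partitions: columns of A, rows of B (p = 7)
v = [0, 1, 4]       # r = 2 column partitions of B and C (n = 4)

A = np.ones((u[-1], w[-1]))
B = np.ones((w[-1], v[-1]))

for alpha in range(1, len(u)):
    for beta in range(1, len(v)):
        for gamma in range(1, len(w)):
            A_blk = A[u[alpha - 1]:u[alpha], w[gamma - 1]:w[gamma]]
            B_blk = B[w[gamma - 1]:w[gamma], v[beta - 1]:v[beta]]
            # Columns of A_{alpha gamma} match rows of B_{gamma beta} ...
            assert A_blk.shape[1] == B_blk.shape[0] == w[gamma] - w[gamma - 1]
            # ... so each product has the shape of C_{alpha beta}.
            assert (A_blk @ B_blk).shape == (u[alpha] - u[alpha - 1],
                                             v[beta] - v[beta - 1])
```

The key point the assertions exercise is the shared vector $w$: because the columns of $A$ and the rows of $B$ are cut at the same places, every block product is well defined.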

Now consider the $(i, j)$ entry in $D_{\alpha \beta}$. It will consist of the sum of all the $(i, j)$ entries in the product matrices. More formally,

$(D_{\alpha \beta})_{ij} = \sum_{\gamma = 1}^{s} (A_{\alpha \gamma} B_{\gamma \beta})_{ij}, 1 \le i \le u_{\alpha+1} - u_\alpha, 1 \le j \le v_{\beta+1} - v_\beta$

Since the number of columns in the submatrices $A_{\alpha \gamma}$ and the number of rows in the submatrices $B_{\gamma \beta}$ are both $w_{\gamma+1} - w_\gamma$, we then have

$(A_{\alpha \gamma} B_{\gamma \beta})_{ij} = \sum_{l=1}^{w_{\gamma+1} - w_\gamma} (A_{\alpha \gamma})_{il} (B_{\gamma \beta})_{lj}$

$= \sum_{l=1}^{w_{\gamma+1} - w_\gamma} a_{u_\alpha+i,w_\gamma+l}b_{w_\gamma+l,v_\beta+j} = \sum_{k=w_\gamma+1}^{w_{\gamma+1}} a_{u_\alpha+i,k}b_{k,v_\beta+j}$

so that

$(D_{\alpha \beta})_{ij} = \sum_{\gamma = 1}^{s} \sum_{k=w_\gamma+1}^{w_{\gamma+1}} a_{u_\alpha+i,k}b_{k,v_\beta+j}$

Note that the first inner sum (when $\gamma = 1$) starts at $k = 1$ (because $w_1 = 0$ by definition), that each subsequent inner sum starts at a value of $k$ one greater than the last, and that the last inner sum (when $\gamma = s$) ends at $k = p$ (because $w_{s+1} = p$). We thus conclude that $k$ takes on all values from 1 to $p$, and only those values, so that we can rewrite the above as

$(D_{\alpha \beta})_{ij} = \sum_{k=1}^{p} a_{u_\alpha+i,k}b_{k,v_\beta+j} = c_{u_\alpha+i,v_\beta+j} = (C_{\alpha \beta})_{ij}$

Since $D_{\alpha \beta}$ and $C_{\alpha \beta}$ have the same number of rows and columns (as discussed above), and since the $(i, j)$ entry of $D_{\alpha \beta}$ is equal to the $(i, j)$ entry of $C_{\alpha \beta}$ for all $i$ and $j$, we conclude that $C_{\alpha \beta} = D_{\alpha \beta}$, and from the definition of $D$ that

$C_{\alpha \beta} = \sum_{\gamma = 1}^{s} A_{\alpha \gamma} B_{\gamma \beta}$

which was what we set out to prove.
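As a numerical sanity check of the result (not part of the original proof), the following NumPy sketch verifies $C_{\alpha \beta} = \sum_{\gamma = 1}^{s} A_{\alpha \gamma} B_{\gamma \beta}$ for randomly generated matrices and arbitrarily chosen partition vectors:

```python
import numpy as np

rng = np.random.default_rng(42)
u = [0, 2, 5]       # q = 2 row partitions (m = 5)
w = [0, 3, 4, 7]    # s = 3 shared partitions (p = 7)
v = [0, 1, 4]       # r = 2 column partitions (n = 4)

A = rng.standard_normal((u[-1], w[-1]))
B = rng.standard_normal((w[-1], v[-1]))
C = A @ B

for alpha in range(1, len(u)):
    for beta in range(1, len(v)):
        # D_{alpha beta} = sum over gamma of A_{alpha gamma} B_{gamma beta} ...
        D = sum(A[u[alpha - 1]:u[alpha], w[g - 1]:w[g]]
                @ B[w[g - 1]:w[g], v[beta - 1]:v[beta]]
                for g in range(1, len(w)))
        # ... and it equals the corresponding block of C = AB.
        assert np.allclose(D, C[u[alpha - 1]:u[alpha], v[beta - 1]:v[beta]])
```

Changing the partition vectors (while keeping $u_1 = w_1 = v_1 = 0$, the final entries equal to $m$, $p$, $n$, and the entries strictly increasing) leaves the assertions passing, as the proof guarantees.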

Note that if $q = s = r = 1$ then

$A_{11} = A, B_{11} = B, C_{11} = C$

so that the above equation reduces to $C = AB$.

On the other hand, if $q = m$, $s = p$, and $r = n$ then we have

$A_{\alpha \beta} = \begin{bmatrix} a_{\alpha \beta} \end{bmatrix}$

$B_{\alpha \beta} = \begin{bmatrix} b_{\alpha \beta} \end{bmatrix}$

$C_{\alpha \beta} = \begin{bmatrix} c_{\alpha \beta} \end{bmatrix}$

so that the equation

$C_{\alpha \beta} = \sum_{\gamma = 1}^{s} A_{\alpha \gamma} B_{\gamma \beta}$

is, since $s = p$ in this case, just the usual formula for the individual entries of the matrix product:

$c_{\alpha \beta} = \sum_{\gamma = 1}^{p} a_{\alpha \gamma} b_{\gamma \beta}$

UPDATE: Since originally writing this post I have slightly changed the definition of a partitioning vector $v$. First, I clarified that $v$ contains nonnegative integer entries. Second, I used the condition $v_i < v_{i+1}$ instead of $v_i < v_j$ for $i < j$, because I think it better corresponds to the intuitive idea of how a partition would be defined.

Note that we can derive the old condition from the new one:

Suppose $v_i < v_{i+1}$ for $1 \le i \le q$, and consider any $v_j$ with $j > i$. Let $l = j - i \ge 1$. The new condition gives $v_i < v_{i+k}$ when $k = 1$. Now suppose $v_i < v_{i+k}$ for some $k \ge 1$. The new condition also gives $v_{i+k} < v_{(i+k)+1}$ (assuming $i+k \le q$, so that $v_{(i+k)+1}$ exists), so $v_i < v_{i+k}$ implies that $v_i < v_{i+(k+1)}$.

By induction we then have $v_i < v_{i+k}$ for every $k \ge 1$ for which $v_{i+k}$ exists. Since $l = j - i \ge 1$ and $v_{i+l} = v_j$ exists, we have $v_i < v_{i+l} = v_j$. Thus $v_i < v_{i+1}$ for $1 \le i \le q$ implies that $v_i < v_j$ whenever $j > i$.

Also, since $i+1 > i$, the old condition that $v_i < v_j$ for $j > i$ implies the new condition that $v_i < v_{i+1}$. Thus the old and new conditions are equivalent, and the choice between them in defining a partitioning vector is simply a matter of convenience.
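As a small brute-force check of this equivalence (my own addition, not part of the original post), the following Python sketch compares the two conditions on every short candidate vector:

```python
from itertools import product

def adjacent_increasing(vec):
    """New condition: v_i < v_{i+1} for all consecutive entries."""
    return all(vec[i] < vec[i + 1] for i in range(len(vec) - 1))

def pairwise_increasing(vec):
    """Old condition: v_i < v_j whenever i < j."""
    return all(vec[i] < vec[j]
               for i in range(len(vec)) for j in range(i + 1, len(vec)))

# The two conditions agree on every length-4 vector with entries in 0..3.
for vec in product(range(4), repeat=4):
    assert adjacent_increasing(vec) == pairwise_increasing(vec)
```

This is of course only an exhaustive check over small vectors, not a proof; the induction argument above is what establishes the equivalence in general.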

This entry was posted in linear algebra.

### 5 Responses to Multiplying block matrices

1. Alex says:

Beautiful, clear explanation. In many books the block matrix multiplication is given without a proof or as an exercise. This is what I was searching for.
Thanks.

• hecker says:

I’m glad you found this useful. I was *very* frustrated that I couldn’t find a good proof online of how to multiply block matrices. Similar experiences prompted me to write the posts on multiplying block diagonal matrices and partitioning matrices into block diagonal form.