The composition of linear transformations is a linear transformation

While working through exercise 2.6.14 in Gilbert Strang’s Linear Algebra and Its Applications, Third Edition, I noticed one of the downsides of the book: Strang’s focus on practical applications is usually welcome, but in his desire to avoid abstract concepts and arguments he sometimes hand-waves his way through important points and leaves the reader confused. At least, I was confused by his discussion of rule 2V on page 123, which provides little background (let alone a real proof) for why the composition of two linear transformations should itself be a linear transformation.

As I’ve done before in a couple of cases, I thought it was worth stopping and reviewing the basic definition and consequent properties of linear transformations, ignoring the connection with matrices and focusing just on the abstract concept.

1) Definition of a linear transformation. First, a linear transformation is a function from one vector space to another vector space (which may be itself). So if we have two vector spaces V and W, a linear transformation A takes a vector v in V and produces a vector w in W. In other words w = A(v) using function notation. (For clarity I’ll continue to use function notation for the rest of this post.)

What makes a linear transformation linear is that it has the property that

A(ax+by) = aA(x) + bA(y)

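for any x and y in V and any scalars a and b. (The scalars are those over which both V and W are defined, for example the real numbers, so that the same a and b can multiply vectors in V and in W.)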
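To make the property above concrete, here is a minimal Python sketch that checks it numerically for one choice of x, y, a, and b. The maps `A` (from R^2 to R^3) and `T` (a translation) are hypothetical examples of my own, not taken from Strang, and vectors are modelled as tuples of integers so the equality test is exact. A single passing check does not prove linearity, but a single failing check does disprove it.

```python
def scale(a, x):
    """Scalar multiple a*x of a vector x."""
    return tuple(a * xi for xi in x)

def add(x, y):
    """Vector sum x + y."""
    return tuple(xi + yi for xi, yi in zip(x, y))

def A(v):
    """Hypothetical linear map from R^2 to R^3 (illustration only)."""
    v1, v2 = v
    return (v1 + 2 * v2, 3 * v1, -v2)

def T(v):
    """Hypothetical translation of R^2; translations are not linear."""
    v1, v2 = v
    return (v1 + 1, v2)

def satisfies_combined_property(f, x, y, a, b):
    """Check f(ax + by) == a f(x) + b f(y) for one choice of x, y, a, b."""
    left = f(add(scale(a, x), scale(b, y)))
    right = add(scale(a, f(x)), scale(b, f(y)))
    return left == right

x, y, a, b = (1, 2), (-3, 5), 4, -7
print(satisfies_combined_property(A, x, y, a, b))  # True
print(satisfies_combined_property(T, x, y, a, b))  # False
```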

2) Alternate definition of a linear transformation. Note that the property above is often expressed instead in the form of two simpler properties:

A(ax) = aA(x)

A(x+y) = A(x) + A(y)

for any x and y in V and any scalar a.

This alternate definition is equivalent to the definition in (1) above, as shown by the following argument:

Suppose we have A(ax+by). Since x and y are vectors in V and a and b are scalars, by the definition of a vector space we know that ax and by are also vectors in V. (Vector spaces are closed under scalar multiplication.) By the alternate definition we thus have A(ax+by) = A(ax) + A(by). By the same definition we also have A(ax) = aA(x) and A(by) = bA(y) so that A(ax) + A(by) = aA(x) + bA(y). Combining the equations we see that A(ax+by) = aA(x) + bA(y).

Conversely, the original property A(ax+by) = aA(x) + bA(y) reduces to A(ax) = aA(x) if b = 0 and reduces to A(x+y) = A(x) + A(y) if a = b = 1, so each definition implies the other.
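As a sanity check on the equivalence, the following sketch tests both forms of the definition over a small grid of integer vectors and scalars and confirms that they give the same verdict. The maps `A` and `Q` are hypothetical examples of my own, and an exhaustive check over a finite grid is only evidence, not a proof.

```python
from itertools import product

def scale(a, x):
    """Scalar multiple a*x of a vector x."""
    return tuple(a * xi for xi in x)

def add(x, y):
    """Vector sum x + y."""
    return tuple(xi + yi for xi, yi in zip(x, y))

def A(v):
    """Hypothetical linear map from R^2 to R^2 (illustration only)."""
    v1, v2 = v
    return (2 * v1 - v2, v1 + 3 * v2)

def Q(v):
    """Hypothetical non-linear map: squares the first component."""
    v1, v2 = v
    return (v1 * v1, v2)

# A small grid of test vectors and scalars (an arbitrary choice).
vectors = list(product(range(-2, 3), repeat=2))
scalars = range(-2, 3)

def passes_definition_1(f):
    """f(ax + by) == a f(x) + b f(y) on the whole grid."""
    return all(
        f(add(scale(a, x), scale(b, y))) == add(scale(a, f(x)), scale(b, f(y)))
        for x in vectors for y in vectors for a in scalars for b in scalars
    )

def passes_definition_2(f):
    """f(ax) == a f(x) and f(x + y) == f(x) + f(y) on the whole grid."""
    homogeneous = all(f(scale(a, x)) == scale(a, f(x))
                      for x in vectors for a in scalars)
    additive = all(f(add(x, y)) == add(f(x), f(y))
                   for x in vectors for y in vectors)
    return homogeneous and additive

for f in (A, Q):
    print(f.__name__, passes_definition_1(f), passes_definition_2(f))
```

On this grid the two checks agree: both pass for `A` and both fail for `Q`.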

3) Applying a linear transformation to an arbitrary linear combination of vectors. Suppose we have a linear transformation A from V to W, an arbitrary set of vectors v_1, v_2, through v_m in V and an arbitrary set of scalars c_1, c_2, through c_m. Then we have

A(\sum_{i=1}^{m} c_iv_i) = \sum_{i=1}^{m} c_iA(v_i)

This is easily proved using induction: First, for m = 2 from the definition in (1) above we have

A(\sum_{i=1}^{2} c_iv_i) = A(c_1v_1 + c_2v_2)

= c_1A(v_1) + c_2A(v_2) = \sum_{i=1}^{2} c_iA(v_i)

Now suppose for some k \ge 2 we have

A(\sum_{i=1}^{k} c_iv_i) = \sum_{i=1}^{k} c_iA(v_i)

Then for k+1 we have

A(\sum_{i=1}^{k+1} c_iv_i) = A(\sum_{i=1}^{k} c_iv_i + c_{k+1}v_{k+1})

= A(\sum_{i=1}^{k} c_iv_i) + c_{k+1}A(v_{k+1})

= \sum_{i=1}^{k} c_iA(v_i) + c_{k+1}A(v_{k+1})

= \sum_{i=1}^{k+1} c_iA(v_i)

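Since the proposition is true for m = 2, and is true for k+1 whenever it is true for some k \ge 2, by induction it is true for all m \ge 2. (The case m = 1 is just the property A(c_1v_1) = c_1A(v_1) from (2) above.)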
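The result in (3) can be illustrated with a short sketch that applies a map to a linear combination of four vectors and compares the result with the same combination of the images. The map `A` (from R^3 to R^2), the coefficients, and the vectors are all hypothetical values chosen for illustration.

```python
from functools import reduce

def scale(a, x):
    """Scalar multiple a*x of a vector x."""
    return tuple(a * xi for xi in x)

def add(x, y):
    """Vector sum x + y."""
    return tuple(xi + yi for xi, yi in zip(x, y))

def A(v):
    """Hypothetical linear map from R^3 to R^2 (illustration only)."""
    v1, v2, v3 = v
    return (v1 - v3, 2 * v2 + v3)

def linear_combination(coeffs, vectors):
    """Compute c_1 v_1 + ... + c_m v_m by folding the terms with add."""
    terms = [scale(c, v) for c, v in zip(coeffs, vectors)]
    return reduce(add, terms)

coeffs = [2, -1, 3, 5]
vectors = [(1, 0, 2), (0, 1, -1), (4, 4, 4), (-2, 3, 0)]

lhs = A(linear_combination(coeffs, vectors))               # A(sum of c_i v_i)
rhs = linear_combination(coeffs, [A(v) for v in vectors])  # sum of c_i A(v_i)
print(lhs, rhs, lhs == rhs)  # (-13, 69) (-13, 69) True
```

The fold in `linear_combination` mirrors the induction: each step adds one more term c_{k+1}v_{k+1} to the sum of the first k terms.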

4) The composition of two linear transformations. Suppose A is a linear transformation from a vector space V to a vector space W and B is a linear transformation from a vector space U to V. We define their composition AB to be the function from U to W given by (AB)(u) = A(B(u)) for all u in U. This is well defined because B(u) is a vector in V, so A can be applied to it, and the result w = A(B(u)) is a vector in W.

We can show that AB is a linear transformation as follows: Given x and y in U and a scalar a we have

A(B(ax)) = A(aB(x))

A(B(x+y)) = A(B(x)+B(y))

since B is a linear transformation and

A(aB(x)) = aA(B(x))

A(B(x)+B(y)) = A(B(x)) + A(B(y))

since A is a linear transformation.

Since

A(B(ax)) = aA(B(x))

A(B(x+y)) = A(B(x)) + A(B(y))

we see that AB is a linear transformation as well.
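Here is a minimal sketch of the argument in (4), assuming two hypothetical maps `B` (from R^2 to R^3) and `A` (from R^3 to R^2) of my own choosing. It builds the composition as a function and checks both properties for one sample choice of x, y, and a.

```python
def scale(a, x):
    """Scalar multiple a*x of a vector x."""
    return tuple(a * xi for xi in x)

def add(x, y):
    """Vector sum x + y."""
    return tuple(xi + yi for xi, yi in zip(x, y))

def compose(f, g):
    """Return the composition fg, i.e. the function u -> f(g(u))."""
    return lambda u: f(g(u))

def B(u):
    """Hypothetical linear map from U = R^2 to V = R^3."""
    u1, u2 = u
    return (u1, u1 + u2, 2 * u2)

def A(v):
    """Hypothetical linear map from V = R^3 to W = R^2."""
    v1, v2, v3 = v
    return (v1 - v3, 2 * v2 + v3)

AB = compose(A, B)  # a function from R^2 to R^2

x, y, a = (1, 2), (3, -1), 5
print(AB(scale(a, x)) == scale(a, AB(x)))     # True: A(B(ax)) = aA(B(x))
print(AB(add(x, y)) == add(AB(x), AB(y)))     # True: A(B(x+y)) = A(B(x)) + A(B(y))
```

Note that `compose` knows nothing about linearity. The two properties hold for `AB` only because they hold for `A` and `B` separately, which is exactly the content of the proof.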

Finally, if we have a third linear transformation C from a vector space X to U then the result of applying C and then AB to form the composition (AB)C is the same as applying BC then A to form the composition A(BC). (In other words, composition of linear transformations is associative.) For the proof of this see the answers to exercise 2.6.14.
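Associativity can be illustrated the same way. The sketch below uses three hypothetical maps `C`, `B`, and `A` of my own choosing and compares (AB)C with A(BC) on a few sample vectors. It is a spot check under those assumptions, not a substitute for the proof.

```python
def compose(f, g):
    """Return the composition fg, i.e. the function u -> f(g(u))."""
    return lambda u: f(g(u))

def C(x):
    """Hypothetical linear map from X = R^2 to U = R^2."""
    x1, x2 = x
    return (x2, x1 + x2)

def B(u):
    """Hypothetical linear map from U = R^2 to V = R^3."""
    u1, u2 = u
    return (u1, u1 + u2, 2 * u2)

def A(v):
    """Hypothetical linear map from V = R^3 to W = R^2."""
    v1, v2, v3 = v
    return (v1 - v3, 2 * v2 + v3)

AB_then_C = compose(compose(A, B), C)  # (AB)C
A_then_BC = compose(A, compose(B, C))  # A(BC)

samples = [(0, 0), (1, 0), (0, 1), (2, -3), (-4, 7)]
print(all(AB_then_C(x) == A_then_BC(x) for x in samples))  # True
```

Both sides unwind to A(B(C(x))) for every x, which is why the two compositions agree.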

NOTE: This continues a series of posts containing worked out exercises from the (out of print) book Linear Algebra and Its Applications, Third Edition by Gilbert Strang.

If you find these posts useful I encourage you to also check out the more current Linear Algebra and Its Applications, Fourth Edition, Dr Strang’s introductory textbook Introduction to Linear Algebra, Fourth Edition and the accompanying free online course, and Dr Strang’s other books.
