All length-preserving matrices are unitary

I recently read the (excellent) online resource Quantum Computing for the Very Curious by Andy Matuschak and Michael Nielsen. Upon reading the proof that all length-preserving matrices are unitary and trying it out myself, I came to believe that there is an error in the proof as written, specifically with trying to show that off-diagonal entries in M^\dagger M are zero if M is length-preserving.

Using the identity || M \left|\psi\right> ||^2 = \left<\psi\right| M^\dagger M \left|\psi\right>, a suitable choice of \left|\psi\right> = \left|e_j\right> + \left|e_k\right> with j \ne k, and the fact that M is length-preserving, Nielsen first shows that (M^\dagger M)_{jk} + (M^\dagger M)_{kj} = 0 for j \ne k.

He then goes on to write “But what if we’d done something slightly different, and instead of using \left|\psi\right> = \left|e_j\right> + \left|e_k\right> we’d used \left|\psi\right> = \left|e_j\right> - \left|e_k\right>? … I won’t explicitly go through the steps – you can do that yourself – but if you do go through them you end up with the equation: (M^\dagger M)_{jk} - (M^\dagger M)_{kj} = 0.”

I was an undergraduate physics and math major, but either I never worked with bra-ket notation and Hermitian conjugates or I’ve forgotten whatever I knew about them. In any case, in working through this I could not get the same result as Nielsen; I simply ended up once again proving that (M^\dagger M)_{jk} + (M^\dagger M)_{kj} = 0.

After some thought and experimentation I concluded that the key is to choose \left|\psi\right> = \left|e_j\right> + i\left|e_k\right>. Below is my (possibly mistaken!) attempt at a correct proof that all length-preserving matrices are unitary.

Proof: Let M be a length-preserving matrix such that for any vector \left|\psi\right> we have || M \left|\psi\right> || = || \left|\psi\right> ||. We wish to show that M is unitary, i.e., M^\dagger M = I.

We first show that the diagonal elements of M^\dagger M, or (M^\dagger M)_{jj}, are equal to 1.

To do this we start with the unit vectors \left|e_j\right> and \left|e_k\right> with 1 in positions j and k respectively, and 0 otherwise. The product M^\dagger M \left|e_k\right> is then the kth column of M^\dagger M, and \left<e_j\right| M^\dagger M \left|e_k\right> is the jkth entry of M^\dagger M, or (M^\dagger M)_{jk}.

From the general identity \left<\psi\right| M^\dagger M \left|\psi\right> = || M \left|\psi\right> ||^2 we also have \left<e_j\right| M^\dagger M \left|e_j\right> = || M \left|e_j\right> ||^2. But since M is length-preserving we have || M \left|e_j\right> ||^2 = || \left|e_j\right> ||^2 = 1^2 = 1 since \left|e_j\right> is a unit vector.

We thus have (M^\dagger M)_{jj} = \left<e_j\right| M^\dagger M \left|e_j\right> = || M \left|e_j\right> ||^2 = 1. So all diagonal entries of M^\dagger M are 1.

We next show that the non-diagonal elements of M^\dagger M, or (M^\dagger M)_{jk} with j \ne k, are equal to zero.

Let \left|\psi\right> = \left|e_j\right> + \left|e_k\right> with j \ne k. Since M is length-preserving, and since \left|e_j\right> and \left|e_k\right> are orthogonal unit vectors, we have

|| M \left|\psi\right> ||^2 = || \left|\psi\right> ||^2 = || \left|e_j\right> + \left|e_k\right> ||^2 = 1^2 + 1^2 = 2

We also have || M \left|\psi\right> ||^2 = \left<\psi\right| M^\dagger M \left|\psi\right> where \left<\psi\right| = \left|\psi\right>^\dagger = (\left|e_j\right> + \left|e_k\right>)^\dagger. From the definition of the dagger operation and the fact that the nonzero entries of \left|e_j\right> and \left|e_k\right> have no imaginary parts we have (\left|e_j\right> + \left|e_k\right>)^\dagger = \left<e_j\right| + \left<e_k\right|.

We then have

|| M \left|\psi\right> ||^2 = \left<\psi\right| M^\dagger M \left|\psi\right>

= \left|\psi\right>^\dagger M^\dagger M \left|\psi\right>

= (\left|e_j\right> + \left|e_k\right>)^\dagger M^\dagger M (\left|e_j\right> + \left|e_k\right>)

= (\left<e_j\right| + \left<e_k\right|) M^\dagger M (\left|e_j\right> + \left|e_k\right>)

= \left<e_j\right| M^\dagger M \left|e_j\right> + \left<e_j\right| M^\dagger M \left|e_k\right> + \left<e_k\right| M^\dagger M \left|e_j\right> + \left<e_k\right| M^\dagger M \left|e_k\right>

= (M^\dagger M)_{jj} + (M^\dagger M)_{jk} + (M^\dagger M)_{kj} + (M^\dagger M)_{kk}

= 2 + (M^\dagger M)_{jk} + (M^\dagger M)_{kj}

since we previously showed that all diagonal entries of M^\dagger M are 1.

Since || M \left|\psi\right> ||^2 = 2 and also || M \left|\psi\right> ||^2 = 2 + (M^\dagger M)_{jk} + (M^\dagger M)_{kj} we thus have (M^\dagger M)_{jk} + (M^\dagger M)_{kj} = 0 for j \ne k.
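
Before moving on, note that M^\dagger M is always Hermitian, so (M^\dagger M)_{kj} is the complex conjugate of (M^\dagger M)_{jk}, and the equation above forces only the real part of (M^\dagger M)_{jk} to be zero; this is also why the choice \left|\psi\right> = \left|e_j\right> - \left|e_k\right> kept giving me the same equation. To make the gap concrete, here is a minimal Python sketch (my own illustration, assuming NumPy and SciPy are available, and not part of the original resource): it constructs a non-unitary M that nonetheless preserves the lengths of \left|e_1\right>, \left|e_2\right>, \left|e_1\right> + \left|e_2\right>, and \left|e_1\right> - \left|e_2\right>, so only a complex test vector such as \left|e_1\right> + i\left|e_2\right> can expose it.

import numpy as np
from scipy.linalg import sqrtm

# Hermitian positive definite matrix with ones on the diagonal and
# purely imaginary off-diagonal entries, so that A_12 + A_21 = 0.
A = np.array([[1.0, 0.5j],
              [-0.5j, 1.0]])

# M is the Hermitian square root of A, so M^dagger M = M^2 = A;
# M is not unitary, since A != I.
M = sqrtm(A)

e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])

for label, psi in [("e1", e1), ("e2", e2), ("e1 + e2", e1 + e2),
                   ("e1 - e2", e1 - e2), ("e1 + i e2", e1 + 1j * e2)]:
    print(label, np.linalg.norm(M @ psi)**2, "vs", np.linalg.norm(psi)**2)

# The squared norms agree for all four real test vectors, but for
# e1 + i e2 we get ||M psi||^2 = 1 while ||psi||^2 = 2.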

Now let \left|\psi\right> = \left|e_j\right> + i\left|e_k\right> with j \ne k. Again we have || M \left|\psi\right> ||^2 = || \left|\psi\right> ||^2 since M is length-preserving, so that

|| M \left|\psi\right> ||^2 = || \left|\psi\right> ||^2 = || \left|e_j\right> + i\left|e_k\right> ||^2

= (\left|e_j\right> + i\left|e_k\right>)^\dagger (\left|e_j\right> + i\left|e_k\right>)

Since i\left|e_k\right> has an imaginary part for its (single) nonzero entry, in performing the dagger operation and taking complex conjugates we obtain (\left|e_j\right> + i\left|e_k\right>)^\dagger = \left<e_j\right| - i\left<e_k\right|. We thus have

|| M \left|\psi\right> ||^2 = (\left|e_j\right> + i\left|e_k\right>)^\dagger (\left|e_j\right> + i\left|e_k\right>)

= (\left<e_j\right| - i\left<e_k\right|)(\left|e_j\right> + i\left|e_k\right>)

= \left<e_j\right| \left|e_j\right> + \left<e_j\right| i \left|e_k\right> - i \left<e_k\right| \left|e_j\right> - i \left<e_k\right| i \left|e_k\right>

= \left<e_j|e_j\right> + i\left<e_j|e_k\right> - i \left<e_k|e_j\right> - i^2\left<e_k|e_k\right>

= \left<e_j|e_j\right> + i\left<e_j|e_k\right> - i\left<e_k|e_j\right> + \left<e_k|e_k\right>

= 1 + i\cdot 0 - i\cdot 0 + 1 = 2

We also have

|| M \left|\psi\right> ||^2 = \left<\psi\right| M^\dagger M \left|\psi\right>

= \left|\psi\right>^\dagger M^\dagger M \left|\psi\right>

= (\left|e_j\right> + i\left|e_k\right>)^\dagger M^\dagger M (\left|e_j\right> + i\left|e_k\right>)

= (\left<e_j\right| - i\left<e_k\right|) M^\dagger M (\left|e_j\right> + i\left|e_k\right>)

= \left<e_j\right| M^\dagger M \left|e_j\right> + \left<e_j\right| M^\dagger M i\left|e_k\right> - i\left<e_k\right| M^\dagger M \left|e_j\right> - i\left<e_k\right| M^\dagger M i\left|e_k\right>

= \left<e_j\right| M^\dagger M \left|e_j\right> + i\left<e_j\right| M^\dagger M \left|e_k\right> - i\left<e_k\right| M^\dagger M \left|e_j\right> - i^2\left<e_k\right| M^\dagger M \left|e_k\right>

= (M^\dagger M)_{jj} + i(M^\dagger M)_{jk} - i(M^\dagger M)_{kj} + (M^\dagger M)_{kk}

= 2 + i\left((M^\dagger M)_{jk} - (M^\dagger M)_{kj}\right)

Since || M \left|\psi\right> ||^2 = 2 we have 2 = 2 + i\left((M^\dagger M)_{jk} - (M^\dagger M)_{kj}\right) or 0 = i\left((M^\dagger M)_{jk} - (M^\dagger M)_{kj}\right) so that (M^\dagger M)_{jk} - (M^\dagger M)_{kj} = 0.

But we showed above that (M^\dagger M)_{jk} + (M^\dagger M)_{kj} = 0. Adding the two equations, the terms for (M^\dagger M)_{kj} cancel out and we get 2(M^\dagger M)_{jk} = 0, i.e., (M^\dagger M)_{jk} = 0 for j \ne k. So all nondiagonal entries of M^\dagger M are equal to zero.

Since all diagonal entries of M^\dagger M are equal to 1 and all nondiagonal entries of M^\dagger M are equal to zero, we have M^\dagger M = I and thus the matrix M is unitary.

Since M was an arbitrary length-preserving matrix, we have thus shown that all length-preserving matrices are unitary.
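
As a final sanity check (my own sketch, not part of the proof, assuming NumPy is available), we can generate a length-preserving matrix numerically as the Q factor from the QR decomposition of a random complex matrix and confirm both properties: it preserves the norms of random vectors, and M^\dagger M = I.

import numpy as np

rng = np.random.default_rng(0)
n = 4

# The Q factor of a random complex matrix is unitary, hence length-preserving.
Z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
M, _ = np.linalg.qr(Z)

# M preserves the norms of random complex vectors...
for _ in range(5):
    psi = rng.normal(size=n) + 1j * rng.normal(size=n)
    assert np.isclose(np.linalg.norm(M @ psi), np.linalg.norm(psi))

# ...and M^dagger M = I, as the theorem asserts.
assert np.allclose(M.conj().T @ M, np.eye(n))
print("length-preservation and unitarity checks pass")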

Linear Algebra and Its Applications, Exercise 3.4.28

Exercise 3.4.28. Given the plane x_1 + x_2 + x_3 = 0 and the following vectors

\begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} \qquad \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} \qquad \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}

in the plane, find an orthonormal basis for the subspace represented by the plane. Report the dimension of the subspace and the number of nonzero vectors produced by Gram-Schmidt orthogonalization.

Answer: We start with the vector a_1 = (1, -1, 0) and normalize it to create q_1:

\|a_1\|^2 = 1^2 + (-1)^2 + 0^2 = 1 + 1 = 2

q_1 = a_1/\|a_1\| = \frac{1}{\sqrt{2}} a_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \\ 0 \end{bmatrix}

We then take the second vector a_2 = (0, 1, -1) and create a second orthogonal vector a_2' by subtracting from a_2 its projection on q_1:

a_2' = a_2 - (q_1^Ta_2)q_1

= a_2 - \left[ \frac{1}{\sqrt{2}} \cdot 0 + (-\frac{1}{\sqrt{2}}) \cdot 1 + 0 \cdot (-1) \right]q_1 = a_2 - (-\frac{1}{\sqrt{2}})q_1 = a_2 + \frac{1}{\sqrt{2}}q_1

= \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} + \frac{1}{\sqrt{2}} \begin{bmatrix} \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} + \begin{bmatrix} \frac{1}{2} \\ -\frac{1}{2} \\ 0 \end{bmatrix} = \begin{bmatrix} \frac{1}{2} \\ \frac{1}{2} \\ -1 \end{bmatrix}

We then normalize a_2' to create q_2:

\|a_2'\|^2 = (\frac{1}{2})^2 + (\frac{1}{2})^2 + (-1)^2 = \frac{1}{4} + \frac{1}{4} + 1 = \frac{3}{2}

q_2 = a_2'/\|a_2'\| = a_2'/\sqrt{\frac{3}{2}} = \frac{\sqrt{2}}{\sqrt{3}} \begin{bmatrix} \frac{1}{2} \\ \frac{1}{2} \\ -1 \end{bmatrix} = \begin{bmatrix} \frac{\sqrt{2}}{2\sqrt{3}} \\ \frac{\sqrt{2}}{2\sqrt{3}} \\ -\frac{\sqrt{2}}{\sqrt{3}} \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \\ -\frac{2}{\sqrt{6}} \end{bmatrix}

Finally, we take the third vector a_3 = (1, 0, -1) and attempt to create another orthogonal vector a_3' by subtracting from a_3 its projections on q_1 and q_2:

a_3' = a_3 - (q_1^Ta_3)q_1 - (q_2^Ta_3)q_2

= a_3 - \left[ \frac{1}{\sqrt{2}} \cdot 1 + (-\frac{1}{\sqrt{2}}) \cdot 0 + 0 \cdot (-1) \right]q_1- \left[ \frac{1}{\sqrt{6}} \cdot 1 + \frac{1}{\sqrt{6}} \cdot 0 + (-\frac{2}{\sqrt{6}}) \cdot (-1) \right] q_2

= a_3 - \frac{1}{\sqrt{2}}q_1 - \frac{3}{\sqrt{6}}q_2 = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} - \frac{1}{\sqrt{2}} \begin{bmatrix} \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \\ 0 \end{bmatrix} - \frac{3}{\sqrt{6}} \begin{bmatrix} \frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \\ -\frac{2}{\sqrt{6}} \end{bmatrix}

= \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} - \begin{bmatrix} \frac{1}{2} \\ -\frac{1}{2} \\ 0 \end{bmatrix} - \begin{bmatrix} \frac{3}{6} \\ \frac{3}{6} \\ -\frac{6}{6} \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} - \begin{bmatrix} \frac{1}{2} \\ -\frac{1}{2} \\ 0 \end{bmatrix} - \begin{bmatrix} \frac{1}{2} \\ \frac{1}{2} \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}

Since a_3' = 0 we cannot create a third vector orthogonal to q_1 and q_2. The vectors

q_1 = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \\ 0 \end{bmatrix} \qquad q_2 = \begin{bmatrix} \frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \\ -\frac{2}{\sqrt{6}} \end{bmatrix}

are an orthonormal basis for the subspace, and the dimension of the subspace is 2.

(In hindsight we could have predicted this result by inspecting the original vectors a_1, a_2, and a_3 and noticing that a_3 = a_1 + a_2. Thus only a_1 and a_2 were linearly independent, a_3 being linearly dependent on the first two vectors, so that only two orthonormal basis vectors could be created from the three vectors given.)
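
(The computation is also easy to double-check numerically. Here is a short NumPy sketch of my own implementing classical Gram-Schmidt; it reproduces q_1 and q_2 above and discards a_3, whose residual after projection is the zero vector.)

import numpy as np

def gram_schmidt(vectors, tol=1e-12):
    # Classical Gram-Schmidt: subtract from each vector its projections
    # onto the orthonormal vectors found so far, keeping nonzero residuals.
    basis = []
    for a in vectors:
        for q in basis:
            a = a - (q @ a) * q
        if np.linalg.norm(a) > tol:
            basis.append(a / np.linalg.norm(a))
    return basis

a1 = np.array([1.0, -1.0, 0.0])
a2 = np.array([0.0, 1.0, -1.0])
a3 = np.array([1.0, 0.0, -1.0])

qs = gram_schmidt([a1, a2, a3])
print(len(qs))   # 2, since a3 = a1 + a2 leaves a zero residual
for q in qs:
    print(q)     # matches q1 = (1, -1, 0)/sqrt(2) and q2 = (1, 1, -2)/sqrt(6)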

NOTE: This continues a series of posts containing worked out exercises from the (out of print) book Linear Algebra and Its Applications, Third Edition by Gilbert Strang.

If you find these posts useful I encourage you to also check out the more current Linear Algebra and Its Applications, Fourth Edition, Dr Strang’s introductory textbook Introduction to Linear Algebra, Fifth Edition and the accompanying free online course, and Dr Strang’s other books.

Linear Algebra and Its Applications, Exercise 3.4.27

Exercise 3.4.27. Given the subspace spanned by the three vectors

a_1 = \begin{bmatrix} 1 \\ -1 \\ 0 \\ 0 \end{bmatrix} \qquad a_2 = \begin{bmatrix} 0 \\ 1 \\ -1 \\ 0 \end{bmatrix} \qquad a_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ -1 \end{bmatrix}

find vectors q_1, q_2, and q_3 that form an orthonormal basis for the subspace.

Answer: We can save some time by noting that a_1 and a_3 are already orthogonal. We can normalize these two vectors to create q_1 and q_3:

\|a_1\|^2 = 1^2 + (-1)^2 + 0^2 + 0^2 = 1 + 1 = 2

q_1 = a_1/\|a_1\| = \frac{1}{\sqrt{2}} a_1 = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \\ 0 \\ 0 \end{bmatrix}

\|a_3\|^2 = 0^2 + 0^2 + 1^2 + (-1)^2 = 1 + 1 = 2

q_3 = a_3/\|a_3\| = \frac{1}{\sqrt{2}} a_3 = \begin{bmatrix} 0 \\ 0 \\ \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \end{bmatrix}

We can then compute a third orthogonal vector a_2' by subtracting from a_2 its projections on q_1 and q_3:

a_2' = a_2 - (q_1^Ta_2)q_1 - (q_3^Ta_2)q_3

= a_2 - \left[ \frac{1}{\sqrt{2}} \cdot 0 + (-\frac{1}{\sqrt{2}}) \cdot 1 + 0 \cdot (-1) + 0 \cdot 0 \right]q_1 - \left[ 0 \cdot 0 + 0 \cdot 1 + \frac{1}{\sqrt{2}} \cdot (-1) +  (-\frac{1}{\sqrt{2}}) \cdot 0 \right]q_3

= a_2 - (-\frac{1}{\sqrt{2}})q_1 - (-\frac{1}{\sqrt{2}})q_3 = a_2 + \frac{1}{\sqrt{2}}q_1 + \frac{1}{\sqrt{2}}q_3

= \begin{bmatrix} 0 \\ 1 \\ -1 \\ 0 \end{bmatrix} + \frac{1}{\sqrt{2}} \begin{bmatrix} \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \\ 0 \\ 0 \end{bmatrix} + \frac{1}{\sqrt{2}} \begin{bmatrix} 0 \\ 0 \\ \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ -1 \\ 0 \end{bmatrix} + \begin{bmatrix} \frac{1}{2} \\ -\frac{1}{2} \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ \frac{1}{2} \\ -\frac{1}{2} \end{bmatrix} = \begin{bmatrix} \frac{1}{2} \\ \frac{1}{2} \\ -\frac{1}{2} \\ -\frac{1}{2} \end{bmatrix}

Finally, we normalize a_2' to create q_2:

\|a_2'\|^2 = (\frac{1}{2})^2 + (\frac{1}{2})^2 + (-\frac{1}{2})^2 + (-\frac{1}{2})^2 = \frac{1}{4} + \frac{1}{4} + \frac{1}{4} + \frac{1}{4} = 1

q_2 = a_2'/\|a_2'\| = a_2' = \begin{bmatrix} \frac{1}{2} \\ \frac{1}{2} \\ -\frac{1}{2} \\ -\frac{1}{2} \end{bmatrix}

An orthonormal basis for the space is therefore

q_1 = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \\ 0 \\ 0 \end{bmatrix} \qquad q_2 = \begin{bmatrix} \frac{1}{2} \\ \frac{1}{2} \\ -\frac{1}{2} \\ -\frac{1}{2} \end{bmatrix} \qquad q_3 = \begin{bmatrix} 0 \\ 0 \\ \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \end{bmatrix}

(It’s worth noting that the solution for this exercise on page 480 is different from the solution given above. That’s presumably because we computed the orthonormal vectors in the order q_1, q_3, q_2 rather than the standard order q_1, q_2, q_3, taking advantage of the fact that the original vectors a_1 and a_3 were already orthogonal. Recall that a basis set is not unique, so it is possible to have different orthonormal bases for the same subspace.)
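
(As a numerical cross-check, here is a short NumPy sketch of my own confirming that the columns of Q = \begin{bmatrix} q_1&q_2&q_3 \end{bmatrix} are orthonormal and that each original a_i lies in their span.)

import numpy as np

s2 = np.sqrt(2)
Q = np.array([[ 1/s2,  1/2,  0   ],
              [-1/s2,  1/2,  0   ],
              [ 0,    -1/2,  1/s2],
              [ 0,    -1/2, -1/s2]])

# The columns are orthonormal: Q^T Q = I.
assert np.allclose(Q.T @ Q, np.eye(3))

# Each a_i equals its projection Q Q^T a_i, so it lies in span(q_1, q_2, q_3).
A = np.array([[ 1.0,  0.0,  0.0],
              [-1.0,  1.0,  0.0],
              [ 0.0, -1.0,  1.0],
              [ 0.0,  0.0, -1.0]])
assert np.allclose(Q @ (Q.T @ A), A)
print("orthonormal basis checks pass")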

Linear Algebra and Its Applications, Exercise 3.4.26

Exercise 3.4.26. In the Gram-Schmidt orthogonalization process the third component c' is computed as c' = c - (q_1^Tc)q_1 - (q_2^Tc)q_2. Verify that c' is orthogonal to both q_1 and q_2.

Answer: Taking the dot product of q_1 and c' we have

q_1^Tc' = q_1^T \left[ c - (q_1^Tc)q_1 - (q_2^Tc)q_2 \right] = q_1^Tc - q_1^T(q_1^Tc)q_1 - q_1^T(q_2^Tc)q_2

Since q_1^Tc and q_2^Tc are scalars and q_1 and q_2 are orthonormal we then have

q_1^Tc' = q_1^Tc - q_1^T(q_1^Tc)q_1 - q_1^T(q_2^Tc)q_2 = q_1^Tc - (q_1^Tc)q_1^Tq_1 - (q_2^Tc)q_1^Tq_2

= q_1^Tc - (q_1^Tc) \cdot 1 - (q_2^Tc) \cdot 0 = q_1^Tc - q_1^Tc = 0

So c' is orthogonal to q_1.

Taking the dot product of q_2 and c' we have

q_2^Tc' = q_2^T \left[ c - (q_1^Tc)q_1 - (q_2^Tc)q_2 \right] = q_2^Tc - q_2^T(q_1^Tc)q_1 - q_2^T(q_2^Tc)q_2

= q_2^Tc - (q_1^Tc)q_2^Tq_1 - (q_2^Tc)q_2^Tq_2 = q_2^Tc - (q_1^Tc) \cdot 0 - (q_2^Tc) \cdot 1 = q_2^Tc - q_2^Tc = 0

So c' is also orthogonal to q_2.
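
(The verification uses nothing about q_1 and q_2 beyond their orthonormality, so it can be spot-checked numerically with random data. Here is a small NumPy sketch of my own.)

import numpy as np

rng = np.random.default_rng(1)

# Random orthonormal q1, q2: the columns of the Q factor of a random matrix.
Q, _ = np.linalg.qr(rng.normal(size=(5, 2)))
q1, q2 = Q[:, 0], Q[:, 1]

c = rng.normal(size=5)
c_prime = c - (q1 @ c) * q1 - (q2 @ c) * q2

# Both dot products vanish (up to rounding error).
print(q1 @ c_prime, q2 @ c_prime)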

Linear Algebra and Its Applications, Exercise 3.4.25

Exercise 3.4.25. Given y = x^2 over the interval -1 \le x \le 1 what is the closest line C + Dx to the parabola formed by y?

Answer: This amounts to finding a least-squares solution to the equation \begin{bmatrix} 1&x \end{bmatrix} \begin{bmatrix} C \\ D \end{bmatrix} = y, where the entries 1, x, and y = x^2 are understood as functions of x over the interval -1 to 1 (as opposed to being scalar values).

Interpreting the traditional least squares equation A^TAx = A^Tb in this context, here the matrix A = \begin{bmatrix} 1&x \end{bmatrix} and we have

A^TA = \begin{bmatrix} 1 \\ x \end{bmatrix} \begin{bmatrix} 1&x \end{bmatrix} = \begin{bmatrix} (1, 1)&(1, x) \\ (x, 1)&(x, x) \end{bmatrix}

where the entries of A^TA are the dot products of the functions, i.e., the integrals of their products over the interval -1 to 1.

We then have

(1, 1) = \int_{-1}^1 1 \cdot 1 \;\mathrm{d}x = 2

(1, x) = (x, 1) = \int_{-1}^1 1 \cdot x \;\mathrm{d}x = \left( \frac{1}{2}x^2 \right) \;\big|_{-1}^1 = \frac{1}{2} \cdot 1^2 - \frac{1}{2} \cdot (-1)^2 = \frac{1}{2} - \frac{1}{2} = 0

(x, x) = \int_{-1}^1 x^2 \;\mathrm{d}x = \left( \frac{1}{3}x^3 \right) \;\big|_{-1}^1 = \frac{1}{3} \cdot 1^3 - \frac{1}{3} \cdot (-1)^3 = \frac{1}{3} + \frac{1}{3} = \frac{2}{3}

so that

A^TA = \begin{bmatrix} (1, 1)&(1, x) \\ (x, 1)&(x, x) \end{bmatrix} = \begin{bmatrix} 2&0 \\ 0&\frac{2}{3} \end{bmatrix}

Continuing the interpretation of the least squares equation A^TAx = A^Tb in this context, the role of b is played by the function y = x^2, and we have

A^Ty = \begin{bmatrix} 1 \\ x \end{bmatrix} x^2 = \begin{bmatrix} (1,x^2) \\ (x, x^2) \end{bmatrix}

where again the entries are dot products of the functions. From above we have

(1, x^2) = \int_{-1}^1 1 \cdot x^2 \;\mathrm{d}x = \frac{2}{3}

and from previous exercises we have

(x, x^2) = \int_{-1}^1 x \cdot x^2 \;\mathrm{d}x = \int_{-1}^1 x^3 \;\mathrm{d}x = 0

so that

A^Ty = \begin{bmatrix} \frac{2}{3} \\ 0 \end{bmatrix}

To get the least squares solution \bar{C} + \bar{D}x we then have

\begin{bmatrix} 2&0 \\ 0&\frac{2}{3} \end{bmatrix} \begin{bmatrix} \bar{C} \\ \bar{D} \end{bmatrix} = \begin{bmatrix} \frac{2}{3} \\ 0 \end{bmatrix}

From the second equation we have \bar{D} = 0. From the first equation we have 2\bar{C} = \frac{2}{3} or \bar{C} = \frac{1}{3}.

The line of best fit to the parabola y = x^2 over the interval -1 \le x \le 1 is therefore the horizontal line with y-intercept of \frac{1}{3}.
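
(The integrals and the final 2 \times 2 solve are easy to reproduce symbolically. Here is a short SymPy sketch of my own, treating the inner product of two functions as the integral of their product over the interval -1 to 1.)

import sympy as sp

x = sp.symbols('x')
f = [sp.Integer(1), x]   # the two "columns" of A: the functions 1 and x
y = x**2                 # the function playing the role of b

def dot(u, v):
    # Inner product of two functions on [-1, 1].
    return sp.integrate(u * v, (x, -1, 1))

ATA = sp.Matrix(2, 2, lambda i, j: dot(f[i], f[j]))
ATy = sp.Matrix(2, 1, lambda i, j: dot(f[i], y))

C, D = ATA.solve(ATy)
print(C, D)   # 1/3 and 0: the closest line is y = 1/3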

Linear Algebra and Its Applications, Exercise 3.4.24

Exercise 3.4.24. As discussed on page 178, the first three Legendre polynomials are 1, x, and x^2 - \frac{1}{3}. Find the next Legendre polynomial; it will be a cubic polynomial defined for -1 \le x \le 1 and will be orthogonal to the first three Legendre polynomials.

Answer: The process of finding the fourth Legendre polynomial is essentially an application of Gram-Schmidt orthogonalization. The first three polynomials are

v_1 = 1 \qquad v_2 = x \qquad v_3 = x^2 - \frac{1}{3}

We can find the fourth Legendre polynomial by starting with x^3 and subtracting off the projections of x^3 on the first three polynomials:

v_4 = x^3 - \frac{(v_1, x^3)}{(v_1, v_1)}v_1 - \frac{(v_2, x^3)}{(v_2, v_2)}v_2 - \frac{(v_3, x^3)}{(v_3, v_3)}v_3

= x^3 - \frac{(1, x^3)}{(1, 1)}\cdot 1 - \frac{(x, x^3)}{(x, x)}x - \frac{(x^2-\frac{1}{3}, x^3)}{(x^2-\frac{1}{3}, x^2-\frac{1}{3})}(x^2-\frac{1}{3})

For the first term we have

(1, x^3) = \int_{-1}^1 1 \cdot x^3 \;\mathrm{d}x = \int_{-1}^1 x^3 \;\mathrm{d}x = 0

so that the first term \frac{(v_1, x^3)}{(v_1, v_1)}v_1 does not appear in the expression for v_4.

The third term \frac{(v_3, x^3)}{(v_3, v_3)}v_3 drops out for the same reason: its numerator is

(x^2-\frac{1}{3}, x^3) = \int_{-1}^1 (x^2 - \frac{1}{3}) x^3 \;\mathrm{d}x

= \int_{-1}^1 x^5 \;\mathrm{d}x - \frac{1}{3} \int_{-1}^1 x^3 \;\mathrm{d}x = 0 - \frac{1}{3} \cdot 0 = 0

That leaves the second term \frac{(v_2, x^3)}{(v_2, v_2)}v_2 with numerator of

(x, x^3) = \int_{-1}^1 x \cdot x^3 \;\mathrm{d}x = \int_{-1}^1 x^4 \;\mathrm{d}x

= \left( \frac{1}{5} x^5 \right) \;\big|_{-1}^1 = \frac{1}{5} \cdot 1^5 - \frac{1}{5} \cdot (-1)^5 = \frac{1}{5} - (-\frac{1}{5}) = \frac{2}{5}

and denominator

(x, x) = \int_{-1}^1 x^2 \;\mathrm{d}x = \left( \frac{1}{3}x^3 \right) \;\big|_{-1}^1 = \frac{1}{3} \cdot 1^3 - \frac{1}{3} \cdot (-1)^3 = \frac{1}{3} + \frac{1}{3} = \frac{2}{3}

We then have

v_4 = x^3 - \left[ \frac{2}{5}/\frac{2}{3} \right] x = x^3 - \frac{3}{5}x
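
(A quick SymPy check of my own confirms that v_4 is orthogonal to the first three Legendre polynomials over the interval -1 \le x \le 1.)

import sympy as sp

x = sp.symbols('x')

def dot(u, v):
    # Inner product of two functions on [-1, 1].
    return sp.integrate(u * v, (x, -1, 1))

v1, v2, v3 = sp.Integer(1), x, x**2 - sp.Rational(1, 3)
v4 = x**3 - sp.Rational(3, 5) * x

# All three inner products are zero.
print(dot(v1, v4), dot(v2, v4), dot(v3, v4))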

Linear Algebra and Its Applications, Exercise 3.4.23

Exercise 3.4.23. Given the step function y with y(x) = 1 for 0 \le x \le \pi and y(x) = 0 for \pi < x < 2\pi, find the following Fourier coefficients:

a_0 = \frac{(y, 1)}{(1, 1)} \qquad a_1 = \frac{(y, \cos x)}{(\cos x, \cos x)} \qquad b_1 = \frac{(y, \sin x)}{(\sin x, \sin x)}

Answer: For a_0 the numerator is

(y, 1) = \int_0^{2\pi} y(x) \cdot 1 \;\mathrm{d}x = \int_0^{\pi} 1 \;\mathrm{d}x + \int_{\pi}^{2\pi} 0 \;\mathrm{d}x = \pi

and the denominator is

(1, 1) = \int_0^{2\pi} 1^2 \;\mathrm{d}x = 2\pi

so that a_0 = \frac{\pi}{2\pi} = \frac{1}{2}.

For a_1 the numerator is

(y, \cos x) = \int_0^{2\pi} y(x) \cos x \;\mathrm{d}x = \int_0^{\pi} 1 \cdot \cos x \;\mathrm{d}x + \int_{\pi}^{2\pi} 0 \cdot \cos x \;\mathrm{d}x

= \int_0^{\pi} \cos x \;\mathrm{d}x = \sin x \;\big|_0^{\pi} = 0 - 0 = 0

so that a_1 = 0.

For b_1 the numerator is

(y, \sin x) = \int_0^{2\pi} y(x) \sin x \;\mathrm{d}x = \int_0^{\pi} 1 \cdot \sin x \;\mathrm{d}x + \int_{\pi}^{2\pi} 0 \cdot \sin x \;\mathrm{d}x

= \int_0^{\pi} \sin x \;\mathrm{d}x = (-\cos x) \;\big|_0^{\pi} = -(-1) - (-1) = 1 + 1 = 2

and the denominator is

(\sin x, \sin x) = \int_0^{2\pi} \sin^2 x \;\mathrm{d}x = \left[ \frac{1}{2}x - \frac{1}{4} \sin 2x \right] \;\big|_0^{2\pi}

= \left[ \frac{1}{2}\cdot(2\pi) - \frac{1}{4} \sin 2\pi \right] - \left[ \frac{1}{2} \cdot 0 - \frac{1}{4} \sin(2 \cdot 0) \right] = \pi - \frac{1}{4} \cdot 0 - 0 + \frac{1}{4} \cdot 0 = \pi

so that b_1 = \frac{2}{\pi}.

So we have a_0 = \frac{1}{2}, a_1 = 0, and b_1 = \frac{2}{\pi}.
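
(These coefficients can also be reproduced symbolically. Here is a short SymPy sketch of my own, with y defined as a piecewise function on [0, 2\pi].)

import sympy as sp

x = sp.symbols('x')
y = sp.Piecewise((1, x <= sp.pi), (0, True))   # the step function

def dot(u, v):
    # Inner product of two functions on [0, 2*pi].
    return sp.integrate(u * v, (x, 0, 2 * sp.pi))

a0 = dot(y, 1) / dot(1, 1)
a1 = dot(y, sp.cos(x)) / dot(sp.cos(x), sp.cos(x))
b1 = dot(y, sp.sin(x)) / dot(sp.sin(x), sp.sin(x))
print(a0, a1, b1)   # 1/2, 0, 2/pi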
