3 Metrics and Connections

Finally we come to the definition of a Riemannian metric, the object that gives this field its name. Let us dispel a common misunderstanding: a Riemannian metric is not a distance function, which goes against modern terminology (a la metric spaces). Instead it is a generalisation of an inner product. As we saw for surfaces, an inner product allows us to define a notion of length, so there is a close relation between distance functions and inner products on manifolds. But a new student to the field must get used to the change in terminology.

The functions

g_{i j}

are sufficient to determine the inner product of any two vectors by bilinearity:

g (X^{i} \partial_{i}, Y^{j} \partial_{j}) = X^{i} Y^{j} g_{i j} .

The symmetry and positive definiteness of

g

imply that the matrix

(g_{i j})

is symmetric and positive definite.
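To make the bilinearity statement concrete, here is a minimal Python sketch that evaluates $g (X, Y) = X^{i} Y^{j} g_{i j}$ at a single point. The matrix `g` and the coefficient lists `X`, `Y` are made-up illustrative values, not taken from any example in these notes.

```python
def inner_product(g, X, Y):
    """Return X^i Y^j g_ij for coefficient lists X, Y and a square matrix g."""
    n = len(g)
    return sum(X[i] * Y[j] * g[i][j] for i in range(n) for j in range(n))

# Hypothetical metric coefficients at one point: symmetric and positive definite.
g = [[2.0, 1.0],
     [1.0, 3.0]]
X = [1.0, -1.0]
Y = [0.5, 2.0]

g_XY = inner_product(g, X, Y)
g_YX = inner_product(g, Y, X)
g_XX = inner_product(g, X, X)
```

Symmetry of the matrix shows up as `g_XY` and `g_YX` agreeing, and positive definiteness as `g_XX` being positive for the non-zero vector `X`.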

We can ask how the functions

g_{i j}

in a chart

U

are related to those

{\tilde{g}}_{i j}

in an overlapping chart

\tilde{U}

. We know that the inner product should be independent of basis, so we compute it in two ways:

The term for objects that transform with

\frac{\partial x^{k}}{\partial y^{i}}

, like

g_{i j}

, is covariant, whereas those that transform with

\frac{\partial y^{j}}{\partial x^{i}}

, like the coefficients of vectors, are called contravariant. The convention is to use lower indices for covariant things, and upper indices for contravariant things. Historically this convention came before the summation convention. Because

\frac{\partial x^{i}}{\partial y^{j}} \frac{\partial y^{j}}{\partial x^{k}} = δ_{k}^{i}

by the chain rule, when covariant and contravariant objects are ‘multiplied’, as in the above formula for

g

, then the result is independent of charts. This explains why there are so many sums of upper index with lower index, and was the motivation of the summation convention.

Clearly one can endow a manifold with functions

g_{i j}

that satisfy the necessary properties and thereby make it a Riemannian manifold. But this is not usually how we construct Riemannian manifolds. It is far more common to ‘inherit’ a metric from a bigger Riemannian manifold. This is how we got a metric on the helicoid. In general, we use the tangent map to move vectors on one manifold into the tangent space of another.

Let’s go through how the definitions of Section 1.4 fit with the definitions in this section. First we have the definition of a regular parameterised surface

Φ : U \to ℝ^{3}

, Definition 1.20.

Φ

is a function between euclidean spaces, so the tangent map is just the Jacobian

T_{p} Φ = J_{p} Φ

. The condition that the Jacobian is rank two is equivalent to it being injective by the rank-nullity theorem of linear algebra. Therefore regular and immersed are equivalent.

The first fundamental form is exactly the standard metric on

ℝ^{3}

pulled back by

Φ

. In the coordinate basis vectors, we have

Example 3.6 (Stereographic Projection). What does the induced metric from $ℝ^{2}$ look like in stereographic coordinates on $𝕊^{1}$ ? Well, we need to compute the pushforward of the coordinate vector fields and take the dot product. The pushforward was already computed for the $U_{N}$ chart in Example 2.22:

\begin{array}{l} (J_{x} ϕ_{N}^{- 1}) (\frac{\partial}{∂x}) & = \frac{2}{{(x^{2} + 1)}^{2}} (\begin{matrix} - x^{2} + 1 \\ 2 x \end{matrix}) (\begin{matrix} 1 \end{matrix}) = \frac{2}{{(x^{2} + 1)}^{2}} (\begin{matrix} - x^{2} + 1 \\ 2 x \end{matrix}) . \end{array}

Therefore

\begin{array}{l} g_{1 1} & = \frac{4}{{(x^{2} + 1)}^{4}} [{(- x^{2} + 1)}^{2} + {(2 x)}^{2}] = \frac{4}{{(x^{2} + 1)}^{4}} [x^{4} - 2 x^{2} + 1 + 4 x^{2}] \\ = \frac{4}{{(x^{2} + 1)}^{4}} {[x^{2} + 1]}^{2} = \frac{4}{{(x^{2} + 1)}^{2}} . \end{array}

The matrix of the metric has only one entry because the dimension of the manifold is one.

Using this we can calculate the lengths of vectors. For example $∂x | 0$ has squared length

∥ ∂x | 0 ∥^{2} = {(\begin{matrix} 1 \end{matrix})}^{T} (\begin{matrix} g_{1 1} (0) \end{matrix}) (\begin{matrix} 1 \end{matrix}) = 4 .

This is because we saw in Example 2.22 that it pushes forward to $(2, 0)$ .

On the other hand $∂x | 1$ has squared length

∥ ∂x | 1 ∥^{2} = {(\begin{matrix} 1 \end{matrix})}^{T} (\begin{matrix} g_{1 1} (1) \end{matrix}) (\begin{matrix} 1 \end{matrix}) = 1 .

So although the vector field $∂x$ appears to be constant in the $U_{N}$ chart, its length is in fact changing.
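The value of $g_{1 1}$ can be checked numerically. The following Python sketch assumes the formula $ϕ_{N}^{- 1} (x) = (\frac{2 x}{x^{2} + 1}, \frac{x^{2} - 1}{x^{2} + 1})$ for the inverse stereographic projection; this formula is not stated in this section and is chosen because its derivative reproduces the pushforward quoted from Example 2.22. The function names are illustrative.

```python
def inverse_stereographic(x):
    """Assumed inverse stereographic projection from the chart U_N to S^1 in R^2."""
    d = x * x + 1.0
    return (2.0 * x / d, (x * x - 1.0) / d)

def pushforward(x, h=1e-6):
    """Central-difference approximation of the pushforward of d/dx at x."""
    a = inverse_stereographic(x + h)
    b = inverse_stereographic(x - h)
    return ((a[0] - b[0]) / (2 * h), (a[1] - b[1]) / (2 * h))

def g11(x):
    """Induced metric coefficient: dot product of the pushforward with itself."""
    v = pushforward(x)
    return v[0] * v[0] + v[1] * v[1]

def g11_closed_form(x):
    """The closed form derived in the example."""
    return 4.0 / (x * x + 1.0) ** 2
```

Evaluating `g11(0.0)` and `g11(1.0)` should give values close to 4 and 1 respectively, matching the two squared lengths computed in the example.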

Finally, consider the notion of isometry in Definition 1.39. It says that two parameterised surfaces are isometric if their parametrisations induce equal metrics. We give the following more general definition.

As above, if

M

is just a manifold and we have an immersion

f : M \to N

to a Riemannian manifold, then we can endow

M

with the pullback metric. Then

f

becomes a Riemannian immersion by definition.

A conformal map does not preserve lengths or distances, but it does preserve angles since

Example 3.13 (Stereographic Projection). Consider inverse stereographic projection $Φ = ϕ_{N}^{- 1}$ as a function between $U_{N} = ℝ^{n}$ with the standard metric and the sphere $𝕊^{n}$ with the induced metric of $ℝ^{n + 1}$ .

For $n = 1$ , and indeed on any one-dimensional manifold, all metrics are conformally equivalent because there is only one metric coefficient $g_{1 1}$ .

For $n = 2$ , Exercise 3.8 shows us that $Φ$ is not a Riemannian immersion, because the pullback metric $Φ^{*} g^{𝕊^{2}}$ is not equal to the standard metric $δ_{i j}$ . However, $Φ$ is conformal because

Φ^{*} g^{𝕊^{2}} = \frac{4}{{(∥ x ∥^{2} + 1)}^{2}} δ_{i j} .

Notice for example, that in stereographic coordinates the lines through the origin are lines of longitude and circles centered at the origin are lines of latitude, and these are always perpendicular to one another.

A calculation similar to the $n = 2$ case shows that stereographic projection is conformal for all $n$ . Therefore stereographic charts have the advantage that the angle between vectors as naively calculated in the chart is the same as in $ℝ^{n + 1}$ .
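The angle-preservation claim for $n = 2$ can be tested numerically. This Python sketch assumes the formula $ϕ_{N}^{- 1} (x) = \frac{1}{∥ x ∥^{2} + 1} (2 x^{1}, 2 x^{2}, ∥ x ∥^{2} - 1)$, which is not written out in this section, and compares the naive angle between two chart vectors with the angle between their pushforwards in $ℝ^{3}$. The sample point and vectors in the usage note are arbitrary.

```python
import math

def phi_inv(x1, x2):
    """Assumed inverse stereographic projection R^2 -> S^2 in R^3."""
    d = x1 * x1 + x2 * x2 + 1.0
    return (2.0 * x1 / d, 2.0 * x2 / d, (x1 * x1 + x2 * x2 - 1.0) / d)

def pushforward(x, v, h=1e-6):
    """Central-difference derivative of phi_inv at x along the chart vector v."""
    a = phi_inv(x[0] + h * v[0], x[1] + h * v[1])
    b = phi_inv(x[0] - h * v[0], x[1] - h * v[1])
    return [(p - q) / (2 * h) for p, q in zip(a, b)]

def angle(u, v):
    """Angle between two vectors using the standard dot product."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(a * a for a in v))
    return math.acos(dot / (norm_u * norm_v))
```

For instance, with `x = (0.3, -1.2)`, `v = (1.0, 0.5)` and `w = (-0.2, 2.0)`, the values `angle(v, w)` and `angle(pushforward(x, v), pushforward(x, w))` should agree up to the finite-difference error.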

Example 3.14 (Helicoid). We have seen the pullback metric of the helicoid in Example 3.3. It is a metric on $U = ℝ^{2}$ . On the other hand we could give the plane the standard metric $δ_{i j}$ . With these metrics, the immersion $Φ$ is not conformal.

We could use a different parameterisation of the helicoid $\tilde{Φ} : ℝ^{2} \to ℝ^{3}$

\tilde{Φ} (u, v) = (sinh u cos v, sinh u sin v, v) .

The pushforwards of the coordinate vectors are

\begin{array}{l} \frac{\partial \tilde{Φ}}{∂u} & = (cosh u cos v, cosh u sin v, 0) \\ \frac{\partial \tilde{Φ}}{∂v} & = (- sinh u sin v, sinh u cos v, 1) . \end{array}

The pullback of the standard metric on $ℝ^{3}$ by this map is

\begin{array}{l} {({\tilde{Φ}}^{*} g^{ℝ^{3}})}_{1 1} & = g^{ℝ^{3}} (\frac{\partial \tilde{Φ}}{∂u}, \frac{\partial \tilde{Φ}}{∂u}) = {cosh}^{2} u, \\ {({\tilde{Φ}}^{*} g^{ℝ^{3}})}_{1 2} = {({\tilde{Φ}}^{*} g^{ℝ^{3}})}_{2 1} & = g^{ℝ^{3}} (\frac{\partial \tilde{Φ}}{∂u}, \frac{\partial \tilde{Φ}}{∂v}) = 0, \\ {({\tilde{Φ}}^{*} g^{ℝ^{3}})}_{2 2} & = {sinh}^{2} u + 1 = {cosh}^{2} u . \end{array}

That is to say

{({\tilde{Φ}}^{*} g^{ℝ^{3}})}_{i j} = {cosh}^{2} u δ_{i j} = {cosh}^{2} u g_{i j}^{ℝ^{2}}

Therefore $\tilde{Φ}$ is a conformal map between $ℝ^{2}$ and $ℝ^{3}$ with the standard metrics.
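As a sanity check of this computation, the following Python sketch evaluates the pullback metric of $\tilde{Φ}$ directly from the two pushforward vectors given above and compares it with ${cosh}^{2} u \, δ_{i j}$ . The function names are illustrative.

```python
import math

def d_u(u, v):
    """Pushforward of the u coordinate vector under the reparameterised helicoid."""
    return (math.cosh(u) * math.cos(v), math.cosh(u) * math.sin(v), 0.0)

def d_v(u, v):
    """Pushforward of the v coordinate vector under the reparameterised helicoid."""
    return (-math.sinh(u) * math.sin(v), math.sinh(u) * math.cos(v), 1.0)

def dot(a, b):
    """Standard inner product of R^3."""
    return sum(x * y for x, y in zip(a, b))

def pullback_metric(u, v):
    """Matrix of the pullback of the standard metric of R^3 in the (u, v) chart."""
    Eu, Ev = d_u(u, v), d_v(u, v)
    return [[dot(Eu, Eu), dot(Eu, Ev)],
            [dot(Ev, Eu), dot(Ev, Ev)]]
```

At any point, the diagonal entries of `pullback_metric(u, v)` should both equal `math.cosh(u) ** 2` and the off-diagonal entries should vanish, up to rounding.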

3.2 Quaternions and $𝕊^{3}$

In this section we introduce the quaternions as a means to understand the rotations of the 3-sphere

𝕊^{3}

. The 3-sphere is a beautiful manifold because it is also a group. A manifold that is also a group is called a Lie group. We will not go into the general theory of Lie groups, but they come with a natural way to move vectors around, something we are trying to achieve in this chapter. The example of Lie groups is therefore very instructive for us.

The quaternions are a four dimensional real vector space

{a_{0} + a_{1} i + a_{2} j + a_{3} k}

. A quaternion has a real part

Re a = a_{0}

and an imaginary part

Im a = a_{1} i + a_{2} j + a_{3} k

. Unlike for complex numbers, the imaginary part of a quaternion is not real. The quaternionic conjugate is

ā = Re a - Im a

. Clearly

Re ā = Re a

and

Im ā = - Im a

. Elements of the subspace

{a_{1} i + a_{2} j + a_{3} k}

are called imaginary.

Famously the quaternions have an associative but non-commutative multiplication, defined by

i^{2} = j^{2} = k^{2} = i j k = - 1

and

1

is the identity. We also use the notation

e = 1

to aid clarity. For example

i j = k

because we multiply

i j k = - 1

on the right by

k

to get

i j k^{2} = - k

and use

k^{2} = - 1

. On the other hand

j i = - k

: from

i j k = - 1

we get

1 = k j i

and now multiply on the left by

k

. This doesn’t mean that every multiplication of quaternions is anti-commuting:

According to legend on Monday 16 October 1843, as Hamilton was walking to the Royal Irish Academy, he had the idea that to define a multiplication on

ℝ^{4}

it must be non-commutative, whereupon he carved the above equations into the side of Brougham Bridge. I have been to the bridge but was unable to find the carving, so instead I offer the following simple trick to remember the multiplication rule. Draw

i, j, k

on a directed circle. Multiplication of two elements gives the third, with a plus sign if they are in the correct direction and a minus sign if they are in the reverse direction. This is of course the same rule as for the cross product in $ℝ^{3}$ .

A direct computation shows that

aā = āa = a_{0}^{2} + a_{1}^{2} + a_{2}^{2} + a_{3}^{2}

is always real and non-negative. Thus we can define the norm

| a | = \sqrt{aā}

. The norm shows that every non-zero quaternion has a two-sided inverse, namely

a^{- 1} = | a |^{- 2} ā

. Therefore the quaternions are a non-commutative field.
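The multiplication rules, the conjugate, the norm and the inverse can be sketched in a few lines of Python. Storing a quaternion $a_{0} + a_{1} i + a_{2} j + a_{3} k$ as the tuple `(a0, a1, a2, a3)` is a choice made for this sketch, and the function names are illustrative.

```python
def qmul(a, b):
    """Hamilton product of quaternions stored as tuples (a0, a1, a2, a3)."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
            a0 * b1 + a1 * b0 + a2 * b3 - a3 * b2,
            a0 * b2 - a1 * b3 + a2 * b0 + a3 * b1,
            a0 * b3 + a1 * b2 - a2 * b1 + a3 * b0)

def conj(a):
    """Quaternionic conjugate: keep the real part, negate the imaginary part."""
    return (a[0], -a[1], -a[2], -a[3])

def norm_sq(a):
    """Squared norm, the real number a * conj(a)."""
    return sum(c * c for c in a)

def inverse(a):
    """Two-sided inverse |a|^{-2} conj(a) of a non-zero quaternion."""
    n = norm_sq(a)
    return tuple(c / n for c in conj(a))

E = (1, 0, 0, 0)
I = (0, 1, 0, 0)
J = (0, 0, 1, 0)
K = (0, 0, 0, 1)
```

With these definitions `qmul(I, J)` returns `K` while `qmul(J, I)` returns its negative, which is the non-commutativity discussed above.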

This norm is plainly the same as the usual norm on

ℝ^{4}

. The unit quaternions (those with norm

1

) are as a set

𝕊^{3} \subset ℝ^{4}

. Therefore the 3-sphere is a Lie group, because we can multiply two elements of it together in a way that can be undone. This is rather special: the only spheres that are Lie groups are $𝕊^{0}$ ( $𝕊^{0} = \{\pm 1\} \subset ℝ^{1}$ ), $𝕊^{1}$ (add the angles), and $𝕊^{3}$ .

If we choose

a \in 𝕊^{3}

we can look at the function

L_{a} : 𝕊^{3} \to 𝕊^{3}

defined by

L_{a} (q) = aq .

This is a bijective function, because the inverse is

L_{a^{- 1}}

. And

L_{a} (e) = a e = a

. Therefore the tangent map of $L_{a}$ takes $T_{e} 𝕊^{3}$ to $T_{a} 𝕊^{3}$

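. Moreover, the tangent map is also bijective: from the chain rule

T_{a} L_{a^{- 1}} \circ T_{e} L_{a} = T_{e} (L_{a^{- 1}} \circ L_{a}) = T_{e} \mathrm{id} = \mathrm{id} ,

and similarly in the other order, so $T_{a} L_{a^{- 1}}$ is the inverse of $T_{e} L_{a}$ .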

Indeed, this inverse has the property that it takes

a

to the identity

L_{a^{- 1}} (a) = a^{- 1} a = e

. This gives us a way to move any tangent vector of $𝕊^{3}$ to $T_{e} 𝕊^{3}$

. Just as in Example 2.28, this shows us that

T 𝕊^{3}

is trivial. The function

T_{a} L_{a^{- 1}} : T_{a} 𝕊^{3} \to T_{e} 𝕊^{3}

is called the left trivialisation. Likewise we can define

R_{a} (q) = qa

and we have the right trivialisation

T_{a} R_{a^{- 1}} : T_{a} 𝕊^{3} \to T_{e} 𝕊^{3}

Example 3.16 (3-Sphere). Let us compute the trivialisations for the point $a = i = (0, 1, 0, 0)$ in $𝕊^{3}$ . The inverse of $a$ is $a^{- 1} = - i$ , since $i (- i) = 1$ . If we have any point $q = q^{0} + q^{1} i + q^{2} j + q^{3} k$ then

L_{a^{- 1}} (q) = (- i) (q^{0} + q^{1} i + q^{2} j + q^{3} k) = q^{1} - q^{0} i + q^{3} j - q^{2} k .

This does indeed have the property that $L_{a^{- 1}} (a) = 1 - 0 + 0 - 0 = 1 = e$ . Next we use some geometry to avoid using charts. We know that the tangent vectors in $T_{a} 𝕊^{3}$ are perpendicular to $a$ , because this is a sphere. We write

T_{a} 𝕊^{3} = {v^{1} e + v^{2} j + v^{3} k ∣ v^{1}, v^{2}, v^{3} \in ℝ} .

Because $L_{a^{- 1}} (q)$ is linear in $q$ , we know

T_{a} L_{a^{- 1}} (v^{1} e + v^{2} j + v^{3} k) = - v^{1} i + v^{3} j - v^{2} k .

For the right trivialisation

\begin{matrix} R_{a^{- 1}} (q) = (q^{0} + q^{1} i + q^{2} j + q^{3} k) (- i) = q^{1} - q^{0} i - q^{3} j + q^{2} k, \\ T_{a} R_{a^{- 1}} (v^{1} e + v^{2} j + v^{3} k) = - v^{1} i - v^{3} j + v^{2} k . \end{matrix}

So these two trivialisations on $𝕊^{3}$ are different from one another.
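The two trivialisations at $a = i$ can be reproduced with a short Python sketch. Quaternions are stored as tuples `(q0, q1, q2, q3)`; this representation and the function names are choices made for the sketch.

```python
def qmul(a, b):
    """Hamilton product of quaternions stored as tuples (a0, a1, a2, a3)."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
            a0 * b1 + a1 * b0 + a2 * b3 - a3 * b2,
            a0 * b2 - a1 * b3 + a2 * b0 + a3 * b1,
            a0 * b3 + a1 * b2 - a2 * b1 + a3 * b0)

A_INV = (0, -1, 0, 0)  # inverse of a = i

def tangent_at_i(v1, v2, v3):
    """The tangent vector v1 e + v2 j + v3 k at the point i."""
    return (v1, 0, v2, v3)

def left_trivialise(v):
    """Left multiplication by a^{-1}; linear, so it is its own tangent map."""
    return qmul(A_INV, v)

def right_trivialise(v):
    """Right multiplication by a^{-1}; linear, so it is its own tangent map."""
    return qmul(v, A_INV)
```

For `v = tangent_at_i(1, 2, 3)` the left trivialisation gives $- i + 3 j - 2 k$ and the right trivialisation gives $- i - 3 j + 2 k$ , in agreement with the two formulas of the example.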

Example 3.17. We can generalise the previous example to work for any point $a \in 𝕊^{3}$ . Just like $i$ is a right-angle rotation of the complex plane, $i, j, k$ are all right-angle rotations of the quaternions. Therefore $a i$ , $a j$ , $a k$ is an orthonormal basis of $T_{a} 𝕊^{3}$ . Alternatively, since

L_{a} (q) = a (q^{0} + q^{1} i + q^{2} j + q^{3} k) = q^{0} a + q^{1} a i + q^{2} a j + q^{3} a k

and $i, j, k$ is a basis for $T_{e} 𝕊^{3}$ we know that

T_{e} L_{a} (v^{1} i + v^{2} j + v^{3} k) = v^{1} a i + v^{2} a j + v^{3} a k

is all of $T_{a} 𝕊^{3}$ . This shows us that identifying $T_{a} 𝕊^{3}$ with $T_{e} 𝕊^{3}$ is the same as writing it with respect to the pushforward of a basis. If $v \in T_{a} 𝕊^{3}$ then we get

T_{a} L_{a^{- 1}} v = a^{- 1} v .

We call a vector field $X$ on $𝕊^{3}$ a left-invariant field when it has the form

X |_{a} = T_{e} L_{a} (v) = a v

for $v \in T_{e} 𝕊^{3}$ , because every vector $X |_{a}$ corresponds to $v$ using the left trivialisation. Likewise we have the right-invariant vector fields

X |_{a} = T_{e} R_{a} (v) = v a .

3.3 Covariant Derivatives

We have seen numerous examples thus far of how we cannot simply move vectors around in a chart like we can in euclidean space. If you take a tangent vector at one point of the sphere and translate it in

ℝ^{3}

to another point of the sphere, it may not be tangent anymore. As we observed below Example 2.19, a vector might have the same coordinates at different points in one chart, but not in another. And in Example 3.3 we saw that one coordinate basis vector changed its length as you moved around, while the other stayed the same length.

There is also a common thought experiment. Suppose that you are standing on the equator facing east. You walk forward without turning, until you have walked half way around the Earth. Then, still without turning, you begin to sidestep to the north. You sidestep all the way to the north pole, but keep going until you have returned to your original position. The remarkable fact is, even though at no stage did you turn, you are now facing west.

asks us to subtract two vectors at different points. Indeed, any non-trivial definition of a derivative of a vector field is going to require us to compare vectors at different points. Geometrically, thinking about a surface, what we want to do is to ‘roll’ the tangent plane along the surface to another point. This idea is called development and the relation between two tangent planes was called an affine connection, because it was an affine transformation of one plane to another. In modern terminology it is more common to call this a parallel transport operator, for reasons that will be explained in Section 3.4. Already from the above thought experiment we see that a parallel transport operator will depend not just on the two start and end points, but on the path between those points.

The modern approach, which we will ultimately take, uses a different point of view. It asks: how much are vector fields changing? Once we have a basis of vector fields and we know their changes, then we can measure all other vector fields against them. This leads to the definition of a covariant derivative, a type of differential operator on vector fields. It is extremely common to call this a connection, but we will refrain from doing so, at least until we have made clear the relationship with the parallel transport operator. Though the two approaches are equivalent, the modern approach is the much easier place to begin. On the other hand, some of the definitions and motivations for the modern approach only really make sense from the point of view of the traditional approach.

The above example suggests that there are many covariant derivatives on a manifold. At least for a manifold that can be covered by a single chart, every set of coordinates gives a covariant derivative. In the following theorem we characterise the set of covariant derivatives.

Proof. $C^{\infty}$ -linearity in $X$ is immediate from Property a of covariant derivatives. $C^{\infty}$ -linearity in $Y$ is not much harder to show; we use Properties b and c:

\begin{array}{l} A (X, fY + \tilde{Y}) & = \nabla_{X}^{0} (fY) + \nabla_{X}^{0} \tilde{Y} - \nabla_{X}^{1} (fY) - \nabla_{X}^{1} \tilde{Y} \\ = X (f) Y + f \nabla_{X}^{0} Y - X (f) Y - f \nabla_{X}^{1} Y + A (X, \tilde{Y}) \\ = fA (X, Y) + A (X, \tilde{Y}) . & □ \end{array}

Proof. Observe that $\nabla^{t} = \nabla^{0} + t (\nabla^{1} - \nabla^{0})$ . The corollary now follows from Theorem 3.23 and its converse Exercise 3.24. □

The above theorems give us a way to construct new covariant derivatives from existing ones (and in fact construct every covariant derivative). But we need one to start with. One can prove¹ that every manifold has a covariant derivative, but the proof is technical and not practically useful. We have seen in Example 3.21 that if one chart covers the whole space, then we can declare it is special and use the directional derivative. For manifolds that are a submanifold of a bigger space, the following example is typical.

Example 3.26 (Stereographic Projection, Tangent Connection). Consider the sphere $𝕊^{1}$ inside $ℝ^{2}$ . We can understand any vector field $Y$ on $𝕊^{1}$ as a function $\tilde{Y} : 𝕊^{1} \to ℝ^{2}$ using the pushforward. Therefore we can differentiate $\tilde{Y}$ as an $ℝ^{2}$ valued function in the usual way.

For the sake of a numerical example, let us take both $X$ and $Y$ to be the vector field from Example 2.19. The pushforward of the vector field is

X = \begin{cases} \frac{- 2 x^{2} + 2}{{(x^{2} + 1)}^{2}} \frac{\partial}{\partial p^{1}} + \frac{4 x}{{(x^{2} + 1)}^{2}} \frac{\partial}{\partial p^{2}} & \text{for } x \in U_{N} \\ 0 & \text{for } p = N \end{cases}

and interpreting this as a function to $ℝ^{2}$ we have

\tilde{Y} = \begin{cases} (\frac{- 2 x^{2} + 2}{{(x^{2} + 1)}^{2}}, \frac{4 x}{{(x^{2} + 1)}^{2}}) & \text{for } x \in U_{N} \\ 0 & \text{for } p = N \end{cases} = \begin{cases} \frac{2}{x^{2} + 1} (- p^{2}, p^{1}) & \text{for } p \neq N \\ 0 & \text{for } p = N . \end{cases}

If we differentiate $\tilde{Y}$ along $X$ , then using the product rule to avoid some nasty but unimportant terms we get

\begin{array}{l} X (\tilde{Y}) & = X (\frac{2}{x^{2} + 1}) (- p^{2}, p^{1}) + \frac{2}{x^{2} + 1} (- \frac{4 x}{{(x^{2} + 1)}^{2}}, \frac{- 2 x^{2} + 2}{{(x^{2} + 1)}^{2}}) \\ = X (\frac{2}{x^{2} + 1}) (- p^{2}, p^{1}) - \frac{4}{{(x^{2} + 1)}^{2}} (p^{1}, p^{2}) \end{array}

The first term is tangent to the circle, but the second is not. So we see the trouble is that the directional derivative $X (\tilde{Y}) = X^{i} \frac{\partial Y^{j}}{\partial p^{i}} \frac{\partial}{\partial p^{j}}$ is no longer tangent to $𝕊^{1}$ . Therefore this does not meet the definition of a covariant derivative on $𝕊^{1}$ .

What we can do however is to project this directional derivative onto the tangent space. We define the tangent covariant derivative as

\nabla_{X}^{⊤} Y = \underset{T_{p} 𝕊^{1}}{proj} X^{i} \frac{\partial Y^{j}}{\partial p^{i}} \frac{\partial}{\partial p^{j}} .

Let’s check the three required properties. The two linearity properties just follow from the linearity of the projection

\begin{array}{l} \nabla_{fX + \tilde{X}}^{⊤} Y & = \underset{T_{p} 𝕊^{1}}{proj} (f X^{i} \frac{\partial Y^{j}}{\partial p^{i}} \frac{\partial}{\partial p^{j}} + {\tilde{X}}^{i} \frac{\partial Y^{j}}{\partial p^{i}} \frac{\partial}{\partial p^{j}}) \\ = f \underset{T_{p} 𝕊^{1}}{proj} X^{i} \frac{\partial Y^{j}}{\partial p^{i}} \frac{\partial}{\partial p^{j}} + \underset{T_{p} 𝕊^{1}}{proj} {\tilde{X}}^{i} \frac{\partial Y^{j}}{\partial p^{i}} \frac{\partial}{\partial p^{j}} = f \nabla_{X}^{⊤} Y + \nabla_{\tilde{X}}^{⊤} Y, \\ \nabla_{X}^{⊤} (Y + \tilde{Y}) & = \underset{T_{p} 𝕊^{1}}{proj} (X^{i} \frac{\partial Y^{j}}{\partial p^{i}} \frac{\partial}{\partial p^{j}} + X^{i} \frac{\partial {\tilde{Y}}^{j}}{\partial p^{i}} \frac{\partial}{\partial p^{j}}) = \nabla_{X}^{⊤} Y + \nabla_{X}^{⊤} \tilde{Y} . \end{array}

For the third property, we need to recognise that $X (f) Y$ is already tangent to $𝕊^{1}$ , so the projection leaves it unaltered:

\begin{array}{l} \nabla_{X}^{⊤} (fY) & = \underset{T_{p} 𝕊^{1}}{proj} X^{i} \frac{\partial (f Y^{j})}{\partial p^{i}} \frac{\partial}{\partial p^{j}} = \underset{T_{p} 𝕊^{1}}{proj} (X (f) Y + X^{i} f \frac{\partial Y^{j}}{\partial p^{i}} \frac{\partial}{\partial p^{j}}) \\ = X (f) Y + f \nabla_{X}^{⊤} Y . \end{array}

Nothing in the calculation depended on $𝕊^{1}$ specifically, so this is a general construction for immersed submanifolds.
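The projection step can be illustrated numerically for the circle. This Python sketch reads the pushforward quoted in the example as that of $\frac{\partial}{∂x}$ in the $U_{N}$ chart, so that $X (\tilde{Y})$ becomes an ordinary derivative in $x$ ; it also assumes the formula $p = (\frac{2 x}{x^{2} + 1}, \frac{x^{2} - 1}{x^{2} + 1})$ for the point of $𝕊^{1}$ with coordinate $x$ . Both are assumptions of the sketch, and the function names are illustrative.

```python
def point(x):
    """Assumed inverse stereographic projection: chart coordinate x -> p on S^1."""
    d = x * x + 1.0
    return (2.0 * x / d, (x * x - 1.0) / d)

def Y_tilde(x):
    """The pushed-forward field as an R^2-valued function of the chart coordinate."""
    c = 2.0 / (x * x + 1.0) ** 2
    return (c * (1.0 - x * x), c * 2.0 * x)

def dot(u, v):
    """Standard inner product of R^2."""
    return u[0] * v[0] + u[1] * v[1]

def directional_derivative(x, h=1e-6):
    """X(Y~) for X = d/dx in the chart, by central differences."""
    a, b = Y_tilde(x + h), Y_tilde(x - h)
    return ((a[0] - b[0]) / (2 * h), (a[1] - b[1]) / (2 * h))

def tangent_projection(w, p):
    """Remove the component of w along p, which is the unit normal of S^1 at p."""
    n = dot(w, p)
    return (w[0] - n * p[0], w[1] - n * p[1])

def tangent_covariant_derivative(x):
    """Projection of the directional derivative onto the tangent line at point(x)."""
    return tangent_projection(directional_derivative(x), point(x))
```

The normal component `dot(directional_derivative(x), point(x))` should be close to $- \frac{4}{{(x^{2} + 1)}^{2}}$ , the coefficient of $(p^{1}, p^{2})$ found above, and the projected vector should be perpendicular to `point(x)`.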

Next we examine what type of derivative a covariant derivative is. We will show that it is a directional derivative, in a sense that will be developed. To this end, the first property to notice is that although the direction and the derived vector fields have dramatically different behaviour under scaling by a smooth function, they are both $ℝ$ -linear. If $a$ is a constant then $X (a) = 0$ and so

\nabla_{a X} Y = a \nabla_{X} Y = \nabla_{X} (a Y) .

Consequently, if either field is zero, then so is the covariant derivative. Moreover, using cutoff functions, the covariant derivative only depends on local information.² In fact something stronger is true of

X

Proof. By linearity, it suffices to prove that $X |_{p} = 0$ implies $(\nabla_{X} Y) |_{p} = 0$ . Writing $X$ in a chart we have $X = X^{i} ∂i$ and $X^{i} (p) = 0$ for all the coefficients. Then

(\nabla_{X^{i} ∂i} Y) |_{p} = (X^{i} \nabla_{∂i} Y) |_{p} = X^{i} (p) (\nabla_{∂i} Y) |_{p} = 0 . □

For this reason we sometimes speak of the covariant derivative

\nabla_{v} Y

in a direction

v \in T_{p} M

. The same is not true for

Y

: the covariant derivative really is a derivative of

Y

and depends on its values in a neighbourhood of a point. However, to compute

\nabla_{v} Y

you don’t need to know

Y

completely on an open neighbourhood of

p

, it is enough to know

Y

on a curve whose tangent is

v

Proof. Let us consider the situation in a chart, writing $v = v^{i} ∂i |_{p}$ , $Y = Y^{i} ∂i$ and $\tilde{Y} = {\tilde{Y}}^{i} ∂i$ . Then by the properties of covariant derivatives,

\begin{array}{l} \nabla_{v} Y = \nabla_{v^{i} ∂i |_{p}} (Y^{j} \partial j) = v^{i} \nabla_{∂i |_{p}} (Y^{j} \partial j) = v^{i} {\frac{\partial Y^{j}}{\partial x^{i}} |}_{p} ∂j + v^{i} Y^{j} (p) \nabla_{∂i |_{p}} ∂j, \end{array}

and likewise for $\tilde{Y}$ . Now, $Y$ and $\tilde{Y}$ agree on $α$ , so $Y (p) = \tilde{Y} (p)$ . Moreover, by the chain rule

v^{i} {\frac{\partial Y^{j}}{\partial x^{i}} |}_{p} = {\frac{d}{dt} (Y^{j} \circ α) |}_{p} = {\frac{d}{dt} ({\tilde{Y}}^{j} \circ α) |}_{p} = v^{i} {\frac{\partial {\tilde{Y}}^{j}}{\partial x^{i}} |}_{p} .

Hence

\nabla_{v} Y = v^{i} {\frac{\partial Y^{j}}{\partial x^{i}} |}_{p} ∂j + v^{i} Y^{j} (p) \nabla_{∂i |_{p}} ∂j = v^{i} {\frac{\partial {\tilde{Y}}^{j}}{\partial x^{i}} |}_{p} ∂j + v^{i} {\tilde{Y}}^{j} (p) \nabla_{∂i |_{p}} ∂j = \nabla_{v} \tilde{Y} □

This lemma tells us that we can really view the covariant derivative as a generalisation of a directional derivative. This is in contrast to other derivatives of vector fields. Recall Example 2.37. Now consider the vector fields from that example along the curve $α (t) = (t, 0)$ , the $x$ -axis. We have $X \circ α = \partial_{1}$ , $Y \circ α = \partial_{2}$ , and $V \circ α = \partial_{1}$ . But $[X, Y] = 0$ while $[V, Y] = - \partial_{1}$ . This shows that the Lie bracket is not a covariant derivative.

Example 3.29 (3-Sphere). We define a covariant derivative $\nabla^{L}$ on $𝕊^{3}$ in the following way. Given any vector field $Y$ on $𝕊^{3}$ , use left trivialisation to write it as a function $\tilde{Y} : 𝕊^{3} \to T_{e} 𝕊^{3}$ . From Example 3.17 we know this has the formula $p \mapsto p^{- 1} Y |_{p}$ using quaternions. Now that we have a function to the same vector space, there is no problem differentiating. This gives us a function $X (\tilde{Y}) : 𝕊^{3} \to T_{e} 𝕊^{3}$ . Use the left trivialisation again to move the result back to $T_{p} 𝕊^{3}$ .

Putting this all in one formula gives

(\nabla_{X}^{L} Y) |_{p} : = (T_{e} L_{p} \circ X \circ T_{p} L_{p^{- 1}}) Y .

This covariant derivative has the property that the derivative of a left-invariant vector field is always zero. This is because, by definition, after you bring its vectors to $e$ they are all the same. In other words $\tilde{Y}$ is constant and thus has zero derivative.

So to see an interesting example, we need to use a non-left-invariant vector field. Consider $Y |_{p} = i p$ . We know that $\tilde{Y} (p) = p^{- 1} i p$ . To proceed we need to choose a direction field $X$ . We know that the value of the covariant derivative at any point only depends on the value of $X$ at that point. So for simplicity let us calculate for the point $i$ in the direction $j = \frac{\partial}{\partial p^{2}}$ :

\begin{array}{l} X |_{i} \tilde{Y} & = {\frac{\partial}{\partial p^{2}} p^{- 1} i p |}_{i} = {- p^{- 1} \frac{∂p}{\partial p^{2}} p^{- 1} i p + p^{- 1} i \frac{∂p}{\partial p^{2}} |}_{i} = {- p^{- 1} j p^{- 1} i p + p^{- 1} i j |}_{i} \\ = - i^{- 1} j i^{- 1} i i + i^{- 1} i j = j + j = 2 j . \end{array}

Finally, we move this back to $T_{i} 𝕊^{3}$

(\nabla_{X} Y) |_{i} = T_{e} L_{i} (2 j) = i 2 j = 2 k .

In the same manner, we can define a covariant derivative $\nabla^{R}$ using the right trivialisation.
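The computation of $\nabla^{L}$ at the point $i$ in the direction $j$ can be checked with a finite-difference sketch in Python. Note that the difference quotient steps slightly off the sphere; this mirrors the use of $\frac{\partial}{\partial p^{2}}$ in the example, since $p^{- 1} i p$ makes sense for every non-zero quaternion. The tuple representation and function names are choices made for the sketch.

```python
def qmul(a, b):
    """Hamilton product of quaternions stored as tuples (a0, a1, a2, a3)."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
            a0 * b1 + a1 * b0 + a2 * b3 - a3 * b2,
            a0 * b2 - a1 * b3 + a2 * b0 + a3 * b1,
            a0 * b3 + a1 * b2 - a2 * b1 + a3 * b0)

def qinv(a):
    """Inverse |a|^{-2} conj(a) of a non-zero quaternion."""
    n = sum(c * c for c in a)
    return (a[0] / n, -a[1] / n, -a[2] / n, -a[3] / n)

I = (0.0, 1.0, 0.0, 0.0)
J = (0.0, 0.0, 1.0, 0.0)

def Y_tilde(p):
    """Left-trivialised form p -> p^{-1} i p of the field Y|_p = i p."""
    return qmul(qmul(qinv(p), I), p)

def directional_derivative(f, p, v, h=1e-6):
    """Central-difference derivative of f at p in the direction v."""
    fwd = f(tuple(a + h * b for a, b in zip(p, v)))
    bwd = f(tuple(a - h * b for a, b in zip(p, v)))
    return tuple((x - y) / (2 * h) for x, y in zip(fwd, bwd))

dY = directional_derivative(Y_tilde, I, J)  # approximates X|_i applied to Y~
cov = qmul(I, dY)                           # moved back to T_i S^3 by T_e L_i
```

Here `dY` should be close to $2 j$ and `cov` close to $2 k$ , the values obtained by hand above.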

In the examples above, to define a covariant derivative we really gave a directional derivative. But what is the minimal information required to specify a covariant derivative? Because covariant derivatives are local, we give the answer in a chart. Let

∂i

be the coordinate vector fields. Then for each pair

i, j

we have a vector field

\nabla_{∂i} ∂j

. This vector field must be able to be written

\nabla_{\partial_{i}} \partial_{j} = Γ_{i j}^{k} \partial_{k}

for some coefficients

Γ_{i j}^{k}

. These coefficients are called Christoffel coefficients, though be aware that some authors reserve this name for a special case. This is sufficient information to determine

\nabla

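because, by the properties of covariant derivatives,

\nabla_{X} Y = \nabla_{X^{i} \partial_{i}} (Y^{j} \partial_{j}) = X^{i} (\frac{\partial Y^{k}}{\partial x^{i}} + Y^{j} Γ_{i j}^{k}) \partial_{k} .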

Example 3.30 (Polar Coordinates). Let us consider $ℝ^{2}$ with $\nabla^{euc}$ . We see by comparison of its definition in Example 3.21 with the formula above that $Γ_{i j}^{k}$ is zero for all points and all indices in the standard chart.

But let us compute it with respect to polar coordinates. By the definition of $\nabla^{euc}$ , we have to calculate in the $x^{1}, x^{2}$ coordinates. We have

\begin{array}{l} \frac{\partial}{∂r} & = cos 𝜃 \frac{\partial}{\partial x^{1}} + sin 𝜃 \frac{\partial}{\partial x^{2}} = \frac{x^{1}}{\sqrt{{(x^{1})}^{2} + {(x^{2})}^{2}}} \frac{\partial}{\partial x^{1}} + \frac{x^{2}}{\sqrt{{(x^{1})}^{2} + {(x^{2})}^{2}}} \frac{\partial}{\partial x^{2}} \\ \frac{\partial}{∂𝜃} & = - r sin 𝜃 \frac{\partial}{\partial x^{1}} + r cos 𝜃 \frac{\partial}{\partial x^{2}} = - x^{2} \frac{\partial}{\partial x^{1}} + x^{1} \frac{\partial}{\partial x^{2}} . \end{array}

Hence we can calculate

\begin{array}{l} \nabla_{\frac{\partial}{∂r}} \frac{\partial}{∂𝜃} & = \frac{x^{1}}{\sqrt{{(x^{1})}^{2} + {(x^{2})}^{2}}} \nabla_{\frac{\partial}{\partial x^{1}}} \frac{\partial}{∂𝜃} + \frac{x^{2}}{\sqrt{{(x^{1})}^{2} + {(x^{2})}^{2}}} \nabla_{\frac{\partial}{\partial x^{2}}} \frac{\partial}{∂𝜃} \\ = \frac{x^{1}}{\sqrt{{(x^{1})}^{2} + {(x^{2})}^{2}}} (\frac{\partial (- x^{2})}{\partial x^{1}} \frac{\partial}{\partial x^{1}} + \frac{\partial x^{1}}{\partial x^{1}} \frac{\partial}{\partial x^{2}}) \\ + \frac{x^{2}}{\sqrt{{(x^{1})}^{2} + {(x^{2})}^{2}}} (\frac{\partial (- x^{2})}{\partial x^{2}} \frac{\partial}{\partial x^{1}} + \frac{\partial x^{1}}{\partial x^{2}} \frac{\partial}{\partial x^{2}}) \\ = \frac{x^{1}}{\sqrt{{(x^{1})}^{2} + {(x^{2})}^{2}}} \frac{\partial}{\partial x^{2}} - \frac{x^{2}}{\sqrt{{(x^{1})}^{2} + {(x^{2})}^{2}}} \frac{\partial}{\partial x^{1}} \\ = - sin 𝜃 \frac{\partial}{\partial x^{1}} + cos 𝜃 \frac{\partial}{\partial x^{2}} = \frac{1}{r} \frac{\partial}{∂𝜃}, \end{array}

and hence in polar coordinates

Γ_{r, 𝜃}^{r} = 0, Γ_{r, 𝜃}^{𝜃} = \frac{1}{r} .

The other six coefficients are calculated similarly.
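The two coefficients just computed can be reproduced numerically. The Python sketch below differentiates the Cartesian components of $\frac{\partial}{∂𝜃}$ in the direction $\frac{\partial}{∂r}$ , which is what $\nabla^{euc}$ does, and then solves a $2 \times 2$ linear system to express the result in the polar coordinate basis. The function names are illustrative.

```python
import math

def d_r(x1, x2):
    """Cartesian components of the polar coordinate field d/dr."""
    r = math.hypot(x1, x2)
    return (x1 / r, x2 / r)

def d_theta(x1, x2):
    """Cartesian components of the polar coordinate field d/dtheta."""
    return (-x2, x1)

def euclidean_derivative(direction, field, x1, x2, h=1e-6):
    """Differentiate the Cartesian components of `field` along `direction`."""
    d1, d2 = direction
    fwd = field(x1 + h * d1, x2 + h * d2)
    bwd = field(x1 - h * d1, x2 - h * d2)
    return tuple((a - b) / (2 * h) for a, b in zip(fwd, bwd))

def polar_components(w, x1, x2):
    """Coefficients (c_r, c_theta) with w = c_r d/dr + c_theta d/dtheta (Cramer's rule)."""
    (a, c), (b, d) = d_r(x1, x2), d_theta(x1, x2)
    det = a * d - b * c
    return ((w[0] * d - b * w[1]) / det, (a * w[1] - c * w[0]) / det)

def christoffel_r_theta(x1, x2):
    """The pair of coefficients of nabla_{d/dr} d/dtheta in the polar basis."""
    w = euclidean_derivative(d_r(x1, x2), d_theta, x1, x2)
    return polar_components(w, x1, x2)
```

At the point `(1.5, -2.0)`, where $r = 2.5$ , the returned pair should be close to `(0, 0.4)` . The same helper functions can be reused for the other six coefficients by changing the direction and the field.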

Example 3.31 (Tangent Connection). Let’s calculate the Christoffel coefficients for a submanifold $f : M \to ℝ^{n}$ with the connection $\nabla^{⊤}$ from Example 3.26 in some chart $U$ . Let $Φ = f \circ ϕ^{- 1}$ be a parameterisation, a map from a chart $U$ to $ℝ^{n}$ . Because the definition of $\nabla^{⊤}$ uses the geometry of $ℝ^{n}$ we need the pushforwards of the coordinate basis vectors. We use the notation $E_{i} = T_{x} (f) \frac{\partial}{\partial x^{i}} = J_{x} {(Φ)}_{i}^{j} \frac{\partial}{\partial p^{j}}$ . From the directional derivative definition of the pushforward map

E_{i} (Y^{j}) = TΦ (\frac{\partial}{\partial x^{i}}) (Y^{j}) = \frac{\partial}{\partial x^{i}} (Y^{j} \circ Φ) .

Therefore the covariant derivative is

\nabla_{∂i}^{⊤} \partial j = \underset{T_{p} M}{proj} \frac{\partial}{\partial x^{i}} (E_{j}^{k} \circ Φ) \frac{\partial}{\partial p^{k}} .

Finally to give the Christoffel coefficients, we write this vector in the coordinate basis $E_{i}$ . This requires solving some linear algebra problem.

Example 3.32 (Stereographic Projection). Let’s calculate the Christoffel coefficients for $𝕊^{1}$ with the connection $\nabla^{⊤}$ from Example 3.26 in the chart $U_{N}$ . This is a special case of the previous example. The immersion $f$ is the identity map, so the parameterisation is $Φ = ϕ_{N}^{- 1}$ . There is only one coordinate vector field

E \circ Φ = - 2 \frac{x^{2} - 1}{{(x^{2} + 1)}^{2}} \frac{\partial}{\partial p^{1}} + \frac{4 x}{{(x^{2} + 1)}^{2}} \frac{\partial}{\partial p^{2}} = \frac{2}{{(x^{2} + 1)}^{2}} ((1 - x^{2}) \frac{\partial}{\partial p^{1}} + 2 x \frac{\partial}{\partial p^{2}}) .

The composition with $Φ$ is simply saying that we should express the coefficients in the variables of the chart. We prepare some calculations

\begin{array}{l} \frac{\partial}{∂x} (E^{1} \circ Φ) & = \frac{- 8 x}{{(x^{2} + 1)}^{3}} (1 - x^{2}) + \frac{2}{{(x^{2} + 1)}^{2}} (- 2 x) \\ \frac{\partial}{∂x} (E^{2} \circ Φ) & = \frac{- 8 x}{{(x^{2} + 1)}^{3}} (2 x) + \frac{2}{{(x^{2} + 1)}^{2}} (2) \end{array}

You can do the orthogonal projection in the standard linear algebra way, but because this is the plane it’s easy to write down a vector perpendicular to $E$ . This leads to

\begin{array}{l} \frac{\partial}{\partial x^{i}} (E^{k} \circ Φ) \frac{\partial}{\partial p^{k}} = \frac{- 4 x}{x^{2} + 1} E + \frac{2}{{(x^{2} + 1)}^{2}} (- 2 x \frac{\partial}{\partial p^{1}} + 2 \frac{\partial}{\partial p^{2}}) \\ = \frac{- 4 x}{x^{2} + 1} E + \frac{4}{{(x^{2} + 1)}^{3}} [x ((1 - x^{2}) \frac{\partial}{\partial p^{1}} + 2 x \frac{\partial}{\partial p^{2}}) + (- 2 x \frac{\partial}{\partial p^{1}} + (1 - x^{2}) \frac{\partial}{\partial p^{2}})] \\ = \frac{- 2 x}{x^{2} + 1} E + \frac{4}{{(x^{2} + 1)}^{3}} (- 2 x \frac{\partial}{\partial p^{1}} + (1 - x^{2}) \frac{\partial}{\partial p^{2}}) . \end{array}

Hence

\begin{array}{l} \nabla_{\partial 1}^{⊤} \partial 1 & = \underset{T_{p} 𝕊^{1}}{proj} \frac{\partial}{\partial x^{i}} (E_{j}^{k} \circ Φ) \frac{\partial}{\partial p^{k}} = \frac{- 2 x}{x^{2} + 1} E & \Rightarrow Γ_{11}^{1} & = \frac{- 2 x}{x^{2} + 1} . \end{array}
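The value of $Γ_{11}^{1}$ can be confirmed with a small numerical experiment. In the Python sketch below, the derivative of $E$ is taken by central differences and its tangential part is found by projecting onto $E$ itself, which is the "standard linear algebra way" in the one-dimensional case. The function names are illustrative.

```python
def E(x):
    """The coordinate vector field E in the chart variable x, as a vector in R^2."""
    c = 2.0 / (x * x + 1.0) ** 2
    return (c * (1.0 - x * x), c * 2.0 * x)

def dE(x, h=1e-6):
    """Central-difference derivative of E with respect to x."""
    a, b = E(x + h), E(x - h)
    return ((a[0] - b[0]) / (2 * h), (a[1] - b[1]) / (2 * h))

def christoffel_111(x):
    """Coefficient of E in the orthogonal projection of dE/dx onto span(E)."""
    e, de = E(x), dE(x)
    return (de[0] * e[0] + de[1] * e[1]) / (e[0] * e[0] + e[1] * e[1])

def christoffel_111_closed_form(x):
    """The closed form obtained in the example."""
    return -2.0 * x / (x * x + 1.0)
```

The two functions `christoffel_111` and `christoffel_111_closed_form` should agree up to the finite-difference error at every `x` .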

Exercise 3.35 (Stereographic Projection). Repeat the calculation of the Christoffel coefficients from Example 3.32 for $𝕊^{2}$ in the chart $U_{N}$ . The following formulas may prove useful. Here we have the pushforwards of the coordinate vector fields and combinations that align with longitude and latitude:

\begin{array}{l} E_{1} & = \frac{2}{{(∥ x ∥^{2} + 1)}^{2}} (\begin{matrix} - {(x^{1})}^{2} + {(x^{2})}^{2} + 1 \\ - 2 x^{1} x^{2} \\ 2 x^{1} \end{matrix}) & x^{1} E_{1} + x^{2} E_{2} & = \frac{2}{{(∥ x ∥^{2} + 1)}^{2}} (\begin{matrix} - x^{1} (∥ x ∥^{2} - 1) \\ - x^{2} (∥ x ∥^{2} - 1) \\ 2 ∥ x ∥^{2} \end{matrix}) \\ E_{2} & = \frac{2}{{(∥ x ∥^{2} + 1)}^{2}} (\begin{matrix} - 2 x^{1} x^{2} \\ {(x^{1})}^{2} - {(x^{2})}^{2} + 1 \\ 2 x^{2} \end{matrix}) & x^{2} E_{1} - x^{1} E_{2} & = \frac{2}{∥ x ∥^{2} + 1} (\begin{matrix} x^{2} \\ - x^{1} \\ 0 \end{matrix}) . \end{array}

The derivatives are

\begin{array}{l} \frac{\partial}{\partial x^{1}} E_{1} & = \frac{- 4 x^{1}}{∥ x ∥^{2} + 1} E_{1} + \frac{4}{{(∥ x ∥^{2} + 1)}^{2}} [- p + \frac{1}{2} (∥ x ∥^{2} + 1) (x^{1} E_{1} + x^{2} E_{2})] \\ \frac{\partial}{\partial x^{2}} E_{1} & = \frac{- 4 x^{2}}{∥ x ∥^{2} + 1} E_{1} + \frac{2}{∥ x ∥^{2} + 1} [x^{2} E_{1} - x^{1} E_{2}], \end{array}

and

\begin{array}{l} \frac{\partial}{\partial x^{1}} E_{2} & = \frac{- 4 x^{1}}{∥ x ∥^{2} + 1} E_{2} - \frac{2}{∥ x ∥^{2} + 1} [x^{2} E_{1} - x^{1} E_{2}] \\ \frac{\partial}{\partial x^{2}} E_{2} & = \frac{- 4 x^{2}}{∥ x ∥^{2} + 1} E_{2} + \frac{4}{{(∥ x ∥^{2} + 1)}^{2}} [- p + \frac{1}{2} (∥ x ∥^{2} + 1) (x^{1} E_{1} + x^{2} E_{2})] . \end{array}

With the derivatives in this form, you should be able to calculate the Christoffel coefficients easily. For example, from

\begin{array}{l} \nabla_{\partial 1}^{⊤} \partial 1 & = \frac{- 2 x^{1}}{∥ x ∥^{2} + 1} E_{1} + \frac{2 x^{2}}{∥ x ∥^{2} + 1} E_{2} \end{array}

we read that

Γ_{11}^{1} = \frac{- 2 x^{1}}{∥ x ∥^{2} + 1}, Γ_{11}^{2} = \frac{2 x^{2}}{∥ x ∥^{2} + 1} .

For the other derivative of $E_{1}$ , the projection is trivial, and

Γ_{21}^{1} = \frac{- 2 x^{2}}{∥ x ∥^{2} + 1}, Γ_{21}^{2} = \frac{- 2 x^{1}}{∥ x ∥^{2} + 1} .

And from the derivatives of $E_{2}$ we obtain:

\begin{array}{l} Γ_{12}^{1} & = \frac{- 2 x^{2}}{∥ x ∥^{2} + 1}, Γ_{12}^{2} = \frac{- 2 x^{1}}{∥ x ∥^{2} + 1}, \\ Γ_{22}^{1} & = \frac{2 x^{1}}{∥ x ∥^{2} + 1}, Γ_{22}^{2} = \frac{- 2 x^{2}}{∥ x ∥^{2} + 1} . \end{array}
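The coefficients above can be sanity-checked numerically. The following is a minimal Python sketch, not part of the original notes. It takes the frame $E_{1}, E_{2}$ exactly as written in the exercise, differentiates it by central differences, projects the result onto the tangent plane by solving the Gram system, and compares with the closed forms. The function names (`frame`, `christoffel`, `closed_form`), the sample point and the step size `h` are illustrative assumptions.

```python
def frame(x1, x2):
    """Pushforwards E_1, E_2 of the coordinate fields, copied from Exercise 3.35."""
    r2 = x1 * x1 + x2 * x2
    c = 2.0 / (r2 + 1.0) ** 2
    e1 = (c * (-x1 * x1 + x2 * x2 + 1.0), c * (-2.0 * x1 * x2), c * (2.0 * x1))
    e2 = (c * (-2.0 * x1 * x2), c * (x1 * x1 - x2 * x2 + 1.0), c * (2.0 * x2))
    return e1, e2


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def christoffel(x1, x2, h=1e-5):
    """Approximate G[i][j][k] = Gamma_{ij}^k (0-based) by projecting d/dx^i E_j."""
    e = frame(x1, x2)
    g11, g12, g22 = dot(e[0], e[0]), dot(e[0], e[1]), dot(e[1], e[1])
    det = g11 * g22 - g12 * g12
    G = [[[0.0, 0.0] for _ in range(2)] for _ in range(2)]
    for i, (d1, d2) in enumerate([(h, 0.0), (0.0, h)]):
        plus = frame(x1 + d1, x2 + d2)
        minus = frame(x1 - d1, x2 - d2)
        for j in range(2):
            # Central difference of E_j in the x^i direction (a vector in R^3).
            deriv = [(a - b) / (2.0 * h) for a, b in zip(plus[j], minus[j])]
            # Tangential part c^1 E_1 + c^2 E_2 solves the 2x2 Gram system.
            b1, b2 = dot(deriv, e[0]), dot(deriv, e[1])
            G[i][j][0] = (g22 * b1 - g12 * b2) / det
            G[i][j][1] = (g11 * b2 - g12 * b1) / det
    return G


def closed_form(x1, x2):
    """The coefficients stated in the exercise, in the same index layout."""
    s = x1 * x1 + x2 * x2 + 1.0
    a, b = 2.0 * x1 / s, 2.0 * x2 / s
    return [[[-a, b], [-b, -a]], [[-b, -a], [a, -b]]]


numeric = christoffel(0.3, -0.7)
exact = closed_form(0.3, -0.7)
max_error = max(abs(numeric[i][j][k] - exact[i][j][k])
                for i in range(2) for j in range(2) for k in range(2))
print(max_error)
```

Solving the Gram system avoids needing the normal vector $p$ explicitly: the normal component of the derivative drops out automatically when we take inner products with $E_{1}$ and $E_{2}$.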

3.4 Parallel Transport

We began Section 3.3 with the motivation that we want to compare different tangent spaces to one another and a thought experiment about walking around the Earth. Then we went on to define covariant derivatives. Now it is time to connect the two (pardon the pun).

The inspiration for the name parallel is that the vectors of the vector field at different points are meant to be (in some sense) parallel to one another. Phrased differently: we have a field of parallel vectors. Even though $α^{'}$ is not a vector field on $M$, this is well-defined due to Lemma 3.27. Similarly, we really only need the values of $Y$ along the curve $α$ to compute this condition, due to Lemma 3.28. Therefore many books build a theory of ‘vector fields on curves’. We will avoid this extra theory by assuming the main result: so long as the curve $α$ is injective and not pathological, every vector field on $α$ can be extended to a vector field on $M$.

In a chart we have $α^{'} (t) = \frac{d α^{i}}{dt} ∂i$, so the condition becomes

\left( \frac{d Y^{k}}{dt} + Γ_{i j}^{k} \frac{d α^{i}}{dt} Y^{j} \right) ∂k = 0,

where we treat the vector field as a function of $t$, i.e. $Y (α (t))$. Since $Γ_{i j}^{k}$ and $\frac{d α^{i}}{dt}$ are specified, we treat this as a system of ODEs for the functions $Y^{i} (t) : (a, b) \to ℝ$. By the uniqueness of solutions to ODEs, a parallel vector field is uniquely determined by its value at one point of the curve. On the other hand the existence of solutions to ODEs ensures that given a vector $v \in T_{α (t_{0})} M$ there exists a unique parallel field $Y$ along $α$ with $Y (t_{0}) = v$.

Let us make our thought experiment rigorous by using the tangent covariant derivative. We can expand the thought experiment in the following way: while we are walking around the world without turning, we are holding a stick. The stick represents a vector field along the curve of our journey. Suppose at the start of our journey, the stick is pointing south (recall we are facing east). As we walk east around the world, our stick will continue to point south. Thus we ask whether the vector field $Y |_{p} = (0, 0, - 1) \in T_{p} 𝕊^{2}$ is parallel with respect to $\nabla^{⊤}$ along the equator $α (t) = (cos t, sin t, 0)$. Indeed it is, since $Y$ is constant with respect to $p$.
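The same claim can be checked in the chart by integrating the parallel transport ODE numerically. The sketch below is an illustration under stated assumptions, not the method of these notes: it uses the Christoffel coefficients listed in Exercise 3.35, assumes that the unit circle $∥ x ∥ = 1$ in the chart $U_{N}$ corresponds to the equator, and uses that at such points $x^{1} E_{1} + x^{2} E_{2} = (0, 0, 1)$, so the south-pointing field has chart components $(- x^{1}, - x^{2})$. The names `gamma`, `rhs` and `parallel_transport` are hypothetical helpers, and the integrator is a plain fourth-order Runge-Kutta scheme.

```python
import math


def gamma(x):
    """G[i][j][k] = Gamma_{ij}^k (0-based), taken from Exercise 3.35."""
    x1, x2 = x
    s = x1 * x1 + x2 * x2 + 1.0
    a, b = 2.0 * x1 / s, 2.0 * x2 / s
    return [[[-a, b], [-b, -a]], [[-b, -a], [a, -b]]]


def rhs(t, y, curve, velocity):
    """Right-hand side of dY^k/dt = -Gamma_{ij}^k (d alpha^i/dt) Y^j."""
    G = gamma(curve(t))
    v = velocity(t)
    return [-sum(G[i][j][k] * v[i] * y[j] for i in range(2) for j in range(2))
            for k in range(2)]


def parallel_transport(y0, t0, t1, curve, velocity, steps=1000):
    """Integrate the parallel transport ODE from t0 to t1 with classical RK4."""
    h = (t1 - t0) / steps
    t, y = t0, list(y0)
    for _ in range(steps):
        k1 = rhs(t, y, curve, velocity)
        k2 = rhs(t + h / 2, [y[n] + h / 2 * k1[n] for n in range(2)], curve, velocity)
        k3 = rhs(t + h / 2, [y[n] + h / 2 * k2[n] for n in range(2)], curve, velocity)
        k4 = rhs(t + h, [y[n] + h * k3[n] for n in range(2)], curve, velocity)
        y = [y[n] + h / 6 * (k1[n] + 2 * k2[n] + 2 * k3[n] + k4[n]) for n in range(2)]
        t += h
    return y


def equator(t):
    return (math.cos(t), math.sin(t))


def equator_velocity(t):
    return (-math.sin(t), math.cos(t))


# Start with the south-pointing vector at t = 0, i.e. components (-1, 0).
transported = parallel_transport([-1.0, 0.0], 0.0, 1.0, equator, equator_velocity)
south_at_end = [-math.cos(1.0), -math.sin(1.0)]
print(transported, south_at_end)
```

If the south-pointing field is parallel, the transported components must agree with $(- cos t, - sin t)$ at the end point, which is what the two printed lists compare.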

Now what about the original thought experiment? This time as we walk around the world, let the stick point forward. Clearly, if we don’t turn, it should continue to point forward. In other words

We see from the calculation that the derivative of $Y$ along the curve points towards the center of the sphere, so when projected to the tangent plane it becomes zero. In summary, parallel transport by $\nabla^{⊤}$ on the sphere matches our intuition of ‘walking without turning’. Of course there are many other covariant derivatives on the sphere, and with respect to them perhaps these two vector fields are not parallel.
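For reference, the forward-pointing case can be written out as a short worked computation. This is a sketch that assumes the forward-pointing stick is modelled by the velocity field $Y = α^{'}$ along the equator $α (t) = (cos t, sin t, 0)$, and uses that the normal direction to $𝕊^{2}$ at a point $p$ is $p$ itself.

```latex
\begin{aligned}
Y (α (t)) &= α^{'} (t) = (- \sin t, \cos t, 0), \\
\frac{d}{dt} Y (α (t)) &= (- \cos t, - \sin t, 0) = - α (t), \\
\nabla_{α^{'}}^{⊤} Y &= \underset{T_{α (t)} 𝕊^{2}}{proj} \left( - α (t) \right) = 0 .
\end{aligned}
```

The second line is a multiple of the position vector, hence normal to the sphere, which is why the projection in the third line vanishes.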

Example 3.38 (3-Sphere). Let us consider the covariant derivative $\nabla^{L}$ on $𝕊^{3}$ from Example 3.29. We noted there that left-invariant vector fields have $\nabla^{L}$ -derivative zero at any point and in any direction. Hence left-invariant fields are parallel along every curve in $𝕊^{3}$ .

Conversely, suppose $Y$ is parallel along $α$ . It follows from the definition of $\nabla^{L}$ that $t \mapsto T_{α (t)} L_{α {(t)}^{- 1}} Y (α (t))$ is constant. In words, if we consider $Y$ as a function of $t$ , i.e. $Y (α (t))$ , and move the vectors to $e$ using the tangent map of the left action, i.e. $T_{α (t)} L_{α {(t)}^{- 1}}$ , then this function is constant. Though we don’t have a formal definition, it is fair to say that $Y$ is left-invariant along the curve.

The final observation for this example is that given any vector $w \in T_{p} 𝕊^{3}$ there is a unique left-invariant vector field $Y$ with $Y |_{p} = w$ . Let $v = T_{p} L_{p^{- 1}} w$ . Then $Y |_{p} = pv$ is the field. Therefore there is a unique way to parallel transport any vector to any other point of $𝕊^{3}$ . Manifolds with this property are called parallelisable. It is equivalent to having a trivial tangent bundle.

In the above example, we encountered the idea of taking a vector $v$ at one point $α (t_{0})$, finding a vector field $Y$ with $Y |_{α (t_{0})} = v$ that is parallel along $α$, and in particular calculating the parallel vector at another point $w = Y |_{α (t_{1})}$. We call $w$ the parallel transport of $v$ along $α$. This is a function $P {(α)}_{t}^{s} : T_{α (t)} M \to T_{α (s)} M$ called the parallel transport operator. Because the ODE is linear in $Y$, the parallel transport operator is linear: if $Y$ is the parallel vector field with $Y |_{α (t)} = v$ and $\tilde{Y}$ is the parallel vector field with $\tilde{Y} |_{α (t)} = \tilde{v}$, then $Y + \tilde{Y}$ is also parallel and $(Y + \tilde{Y}) |_{α (t)} = v + \tilde{v}$. The same idea works with scaling $v$.

Some other properties of $P {(α)}_{t}^{s}$ follow easily from its definition as the solution of an ODE. We have semi-group properties $P {(α)}_{t}^{t} = id$ and $P {(α)}_{s}^{u} \circ P {(α)}_{t}^{s} = P {(α)}_{t}^{u}$. By the uniqueness of the solutions to ODEs, we have that $P {(α)}_{t}^{s}$ is injective, and therefore, being a linear map between vector spaces of the same dimension, an isomorphism of vector spaces. And so on.

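Conversely, if one has the parallel transport operator for a curve $α$, then we can recover the covariant derivative in the direction $α^{'}$ through the formula

\nabla_{α^{'} (t)} Y = \lim_{h \to 0} \frac{1}{h} \left( P {(α)}_{t + h}^{t} Y |_{α (t + h)} - Y |_{α (t)} \right) .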

So intuitively the two approaches, covariant derivatives and parallel transport operators, are equivalent. The reason that it is difficult to start with parallel transport operators is that it is tricky to characterise exactly when a set of linear functions between tangent spaces, one for every curve, corresponds to a covariant derivative. Note our logic above: if we begin with a covariant derivative, then we have a parallel transport operator, and taking a limit we can recover the covariant derivative. But if you begin with an arbitrary set of operators, there is no guarantee that the limit will exist. You need to have some type of smooth dependence of $P {(α)}_{t}^{s}$ on $t$ and $s$. Further, what conditions should you impose on the dependence of $P (α)$ on $α$ such that if two curves are tangent at a point, the above limit produces the same result? Hopefully, these questions give you an appreciation of the difficulty involved.

Special mention should go to Appendix B in Sharpe, which does start with the classical idea of rolling a plane (or another space) around on a surface and shows how that gives various modern structures on the manifold.

3.5 Torsion

In this section we discuss a quantity called torsion that is derived from a covariant derivative. There is a relation between the torsion of a connection and the torsion of a space curve, but we will not explore it in this course³. Ultimately we will only be interested in covariant derivatives with zero torsion, so in a sense we are introducing it only to rule it out. Which brings us to the point: how should we motivate the definitions in this section without going deep into theory we will not use? We ask some natural questions and give some reasonable answers.

In euclidean space we have Schwarz’ theorem, also known as Clairaut’s theorem, that the partial derivatives with respect to different variables commute (for smooth functions among others). This result is embedded in the definition of the Lie bracket, where it was necessary to have the second order terms cancel. In fact sometimes the theorem is expressed as $[∂i, ∂j] = 0$. So naturally we ask this question of the covariant derivative, but the answer is negative in general:

In this first definition, torsion of a covariant derivative is a measure of the non-commutativity of coordinate vector fields. It seems natural therefore that this should depend on the choice of chart as much as the covariant derivative. But if you have done Exercise 3.34, you may already know that if $Γ_{i j}^{k} = Γ_{j i}^{k}$ at a point in one chart then it also holds at that point in any overlapping chart. We will return to this idea shortly.

We have the expectation that the coordinate vector fields should commute, or that this is a desirable property, but we do not have that expectation for general vector fields $X, Y$. We find

The meaning of this equation is that the ‘covariant derivative commutator’ of two vector fields is their Lie bracket plus a factor coming from the fact that the coordinate vector fields do not ‘covariantly commute’.

The definition of $T (X, Y)$ is in terms of three vector fields $\nabla_{X} Y$, $\nabla_{Y} X$, and $[X, Y]$, so it is clearly independent of charts. A covariant derivative is torsion-free if $T_{i j}^{k} = 0$, and so this too is independent of charts. The second formula is just a rearrangement of the calculation preceding the definition. We say that the second formula is remarkable because although $T$ is defined using derivatives, both of which depend on the local behaviour of vector fields, the torsion only depends on the pointwise values of the vector fields. Because the Lie bracket is an antisymmetric function of $X, Y$, so too is the torsion: $T (X, Y) = - T (Y, X)$.

Example 3.45 (3-Sphere). In this example we show that the torsion of the covariant derivative $\nabla^{L}$ on $𝕊^{3}$ from Example 3.29 is non-zero. The trick is to not work with coordinate vector fields, but rather work with left-invariant vector fields. Let $E_{1} |_{p} = p i$ and likewise $E_{2} |_{p} = p j, E_{3} |_{p} = p k$ denote the left-invariant vector fields that are obtained by pushing forward $i, j, k \in T_{e} 𝕊^{3}$ . We have already noted in Example 3.38 that $\nabla_{v}^{L} E_{i} = 0$ for any vector $v \in T_{p} M$ .

Further at any point $E_{1} |_{p}, E_{2} |_{p}, E_{3} |_{p}$ is a basis for $T_{p} 𝕊^{3}$ . This means that every vector field $X$ on $𝕊^{3}$ can be written as

X = X^{1} E_{1} + X^{2} E_{2} + X^{3} E_{3} .

Thus $E_{i}$ have similar properties to the coordinate vector basis field, except that they do not come from coordinates. A set of vector fields with this basis property is called a frame, but we will not explore this concept in generality. In this frame, the covariant derivative can be reckoned with

\nabla_{X}^{L} Y = \nabla_{X}^{L} (Y^{j} E_{j}) = X (Y^{j}) E_{j} + Y^{j} \nabla_{X}^{L} E_{j} = X (Y^{j}) E_{j} .

Similarly the Lie bracket simplifies

\begin{array}{l} [E_{i}, Y] & = [E_{i}, Y^{j} E_{j}] = E_{i} (Y^{j}) E_{j} + Y^{j} [E_{i}, E_{j}] \\ [X, Y] & = [X^{i} E_{i}, Y] = X^{i} [E_{i}, Y] - Y (X^{i}) E_{i} \\ = X^{i} E_{i} (Y^{j}) E_{j} + X^{i} Y^{j} [E_{i}, E_{j}] - Y^{j} E_{j} (X^{i}) E_{i} \\ = X (Y^{j}) E_{j} + X^{i} Y^{j} [E_{i}, E_{j}] - Y (X^{i}) E_{i} . \end{array}

Together this yields

\begin{array}{l} T^{L} (X, Y) & = \nabla_{X}^{L} Y - \nabla_{Y}^{L} X - [X, Y] \\ = X (Y^{j}) E_{j} - Y (X^{j}) E_{j} - X (Y^{j}) E_{j} - X^{i} Y^{j} [E_{i}, E_{j}] + Y (X^{i}) E_{i} \\ = - X^{i} Y^{j} [E_{i}, E_{j}] . \end{array}

Thus the torsion comes down to the Lie brackets of this frame.

For this example we will evaluate $[E_{1}, E_{2}]$ :

\begin{array}{l} [E_{1}, E_{2}] & = [p i, p j] = [- p^{1} + p^{0} i + p^{3} j - p^{2} k, - p^{2} - p^{3} i + p^{0} j + p^{1} k] \\ = [- p^{1} \partial 0 + p^{0} \partial 1 + p^{3} \partial 2 - p^{2} \partial 3, - p^{2} \partial 0 - p^{3} \partial 1 + p^{0} \partial 2 + p^{1} \partial 3] \\ = - p^{1} \partial 2 + p^{0} \partial 3 + p^{3} (- \partial 0) - p^{2} (- \partial 1) - [- p^{2} \partial 1 - p^{3} (- \partial 0) + p^{0} (- \partial 3) + p^{1} \partial 2] \\ = - 2 p^{3} \partial 0 + 2 p^{2} \partial 1 - 2 p^{1} \partial 2 + 2 p^{0} \partial 3 = 2 E_{3} . \end{array}

We can generalise this argument; set $i_{1} = i, i_{2} = j, i_{3} = k$ so that we can use index notation.

[E_{i}, E_{j}] = [p∂i, p∂j] = p i_{i} ∂j - p i_{j} ∂i = p (i_{i} i_{j} - i_{j} i_{i}) .

When $i = j$ , the quaternions commute and the bracket is zero (as expected). If they are not equal then the quaternions anti-commute. This gives $[E_{2}, E_{3}] = 2 E_{1}$ and $[E_{3}, E_{1}] = 2 E_{2}$ . (There is in fact a close relationship between the Lie bracket of $𝕊^{3}$ and the cross product of $ℝ^{3}$ ).
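The bracket relations can be confirmed with a few lines of quaternion arithmetic. The following Python sketch is illustrative only: quaternions are stored as 4-tuples $(p^{0}, p^{1}, p^{2}, p^{3})$, the helper names `qmul`, `E` and `bracket` are assumptions introduced here, and the bracket is evaluated through the formula $[E_{i}, E_{j}] = p (i_{i} i_{j} - i_{j} i_{i})$ from above rather than by differentiating.

```python
def qmul(a, b):
    """Hamilton product of quaternions stored as (real, i, j, k) tuples."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
            a0 * b1 + a1 * b0 + a2 * b3 - a3 * b2,
            a0 * b2 - a1 * b3 + a2 * b0 + a3 * b1,
            a0 * b3 + a1 * b2 - a2 * b1 + a3 * b0)


I = (0.0, 1.0, 0.0, 0.0)
J = (0.0, 0.0, 1.0, 0.0)
K = (0.0, 0.0, 0.0, 1.0)


def E(p, u):
    """Value at p of the left-invariant field generated by the unit u."""
    return qmul(p, u)


def bracket(p, u, v):
    """Value at p of [E_u, E_v], computed as p (u v - v u)."""
    commutator = tuple(x - y for x, y in zip(qmul(u, v), qmul(v, u)))
    return qmul(p, commutator)


def twice(q):
    return tuple(2.0 * c for c in q)


p = (0.5, 0.5, 0.5, 0.5)  # a sample unit quaternion
b12, b23, b31 = bracket(p, I, J), bracket(p, J, K), bracket(p, K, I)
print(b12, twice(E(p, K)))
```

At the sample point the three brackets should reproduce $2 E_{3}$, $2 E_{1}$ and $2 E_{2}$ respectively, and by the computation of $T^{L}$ above the torsion $T^{L} (E_{1}, E_{2})$ is the negative of the first.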

Example 3.46 (3-Sphere). We can also ask for the torsion of $\nabla^{R}$ on $𝕊^{3}$ . Of course we could do the same as the previous example, except using a right-invariant frame, and get a similar answer. But to make the two examples comparable, let us compute the torsion of $\nabla^{R}$ using the left-invariant frame $E_{i}$ .

What changes about the calculation is that $\nabla_{E_{i}}^{R} E_{j} \neq 0$ . Instead we must generalise the calculation from Example 3.29:

\begin{array}{l} E_{i} (p) & = p∂ip = p i_{i}, \\ \nabla_{E_{i}}^{R} E_{j} & = (E_{i} (p i_{j} p^{- 1})) p = (E_{i} (p) i_{j} p^{- 1} - p i_{j} p^{- 1} E_{i} (p) p^{- 1}) p = p i_{i} i_{j} - p i_{j} i_{i} = [E_{i}, E_{j}] . \end{array}

The covariant derivative of an arbitrary vector field is

\nabla_{X}^{R} Y = X (Y^{j}) E_{j} + X^{i} Y^{j} \nabla_{E_{i}}^{R} E_{j} = X (Y^{j}) E_{j} + X^{i} Y^{j} [E_{i}, E_{j}] .

Hence

\begin{array}{l} T^{R} (X, Y) & = \nabla_{X}^{R} Y - \nabla_{Y}^{R} X - [X, Y] \\ = X (Y^{j}) E_{j} + X^{i} Y^{j} [E_{i}, E_{j}] - Y (X^{j}) E_{j} - X^{i} Y^{j} [E_{j}, E_{i}] \\ - X (Y^{j}) E_{j} - X^{i} Y^{j} [E_{i}, E_{j}] + Y (X^{i}) E_{i} \\ = X^{i} Y^{j} [E_{i}, E_{j}] . \end{array}

Thus the torsion of $\nabla^{R}$ is the negative of the torsion of $\nabla^{L}$ .

Recall from Exercise 3.24 that given one connection we can create another by the addition of a vector-valued function $A (X, Y)$. We can ask how the torsion of the new covariant derivative is related to the torsion of the original. This follows easily, for $\tilde{\nabla} = \nabla + A$

\tilde{T} (X, Y) = \tilde{\nabla}_{X} Y - \tilde{\nabla}_{Y} X - [X, Y] = T (X, Y) + A (X, Y) - A (Y, X) .

Purely algebraically, any function of two variables can be split into symmetric and antisymmetric parts

A (X, Y) = \frac{1}{2} (A (X, Y) + A (Y, X)) + \frac{1}{2} (A (X, Y) - A (Y, X)) .

If $A$ is already symmetric or antisymmetric, then it is just equal to its symmetric or antisymmetric part respectively and the other part is zero. Thus we can express the relationship of the torsions by the dictum “adding $A$ to a covariant derivative adds twice the antisymmetric part of $A$ to its torsion”. In particular, for any covariant derivative, we can absorb the torsion. This means we construct a new torsion-free covariant derivative $\tilde{\nabla} : = \nabla - \frac{1}{2} T$.
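That this choice really removes the torsion can be verified in one line. The sketch below writes $\tilde{T}$ for the torsion of $\tilde{\nabla}$ (a symbol introduced here for illustration), applies the dictum with $A = - \frac{1}{2} T$, and uses the antisymmetry $T (Y, X) = - T (X, Y)$.

```latex
\begin{aligned}
\tilde{T} (X, Y) &= T (X, Y) + A (X, Y) - A (Y, X) \\
&= T (X, Y) - \tfrac{1}{2} T (X, Y) + \tfrac{1}{2} T (Y, X) \\
&= T (X, Y) - \tfrac{1}{2} T (X, Y) - \tfrac{1}{2} T (X, Y) = 0 .
\end{aligned}
```

Note that this relies on $T$ being $C^{\infty}$-linear in both arguments, which is what allows it to play the role of $A$ in Exercise 3.24.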

Example 3.47 (3-Sphere). We have just seen in Examples 3.45 and 3.46 that with respect to the left-invariant fields $E_{i}$ the covariant derivatives are

\begin{array}{l} \nabla_{E_{i}}^{L} E_{j} & = 0 & \nabla_{E_{i}}^{R} E_{j} & = [E_{i}, E_{j}] \\ T^{L} (E_{i}, E_{j}) & = - [E_{i}, E_{j}] & T^{R} (E_{i}, E_{j}) & = [E_{i}, E_{j}] . \end{array}

(Aside: the formula on the right makes it seem as if $\nabla^{R}$ and $T^{R}$ are equal. They are not in general, only for left-invariant vector fields. Remember: a covariant derivative has the product rule in $Y$ , whereas the torsion is $C^{\infty}$ -linear.)

If we absorb the torsion on these two connections we get the torsion-free connection

\begin{array}{l} \nabla_{E_{i}}^{LC} E_{j} = \frac{1}{2} [E_{i}, E_{j}] = \nabla_{E_{i}}^{L} E_{j} + \frac{1}{2} [E_{i}, E_{j}] = \nabla_{E_{i}}^{R} E_{j} - \frac{1}{2} [E_{i}, E_{j}] . \end{array}

This fits nicely with Corollary 3.25, because $\nabla^{LC}$ can also be understood as the average of the left and right covariant derivatives: $\nabla^{LC} = \frac{1}{2} \nabla^{L} + \frac{1}{2} \nabla^{R}$ . I’ll give you one guess what the $LC$ stands for!

We have now seen that for a torsion-free connection the coordinate vector fields ‘covariantly commute’, but general vector fields do not.

Therefore we have two vector fields: the tangents in the main direction and the tangents in the transverse direction. Well, this is not completely true as we do not really have vector fields because the curves may cross each other, giving multiple vectors at the same point. (Technically what we have is the pushforwards of two vector fields.) Regardless, for each value of $(s, t)$ it makes sense to ask how the derivative $\partial_{s} α$ is changing in comparison to $\partial_{t} α$.

Proof. This is a purely computational proof. In a chart, the tangent vectors are

\partial_{s} α = \frac{\partial α^{k}}{∂s} ∂k, \partial_{t} α = \frac{\partial α^{k}}{∂t} ∂k .

Then

\begin{array}{l} \nabla_{\partial_{s} α} \partial_{t} α & = (\frac{\partial^{2} α^{k}}{∂s∂t} + Γ_{i j}^{k} \frac{\partial α^{i}}{∂s} \frac{\partial α^{j}}{∂t}) ∂k, \\ \nabla_{\partial_{t} α} \partial_{s} α & = (\frac{\partial^{2} α^{k}}{∂t∂s} + Γ_{i j}^{k} \frac{\partial α^{i}}{∂t} \frac{\partial α^{j}}{∂s}) ∂k . \end{array}

By the symmetry of the Christoffel coefficients for torsion-free covariant derivatives, these are equal. □

We should comment about why the expression $\nabla_{\partial_{s} α} \partial_{t} α$ is well-defined even though the tangents do not necessarily form a vector field. We know that $\nabla$ depends on its direction argument only through the pointwise value, so this is no issue. And for $\partial_{t} α$ we need to know its values along a curve in the direction of $\partial_{s} α$, but this is exactly the meaning of partial derivative. So understood correctly, these expressions are valid. This is an instance where a fleshed out notion of ‘vector field on a curve’ would have been more precise, but hopefully you see that not much has been lost by skipping this concept.

3.6 The Levi-Civita connection

Let us once more return to the thought experiment of walking along the equator $α (t) = (cos t, sin t, 0)$ with our stick. We now understand that we are parallel transporting our stick. But consider the vector field $Z (α (t)) = (0, 0, - {cos}^{2} t)$. To push the metaphor into silliness, it is a telescoping selfie stick that is lengthening and shortening. The vector field $Z$ always points south, but it is not parallel according to the definition. If we write $Z = {cos}^{2} t \, Y$ for $Y (α (t)) = (0, 0, - 1)$, a known parallel vector field, then

This illustrates the point that parallel is about more than just direction, it also concerns length (which is unlike how we use the term in elementary geometry and linear algebra). Therefore, among the many covariant derivatives that exist on a Riemannian manifold, we are interested in those whose parallel transport preserves length and angle.
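The computation behind the selfie-stick example can be written out with the product rule for covariant derivatives. This is a sketch using only the facts stated above, namely $Z = {cos}^{2} t \, Y$ along the equator and $\nabla_{α^{'}}^{⊤} Y = 0$.

```latex
\begin{aligned}
\nabla_{α^{'}}^{⊤} Z
&= \frac{d}{dt} \left( \cos^{2} t \right) Y + \cos^{2} t \, \nabla_{α^{'}}^{⊤} Y \\
&= - 2 \sin t \cos t \, Y + 0 = - \sin (2 t) \, Y .
\end{aligned}
```

This vanishes only at the isolated times where $\sin (2 t) = 0$, so $Z$ is not parallel even though its direction never changes.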

Let us now turn this intuition into a definition. Suppose $M$ is a Riemannian manifold with metric $g$ and that $\nabla$ is a connection whose parallel transport preserves the lengths and angles of vectors. For any curve $γ$, let $X, Y$ be parallel fields along $γ$ with respect to $\nabla$. This means that $g (X, Y)$ is a constant function along $γ$. For all smooth functions $a, b$, we must have

The choice to define this property using a third vector field $Z$ instead of the tangent vector $γ^{'}$ is purely a matter of style. The converse of the above argument is immediate: if $X, Y$ are parallel along a curve $γ$ then the right hand side is zero and thus $g (X, Y)$ is constant on the curve.

Example 3.51 (3-Sphere). We can show that the left and right covariant derivatives are compatible with the metric on $𝕊^{3}$ coming from $ℝ^{4}$ . Write vector fields $X = X^{i} E_{i}$ and $Y = Y^{i} E_{i}$ with respect to the left-invariant basis fields from Example 3.45. By the property of quaternions that $a \cdot b = Re \bar{a} b$ we see that

E_{i} \cdot E_{j} = Re \bar{p i_{i}} p i_{j} = Re {\bar{i}}_{i} \bar{p} p i_{j} = Re {\bar{i}}_{i} i_{j} = i_{i} \cdot i_{j},

since $p \in 𝕊^{3}$ has unit length. In particular it is constant on all of $𝕊^{3}$ . Additionally, the covariant derivatives of the $E_{i}$ are zero in every direction. Therefore, similar to the calculation before the definition, we have

\begin{array}{l} Z (X \cdot Y) & = Z (X^{i} Y^{j} E_{i} \cdot E_{j}) = Z (X^{i}) Y^{j} E_{i} \cdot E_{j} + X^{i} Z (Y^{j}) E_{i} \cdot E_{j} \\ = (Z (X^{i}) E_{i}) \cdot Y + X \cdot (Z (Y^{j}) E_{j}) \\ = (Z (X^{i}) E_{i} + X^{i} \nabla_{Z}^{L} E_{i}) \cdot Y + X \cdot (Z (Y^{j}) E_{j} + Y^{j} \nabla_{Z}^{L} E_{j}) \\ = (\nabla_{Z}^{L} X) \cdot Y + X \cdot (\nabla_{Z}^{L} Y) . \end{array}

This shows that $\nabla^{L}$ is metric-compatible.

For $\nabla^{R}$ we can reuse some of this calculation. What changes is that $\nabla_{Z}^{R} E_{i}$ may not be zero. Instead $\nabla_{Z}^{R} E_{i} = Z^{k} [E_{k}, E_{i}]$ . We need to prove a version of the cyclic property for the triple product (for vectors in $ℝ^{3}$ we have $a \cdot (b \times c) = b \cdot (c \times a)$ ):

\begin{array}{l} [E_{k}, E_{i}] \cdot E_{j} + E_{i} \cdot [E_{k}, E_{j}] & = Re \bar{(i_{k} i_{i} - i_{i} i_{k})} i_{j} + Re {\bar{i}}_{i} (i_{k} i_{j} - i_{j} i_{k}) \\ = Re (i_{i} i_{k} i_{j} - i_{k} i_{i} i_{j} - i_{i} i_{k} i_{j} + i_{i} i_{j} i_{k}) \\ = Re (- i_{k} i_{i} i_{j} + i_{i} i_{j} i_{k}) = 0 . \end{array}

This allows us to write

\begin{array}{l} Z (X \cdot Y) & = (Z (X^{i}) E_{i}) \cdot Y + X \cdot (Z (Y^{j}) E_{j}) + X^{i} Y^{j} Z^{k} ([E_{k}, E_{i}] \cdot E_{j} + E_{i} \cdot [E_{k}, E_{j}]) \\ = (Z (X^{i}) E_{i}) \cdot Y + X \cdot (Z (Y^{j}) E_{j}) + X^{i} \nabla_{Z}^{R} E_{i} \cdot Y + Y^{j} X \cdot \nabla_{Z}^{R} E_{j} \\ = (\nabla_{Z}^{R} X) \cdot Y + X \cdot (\nabla_{Z}^{R} Y) . \end{array}

This proves that $\nabla^{R}$ is also metric-compatible.
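Both ingredients of this example, the constancy of $E_{i} \cdot E_{j}$ and the cyclic identity, are finite checks on quaternion units and can be confirmed mechanically. The Python sketch below is illustrative: quaternions are 4-tuples, and `qmul`, `conj`, `inner` and `bracket_at` are helper names assumed here. The inner product is implemented as $Re \, \bar{a} b$ as in the text.

```python
def qmul(a, b):
    """Hamilton product of quaternions stored as (real, i, j, k) tuples."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
            a0 * b1 + a1 * b0 + a2 * b3 - a3 * b2,
            a0 * b2 - a1 * b3 + a2 * b0 + a3 * b1,
            a0 * b3 + a1 * b2 - a2 * b1 + a3 * b0)


def conj(a):
    return (a[0], -a[1], -a[2], -a[3])


def inner(a, b):
    """Euclidean inner product of R^4 written as Re(conj(a) b)."""
    return qmul(conj(a), b)[0]


UNITS = [(0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0), (0.0, 0.0, 0.0, 1.0)]


def bracket_at(p, u, v):
    """Value at p of [E_u, E_v] = p (u v - v u)."""
    commutator = tuple(x - y for x, y in zip(qmul(u, v), qmul(v, u)))
    return qmul(p, commutator)


p = (0.5, 0.5, 0.5, 0.5)  # a sample unit quaternion
frame = [qmul(p, u) for u in UNITS]  # E_1, E_2, E_3 at p

# Gram matrix of the frame: should be the identity at every unit p.
gram = [[inner(frame[i], frame[j]) for j in range(3)] for i in range(3)]

# Cyclic identity: [E_k, E_i] . E_j + E_i . [E_k, E_j] for all index triples.
cyclic = [inner(bracket_at(p, UNITS[k], UNITS[i]), frame[j])
          + inner(frame[i], bracket_at(p, UNITS[k], UNITS[j]))
          for i in range(3) for j in range(3) for k in range(3)]
print(gram, max(abs(c) for c in cyclic))
```

A check at one sample point is of course not a proof, but it guards against sign slips in the index manipulations.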

It is useful to reduce the metric-compatibility condition to a condition on the Christoffel coefficients in some chart.

Proof. Notice that the formula for metric-compatibility is $C^{\infty}$ -linear in $Z$ , so it is enough to show it holds for each coordinate basis vector. The following calculation is a set of equivalences:

\begin{matrix} ∂k (g (X, Y)) = g (\nabla_{∂k} X, Y) + g (X, \nabla_{∂k} Y) \\ ∂k (X^{i} Y^{j} g_{i j}) = g ((∂k X^{i} + X^{l} Γ_{k l}^{i}) \partial i, Y^{j} \partial j) + g (X^{i} \partial i, (∂k Y^{j} + Y^{l} Γ_{k l}^{j}) \partial j) \\ ∂k X^{i} Y^{j} g_{i j} + X^{i} \partial k Y^{j} g_{i j} + X^{i} Y^{j} \partial k g_{i j} = (∂k X^{i} + X^{l} Γ_{k l}^{i}) Y^{j} g_{i j} + X^{i} (\partial k Y^{j} + Y^{l} Γ_{k l}^{j}) g_{i j} \\ X^{i} Y^{j} \partial k g_{i j} = X^{l} Γ_{k l}^{i} Y^{j} g_{i j} + X^{i} Y^{l} Γ_{k l}^{j} g_{i j} \\ ∂k g_{i j} = Γ_{k i}^{l} g_{l j} + Γ_{k j}^{l} g_{i l} . \end{matrix}

In other words, a covariant derivative is metric-compatible if and only if its Christoffel coefficients satisfy (3.53). □
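As a quick illustration, the condition can be tested on the circle example computed earlier, where $Γ_{11}^{1} = \frac{- 2 x}{x^{2} + 1}$. The sketch assumes the circle carries the metric induced from the plane, so that $g_{11} = E \cdot E$ with $E = \frac{2}{{(x^{2} + 1)}^{2}} ((1 - x^{2}) \frac{\partial}{\partial p^{1}} + 2 x \frac{\partial}{\partial p^{2}})$. With only one index value, the condition reads $∂1 g_{11} = 2 Γ_{11}^{1} g_{11}$.

```latex
\begin{aligned}
g_{11} &= \frac{4}{(x^{2} + 1)^{4}} \left( (1 - x^{2})^{2} + 4 x^{2} \right) = \frac{4}{(x^{2} + 1)^{2}}, \\
\partial_{1} g_{11} &= \frac{- 16 x}{(x^{2} + 1)^{3}}, \\
2 Γ_{11}^{1} g_{11} &= 2 \cdot \frac{- 2 x}{x^{2} + 1} \cdot \frac{4}{(x^{2} + 1)^{2}} = \frac{- 16 x}{(x^{2} + 1)^{3}} .
\end{aligned}
```

The two sides agree, so the tangent covariant derivative on the circle satisfies the condition in this chart.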

The above equation seems to say that metric-compatibility is a rather strong condition. We know that there are $n^{3}$ choices of smooth functions for the Christoffel coefficients, and counting the possible values for $i, j, k$ gives $n^{3}$ conditions. It is almost enough to guarantee uniqueness, but not quite, because we get the same condition if we swap $i$ and $j$. However, metric-compatibility and torsion-free together are enough to ensure uniqueness. This result is given a rather impressive sounding name, though sometimes it is called a theorem and other times a lemma. We have our cake and eat it too:

Proof. Our strategy for the proof is as follows. First we will establish the so-called Koszul formula. Uniqueness is then a direct consequence. To prove existence we will show that the Koszul formula defines a torsion-free metric-compatible covariant derivative in every chart. Since we already have uniqueness, we can conclude that these give a well-defined covariant derivative on the whole manifold.

The idea of the Koszul formula is to use the symmetries of the metric and the Lie bracket to get an expression with exactly one covariant derivative. Begin with the metric-compatibility property and then use the fact that torsion is zero:

\begin{array}{l} Z (g (X, Y)) & = g (\nabla_{Z} X, Y) + g (X, \nabla_{Z} Y) \\ = g (\nabla_{Z} X, Y) + g (X, T (Z, Y) + \nabla_{Y} Z + [Z, Y]) \\ = g (\nabla_{Z} X, Y) + g (X, \nabla_{Y} Z) + g (X, [Z, Y]) . \end{array}

Now write this equation two more times with the vector fields permuted

\begin{array}{l} Y (g (Z, X)) & = g (\nabla_{Y} Z, X) + g (Z, \nabla_{X} Y) + g (Z, [Y, X]) \\ X (g (Y, Z)) & = g (\nabla_{X} Y, Z) + g (Y, \nabla_{Z} X) + g (Y, [X, Z]) . \end{array}

Notice that of the six possible permutations, only $\nabla_{Z} X$ , $\nabla_{Y} Z$ and $\nabla_{X} Y$ occur. This is a result of using the torsion-free property. Each of the three covariant derivatives occurs twice. Now, add any two equations and subtract the other. We will add the second and third and subtract the first, but it’s not important which you choose.

\begin{array}{l} X (g (Y, Z)) & + Y (g (Z, X)) - Z (g (X, Y)) \\ = 2 g (Z, \nabla_{X} Y) + g (Z, [Y, X]) + g (Y, [X, Z]) - g (X, [Z, Y]) . \end{array}

If you like, you can clean this up a little, though the role each of the vector fields plays in $g (Z, \nabla_{X} Y)$ is different, so there cannot be perfect symmetry in the formula. Here is a version I like:

\begin{array}{l} 2 g (\nabla_{X} Y, Z) & = X (g (Y, Z)) - g (X, [Y, Z]) \\ + Y (g (X, Z)) - g (Y, [X, Z]) \\ - Z (g (X, Y)) + g (Z, [X, Y]) \end{array}

This is the Koszul formula. Since the metric is non-degenerate, what it shows is that if there is a metric-compatible torsion-free covariant derivative then it can be calculated purely in terms of Lie brackets and inner products. Therefore we have established uniqueness.

For existence, it is possible to take the Koszul formula as the definition and directly check all the required properties. It is easier however to first reduce the Koszul formula to an expression in charts. Choose any chart and suppose that $X = ∂i, Y = ∂j, Z = ∂k$ are coordinate vector fields. The Lie brackets are zero. We get

\begin{array}{l} 2 g (Γ_{i j}^{l} \partial l, ∂k) = 2 Γ_{i j}^{l} g_{l k} = ∂i g_{j k} + ∂j g_{i k} - ∂k g_{i j} . \end{array}

If we view the left hand side in matrix notation rather than index notation, we see that to solve for $Γ$ we need to invert the matrix $G = (g_{i j})$ . There is a sneaky convention that the components of the inverse matrix use upper indices $(g^{i j}) = G^{- 1}$ . With this convention, the fact that these matrices are inverse can be written $g^{i j} g_{j k} = δ_{k}^{i}$ . In index notation, multiplying by the inverse matrix looks like $Γ_{i j}^{l} g_{l k} g^{k m} = Γ_{i j}^{l} δ_{l}^{m} = Γ_{i j}^{m}$ . Thus we can write

(3.55) Γ_{i j}^{m} = \frac{1}{2} g^{k m} (\partial i g_{j k} + ∂j g_{i k} - ∂k g_{i j}) .

So given a metric $g$ , define a covariant derivative on this chart using this formula for the Christoffel coefficients. Exercise 3.33 tells us that this does indeed define a covariant derivative on this chart, but it remains to show that it is metric-compatible and torsion-free. Torsion-free is easy because the above formula is symmetric in $i$ and $j$ . Using the Christoffel coefficients defined through the Koszul formula, we see that the condition of Lemma 3.52 is satisfied:

\begin{array}{l} 2 Γ_{k i}^{l} g_{l j} + 2 Γ_{k j}^{l} g_{i l} & = (∂k g_{i j} + ∂i g_{k j} - ∂j g_{k i}) + (∂k g_{j i} + ∂j g_{k i} - ∂i g_{k j}) = 2 ∂k g_{i j} . \end{array}

Therefore the covariant derivative that we have defined in each chart is metric-compatible and torsion-free. As mentioned at the outset of the proof, it only remains to show that this definition in each chart agrees, but this follows due to uniqueness. □
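Formula (3.55) is easy to evaluate numerically, which gives an independent check against the coefficients of Exercise 3.35. The Python sketch below is an illustration under assumptions: the metric on $𝕊^{2}$ in the chart $U_{N}$ is taken to be $g_{i j} = E_{i} \cdot E_{j}$ with the frame from that exercise, derivatives of $g$ are approximated by central differences, and the names `metric` and `christoffel_from_metric` are hypothetical.

```python
def metric(x1, x2):
    """g_ij = E_i . E_j, with E_1, E_2 as given in Exercise 3.35."""
    r2 = x1 * x1 + x2 * x2
    c = 2.0 / (r2 + 1.0) ** 2
    e1 = (c * (-x1 * x1 + x2 * x2 + 1.0), c * (-2.0 * x1 * x2), c * (2.0 * x1))
    e2 = (c * (-2.0 * x1 * x2), c * (x1 * x1 - x2 * x2 + 1.0), c * (2.0 * x2))
    e = (e1, e2)
    return [[sum(a * b for a, b in zip(e[i], e[j])) for j in range(2)]
            for i in range(2)]


def christoffel_from_metric(x1, x2, h=1e-5):
    """G[i][j][m] = Gamma_{ij}^m from (3.55), using finite-difference dg."""
    g = metric(x1, x2)
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    ginv = [[g[1][1] / det, -g[0][1] / det], [-g[1][0] / det, g[0][0] / det]]
    dg = []  # dg[a][b][c] approximates the derivative of g_bc in direction x^a
    for d1, d2 in [(h, 0.0), (0.0, h)]:
        gp = metric(x1 + d1, x2 + d2)
        gm = metric(x1 - d1, x2 - d2)
        dg.append([[(gp[b][c] - gm[b][c]) / (2.0 * h) for c in range(2)]
                   for b in range(2)])
    return [[[0.5 * sum(ginv[k][m] * (dg[i][j][k] + dg[j][i][k] - dg[k][i][j])
                        for k in range(2))
              for m in range(2)] for j in range(2)] for i in range(2)]


x1, x2 = 0.3, -0.7
s = x1 * x1 + x2 * x2 + 1.0
a, b = 2.0 * x1 / s, 2.0 * x2 / s
exact = [[[-a, b], [-b, -a]], [[-b, -a], [a, -b]]]  # values from Exercise 3.35
numeric = christoffel_from_metric(x1, x2)
max_error = max(abs(numeric[i][j][m] - exact[i][j][m])
                for i in range(2) for j in range(2) for m in range(2))
print(max_error)
```

Agreement here is what the theory predicts, since the tangent covariant derivative on $𝕊^{2}$ is expected to be the Levi-Civita connection of the induced metric.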

We celebrate this result with more terminology. It honours the Italian mathematician Tullio Levi-Civita, who developed much of the ‘tensor calculus’ (covariant, contravariant, indices, etc). His name tricks many students (myself included) into thinking there are two mathematicians Levi and Civita. In response to being asked what he liked best about Italy, Einstein once said “spaghetti and Levi-Civita”.

Example 3.58 (3-Sphere). A corollary of Lemma 3.52 is that the set of metric-compatible covariant derivatives has an affine structure. Corollary 3.25 shows us that the affine combination of covariant derivatives is again a covariant derivative. The Christoffel coefficients of the new covariant derivative are the same affine combination of Christoffel coefficients ${(Γ^{t})}_{i j}^{k} = (1 - t) {(Γ^{0})}_{i j}^{k} + t {(Γ^{1})}_{i j}^{k}$ . Inserting this into Equation (3.53) shows that such an affine combination is also metric-compatible.

We saw in Example 3.51 that both $\nabla^{L}$ and $\nabla^{R}$ were metric-compatible. Therefore $\nabla^{LC} = \frac{1}{2} \nabla^{L} + \frac{1}{2} \nabla^{R}$ is metric-compatible. Additionally, we proved in Example 3.47 that it is torsion-free. Therefore $\nabla^{LC}$ really is the Levi-Civita connection for $𝕊^{3}$ .

Another obvious example of a Levi-Civita connection would be $\nabla^{⊤}$ on $𝕊^{2}$. Instead of proving it for the specific case, we generalise the construction to any Riemannian immersed submanifold.

This definition extends the previous definition from Example 3.26 because $\nabla^{euc}$ is the Levi-Civita connection of $ℝ^{n}$.

Proof. The proof that it is in fact a covariant derivative is entirely similar to the corresponding statement in Example 3.26. We check the three properties of a covariant derivative:

\begin{array}{l} \nabla_{fX + \tilde{X}}^{⊤} Y & = \underset{T_{p} M}{proj} \nabla_{fX + \tilde{X}}^{N} Y = \underset{T_{p} M}{proj} (f \nabla_{X}^{N} Y + \nabla_{\tilde{X}}^{N} Y) = f \nabla_{X}^{⊤} Y + \nabla_{\tilde{X}}^{⊤} Y, \\ \nabla_{X}^{⊤} (Y + \tilde{Y}) & = \underset{T_{p} M}{proj} (\nabla_{X}^{N} Y + \nabla_{X}^{N} \tilde{Y}) = \nabla_{X}^{⊤} Y + \nabla_{X}^{⊤} \tilde{Y}, \\ \nabla_{X}^{⊤} (fY) & = \underset{T_{p} M}{proj} (X (f) Y + f \nabla_{X}^{N} Y) = X (f) Y + f \nabla_{X}^{⊤} Y, \end{array}

using that $Y$ is already tangent to $M$ . You might observe that this part of the proof works for any covariant derivative and that we have not yet used the metric-compatibility or torsion-freeness of $\nabla^{N}$ .

For torsion-free, we need to know that if $X, Y$ are tangent to $M$ that $[X, Y]$ is too, even when we consider them as vector fields on $N$ . To prove this fact requires a proper investigation of submanifolds, and the construction of a special chart on $N$ that aligns with a chart on $M$ . This is beyond the scope of this course, which has tried to avoid manifold theory as much as possible. We have seen an example of this phenomenon though: in Example 3.45 the Lie bracket of the $E_{i}$ fields was again an $E_{i}$ field. Assuming this result,

\begin{array}{l} T^{⊤} (X, Y) & = \nabla_{X}^{⊤} Y - \nabla_{Y}^{⊤} X - [X, Y] = \underset{T_{p} M}{proj} (\nabla_{X}^{N} Y - \nabla_{Y}^{N} X - [X, Y]) = \underset{T_{p} M}{proj} T^{N} (X, Y) \end{array}

is zero. The generalised statement for arbitrary connections would be that the tangent connection is torsion-free iff the torsion of $\nabla^{N}$ is perpendicular to $T_{p} M$ at every point of $M$. Though we have not defined it, one could also say: iff $T^{N}$ lies in the normal bundle of $M$.

Lastly, we need to show that the tangent connection is metric-compatible. This is where we need to use that $M$ is Riemannian immersed, so that the metric on $M$ and the metric on $N$ agree for tangent vectors to $M$ .

\begin{array}{rl} Z (g^{M} (X, Y)) & = Z (g^{N} (X, Y)) = g^{N} (\nabla_{Z}^{N} X, Y) + g^{N} (X, \nabla_{Z}^{N} Y) \\ & = g^{N} (\underset{T_{p} M}{proj} \nabla_{Z}^{N} X, Y) + g^{N} (X, \underset{T_{p} M}{proj} \nabla_{Z}^{N} Y) \\ & = g^{N} (\nabla_{Z}^{⊤} X, Y) + g^{N} (X, \nabla_{Z}^{⊤} Y) \\ & = g^{M} (\nabla_{Z}^{⊤} X, Y) + g^{M} (X, \nabla_{Z}^{⊤} Y) . \end{array}

To explain the working here a little, for any vector in $T_{p} N$ we can split it into a part in $T_{p} M$ and a part perpendicular to $T_{p} M$ . Because $Y \in T_{p} M$ , the inner product of $Y$ with a vector perpendicular to $T_{p} M$ is zero. Thus we can go from the first to the second line.
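Written out for the first term, the step reads as follows. Here $(\nabla_{Z}^{N} X)^{\perp}$ is notation introduced only for this illustration: it denotes the component of $\nabla_{Z}^{N} X$ perpendicular to $T_{p} M$.

\begin{array}{rl} g^{N} (\nabla_{Z}^{N} X, Y) & = g^{N} (\underset{T_{p} M}{proj} \nabla_{Z}^{N} X + (\nabla_{Z}^{N} X)^{\perp}, Y) \\ & = g^{N} (\underset{T_{p} M}{proj} \nabla_{Z}^{N} X, Y) + g^{N} ((\nabla_{Z}^{N} X)^{\perp}, Y) \\ & = g^{N} (\underset{T_{p} M}{proj} \nabla_{Z}^{N} X, Y) . \end{array}

The second term is handled in the same way, with the roles of $X$ and $Y$ exchanged.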

Now we know that $\nabla^{⊤}$ is a metric-compatible torsion-free covariant derivative on $M$ . By the uniqueness in Theorem 3.54, it is the Levi-Civita connection. □
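As a sanity check of the proof in the motivating case $𝕊^{2} \subset ℝ^{3}$, the two properties can be verified symbolically. The following is a minimal sketch, assuming SymPy is available. The function names and the three rotation fields `R1`, `R2`, `R3` are our own choices for illustration and are not taken from the examples in these notes. The projection divides by $p \cdot p$ so that it is the orthogonal projection onto $p^{\perp}$ at every point away from the origin; on the unit sphere this is $T_{p} 𝕊^{2}$.

```python
import sympy as sp

x, y, z = sp.symbols("x y z", real=True)
p = sp.Matrix([x, y, z])  # position vector; on the unit sphere it is also the unit normal

def euclidean_derivative(X, Y):
    """nabla^euc_X Y: differentiate each component of Y in the direction X."""
    return Y.jacobian(p) * X

def proj(v):
    """Orthogonal projection onto the plane perpendicular to p (T_p S^2 when |p| = 1)."""
    return v - (v.dot(p) / p.dot(p)) * p

def tangent_connection(X, Y):
    """Tangential connection: project the ambient derivative back onto the tangent plane."""
    return proj(euclidean_derivative(X, Y))

def lie_bracket(X, Y):
    """[X, Y], computed in the ambient coordinates of R^3."""
    return euclidean_derivative(X, Y) - euclidean_derivative(Y, X)

def torsion(X, Y):
    """T(X, Y) = nabla_X Y - nabla_Y X - [X, Y] for the tangential connection."""
    T = tangent_connection(X, Y) - tangent_connection(Y, X) - lie_bracket(X, Y)
    return T.applyfunc(sp.simplify)

def metric_defect(Z, X, Y):
    """Z(g(X, Y)) - g(nabla_Z X, Y) - g(X, nabla_Z Y); zero means metric-compatible."""
    lhs = (sp.Matrix([X.dot(Y)]).jacobian(p) * Z)[0]
    rhs = tangent_connection(Z, X).dot(Y) + X.dot(tangent_connection(Z, Y))
    return sp.simplify(lhs - rhs)

# Illustrative test fields (our choice): infinitesimal rotations about the three axes.
# Each satisfies p . R_i = 0, so each is tangent to every sphere centred at the origin.
R1 = sp.Matrix([0, -z, y])
R2 = sp.Matrix([z, 0, -x])
R3 = sp.Matrix([-y, x, 0])
fields = [R1, R2, R3]

torsion_free = all(torsion(X, Y) == sp.zeros(3, 1) for X in fields for Y in fields)
compatible = all(
    metric_defect(Z, X, Y) == 0 for Z in fields for X in fields for Y in fields
)
print(torsion_free, compatible)

# The projection is genuinely needed: the ambient derivative has a normal component.
print(sp.simplify(euclidean_derivative(R3, R3).dot(p)))
```

This only checks the two properties on three particular fields, so it illustrates the proof rather than replacing it. Note that the torsion check succeeds precisely because the Lie bracket of two rotation fields is again tangent to the sphere, which is the fact assumed in the proof.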

To close the chapter, we revisit the question “why torsion-free”? Our first answer was that it is a natural expectation, based on the commutativity of partial derivatives in the euclidean setting. Our second answer is that torsion-free is a matter of convenience:

¹Lee Proposition 4.5

²See Lee Lemma 4.1 for a proof. We prove a stronger statement in Lemma 3.28.

³See the ‘American football example’ Lee Problem 6-1
