Chapter 3
Metrics and Connections

3.1 Riemannian Metrics

Finally we come to the definition of a Riemannian metric, the object that gives this field its name. Let us dispel a common misunderstanding: a Riemannian metric is not a distance function, which goes against modern terminology (a la metric spaces). Instead it is a generalisation of an inner product. As we saw for surfaces, an inner product allows us to define a notion of length, so there is a close relation between distance functions and inner products on manifolds. But a new student to the field must get used to the change in terminology.

Definition 3.1. A Riemannian metric $g$ on a manifold $M$ is a choice of inner product for every tangent space $T_pM$. If $U$ is a chart of $M$, then we can express $g$ in charts using the coordinate basis vectors:

\[ g_{ij}(p) = g\left( \frac{\partial}{\partial x^i}\Big|_p, \frac{\partial}{\partial x^j}\Big|_p \right). \]

A Riemannian metric should be smooth in the sense that the functions $g_{ij}$ are smooth in any chart. A manifold with a Riemannian metric is called a Riemannian manifold. The length of a vector $X \in T_pM$ and the angle $\theta$ between vectors $X, Y \in T_pM$ are defined in the usual way:

\[ \|X\|_g := \sqrt{g(X,X)}, \qquad \cos\theta = \frac{g(X,Y)}{\|X\|_g \, \|Y\|_g}. \]

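The functions $g_{ij}$ are sufficient to determine the inner product of any two vectors by bilinearity:

\[ g\left( X^i \frac{\partial}{\partial x^i}, Y^j \frac{\partial}{\partial x^j} \right) = X^i Y^j g\left( \frac{\partial}{\partial x^i}, \frac{\partial}{\partial x^j} \right) = X^i Y^j g_{ij}. \]

The symmetry and positive definiteness of $g$ imply that the matrix $(g_{ij})$ is symmetric and positive definite.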
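As a small numerical illustration of the bilinearity formula, the following Python sketch evaluates $g(X,Y) = X^i Y^j g_{ij}$ and the resulting length and angle at a single point. The matrix `g` is a made-up sample value and the helper names `inner`, `length` and `angle` are ours, not notation from the text.

```python
import math

def inner(g, X, Y):
    """Return g(X, Y) = X^i Y^j g_ij for a metric matrix g at one point."""
    n = len(X)
    return sum(X[i] * Y[j] * g[i][j] for i in range(n) for j in range(n))

def length(g, X):
    """Length of X measured with the metric g."""
    return math.sqrt(inner(g, X, X))

def angle(g, X, Y):
    """Angle between X and Y measured with the metric g."""
    return math.acos(inner(g, X, Y) / (length(g, X) * length(g, Y)))

# Hypothetical sample metric: symmetric and positive definite.
g = [[1.0, 0.0], [0.0, 2.0]]
print(length(g, [0.0, 1.0]))             # second basis vector is not unit length
print(angle(g, [1.0, 0.0], [0.0, 1.0]))  # basis vectors are orthogonal for this g
```

Only the matrix of the metric at the point enters the computation, which is the content of the formula above.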

Example 3.2 (Euclidean Space). We have seen in Example 2.3 that any open subset of euclidean space is a manifold with one chart. It is also a Riemannian manifold with the usual dot product

\[ g_{ij} = g\left( \frac{\partial}{\partial x^i}, \frac{\partial}{\partial x^j} \right) = \frac{\partial}{\partial x^i} \cdot \frac{\partial}{\partial x^j} = \delta_{ij}. \]

Notice that the matrix of the metric in charts is symmetric and positive definite. This is also called the standard metric on $\mathbb{R}^n$.

Example 3.3 (Helicoid). In fact we have seen Riemannian metrics already, namely the first fundamental form of a surface. For the helicoid, in Example 1.22, in the chart $U = \mathbb{R}^2$ we had coordinates $x^1 = u$, $x^2 = v$ and

\[ g_{11}(u,v) = 1, \qquad g_{12}(u,v) = g_{21}(u,v) = 0, \qquad g_{22}(u,v) = u^2 + b^2. \]

For this example we see that the $g_{ij}$ are non-constant functions (at least, $g_{22}$ is non-constant). In particular, the length of the coordinate basis vector

\[ \left\| \partial_v|_{(u,v)} \right\| = \sqrt{u^2 + b^2} \]

is different at different points of the helicoid.

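We can ask how the functions $g_{ij}$ in a chart $U$ with coordinates $x^i$ are related to those $\tilde g_{ij}$ in an overlapping chart $\tilde U$ with coordinates $y^i$. We know that the inner product should be independent of basis, so we compute it in two ways:

\[ \tilde g_{ij} = g\left( \frac{\partial}{\partial y^i}, \frac{\partial}{\partial y^j} \right) = g\left( \frac{\partial x^k}{\partial y^i}\frac{\partial}{\partial x^k}, \frac{\partial x^l}{\partial y^j}\frac{\partial}{\partial x^l} \right) = \frac{\partial x^k}{\partial y^i}\frac{\partial x^l}{\partial y^j}\, g\left( \frac{\partial}{\partial x^k}, \frac{\partial}{\partial x^l} \right) = \frac{\partial x^k}{\partial y^i}\frac{\partial x^l}{\partial y^j}\, g_{kl}. \]

Notice the subtle contrast to the equivalence relation for vectors:

\[ v^i \frac{\partial}{\partial x^i} = v^i \frac{\partial y^j}{\partial x^i}\frac{\partial}{\partial y^j} = \tilde v^j \frac{\partial}{\partial y^j} \quad\Longrightarrow\quad \tilde v^j = v^i \frac{\partial y^j}{\partial x^i}. \]

The term for objects that transform with $\frac{\partial x^k}{\partial y^i}$, like $g_{ij}$, is covariant, whereas those that transform with $\frac{\partial y^j}{\partial x^i}$, like the coefficients of vectors, are called contravariant. The convention is to use lower indices for covariant things, and upper indices for contravariant things. Historically this convention came before the summation convention. Because $\frac{\partial x^i}{\partial y^j}\frac{\partial y^j}{\partial x^k} = \delta^i_k$ by the chain rule, when covariant and contravariant objects are ‘multiplied’, as in the above formula for $g$, then the result is independent of charts. This explains why there are so many sums of upper index with lower index, and was the motivation of the summation convention.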
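The covariant transformation rule can be tried out numerically. The sketch below is our own illustration, not an example from the text: it takes the standard metric $\delta_{kl}$ of the plane in the coordinates $x = (r\cos\theta, r\sin\theta)$ and applies the rule with $y = (r, \theta)$. The function names are hypothetical.

```python
import math

def polar_jacobian(r, theta):
    """Entries J[k][i] = dx^k/dy^i for x = (r cos t, r sin t) and y = (r, t)."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta),  r * math.cos(theta)]]

def transform_metric(g, J):
    """Apply g~_ij = (dx^k/dy^i)(dx^l/dy^j) g_kl, summing over k and l."""
    n = len(g)
    return [[sum(J[k][i] * J[l][j] * g[k][l]
                 for k in range(n) for l in range(n))
             for j in range(n)] for i in range(n)]

delta = [[1.0, 0.0], [0.0, 1.0]]          # standard metric in the x chart
g_polar = transform_metric(delta, polar_jacobian(2.0, 0.7))
print(g_polar)                             # approximately diag(1, r^2) with r = 2
```

The two lower indices of $g_{ij}$ are why the Jacobian appears twice in `transform_metric`.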

Clearly one can endow a manifold with functions gij that satisfy the necessary properties and thereby make it a Riemannian manifold. But this is not usually how we construct Riemannian manifolds. It is far more common to ‘inherit’ a metric from a bigger Riemannian manifold. This is how we got a metric on the helicoid. In general, we use the tangent map to move vectors on one manifold into the tangent space of another.

Definition 3.4. Let $M$ be a manifold, $N$ a Riemannian manifold with metric $g$. Let $f : M \to N$ be an immersion. That means that $T_pf$ is injective at every point. Then we define a metric $f^*g$ on $M$, called the pullback metric or the induced metric, by

\[ f^*g(v,w) := g\left( T_pf(v), T_pf(w) \right) \]

for any $v, w \in T_pM$.

Exercise 3.5. The formula for $f^*g$ is well-defined for all smooth functions $f : M \to N$, so why is it necessary that $f$ is an immersion?

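Let’s go through how the definitions of Section 1.4 fit with the definitions in this section. First we have the definition of a regular parameterised surface $\Phi : U \to \mathbb{R}^3$, Definition 1.20. $\Phi$ is a function between euclidean spaces, so the tangent map is just the Jacobian $T_p\Phi = J_p\Phi$. The condition that the Jacobian is rank two is equivalent to it being injective by the rank-nullity theorem of linear algebra. Therefore regular and immersed are equivalent.

The first fundamental form is exactly the standard metric on $\mathbb{R}^3$ pulled back by $\Phi$. In the coordinate basis vectors, we have

\[ \begin{aligned} g_{ij} &= \Phi^* g_{\mathbb{R}^3}\left( \frac{\partial}{\partial x^i}, \frac{\partial}{\partial x^j} \right) = g_{\mathbb{R}^3}\left( (J_p\Phi)_i^k \frac{\partial}{\partial x^k}, (J_p\Phi)_j^l \frac{\partial}{\partial x^l} \right) = (J_p\Phi)_i^k (J_p\Phi)_j^l \, g_{\mathbb{R}^3}\left( \frac{\partial}{\partial x^k}, \frac{\partial}{\partial x^l} \right) \\ &= (J_p\Phi)_i^k (J_p\Phi)_j^l \, \delta_{kl} = \frac{\partial \Phi^k}{\partial x^i}\frac{\partial \Phi^l}{\partial x^j}\delta_{kl} = \frac{\partial \Phi}{\partial x^i} \cdot \frac{\partial \Phi}{\partial x^j}, \end{aligned} \]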

which is the definition of the first fundamental form.
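The pullback computation can be imitated numerically by approximating the columns of the Jacobian with central differences and taking dot products. The sketch below assumes the helicoid parameterisation $\Phi(u,v) = (u\cos v, u\sin v, bv)$; this formula is not restated in this section, so treat it as our assumption (it does reproduce the coefficients of Example 3.3). The function names are hypothetical.

```python
import math

def jacobian_columns(phi, u, v, h=1e-6):
    """Central-difference approximations of dPhi/du and dPhi/dv."""
    p_u = [(a - b) / (2 * h) for a, b in zip(phi(u + h, v), phi(u - h, v))]
    p_v = [(a - b) / (2 * h) for a, b in zip(phi(u, v + h), phi(u, v - h))]
    return p_u, p_v

def pullback_metric(phi, u, v):
    """g_ij = (dPhi/dx^i) . (dPhi/dx^j), the standard metric pulled back by phi."""
    cols = jacobian_columns(phi, u, v)
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

def helicoid(u, v, b=1.0):
    """Assumed parameterisation of the helicoid with pitch parameter b."""
    return (u * math.cos(v), u * math.sin(v), b * v)

g = pullback_metric(helicoid, 1.5, 0.3)
print(g)   # close to [[1, 0], [0, u^2 + b^2]] with u = 1.5, b = 1
```

Replacing `helicoid` by any other regular parameterisation gives its first fundamental form in the same way.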

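Example 3.6 (Stereographic Projection). What does the induced metric from $\mathbb{R}^2$ look like in stereographic coordinates on $\mathbb{S}^1$? Well, we need to compute the pushforward of the coordinate vector field and take the dot product. The pushforward was already computed for the $U_N$ chart in Example 2.22:

\[ (J_x\phi_N^{-1})\left( \frac{\partial}{\partial x} \right) = \frac{2}{(x^2+1)^2}\begin{pmatrix} -x^2+1 \\ 2x \end{pmatrix}\begin{pmatrix} 1 \end{pmatrix} = \frac{2}{(x^2+1)^2}\begin{pmatrix} -x^2+1 \\ 2x \end{pmatrix}. \]

Therefore

\[ g_{11} = \frac{4}{(x^2+1)^4}\left[ (-x^2+1)^2 + (2x)^2 \right] = \frac{4}{(x^2+1)^4}\left[ x^4 - 2x^2 + 1 + 4x^2 \right] = \frac{4}{(x^2+1)^4}\left[ x^2+1 \right]^2 = \frac{4}{(x^2+1)^2}. \]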

The matrix of the metric has only one entry because the dimension of the manifold is one.

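Using this we can calculate the lengths of vectors. For example $\partial_x|_0$ has squared length

\[ \left\| \partial_x|_0 \right\|^2 = \begin{pmatrix} 1 \end{pmatrix}^T \begin{pmatrix} g_{11}(0) \end{pmatrix} \begin{pmatrix} 1 \end{pmatrix} = 4, \]

so its length is 2. This is because we saw in Example 2.22 that it pushes forward to $(2,0)$.

On the other hand $\partial_x|_1$ has squared length

\[ \left\| \partial_x|_1 \right\|^2 = \begin{pmatrix} 1 \end{pmatrix}^T \begin{pmatrix} g_{11}(1) \end{pmatrix} \begin{pmatrix} 1 \end{pmatrix} = 1. \]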

So although the vector field ∂x appears to be constant in the UN chart, its length is in fact changing.
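A quick numerical cross-check of this example is sketched below. It assumes that inverse stereographic projection from the north pole is $x \mapsto \left( \frac{2x}{x^2+1}, \frac{x^2-1}{x^2+1} \right)$; this formula belongs to Chapter 2 and is not restated here, so it is our assumption (its derivative does agree with the pushforward used above). The function names are hypothetical.

```python
def inv_stereo(x):
    """Assumed inverse stereographic projection R -> S^1 (north pole chart)."""
    s = x * x + 1.0
    return (2.0 * x / s, (x * x - 1.0) / s)

def g11_numeric(x, h=1e-6):
    """Squared euclidean length of the central-difference pushforward of d/dx."""
    hi, lo = inv_stereo(x + h), inv_stereo(x - h)
    push = [(a - b) / (2 * h) for a, b in zip(hi, lo)]
    return sum(c * c for c in push)

def g11_formula(x):
    """Closed form of the metric coefficient derived in the example."""
    return 4.0 / (x * x + 1.0) ** 2

for x in (0.0, 1.0, 2.5):
    print(x, g11_numeric(x), g11_formula(x))
```

The two columns printed for each $x$ should agree up to discretisation error, and the values at $x = 0$ and $x = 1$ are the squared lengths computed by hand above.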

Exercise 3.7 (Stereographic Projection). Compute $\tilde g_{ij}$ in the chart $U_S \subset \mathbb{S}^1$ and verify the change of chart formula for the metric.

Exercise 3.8 (Stereographic Projection). For the $U_N$ chart of $\mathbb{S}^2$ verify

\[ g_{11} = g_{22} = \frac{4}{(\|x\|^2+1)^2}, \qquad g_{12} = g_{21} = 0. \]

Finally, consider the notion of isometry in Definition 1.39. It says that two parameterised surfaces are isometric if their parametrisations induce equal metrics. We give the following more general definition.

Definition 3.9. Let $M, N$ be Riemannian manifolds and let $f : M \to N$ be an immersion. We call $f$ a Riemannian immersion if $g_M = f^*g_N$, in words, if the metric on $M$ induced by the immersion is equal to the existing metric on $M$. If additionally $f$ is a diffeomorphism (bijective, smooth, smooth inverse) then we call $f$ an isometry. Two Riemannian manifolds are isometric if there is an isometry between them.

As above, if $M$ is just a manifold and we have an immersion $f : M \to N$ to a Riemannian manifold, then we can endow $M$ with the pullback metric. Then $f$ becomes a Riemannian immersion by definition.

Example 3.10. Suppose that we have a Riemannian immersion $f : M \to \mathbb{R}^3$ and let $R : \mathbb{R}^3 \to \mathbb{R}^3$ be a rotation. Consider $R \circ f : M \to \mathbb{R}^3$; this is also a Riemannian immersion, as we will now prove. The essential step of the calculation is to notice that $TR = R$ because $R$ is a linear transformation, and that $R$, being a rotation, doesn’t change the inner product $g_{\mathbb{R}^3}$. Therefore

\[ \begin{aligned} (R \circ f)^* g_{\mathbb{R}^3}(v,w) &= g_{\mathbb{R}^3}\left( T(R \circ f)v, T(R \circ f)w \right) = g_{\mathbb{R}^3}\left( (TR \circ Tf)v, (TR \circ Tf)w \right) \\ &= g_{\mathbb{R}^3}\left( R(Tf(v)), R(Tf(w)) \right) = g_{\mathbb{R}^3}\left( Tf(v), Tf(w) \right) = f^* g_{\mathbb{R}^3}(v,w) = g_M(v,w). \end{aligned} \]

In the last step we used that $f$ is a Riemannian immersion.

Exercise 3.11. Generalise the above example to prove: the composition of two Riemannian immersions is a Riemannian immersion.

A weaker condition than isometry is that of a conformal map.

Definition 3.12. Let $M, N$ be Riemannian manifolds and let $f : M \to N$ be an immersion. We say that $f$ is conformal if there exists a smooth positive function $\lambda : M \to \mathbb{R}^+$ such that $f^*g_N = \lambda g_M$.

A conformal map does not preserve lengths or distances, but it does preserve angles since

\[ g_N(Tf(X), Tf(Y)) = f^*g_N(X,Y) = \lambda g_M(X,Y) \]

implies

\[ \frac{g_N(Tf(X), Tf(Y))}{\|Tf(X)\|_{g_N} \|Tf(Y)\|_{g_N}} = \frac{\lambda g_M(X,Y)}{\sqrt{\lambda}\|X\|_{g_M} \sqrt{\lambda}\|Y\|_{g_M}} = \frac{g_M(X,Y)}{\|X\|_{g_M} \|Y\|_{g_M}}. \]

Example 3.13 (Stereographic Projection). Consider inverse stereographic projection $\Phi = \phi_N^{-1}$ as a function between $U_N = \mathbb{R}^n$ with the standard metric and the sphere $\mathbb{S}^n$ with the induced metric of $\mathbb{R}^{n+1}$.

For n = 1, and indeed on any one-dimensional manifold, all metrics are conformally equivalent because there is only one metric coefficient g11.

For $n = 2$, Exercise 3.8 shows us that $\Phi$ is not a Riemannian immersion, because the pullback metric $\Phi^* g_{\mathbb{S}^2}$ is not equal to the standard metric $\delta_{ij}$. However, $\Phi$ is conformal because

\[ \Phi^* g_{\mathbb{S}^2} = \frac{4}{(\|x\|^2+1)^2}\,\delta_{ij}. \]

Notice for example, that in stereographic coordinates the lines through the origin are lines of longitude and circles centered at the origin are lines of latitude, and these are always perpendicular to one another.

A calculation similar to the $n = 2$ case shows that stereographic projection is conformal for all $n$. Therefore stereographic charts have the advantage that the angle between vectors as naively calculated in the chart is the same as in $\mathbb{R}^{n+1}$.

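Example 3.14 (Helicoid). We have seen the pullback metric of the helicoid in Example 3.3. It is a metric on $U = \mathbb{R}^2$. On the other hand we could give the plane the standard metric $\delta_{ij}$. With these metrics, the immersion $\Phi$ is not conformal.

We could use a different parameterisation of the helicoid $\tilde\Phi : \mathbb{R}^2 \to \mathbb{R}^3$

\[ \tilde\Phi(u,v) = (\sinh u \cos v, \sinh u \sin v, v). \]

The pushforwards of the coordinate vectors are

\[ \tilde\Phi_* \partial_u = (\cosh u \cos v, \cosh u \sin v, 0), \qquad \tilde\Phi_* \partial_v = (-\sinh u \sin v, \sinh u \cos v, 1). \]

The pullback of the standard metric on $\mathbb{R}^3$ by this map is

\[ \begin{aligned} (\tilde\Phi^* g_{\mathbb{R}^3})_{11} &= g_{\mathbb{R}^3}(\tilde\Phi_*\partial_u, \tilde\Phi_*\partial_u) = \cosh^2 u, \\ (\tilde\Phi^* g_{\mathbb{R}^3})_{12} &= (\tilde\Phi^* g_{\mathbb{R}^3})_{21} = g_{\mathbb{R}^3}(\tilde\Phi_*\partial_u, \tilde\Phi_*\partial_v) = 0, \\ (\tilde\Phi^* g_{\mathbb{R}^3})_{22} &= \sinh^2 u + 1 = \cosh^2 u. \end{aligned} \]

That is to say

\[ (\tilde\Phi^* g_{\mathbb{R}^3})_{ij} = \cosh^2 u \, \delta_{ij} = \cosh^2 u \, (g_{\mathbb{R}^2})_{ij}. \]

Therefore $\tilde\Phi$ is a conformal map between $\mathbb{R}^2$ and $\mathbb{R}^3$ with the standard metrics.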
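To see the angle-preserving property of this example in numbers, the sketch below pushes two chart vectors forward with the tangent map of the conformal helicoid parameterisation and compares the angle in the chart with the angle in $\mathbb{R}^3$. The sample point and the two vectors are arbitrary choices of ours, and the helper names are hypothetical.

```python
import math

def pushforward(u, v, a, b):
    """Image in R^3 of the chart vector a d/du + b d/dv at the point (u, v)."""
    du = (math.cosh(u) * math.cos(v), math.cosh(u) * math.sin(v), 0.0)
    dv = (-math.sinh(u) * math.sin(v), math.sinh(u) * math.cos(v), 1.0)
    return tuple(a * x + b * y for x, y in zip(du, dv))

def dot(p, q):
    return sum(x * y for x, y in zip(p, q))

def angle(p, q):
    return math.acos(dot(p, q) / math.sqrt(dot(p, p) * dot(q, q)))

u, v = 0.8, 1.1                      # arbitrary sample point in the chart
X, Y = (1.0, 0.5), (-0.3, 2.0)       # arbitrary chart vectors
angle_chart = angle(X, Y)            # angle using the standard metric on the plane
angle_space = angle(pushforward(u, v, *X), pushforward(u, v, *Y))
scale = dot(pushforward(u, v, *X), pushforward(u, v, *X)) / dot(X, X)
print(angle_chart, angle_space, scale, math.cosh(u) ** 2)
```

The two angles agree, while squared lengths are scaled by the conformal factor $\cosh^2 u$.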

3.2 Quaternions and 𝕊3

In this section we introduce the quaternions as a means to understand the rotations of the 3-Sphere 𝕊3. The 3-sphere is a beautiful manifold because it is also a group. A manifold that is also a group is called a Lie group. We will not go into the general theory of Lie groups, but they come with a natural way to move vectors around, something we are trying to achieve in this chapter. The example of Lie groups is therefore very instructive for us.

The quaternions are a four dimensional real vector space $\{a_0 + a_1 i + a_2 j + a_3 k\}$. A quaternion has a real part $\operatorname{Re} a = a_0$ and an imaginary part $\operatorname{Im} a = a_1 i + a_2 j + a_3 k$. Unlike for complex numbers, the imaginary part of a quaternion is not real. The quaternionic conjugate is $\bar a = \operatorname{Re} a - \operatorname{Im} a$. Clearly $\operatorname{Re} \bar a = \operatorname{Re} a$ and $\operatorname{Im} \bar a = -\operatorname{Im} a$. Elements of the subspace $\{a_1 i + a_2 j + a_3 k\}$ are called imaginary.

Famously the quaternions have an associative but non-commutative multiplication, defined by $i^2 = j^2 = k^2 = ijk = -1$ and $1$ is the identity. We also use the notation $e = 1$ to aid clarity. For example $ij = k$ because we multiply $ijk = -1$ on the right by $k$ to get $ijk^2 = -k$ and use $k^2 = -1$. On the other hand $ji = -k$: from $ijk = -1$ we get $1 = kji$ and now multiply on the left by $k$. This doesn’t mean that every multiplication of quaternions is anti-commuting:

\[ \begin{aligned} (1+i)(1+j) &= 1 + 1j + i1 + ij = 1 + i + j + k, \\ (1+j)(1+i) &= 1 + 1i + j1 + ji = 1 + i + j - k. \end{aligned} \]

According to legend on Monday 16 October 1843, as Hamilton was walking to the Royal Irish Academy, he had the idea that to define a multiplication on $\mathbb{R}^4$ it must be non-commutative, whereupon he carved the above equations into the side of Brougham Bridge. I have been to the bridge but was unable to find the carving, so instead I offer the following simple trick to remember the multiplication rule. Draw $i, j, k$ on a directed circle. Multiplication of two elements gives the third, with a plus sign if they are in the correct direction and a minus sign if they are in the reverse direction. This is of course the same rule as for the cross product in $\mathbb{R}^3$.

A direct computation shows that $a\bar a = \bar a a = a_0^2 + a_1^2 + a_2^2 + a_3^2$ is always real and non-negative. Thus we can define the norm $|a| = \sqrt{a \bar a}$. The norm shows that every non-zero quaternion has a two-sided inverse, namely $a^{-1} = |a|^{-2}\bar a$. Therefore the quaternions are a non-commutative field.
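The multiplication rules are easy to experiment with on a computer. The following sketch stores a quaternion $a_0 + a_1 i + a_2 j + a_3 k$ as the tuple $(a_0, a_1, a_2, a_3)$; the function names are our own choice and the product formula is simply the expansion of the defining relations.

```python
def qmul(a, b):
    """Product of two quaternions stored as tuples (a0, a1, a2, a3)."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
            a0 * b1 + a1 * b0 + a2 * b3 - a3 * b2,
            a0 * b2 - a1 * b3 + a2 * b0 + a3 * b1,
            a0 * b3 + a1 * b2 - a2 * b1 + a3 * b0)

def conj(a):
    """Quaternionic conjugate: Re a - Im a."""
    return (a[0], -a[1], -a[2], -a[3])

def norm2(a):
    """Squared norm a0^2 + a1^2 + a2^2 + a3^2."""
    return sum(x * x for x in a)

def inverse(a):
    """Two-sided inverse |a|^{-2} conj(a) of a non-zero quaternion."""
    n = norm2(a)
    return tuple(x / n for x in conj(a))

e, i, j, k = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
print(qmul(i, j), qmul(j, i))                      # k and -k
print(qmul((1, 1, 0, 0), (1, 0, 1, 0)))            # (1+i)(1+j)
print(qmul((1, 2, 3, 4), inverse((1, 2, 3, 4))))   # approximately e
```

The first two printed lines reproduce the products worked out by hand above.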

Exercise 3.15. Prove the following:

  1. The dot product can be calculated as $a \cdot b = \operatorname{Re}(\bar a b)$,
  2. $a \bar a = \bar a a = a_0^2 + a_1^2 + a_2^2 + a_3^2$,
  3. Conjugation is order reversing: $\overline{ab} = \bar b \bar a$,
  4. The norm is multiplicative: $|ab| = |a||b|$.

This norm is plainly the same as the usual norm on $\mathbb{R}^4$. The unit quaternions (those with norm 1) are as a set $\mathbb{S}^3 \subset \mathbb{R}^4$. Therefore the 3-sphere is a Lie group, because we can multiply two elements of it together in a way that can be undone. This is rather special: the only spheres that are Lie groups are $\mathbb{S}^0$ ($\mathbb{S}^0 = \{\pm 1\}$ in $\mathbb{R}^1$), $\mathbb{S}^1$ (add the angles), and $\mathbb{S}^3$.

If we choose $a \in \mathbb{S}^3$ we can look at the function $L_a : \mathbb{S}^3 \to \mathbb{S}^3$ defined by $L_a(q) = aq$. This is a bijective function, because the inverse is $L_{a^{-1}}$. And $L_a(e) = ae = a$. Therefore the tangent map of $L_a$ takes $T_e\mathbb{S}^3$ to $T_a\mathbb{S}^3$. Moreover, the tangent map is also bijective: from the chain rule

\[ \mathrm{id}_{T_e\mathbb{S}^3} = T(L_a \circ L_{a^{-1}}) = TL_a \circ TL_{a^{-1}}. \]

Indeed, this inverse has the property that it takes $a$ to the identity: $L_{a^{-1}}(a) = a^{-1}a = e$. This gives us a way to move any tangent vector of $\mathbb{S}^3$ to $T_e\mathbb{S}^3$. Just as in Example 2.28, this shows us that $T\mathbb{S}^3$ is trivial. The function $T_aL_{a^{-1}} : T_a\mathbb{S}^3 \to T_e\mathbb{S}^3$ is called the left trivialisation. Likewise we can define $R_a(q) = qa$ and we have the right trivialisation $T_aR_{a^{-1}} : T_a\mathbb{S}^3 \to T_e\mathbb{S}^3$.

Example 3.16 (3-Sphere). Let us compute the trivialisations for the point $a = i = (0,1,0,0)$ in $\mathbb{S}^3$. The inverse of $a$ is $a^{-1} = -i$, since $i(-i) = 1$. If we have any point $q = q_0 + q_1 i + q_2 j + q_3 k$ then

\[ L_{a^{-1}}(q) = (-i)(q_0 + q_1 i + q_2 j + q_3 k) = q_1 - q_0 i + q_3 j - q_2 k. \]

This does indeed have the property that $L_{a^{-1}}(a) = 1 - 0 + 0 - 0 = 1 = e$. Next we use some geometry to avoid using charts. We know that the tangent vectors in $T_a\mathbb{S}^3$ are perpendicular to $a$, because this is a sphere. We write

\[ T_a\mathbb{S}^3 = \{ v_1 e + v_2 j + v_3 k \mid v_1, v_2, v_3 \in \mathbb{R} \}. \]

Because $L_{a^{-1}}(q)$ is linear in $q$, we know

\[ T_aL_{a^{-1}}(v_1 e + v_2 j + v_3 k) = -v_1 i + v_3 j - v_2 k. \]

For the right trivialisation

\[ \begin{aligned} R_{a^{-1}}(q) &= (q_0 + q_1 i + q_2 j + q_3 k)(-i) = q_1 - q_0 i - q_3 j + q_2 k, \\ T_aR_{a^{-1}}(v_1 e + v_2 j + v_3 k) &= -v_1 i - v_3 j + v_2 k. \end{aligned} \]

So these two trivialisations on 𝕊3 are different from one another.
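The two trivialisations at $a = i$ can be compared directly with a few lines of code. The sketch below again stores quaternions as 4-tuples; the function names and the sample tangent vector are our own choices.

```python
def qmul(a, b):
    """Product of two quaternions stored as tuples (a0, a1, a2, a3)."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
            a0 * b1 + a1 * b0 + a2 * b3 - a3 * b2,
            a0 * b2 - a1 * b3 + a2 * b0 + a3 * b1,
            a0 * b3 + a1 * b2 - a2 * b1 + a3 * b0)

a_inv = (0, -1, 0, 0)            # a = i, so the inverse is -i

def left_trivialise(v):
    """T_a L_{a^{-1}} v = a^{-1} v, using that L_{a^{-1}} is linear."""
    return qmul(a_inv, v)

def right_trivialise(v):
    """T_a R_{a^{-1}} v = v a^{-1}, using that R_{a^{-1}} is linear."""
    return qmul(v, a_inv)

v = (1, 0, 2, 3)                 # sample tangent vector v1 e + v2 j + v3 k at a = i
print(left_trivialise(v))        # -v1 i + v3 j - v2 k
print(right_trivialise(v))       # -v1 i - v3 j + v2 k
```

Both results are imaginary quaternions, that is, elements of $T_e\mathbb{S}^3$, but they differ in the signs of the $j$ and $k$ components.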

Example 3.17. We can generalise the previous example to work for any point $a \in \mathbb{S}^3$. Just like $i$ is a right-angle rotation of the complex plane, $i, j, k$ are all right-angle rotations of the quaternions. Therefore $ai, aj, ak$ is an orthonormal basis of $T_a\mathbb{S}^3$. Alternatively, since

\[ L_a(q) = a(q_0 + q_1 i + q_2 j + q_3 k) = q_0 a + q_1 ai + q_2 aj + q_3 ak \]

and $i, j, k$ is a basis for $T_e\mathbb{S}^3$ we know that

\[ T_eL_a(v_1 i + v_2 j + v_3 k) = v_1 ai + v_2 aj + v_3 ak \]

is all of $T_a\mathbb{S}^3$. This shows us that identifying $T_a\mathbb{S}^3$ with $T_e\mathbb{S}^3$ is the same as writing it with respect to the pushforward of a basis. If $v \in T_a\mathbb{S}^3$ then we get

\[ T_aL_{a^{-1}} v = a^{-1}v. \]

We call a vector field $X$ on $\mathbb{S}^3$ a left-invariant field when it has the form

\[ X|_a = av \]

for $v \in T_e\mathbb{S}^3$, because every vector $X|_a$ corresponds to $v$ using the left trivialisation. Likewise we have the right-invariant vector fields

\[ Y|_a = va. \]

Example 3.18 (3-Sphere).

\[ X|_a = ai = -a_1 e + a_0 i + a_3 j - a_2 k \]

is a left-invariant vector field on $\mathbb{S}^3$. We recognise $X|_i = -e$ as a vector in $T_i\mathbb{S}^3$.

3.3 Covariant Derivatives

We have seen numerous examples thus far of how we cannot simply move vectors around in a chart like we can in euclidean space. If you take a tangent vector at one point of the sphere and translate it in $\mathbb{R}^3$ to another point of the sphere, it may not be tangent anymore. As we observed below Example 2.19, a vector might have the same coordinates at different points in one chart, but not in another. And in Example 3.3 we saw that one coordinate basis vector changed its length as you moved around, while the other stayed the same length.

There is also a common thought experiment. Suppose that you are standing on the equator facing east. You walk forward without turning, until you have walked half way around the Earth. Then, still without turning, you begin to sidestep to the north. You sidestep all the way to the north pole, but keep going until you have returned to your original position. The remarkable fact is, even though at no stage did you turn, you are now facing west.

Exercise 3.19. Can you modify the journey so that you end up facing other directions? What is the connection between the area your journey encompasses and the final rotation angle?

However, the naive definition of the derivative of a vector field

\[ \lim_{h \to 0} \frac{1}{h}\left( X|_{p+h} - X|_p \right) \]

asks us to subtract two vectors at different points. Indeed, any non-trivial definition of a derivative of a vector field is going to require us to compare vectors at different points. Geometrically, thinking about a surface, what we want to do is to ‘roll’ the tangent plane along the surface to another point. This idea is called development and the relation between two tangent planes was called an affine connection, because it was an affine transformation of one plane to another. In modern terminology it is more common to call this a parallel transport operator, for reasons that will be explained in Section 3.4. Already from the above thought experiment we see that a parallel transport operator will depend not just on the two start and end points, but on the path between those points.

The modern approach, which we will ultimately take, uses a different point of view. It asks: how much are vector fields changing? Once we have a basis of vector fields and we know their changes, then we can measure all other vector fields against them. This leads to the definition of a covariant derivative, a type of differential operator on vector fields. It is extremely common to call this a connection, but we will refrain from doing so, at least until we have made clear the relationship with the parallel transport operator. Though the two approaches are equivalent, the modern approach is the much easier place to begin. On the other hand, some of the definitions and motivations for the modern approach only really make sense from the point of view of the traditional approach.

Definition 3.20. A covariant derivative on a manifold $M$ is a function that acts on two vector fields to produce a third. We write it as $\nabla_X Y$, with $X$ being the ‘direction’. It has the following properties for all smooth functions $f : M \to \mathbb{R}$ and vector fields $X, \tilde X, Y, \tilde Y$:

  1. It is $C^\infty$-linear in the direction:

    \[ \nabla_{fX + \tilde X} Y = f\nabla_X Y + \nabla_{\tilde X} Y. \]
  2. It is additive in the derivative:

    \[ \nabla_X (Y + \tilde Y) = \nabla_X Y + \nabla_X \tilde Y. \]
  3. It obeys the product (Leibniz) rule:

    \[ \nabla_X (fY) = X(f)Y + f\nabla_X Y. \]

Example 3.21 (Euclidean Space). Consider euclidean space $\mathbb{R}^n$ and let $X = X^i\partial_i$, $Y = Y^i\partial_i$ be vector fields in the chart $x^i$. Then

\[ \nabla^{euc}_X Y := X^i \frac{\partial Y^j}{\partial x^i}\,\partial_j \]

is a covariant derivative.

You might be confused, because in Example 2.33 we said this formula didn’t work. Indeed, this formula is not chart independent. This definition is saying explicitly “use these particular coordinates to do the derivative and not others”. If you write this covariant derivative in polar coordinates, then the formula for this covariant derivative will look different. But this is why we say that it is a covariant derivative; we are not claiming uniqueness.

Exercise 3.22. Check the above example has the three properties that are required of a covariant derivative.

The above example suggests that there are many covariant derivatives on a manifold. At least for a manifold that can be covered by a single chart, every set of coordinates gives a covariant derivative. In the following theorem we characterise the set of covariant derivatives.

Theorem 3.23 (Tensorial). Let $\nabla^0, \nabla^1$ be two covariant derivatives. Define their difference $A(X,Y) := \nabla^0_X Y - \nabla^1_X Y$. Then $A$ is $C^\infty$-linear in both $X$ and $Y$.

Proof. $C^\infty$-linearity in $X$ is immediate from Property 1 of covariant derivatives. $C^\infty$-linearity in $Y$ is not much harder to show; we use Properties 2 and 3:

\[ \begin{aligned} A(X, fY + \tilde Y) &= \nabla^0_X(fY) + \nabla^0_X \tilde Y - \nabla^1_X(fY) - \nabla^1_X \tilde Y \\ &= X(f)Y + f\nabla^0_X Y - X(f)Y - f\nabla^1_X Y + A(X, \tilde Y) = fA(X,Y) + A(X, \tilde Y). \end{aligned} \]

Exercise 3.24. Prove the converse of Theorem 3.23: Let $\nabla$ be a covariant derivative on $M$. For all vector fields $X, Y$ let $A(X,Y)$ be a smooth vector field. Suppose that this function $A$ is $C^\infty$-linear in both $X, Y$. Then $\tilde\nabla := \nabla + A$ is also a covariant derivative.

Corollary 3.25 (Affineness). The space of covariant derivatives on $M$ is affine in the following sense: if $t$ is a constant and $\nabla^0, \nabla^1$ are two covariant derivatives, so is $\nabla^t := (1-t)\nabla^0 + t\nabla^1$.

Proof. Observe that $\nabla^t = \nabla^0 + t(\nabla^1 - \nabla^0)$. The corollary now follows from Theorem 3.23 and its converse Exercise 3.24. □

The above theorems give us a way to construct new covariant derivatives from existing ones (and in fact construct every covariant derivative). But we need one to start with. One can prove1 that every manifold has a covariant derivative, but the proof is technical and not practically useful. We have seen in Example 3.21 that if one chart covers the whole space, then we can declare it is special and use the directional derivative. For manifolds that are a submanifold of a bigger space, the following example is typical.

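Example 3.26 (Stereographic Projection, Tangent Connection). Consider the sphere $\mathbb{S}^1$ inside $\mathbb{R}^2$. We can understand any vector field $Y$ on $\mathbb{S}^1$ as a function $\tilde Y : \mathbb{S}^1 \to \mathbb{R}^2$ using the pushforward. Therefore we can differentiate $\tilde Y$ as an $\mathbb{R}^2$ valued function in the usual way.

For the sake of a numerical example, let us take both $X$ and $Y$ to be the vector field from Example 2.19. The pushforward of the vector field is

\[ X = \begin{cases} \dfrac{-2x^2+2}{(x^2+1)^2}\dfrac{\partial}{\partial p^1} + \dfrac{4x}{(x^2+1)^2}\dfrac{\partial}{\partial p^2} & \text{for } x \in U_N, \\ 0 & \text{for } p = N, \end{cases} \]

and interpreting this as a function to $\mathbb{R}^2$ we have (superscripts on $p$ are indices, not powers)

\[ \tilde Y = \begin{cases} \left( \dfrac{-2x^2+2}{(x^2+1)^2}, \dfrac{4x}{(x^2+1)^2} \right) & \text{for } x \in U_N, \\ 0 & \text{for } p = N \end{cases} = \begin{cases} \dfrac{2}{x^2+1}(-p^2, p^1) & \text{for } p \neq N, \\ 0 & \text{for } p = N. \end{cases} \]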

If we differentiate $\tilde Y$ along $X$, then using the product rule to avoid some nasty but unimportant terms we get

\[ X(\tilde Y) = X\left( \frac{2}{x^2+1} \right)(-p^2, p^1) + \frac{2}{x^2+1}\left( -\frac{4x}{(x^2+1)^2}, \frac{-2x^2+2}{(x^2+1)^2} \right) = X\left( \frac{2}{x^2+1} \right)(-p^2, p^1) - \frac{4}{(x^2+1)^2}(p^1, p^2). \]

The first term is tangent to the circle, but the second is not. So we see the trouble is that the directional derivative $X(\tilde Y) = X^i \frac{\partial Y^j}{\partial p^i}\frac{\partial}{\partial p^j}$ is no longer tangent to $\mathbb{S}^1$. Therefore this does not meet the definition of a covariant derivative on $\mathbb{S}^1$.

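What we can do however is to project this directional derivative onto the tangent space. We define the tangent covariant derivative as

\[ \nabla_X Y = \operatorname{proj}_{T_p\mathbb{S}^1} X^i \frac{\partial Y^j}{\partial p^i}\frac{\partial}{\partial p^j}. \]

Let’s check the three required properties. The two linearity properties just follow from the linearity of the projection

\[ \begin{aligned} \nabla_{fX + \tilde X} Y &= \operatorname{proj}_{T_p\mathbb{S}^1}\left( fX^i \frac{\partial Y^j}{\partial p^i}\frac{\partial}{\partial p^j} + \tilde X^i \frac{\partial Y^j}{\partial p^i}\frac{\partial}{\partial p^j} \right) \\ &= f\operatorname{proj}_{T_p\mathbb{S}^1} X^i \frac{\partial Y^j}{\partial p^i}\frac{\partial}{\partial p^j} + \operatorname{proj}_{T_p\mathbb{S}^1} \tilde X^i \frac{\partial Y^j}{\partial p^i}\frac{\partial}{\partial p^j} = f\nabla_X Y + \nabla_{\tilde X} Y, \\ \nabla_X(Y + \tilde Y) &= \operatorname{proj}_{T_p\mathbb{S}^1}\left( X^i \frac{\partial Y^j}{\partial p^i}\frac{\partial}{\partial p^j} + X^i \frac{\partial \tilde Y^j}{\partial p^i}\frac{\partial}{\partial p^j} \right) = \nabla_X Y + \nabla_X \tilde Y. \end{aligned} \]

For the third property, we need to recognise that $X(f)Y$ is already tangent to $\mathbb{S}^1$, so the projection leaves it unaltered:

\[ \nabla_X(fY) = \operatorname{proj}_{T_p\mathbb{S}^1} X^i \frac{\partial (fY^j)}{\partial p^i}\frac{\partial}{\partial p^j} = \operatorname{proj}_{T_p\mathbb{S}^1}\left( X(f)Y + fX^i \frac{\partial Y^j}{\partial p^i}\frac{\partial}{\partial p^j} \right) = X(f)Y + f\nabla_X Y. \]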

Nothing in the calculation depended on 𝕊1 specifically, so this is a general construction for immersed submanifolds.

Next we examine what type of derivative a covariant derivative is. We will show that it is a directional derivative, in a sense that will be developed. To this end, the first property to notice is that although the direction and the derived vector fields have dramatically different behaviour under scaling by a smooth function, they are both $\mathbb{R}$-linear. If $a$ is a constant then

\[ \nabla_{aX} Y = a\nabla_X Y, \qquad \nabla_X(aY) = X(a)Y + a\nabla_X Y = a\nabla_X Y. \]

Consequently, if either field is zero, then so is the covariant derivative. Moreover, using cutoff functions, the covariant derivative only depends on local information.2 In fact something stronger is true of X:

Lemma 3.27 (Directional Derivative). The value of $\nabla_X Y$ at $p \in M$ only depends on $X|_p$ and not other values of $X$.

Proof. By linearity, it suffices to prove that $X|_p = 0$ implies $(\nabla_X Y)|_p = 0$. Writing $X$ in a chart we have $X = X^i\partial_i$ and $X^i(p) = 0$ for all the coefficients. Then

\[ (\nabla_{X^i\partial_i} Y)|_p = \left( X^i \nabla_{\partial_i} Y \right)|_p = X^i(p)\left( \nabla_{\partial_i} Y \right)|_p = 0. \]

For this reason we sometimes speak of the covariant derivative $\nabla_v Y$ in a direction $v \in T_pM$. The same is not true for $Y$: the covariant derivative really is a derivative of $Y$ and depends on its values in a neighbourhood of a point. However, to compute $\nabla_v Y$ you don’t need to know $Y$ completely on an open neighbourhood of $p$; it is enough to know $Y$ on a curve whose tangent is $v$.

Lemma 3.28 (Curve Derivative). Let $Y, \tilde Y$ be two vector fields and let $\alpha : (a,b) \to M$ be a smooth curve with $\alpha(0) = p$ and $\alpha'(0) = v$. Suppose that $Y \circ \alpha = \tilde Y \circ \alpha$. Then $\nabla_v Y = \nabla_v \tilde Y$.

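Proof. Let us consider the situation in a chart, writing $v = v^i\partial_i|_p$, $Y = Y^i\partial_i$ and $\tilde Y = \tilde Y^i\partial_i$. Then by the properties of covariant derivatives,

\[ \nabla_v Y = \nabla_{v^i\partial_i|_p}(Y^j\partial_j) = v^i\nabla_{\partial_i|_p}(Y^j\partial_j) = v^i\frac{\partial Y^j}{\partial x^i}\Big|_p\partial_j + v^iY^j(p)\nabla_{\partial_i|_p}\partial_j, \]

and likewise for $\tilde Y$. Now, $Y$ and $\tilde Y$ agree on $\alpha$, so $Y(p) = \tilde Y(p)$. Moreover, by the chain rule

\[ v^i\frac{\partial Y^j}{\partial x^i}\Big|_p = \frac{d}{dt}(Y^j \circ \alpha)\Big|_{t=0} = \frac{d}{dt}(\tilde Y^j \circ \alpha)\Big|_{t=0} = v^i\frac{\partial \tilde Y^j}{\partial x^i}\Big|_p. \]

Hence

\[ \nabla_v Y = v^i\frac{\partial Y^j}{\partial x^i}\Big|_p\partial_j + v^iY^j(p)\nabla_{\partial_i|_p}\partial_j = v^i\frac{\partial \tilde Y^j}{\partial x^i}\Big|_p\partial_j + v^i\tilde Y^j(p)\nabla_{\partial_i|_p}\partial_j = \nabla_v \tilde Y. \]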

This lemma tells us that we can really view the covariant derivative as a generalisation of a directional derivative. This is in contrast to other derivatives of vector fields. Recall Example 2.37. Now consider the vector fields from that example along the curve α(t) = (t,0), the x-axis. We have X α = 1, Y α = 2, and V α = 1. But [X,Y ] = 0 while [V,Y ] = 1. This shows that the Lie bracket is not a covariant derivative.

To break up all this theory, let’s do another example.

Example 3.29 (3-Sphere). We define a covariant derivative $\nabla^L$ on $\mathbb{S}^3$ in the following way. Given any vector field $Y$ on $\mathbb{S}^3$, use left trivialisation to write it as a function $\tilde Y : \mathbb{S}^3 \to T_e\mathbb{S}^3$. From Example 3.17 we know this has the formula $p \mapsto p^{-1}Y|_p$ using quaternions. Now that we have a function to the same vector space, there is no problem differentiating. This gives us a function $X(\tilde Y) : \mathbb{S}^3 \to T_e\mathbb{S}^3$. Use the left trivialisation again to move the result back to $T_p\mathbb{S}^3$.

Putting this all in one formula gives

\[ (\nabla^L_X Y)|_p := \left( T_eL_p \circ X \circ T_pL_{p^{-1}} \right) Y. \]

This covariant derivative has the property that the derivative of a left-invariant vector field is always zero. This is because, by definition, after you bring its vectors to $e$ they are all the same. In other words $\tilde Y$ is constant and thus has zero derivative.

So to see an interesting example, we need to use a non-left-invariant vector field. Consider $Y|_p = ip$. We know that $\tilde Y(p) = p^{-1}ip$. To proceed we need to choose a direction field $X$. We know that the value of the covariant derivative at any point only depends on the value of $X$ at that point. So for simplicity let us calculate for the point $i$ in the direction $j = \frac{\partial}{\partial p_2}$:

\[ \begin{aligned} X|_i \tilde Y &= \frac{\partial}{\partial p_2}\left( p^{-1}ip \right)\Big|_i = \left( -p^{-1}\frac{\partial p}{\partial p_2}p^{-1}ip + p^{-1}i\frac{\partial p}{\partial p_2} \right)\Big|_i = \left( -p^{-1}jp^{-1}ip + p^{-1}ij \right)\Big|_i \\ &= -i^{-1}ji^{-1}ii + i^{-1}ij = j + j = 2j. \end{aligned} \]

Finally, we move this back to $T_i\mathbb{S}^3$:

\[ (\nabla^L_X Y)|_i = T_eL_i(2j) = i \cdot 2j = 2k. \]
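The value $2j$, and hence $2k$, can be sanity-checked with a central difference. The sketch below differentiates $p \mapsto p^{-1}ip$ at $p = i$ in the direction $j$, evaluating the formula at the nearby quaternions $i \pm hj$, which are not exactly of unit length; this is the same extension to the ambient space that the partial derivative above uses. Function names are ours.

```python
def qmul(a, b):
    """Product of two quaternions stored as tuples (a0, a1, a2, a3)."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
            a0 * b1 + a1 * b0 + a2 * b3 - a3 * b2,
            a0 * b2 - a1 * b3 + a2 * b0 + a3 * b1,
            a0 * b3 + a1 * b2 - a2 * b1 + a3 * b0)

def inverse(a):
    """Two-sided inverse of a non-zero quaternion."""
    n = sum(x * x for x in a)
    return (a[0] / n, -a[1] / n, -a[2] / n, -a[3] / n)

I = (0.0, 1.0, 0.0, 0.0)

def Y_tilde(p):
    """Left trivialisation p^{-1} (i p) of the vector field Y|_p = i p."""
    return qmul(qmul(inverse(p), I), p)

h = 1e-5
plus = Y_tilde((0.0, 1.0, h, 0.0))     # evaluate at i + h j
minus = Y_tilde((0.0, 1.0, -h, 0.0))   # evaluate at i - h j
derivative = tuple((a - b) / (2 * h) for a, b in zip(plus, minus))
moved_back = qmul(I, derivative)       # apply T_e L_i, i.e. multiply by i on the left
print(derivative)                      # approximately 2j
print(moved_back)                      # approximately 2k
```

Replacing `Y_tilde` by the trivialisation of a left-invariant field $Y|_p = pv$ gives a constant function and therefore a zero derivative, in line with the remark above.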

In the same manner, we can define a covariant derivative $\nabla^R$ using the right trivialisation.

In the examples above, to define a covariant derivative we really gave a directional derivative. But what is the minimal information required to specify a covariant derivative? Because covariant derivatives are local, we give the answer in a chart. Let $\partial_i$ be the coordinate vector fields. Then for each pair $i, j$ we have a vector field $\nabla_{\partial_i}\partial_j$. This vector field must be able to be written

\[ \nabla_{\partial_i}\partial_j = \Gamma_{ij}^k\,\partial_k, \]

for some coefficients $\Gamma_{ij}^k$. These coefficients are called Christoffel coefficients, though be aware that some authors reserve this name for a special case. This is sufficient information to determine $\nabla$ because

\[ \nabla_X Y = X^i\nabla_{\partial_i}(Y^j\partial_j) = X^i\frac{\partial Y^j}{\partial x^i}\partial_j + X^iY^j\nabla_{\partial_i}\partial_j = \left( X^i\frac{\partial Y^k}{\partial x^i} + X^iY^j\Gamma_{ij}^k \right)\partial_k. \]

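Example 3.30 (Polar Coordinates). Let us consider $\mathbb{R}^2$ with $\nabla^{euc}$. We see by comparison of its definition in Example 3.21 with the formula above that $\Gamma_{ij}^k$ is zero for all points and all indices in the standard chart.

But let us compute it with respect to polar coordinates. By the definition of $\nabla^{euc}$, we have to calculate in the $x^1, x^2$ coordinates. We have

\[ \begin{aligned} \partial_r &= \cos\theta\frac{\partial}{\partial x^1} + \sin\theta\frac{\partial}{\partial x^2} = \frac{x^1}{\sqrt{(x^1)^2+(x^2)^2}}\frac{\partial}{\partial x^1} + \frac{x^2}{\sqrt{(x^1)^2+(x^2)^2}}\frac{\partial}{\partial x^2}, \\ \partial_\theta &= -r\sin\theta\frac{\partial}{\partial x^1} + r\cos\theta\frac{\partial}{\partial x^2} = -x^2\frac{\partial}{\partial x^1} + x^1\frac{\partial}{\partial x^2}. \end{aligned} \]

Hence we can calculate

\[ \begin{aligned} \nabla^{euc}_{\partial_r}\partial_\theta &= \frac{x^1}{\sqrt{(x^1)^2+(x^2)^2}}\nabla^{euc}_{\partial_{x^1}}\partial_\theta + \frac{x^2}{\sqrt{(x^1)^2+(x^2)^2}}\nabla^{euc}_{\partial_{x^2}}\partial_\theta \\ &= \frac{x^1}{\sqrt{(x^1)^2+(x^2)^2}}\left( \frac{\partial(-x^2)}{\partial x^1}\frac{\partial}{\partial x^1} + \frac{\partial x^1}{\partial x^1}\frac{\partial}{\partial x^2} \right) + \frac{x^2}{\sqrt{(x^1)^2+(x^2)^2}}\left( \frac{\partial(-x^2)}{\partial x^2}\frac{\partial}{\partial x^1} + \frac{\partial x^1}{\partial x^2}\frac{\partial}{\partial x^2} \right) \\ &= \frac{x^1}{\sqrt{(x^1)^2+(x^2)^2}}\frac{\partial}{\partial x^2} - \frac{x^2}{\sqrt{(x^1)^2+(x^2)^2}}\frac{\partial}{\partial x^1} = -\sin\theta\frac{\partial}{\partial x^1} + \cos\theta\frac{\partial}{\partial x^2} = \frac{1}{r}\partial_\theta, \end{aligned} \]

and hence in polar coordinates

\[ \Gamma_{r\theta}^r = 0, \qquad \Gamma_{r\theta}^\theta = \frac{1}{r}. \]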

The other six coefficients are calculated similarly.
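All eight coefficients can be generated mechanically. The sketch below uses that, for $\nabla^{euc}$, the chain rule gives $\nabla^{euc}_{\partial_a}\partial_b = \frac{\partial}{\partial y^a}\left( \frac{\partial x^k}{\partial y^b} \right)\frac{\partial}{\partial x^k}$ with $y = (r, \theta)$, and then expresses the result in the basis $\partial_r, \partial_\theta$ by applying the inverse Jacobian. The function name and the index layout `gamma[a][b][c]` for $\Gamma_{ab}^c$ are our own conventions.

```python
import math

def christoffel_polar(r, theta):
    """Return gamma[a][b][c] = Gamma_{ab}^c of the euclidean connection in polar coordinates."""
    c, s = math.cos(theta), math.sin(theta)
    # dJ[a][k][b] = d/dy^a of J[k][b], where J[k][b] = dx^k/dy^b and y = (r, theta).
    dJ = [
        [[0.0, -s], [0.0, c]],          # derivative with respect to r
        [[-s, -r * c], [c, -r * s]],    # derivative with respect to theta
    ]
    # Inverse Jacobian: converts x^k components into (r, theta) components.
    J_inv = [[c, s], [-s / r, c / r]]
    return [[[sum(J_inv[cc][k] * dJ[a][k][b] for k in range(2))
              for cc in range(2)] for b in range(2)] for a in range(2)]

gamma = christoffel_polar(2.0, 0.3)
print(gamma[0][1])   # (Gamma_{r theta}^r, Gamma_{r theta}^theta)
print(gamma[1][1])   # (Gamma_{theta theta}^r, Gamma_{theta theta}^theta)
```

With index 0 standing for $r$ and index 1 for $\theta$, the entry `gamma[0][1]` reproduces the pair computed by hand in the example.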

Example 3.31 (Tangent Connection). Let’s calculate the Christoffel coefficients for a submanifold $f : M \to \mathbb{R}^n$ with the connection from Example 3.26 in some chart $U$. Let $\Phi = f \circ \phi^{-1}$ be a parameterisation, a map from a chart $U$ to $\mathbb{R}^n$. Because the definition of $\nabla$ uses the geometry of $\mathbb{R}^n$ we need the pushforwards of the coordinate basis vectors. We use the notation $E_i = T_x(f)\frac{\partial}{\partial x^i} = J_x(\Phi)_i^j\frac{\partial}{\partial p^j}$. From the directional derivative definition of the pushforward map

\[ E_i(Y^j) = \left( T_x(f)\frac{\partial}{\partial x^i} \right)(Y^j) = \frac{\partial}{\partial x^i}(Y^j \circ \Phi). \]

Therefore the covariant derivative is

\[ \nabla_{\partial_i}\partial_j = \operatorname{proj}_{T_pM}\frac{\partial}{\partial x^i}(E_j^k \circ \Phi)\frac{\partial}{\partial p^k}. \]

Finally to give the Christoffel coefficients, we write this vector in the coordinate basis $E_i$. This requires solving a linear algebra problem.

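Example 3.32 (Stereographic Projection). Let’s calculate the Christoffel coefficients for $\mathbb{S}^1$ with the connection from Example 3.26 in the chart $U_N$. This is a special case of the previous example. The immersion $f$ is the identity map, so the parameterisation is $\Phi = \phi_N^{-1}$. There is only one coordinate vector field

\[ E \circ \Phi = -2\frac{x^2-1}{(x^2+1)^2}\frac{\partial}{\partial p^1} + \frac{4x}{(x^2+1)^2}\frac{\partial}{\partial p^2} = \frac{2}{(x^2+1)^2}\left( (1-x^2)\frac{\partial}{\partial p^1} + 2x\frac{\partial}{\partial p^2} \right). \]

The composition with $\Phi$ is simply saying that we should express the coefficients in the variables of the chart. We prepare some calculations

\[ \begin{aligned} \partial_x(E^1 \circ \Phi) &= \frac{-8x}{(x^2+1)^3}(1-x^2) + \frac{2}{(x^2+1)^2}(-2x), \\ \partial_x(E^2 \circ \Phi) &= \frac{-8x}{(x^2+1)^3}(2x) + \frac{2}{(x^2+1)^2}(2). \end{aligned} \]

You can do the orthogonal projection in the standard linear algebra way, but because this is the plane it’s easy to write down a vector perpendicular to $E$. This leads to

\[ \begin{aligned} \frac{\partial}{\partial x}(E^k \circ \Phi)\frac{\partial}{\partial p^k} &= -\frac{4x}{x^2+1}E + \frac{2}{(x^2+1)^2}\left( -2x\frac{\partial}{\partial p^1} + 2\frac{\partial}{\partial p^2} \right) \\ &= -\frac{4x}{x^2+1}E + \frac{4}{(x^2+1)^3}\left[ x\left( (1-x^2)\frac{\partial}{\partial p^1} + 2x\frac{\partial}{\partial p^2} \right) + \left( -2x\frac{\partial}{\partial p^1} + (1-x^2)\frac{\partial}{\partial p^2} \right) \right] \\ &= -\frac{2x}{x^2+1}E + \frac{4}{(x^2+1)^3}\left( -2x\frac{\partial}{\partial p^1} + (1-x^2)\frac{\partial}{\partial p^2} \right). \end{aligned} \]

Hence

\[ \nabla_{\partial_1}\partial_1 = \operatorname{proj}_{T_p\mathbb{S}^1}\frac{\partial}{\partial x}(E^k \circ \Phi)\frac{\partial}{\partial p^k} = -\frac{2x}{x^2+1}E \quad\Longrightarrow\quad \Gamma_{11}^1 = -\frac{2x}{x^2+1}. \]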
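The projection step of this example can be checked numerically: differentiate the pushed-forward coordinate vector field $E$ with a central difference, project the result onto the line spanned by $E$, and read off the coefficient. In the sketch below the components of $E$ are taken from the example, while the function names and the sample points are our own.

```python
def E(x):
    """Components in R^2 of the pushforward of d/dx in the north pole chart."""
    c = 2.0 / (x * x + 1.0) ** 2
    return (c * (1.0 - x * x), c * 2.0 * x)

def gamma_numeric(x, h=1e-6):
    """Coefficient of E in the tangential projection of dE/dx."""
    dE = [(a - b) / (2 * h) for a, b in zip(E(x + h), E(x - h))]
    e = E(x)
    return (dE[0] * e[0] + dE[1] * e[1]) / (e[0] ** 2 + e[1] ** 2)

def gamma_formula(x):
    """Closed form of the Christoffel coefficient found in the example."""
    return -2.0 * x / (x * x + 1.0)

for x in (-1.0, 0.0, 0.5, 3.0):
    print(x, gamma_numeric(x), gamma_formula(x))
```

Because the tangent space is one dimensional, projecting onto it amounts to taking the component along $E$, which is what `gamma_numeric` computes.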

Exercise 3.33 (Lee Lemma 4.4). Suppose that $M$ is a manifold covered by a single chart $U$. Show that the set of covariant derivatives on $M$ is in one-to-one correspondence with the set of Christoffel coefficients. That is, show that every choice of $n^3$ functions $\Gamma_{ij}^k$ gives a covariant derivative.

Exercise 3.34. Derive the transformation formula for $\Gamma_{ij}^k$ between two charts. Observe that it is neither covariant nor contravariant.

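Exercise 3.35 (Stereographic Projection). Repeat the calculation of the Christoffel coefficients from Example 3.32 for $\mathbb{S}^2$ in the chart $U_N$. The following formulas may prove useful. Here we have the pushforwards of the coordinate vector fields and combinations that align with longitude and latitude:

\[ \begin{aligned} E_1 &= \frac{2}{(\|x\|^2+1)^2}\begin{pmatrix} -(x^1)^2 + (x^2)^2 + 1 \\ -2x^1x^2 \\ 2x^1 \end{pmatrix}, & x^1E_1 + x^2E_2 &= \frac{2}{(\|x\|^2+1)^2}\begin{pmatrix} -x^1(\|x\|^2 - 1) \\ -x^2(\|x\|^2 - 1) \\ 2\|x\|^2 \end{pmatrix}, \\ E_2 &= \frac{2}{(\|x\|^2+1)^2}\begin{pmatrix} -2x^1x^2 \\ (x^1)^2 - (x^2)^2 + 1 \\ 2x^2 \end{pmatrix}, & x^2E_1 - x^1E_2 &= \frac{2}{\|x\|^2+1}\begin{pmatrix} x^2 \\ -x^1 \\ 0 \end{pmatrix}. \end{aligned} \]

The derivatives are

\[ \begin{aligned} \frac{\partial}{\partial x^1}E_1 &= -\frac{4x^1}{\|x\|^2+1}E_1 + \frac{4}{(\|x\|^2+1)^2}\left[ -p + \tfrac{1}{2}(\|x\|^2+1)(x^1E_1 + x^2E_2) \right], \\ \frac{\partial}{\partial x^2}E_1 &= -\frac{4x^2}{\|x\|^2+1}E_1 + \frac{2}{\|x\|^2+1}\left[ x^2E_1 - x^1E_2 \right], \end{aligned} \]

and

\[ \begin{aligned} \frac{\partial}{\partial x^1}E_2 &= -\frac{4x^1}{\|x\|^2+1}E_2 - \frac{2}{\|x\|^2+1}\left[ x^2E_1 - x^1E_2 \right], \\ \frac{\partial}{\partial x^2}E_2 &= -\frac{4x^2}{\|x\|^2+1}E_2 + \frac{4}{(\|x\|^2+1)^2}\left[ -p + \tfrac{1}{2}(\|x\|^2+1)(x^1E_1 + x^2E_2) \right]. \end{aligned} \]

With the derivatives in this form, you should be able to calculate the Christoffel coefficients easily. For example, from

\[ \nabla_{\partial_1}\partial_1 = -\frac{2x^1}{\|x\|^2+1}E_1 + \frac{2x^2}{\|x\|^2+1}E_2 \]

we read that

\[ \Gamma_{11}^1 = -\frac{2x^1}{\|x\|^2+1}, \qquad \Gamma_{11}^2 = \frac{2x^2}{\|x\|^2+1}. \]

For the other derivative of $E_1$, the projection is trivial, and

\[ \Gamma_{21}^1 = -\frac{2x^2}{\|x\|^2+1}, \qquad \Gamma_{21}^2 = -\frac{2x^1}{\|x\|^2+1}. \]

And from the derivatives of $E_2$ we obtain:

\[ \Gamma_{12}^1 = -\frac{2x^2}{\|x\|^2+1}, \qquad \Gamma_{12}^2 = -\frac{2x^1}{\|x\|^2+1}, \qquad \Gamma_{22}^1 = \frac{2x^1}{\|x\|^2+1}, \qquad \Gamma_{22}^2 = -\frac{2x^2}{\|x\|^2+1}. \]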

3.4 Parallel Transport

We began Section 3.3 with the motivation that we want to compare different tangent spaces to one another and a thought experiment about walking around the Earth. Then we went on to define covariant derivatives. Now it is time to connect the two (pardon the pun).

Definition 3.36. Let M be a manifold with a covariant derivative , α : (a,b) M a smooth curve and Y a vector field. We say that Y is parallel along α (with respect to ) if αY = 0 at all points on the curve.

The inspiration for the name parallel is that the vectors of the vector field at different points are meant to be (in some sense) parallel to one another. Phrased differently: we have a field of parallel vectors. Even though $\alpha'$ is not a vector field on M, this is well-defined due to Lemma 3.27. Similarly, we really only need the values of Y along the curve α to compute this condition, due to Lemma 3.28. Therefore many books build a theory of ‘vector fields on curves’. We will avoid this extra theory by assuming the main result: so long as the curve α is injective and not pathological, every vector field on α can be extended to a vector field on M.

In a chart we have $\alpha'(t) = \frac{d\alpha^i}{dt}\partial_i$, so the condition becomes

\[
0 = \Big(\frac{d\alpha^i}{dt}\frac{\partial Y^k}{\partial x^i} + \Gamma_{ij}^k\frac{d\alpha^i}{dt}Y^j\Big)\partial_k = \Big(\frac{dY^k}{dt} + \Gamma_{ij}^k\frac{d\alpha^i}{dt}Y^j\Big)\partial_k, \tag{3.37}
\]

where we treat the vector field as a function of t, i.e. $Y(\alpha(t))$. Since $\Gamma_{ij}^k$ and $\frac{d\alpha^i}{dt}$ are specified, we treat this as a system of linear ODEs for the functions $Y^i(t) : (a,b) \to \mathbb{R}$. By the uniqueness of solutions to ODEs, a parallel vector field is uniquely determined by its value at one point of the curve. On the other hand, the existence of solutions to ODEs ensures that given a vector $v \in T_{\alpha(t_0)}M$ there exists a unique parallel field Y along α with $Y(t_0) = v$.

Let us make our thought experiment rigorous by using the tangent covariant derivative. We can expand the thought experiment in the following way: while we are walking around the world without turning, we are holding a stick. The stick represents a vector field along the curve of our journey. Suppose at the start of our journey, the stick is pointing south (recall we are facing east). As we walk east around the world, our stick will continue to point south. Thus we ask whether the vector field $Y|_p = (0,0,-1) \in T_p\mathbb{S}^2$ is parallel with respect to $\nabla$ along the equator $\alpha(t) = (\cos t, \sin t, 0)$. Indeed it is, since Y is constant with respect to p,

\[
\nabla_{\alpha'}Y = \operatorname{proj}_{T_p\mathbb{S}^2}\Big(-\sin t\,\frac{\partial Y^j}{\partial p_1}\frac{\partial}{\partial p_j} + \cos t\,\frac{\partial Y^j}{\partial p_2}\frac{\partial}{\partial p_j} + 0\cdot\frac{\partial Y^j}{\partial p_3}\frac{\partial}{\partial p_j}\Big) = 0.
\]

Now what about the original thought experiment? This time as we walk around the world, let the stick point forward. Clearly, if we don’t turn, it should continue to point forward. In other words

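\[
Y = \alpha'(t) = (-\sin t, \cos t, 0) = (-p_2, p_1, 0).
\]

This is not constant as a function into $\mathbb{R}^3$. Now when we compute

\[
\nabla_{\alpha'}Y = \operatorname{proj}_{T_p\mathbb{S}^2}\Big(-\sin t\,(1)\frac{\partial}{\partial p_2} + \cos t\,(-1)\frac{\partial}{\partial p_1} + 0\Big) = \operatorname{proj}_{T_p\mathbb{S}^2}(-p) = 0.
\]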

We see from the calculation that the derivative of Y along the curve points towards the center of the sphere, so when projected to the tangent plane it becomes zero. In summary, parallel transport by $\nabla$ on the sphere matches our intuition of ‘walking without turning’. Of course there are many other covariant derivatives on the sphere, and with respect to them perhaps these two vector fields are not parallel.
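The two stick experiments can also be reproduced numerically. The sketch below is an illustration rather than part of the notes. It assumes the unit sphere in $\mathbb{R}^3$ with the tangent covariant derivative, for which the parallel condition $\operatorname{proj}(dY/dt) = 0$ can be rewritten as $dY/dt = -(Y \cdot \alpha')\,\alpha$, because a tangent field satisfies $Y \cdot \alpha = 0$ along the curve. It integrates this ODE with the classical Runge-Kutta scheme; the names `equator`, `rhs` and `parallel_transport` are hypothetical.

```python
import math


def equator(t):
    """The curve alpha(t) on the unit sphere together with its velocity."""
    return (math.cos(t), math.sin(t), 0.0), (-math.sin(t), math.cos(t), 0.0)


def rhs(curve, t, y):
    """Right-hand side dY/dt = -(Y . alpha') alpha of the transport ODE."""
    p, v = curve(t)
    c = -sum(yi * vi for yi, vi in zip(y, v))
    return tuple(c * pi for pi in p)


def parallel_transport(curve, y0, t0, t1, steps=2000):
    """Integrate the transport ODE from t0 to t1 with classical RK4."""
    h = (t1 - t0) / steps
    t, y = t0, tuple(y0)
    for _ in range(steps):
        k1 = rhs(curve, t, y)
        k2 = rhs(curve, t + h / 2, tuple(a + h / 2 * b for a, b in zip(y, k1)))
        k3 = rhs(curve, t + h / 2, tuple(a + h / 2 * b for a, b in zip(y, k2)))
        k4 = rhs(curve, t + h, tuple(a + h * b for a, b in zip(y, k3)))
        y = tuple(a + h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
                  for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4))
        t += h
    return y


# A vertical stick: it is orthogonal to alpha' everywhere, so it never changes.
vertical = parallel_transport(equator, (0.0, 0.0, -1.0), 0.0, math.pi / 2)
# A stick pointing forward: it should stay equal to alpha'(t) along the way.
forward = parallel_transport(equator, (0.0, 1.0, 0.0), 0.0, math.pi / 2)
```

Passing a different curve function (for example a circle of latitude) is a cheap way to see that a stick generally does not return to its starting direction after one loop.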

Example 3.38 (3-Sphere). Let us consider the covariant derivative $\nabla^L$ on $\mathbb{S}^3$ from Example 3.29. We noted there that left-invariant vector fields have $\nabla^L$-derivative zero at any point and in any direction. Hence left-invariant fields are parallel along every curve in $\mathbb{S}^3$.

Conversely, suppose Y is parallel along α. It follows from the definition of $\nabla^L$ that $t \mapsto T_{\alpha(t)}L_{\alpha(t)^{-1}}Y(\alpha(t))$ is constant. In words, if we consider Y as a function of t, i.e. $Y(\alpha(t))$, and move the vectors to e using the tangent map of the left action, i.e. $T_{\alpha(t)}L_{\alpha(t)^{-1}}$, then this function is constant. Though we don’t have a formal definition, it is fair to say that Y is left-invariant along the curve.

The final observation for this example is that given any vector $w \in T_p\mathbb{S}^3$ there is a unique left-invariant vector field Y with $Y|_p = w$. Let $v = T_pL_{p^{-1}}w$. Then $Y|_q = qv$ for $q \in \mathbb{S}^3$ is the field. Therefore there is a unique way to parallel transport any vector to any other point of $\mathbb{S}^3$. Manifolds with this property are called parallelisable. It is equivalent to having a trivial tangent bundle.

In the above example, we encountered the idea of taking a vector v at one point $\alpha(t_0)$, finding a vector field Y with $Y|_{\alpha(t_0)} = v$ that is parallel along α, and in particular calculating the parallel vector at another point $w = Y|_{\alpha(t_1)}$. We call w the parallel transport of v along α. This is a function $P(\alpha)_t^s : T_{\alpha(t)}M \to T_{\alpha(s)}M$ called the parallel transport operator. Because the ODE is linear in Y, the parallel transport operator is linear: if Y is the parallel vector field with $Y|_{\alpha(t)} = v$ and $\tilde Y$ is the parallel vector field with $\tilde Y|_{\alpha(t)} = \tilde v$, then $Y + \tilde Y$ is also parallel and $(Y + \tilde Y)|_{\alpha(t)} = v + \tilde v$. The same idea works with scaling v.

Some other properties of $P(\alpha)_t^s$ follow easily from its definition as the solution of an ODE. We have the semi-group properties $P(\alpha)_t^t = \mathrm{id}$ and $P(\alpha)_s^u \circ P(\alpha)_t^s = P(\alpha)_t^u$. By the uniqueness of the solutions to ODEs, we have that $P(\alpha)_t^s$ is injective, and therefore an isomorphism of vector spaces. And so on.

Conversely, if one has the parallel transport operator for a curve α, then we can recover the covariant derivative in the direction $\alpha'$ through the formula

\[
\nabla_{\alpha'(0)}Y = \lim_{h\to 0}\frac{1}{h}\Big[P(\alpha)_h^0\,Y|_{\alpha(h)} - Y|_{\alpha(0)}\Big] \in T_{\alpha(0)}M.
\]

Exercise 3.39. Prove the above formula. Hint: Take a basis of $T_{\alpha(0)}M$ and parallel transport it along α. As a reward for solving this exercise, you may now use the word connection for a covariant derivative.

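Exercise 3.40. Argue that parallel transport with respect to $\nabla^L$ on $\mathbb{S}^3$ is $P(\alpha)_t^s = \alpha(s)\alpha(t)^{-1}$. If we insert this into the above equation we obtain

\[
\nabla^L_{\alpha'(0)}Y = \alpha(0)\lim_{h\to 0}\frac{1}{h}\Big[\alpha(h)^{-1}Y|_{\alpha(h)} - \alpha(0)^{-1}Y|_{\alpha(0)}\Big].
\]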

Explain why this is the same formula as Example 3.29.

So intuitively the two approaches, covariant derivatives and parallel transport operators, are equivalent. The reason that it is difficult to start with parallel transport operators is that it is tricky to characterise exactly when a set of linear functions between tangent spaces, one for every curve, corresponds to a covariant derivative. Note our logic above: if we begin with a covariant derivative, then we have a parallel transport operator, and taking a limit we can recover the covariant derivative. But if you begin with an arbitrary set of operators, there is no guarantee that the limit will exist. You need to have some type of smooth dependence of $P(\alpha)_t^s$ on t and s. Further, what conditions should you impose on the dependence of $P(\alpha)$ on α such that, if two curves are tangent at a point, the above limit produces the same result? Hopefully, these questions give you an appreciation of the difficulty involved.

Special mention should go to Appendix B in Sharpe, which does start with the classical idea of rolling a plane (or another space) around on a surface and shows how that gives various modern structures on the manifold.

3.5 Torsion

In this section we discuss a quantity called torsion that is derived from a covariant derivative. There is a relation between the torsion of a connection and the torsion of a space curve, but we will not explore it in this course [3]. Ultimately we will only be interested in covariant derivatives with zero torsion, so in a sense we are introducing it only to rule it out. Which brings us to the point: how should we motivate the definitions in this section without going deep into theory we will not use? We ask some natural questions and give some reasonable answers.

In euclidean space we have Schwarz’ theorem, also known as Clairaut’s theorem, that the partial derivatives with respect to different variables commute (for smooth functions among others). This result is embedded in the definition of the Lie bracket, where it was necessary to have the second order terms cancel. In fact sometimes the theorem is expressed as [∂i,∂j] = 0. So naturally we ask this question of the covariant derivative, but the answer is negative in general:

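\[
\nabla_{\partial_i}\partial_j - \nabla_{\partial_j}\partial_i = \Gamma_{ij}^k\partial_k - \Gamma_{ji}^k\partial_k = \big(\Gamma_{ij}^k - \Gamma_{ji}^k\big)\partial_k.
\]

This leads to the following definition.

Definition 3.41. We say that a covariant derivative $\nabla$ is torsion-free (in some chart) if $\nabla_{\partial_i}\partial_j - \nabla_{\partial_j}\partial_i = 0$. Equivalently in terms of Christoffel coefficients, if $\Gamma_{ij}^k = \Gamma_{ji}^k$ at every point.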

In this first definition, torsion of a covariant derivative is a measure of the non-commutativity of coordinate vector fields. It seems natural therefore that this should depend on the choice of chart as much as the covariant derivative. But if you have done Exercise 3.34, you may already know that if Γijk = Γjik at a point in one chart then it also holds at that point in any overlapping chart. We will return to this idea shortly.

Example 3.42 (Euclidean Space). We have $\mathbb{R}^n$ with one chart, and $\nabla^{euc}$ from Example 3.21. In Example 3.30 we computed that the Christoffel coefficients are all zero. Thus this covariant derivative is torsion-free in this chart.

Example 3.43 (Stereographic Projection). In Exercise 3.35 you found all the Christoffel coefficients. Observe that they are symmetric in the lower two indices:

\[
\Gamma_{21}^1 = -\frac{2x^2}{\|x\|^2+1} = \Gamma_{12}^1, \qquad \Gamma_{21}^2 = -\frac{2x^1}{\|x\|^2+1} = \Gamma_{12}^2.
\]

This shows that the covariant derivative $\nabla$ of $\mathbb{S}^2$ is torsion-free on $U_N$. Since the torsion is a continuous function, it must also be zero at the north pole.

We have the expectation that the coordinate vector fields should commute, or that this is a desirable property, but we do not have that expectation for general vector fields X,Y . We find

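\[
\begin{aligned}
\nabla_XY - \nabla_YX &= X^i\nabla_{\partial_i}(Y^j\partial_j) - Y^j\nabla_{\partial_j}(X^i\partial_i) \\
&= \Big(X^i\frac{\partial Y^k}{\partial x^i} + X^iY^j\Gamma_{ij}^k\Big)\partial_k - \Big(Y^j\frac{\partial X^k}{\partial x^j} + Y^jX^i\Gamma_{ji}^k\Big)\partial_k \\
&= \Big(X^i\frac{\partial Y^k}{\partial x^i} - Y^j\frac{\partial X^k}{\partial x^j}\Big)\partial_k + X^iY^j\big(\Gamma_{ij}^k - \Gamma_{ji}^k\big)\partial_k \\
&= [X,Y] + X^iY^j\big(\Gamma_{ij}^k - \Gamma_{ji}^k\big)\partial_k.
\end{aligned}
\]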

The meaning of this equation is that the ‘covariant derivative commutator’ of two vector fields is their Lie bracket plus a factor coming from the fact that the coordinate vector fields do not ‘covariantly commute’.

Definition 3.44. Given a covariant derivative $\nabla$, we define the torsion of two vector fields X,Y to be a third vector field

\[
T(X,Y) = \nabla_XY - \nabla_YX - [X,Y].
\]

Remarkably the value of $T(X,Y)$ at any point p only depends on $X|_p, Y|_p$, with the formula

\[
T(X,Y) = X^iY^jT_{ij}^k\,\partial_k \quad\text{for}\quad T_{ij}^k = \Gamma_{ij}^k - \Gamma_{ji}^k.
\]

The definition of $T(X,Y)$ is in terms of three vector fields $\nabla_XY$, $\nabla_YX$, and $[X,Y]$, so clearly is independent of charts. A covariant derivative is torsion-free if $T_{ij}^k = 0$, and so this too is independent of charts. The second formula is just a rearrangement of the calculation preceding the definition. We say that the second formula is remarkable because although T is defined using derivatives, both of which depend on the local behaviour of vector fields, the torsion only depends on the pointwise values of the vector fields. Because the Lie bracket is an antisymmetric function of X,Y, so too is the torsion: $T(X,Y) = -T(Y,X)$.

Example 3.45 (3-Sphere). In this example we show that the torsion of the covariant derivative $\nabla^L$ on $\mathbb{S}^3$ from Example 3.29 is non-zero. The trick is to not work with coordinate vector fields, but rather work with left-invariant vector fields. Let $E_1|_p = pi$ and likewise $E_2|_p = pj$, $E_3|_p = pk$ denote the left-invariant vector fields that are obtained by pushing forward $i,j,k \in T_e\mathbb{S}^3$. We have already noted in Example 3.38 that $\nabla^L_vE_i = 0$ for any vector $v \in T_p\mathbb{S}^3$.

Further, at any point $E_1|_p, E_2|_p, E_3|_p$ is a basis for $T_p\mathbb{S}^3$. This means that every vector field X on $\mathbb{S}^3$ can be written as

\[
X = X^1E_1 + X^2E_2 + X^3E_3.
\]

Thus the $E_i$ have similar properties to the coordinate basis vector fields, except that they do not come from coordinates. A set of vector fields with this basis property is called a frame, but we will not explore this concept in generality. In this frame, the covariant derivative can be reckoned with

\[
\nabla^L_XY = \nabla^L_X(Y^jE_j) = X(Y^j)E_j + Y^j\nabla^L_XE_j = X(Y^j)E_j.
\]

Similarly the Lie bracket simplifies

\[
\begin{aligned}
[E_i,Y] &= [E_i,Y^jE_j] = E_i(Y^j)E_j + Y^j[E_i,E_j], \\
[X,Y] &= [X^iE_i,Y] = X^i[E_i,Y] - Y(X^i)E_i \\
&= X^iE_i(Y^j)E_j + X^iY^j[E_i,E_j] - Y^jE_j(X^i)E_i \\
&= X(Y^j)E_j + X^iY^j[E_i,E_j] - Y(X^i)E_i.
\end{aligned}
\]

Together this yields

\[
\begin{aligned}
T^L(X,Y) &= \nabla^L_XY - \nabla^L_YX - [X,Y] \\
&= X(Y^j)E_j - Y(X^j)E_j - X(Y^j)E_j - X^iY^j[E_i,E_j] + Y(X^i)E_i \\
&= -X^iY^j[E_i,E_j].
\end{aligned}
\]

Thus the torsion comes down to the Lie brackets of this frame.

For this example we will evaluate [E1,E2]:

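\[
\begin{aligned}
[E_1,E_2] &= [pi,pj] = [-p_1 + p_0i + p_3j - p_2k,\ -p_2 - p_3i + p_0j + p_1k] \\
&= [-p_1\partial_0 + p_0\partial_1 + p_3\partial_2 - p_2\partial_3,\ -p_2\partial_0 - p_3\partial_1 + p_0\partial_2 + p_1\partial_3] \\
&= -p_1\partial_2 + p_0\partial_3 + p_3(-\partial_0) - p_2(-\partial_1) - \big[-p_2\partial_1 - p_3(-\partial_0) + p_0(-\partial_3) + p_1\partial_2\big] \\
&= -2p_3\partial_0 + 2p_2\partial_1 - 2p_1\partial_2 + 2p_0\partial_3 = 2E_3.
\end{aligned}
\]

We can generalise this argument; set $i_1 = i$, $i_2 = j$, $i_3 = k$ so that we can use index notation.

\[
[E_i,E_j] = [p\,i_i,\ p\,i_j] = p\,i_ii_j - p\,i_ji_i = p\,(i_ii_j - i_ji_i).
\]

When i = j, the quaternions commute and the bracket is zero (as expected). If they are not equal then the quaternions anti-commute. This gives $[E_2,E_3] = 2E_1$ and $[E_3,E_1] = 2E_2$. (There is in fact a close relationship between the Lie bracket of $\mathbb{S}^3$ and the cross product of $\mathbb{R}^3$.)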
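The bracket relations of the left-invariant frame can be checked mechanically with quaternion arithmetic. This is a minimal sketch, not the author's method: it stores a quaternion as a tuple `(real, i, j, k)`, takes $E_i|_p = p\,i_i$ as in the example, and evaluates the bracket through the formula $p\,(i_ii_j - i_ji_i)$. The helper names `qmul`, `commutator`, `frame_field` and `frame_bracket` are hypothetical.

```python
def qmul(a, b):
    """Hamilton product of quaternions stored as (real, i, j, k) tuples."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
            a0 * b1 + a1 * b0 + a2 * b3 - a3 * b2,
            a0 * b2 - a1 * b3 + a2 * b0 + a3 * b1,
            a0 * b3 + a1 * b2 - a2 * b1 + a3 * b0)


# The imaginary units i_1 = i, i_2 = j, i_3 = k.
UNITS = {1: (0, 1, 0, 0), 2: (0, 0, 1, 0), 3: (0, 0, 0, 1)}


def commutator(a, b):
    """The quaternion commutator ab - ba."""
    ab, ba = qmul(a, b), qmul(b, a)
    return tuple(x - y for x, y in zip(ab, ba))


def frame_field(p, i):
    """E_i at the point p, i.e. p times the i-th imaginary unit."""
    return qmul(p, UNITS[i])


def frame_bracket(p, i, j):
    """[E_i, E_j] at the point p, computed as p (i_i i_j - i_j i_i)."""
    return qmul(p, commutator(UNITS[i], UNITS[j]))


p = (0.5, 0.5, 0.5, 0.5)  # a unit quaternion, i.e. a point of the 3-sphere
bracket_12 = frame_bracket(p, 1, 2)
twice_e3 = tuple(2 * c for c in frame_field(p, 3))
```

Here `bracket_12` and `twice_e3` agree componentwise, matching $[E_1,E_2] = 2E_3$; the other two relations follow by changing the indices.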

Example 3.46 (3-Sphere). We can also ask for the torsion of $\nabla^R$ on $\mathbb{S}^3$. Of course we could do the same as the previous example, except using a right-invariant frame, and get a similar answer. But to make the two examples comparable, let us compute the torsion of $\nabla^R$ using the left-invariant frame $E_i$.

What changes about the calculation is that $\nabla^R_{E_i}E_j \neq 0$. Instead we must generalise the calculation from Example 3.29:

\[
\begin{aligned}
E_i(p) &= p\,i_i, \\
\nabla^R_{E_i}E_j &= \big(E_i(p\,i_j\,p^{-1})\big)p = \big(E_i(p)\,i_j\,p^{-1} - p\,i_j\,p^{-1}E_i(p)\,p^{-1}\big)p \\
&= p\,i_ii_j - p\,i_ji_i = [E_i,E_j].
\end{aligned}
\]

The covariant derivative of an arbitrary vector field is

\[
\nabla^R_XY = X(Y^j)E_j + X^iY^j\nabla^R_{E_i}E_j = X(Y^j)E_j + X^iY^j[E_i,E_j].
\]

Hence

\[
\begin{aligned}
T^R(X,Y) &= \nabla^R_XY - \nabla^R_YX - [X,Y] \\
&= X(Y^j)E_j + X^iY^j[E_i,E_j] - Y(X^j)E_j - X^iY^j[E_j,E_i] \\
&\qquad - X(Y^j)E_j - X^iY^j[E_i,E_j] + Y(X^i)E_i \\
&= X^iY^j[E_i,E_j].
\end{aligned}
\]

Thus the torsion of $\nabla^R$ is the negative of the torsion of $\nabla^L$.

Recall from Exercise 3.24 that given one connection we can create another by the addition of a vector valued function $A(X,Y)$. We can ask how the torsion of the new covariant derivative is related to the torsion of the original. This follows easily, for $\tilde\nabla = \nabla + A$,

\[
\begin{aligned}
\tilde T(X,Y) &= \tilde\nabla_XY - \tilde\nabla_YX - [X,Y] \\
&= \nabla_XY + A(X,Y) - \nabla_YX - A(Y,X) - [X,Y] \\
&= T(X,Y) + A(X,Y) - A(Y,X).
\end{aligned}
\]

Purely algebraically, any function of two variables can be split into symmetric and antisymmetric parts

\[
A(X,Y) = \tfrac12\big(A(X,Y) + A(Y,X)\big) + \tfrac12\big(A(X,Y) - A(Y,X)\big).
\]

If A is already symmetric or antisymmetric, then it is just equal to its symmetric or antisymmetric part respectively and the other part is zero. Thus we can express the relationship of the torsions by the dictum “adding A to a covariant derivative adds twice the antisymmetric part of A to its torsion”. In particular, for any covariant derivative, we can absorb the torsion. This means we construct a new torsion-free covariant derivative $\tilde\nabla := \nabla - \tfrac12T$.
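As a short check of the last claim, one can substitute $A = -\tfrac12T$ into the relation between the two torsions and use the antisymmetry of T. The lines below are a worked verification under the assumption that T is an admissible choice of A in the sense of Exercise 3.24.

```latex
\begin{align*}
A(X,Y) &:= -\tfrac12\, T(X,Y), \\
\tilde T(X,Y) &= T(X,Y) + A(X,Y) - A(Y,X) \\
              &= T(X,Y) - \tfrac12\, T(X,Y) + \tfrac12\, T(Y,X) \\
              &= T(X,Y) - \tfrac12\, T(X,Y) - \tfrac12\, T(X,Y) = 0 .
\end{align*}
```

The third line is where antisymmetry, $T(Y,X) = -T(X,Y)$, is used.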

Example 3.47 (3-Sphere). We have just seen in Examples 3.45 and 3.46 that with respect to the left-invariant fields $E_i$ the covariant derivatives and torsions are

\[
\begin{aligned}
\nabla^L_{E_i}E_j &= 0, & \nabla^R_{E_i}E_j &= [E_i,E_j], \\
T^L(E_i,E_j) &= -[E_i,E_j], & T^R(E_i,E_j) &= [E_i,E_j].
\end{aligned}
\]

(Aside: the formulas on the right make it seem as if $\nabla^R$ and $T^R$ are equal. They are not in general, only for left-invariant vector fields. Remember: a covariant derivative has the product rule in Y, whereas the torsion is $C^\infty$-linear.)

If we absorb the torsion on these two connections we get the torsion-free connection

\[
\nabla^{LC}_{E_i}E_j = \tfrac12[E_i,E_j] = \nabla^L_{E_i}E_j + \tfrac12[E_i,E_j] = \nabla^R_{E_i}E_j - \tfrac12[E_i,E_j].
\]

This fits nicely with Corollary 3.25, because $\nabla^{LC}$ can also be understood as the average of the left and right covariant derivatives: $\nabla^{LC} = \tfrac12\nabla^L + \tfrac12\nabla^R$. I’ll give you one guess what the LC stands for!

We have now seen that for a torsion-free connection the coordinate vector fields ‘covariantly commute’, but general vector fields do not.

Definition 3.48. A smooth family of curves is a function $\alpha_s(t) : (-\varepsilon,\varepsilon) \times (a,b) \to M$. By smooth family we mean that it is smooth in both variables s and t. We typically think of the main curves of the family $t \mapsto \alpha_s(t)$ for fixed s. But we also have the transverse curves, where we fix t and allow s to vary. We can write $\alpha(s,t)$ to emphasise this duality.

Therefore we have two vector fields: the tangents in the main direction and the tangents in the transverse direction. Well, this is not completely true as we do not really have vector fields because the curves may cross each other, giving multiple vectors at the same point. (Technically what we have is the pushforwards of two vector fields.) Regardless, for each value of (s,t) it makes sense to ask how the derivative $\partial_s\alpha$ is changing in comparison to $\partial_t\alpha$.

Lemma 3.49 (Mixed Derivatives). Let $\nabla$ be a torsion-free covariant derivative and $\alpha(s,t) : (-\varepsilon,\varepsilon) \times (a,b) \to M$ a smooth family of curves. Then $\nabla_{\partial_s\alpha}\partial_t\alpha = \nabla_{\partial_t\alpha}\partial_s\alpha$.

Proof. This is a purely computational proof. In a chart, the tangent vectors are

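\[
\partial_s\alpha = \frac{\partial\alpha^k}{\partial s}\partial_k, \qquad \partial_t\alpha = \frac{\partial\alpha^k}{\partial t}\partial_k.
\]

Then

\[
\begin{aligned}
\nabla_{\partial_s\alpha}\partial_t\alpha &= \Big(\frac{\partial^2\alpha^k}{\partial s\,\partial t} + \Gamma_{ij}^k\frac{\partial\alpha^i}{\partial s}\frac{\partial\alpha^j}{\partial t}\Big)\partial_k, \\
\nabla_{\partial_t\alpha}\partial_s\alpha &= \Big(\frac{\partial^2\alpha^k}{\partial t\,\partial s} + \Gamma_{ij}^k\frac{\partial\alpha^i}{\partial t}\frac{\partial\alpha^j}{\partial s}\Big)\partial_k.
\end{aligned}
\]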

By the symmetry of the Christoffel coefficients for torsion-free covariant derivatives, these are equal. □

We should comment about why the expression $\nabla_{\partial_s\alpha}\partial_t\alpha$ is well-defined even though the tangents do not necessarily form a vector field. We know that the direction argument of $\nabla$ depends only on the pointwise value, so this is no issue. And for $\partial_t\alpha$ we need to know its values along a curve in the direction of $\partial_s\alpha$, but this is exactly the meaning of partial derivative. So understood correctly, these expressions are valid. This is an instance where a fleshed out notion of ‘vector field on a curve’ would have been more precise, but hopefully you see that not much has been lost by skipping this concept.

3.6 The Levi-Civita connection

Let us once more return to the thought experiment of walking along the equator $\alpha(t) = (\cos t, \sin t, 0)$ with our stick. We now understand that we are parallel transporting our stick. But consider the vector field $Z(\alpha(t)) = (0,0,-\cos^2 t)$. To push the metaphor into silliness, it is a telescoping selfie stick that is lengthening and shortening. The vector field Z always points south, but it is not parallel according to the definition. If we write $Z = \cos^2 t\,Y$ for $Y(\alpha(t)) = (0,0,-1)$, a known parallel vector field, then

\[
\nabla_{\alpha'}Z = \nabla_{\alpha'}(\cos^2 t\,Y) = \frac{d}{dt}\big(\cos^2 t\big)Y + \cos^2 t\,\nabla_{\alpha'}Y = -2\sin t\cos t\,Y \neq 0.
\]

This illustrates the point that parallel is about more than just direction, it also concerns length (which is unlike how we use the term in elementary geometry and linear algebra). Therefore, among the many covariant derivatives that exist on a Riemannian manifold, we are interested in those whose parallel transport preserves length and angle.

Let us now turn this intuition into a definition. Suppose M is a Riemannian manifold with metric g and that $\nabla$ is a connection that preserves the lengths and angles of parallel transported vectors. For any curve γ, let X,Y be parallel fields along γ with respect to $\nabla$. This means that $g(X,Y)$ is a constant function along γ. For all smooth functions a,b, we must have

\[
\begin{aligned}
\frac{d}{dt}g(aX,bY) &= \frac{d}{dt}\big(ab\,g(X,Y)\big) = \frac{da}{dt}b\,g(X,Y) + a\frac{db}{dt}g(X,Y) + ab\frac{d}{dt}g(X,Y) \\
&= g(a'X,bY) + g(aX,b'Y) + 0.
\end{aligned}
\]

On the other hand

\[
\nabla_{\gamma'}(aX) = a'X + a\nabla_{\gamma'}X = a'X.
\]

Combining the two, $\frac{d}{dt}g(aX,bY) = g(\nabla_{\gamma'}(aX),bY) + g(aX,\nabla_{\gamma'}(bY))$. Therefore we make the definition:

Definition 3.50. A covariant derivative $\nabla$ is called metric-compatible or a metric connection if for all vector fields X,Y,Z

\[
Z\big(g(X,Y)\big) = g(\nabla_ZX,Y) + g(X,\nabla_ZY).
\]

The choice to define this property using a third vector field Z instead of the tangent vector $\gamma'$ is purely a matter of style. The converse of the above argument is immediate: if X,Y are parallel along a curve γ then the right hand side is zero and thus $g(X,Y)$ is constant on the curve.

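Example 3.51 (3-Sphere). We can show that the left and right covariant derivatives are compatible with the metric on $\mathbb{S}^3$ coming from $\mathbb{R}^4$. Write vector fields $X = X^iE_i$ and $Y = Y^iE_i$ with respect to the left-invariant basis fields from Example 3.45. By the property of quaternions that $a \cdot b = \operatorname{Re}(\bar a b)$ we see that

\[
E_i \cdot E_j = \operatorname{Re}\big(\overline{p\,i_i}\,p\,i_j\big) = \operatorname{Re}\big(\bar{i_i}\,\bar p\,p\,i_j\big) = \operatorname{Re}\big(\bar{i_i}\,i_j\big) = i_i \cdot i_j,
\]

since $p \in \mathbb{S}^3$ has unit length. In particular it is constant on all of $\mathbb{S}^3$. Additionally, the $\nabla^L$-derivatives of the $E_i$ are zero in every direction. Therefore, similar to the calculation before the definition, we have

\[
\begin{aligned}
Z(X \cdot Y) &= Z(X^iY^jE_i \cdot E_j) = Z(X^i)Y^jE_i \cdot E_j + X^iZ(Y^j)E_i \cdot E_j \\
&= \big(Z(X^i)E_i\big) \cdot Y + X \cdot \big(Z(Y^j)E_j\big) \\
&= \big(Z(X^i)E_i + X^i\nabla^L_ZE_i\big) \cdot Y + X \cdot \big(Z(Y^j)E_j + Y^j\nabla^L_ZE_j\big) \\
&= \big(\nabla^L_ZX\big) \cdot Y + X \cdot \big(\nabla^L_ZY\big).
\end{aligned}
\]

This shows that $\nabla^L$ is metric-compatible.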

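For $\nabla^R$ we can reuse some of this calculation. What changes is that $\nabla^R_ZE_i$ may not be zero. Instead $\nabla^R_ZE_i = Z^k[E_k,E_i]$. We need to prove a version of the cyclic property for the triple product (for vectors in $\mathbb{R}^3$ we have $a \cdot (b \times c) = b \cdot (c \times a)$):

\[
\begin{aligned}
[E_k,E_i] \cdot E_j + E_i \cdot [E_k,E_j] &= \operatorname{Re}\big(\overline{(i_ki_i - i_ii_k)}\,i_j\big) + \operatorname{Re}\big(\bar{i_i}(i_ki_j - i_ji_k)\big) \\
&= \operatorname{Re}\big(i_ii_ki_j - i_ki_ii_j - i_ii_ki_j + i_ii_ji_k\big) \\
&= \operatorname{Re}\big(-i_ki_ii_j + i_ii_ji_k\big) = 0.
\end{aligned}
\]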
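The cyclic identity can be confirmed by brute force over all index triples. The sketch below is illustrative only and assumes the conventions of Example 3.45: quaternions are tuples `(real, i, j, k)`, $E_i|_p = p\,i_i$, the bracket is $p\,(i_ki_i - i_ii_k)$, and the inner product is the dot product of $\mathbb{R}^4$ (which equals $\operatorname{Re}(\bar a b)$). The names `field`, `bracket` and `cyclic_defect` are hypothetical.

```python
from itertools import product


def qmul(a, b):
    """Hamilton product of quaternions stored as (real, i, j, k) tuples."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
            a0 * b1 + a1 * b0 + a2 * b3 - a3 * b2,
            a0 * b2 - a1 * b3 + a2 * b0 + a3 * b1,
            a0 * b3 + a1 * b2 - a2 * b1 + a3 * b0)


def dot(a, b):
    """Dot product of R^4, equal to Re(conj(a) b) for quaternions."""
    return sum(x * y for x, y in zip(a, b))


UNITS = {1: (0, 1, 0, 0), 2: (0, 0, 1, 0), 3: (0, 0, 0, 1)}  # i, j, k


def field(p, i):
    """E_i at the point p."""
    return qmul(p, UNITS[i])


def bracket(p, k, i):
    """[E_k, E_i] at the point p, computed as p (i_k i_i - i_i i_k)."""
    ki, ik = qmul(UNITS[k], UNITS[i]), qmul(UNITS[i], UNITS[k])
    return qmul(p, tuple(x - y for x, y in zip(ki, ik)))


def cyclic_defect(p, i, j, k):
    """[E_k,E_i].E_j + E_i.[E_k,E_j]; the identity says this vanishes."""
    return dot(bracket(p, k, i), field(p, j)) + dot(field(p, i), bracket(p, k, j))


p = (0.5, 0.5, 0.5, 0.5)  # a unit quaternion
defects = [cyclic_defect(p, i, j, k) for i, j, k in product((1, 2, 3), repeat=3)]
```

Every entry of `defects` is zero, which is the statement used to show that $\nabla^R$ is metric-compatible.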

This allows us to write

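\[
\begin{aligned}
Z(X \cdot Y) &= \big(Z(X^i)E_i\big) \cdot Y + X \cdot \big(Z(Y^j)E_j\big) + X^iY^jZ^k\big([E_k,E_i] \cdot E_j + E_i \cdot [E_k,E_j]\big) \\
&= \big(Z(X^i)E_i\big) \cdot Y + X \cdot \big(Z(Y^j)E_j\big) + X^i\nabla^R_ZE_i \cdot Y + Y^jX \cdot \nabla^R_ZE_j \\
&= \big(\nabla^R_ZX\big) \cdot Y + X \cdot \big(\nabla^R_ZY\big).
\end{aligned}
\]

This proves that $\nabla^R$ is also metric-compatible.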

It is useful to reduce the metric-compatibility condition to a condition on the Christoffel coefficients in some chart.

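Lemma 3.52. Let a connection $\nabla$ in some chart be described by the Christoffel coefficients $\Gamma_{ij}^k$. It is compatible with the metric if and only if

\[
\partial_kg_{ij} = \Gamma_{ki}^l g_{lj} + \Gamma_{kj}^l g_{il}. \tag{3.53}
\]

Proof. Notice that the formula for metric-compatibility is $C^\infty$-linear in Z, so it is enough to show it holds for each coordinate basis vector. The following calculation is a set of equivalences, required to hold for all X,Y:

\[
\begin{aligned}
& \partial_k\big(g(X,Y)\big) = g\big(\nabla_{\partial_k}X,Y\big) + g\big(X,\nabla_{\partial_k}Y\big) \\
\iff{} & \partial_k\big(X^iY^jg_{ij}\big) = g\big((\partial_kX^i + X^l\Gamma_{kl}^i)\partial_i,\ Y^j\partial_j\big) + g\big(X^i\partial_i,\ (\partial_kY^j + Y^l\Gamma_{kl}^j)\partial_j\big) \\
\iff{} & \partial_kX^i\,Y^jg_{ij} + X^i\partial_kY^j\,g_{ij} + X^iY^j\partial_kg_{ij} = (\partial_kX^i + X^l\Gamma_{kl}^i)Y^jg_{ij} + X^i(\partial_kY^j + Y^l\Gamma_{kl}^j)g_{ij} \\
\iff{} & X^iY^j\partial_kg_{ij} = X^l\Gamma_{kl}^iY^jg_{ij} + X^iY^l\Gamma_{kl}^jg_{ij} \\
\iff{} & \partial_kg_{ij} = \Gamma_{ki}^l g_{lj} + \Gamma_{kj}^l g_{il}.
\end{aligned}
\]

In other words, a covariant derivative is metric-compatible if and only if its Christoffel coefficients satisfy (3.53). □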

The above equation seems to say that metric-compatibility is a rather strong condition. We know that there are $n^3$ choices of smooth functions for the Christoffel coefficients, and counting the possible values for i,j,k gives $n^3$ conditions. It is almost enough to guarantee uniqueness, but not quite, because we get the same condition if we swap i and j. However, metric-compatibility and torsion-free are enough to ensure uniqueness. This result is given a rather impressive sounding name, though sometimes it is called a theorem and other times a lemma. We have our cake and eat it too:

Theorem 3.54 (Fundamental Lemma of Riemannian Geometry). On every Riemannian manifold there exists a unique metric-compatible torsion-free covariant derivative.

Proof. Our strategy for the proof is as follows. First we will establish the so-called Koszul formula. Uniqueness is then a direct consequence. To prove existence we will show that the Koszul formula defines a torsion-free metric-compatible covariant derivative in every chart. Since we already have uniqueness, we can conclude that these give a well-defined covariant derivative on the whole manifold.

The idea of the Koszul formula is to use the symmetries of the metric and the Lie bracket to get an expression with exactly one covariant derivative. Begin with the metric-compatibility property and then use the fact that torsion is zero:

\[
\begin{aligned}
Z\big(g(X,Y)\big) &= g(\nabla_ZX,Y) + g(X,\nabla_ZY) \\
&= g(\nabla_ZX,Y) + g\big(X,\ T(Z,Y) + \nabla_YZ + [Z,Y]\big) \\
&= g(\nabla_ZX,Y) + g(X,\nabla_YZ) + g(X,[Z,Y]).
\end{aligned}
\]

Now write this equation two more times with the vector fields permuted

\[
\begin{aligned}
Y\big(g(Z,X)\big) &= g(\nabla_YZ,X) + g(Z,\nabla_XY) + g(Z,[Y,X]), \\
X\big(g(Y,Z)\big) &= g(\nabla_XY,Z) + g(Y,\nabla_ZX) + g(Y,[X,Z]).
\end{aligned}
\]

Notice that of the six possible permutations, only $\nabla_ZX$, $\nabla_YZ$ and $\nabla_XY$ occur. This is a result of using the torsion-free property. Each of the three covariant derivatives occurs twice. Now, add any two equations and subtract the other. We will add the second and third and subtract the first, but it’s not important which you choose.

\[
X\big(g(Y,Z)\big) + Y\big(g(Z,X)\big) - Z\big(g(X,Y)\big) = 2g(Z,\nabla_XY) + g(Z,[Y,X]) + g(Y,[X,Z]) - g(X,[Z,Y]).
\]

If you like, you can clean this up a little, though the role each of the vector fields plays in $g(Z,\nabla_XY)$ is different, so there cannot be perfect symmetry in the formula. Here is a version I like:

\[
2g(\nabla_XY,Z) = X\big(g(Y,Z)\big) - g(X,[Y,Z]) + Y\big(g(X,Z)\big) - g(Y,[X,Z]) - Z\big(g(X,Y)\big) + g(Z,[X,Y]).
\]

This is the Koszul formula. Since the metric is non-degenerate, what it shows is that if there is a metric-compatible torsion-free covariant derivative then it can be calculated purely in terms of Lie brackets and inner products. Therefore we have established uniqueness.

For existence, it is possible to take the Koszul formula as the definition and directly check all the required properties. It is easier however to first reduce the Koszul formula to an expression in charts. Choose any chart and suppose that X = ∂i,Y = ∂j,Z = ∂k are coordinate vector fields. The Lie brackets are zero. We get

\[
2g(\Gamma_{ij}^l\partial_l,\partial_k) = 2\Gamma_{ij}^l g_{lk} = \partial_ig_{jk} + \partial_jg_{ik} - \partial_kg_{ij}.
\]

If we view the left hand side in matrix notation rather than index notation, we see that to solve for Γ we need to invert the matrix $G = (g_{ij})$. There is a sneaky convention that the components of the inverse matrix use upper indices: $(g^{ij}) = G^{-1}$. With this convention, the fact that these matrices are inverse can be written $g^{ij}g_{jk} = \delta^i_k$. In index notation, multiplying by the inverse matrix looks like $\Gamma_{ij}^l g_{lk}g^{km} = \Gamma_{ij}^l\delta_l^m = \Gamma_{ij}^m$. Thus we can write

\[
\Gamma_{ij}^m = \tfrac12 g^{km}\big(\partial_ig_{jk} + \partial_jg_{ik} - \partial_kg_{ij}\big). \tag{3.55}
\]

So given a metric g, define a covariant derivative on this chart using this formula for the Christoffel coefficients. Exercise 3.33 tells us that this does indeed define a covariant derivative on this chart, but it remains to show that it is metric-compatible and torsion-free. Torsion-freeness is easy because the above formula is symmetric in i and j. Using the Christoffel coefficients defined through the Koszul formula, we see that the condition of Lemma 3.52 is satisfied:

\[
2\Gamma_{ki}^l g_{lj} + 2\Gamma_{kj}^l g_{il} = \big(\partial_kg_{ij} + \partial_ig_{kj} - \partial_jg_{ki}\big) + \big(\partial_kg_{ji} + \partial_jg_{ki} - \partial_ig_{kj}\big) = 2\partial_kg_{ij}.
\]

Therefore the covariant derivative that we have defined in each chart is metric-compatible and torsion-free. As mentioned at the outset of the proof, it only remains to show that this definition in each chart agrees, but this follows due to uniqueness. □
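Formula (3.55) is easy to evaluate numerically, which gives a way to test both it and condition (3.53) on a concrete metric. The following is a minimal sketch under stated assumptions: it uses the helicoid metric $g = \mathrm{diag}(1, u^2 + b^2)$ from Example 3.3 with $b = 1$, approximates $\partial_kg_{ij}$ by central differences, and is restricted to two dimensions so that the inverse matrix can be written by hand. All function names are hypothetical.

```python
def helicoid_metric(u, v, b=1.0):
    """First fundamental form of the helicoid: g = diag(1, u^2 + b^2)."""
    return [[1.0, 0.0], [0.0, u * u + b * b]]


def metric_derivs(metric, x, h=1e-5):
    """dg[k][i][j] approximates partial_k g_ij by central differences."""
    dg = []
    for k in range(2):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        gp, gm = metric(*xp), metric(*xm)
        dg.append([[(gp[i][j] - gm[i][j]) / (2.0 * h) for j in range(2)]
                   for i in range(2)])
    return dg


def inverse_2x2(g):
    """Inverse of a 2x2 matrix, giving the components g^{ij}."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]


def christoffel(metric, x):
    """gamma[i][j][m] = Gamma_{ij}^m from formula (3.55), two dimensions only."""
    g_inv = inverse_2x2(metric(*x))
    dg = metric_derivs(metric, x)
    return [[[0.5 * sum(g_inv[k][m] * (dg[i][j][k] + dg[j][i][k] - dg[k][i][j])
                        for k in range(2))
              for m in range(2)]
             for j in range(2)]
            for i in range(2)]


def compatibility_defects(metric, x):
    """Left side minus right side of (3.53) for every k, i, j."""
    g = metric(*x)
    dg = metric_derivs(metric, x)
    gamma = christoffel(metric, x)
    return [dg[k][i][j] - sum(gamma[k][i][l] * g[l][j] + gamma[k][j][l] * g[i][l]
                              for l in range(2))
            for k in range(2) for i in range(2) for j in range(2)]


gamma = christoffel(helicoid_metric, (2.0, 0.3))
```

For this metric a hand calculation with (3.55) gives $\Gamma_{22}^1 = -u$ and $\Gamma_{12}^2 = \Gamma_{21}^2 = u/(u^2+b^2)$ with all other coefficients zero, so `gamma[1][1][0]` should be close to $-2$ and `gamma[0][1][1]` close to $0.4$ at $u = 2$.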

We celebrate this result with more terminology. It honours the Italian mathematician Tullio Levi-Civita, who developed much of the ‘tensor calculus’ (covariant, contravariant, indices, etc). His name tricks many students (myself included) into thinking there are two mathematicians Levi and Civita. In response to being asked what he liked best about Italy, Einstein once said “spaghetti and Levi-Civita”.

Definition 3.56. The unique metric-compatible torsion-free covariant derivative on a Riemannian manifold is called the Levi-Civita connection or the Riemannian connection.

Example 3.57 (Euclidean Space). On any open subset of $\mathbb{R}^n$ with the dot product as metric, the Levi-Civita connection is $\nabla^{euc}$ from Example 3.21. Because its Christoffel coefficients are identically zero, obviously it is torsion-free and satisfies Equation (3.53) so is metric-compatible.

Example 3.58 (3-Sphere). A corollary of Lemma 3.52 is that the set of metric-compatible covariant derivatives has an affine structure. Corollary 3.25 shows us that the affine combination of covariant derivatives is again a covariant derivative. The Christoffel coefficients of the new covariant derivative are the same affine combination of Christoffel coefficients, $(\Gamma_t)_{ij}^k = (1-t)(\Gamma_0)_{ij}^k + t(\Gamma_1)_{ij}^k$. Inserting this into Equation (3.53) shows that such an affine combination is also metric-compatible.

We saw in Example 3.51 that both $\nabla^L$ and $\nabla^R$ were metric-compatible. Therefore $\nabla^{LC} = \tfrac12\nabla^L + \tfrac12\nabla^R$ is metric-compatible. Additionally, we proved in Example 3.47 that it is torsion-free. Therefore $\nabla^{LC}$ really is the Levi-Civita connection for $\mathbb{S}^3$.

Another obvious example of a Levi-Civita connection would be on $\mathbb{S}^2$. Instead of proving it for the specific case, we generalise the construction to any Riemannian immersed submanifold.

Definition 3.59 (Tangent Connection). Let $M \to N$ be a Riemannian immersed submanifold. We identify the manifold M with its image under the immersion to simplify the statement. Let $\nabla^N$ be the Levi-Civita connection of N. We define the tangent connection on M to be the covariant derivative

\[
(\nabla_XY)|_p = \operatorname{proj}_{T_pM}\nabla^N_XY.
\]

This definition extends the previous definition from Example 3.26 because $\nabla^{euc}$ is the Levi-Civita connection of $\mathbb{R}^n$.

Theorem 3.60 (Gauss Formula). The tangent connection is the Levi-Civita connection of M.

Proof. The proof that it is in fact a covariant derivative is entirely similar to the corresponding statement in Example 3.26. We check the three properties of a covariant derivative:

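\[
\begin{aligned}
\nabla_{fX+\tilde X}Y &= \operatorname{proj}_{T_pM}\nabla^N_{fX+\tilde X}Y = \operatorname{proj}_{T_pM}\big(f\nabla^N_XY + \nabla^N_{\tilde X}Y\big) = f\nabla_XY + \nabla_{\tilde X}Y, \\
\nabla_X(Y + \tilde Y) &= \operatorname{proj}_{T_pM}\big(\nabla^N_XY + \nabla^N_X\tilde Y\big) = \nabla_XY + \nabla_X\tilde Y, \\
\nabla_X(fY) &= \operatorname{proj}_{T_pM}\big(X(f)Y + f\nabla^N_XY\big) = X(f)Y + f\nabla_XY,
\end{aligned}
\]

using that Y is already tangent to M. You might observe that this part of the proof works for any covariant derivative and that we have not yet used the metric-compatibility or torsion-freeness of $\nabla^N$.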

For torsion-free, we need to know that if X,Y are tangent to M that [X,Y ] is too, even when we consider them as vector fields on N. To prove this fact requires a proper investigation of submanifolds, and the construction of a special chart on N that aligns with a chart on M. This is beyond the scope of this course, which has tried to avoid manifold theory as much as possible. We have seen an example of this phenomenon though: in Example 3.45 the Lie bracket of the Ei fields was again an Ei field. Assuming this result,

\[
T(X,Y) = \nabla_XY - \nabla_YX - [X,Y] = \operatorname{proj}_{T_pM}\big(\nabla^N_XY - \nabla^N_YX - [X,Y]\big) = \operatorname{proj}_{T_pM}T^N(X,Y)
\]

is zero. The generalised statement for arbitrary connections would be that the tangent connection is torsion-free iff the torsion of $\nabla^N$ is perpendicular to $T_pM$ at every point of M. Though we have not defined it, one could also say iff $T^N$ lies in the normal bundle of M.

Lastly, we need to show that the tangent connection is metric-compatible. This is where we need to use that M is Riemannian immersed, so that the metric on M and the metric on N agree for tangent vectors to M.

\[
\begin{aligned}
Z\big(g_M(X,Y)\big) &= Z\big(g_N(X,Y)\big) = g_N\big(\nabla^N_ZX,Y\big) + g_N\big(X,\nabla^N_ZY\big) \\
&= g_N\big(\operatorname{proj}_{T_pM}\nabla^N_ZX,Y\big) + g_N\big(X,\operatorname{proj}_{T_pM}\nabla^N_ZY\big) \\
&= g_N(\nabla_ZX,Y) + g_N(X,\nabla_ZY) = g_M(\nabla_ZX,Y) + g_M(X,\nabla_ZY).
\end{aligned}
\]

To explain the working here a little, for any vector in $T_pN$ we can split it into a part in $T_pM$ and a part perpendicular to $T_pM$. Because $Y \in T_pM$, the inner product of Y with a vector perpendicular to $T_pM$ is zero. Thus we can go from the first to the second line.

Now we know that $\nabla$ is a metric-compatible torsion-free covariant derivative on M. By the uniqueness in Theorem 3.54, it is the Levi-Civita connection. □

To close the chapter, we revisit the question “why torsion-free”? Our first answer was that it is a natural expectation, based on the commutativity of partial derivatives in the euclidean setting. Our second answer is that torsion-free is a matter of convenience:

[1] Lee, Proposition 4.5.

[2] See Lee, Lemma 4.1 for a proof. We prove a stronger statement in Lemma 3.28.

[3] See the ‘American football example’, Lee, Problem 6-1.