TL;DR: The BAC-CAB vector identity is probably the most important vector identity, and has potentially important applications in introductory physics. I present six coordinate-free derivations of this identity. By “coordinate-free” I mean a derivation that doesn’t rely on any particular coordinate system, and one that relies on the inherent geometric relationships among the vectors involved.
I have been on a quest for a coordinate-free derivation of the ubiquitous BAC-CAB vector identity
$$\vec{A}\times(\vec{B}\times\vec{C}) = (\vec{A}\bullet\vec{C})\,\vec{B} - (\vec{A}\bullet\vec{B})\,\vec{C}$$
for a long time. (Incidentally, we tell students to remember it as “BAC-CAB” but we rarely, if ever, write it that way.) The usual derivations either expand both sides in Cartesian components and show that they are equal, or use the Levi-Civita symbol and index notation. The former is tedious; the latter is elegant but lacks geometry. In my extensive web searches I stumbled onto a beautiful coordinate-free derivation cast in the language of differential forms. I understand parts of it, and I eventually want to understand it completely, because it’s closer to what I thought I was originally looking for: something analytical, with less emphasis on geometry than the geometric derivations presented here. Then I changed my mind and decided I wanted something based as much as possible on geometry after all, and I finally found several derivations instead of just one.
Remember that I’m writing this as I would explain it to introductory physics students, so I will try to emphasize fine points that may otherwise go unnoticed. I assume the reader has previously been introduced to dot products and cross products. I will address how to introduce these two concepts in future posts rather than here.
Derivation I: A Derivation Based on Index Notation
I didn’t invent this derivation, which mixes tensor index notation with traditional symbolic notation, uses the Levi-Civita symbol, and explicitly includes basis vectors, which are usually left out of such derivations. I include them here in anticipation of future posts. In my opinion, this derivation is best described in these notes by Ben-Yaacov and Roig. It is efficient, but it is not geometric in nature. In fact, I think it inherently hides the underlying geometry, but I also think it’s a valuable derivation to know.
Recall that in index notation, vectors are represented as components (coefficients) multiplying basis vectors, summed over all such pairs: $\vec{A} = A_i\,\hat{e}_i$ and $\vec{B} = B_j\,\hat{e}_j$. The dot product of these two vectors would be notated as $\vec{A}\bullet\vec{B} = A_iB_i$. The cross product of these two vectors would be notated as $\vec{A}\times\vec{B} = \epsilon_{ijk}A_iB_j\,\hat{e}_k$. The dot product of two orthonormal basis vectors would be notated as $\hat{e}_i\bullet\hat{e}_j = \delta_{ij}$. Finally, the cross product of two orthonormal basis vectors would be notated as $\hat{e}_i\times\hat{e}_j = \epsilon_{ijk}\,\hat{e}_k$. For the purposes of this post, I will assume the reader is already familiar with the Levi-Civita symbol and its properties, the Einstein summation convention, and other aspects of index notation.
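As a quick sanity check on this notation, here is a minimal Python sketch (plain lists, standard library only) that implements the Levi-Civita symbol and the Kronecker delta for indices 0, 1, 2 (Python’s zero-based stand-ins for 1, 2, 3) and computes the cross product literally as the sum $\epsilon_{ijk}A_iB_j\hat{e}_k$. The function names and test vectors are my own, chosen for illustration.

```python
from itertools import product


def levi_civita(i, j, k):
    """Levi-Civita symbol for indices in {0, 1, 2}.

    Returns +1 for an even permutation of (0, 1, 2), -1 for an odd
    permutation, and 0 if any index repeats. For these index values the
    product (i - j)(j - k)(k - i) is exactly +2, -2, or 0.
    """
    return (i - j) * (j - k) * (k - i) // 2


def kronecker_delta(i, j):
    return 1 if i == j else 0


def basis(i):
    """Orthonormal basis vector e_i as a list of components."""
    return [kronecker_delta(i, m) for m in range(3)]


def dot_index(a, b):
    """A . B = A_i B_i, summed over the repeated index i."""
    return sum(a[i] * b[i] for i in range(3))


def cross_index(a, b):
    """A x B = eps_{ijk} A_i B_j e_k, summed over i, j, and k."""
    result = [0, 0, 0]
    for i, j, k in product(range(3), repeat=3):
        result[k] += levi_civita(i, j, k) * a[i] * b[j]
    return result


# The basis-vector rules: e_i . e_j = delta_ij and e_i x e_j = eps_ijk e_k
for i, j in product(range(3), repeat=2):
    assert dot_index(basis(i), basis(j)) == kronecker_delta(i, j)
    assert cross_index(basis(i), basis(j)) == [levi_civita(i, j, k) for k in range(3)]

A = [1, -2, 3]
B = [4, 0, -1]
print(dot_index(A, B))    # 1
print(cross_index(A, B))  # [2, 13, 8]
```

Computing the symbol from the product of index differences avoids a lookup table, but an explicit dictionary of the six nonzero permutations would work just as well.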
Here is the derivation.
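Written out with one particular choice of index labels (chosen here for illustration, so other write-ups may name the indices differently), the steps read as follows, one line per step description:

```latex
\begin{aligned}
\vec{A}\times(\vec{B}\times\vec{C})
  &= A_i\,\hat{e}_i \times \left(\epsilon_{jkl}\,B_j C_k\,\hat{e}_l\right) \\
  &= \epsilon_{jkl}\,A_i B_j C_k\,\left(\hat{e}_i \times \hat{e}_l\right) \\
  &= \epsilon_{jkl}\,A_i B_j C_k\,\epsilon_{ilm}\,\hat{e}_m \\
  &= \epsilon_{jkl}\,\epsilon_{ilm}\,A_i B_j C_k\,\hat{e}_m \\
  &= \left(\delta_{jm}\delta_{ki} - \delta_{ji}\delta_{km}\right) A_i B_j C_k\,\hat{e}_m \\
  &= \delta_{jm}\delta_{ki}\,A_i B_j C_k\,\hat{e}_m - \delta_{ji}\delta_{km}\,A_i B_j C_k\,\hat{e}_m \\
  &= \left(A_i C_k\,\delta_{ki}\right)\left(B_j\,\hat{e}_m\,\delta_{jm}\right) - \left(A_i B_j\,\delta_{ji}\right)\left(C_k\,\hat{e}_m\,\delta_{km}\right) \\
  &= \left(A_i C_i\right)\left(B_j\,\hat{e}_j\right) - \left(A_i B_i\right)\left(C_k\,\hat{e}_k\right) \\
  &= (\vec{A}\bullet\vec{C})\,\vec{B} - (\vec{A}\bullet\vec{B})\,\vec{C}
\end{aligned}
```

In the fifth line the shared dummy index is $l$. Cyclically, $\epsilon_{jkl} = \epsilon_{ljk}$ and $\epsilon_{ilm} = \epsilon_{lmi}$, so the remaining index pairs are $(j, k)$ and $(m, i)$. First gives $\delta_{jm}$, Last gives $\delta_{ki}$, Outer gives $\delta_{ji}$, and Inner gives $\delta_{km}$.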
Here is a description for each step.
- Write the righthand side in index notation, showing only the innermost cross product written with the Levi-Civita symbol. Note that there is no free index in this expression. All the indices are dummy indices.
- Rearrange the righthand side to bring the Levi-Civita symbol to the leftmost position and the cross product of two basis vectors to the rightmost position. This is legal because each of these factors is simply a scalar (a real number), so all the factors on the righthand side, excluding the remaining cross product, commute. Note that every index is repeated, and therefore is a dummy index.
- Rewrite the basis vector cross product in terms of a second Levi-Civita symbol. Pay particular attention to the names of the indices.
- Rearrange the righthand side to bring the second Levi-Civita symbol to the right of the first one. Keep the remaining basis vector at the end of the line.
- Rewrite the product of the two Levi-Civita symbols as the difference of two products of Kronecker deltas. The formal logic behind this step is very confusing, and a couple of years ago I invented a way to do it quickly, a way that I have never seen in the literature. It is inspired by a strategy shown to algebra students for multiplying two binomials, the FOIL method. FOIL is an acronym for First, Outer, Inner, Last, which indicates the combinations of terms to be multiplied and in what order. I understand that many mathematics instructors frown on the use of the FOIL method because it’s a shortcut that removes the underlying reasoning. Still, I will show my quick way and let the reader decide on its appropriateness. I call this the FLOI method, a name I remember by thinking about “Floyd the barber” from my all-time favorite TV program, The Andy Griffith Show. The FLOI method calls for finding the dummy index in the Levi-Civita symbol product. Relative to that index, identify the First, Last, Outer, and Inner indices as in multiplying binomials in algebra. Now, for each of First, Last, Outer, and Inner, write a Kronecker delta with the corresponding pair of indices, and subtract the product of the second two (Outer, Inner) from the product of the first two (First, Last). This is where the minus sign first appears.
- Use the distributive property to expand the righthand side.
- The next step is to reorder the factors in each term to allow the Kronecker deltas to do their job, which is to pick out an index that survives the underlying summation. However, there is a problem. How do we know which vector to associate with each Kronecker delta? Look carefully at the indices. To associate a component with a Kronecker delta, the component must share an index with the Kronecker delta. Otherwise, the Kronecker delta can’t do its job. Thus, the first Kronecker delta can be associated with or ; we choose the latter. The second Kronecker delta can be associated with either of the remaining components, because the Kronecker delta’s action will automatically turn one component’s index into the other component’s index; we choose . This is very cool! Either way, you’ll end up with a dummy index that will go away when we write the final result in vector notation. So the third Kronecker delta must be associated with or ; we choose the latter. Finally, the fourth can be associated with or ; we again choose the latter. I arbitrarily write each Kronecker delta to the immediate right of the factor on which it operates. This is merely a convention, one I have not seen addressed in the literature. Feel free to ignore it. As I was writing this, I realized another way to think about this step. The combination is just the vector or . Of course it doesn’t matter which index you use because it’s a dummy index. The combination is the dot product or . Again, it doesn’t matter which index you use because it will end up being a dummy index. I added parentheses for clarity in associating factors.
- Let the Kronecker deltas do their job on the indices of the components immediately to the left of each delta. You end up with two dummy indices on each side of the subtraction sign. It just works. At first, there appears to be an error because you end up with one dummy index used twice on the righthand side, but there is no error because each use is restricted to one term. I added parentheses for clarity in associating factors.
- Rewrite the righthand side in full vector notation. Remember that two adjacent components with the same index constitute a dot product, and a component adjacent to a basis vector with the same index constitutes a vector. Also, in LaTeX I recommend using \bullet to indicate dot products rather than \cdot because the latter also indicates scalar multiplication of real numbers or variables that represent them algebraically. The parentheses aren’t required because the dot product is unambiguous, but they’re traditionally included.
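The FLOI shortcut rests on the standard contraction identity $\epsilon_{ijk}\epsilon_{imn} = \delta_{jm}\delta_{kn} - \delta_{jn}\delta_{km}$ (shared index in the first slot of both symbols). Because each index takes only three values, a brute-force check over every combination is cheap. The minimal Python sketch below does that, then sums $(\delta_{jm}\delta_{ki} - \delta_{ji}\delta_{km})A_iB_jC_k\hat{e}_m$, which is the form reached after the FLOI step with one particular choice of index labels, and compares it with a direct computation of $\vec{A}\times(\vec{B}\times\vec{C})$ and with $(\vec{A}\bullet\vec{C})\vec{B} - (\vec{A}\bullet\vec{B})\vec{C}$. The labels, function names, and test vectors are my own, chosen for illustration.

```python
from itertools import product


def levi_civita(i, j, k):
    # +1 / -1 for even / odd permutations of (0, 1, 2), 0 if an index repeats
    return (i - j) * (j - k) * (k - i) // 2


def delta(i, j):
    return 1 if i == j else 0


def contraction_holds():
    """Check eps_{ijk} eps_{imn} = delta_jm delta_kn - delta_jn delta_km for all j, k, m, n."""
    for j, k, m, n in product(range(3), repeat=4):
        lhs = sum(levi_civita(i, j, k) * levi_civita(i, m, n) for i in range(3))
        rhs = delta(j, m) * delta(k, n) - delta(j, n) * delta(k, m)
        if lhs != rhs:
            return False
    return True


def double_cross_via_deltas(a, b, c):
    """Sum (delta_jm delta_ki - delta_ji delta_km) A_i B_j C_k e_m over i, j, k, m."""
    result = [0, 0, 0]
    for i, j, k, m in product(range(3), repeat=4):
        coeff = delta(j, m) * delta(k, i) - delta(j, i) * delta(k, m)
        result[m] += coeff * a[i] * b[j] * c[k]
    return result


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def bac_cab(a, b, c):
    """(A . C) B - (A . B) C"""
    return [dot(a, c) * bi - dot(a, b) * ci for bi, ci in zip(b, c)]


A, B, C = [1, -2, 3], [4, 0, -1], [2, 5, -3]
print(contraction_holds())               # True
print(cross(A, cross(B, C)))             # [-70, -5, 20]
print(double_cross_via_deltas(A, B, C))  # [-70, -5, 20]
print(bac_cab(A, B, C))                  # [-70, -5, 20]
```

A numerical check like this proves nothing in general, but it is a fast way to catch a sign or index-order slip when applying FLOI by hand.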
In preparation for more formal work in which dummy indices must occur in upper-lower pairs, one could rewrite this derivation to make the vector components have upper (contravariant) indices and the basis vectors have lower (covariant) indices, with appropriate adjustments to the Levi-Civita symbols’ indices. I may modify this post to reflect that sometime in the future.
Derivation II: A Derivation Based on Geometry
I did not invent this derivation either. I found it here, as part of a much larger online text. It is sufficiently clever that I think it should be more widely seen. I used to think it would be appropriate for introductory physics courses (provided vectors are introduced more carefully than usual, with an extreme emphasis on coordinate-free geometry), but now I’m not so sure because I’ve run into a problem with it. Nevertheless, several aspects of the derivation pique my interest.
- It relies on the fact that any vector can be resolved into components parallel to, and perpendicular to, another vector.
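- Given the parallel and perpendicular components relative to another vector, dot products and cross products can be thought of in a slightly different way that I’d never realized. If $\vec{B}_\parallel$ and $\vec{B}_\perp$ are the components of $\vec{B}$ parallel and perpendicular to $\vec{A}$, then $\vec{A}\bullet\vec{B}$ can be written as $\vec{A}\bullet(\vec{B}_\parallel + \vec{B}_\perp)$ and finally as $\vec{A}\bullet\vec{B}_\parallel$ because the perpendicular component doesn’t survive the dot product. Similarly, $\vec{A}\times\vec{B}$ can be written as $\vec{A}\times(\vec{B}_\parallel + \vec{B}_\perp)$ and finally as $\vec{A}\times\vec{B}_\perp$ because the parallel component doesn’t survive the cross product. These truths are so obvious that I don’t recall noticing them before now, and that bothers me.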
- I exploit the fact that a vector can be “factored” into a magnitude and direction: $\vec{A} = A\,\hat{A}$. This isn’t really as amazing as it may seem, because it’s (almost) the same thing as saying that the vector can be expressed as the sum of products of corresponding components and basis vectors. Nevertheless, I emphasize this property here because it prevents the misunderstanding that kept me from being able to reproduce this proof, as I describe below.
- Almost all of the derivation takes place in a plane and is easy to visualize.
- Note the author refers to $\vec{A}\times(\vec{B}\times\vec{C})$ as a double cross product rather than the usual triple cross product. This is an entirely intuitive name because there are two cross products involved, not three. There are indeed three operands, and I get why that is used to name the quantity. Still, I prefer the new term even though I’ve been admonished before that I’m not allowed to invent new and better names without “the community’s permission.” Well, since that “warning” came from a kind of elitism, I appeal to the same elitism in adopting the terminology used at one of the most elite physics schools on the planet. *Charles Emerson Winchester smirk*
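The parallel/perpendicular bookkeeping in the second bullet, and the magnitude-direction “factoring” in the third, are easy to spot-check numerically. The minimal Python sketch below resolves $\vec{B}$ into components parallel and perpendicular to $\vec{A}$ and confirms that only the parallel part survives the dot product and only the perpendicular part survives the cross product. The helper names and test vectors are hypothetical, chosen so the arithmetic comes out exact.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def scale(s, v):
    return [s * x for x in v]


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def close(u, v, tol=1e-12):
    return all(math.isclose(x, y, abs_tol=tol) for x, y in zip(u, v))


def resolve(b, a):
    """Split b into components parallel and perpendicular to a."""
    b_par = scale(dot(b, a) / dot(a, a), a)
    return b_par, sub(b, b_par)


A = [1.0, 2.0, 2.0]
B = [3.0, 0.0, 3.0]
B_par, B_perp = resolve(B, A)

# Only the parallel component survives the dot product ...
assert math.isclose(dot(A, B), dot(A, B_par))
assert math.isclose(dot(A, B_perp), 0.0, abs_tol=1e-12)
# ... and only the perpendicular component survives the cross product.
assert close(cross(A, B), cross(A, B_perp))
assert close(cross(A, B_par), [0.0, 0.0, 0.0])

# "Factoring" a vector into magnitude and direction: A = |A| A_hat
magnitude = math.sqrt(dot(A, A))
A_hat = scale(1 / magnitude, A)
assert close(scale(magnitude, A_hat), A)

print(B_par, B_perp)  # [1.0, 2.0, 2.0] [2.0, -2.0, 1.0]
```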
Because this derivation is inherently geometric, I think it is best presented operationally, as a sequence of steps that can be carried out either on paper or better yet in VPython or GlowScript. I will come back and add links to either a GlowScript or Trinket app that illustrates the derivation.
Here is the derivation, which assumes no two vectors are collinear. As in the previous derivation, I will show the mathematical steps first and then the corresponding description. For clarity, I show many more intermediate steps than the original source shows.
- Let be the projection of onto the plane containing and . Let and be vector components of relative to . Substitute for and for into the original expression. Then “factor” each vector into a magnitude and direction. Simplify the resulting expression, noting that it is “naturally” factored into a magnitude (scalar product of three magnitudes actually) and a direction (double cross product of three directions actually).
- Resolve the direction of into vector components along and .
- Construct a vector orthogonal to using the same components from the previous step by interchanging them and negating one of them. Geometry dictates which one to negate. Draw a diagram! This is where the negative is first introduced. Note that the magnitude of this newly constructed vector is the same as that of . Geometrically, we’re merely rotating by ninety degrees in the appropriate direction.
- In the final expression from step (1), replace the direction of with that of . You should now be able to distribute the magnitudes on the righthand side to get an expression in terms of vectors rather than magnitudes and directions.
- Exploit the fact that and have the same direction, and therefore the projection of onto either will be the same, and therefore their difference will be zero. Note the order of the resulting subtraction, foreshadowing the final result.
- Add the final results of steps (4) and (5), which is nothing more than adding zero to both sides. Adding zero changes nothing algebraically, but in this case it allows for some algebra to take place.
- Distribute the magnitudes on the lefthand side. Replace every occurrence of with on both sides and with on the lefthand side, remembering that this doesn’t change the outcome as proven in step (1). The identity is proven.
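Because this derivation is meant to be carried out operationally, here is a minimal Python sketch of two geometric facts it leans on. It assumes (my reading of step (1), not confirmed by the source) that $\vec{A}$ is the vector projected onto the plane of $\vec{B}$ and $\vec{C}$, and that $\vec{B}$ is replaced by its component perpendicular to $\vec{C}$. First, neither replacement changes $\vec{A}\times(\vec{B}\times\vec{C})$. Second, in the orthonormal in-plane basis $\hat{u} = \hat{B}_\perp$, $\hat{v} = \hat{C}$, crossing an in-plane unit vector $a\hat{u} + c\hat{v}$ with $\hat{u}\times\hat{v}$ gives $c\hat{u} - a\hat{v}$: the same two components interchanged with one negated, which is a ninety-degree rotation. All names and test vectors are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def scale(s, v):
    return [s * x for x in v]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def unit(v):
    return scale(1 / math.sqrt(dot(v, v)), v)


def close(u, v, tol=1e-12):
    return all(math.isclose(x, y, abs_tol=tol) for x, y in zip(u, v))


# Hypothetical test vectors; no two are collinear
A = [1.0, 2.0, 3.0]
B = [2.0, 1.0, 0.0]
C = [0.0, 1.0, 1.0]

n = cross(B, C)                                   # normal to the B-C plane
A_proj = sub(A, scale(dot(A, n) / dot(n, n), n))  # A projected onto that plane
B_perp = sub(B, scale(dot(B, C) / dot(C, C), C))  # part of B perpendicular to C

# Fact 1: the replacements leave the double cross product unchanged
assert close(cross(A, cross(B, C)), cross(A_proj, cross(B_perp, C)))

# Fact 2: the double cross product of directions is a 90-degree in-plane rotation
u, v = unit(B_perp), unit(C)         # orthonormal, since B_perp is perpendicular to C
a_hat = unit(A_proj)
a, c = dot(a_hat, u), dot(a_hat, v)  # a_hat = a u + c v
rotated = cross(a_hat, cross(u, v))
assert close(rotated, add(scale(c, u), scale(-a, v)))         # swapped, one negated
assert math.isclose(dot(rotated, rotated), 1.0)               # same magnitude as a_hat
assert math.isclose(dot(rotated, a_hat), 0.0, abs_tol=1e-12)  # orthogonal to a_hat
```

Which component gets negated depends on the orientation of $\hat{u}\times\hat{v}$; with the opposite normal, the other component would carry the minus sign, which is why drawing a diagram matters in step (3).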
This derivation gave me a lot of difficulty initially. For some reason, I got it into my head that had both the same magnitude and direction as when actually the two only have the same direction. I don’t know how many hours I wasted trying to reconcile the resulting discrepancies before I saw the error in my reasoning.
Derivation III: Another Derivation Based on Geometry
This derivation is also geometric in nature, but conceptually simpler than the previous derivation. It is slightly more algebraic than geometric though. I found it on pages 19 and 20 of this excellent textbook. In my printing, there is a typo in equation (8). should be . This derivation rotates one of the vectors to exploit some geometry, and also exploits the properties of the mixed product (aka triple scalar product, an illogical name if ever there were one). This derivation could be modified to project onto the plane containing the other two vectors, as in the previous derivation, but the authors chose not to do so. It is again assumed that no two vectors are collinear.
- Reason that the final result must be a linear combination of and .
- Rotate by clockwise in the plane containing and and call the resulting vector . Dot each side with .
- Exploit the cyclic property of the mixed product, and show that the intermediate quantity (also incidentally a double cross product) involving can be written as a multiple of . Notice the geometry exploited in steps (3c) and (3d). That’s the utility of using .
- Dot both sides of (3f) with .
- Compare (2b) and (4) to solve for .
- Dot both sides of (1) with , exploit the properties of the mixed product, substitute for , and solve for .
- Substitute into (1) and the identity is proven.
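Step (1) says the result must be a linear combination of the two vectors inside the parentheses. Assuming the double cross product in question is $\vec{A}\times(\vec{B}\times\vec{C})$, a minimal Python sketch can recover the two coefficients numerically and compare them with $\vec{A}\bullet\vec{C}$ and $-\vec{A}\bullet\vec{B}$. Solving the 2×2 normal equations with Cramer’s rule is my substitution for illustration, not part of the derivation; the function name and test vectors are hypothetical. The sketch also confirms that the result is perpendicular to $\vec{B}\times\vec{C}$ (so it lies in the plane of $\vec{B}$ and $\vec{C}$) and to $\vec{A}$.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def coefficients(v, b, c):
    """Find x, y with v = x b + y c by solving the 2x2 normal equations
        [b.b  b.c] [x]   [v.b]
        [b.c  c.c] [y] = [v.c]
    with Cramer's rule. Requires b and c not collinear (det != 0)."""
    bb, bc, cc = dot(b, b), dot(b, c), dot(c, c)
    vb, vc = dot(v, b), dot(v, c)
    det = bb * cc - bc * bc
    return (vb * cc - vc * bc) / det, (bb * vc - bc * vb) / det


A, B, C = [1, -2, 3], [4, 0, -1], [2, 5, -3]
v = cross(A, cross(B, C))

# The result lies in the plane of B and C and is perpendicular to A
assert dot(v, cross(B, C)) == 0
assert dot(v, A) == 0

x, y = coefficients(v, B, C)
print(x, y)                  # -17.0 -1.0
print(dot(A, C), dot(A, B))  # -17 1
```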
Derivation IV: A Straightforward Algebraic Derivation
This derivation is important because it is found in the definitive vector analysis textbook, that of Wilson, which is based on Gibbs’ notes. It consists of two parts: the first establishes an identity used in the second, and the second calculates the result as a linear combination of , , and .
To begin the first part, resolve into a component parallel to and a component perpendicular to . Let be the angle between (or ) and .
Now for the second part.
Derivation V: Gibbs’ Other Derivation
This derivation, also found in Wilson, is also due to Gibbs. It is very similar to the next derivation, but I include it here for completeness. An essentially identical version is found on pages 7 and 8 of Tai’s excellent book.
Derivation VI: A Derivation Based on Linearity
This is the slickest derivation, but certainly not the least geometric. It is coordinate-free and framed in the spirit of MTW’s approach, and for that reason I think it should be the first derivation students see. It requires a slightly different approach to vectors, as you will see. Although it closely resembles the previous derivation, I will repeat those steps here using Tai’s notation.
- Begin by assuming the result is a linear combination of and . Dot each side with , and treat the lefthand side as a triple scalar product. It must be zero because it contains one vector twice. Therefore, the righthand side must also be zero. Now we can relate the two coefficients. I arbitrarily chose to solve for for no particular reason. You could also solve for .
- Substitute the expression for into the original equation. Factor out both and and combine them into one constant, .
- Here is where the most interesting part of the derivation happens. The double cross product on the lefthand side is linear in each vector argument. The dot products and scalar products on the righthand side are also linear in each vector argument. This is nothing more than a formal way of saying that if we replace any one of , , or by , , or where is a scalar (real number for our purposes) the entire expression is also scaled by that same . The fact that we get the same scaled expression regardless of which vector we replace implies that the entire expression cannot depend on our choices for , , and . Thus, to evaluate we can use any vectors we want in the lefthand and righthand sides. Let’s make things simple for us, but let’s also not choose a coordinate system. Let’s temporarily assume that , , and are mutually orthogonal. Let’s also let . Now we can evaluate both the lefthand and righthand sides for these arbitrarily, but also strategically, chosen input vectors.
- The result from the previous step allows us to deduce that .
- The identity is proven.
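Here is a minimal Python sketch of the linearity argument, under my own assumptions about notation: write the claim as $\vec{A}\times(\vec{B}\times\vec{C}) = k\,[(\vec{A}\bullet\vec{C})\vec{B} - (\vec{A}\bullet\vec{B})\vec{C}]$ for a single constant $k$ (the letter is my choice). The sketch checks that scaling any one argument by a scalar scales both sides by the same scalar, then evaluates $k$ for one convenient choice of test vectors, $\vec{A} = \vec{B} = \vec{U}$ and $\vec{C} = \vec{V}$ with $\vec{U}\perp\vec{V}$. That choice is hypothetical and not necessarily the one made in the steps above.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def scale(s, v):
    return [s * x for x in v]


def lhs(a, b, c):
    """Double cross product A x (B x C)."""
    return cross(a, cross(b, c))


def bracket(a, b, c):
    """(A . C) B - (A . B) C, the right-hand side without the constant k."""
    return [dot(a, c) * bi - dot(a, b) * ci for bi, ci in zip(b, c)]


def close(u, v, tol=1e-9):
    return all(math.isclose(x, y, rel_tol=tol, abs_tol=tol) for x, y in zip(u, v))


A, B, C = [1, -2, 3], [4, 0, -1], [2, 5, -3]
lam = 2.5

# Both sides are linear in each argument: scaling one input scales the output
for f in (lhs, bracket):
    base = f(A, B, C)
    assert close(f(scale(lam, A), B, C), scale(lam, base))
    assert close(f(A, scale(lam, B), C), scale(lam, base))
    assert close(f(A, B, scale(lam, C)), scale(lam, base))

# Hypothetical strategic choice: A = B = U and C = V with U perpendicular to V
U, V = [1, 2, 2], [2, 1, -2]
assert dot(U, V) == 0
left, right = lhs(U, U, V), bracket(U, U, V)
k = dot(left, right) / dot(right, right)  # left = k * right, so project to find k
print(left, right, k)  # [-18, -9, 18] [-18, -9, 18] 1.0
```

Note that the choice matters: if all three inputs were mutually orthogonal, both sides would vanish and the equation $0 = k \cdot 0$ would not determine $k$.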
Writing this post took the better part of four months, mostly because in the process of recreating the various derivations I stumbled onto some very interesting problems. In a corollary post to this one I will highlight one or more of these problems. I have looked at this post so many times that I probably have some existing typos and maybe even some notation errors. Just let me know if you find any and I will fix them. I also need to add annotations for derivations IV and V. I just really want to get this post published since it’s taken so long.
As always, feedback and comments are welcome.