In this chapter, we will see some more set-theoretic constructs, but we will also introduce their category-theoretic counterparts in an effort to gently introduce the concept of a category itself.
When we are finished with that, we will try (and almost succeed) to define categories from scratch, without actually relying on set theory.
In the previous chapter, we needed a way to construct a set whose elements are composed of the elements of some other sets e.g. when we discussed mathematical functions, we couldn’t define $+$ and $-$ because we could only formulate functions that take one argument. Similarly, when we introduced the primitive types in programming languages, like `Char` and `Number`, we mentioned that most of the types that we actually use are composite types. So how do we construct those?
The simplest composite type, built from a set $B$ that contains $b$’s and a set $Y$ that contains $y$’s, is the Cartesian product of $B$ and $Y$, that is, the set of ordered pairs that contain one element of the set $B$ and one element of the set $Y$. Or formally speaking: $B \times Y = \{ (b, y) \mid b \in B, y \in Y \}$ ($\in$ means “is an element of”).
It is denoted $B \times Y$ and it comes equipped with two functions for retrieving the $b$ and the $y$ from each $(b, y)$.
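To make this more concrete, here is a minimal sketch in TypeScript (the type `Pair` and the functions `pair`, `fst` and `snd` are illustrative names of our own, not from any library): a product of two types, together with its two “getter” functions.

```typescript
// A product of two types: an ordered pair holding one value of each.
type Pair<B, Y> = { first: B; second: Y };

const pair = <B, Y>(b: B, y: Y): Pair<B, Y> => ({ first: b, second: y });

// The two functions that come with every product: they retrieve the components.
const fst = <B, Y>(p: Pair<B, Y>): B => p.first;
const snd = <B, Y>(p: Pair<B, Y>): Y => p.second;

const example = pair("blond", 1);
console.log(fst(example)); // "blond"
console.log(snd(example)); // 1
```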
Question: Why is this called a product? Hint: How many elements does it have?
The Cartesian product is named after the mathematician and philosopher René Descartes, who used the idea as a basis for the Cartesian coordinate system (although the connection between the two names is not obvious, as they use the Latinized version of his name, Cartesius).
Most people know how a Cartesian coordinate system works, but an equally interesting question, which few people think about, is how we can define it using sets and functions.
A Cartesian coordinate system consists of two perpendicular lines situated on a Euclidean plane, together with some kind of function-like mapping that connects each point on these two lines to a number, representing the distance between the point being mapped and the lines’ point of intersection (which is mapped to the number $0$).
Using this construct (as well as the concept of a Cartesian product), we can describe not only the points on the lines but any point on the Euclidean plane. We do that by measuring the distance between the point and those two lines.
And, since the point is the main primitive of Euclidean geometry, the coordinate system allows us to also describe all kinds of geometric figures such as this triangle (which is described using products of products).
So we can say that the Cartesian coordinate system is some kind of function-like mapping between sets of products of numbers and geometric figures that correspond to those numbers, using which we can derive some properties of the figures from the numbers (for example, using the products in the example below, we can compute that the triangle they represent has a base of $6$ units and a height of $5$ units).
What’s even more interesting is that this mapping is one-to-one, which makes the two realms isomorphic (traditionally we say that the point is completely described by the coordinates, which is the same thing).
With that, our effort to represent Cartesian coordinates with sets is complete, except we haven’t described the mapping that connects points and numbers. The mapping resembles a function and exhibits all relevant properties of one (it is a many-to-one mapping, or one-to-one in this case); however, as we said, functions are mappings between sets, and in this case, even if we can think of the coordinate system as a set (of points and figures), geometrical figures are definitely not just sets, as they have a lot of additional things going on (or additional structure, as a category theorist would say). So, defining this mapping formally would require us to also formalize both geometry and algebra, and moreover to do so in a way in which they would be compatible with one another. These are some of the ambitions of category theory, and this is what we will attempt to do later in this book.
But before we continue with that, let’s see some other neat uses of products.
In the previous chapter, we established the correspondence of various concepts in programming languages and set theory — sets resemble types, and functions resemble methods/subroutines. This picture is made complete with products, which are like stripped-down classes (also called records or structs) — the sets that form the product correspond to the class’s properties (also called members) and the functions for accessing them are like what programmers call getter methods e.g. the famous object-oriented example of a `Person` class with `name` and `age` fields is nothing more than a product of the set of strings and the set of numbers. And objects with more than two values can be expressed as products whose components are themselves products.
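Here is a minimal sketch of that correspondence in TypeScript (the `Person` and `Address` types and their getters are just illustrative names of our own):

```typescript
// A Person "class" is nothing more than a product of the set of strings
// and the set of numbers...
type Person = { name: string; age: number };

// ...and its getter methods are just the product's two retrieval functions.
const getName = (p: Person): string => p.name;
const getAge = (p: Person): number => p.age;

// An object with more than two values: a product whose components are products.
type Address = { city: string; street: string };
type PersonWithAddress = { person: Person; address: Address };
```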
Products can also be used for expressing functions that take more than one argument (and this is indeed how multi-parameter functions are implemented in languages that actually have tuples, like the ones from the ML family). For example, “plus” is a function from the set of products of two numbers to the set of numbers, so, $+: \mathbb{Z} \times \mathbb{Z} → \mathbb{Z}$.
By the way, such functions (ones that take two objects of one type and return a third object of the same type) are called operations.
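Sketched in TypeScript (using the language’s built-in tuple type), `plus` is a one-argument function whose single argument happens to be a product:

```typescript
// "plus" is a function from the set of pairs of numbers to the set of numbers,
// i.e. +: Z × Z -> Z, taking its two arguments bundled into one product.
const plus = ([x, y]: [number, number]): number => x + y;

console.log(plus([1, 2])); // 3
```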
A product is, as we said, a set of ordered pairs (formally speaking, $A \times B ≠ B \times A$). So, to define a product, we must first define the concept of an ordered pair. How can we do that?
Note that an ordered pair of elements is not just a set containing the two elements (that would be an _unordered pair_) but it also contains information about which of those objects comes first and which one goes second in the pair. In programming, we have the ability to assign names to each member of an object, which accomplishes the same purpose.
The order of elements in the pair is important. While some mathematical operations (such as addition) don’t care about order, others (such as subtraction) do. In programming, when we manipulate an object we obviously want to access a specific property of the object, not just any random property.
So does that mean that we have to define ordered pairs as a “primitive” type, like we defined sets, if we want to use them? That’s possible, but there is another approach: if we can define a construct that is isomorphic to the ordered pair using only sets, we can use that construct instead. And mathematicians have come up with multiple ingenious ways to do that. Here is the first one, which was suggested by Norbert Wiener in 1914. Note the smart use of the fact that the empty set is unique.
The next one was suggested by Felix Hausdorff in the same year. In order to use that one, we just have to define $1$ and $2$ first.
Suggested in 1921 by Kazimierz Kuratowski, this one uses just the components of the pair.
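For reference, here is how the three encodings are usually written out (these are the standard formulations of the definitions mentioned above):

- Wiener (1914): $(a, b) = \{\{\{a\}, \emptyset\}, \{\{b\}\}\}$
- Hausdorff (1914): $(a, b) = \{\{a, 1\}, \{b, 2\}\}$, where $1$ and $2$ are two objects distinct from $a$ and $b$
- Kuratowski (1921): $(a, b) = \{\{a\}, \{a, b\}\}$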
The product definitions presented in the previous section worked by zooming in on the individual elements of the product and seeing what they can be made of. We may think of this as a low-level approach to the definition. This time we will try to do the opposite — we will try to be as oblivious to the contents of our sets as possible i.e. instead of zooming in we will zoom out. We will attempt to fly over the difficulties that we met in the previous section and provide a definition of a product in terms of functions and external diagrams.
To define products in terms of external diagrams, we must, given two sets, devise a way to pinpoint the set that is their product, by looking at the functions that come from/to them. To achieve that, we must first ask ourselves what functions are defined for all products. The answer: we have two of those — the functions for retrieving the two elements of the pair (the “getters”, so to say).
Formally, if we have a set $G$ which is the product of sets $Y$ and $B$, then we should also have functions which give us back the elements of the product, so $G → Y$ and $G → B$. Now let’s switch to the external view.
This diagram already provides a definition of what a product is, but not a complete one, as the product of sets $Y$ and $B$ is not the only set for which such functions can be defined. For example, a set of triples $Y \times B \times R$ for any set $R$ also qualifies. And if there happens to be a function from $Y$ to $B$, then the set $Y$ itself meets our condition for being the product, because it is connected to $B$ (via that function) and to itself (via the identity function).
However, the set of triples $Y \times B \times R$ is connected to $Y$ and $B$ only because it can be converted to $Y \times B$: the arrow $Y \times B \times R \to B$ is just the composition of the arrow $Y \times B \times R \to Y \times B$ and the arrow $Y \times B \to B$. The same reasoning applies to all other objects that can take the place of our product.
(Intuitively, all such objects would be more complex than the pair objects, and you can always have a function that converts a more complex structure to a simpler one (we saw an example of this when we covered the functions that convert subsets to their supersets).)
More formally, if we suppose that there is a set $I$ that can serve as an impostor of the product of sets $B$ and $Y$ (i.e. that $I$ is such that there exist two functions, which we will call $ib: I → B$ and $iy: I → Y$, that allow us to derive elements of $B$ and $Y$ from it), then there must also exist a unique function with the type signature $I → B \times Y$ that converts the impostor to the real product, such that $ib$ and $iy$ are just the results of composing that function with the usual “getter” functions that go from the pair to the elements. In other words, whichever object we pick for $I$, this diagram would commute.
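Here is a minimal TypeScript sketch of this universal property (the impostor type `Triple` and all function names are illustrative): whatever the impostor is, its two “getters” factor through a unique conversion function into the real pair.

```typescript
type Pair<B, Y> = { first: B; second: Y };

// If an "impostor" set I comes with functions ib: I -> B and iy: I -> Y,
// there is exactly one function I -> B × Y compatible with them:
const toPair =
  <I, B, Y>(ib: (i: I) => B, iy: (i: I) => Y) =>
  (i: I): Pair<B, Y> => ({ first: ib(i), second: iy(i) });

// Example impostor: triples carry more information than the pair needs.
type Triple = { y: string; b: number; r: boolean };
const iy = (t: Triple): string => t.y;
const ib = (t: Triple): number => t.b;

// The unique conversion; composing it with the pair's getters gives back ib and iy,
// which is exactly what it means for the diagram to commute.
const convert = toPair(ib, iy);
```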
In category theory, this type of property that a given object might possess (participating in a structure such that all similar objects can be converted to/from it) is called a universal property, but we won’t go into more detail, as it is a bit early (after all we haven’t even yet said what category theory is).
One thing that we should point out is that this definition (as all the previous ones, by the way) does not rule out the sets which are isomorphic to the product — when we represent things using universal properties, an isomorphism is treated as equality. This is the same viewpoint that we adopt in programming, especially when we work at a higher level — there might be many different implementations of pair, but as long as they work in the same way (i.e. we can convert one to the other and vice versa) they are all the same for us.
We will now study a construct that is pretty similar to the product but at the same time is very different. Similar because, like the product, it is a relation between two sets which allows you to unite them into one, without erasing their structure. But different as it encodes a quite different type of relation — a product encodes an and relation between two sets, while the sum encodes an or relation.
The sum of two sets $B$ and $Y$, denoted $B + Y$, is a set that contains all elements from the first set combined with all elements from the second one.
We can immediately see the connection with the or logical structure: For example, because a parent is either a mother or a father of a child, the set of all parents is the sum of the set of mothers and the set of fathers, or $P = M + F$.
As with the product, representing sums in terms of sets is not so straightforward e.g. when a given object is an element of both sets, then it appears in the sum twice which is not permitted, because a set cannot contain the same element twice.
As with the product, the solution is to add some extra structure.
And, as with the product, there is a low-level way to express a sum using sets alone. Incidentally, we can use pairs.
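Here is a minimal sketch of the pair-based encoding in TypeScript (the tag names `"left"` and `"right"` are our own choice): each element is paired with a marker saying which set it came from, so shared elements stay distinct.

```typescript
// A sum of two types: each element is tagged with the set it came from,
// so an element that belongs to both sets appears in the sum twice.
type Sum<B, Y> =
  | { tag: "left"; value: B }
  | { tag: "right"; value: Y };

const left = <B, Y>(b: B): Sum<B, Y> => ({ tag: "left", value: b });
const right = <B, Y>(y: Y): Sum<B, Y> => ({ tag: "right", value: y });

// 1 coming from the first set and 1 coming from the second set stay distinct.
const fromFirst = left<number, number>(1);
const fromSecond = right<number, number>(1);
```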
As you might already suspect, the interesting part is expressing the sum of two sets using functions. To do that, we have to go back to the conceptual part of the definition. We said that sums express an or relation between two things.
A property of every or relation is that if something is an $A$, then it is also an $A \vee B$, and the same goes for $B$ (the $\vee$ symbol means or, by the way). For example, if my hair is brown, then my hair is also either blond or brown. This is what or means, right? This property can be expressed as a function, two functions actually — one for each set that takes part in the sum relation (for example, if parents are either mothers or fathers, then there surely exist functions $mothers → parents$ and $fathers → parents$).
As you might have already noticed, this definition is pretty similar to the definition of the product from the previous section. And the similarities don’t end here. As with products, we have sets that can be thought of as impostor sums — ones for which these functions exist, but which also contain additional information.
All these sets express relationships which are more vague than the simple sum, and therefore, given such a set, there would exist a unique function that connects the true sum to it. The only difference is that, unlike the functions that define products, this time the function goes from the sum to the impostor.
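Sketched in TypeScript (reusing the same hypothetical `Sum` encoding as above): the impostor’s two functions determine exactly one compatible function out of the true sum.

```typescript
type Sum<B, Y> =
  | { tag: "left"; value: B }
  | { tag: "right"; value: Y };

// If an "impostor" set I has functions bi: B -> I and yi: Y -> I,
// there is exactly one compatible function from the true sum B + Y to I:
const fromSum =
  <B, Y, I>(bi: (b: B) => I, yi: (y: Y) => I) =>
  (s: Sum<B, Y>): I =>
    s.tag === "left" ? bi(s.value) : yi(s.value);

// Example: mothers + fathers -> parents (all represented as strings here).
const motherToParent = (m: string): string => m;
const fatherToParent = (f: string): string => f;
const parentOf = fromSum(motherToParent, fatherToParent);
```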
The concepts of product and sum might already look similar in a way when we view them through their internal diagrams. The external view makes this similarity precise — these two diagrams are one and the same diagram, only their arrows are flipped — many-to-one relationships become one-to-many and the other way around.
The universal properties that define the two constructs are the same as well — if we have a sum $Y + B$, for each impostor sum, such as $Y + B + R$, there exists a trivial function $Y + B \to Y + B + R$.
And, if you remember, with products the arrows go the other way around — the equivalent example for a product would be the function $Y \times B \times R \to Y \times B$.
This fact uncovers a deep connection between the concepts of the product and sum, which is not otherwise apparent — they are each other’s opposites. Product is the opposite of sum and sum is the opposite of product.
In category theory, concepts that have such a relationship are said to be dual to each other. So, the concepts of product and sum are dual. This is why, in a category-theoretic setting, sums are also known as coproducts (the “co-” prefix marks the dual of a construct). This naming convention is used for all dual constructs in category theory.
Now let’s look at the concepts of product and sum from the viewpoint of logic. We mentioned that:
When we view those sets as propositions, we discover that the concept of the product ($\times$) corresponds exactly to the and relation in logic (denoted $\land$). From this viewpoint, the function $Y \times B \to Y$ can be viewed as an instance of a logical rule of inference called conjunction elimination (also called simplification), stating that $Y \land B \to Y$; for example, if your hair is partly blond and partly brown, then it is partly blond.
By the same token, the concept of a sum ($+$) corresponds to the or relation in logic (denoted $\lor$). From this viewpoint, the function $Y \to Y + B$ can be viewed as an instance of a logical rule of inference called disjunction introduction, stating that $Y \to Y \lor B$; for example, if your hair is blond, it is either blond or brown.
This means, among other things, that the concepts of and and or are also dual — an idea which was put forward in the 19th century by the mathematician Augustus De Morgan and has since been known as De Morgan duality, and which is a predecessor to the modern idea of duality in category theory that we examined before.
The duality of $\land$ and $\lor$ can be seen in the two formulas that are most often associated with De Morgan, known as De Morgan’s laws (although De Morgan didn’t actually discover them; they were previously formulated by William of Ockham, of “Ockham’s razor” fame, among other people).
$\neg(A \wedge B) = \neg{A} \vee \neg{B}$
$\neg(A \vee B) = \neg{A} \wedge \neg{B}$
You can read the second formula as, for example: if my hair is not (blond or brown), this means that my hair is not blond and my hair is not brown, and vice versa (the connection works both ways).
Now we will go through the formulas and try to show that they are actually a simple corollary of the duality between and and or.
Let’s say we want to find the statement that is opposite of “blond or brown”.
$A \vee B$
The first thing we want to do is to replace the statements that constitute it with their opposites, which would make the statement “not blond or not brown”.
$\neg{A} \vee \neg{B}$
However, this statement is clearly not the opposite of “blond or brown” (saying that my hair is not blond or not brown does in fact allow it to be blond and also allows it to be brown, it just doesn’t allow it to be both of those things).
So what have we missed? Simple: we replaced the propositions with their opposites, but we didn’t replace the operator that connects them — it is still or for both propositions. So we must replace it with its converse. As we said earlier, and as you can see by analyzing this example, this operator is and. So the formula becomes “not blond and not brown”.
$\neg{A} \wedge \neg{B}$
Saying that this formula is the opposite of “blond or brown” is the same thing as saying that it is equivalent to that statement’s negation, which is precisely what the second De Morgan law says.
$\neg(A \vee B) = \neg{A} \wedge \neg{B}$
And if we “flip” this whole formula (we can do that without changing the signs of the individual propositions, as it is valid for all propositions) we get the first law.
$\neg(A \wedge B) = \neg{A} \vee \neg{B}$
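As a quick sanity check (a small script of our own, not something from the text), we can verify both laws over the booleans by brute force:

```typescript
// A brute-force check of both De Morgan laws over the booleans.
const bools = [true, false];

for (const a of bools) {
  for (const b of bools) {
    const firstLaw = !(a && b) === (!a || !b);  // ¬(A ∧ B) = ¬A ∨ ¬B
    const secondLaw = !(a || b) === (!a && !b); // ¬(A ∨ B) = ¬A ∧ ¬B
    console.log(a, b, firstLaw && secondLaw);   // prints true in every case
  }
}
```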
This probably provokes a lot of questions and we have a whole chapter about logic to address those. But before we get to it, we have to see what categories (and sets) are.
So far in the book, we saw some amazing ways of defining set-theoretic constructs without looking at the set elements and by only using functions and external diagrams.
In the first chapter, we defined functions and functional composition with this diagram.
And now, we also defined products and sums.
What’s even more amazing, is that we can define all of set-theory, based just on the concept of functions, as discovered by the category theory pioneer Francis William Lawvere.
Traditionally, everything in set theory is defined in terms of two things: sets and elements, so, if we want to define it using sets and functions, we must define the concept of a set element in terms of functions.
To do so, we will use the singleton set.
OK, let’s start by taking a random set which we want to describe.
And let’s examine the functions from the singleton set, to that random set.
It’s easy to see that there would be exactly one such function for each element of the set, i.e. that the elements of any set $X$ correspond one-to-one to the functions $1 \to X$.
So, we can say that what we call “elements” of a set are the functions from the singleton set to it.
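Here is a minimal sketch in TypeScript (using `null` as an arbitrary stand-in for the singleton set’s one element):

```typescript
// The singleton set, represented here by a type with a single value (null).
type One = null;

// An "element" of the set of numbers, expressed as a function from the singleton set.
const three = (_: One): number => 3;

// Every element x of a set X corresponds to exactly one such function 1 -> X.
const elementOf = <X>(x: X) => (_: One): X => x;

console.log(three(null)); // 3
```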
Now, after coming up with a definition of a set element, based on functions, we can try to draw the elements of our set as an external diagram.
However, our diagram is not yet fully external, as it depends on the idea of the singleton set, i.e. the set with one element. Furthermore, this makes the whole definition circular, as we cannot define the concept of a one-element set without the concept of an element.
To avoid these difficulties, we devise a way to define the singleton set, using just functions. We do it in the same way that we did for products and sums - by using a unique property that the singleton set has. In particular, there is exactly one function from any other set to the singleton set i.e. if $1$ is the singleton set, then we have $\forall X \exists! X \to 1$.
It turns out that this property defines the singleton set uniquely, i.e. there is no other set that has it, other than the sets that are isomorphic to the singleton set. This is simply because, if there were two sets that had it, those two sets would also have unique morphisms between themselves, and the compositions of those morphisms would have to be identities (each of the two sets has exactly one morphism to itself, which can only be the identity), so the two sets would be isomorphic to one another. More formally, if we have two sets $X$ and $Y$, such that $\exists! X \to 1 \land \exists! Y \to 1$, then we also have $X \cong Y$.
And because there is no other set, other than the singleton set that has this property, we can use it as a definition of the singleton set and say that if we have $\forall X \exists! X \to 1$, then $1$ is the singleton set.
With this, we acquire a fully external definition (up to an isomorphism) of the singleton set, and thus a definition of a set element — the elements of a given set are just the functions from the singleton set to that set.
Note that from this property it follows that the singleton set has exactly one element, which confirms that our definition is correct.
Question: Why exactly does it follow (check the definition)?
The empty set is the set that has no elements, but how would we say this without referring to elements?
We said that there exists a unique function that goes from the empty set to any other set. But the reverse is also true: the empty set is the only set such that there exists exactly one function from it to any other set.
Observant readers will notice the similarities between the diagrams depicting the initial and terminal object (yes the two concepts are, of course, dual of each other).
Some even more observant readers may also notice the similarities between the product/coproduct diagrams and the initial/terminal object diagrams.
To them, I would like to say: Folks, keep it down please, you are too observant — we have, like, 4 chapters to go until we get to this.
After seeing the functional definition of set elements, we might be inclined to ask the following: if elements are represented by functions, then how do you apply a given function to an element of a set (and retrieve an element of another set)?
The answer is surprisingly simple — selecting an element from a set is the same as constructing a function from the singleton set to that set (namely, the function that picks out that element).
And then applying a function to an element is the same as composing the element function, with the function we want to apply.
The result is the function that represents the element that is the result of the application.
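Sketched in TypeScript, continuing the singleton-set encoding from above (again with `null` standing in for the singleton’s element):

```typescript
type One = null;

// An element of the set of numbers, as a function from the singleton set.
const two = (_: One): number => 2;

// The function we want to apply to it.
const square = (n: number): number => n * n;

// Applying square to the element is just composing it with the element function;
// the result is another element function 1 -> number, representing 4.
const result = (o: One): number => square(two(o));
console.log(result(null)); // 4
```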
If we had more time, we would cover the entirety of Lawvere’s Elementary Theory of the Category of Sets (or ETCS for short) and list all concepts and axioms that are required to define a rigorous set theory using functions, but this is enough for you to get the main idea: that these axioms constitute a definition of set theory that is based entirely on functions. This is a key idea, but there is an even bigger one here: because it is more general than the traditional definition, this new definition also applies to objects that are not exactly sets but are like sets in some respects.
You may say that they apply to entirely different categories of objects.
Maybe it is about time to see what a category is. We will start with a short definition — a category consists of objects (an example of which are sets) and morphisms that go from one object to another (which behave as functions) and that are composable. We can say a lot more about categories, and even present a formal definition, but for now, it is sufficient for you to remember that sets are one example of a category and that categorical objects are like sets, except that we don’t see their elements. Or to put it another way, category-theoretic notions are captured by the external diagrams, while strictly set-theoretic notions can be captured by internal ones.
When we are within the realm of sets, we can view each set as a collection of individual elements. In category theory, we don’t have such a notion. However, taking this notion away allows us to define concepts such as the sum and product sets in a whole different and more general way. Plus we always have a way to “go back” to set theory, using the tricks from the last section.
| Category Theory | Set theory | Programming Languages |
| --- | --- | --- |
| Category | N/A | N/A |
| Objects and Morphisms | Sets and Functions | Classes and methods |
| N/A | Element | Object |
Notice the somewhat weird (but actually completely logical) symmetry (or perhaps “reverse symmetry”) between the world as viewed through the lens of set theory and the way it is viewed through the lens of category theory:
| Category Theory | Set theory |
| --- | --- |
| Category | N/A |
| Objects and Morphisms | Sets and functions |
| N/A | Element |
By switching to external diagrams, we lose sight of the particular (the elements of our sets), but we gain the ability to zoom out and see the whole universe where we have been previously trapped. In the same way that the whole realm of sets can be thought of as one category, a programming language can also be thought of as a category. The concept of a category allows us to find and analyze similarities between these and other structures.
NB: The word “Object” is used in both programming languages and in category theory, but has completely different meanings. A categorical object is equivalent to a type or a class in programming language theory.
One remark before we continue: the last section might make it seem like category theory and set theory are somehow competing with each other. Perhaps that notion would be somewhat correct if category and set theory were meant to describe concrete phenomena, in the way that the theory of relativity and the theory of quantum mechanics are both supposed to explain the physical world. Concrete theories are conceived mainly as descriptions of the world, and as such it makes sense for them to be connected in some sort of hierarchy.
In contrast, abstract theories, like category theory and set theory, are more like languages for expressing such descriptions — they still can be connected, and are connected in more than one way, but there is no inherent hierarchical relationship between the two and therefore arguing over which of the two is more basic, or more general, is just a chicken-and-egg problem, as you will see in the next chapter.
“…deal with all elements of a set by ignoring them and working with the set’s definition.” — Dijkstra (from “On the cruelty of really teaching computing science”)
All category theory books, including this one, start by talking about set theory. Looking back, I really don’t know why this is the case — books that focus on a given subject usually don’t start off by introducing an entirely different subject (before even starting to talk about the main one). Perhaps the set-first approach is the best way to introduce people to categories. Or perhaps using sets to introduce categories is one of those things that people do just because everyone else does it. But, one thing is for certain — we don’t need to study sets in order to understand categories. So now I would like to start over and talk about categories as a foundational concept. So let’s pretend like this is a new book (I wonder if I can dedicate this to a different person).
A category is a collection of objects (things) where the “things” can be anything you want. Consider, for example, these colorful gray balls:
A category consists of a collection of objects as well as some arrows connecting objects to one another. We call the arrows morphisms.
Wait a minute, we said that all sets form a category, but at the same time, any one set can be seen as a category in its own right (just one which has no morphisms). This is true and an example of a phenomenon that is very characteristic of category theory — one structure can be examined from many different angles and may play many different roles, often in a recursive fashion.
This particular analogy (a set as a category with no morphisms) is, however, not very useful. Not because it’s incorrect in any way, but rather because category theory is all about the morphisms. If the arrows in set theory are nothing but a connection between the sets that serve as their source and destination, in category theory it’s the objects that are nothing but a source and destination for the arrows that connect them to other objects. This is why, in the diagram above, the arrows, and not the objects, are colored: if you ask me, the category of sets should really be called the category of functions.
Speaking of which, note that objects in a category can be connected by multiple arrows and that having the same source and target sets does not in any way make arrows equivalent.
Why that is true is pretty obvious if we go back to set theory for a second (OK, maybe we really have to do it from time to time). There are, for example, an infinite number of functions that go from number to boolean, and the fact that they have the same input type and the same output type (or the same type signature, as we like to say) does not in any way make them equivalent to one another.
There are some types of categories in which only one morphism between two objects is allowed (or one in each direction), but we will talk about them later.
The most important requirement for a structure to be called a category is that two morphisms can make a third, or in other words, that morphisms are composable — given two successive arrows with appropriate type signatures, we can make a third one (in set theory, it is equivalent to the consecutive application of the first two).
Formally, this requirement says that there should exist an operation, usually denoted with the symbol $•$ such that for each pair of morphisms $g: A → B$ and $f: B → C$, there exists a morphism $(f • g): A → C$.
This is the most important part of the definition — if you remember, in set theory we picked functions, as opposed to the other types of relations, because they are composable; here we just invent the concept of a morphism and define morphisms to be composable. You will see where this definition will get us.
NB: Note, that functional composition is read from right to left. e.g. applying $g$ and then applying $f$ is written $f • g$ and not the other way around. (You can think of it as a shortcut to $f(g(a))$).
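A minimal sketch of composition in the category of sets (i.e. for plain functions), written in TypeScript with illustrative names of our own:

```typescript
// Composition in the category of sets: given g: A -> B and f: B -> C,
// produce (f • g): A -> C. Note the right-to-left reading: g is applied first.
const compose =
  <A, B, C>(f: (b: B) => C, g: (a: A) => B) =>
  (a: A): C => f(g(a));

const double = (n: number): number => n * 2;
const isEven = (n: number): boolean => n % 2 === 0;

// isEven • double: double first, then check evenness.
const doubleThenCheck = compose(isEven, double);
console.log(doubleThenCheck(3)); // true
```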
To have numbers, you have to have a zero. The zero of category theory is what we call the “identity morphism” for each object. In short, this is a morphism that doesn’t do anything.
It’s important to mark this morphism because there can be (let’s again add this very important (and by now probably also very boring) reminder) many morphisms that go from one object to the same object. For example, in the category of sets, we deal with a multitude of functions that have the set of numbers as source and target, such as $negate$, $square$ and $add\ one$, none of which is the identity morphism.
A structure must have an identity morphism for each object in order for it to be called a category — this is known as the law of identity.
Question: What is the identity morphism in the category of sets?
Composition is special not only because you can take any two morphisms with appropriate signatures and make a third, but because you can do so indefinitely, i.e. for each $n$ successive arrows, each of which has as a source object the target object of the previous, we can draw one (exactly one) arrow that is equivalent to the consecutive application of all $n$ arrows.
But let’s get back to the math. If we carefully review the definition above, we can see that it can be reduced to multiple applications of the following formula: given four objects and three morphisms between them, $f$, $g$ and $h$, combining $h$ and $g$ and then combining the end result with $f$ should be the same as combining $h$ with the result of combining $g$ and $f$ (or simply $(h • g) • f = h • (g • f)$).
This formula can be expressed using the following diagram, which would only commute if the formula is true (given that all our category-theoretic diagrams are commutative, we can say, in such cases, that the formula and the diagram are equivalent).
This formula (and the diagram) is the definition of a property called $associativity$. Being associative is required for functional composition to really be called functional composition (and thus for a category to really be called a category). It is also required for our diagrams to work, as diagrams can only represent associative structures (imagine if the diagram above would not commute, it would be super weird).
Associativity is not just about diagrams. For example, when we express relations using formulas, associativity just means that brackets don’t matter in our formulas (as evidenced by the definition $(h • g) • f = h • (g • f)$).
And it is not only about categories either, it is a property of many other operations on other types of objects as well e.g. if we look at numbers, we can see that the multiplication operation is associative e.g. $(1 \times 2) \times 3 = 1 \times (2 \times 3)$, while division is not: $(1 / 2) / 3 \neq 1 / (2 / 3)$.
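As a quick check that composition of functions is indeed associative, here is a small example of our own, reusing a `compose` helper like the one sketched earlier:

```typescript
const compose =
  <A, B, C>(f: (b: B) => C, g: (a: A) => B) =>
  (a: A): C => f(g(a));

const f = (n: number): number => n + 1;
const g = (n: number): number => n * 2;
const h = (n: number): number => n - 3;

// (h • g) • f  versus  h • (g • f): both add one, then double, then subtract three.
const grouped1 = compose(compose(h, g), f);
const grouped2 = compose(h, compose(g, f));
console.log(grouped1(10), grouped2(10)); // 19 19
```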
This approach (composing indefinitely many things) to building stuff is often used in programming. To see some examples, you don’t need to look further than the pipe operator in Unix (`|`), which feeds the standard output of one program into the standard input of another program. If you want to look further, note that there is a whole programming paradigm based on functional composition, called “concatenative programming”, utilized in languages like Forth and Factor.
The diagrams above use colours to illustrate the fact that the green morphism is equivalent to the other two (and not just some unrelated morphism), but in practice this notation is a little redundant — the only reason to draw diagrams in the first place is to represent paths that are equivalent to each other. All other paths would just belong in different diagrams.
As we mentioned briefly in the last chapter, all diagrams that are like that (ones in which any two paths between two objects are equivalent to one another) are called commutative diagrams (or diagrams that commute). All diagrams in this book (except the incorrect ones, nudge nudge) commute.
More formally, a commuting diagram is a diagram in which, given two objects $a$ and $b$ and two sequences of morphisms between those two objects, those sequences are equivalent (i.e. they compose to the same morphism).
The diagram above is one of the simplest commuting diagrams.
For future reference, let’s restate what a category is:
A category is a collection of objects (we can think of them as points) and morphisms (or arrows) that go from one object to another, where:

- Each object has to have an identity morphism.
- There should be a way to compose two morphisms with an appropriate type signature into a third one, in a way that is associative.
This is it.
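To restate the definition one more time as a (very loose) TypeScript sketch of our own:

```typescript
// A loose sketch of the data a category carries. (TypeScript's type system cannot
// easily track each morphism's source and target, so this is only an approximation.)
interface Category<Mor> {
  id: Mor;                           // the identity morphism (law of identity)
  compose: (f: Mor, g: Mor) => Mor;  // composition, which must be associative
}

// For illustration: functions from numbers to numbers form a (one-object) category.
const numberMaps: Category<(n: number) => number> = {
  id: (n) => n,
  compose: (f, g) => (n) => f(g(n)),
};
```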
Why are categories defined by those two laws and not by some other laws (one, three, four, etc.)? From one standpoint, the answer to that seems obvious — we study categories because they work; I mean, look at how many applications there are.
But at the same time, category theory is an abstract theory, so everything about it is kinda arbitrary: you can remove a law — and you get another theory that looks similar to category theory (although it might actually turn out to be quite different in practice). Or you can add one more law and get yet another theory (by the way, there are indeed such laws and such theories), so if this specific set of laws works better than any other, then this fact demands an explanation. Not a mathematical explanation (e.g. we cannot prove that this theory is better than some other one), but an explanation nevertheless. What follows is my attempt to provide such an explanation, regarding the laws of identity and associativity.
The reason the identity law is required is by far the more obvious one. Why do we need to have a morphism that does nothing? Because morphisms are the basic building blocks of our language, and we need the identity morphism to be able to speak properly. For example, once we have the concept of the identity morphism defined, we can give a category-theoretic definition of an isomorphism based on it (which is important, because the concept of an isomorphism is very important for category theory).
As we said in the previous chapter, an isomorphism between two objects ($A$ and $B$) consists of two morphisms ($A → B$ and $B → A$) such that their compositions are equivalent to the identity morphisms of the respective objects. Formally, objects $A$ and $B$ are isomorphic if there exist morphisms $f: A → B$ and $g: B → A$ such that $f \circ g = ID_{B}$ and $g \circ f = ID_{A}$.
And here is the same thing expressed with a commuting diagram.
Like the previous one, the diagram expresses the same (simple) fact as the formula, namely that going from either one of the objects ($A$ or $B$) to the other one and then back again to the starting object is the same as applying the identity morphism, i.e. doing nothing.
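As a small example of our own, incrementing and decrementing numbers form such a pair of morphisms:

```typescript
// Two morphisms between numbers whose compositions are both the identity:
const addOne = (n: number): number => n + 1;
const subOne = (n: number): number => n - 1;

// subOne • addOne = id and addOne • subOne = id, so the two functions
// form an isomorphism (here between the set of numbers and itself).
console.log(subOne(addOne(41))); // 41
console.log(addOne(subOne(41))); // 41
```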
If, in some cataclysm, all of scientific knowledge were to be destroyed, and only one sentence passed on to the next generations of creatures, what statement would contain the most information in the fewest words? I believe it is the atomic hypothesis (or the atomic fact, or whatever you wish to call it) that all things are made of atoms—little particles that move around in perpetual motion, attracting each other when they are a little distance apart, but repelling upon being squeezed into one another. In that one sentence, you will see, there is an enormous amount of information about the world, if just a little imagination and thinking are applied. — Richard Feynman
Associativity — what does it mean and why is it there? In order to tackle this question, we must first talk about another concept — the concept of reductionism:
Reductionism is the idea that the behaviour of complex phenomena can be understood in terms of a number of simpler and more fundamental phenomena. In other words, that things keep getting simpler and simpler as they get “smaller” (or when they are viewed from a lower level). An example of reductionism is the idea that the behaviour of matter can be understood completely by studying the behaviours of its constituents i.e. atoms (a word which originally means “indivisible”).
Whether the reductionist view is universally valid, i.e. whether it is possible to devise a theory of everything that describes the whole universe with a set of very simple laws, is a question over which we can argue until that universe’s inevitable collapse. What is certain, though, is that reductionism underpins all our understanding, especially when it comes to science and mathematics — each scientific discipline is based on a set of simple fundaments (e.g. elementary particles in particle physics, chemical elements in chemistry etc.) on which it builds its much more complex theories.
So, if this principle is so important, it would be beneficial to be able to formalize it (i.e. to translate it into mathematical language), and this is what we will try to do now. One way to state the principle of reductionism is to say that each thing is nothing but a sum of its parts, i.e. if we combine the same set of parts, we always get the same result. To formalize that, we get a set of objects (balls) and a way to combine them (which we will denote with a dot).
So, if we have a given “recipe”, for example
Then, we would also have
Or quite simply
Incidentally, this is the definition of a mathematical law called commutativity.
A simple context where this law applies — the natural numbers are commutative under the operation of addition, e.g. $1 + 2 = 2 + 1$ (we will learn more about this in the chapter on groups).
Question: If our objects are sets, what set operations can play the part of the dot in this example (i.e. which ones are commutative)?
Sometimes we observe phenomena that still can be represented as a combination of a given set of fundaments, but only when they are combined in a specific way (as opposed to any combination, as in commutative contexts) e.g. as any mechanic can confirm, a bicycle is indeed just the combination of wheels and frameset…
…but that does not mean that every combination of the above parts constitutes a bicycle. For example, placing the front wheel of the bicycle on the rear end of the frame would not result in a working bicycle. And combining two wheels with one another is just not possible.
And, to take a formal example, if function A can be combined with B to get C…
…that would not automatically mean that B can be combined with A to get the same result.
Side note: composing any function with any other one is only possible for functions that have the same set as both source and target, but even then the end result would not always be the same.
Anyway, we determined that functional composition (as well as bike building) does not obey the law of commutativity, i.e. not all functions (or bike parts) compose with all other ones. But still, we know that some of them do, i.e. if you align the function signatures (or, in the case of the bicycle, if you use the proper parts in the proper places) they would come together seamlessly. So, the order in which we combine the individual parts doesn’t matter for the final outcome, e.g. when we are assembling a bicycle, it doesn’t matter if we attach the front wheel to the frame first, or the back wheel, the result will be the same.
Similarly, when combining functions, each pair of functions can, at any time, be replaced by the function that we get by combining them.
This fact is captured by a more restrictive version of commutativity that we call associativity, which, for functions, is usually formulated like this.
Or more generally.
This is the essence of associativity — the ability to study a complex phenomenon by zooming in on the part that you want to examine at a given moment, and looking at it in isolation.
Note that the operator we defined only allows for combining things in one dimension (you can attach a thing left and right, but not up or down). Later we will learn about an extension of the concept of a category (called a monoidal category) that “supports” working in two dimensions.
We said that associativity is a more restricted version of commutativity. This is true in a formal way, as we show here. By now, you might realize that a composition operator for morphisms in a given category (the thing we denote by the dot) is itself a morphism that accepts a pair of morphisms and returns yet another morphism (so $A \times B \to C$).
Given a composition operator for a category and one morphism from that same category (in this case the green ball), we can create two more morphisms (which we call the “curried” morphisms; more on that in the next chapter), of types $A \to C$ and $B \to C$, by fixing the second or the first argument of the operator respectively, i.e. one morphism that attaches the green ball at the “front” of the morphism it receives as an argument (producing its source) and one that attaches it at the “end” (accepting its target as a source).
Now, take two such morphisms, with different directions and different objects and ask yourself: under what conditions would those two morphisms commute i.e. when would that formula be satisfied?
Expanding the formula gives us the answer: they commute when the original dot operator is associative.
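Spelled out in symbols (our own notation, not from the diagrams above): write the two curried morphisms as $(f \circ -)$, which attaches the fixed morphism $f$ at the end, and $(- \circ h)$, which attaches the fixed morphism $h$ at the front. Applying them to an arbitrary morphism $x$ in the two possible orders gives:

$(f \circ -)\big((- \circ h)(x)\big) = f \circ (x \circ h)$

$(- \circ h)\big((f \circ -)(x)\big) = (f \circ x) \circ h$

The two results coincide for every $x$ precisely when the composition operator is associative.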
Thus, we established a connection between associativity and commutativity.