Ready, set, begin… (you don’t know how hard I tried to resist to making that pun). We begin our inquiry with the theory of sets. Set theory and category theory share many similarities. We can view category theory as a generalization of set theory. That is, it’s meant to describe the same thing as set theory (everything?), but to do it in a more abstract manner, one that is more versatile and (hopefully) simpler.
Also, sets are an example of a category (the proto-example, we might say), and it is useful to have examples.
Instead of asking what can be defined and deduced from what is assumed to begin with, we ask instead what more general ideas and principles can be found, in terms of which what was our starting-point can be defined or deduced. Bertrand Russell, from Introduction to Mathematical Philosophy
Most scientific and mathematical theories have a specific domain, which they are tied to, and in which they are valid. They are created with this domain in mind and are not intended to be used outside of it. For example, Darwin’s theory of evolution is created in order to explain how different biological species came to evolve using natural selection, quantum mechanics is a description of how particles behave at a specific scale, etc.
Even mathematical theories, although they are not inherently bound to a specific domain (like the scientific theories) are at least strongly related to some domain, as for example differential equations are created to model how events change over time.
Set theory and category theory are different, they are not created to provide a rigorous explanation of how a particular phenomenon works, instead they provide a more general framework for explaining all kinds of phenomena. They work less like tools and more like languages for defining tools. Such theories are called abstract theories.
The borders of the two are sometimes blurry. All theories use abstraction, otherwise they would be pretty useless: without abstraction Darwin would have to speak about specific animal species or even individual animals. The difference is that abstract theories have core concepts that don’t refer to anything in particular, and are instead left for people to generalize on. All theories are applicable outside of their domains, but set theory and category theory do not have a domain to begin with.
Concrete theories, like the theory of evolution, are composed of concrete concepts. For example, the concept of a population, also called a gene-pool, refers to a group of individuals that can interbreed. Abstract theories, like set theory, are composed of abstract concepts, like the concept of a set. The concept of a set by itself does not refer to anything. However, we cannot say that it is an empty concept, as there are countless things that can be represented by sets, for example, gene pools can be (very aptly) represented by sets of individual animals. Animal species can also be represented by sets — a set of all populations that can theoretically interbreed.
You’ve already seen how abstract theories may be useful. Because they are so simple, they can be used as building blocks to many concrete theories. Because they are common, they can be used to unify and compare different concrete theories, by putting these theories in common grounds (this is very characteristic of category theory, as we will see later). Moreover, good (abstract) theories can serve as mental models for developing our thoughts.
“A set is a gathering together into a whole of definite, distinct objects of our perception or of our thought—which are called elements of the set.” – Georg Cantor
Perhaps unsurprisingly, everything in set theory is defined in terms of sets. A set is a collection of things where the “things” can be anything you want (like individuals, populations, genes, etc.) Consider, for example, these balls.
Let’s construct a set, call it $G$ (as gray) that contains all of them as elements. There can only be one such set, because a sets have no structure.
A set is a collection of items, that contains no structure, other than which items belong to it.
So, there is no order, no position, no ball goes before or after another, there are no members which are “special” with respect to their membership of the set. Two sets that contain the same elements are just pictures of the same set.
This example may look overly-simple, but in fact, it’s just as valid as any other.
The key insight that makes the concept useful is the fact that it enables you to reason about several things as if they were one.
Let’s construct one more set. The set of all balls that are warm in color. Let’s call it $Y$ (because in the diagram, it’s colored in yellow).
Notice that $Y$ contains only elements that are also present in $G$. When two sets have this relation, we may say that $Y$ is a subset of $G$.
$Y$ is a subset of $G$ when (or $Y \subseteq G$) if every element of the set of $Y$ is also an element in the set $G$.
A subset resides completely inside its superset when the two are drawn together.
The set of all red balls contains just one ball. We said above that sets summarize several elements into one. Still, sets that contain just one element are perfectly valid &mdash, there are things that are one of a kind.
A singleton set is a set that contains one element.
The set of kings/queens that a given kingdom has is a singleton set.
What’s the point of singleton sets? Well, it is part of the language of set theory, e.g., if we have a function which expects a set of given items, but if there is only one item that meets the criteria, we can just create a singleton set with that item.
Of course if one is a valid answer, zero can be also. If we want a set of all black balls $B$ or all the white balls, $W$, the answer to all these questions is the same — the empty set.
Because a set is defined only by the items it contains, the empty set is unique — there is no difference between the set that contains zero balls and the set that contains zero numbers, for instance.
The empty set is a set that contains no elements.
Formally, the empty set is marked with the symbol $\varnothing$ (so $B = W = \varnothing$).
The empty set has some special properties, for example, it is a subset of every other set. Mathematically speaking, $\forall A \to \varnothing \subseteq A$ ($\forall$ means “for all”)
“By function I mean the unity of the act of arranging various representations under one common representation.” — Immanuel Kant, from “The Critique of Pure Reason”
Now is the time to admit something: this chapter isn’t actually about sets. It is about functions and we only started this way because there is no way to explain functions without explaining sets (although if you read on, you’ll find that there actually is one).
A function is many-to-one a relationship between two sets: one that matches each element of one set, called the source set of the function, with exactly one element from another set, called the target set of the function.
These two sets are also called the domain and codomain of the function, or its input and output. In programming, they go by the name of argument type and return type. In logic, they correspond to the premise and conclusion (we will get there). We might also say, depending on the situation, that a given function goes from this set to that other one, connects this set to the other, or that it converts a value from this set to a value from the other one. These different terms demonstrate the multifaceted nature of the concept of function.
Here is a function $f$, which converts each ball from the set $R$ to the ball with the opposite color in another set $G$ (in mathematics a function’s name is often accompanied by the names of its source and target sets, like this: $f: R → G$)
This is probably one of the simplest type of function that exists — it encodes a one-to-one relationship between the sets. That is to say, one element from the source is connected to exactly one element from the target (and the other way around).
But functions usually express relationships of the type many-to-one, where many elements from the source might be connected to one element from the target (but not the other way around). Below is one such function.
Such functions might represent operations such as categorizing a given collection of objects by some criteria, or partitioning them, based on some property that they might have.
A function can also express relationships in which some elements from the target set do not play a part.
An example might be the relationship between some kind of pattern or structure and the emergence of this pattern in some more complicated context.
We saw how versatile functions are, but there is one thing that you cannot have in a function. You cannot have a source element that is not mapped to anything, or that is mapped to more than one target element — that would constitute a many-to-many relationship and as we said functions express many-to-one relationships. There is a reason for that “design decision”, and we will arrive at it shortly.
For every set $G$, no matter what it represents, we can define the function that does nothing:
The identity function for a set $G$, $ID_{G}: G \to G$ is a function which maps every element of $G$ to itself.
You can think of $ID_{G}$ as a function which represents the set $G$ in the realm of functions.
Another interesting collection of functions (there is one for each subset).
For each set and subset, we can define a function (called the image of the subset) that maps each element of the subset to itself.
Every set is a subset of itself, in which case this function is the same as the identity.
Although it doesn’t look like it…
There is a unique function from the empty set to any other set.
Task 3: Is this really valid? Why? Check the definition.
If you still aren’t convinced, check this out:
So, evidently, this function has to exist!
Task 4: What about the other way around. Are there functions with the empty set as a target as opposed to its source?
And another function that we meet often is this one.
There is a unique function from any set to any singleton set.
Task 5: Is this really the only way to connect any set to a singleton set in a valid way?
Task 6: Again, what about the other way around?
Given two sets $A$ and $B$ with a bunch of functions between them (here we don’t draw the set elements and draw each function as a single arrow)…
…we can construct the function set (usually called hom set) of $A$ to $B$, containing those functions as elements.
If there are no functions, this set is empty.
For objects $A$ and $B$ the homomorphism set of $A$ to $B$, (denoted $A \Rightarrow B$) is the set containing one element for each function that goes from $A$ to $B$.
Note that this is only in one direction, the functions between $B$ and $A$ belong in a different hom set.
All numerical operations can be expressed as functions acting on the sets of (different types of) numbers.
Because not all functions work on all numbers, we separate the set of numbers to several sets, some of which are subsets to one another, such the set of whole numbers $\mathbb{Z} := {… -3 -2, -1, 0, 1, 2, 3… }$, the set of positive whole numbers, (also called “natural” numbers), $\mathbb{N} := {0, 1, 2, 3… }$.
(Because both sets are infinite, we cannot draw them in their entirety, however we can draw a part of them).
Every generalization of number has first presented itself as needed for some simple problem: negative numbers were needed in order that subtraction might be always possible, since otherwise a − b would be meaningless if a were less than b; fractions were needed in order that division might be always possible; and complex numbers are needed in order that extraction of roots and solution of equations may be always possible. — Bertrand Russell, from Introduction to Mathematical Philosophy
Each numerical operation is a function between two number sets. For example, squaring a number is a function from the set of real numbers to the set of real non-negative numbers.
I will use the occasion to reiterate some of the more important characteristics of functions:
Overall everything is permitted, as long as you can always provide exactly one result (also known as The result™) per value. For numerical operations, this is always true, simply because math is designed this way.
Note that most mathematical operations, such as addition, multiplication, etc. require two numbers in order to produce a result. This does not mean that they are not functions, it just means they’re a little fancier. Depending on what we need, we may present those operations as functions from the sets of tuples of numbers to the set of numbers, or we may say that they take a number and return a function. More on that later.
Sets are used extensively in programming, especially in their incarnation as types (also called classes). All sets of numbers that we discussed earlier also exist in most languages as types.
Sets are not the same thing as types, but types can be seen as sets (or they have sets we can say).
For example, we can view the Boolean type as a set containing two elements — true and false.
Another very basic set that is used in programming is the set of keyboard characters, or Char.
Most of the sets that we use in programming are composite sets e.g. make a list of Chars and you have a string. We will see how this happens later.
Task 7: What is the type equivalent of subsets in programming?
Functions in programming (also called methods, subroutines, etc.) kinda resemble mathematical functions — they take an element that belongs to a given set and return exactly one element which belongs to another set.
For example, here is a method that takes an argument of type Char and returns a Boolean, indicating whether the character is a letter.
However functions in most programming languages can also be quite different from mathematical functions — they can perform various operations that have nothing to do with returning a value. These operations are sometimes called side effects.
Why are functions in programming different? Well, figuring a way to encode effectful functions in a way that is mathematically sound isn’t trivial and at the time when most programming paradigms that are in use today were created, people had bigger problems than the their functions not being mathematically sound (e.g. actually being able to run any program at all).
Many people feel that mathematical functions are too limiting and hard to use. And they might have a point, but mathematical functions have one big advantage over non-mathematical ones — their type signature tells you almost everything about what the function does (this is probably the reason why most functional languages are strongly-typed). This is why there are some languages that only permit mathematical functions, and for which this equality holds. They are called purely-functional programming languages. An example of a such language is Haskell, which we will meet later.
Such languages don’t support functions that perform operations like rendering stuff on screen, doing I/O, etc. (in this context, such operations are called “side effects”.
There, such operations are outsourced to the language’s runtime. Instead of writing functions that directly perform a side effect, for example console.log('Hello'), we write functions that return a type that represents that side effect (for example, in Haskell side effects are handled by the IO type) and the runtime then executes those functions for us.
We then compose all those functions into a program (by breaking them down to a thing called continuation passing style).
Now, we were just about to reach the heart of the matter regarding the topic of functions. And that is functional composition. Assume that we have two functions, and the target of the first one is the same set as the source of the second one
If we apply the first function $g$ to some element from set $Y$, we will get an element of the set $P$. Then, if we apply the second function $f$ to that element, we will get an element from type $G$.
We can define a function that is the equivalent to performing the operation described above:
For any three sets $Y$, $P$ and $G$ and two functions $g: Y \to P$ and $f: P \to G$ we can define a function $f \circ g$, such that, if you follow the $f \circ g$ arrow for any element of set $Y$ you will get to the same element of the set $G$ as the one you will get if you follow the $g$ arrow and then the $f$ arrow. We call $f \circ g$ the composition of $g$ and $f$.
(notice that in $f \circ g$ the first function is on the right, so it’s similar to $f(g(a)$).
Composition is the essence of all things categorical. The key insight is that the sum of two parts is no more complex than the parts themselves (and therefore can be summed (composed) again). This insight is captured by the property called associativity, which we will look into later.
Task 8: Think about which qualities of a function make composition possible, e.g., does it work with other types of relationships, like many-to-many and one-to-many.
To understand how powerful composition is, consider the following: one set being connected to another means that each function from the second set can be transferred to a corresponding function from the first one.
If we have a function $g: P \to Y$ from set $P$ to set $Y$, then for every function $f$ from the set $Y$ to any other set, there is a corresponding function $f \circ g$ from the set $P$ to the same set. In other words, every time you define a new function from $Y$ to some other set, you gain one function from $P$ to that same set for free.
For example, if we take the relationship between a person and his mother as a function called “mother” with the set of all people in the world as source, and the same set as target, then composing this function with itself would yield the function “grandmother”, composing it with the function “sister” would yield the function “aunt”.
And if we keep composing these two functions (as well as their male counterparts “father” and “brother”, we would obtain all possible ancestral relationships.
This example highlights an important ability that is enabled to functional composition — the ability to break-down composite relationships to their basic building blocks.
Besides being useful for analyzing relationships that already exist, the principle of composition enables you to build objects that exhibit such relationships (AKA engineering).
The main way in which modern engineering differs from ancient craftsmanship is the concept of a part/module/component — a product that performs a given function that is not made to be used directly, but is instead optimized to be combined with other such products in order to form a “end-user” product.
For example, an espresso machine is just a combination of the components, such as , pump, heater, grinder group etc, when composed in an appropriate way.
Task 9: Think about what would be those functions’ sources and targets.
By the way, diagrams that are “zoomed out” that show functions without showing set elements are called external diagrams, as opposed to the ones that we saw before, which are internal.
Let’s look at the diagram that demonstrates functional composition in which we showed that successive application of the two composed functions ($f \circ g$) and the new function ($h$) are equivalent.
We showed this equivalence by drawing an internal diagram, and explicitly drawing the elements of the functions’ sources and targets in such a way that the two paths are equivalent.
Alternatively, we can just say that the arrow paths are all equivalent (all arrows starting from a given set element ultimately lead to the same corresponding element from the resulting set) and draw the equivalence as an external diagram.
Or alternatively, if you want to express it as a formula.
The external diagram is a more appropriate representation of the concept of composition, as it is more general. In fact, it is so general that it can actually serve as a definition of functional composition.
The composition of two functions $f$ and $g$ is a third function $h$ defined in such a way that all the paths in this diagram are equivalent.
If you continue reading this book, you will hear more about diagrams in which all paths are equivalent (they are called commuting diagrams, by the way).
If we want compose more than two functions we might wonder if the order in which we compose the functions matters for the final outcome i.e. whether combining two functions and then combining the result with a third function…
…would yield the same result as composing the second and the third functions, before adding the first one.
The answer is that order of composition doesn’t matter — as long as we compose the same functions, the result would always be the same.
i.e. there are many ways to get the same function.
This property of functions is called associativity.
Functional composition is associative i.e., for any functions $f$ $g$ and $c$ with the appropriate type signature $(f \circ g ) \circ c$ is the same as $f \circ (g \circ c)$
Task 10: Draw the above diagrams as internal diagrams: define three functions that compose with one another (you can use the two functions that we defined earlier, you only would have to make a third one) compose them in the two ways shown above and check if the result is the same.
At this point you might be worried that I had forgotten that I am supposed to talk about category theory and I am just presenting a bunch of irrelevant concepts. I may indeed do that sometimes, but not right now — the fact that functional composition can be presented without even mentioning category theory doesn’t stop it from being one of category theory’s most important concepts.
In fact, we can say (although this is not an official definition) that…
Category theory is the study of things that are function-like (that compose in an associative way).
Those things are not necessarily functions, but have a source and a target like functions, they compose with one another like functions (associatively) and they can be represented by external diagrams.
And there is another way of defining category theory, without defining category theory: it is what you get if you replace the concept of equality with the concept of isomorphism.
We haven’t talked about isomorphisms yet, but this is what we will be doing for the rest of this chapter.
To explain what isomorphism is, we go back to the examples of the types of relationships that functions can represent, and to the first and most elementary of them all — the one-to-one type of relationship. We know that all functions have exactly one element from the source set, pointing to one element from the target set. But for one-to-one functions the reverse is also true — exactly one element from the target set points to one element from the source.
If we have a one-to-one-function that connects sets that are of the same size (as is the case here), then this function has the following property: all elements from the target set have exactly one arrow pointing at them. In this case, the function is invertible. That is, if you flip the arrows of the function and its source and target, you get another valid function.
Invertible functions are called isomorphisms. When there exists an invertible function between two sets, we say that the sets are isomorphic. For example, because we have an invertible function that converts the temperature measured in Celsius to temperature measured in Fahrenheit, and vise versa, we can say that temperatures measured in Celsius and Fahrenheit are isomorphic.
Isomorphism means “same form” in Greek (although actually their form is the only thing which is different between two isomorphic sets).
Now it’s time to define isomorphisms formally. One way to do that is the following:
(Internal) Two sets $A$ and $B$ are isomorphic (or $A ≅ B$) if there exist a one-to-one relationship between their elements.
This definition does not tell us anything about the most important quality of isomorphisms — invertability. So, we will present a different one:
(External) Two sets $A$ and $B$ are isomorphic (or $A ≅ B$) if there exist functions $f: A \to B$ and its reverse $g: B \to A$, such that $f \circ g = ID_{A}$ and $g \circ f = ID_{B}$.
Notice how the identity function comes in handy. In fact, notice that the concept of isomorphism is defined only by using the identity function and functional composition, this is our first completely external, completely categorical definition (if you don’t count composition itself).
If you look closely you would see that the identity function is invertible too (its reverse is itself),
The identity function is an isomorphism.
So each set is isomorphic to itself in that way.
So, the concept of an isomorphism contains the concept of equality — all equal things are also isomorphic.
An interesting fact about isomorphisms is that if we have functions that convert a member of set $A$ to a member of set $B$, and the other way around, then, because of functional composition, we know that any function from $A$ has a corresponding function from $B$ and the other way around.
For example, if you have a function “is the partner of” that goes from the set of all married people to the same set, then that function is invertible. That is not to say that you are the same person as your significant other, but rather that every statement about you, or every relation you have to some other person or object is also a relation between them and this person/object, and vice versa.
Another interesting fact about isomorphisms is that if we have two isomorphisms that have a set in common, then we can obtain a third isomorphism between the other two sets that would be the result of their (the isomorphisms) composition.
Two sets that are both isomorphic to a third one are isomorphic to one another.
Composing two isomorphisms into another isomorphism is possible by composing the two pairs of functions that make up the isomorphism in the two directions.
Informally, we can see that the two morphisms are indeed reverse to each other and hence form an isomorphism. If we want to prove that fact formally, we will do something like the following:
If two functions are isomorphic, then their composition is equal to an identity function, proving that functions $g \circ f$ and $f’ \circ g’$, are isomorphic is equivalent to proving that their composition is equal to identity.
$g \circ f \circ f’ \circ g’ = id$
But we know already that $f$ and $f’$ are isomorphic and hence $f\circ f’ = id$, so the above formula is equivalent to (you can reference the diagram to see what that means):
$g \circ id \circ g’ = id$
And we know that anything composed with $id$ is equal to itself, so it is equivalent to:
$g \circ g’ = id$
which is true, because $g$ and $g’$ are isomorphic and isomorphic functions composed are equal to identity.
By the way, there is another way to obtain the isomorphism — by composing the two morphisms one way in order to get the third function and then taking its reverse. But to do this, we have to prove that the function we get from composing two isomorphisms is also an isomorphism.
Between any two singleton sets, we may define the only possible function.
The function is invertible, which means that all singleton sets are isomorphic to one another, and furthermore (which is important) they are isomorphic in one unique way.
We said that isomorphic sets aren’t necessarily the same set (although the reverse is true). However, it is hard to get away from the notion that being isomorphic means that they are equal or equivalent in some respect. For example, all people who are connected by the isomorphic mother/child relationship share some of the same genes.
And in computer science, if we have functions that convert an object of type $A$ to an object of type $B$ and the other way around (as for example the functions between a data structure and its id), we also can pretty much regard $A$ and $B$ as two formats of the same thing, as having one means that we can easily obtain the other.
What does it mean for two things to be equivalent? The question sounds quite philosophical, but there is actually is a formal way to answer it, i.e., there is a mathematical concept that captures the concept of equality in a rather elegant way — the concept of an equivalence relation.
So what is an equivalence relation? We already know what a relation is — it is a connection between two sets (an example of which is function). But when is a relation an equivalence relation? Well, according the definition, it’s when it follows three laws, which correspond to three intuitive ideas about equality.
An equivalence relation between sets is a relation that obeys the laws of reflexivity, transitivity, and symmetry.
Let’s review them.
The first idea that defines equivalence, is that everything is equivalent with itself.
This simple principle translates to the equally simple law of reflexivity: for all sets $A$, $A=A$.
According to the Christian theology of the Holy Trinity, the Jesus’ Father is God, Jesus is God, and the Holy Spirit is also God, however, the Father is not the same person as Jesus (neither is Jesus the Holy Spirit). If this seems weird to you, that’s because it breaks the second law of equivalence relations, transitivity. Transitivity is the idea that things that are both equal to a third thing must also equal between themselves.
Mathematically, for all sets $A$ $B$ and $C$, if $A=B$ and $B=C$ then $A=C$.
Note that we don’t need to define what happens in similar situations that involve more than three sets, as they can be settled by just multiple application of this same law.
If one thing is equal to another, the reverse is also true, i.e, the other thing is also equal to the first one. This idea is called symmetry. Symmetry is probably the most characteristic property of the equivalence relation, which is not true for almost any other relation.
In mathematical terms: if $A=B$ then $B=A$.
You probably suspect that…
Isomorphisms are equivalence relations
Isomorphisms are indeed equivalence relations. And “incidentally”, we already have all the information needed to prove it (in the same way in which James Bond seems to always incidentally have exactly the gadgets that are needed to complete his mission).
We said that the most characteristic property of the equivalence relation is its symmetry. And this property is satisfied by isomorphisms, due to the isomorphisms’ most characteristic property, namely the fact that they are invertible.
Task 11: One law down, two to go: Go through the previous section and verify that isomorphisms also satisfy the other equivalence relation laws.
What I am trying to say with all this is that it makes sense to treat any isomorphism as equality. For this reason, the practice of using isomorphisms to define an equivalence relation is very prominent in category theory where isomorphisms are denoted with $≅$, which is almost the same as the way equality is denoted $=$ (note that the sign is also similar to two parallel arrows connecting one set to the other).