“The range of applications for category theory is immense, and visually conveying meaning through illustration is an indispensable skill for organizational and technical work. Unfortunately, the foundations of category theory, despite much of their utility and simplicity being on par with Venn Diagrams, are locked behind resources that assume far too much academic background.
Should category theory be considered for this academic purpose or any work wherein clear thinking and explanations are valued, beginner-appropriate resources are essential. There is no book on category theory that makes its abstractions so tangible as “Category Theory Illustrated” does. I recommend it for programmers, managers, organizers, designers, or anyone else who values the structure and clarity of information, processes, and relationships.”
Evan Burchard, Author of “The Web Game Developer’s Cookbook” and “Refactoring JavaScript”
“The clarity, consistency and elegance of diagrams in ‘Category Theory Illustrated’ has helped us demystify and explain in simple terms a topic often feared.”
Gonzalo Casas, Software developer and lecturer at ETH Zurich
In memory of
Francis William Lawvere
1937 — 2023
“Try as you may,
you just can’t get away,
from mathematics”
Tom Lehrer
I was interested in math as a kid, but was always messing up calculations, so I decided it was not my thing and started pursuing other interests, like writing and visual art.
A little later I got into programming and I found that this was similar to the part of mathematics that I enjoyed. I started using functional programming in an effort to explore the similarity and to improve myself as a developer. I discovered category theory a little later.
Some 5 years ago I found myself jobless for a few months and decided to publish some of the diagrams that I drew as part of the notes I kept when I was reading “Category Theory for Scientists” by David Spivak. The effort resulted in a rough version of the first two chapters of this book, which I published online.
A few years after that some people found my notes and encouraged me to write more. They were so nice that I forgot my imposter syndrome and got to work on the next several chapters.
Ever since Newton’s Principia, the discipline of mathematics has been viewed in the somewhat demeaning position of “science and engineering’s workhorse” — only “useful” as a means for helping scientists and engineers to make technological and scientific advancements, i.e., it is viewed as just a tool for solving “practical” problems.
Because of this, mathematicians are in a weird and, I’d say, unique position of always having to defend what they do with respect to its value for other disciplines. I again stress that this is something that would be considered absurd when it comes to any other discipline.
People don’t expect any return on investment from physical theories, e.g., no one bashes a physical theory for having no utilitarian value.
And bashing philosophical theories for being impractical would be even more absurd — imagine bashing Wittgenstein, for example:
“All too well, but what can you do with the picture theory of language?” “Well, I am told it does have its applications in programming language theory…”
Or someone being sceptical of David Hume’s scepticism:
“That’s all fine and dandy, but your theory leaves us at square one in terms of our knowledge. What the hell are we expected to do from there?”
Although many people don’t necessarily subscribe to this view of mathematics as a workhorse, we can see it encoded inside the structure of most mathematics textbooks — each chapter starts with an explanation of a concept, followed by some examples, and then ends with a list of problems that this concept solves.
There is nothing wrong with this approach, but mathematics is so much more than a tool for solving problems. It was the basis of a religious cult in ancient Greece (the Pythagoreans), it was seen by philosophers as a means to understanding the laws which govern the universe. It was, and still is, a language which can allow people with different cultural backgrounds to understand each other. And it is also art and a means of entertainment. It is a mode of thinking, or, we can even say, thinking itself. Some people say that “writing is thinking”, but I would argue that writing, when refined enough, and free from any kind of bias on the side of the author, automatically becomes mathematical writing — you can almost convert the words into formulas and diagrams.
Category theory embodies all these aspects of mathematics, so I think it’s very good grounds for writing a book where all of them shine — a book that isn’t based on solving problems, but on exploring concepts and seeking connections between them. A book that is, overall, pretty.
So, who is this book for? Some people would phrase the question as “Who should read this book”, but if you ask it this way, then the answer is “nobody”. Indeed, if you think in terms of “should”, mathematics (or at least the type of mathematics that is reviewed here) won’t help you much, although it is falsely advertised as a solution to many problems (whereas it is, in fact, as we established, something much more).
Let’s take an example — many people claim that Einstein’s theories of relativity are essential for GPS devices to work properly. Due to relativistic effects, the clocks on GPS satellites tick faster than identical clocks on the ground.
They seem to think that if the theory didn’t exist, the engineers who developed the GPS devices would have faced this phenomenon in the following way:
Engineer 1: Whoa, the clocks on the satellites are off by X nanoseconds!
Engineer 2: But that’s impossible! Our mathematical model predicts that they should be correct.
Engineer 1: OK, so what do we do now?
Engineer 2: I guess we need to drop this project until we have a viable mathematical model that describes time in the universe.
Although I am not an expert in special relativity, I suspect that the way this conversation would have developed would be closer to the following:
Engineer 1: Whoa, the clocks on the satellites are off by X nanoseconds!
Engineer 2: This is normal. There are many unknowns.
Engineer 1: OK, so what do we do now?
Engineer 2: Just adjust it by X and see if it works. Oh, and tell that to some physicist. They might find it interesting.
In other words, we can solve problems without any advanced math, or with no math at all, as evidenced by the fact that the Egyptians were able to build the pyramids without even knowing Euclidean geometry. By that, I am not claiming that math is so insignificant that it is not even good enough to serve as a tool for building stuff. Quite the contrary, I think that math is much more than just a simple tool, so going through any math textbook (and of course especially this one) will help you in ways that are much more vital than finding solutions to “complex” problems.
Some people say that we don’t use maths in our daily life. But, if true, that is only because other people have solved all the hard problems for us and the solutions are encoded in the tools that we use. However, not knowing math means that you will forever be a consumer, bound to use those existing tools, solutions, and thinking patterns, not able to do anything on your own.
And so “Who is this book for” is not to be read as who should read it, but who can read it. Then, the answer is “everyone”.
Like we said, the fundamentals of mathematics are the fundamentals of thought. Category theory allows us to formalize those fundamentals that we use in our daily (intellectual) lives.
The way we think and talk is based on intuition that develops naturally and is a very easy way to get our point across. However, intuition also makes it easy to be misunderstood — what we say can usually be interpreted in many ways, some of which are wrong. Misunderstandings of this kind are the reason why biases appear. Moreover, some people (called “sophists” in ancient Greece) would introduce biases on purpose in order to get short-term gains (not caring that introducing biases hurts everyone in the long run).
It is in such situations that people often resort to formulas and diagrams to refine their thoughts. Diagrams (even more than formulas) are ubiquitous in science and mathematics.
Category theory formalizes the concept of diagrams and their components — arrows and objects — to create a language for presenting all kinds of ideas. In this sense, category theory is a way to unify knowledge, both mathematical and scientific, and to unite various modes of thinking with common terms.
As a consequence of that, category theory and diagrams are also a very understandable way to communicate a formal concept clearly, something I hope to demonstrate in the following pages.
In this book we will visit various such modes of knowledge and along the way, see all kinds of mathematical objects, viewed through the lens of categories.
We start with set theory in chapter 1, which is the original way to formalize different mathematical concepts.
In chapter 2, we will make a (hopefully) gentle transition from sets to categories, showing how the two compare and (finally) introducing the definition of a category.
In the next two chapters, 3 and 4, we jump into two different branches of mathematics and introduce their main means of abstraction, groups and orders, observing how they connect to the core category-theoretic concepts that we introduced earlier.
Chapter 5 also follows the main formula of the previous two chapters, getting to the heart of the matter of why category theory is a universal language, by showing its connection with the ancient discipline of logic. As in chapters 3 and 4, we start with a crash course in logic itself.
The connection between all these different disciplines is examined in chapter 6, using one of the most interesting category-theoretical concepts — the concept of a functor.
In chapter 7 we review another, even more interesting and more advanced, categorical concept: the concept of a natural transformation.
Thanks to my wife Dimitrina, for all her support.
Thanks to my daughter Daria, my “anti-author”, who sat on my knees while I was writing the second and third chapters and mercilessly deleted many sentences, most of them bad.
Thanks to my high-school arts teacher, Mrs Georgieva, who told me that I have some talent, but that I have to work.
Thanks to Prathyush Pramod who encouraged me to finish the book and is also helping me out with it.
And also to everyone else who submitted feedback and helped me fix some of the numerous errors that I made — knowing myself, I know that there are more.
Let’s begin our inquiry by looking at the basic theory of sets. Set theory and category theory share many similarities. We can view category theory as a generalization of set theory. That is, it’s meant to describe the same thing as set theory (everything?), but to do it in a more abstract manner, one that is more versatile and (hopefully) simpler.
In other words, sets are an example of a category (the proto-example, we might say), and it is useful to have examples.
Instead of asking what can be defined and deduced from what is assumed to begin with, we ask instead what more general ideas and principles can be found, in terms of which what was our starting-point can be defined or deduced. Bertrand Russell, from Introduction to Mathematical Philosophy
Most scientific and mathematical theories have a specific domain to which they are tied, and in which they are valid. They are created with this domain in mind and are not intended to be used outside of it. For example, Darwin’s theory of evolution was created to explain how different biological species evolve by natural selection; quantum mechanics is a description of how particles behave at a specific scale, etc.
Even most mathematical theories, although they are not inherently bound to a specific domain (like the scientific theories are), are at least strongly related to some domain; for example, differential equations were created to model how quantities change over time.
Set theory and category theory are different: they are not created to provide a rigorous explanation of how a particular phenomenon works; instead, they provide a more general framework for explaining all kinds of phenomena. They work less like tools and more like languages for defining tools. Theories like that are called abstract theories.
The border between the two is sometimes blurry. All theories use abstraction, otherwise they would be pretty useless: without abstraction, Darwin would have to speak about specific animal species or even individual animals. But abstract theories have core concepts that don’t refer to anything in particular and are instead left for people to generalize on. All theories are applicable outside of their domains, but set theory and category theory do not have a domain to begin with.
Concrete theories, like the theory of evolution, are composed of concrete concepts. For example, the concept of a population, also called a gene pool, refers to a group of individuals that can interbreed. Abstract theories, like set theory, are composed of abstract concepts, like the concept of a set. The concept of a set by itself does not refer to anything. However, we cannot say that it is an empty concept, as there are countless things that can be represented by sets; for example, gene pools can be (very aptly) represented by sets of individual animals, and animal species can also be represented by sets — a set of all populations that can theoretically interbreed.

You’ve already seen how abstract theories may be useful. Because they are so simple, they can be used as building blocks for many concrete theories. Because they are so common, they can be used to unify and compare different concrete theories, by putting these theories on common ground (this is very characteristic of category theory, as we will see later). Moreover, good (abstract) theories can serve as mental models for developing our thoughts.
“A set is a gathering together into a whole of definite, distinct objects of our perception or of our thought—which are called elements of the set.” – Georg Cantor
Perhaps unsurprisingly, everything in set theory is defined in terms of sets. A set is a collection of things, where the “things” can be anything you want (like individuals, populations, genes, etc.). Consider, for example, these balls.
Let’s construct a set, call it $G$ (as in gray), that contains all of them as elements. There can only be one such set, because a set has no structure (there is no order, no ball goes before or after another, and there are no members which are “special” with respect to their membership in the set). Two sets that contain the same elements are just two pictures of the same set.
This example may look overly-simple, but in fact, it’s just as valid as any other.
The key insight that makes the concept useful is the fact that it enables you to reason about several things as if they were one.
Let’s construct one more set. The set of all balls that are warm in color. Let’s call it $Y$ (because in the diagram, it’s colored in yellow).
Notice that $Y$ contains only elements that are also present in $G$. That is, every element of the set $Y$ is also an element of the set $G$. When two sets have this relation, we may say that $Y$ is a subset of $G$ (or $Y \subseteq G$). A subset resides completely inside its superset when the two are drawn together.
The set of all red balls contains just one ball. We said above that sets summarize several elements into one. Still, sets that contain just one element are perfectly valid — simply put, there are things that are one of a kind. The set of kings/queens that a given kingdom has is a singleton set.
What’s the point of the singleton set? Well, it is part of the language of set theory, e.g., if we have a function which expects a set of items, but there is only one item that meets the criteria, we can just create a singleton set with that item.
Of course, if one is a valid answer, zero can be as well. If we want the set of all black balls, $B$, or the set of all white balls, $W$, the answer to both questions is the same — the empty set.
Because a set is defined only by the items it contains, the empty set is unique — there is no difference between the set that contains zero balls and the set that contains zero numbers, for instance. Formally, the empty set is marked with the symbol $\varnothing$ (so $B = W = \varnothing$).
The empty set has some special properties; for example, it is a subset of every other set. Mathematically speaking, $\forall A: \varnothing \subseteq A$ ($\forall$ means “for all”).
“By function I mean the unity of the act of arranging various representations under one common representation.” — Immanuel Kant, from “The Critique of Pure Reason”
A function is a relationship between two sets that matches each element of one set, called the source set of the function, with exactly one element from another set, called the target set of the function.
These two sets are also called the domain and codomain of the function, or its input and output. In programming, they go by the name of argument type and return type. In logic, they correspond to the premise and conclusion (we will get there). We might also say, depending on the situation, that a given function goes from this set to that other one, connects this set to the other, or that it converts a value from this set to a value from the other one. These different terms demonstrate the multifaceted nature of the concept of function.
Here is a function $f$, which converts each ball from the set $R$ to the ball with the opposite color in another set $G$ (in mathematics a function’s name is often accompanied by the names of its source and target sets, like this: $f: R → G$)
This is probably one of the simplest types of functions that exist — it encodes a one-to-one relationship between the sets. That is to say, one element from the source is connected to exactly one element from the target (and the other way around).
But functions can also express relationships of the type many-to-one, where many elements from the source might be connected to one element from the target (but not the other way around). Below is one such function.
Such functions might represent operations such as categorizing a given collection of objects by some criteria, or partitioning them, based on some property that they might have.
A function can also express relationships in which some elements from the target set do not play a part.
An example might be the relationship between some kind of pattern or structure and the emergence of this pattern in some more complicated context.
We saw how versatile functions are, but there is one thing that you cannot have in a function. You cannot have a source element that is not mapped to anything, or one that is mapped to more than one target element — that would constitute a one-to-many (or many-to-many) relationship, and, as we said, functions express many-to-one (or one-to-one) relationships. There is a reason for that “design decision”, and we will arrive at it shortly.
Sets and functions can express relationships between all kinds of objects, and even people. Every question that you ask that has an answer can be expressed as a function.
The question “How far are we from New York?” is a function whose source set is the set of all places in the world and whose target set is the set of all positive numbers.
The question “Who is my father?” is a function whose source is the set of all people in the world.
Question: What is the target of this function?
Note that the question “Who is my child?” is NOT a straightforward function, because a person can have no children, or can have multiple children. We will learn to represent such questions as functions later.
Question: Do all functions that we drew at the beginning express something? Do you think that a function should express something in order to be valid?
For every set $G$, no matter what it represents, we can define the function that does nothing, or in other words, a function which maps every element of $G$ to itself. It is called the identity function of $G$ or $ID_{G}: G → G$.
You can think of $ID_{G}$ as a function which represents the set $G$ in the realm of functions. Its existence allows us to formally prove many theorems that we “know” by intuition.
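In code, this function already exists. Here is a minimal sketch in Haskell, where the choice of Int as a stand-in for the set $G$ is just an assumption for illustration:

```haskell
-- A sketch: the identity function, specialized to one particular type
-- that stands in for the set G. It maps every element to itself.
idG :: Int -> Int
idG = id   -- idG 3 == 3, idG 42 == 42, and so on
```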
For each set and subset, no matter what they represent, we can define a function (called the inclusion map of the subset) that maps each element of the subset to itself:
Every set is a subset of itself, in which case this function is the same as the identity.
There is a unique function from the empty set to any other set.
Question: Is this really valid? Why? Check the definition.
Note that this statement also follows from the one saying that there is a function from a subset to its superset, combined with the one that says that the empty set is a subset of any other set.
Question: What about the other way around? Are there functions with the empty set as their target, as opposed to their source?
There is a unique function from any set to any singleton set.
Question: Is this really the only way to connect any set to a singleton set in a valid way?
Question: Again, what about the other way around?
All numerical operations can be expressed as functions acting on the set of (different types of) numbers.
Because not all functions work on all numbers, we separate the set of numbers into several sets, many of which are subsets of one another, such as the set of whole numbers (integers) $\mathbb{Z} := \{ \dots, -3, -2, -1, 0, 1, 2, 3, \dots \}$ and the set of positive whole numbers (also called “natural” numbers) $\mathbb{N} := \{1, 2, 3, \dots \}$. We also have the set of real numbers $\mathbb{R}$, which includes almost all numbers, and the set of positive real numbers, $\mathbb{R}_{>0}$.
Each numerical operation is a function between two of these sets. For example, squaring a number is a function from the set of real numbers to the set of real non-negative numbers (because both sets are infinite, we cannot draw them in their entirety, however we can draw a part of them).
I will use the occasion to reiterate some of the more important characteristics of functions:
Overall everything is permitted, as long as you can always provide exactly one result (also known as The result™) per value. For numerical operations, this is always true, simply because math is designed this way.
Every generalization of number has first presented itself as needed for some simple problem: negative numbers were needed in order that subtraction might be always possible, since otherwise a − b would be meaningless if a were less than b; fractions were needed in order that division might be always possible; and complex numbers are needed in order that extraction of roots and solution of equations may be always possible.
Bertrand Russell, from Introduction to Mathematical Philosophy
Note that most mathematical operations, such as addition, multiplication, etc. require two numbers in order to produce a result. This does not mean that they are not functions, it just means they’re a little fancier. Depending on what we need, we may present those operations as functions from the sets of tuples of numbers to the set of numbers, or we may say that they take a number and return a function. More on that later.
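As a quick preview (a sketch in Haskell; the function names are made up for illustration), here are the two presentations of “plus” that the paragraph above describes:

```haskell
-- "plus" as a function from the set of pairs (tuples) of numbers to the set of numbers
plusUncurried :: (Int, Int) -> Int
plusUncurried (x, y) = x + y

-- "plus" as a function that takes a number and returns another function
plusCurried :: Int -> (Int -> Int)
plusCurried x = \y -> x + y
```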
Sets are used extensively in programming, especially in their incarnation as types (also called classes). All sets of numbers that we discussed earlier also exist in most languages as types.
Sets are not exactly the same thing as types, but all types are (or can be seen as) sets. For example, we can view the Boolean type as a set containing two elements — true and false.
Another very basic set in programming is the set of keyboard characters, or Char. Characters are actually rarely used by themselves and mostly as parts of sequences.
Most of the types of programming are composite types — they are a combination of the primitive ones that are listed here. Again, we will cover these later.
Question: What is the type equivalent of subsets in programming?
Some functions in programming (also called methods, subroutines, etc.) kinda resemble mathematical functions — they sometimes take one value of a given type (or in other words, an element that belongs to a given set) and always return exactly one element which belongs to another type (or set). For example, here is a function that takes an argument of type Char and returns a Boolean, indicating whether the character is a letter.
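A sketch of such a function in Haskell (relying on the isLetter predicate from Data.Char; the function name is made up):

```haskell
import Data.Char (isLetter)

-- A function from the set of characters to the set of Booleans:
-- every Char is mapped to exactly one Bool.
charIsLetter :: Char -> Bool
charIsLetter c = isLetter c
```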
However functions in most programming languages can also be quite different from mathematical functions — they can perform various operations that have nothing to do with returning a value. These operations are sometimes called side effects.
Why are functions in programming different? Well, figuring out a way to encode effectful functions in a way that is mathematically sound isn’t trivial, and at the time when most programming paradigms that are in use today were created, people had bigger problems than their functions not being mathematically sound (e.g. actually being able to run any program at all).
Nowadays, many people feel that mathematical functions are too limiting and hard to use. And they might be right. But mathematical functions have one big advantage over non-mathematical ones — their type signature tells you almost everything about what the function does (this is probably the reason why most functional languages are strongly-typed).
We said that while all mathematical functions are also programming functions, the reverse is not true for most programming languages. However, there are some languages that only permit mathematical functions, and for which this equality holds. They are called purely-functional programming languages.
A peculiarity in such languages is that they don’t support functions that perform operations like rendering stuff on screen, doing I/O, etc. (in this context, such operations are called “side effects”).
In purely functional programming languages, such operations are outsourced to the language’s runtime. Instead of writing functions that directly perform a side effect, for example console.log('Hello'), we write functions that return a value of a type that represents that side effect (for example, in Haskell side effects are handled by the IO type), and the runtime then executes those functions for us.
We then link all those functions into a whole program, often by using a thing called continuation passing style.
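A minimal sketch of what this looks like in Haskell (assuming the standard runtime; the name greeting is made up):

```haskell
-- greeting does not print anything by itself; it is just a value of type
-- IO () that describes the side effect of printing a line.
greeting :: IO ()
greeting = putStrLn "Hello"

-- The runtime executes whatever IO value is bound to main.
main :: IO ()
main = greeting
```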
Now we are about to reach the heart of the matter regarding the topic of functions, namely functional composition. Assume that we have two functions, $g: Y → P$ and $f: P → G$, such that the target of the first one is the same set as the source of the second one.
If we apply the first function $g$ to some element from set $Y$, we will get an element of the set $P$. Then, if we apply the second function $f$ to that element, we will get an element from the set $G$.
We can define a function that is the equivalent to performing the operation described above. That would be a function such that, if you follow the arrow $h$ for any element of set $Y$ you will get to the same element of the set $G$ as the one you will get if you follow both the $g$ and $f$ arrows.
Let us call it $h: Y → G$. We may say that $h$ is the composition of $g$ and $f$, or $h = f \circ g$ (notice that the first function is on the right, so it’s similar to $b = f(g(a))$).
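Here is the same idea sketched in Haskell, where composition is written with the . operator (the concrete types and function bodies are stand-ins for the sets $Y$, $P$ and $G$, chosen just for illustration):

```haskell
-- g : Y -> P and f : P -> G, with Int, Bool and String standing in for the sets
g :: Int -> Bool
g y = y > 0

f :: Bool -> String
f p = if p then "warm" else "cold"

-- h = f . g, so h y == f (g y) for every y
h :: Int -> String
h = f . g
```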
Composition is the essence of all things categorical. The key insight is that the sum of two parts is no more complex than the parts themselves.
Question: Think about which qualities of a function make composition possible, e.g., does it work with other types of relationships, like many-to-many and one-to-many.
To understand how powerful composition is, consider the following: one set being connected to another means that each function from the second set can be transferred to a corresponding function from the first one.
If we have a function $g: P → Y$ from set $P$ to set $Y$, then for every function $f$ from the set $Y$ to any other set, there is a corresponding function $f \circ g$ from the set $P$ to the same set. In other words, every time you define a new function from $Y$ to some other set, you gain one function from $P$ to that same set for free.
For example, if we again take the relationship between a person and their mother as a function, with the set of all people in the world as its source and the set of all people that have children as its target, then composing this function with other similar functions would give us all relatives on a person’s mother’s side.
Although you might be seeing functional composition for the first time, the intuition behind it is there — we all know that each person whom our mother is related to is automatically our relative as well — our mother’s father is our grandfather, our mother’s partner is our father, etc.
Besides being useful for analyzing relationships that already exist, the principle of composition can help you in the practice of building objects that exhibit such relationships i.e. engineering.
One of the main ways in which modern engineering differs from ancient craftsmanship is the concept of a part/module/component - a product that performs a given function and is not made to be used directly, but is instead optimized to be combined with other such products in order to form an “end-user” product. For example, an espresso machine is just a combination of components, such as a pump, a heater, a grinder, a group head, etc., composed in an appropriate way.
Task: Think about what would be those functions’ sources and targets.
By the way, diagrams that are “zoomed out” that show functions without showing set elements are called external diagrams, as opposed to the ones that we saw before, which are internal.
Let’s look at the diagram that demonstrates functional composition in which we showed that successive application of the two composed functions ($f \circ g$) and the new function ($h$) are equivalent.
We showed this equivalence by drawing an internal diagram, and explicitly drawing the elements of the functions’ sources and targets in such a way that the two paths are equivalent.
Alternatively, we can just say that the arrow paths are all equivalent (all arrows starting from a given set element ultimately lead to the same corresponding element from the resulting set) and draw the equivalence as an external diagram.
The external diagram is a more appropriate representation of the concept of composition, as it is more general. In fact, it is so general that it can actually serve as a definition of functional composition.
The composition of two functions $f$ and $g$ is a third function $h$ defined in such a way that all the paths in this diagram are equivalent.
If you continue reading this book, you will hear more about diagrams in which all paths are equivalent (they are called commuting diagrams, by the way).
At this point you might be worried that I had forgotten that I am supposed to talk about category theory and I am just presenting a bunch of irrelevant concepts. I may indeed do that sometimes, but not right now - the fact that functional composition can be presented without even mentioning category theory doesn’t stop it from being one of category theory’s most important concepts.
In fact, we can say (although this is not an official definition) that category theory is the study of things that are function-like (we call them morphisms). They have a source and a target, compose with one another in an associative way, and can be represented by external diagrams.
And there is another way of defining category theory without defining category theory: it is what you get if you replace the concept of equality with the concept of isomorphism. We haven’t talked about isomorphisms yet, but this is what we will be doing till the end of this chapter.
To explain what isomorphism is, we go back to the examples of the types of relationships that functions can represent, and to the first and most elementary of them all — the one-to-one type of relationship. We know that in every function, each element from the source set points to exactly one element from the target set. But for one-to-one functions the reverse is also true — each element from the target set is pointed to by exactly one element from the source.
If we have a one-to-one-function that connects sets that are of the same size (as is the case here), then this function has the following property: all elements from the target set have exactly one arrow pointing at them. In this case, the function is invertible. That is, if you flip the arrows of the function and its source and target, you get another valid function.
Invertible functions are called isomorphisms. When there exists an invertible function between two sets, we say that the sets are isomorphic. For example, because we have an invertible function that converts the temperature measured in Celsius to temperature measured in Fahrenheit, and vice versa, we can say that temperatures measured in Celsius and Fahrenheit are isomorphic.
Isomorphism means “same form” in Greek (although actually their form is the only thing which is different between two isomorphic sets).
More formally, two sets $R$ and $G$ are isomorphic (or $R ≅ G$) if there exist functions $f: G → R$ and its reverse $g: R → G$, such that $f \circ g = ID_{R}$ and $g \circ f = ID_{G}$ (notice how the identity function comes in handy).
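As a sketch of the temperature example (using Double as a stand-in for the set of real numbers, and ignoring floating-point rounding), the two conversion functions satisfy exactly the condition in the definition above:

```haskell
celsiusToFahrenheit :: Double -> Double
celsiusToFahrenheit c = c * 9 / 5 + 32

fahrenheitToCelsius :: Double -> Double
fahrenheitToCelsius f = (f - 32) * 5 / 9

-- Composing them in either order gives back the identity:
--   fahrenheitToCelsius (celsiusToFahrenheit c) == c
--   celsiusToFahrenheit (fahrenheitToCelsius f) == f
```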
If you look closely you would see that the identity function is invertible too (its reverse is itself), so each set is isomorphic to itself in that way.
Therefore, the concept of an isomorphism contains the concept of equality — all equal things are also isomorphic.
An interesting fact about isomorphisms is that if we have functions that convert a member of set $A$ to a member of set $B$, and the other way around, then, because of functional composition, we know that any function from/to $A$ has a corresponding function from/to $B$.
For example, if you have a function “is the partner of” that goes from the set of all married people to the same set, then that function is invertible. That is not to say that you are the same person as your significant other, but rather that every statement about you, or every relation you have to some other person or object is also a relation between them and this person/object, and vice versa.
Another interesting fact about isomorphisms is that if we have two isomorphisms that have a set in common, then we can obtain a third isomorphism between the other two sets that would be the result of their (the isomorphisms) composition.
Composing two isomorphisms into another isomorphism is possible by composing the two pairs of functions that make up the isomorphism in the two directions.
Informally, we can see that the two composite functions are indeed reverse to each other and hence form an isomorphism. If we want to prove that fact formally, we will do something like the following:
Given that a function composed with its inverse is equal to an identity function, proving that the functions $g \circ f$ and $f’ \circ g’$ are inverse to each other is equivalent to proving that their composition is equal to the identity.
$g \circ f \circ f’ \circ g’ = id$
But we know already that $f$ and $f’$ are inverse to each other and hence $f \circ f’ = id$, so the above formula is equivalent to (you can reference the diagram to see what that means):
$g \circ id \circ g’ = id$
And we know that anything composed with $id$ is equal to itself, so it is equivalent to:
$g \circ g’ = id$
which is true, because $g$ and $g’$ are inverse to each other, and a function composed with its inverse is equal to the identity.
By the way, there is another way to obtain the isomorphism — by composing the two morphisms one way in order to get the third function and then taking its reverse. But to do this, we have to prove that the function we get from composing two bijective functions is also bijective.
Between any two singleton sets, we may define the only possible function.
The function is invertible, which means that all singleton sets are isomorphic to one another, and furthermore (which is important) they are isomorphic in one unique way.
Following the logic from the last paragraph, each statement about something that is one of a kind can be transferred to a statement about another thing that is one of a kind.
Question: Come up with a good example of a statement that demonstrates the isomorphism between singleton sets (I obviously couldn’t). Consider that all people and objects share one and the same universe.
We said that isomorphic sets aren’t necessarily the same set (although the reverse is true). However, it is hard to get away from the notion that being isomorphic means that they are equal or equivalent in some respect. For example, all people who are connected by the isomorphic mother/child relationship share some of the same genes.
And in computer science, if we have functions that convert an object of type $A$ to an object of type $B$ and the other way around (as for example the functions between a data structure and its id), we also can pretty much regard $A$ and $B$ as two formats of the same thing, as having one means that we can easily obtain the other.
What does it mean for two things to be equivalent? The question sounds quite philosophical, but there actually is a formal way to answer it, i.e., there is a mathematical concept that captures the concept of equality in a rather elegant way — the concept of an equivalence relation.
So what is an equivalence relation? We already know what a relation is — it is a connection between two sets (an example of which is a function). But when is a relation an equivalence relation? Well, according to the definition, it’s when it follows three laws, which correspond to three intuitive ideas about equality. Let’s review them.
The first idea that defines equivalence, is that everything is equivalent with itself.
This simple principle translates to the equally simple law of reflexivity: for all sets $A$, $A=A$.
According to the Christian theology of the Holy Trinity, the Father is God, Jesus is God, and the Holy Spirit is also God; however, the Father is not the same person as Jesus (nor is Jesus the Holy Spirit). If this seems weird to you, that’s because it breaks the second law of equivalence relations, transitivity. Transitivity is the idea that two things that are both equal to a third thing must also be equal to each other.
Mathematically, for all sets $A$ $B$ and $C$, if $A=B$ and $B=C$ then $A=C$.
Note that we don’t need to define what happens in similar situations that involve more than three sets, as they can be settled by repeated application of this same law.
If one thing is equal to another, the reverse is also true, i.e., the other thing is also equal to the first one. This idea is called symmetry. Symmetry is probably the most characteristic property of the equivalence relation, something that is true of almost no other relation.
In mathematical terms: if $A=B$ then $B=A$.
Isomorphisms are indeed equivalence relations. And “incidentally”, we already have all the information needed to prove it (in the same way in which James Bond seems to always incidentally have exactly the gadgets that are needed to complete his mission).
We said that the most characteristic property of the equivalence relation is its symmetry. And this property is satisfied by isomorphisms, due to the isomorphisms’ most characteristic property, namely the fact that they are invertible.
Task: One law down, two to go: Go through the previous section and verify that isomorphisms also satisfy the other equivalence relation laws.
The practice of using isomorphisms to define an equivalence relation is very prominent in category theory where isomorphisms are denoted with $≅$, which is almost the same as $=$ (and is also similar to having two opposite arrows connecting one set to the other).
Many people would say that the concept of a number is the most basic concept in mathematics. But actually they are wrong — sets and isomorphisms are more basic! Or at least, numbers can be defined using sets and isomorphisms.
To understand how, let’s think about how you would teach a person what a number is (in particular, here we will concentrate on the natural, or counting, numbers). You may start your lesson by showing them a bunch of objects that are of a given quantity; for example, if you want to demonstrate the number $2$, you might bring them two pencils, two apples, or two of something else.
When you do that, it would be important to highlight that you are not referring only to the left object, or only to the right one, but that we should consider both things at once, i.e., both things as one, so if the person to whom you are explaining happens to know what a set is, this piece of knowledge might come in handy. Also, being good teachers, we might provide them with some more examples of sets of 2 things.
This is a good starting point, but the person may still be staring at the objects instead of the structure — they might ask if this or that set is $2$ as well. At this point, if the person to whom you are explaining happens to know about isomorphisms (let’s say they lived in a cave with nothing but this book with them), you can easily formulate your final definition, saying that the number $2$ is represented by those sets and all other sets that are isomorphic to them, or by the equivalence class of sets that have two elements, as the formal definition goes (don’t worry, we will learn all about equivalence classes later).
At this point there are no more examples that we can add. In fact, because we consider all other sets as well, we might say that this is not just a bunch of examples, but a proper definition of the number $2$. And we can extend that to include all other numbers. In fact, the first definition of a natural number (presented by Gottlob Frege in 1884) is roughly based on this very idea.
Before we close this chapter, there is one meta-note that we should definitely make: according to the definition of a number that we presented, a number is not an object, but a whole system of interconnected objects, containing in this case an infinite number of objects. This may seem weird to you, but it’s actually pretty characteristic of the categorical way of modeling things.
An unstructured monolithic design is not a good idea, except maybe for a tiny operating system in, say, a toaster, but even there it is arguable. — Andrew S. Tanenbaum
Software development is a peculiar discipline — in theory, it should be just some sort of engineering, however the way it is executed in practice is sometimes closer to craftsmanship, with the principle of composition not being utilized to the fullest.
To see why, imagine a person (e.g. me) tinkering with some sort of engineering problem, e.g. trying to fix a machine or modify it to serve a new purpose. If the machine in question is mechanical or electrical, this person will be forced to pretty much make do with the components that already exist, simply because they can rarely afford to manufacture new components themselves (or at least they would avoid it if possible). This limitation forces component manufacturers to create components that are versatile and that work well together, in the same way in which pure functions work well together. And this in turn makes it easy for engineers to create better machines without doing all the work themselves.
But things are different if the machine in question is software-based — due to the ease with which new software components can be rolled out, our design can blur the line that separates some of the components or even do away with the concept of component altogether and make the whole program one giant component (monolithic design). Worse, when no ready-made components are available, this approach is actually easier than the component-based approach that we described in the previous paragraph, and so many people use it.
This is bad, as the benefits of monolithic design are mostly short-term — not being separated to components makes programs harder to reason about, harder to modify (e.g. you cannot replace a faulty component with a new one) and generally more primitive than component-based programs. For these reasons, I think that currently, programmers are losing out by not utilizing the principles of functional composition. In fact, I was so unhappy with the situation that I decided to write a whole book on applied category theory to help people understand the principles of composition better — it’s called Category Theory Illustrated (Oh wait, I am writing that right now, aren’t I?)
In this chapter, we will see some more set-theoretic constructs, but we will also introduce their category-theoretic counterparts in an effort to gently introduce the concept of a category itself.
When we are finished with that, we will try, (and almost succeed) to define categories from scratch, without actually relying on set theory.
In the previous chapter, we needed a way to construct a set whose elements are composites of the elements of some other sets, e.g. when we discussed mathematical functions, we couldn’t define $+$ and $-$ because we could only formulate functions that take one argument. Similarly, when we introduced the primitive types in programming languages, like Char and Number, we mentioned that most of the types that we actually use are composite types. So how do we construct those?
The simplest composite type, made of a set $B$ that contains $b$’s and a set $Y$ that contains $y$’s, is the Cartesian product of $B$ and $Y$, that is, the set of ordered pairs that contain one element of $B$ and one element of $Y$. Or formally speaking: $B \times Y = \{ (b, y) \mid b ∈ B, y ∈ Y \}$ ($∈$ means “is an element of”).
It is denoted $B \times Y$ and it comes equipped with two functions for retrieving the $b$ and the $y$ from each $(b, y)$.
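In Haskell this structure already exists as the pair (tuple) type (a sketch; the names getB and getY are made up, the built-in getters are fst and snd):

```haskell
-- The product of the types b and y is the pair type (b, y),
-- and the two "getter" functions are fst and snd:
getB :: (b, y) -> b
getB = fst

getY :: (b, y) -> y
getY = snd
```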
Question: Why is this called a product? Hint: How many elements does it have?
The concept of Cartesian product was first defined by the mathematician and philosopher René Descartes, as a basis for the Cartesian coordinate system, which is why both concepts are named after him (although it does not look like it, as they use the Latinized version of his name).
Most people know how a Cartesian coordinate system works, but an equally interesting question, which few people think about, is how we can define it using sets and functions.
A Cartesian coordinate system consists of two perpendicular lines situated on a Euclidean plane, together with some kind of mapping that resembles a function, connecting any point on these two lines to a number representing the distance between the point that is being mapped and the lines’ point of overlap (which is mapped to the number $0$).
Using this construct (as well as the concept of a Cartesian product), we can describe not only the points on the lines but any point on the Euclidean plane. We do that by measuring the distance between the point and those two lines.
And, since the point is the main primitive of Euclidean geometry, the coordinate system allows us to also describe all kinds of geometric figures such as this triangle (which is described using products of products).
So we can say that the Cartesian coordinate system is some kind of function-like mapping between sets of products of numbers and geometric figures that correspond to these numbers, using which we can derive some properties of the figures from the numbers (for example, using the products in the example below, we can compute that the triangle that they represent has a base of $6$ units and a height of $5$ units).
What’s even more interesting is that this mapping is one-to-one, which makes the two realms isomorphic (traditionally we say that the point is completely described by the coordinates, which is the same thing).
With that, our effort to represent Cartesian coordinates with sets is complete, except we haven’t described the mapping that connects points and numbers. The mapping resembles a function and exhibits all relevant properties of a function (many-to-one mapping (or one-to-one in this case)), however, as we said, functions are mappings between sets and in this case, even if we can think of the coordinate system as a set (of points and figures), geometrical figures are definitely not a set, as they have a lot of additional things going on (or additional structure, as a category theorist would say). So, defining this mapping formally would require us to also formalize both geometry and algebra, and moreover to do so in a way in which they would be compatible with one another. These are some of the ambitions of category theory and this is what we will attempt to do later in this book.
But before we continue with that, let’s see some other neat uses of products.
In the previous chapter, we established the correspondence between various concepts in programming languages and set theory — sets resemble types, and functions resemble methods/subroutines. This picture is made complete with products, which are like stripped-down classes (also called records or structs) — the sets that form the product correspond to the class’s properties (also called members), and the functions for accessing them are like what programmers call getter methods. For example, the famous object-oriented example of a Person class with name and age fields is nothing more than a product of the set of strings and the set of numbers. And objects with more than two values can be expressed as products whose components are themselves products.
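A sketch of that example in Haskell (the field names and values are made up for illustration): a record is essentially a product whose getters have names.

```haskell
-- Person is (essentially) the product of String and Int, and the
-- automatically generated field accessors play the role of the two getters:
--   name :: Person -> String
--   age  :: Person -> Int
data Person = Person { name :: String, age :: Int }

alice :: Person
alice = Person { name = "Alice", age = 34 }
```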
Products can also be used for expressing functions that take more than one argument (and this is indeed how multi-param functions are implemented in languages that actually have tuples, like the ones from the ML family ). For example, “plus” is a function from the set of products of two numbers to the set of numbers, so, $+: \mathbb{Z} \times \mathbb{Z} → \mathbb{Z}$.
By the way, such functions (ones that take two objects of one type and return a third object of the same type) are called operations.
A product is, as we said, a set of ordered pairs (formally speaking $A \times B ≠ B \times A$). So, to define a product we must define the concept of an ordered pair. So how can we do that?
Note that an ordered pair of elements is not just a set containing the two elements (that would be an unordered pair), but it also contains information about which of those objects comes first and which one goes second in the pair. In programming, we have the ability to assign names to each member of an object, which accomplishes the same purpose.
The order of elements in the pair is important. While some mathematical operations (such as addition) don’t care about order, others (such as subtraction) do. In programming, when we manipulate an object we obviously want to access a specific property of the object, not just any random property.
So does that mean that we have to define ordered pairs as a “primitive” type, like we defined sets, if we want to use them? That’s possible, but there is another approach: if we can define a construct that is isomorphic to the ordered pair using only sets, we can use that construct instead. And mathematicians have come up with multiple ingenious ways to do that. Here is the first one, which was suggested by Norbert Wiener in 1914. Note the smart use of the fact that the empty set is unique.
The next one was suggested by Felix Hausdorff in the same year. In order to use that one, we just have to define $1$, and $2$ first.
Suggested in 1921 by Kazimierz Kuratowski, this one uses just the components of the pair.
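For reference, here are the standard formulations of the three encodings (which the corresponding diagrams presumably illustrate):

Wiener (1914): $(a, b) := \{\{\{a\}, \varnothing\}, \{\{b\}\}\}$

Hausdorff (1914): $(a, b) := \{\{a, 1\}, \{b, 2\}\}$, where $1$ and $2$ are two distinct objects, different from $a$ and $b$

Kuratowski (1921): $(a, b) := \{\{a\}, \{a, b\}\}$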
The product definitions presented in the previous section worked by zooming in on the individual elements of the product and seeing what they can be made of. We may think of this as a low-level approach to the definition. This time we will try to do the opposite — instead of zooming in we will zoom out; we will attempt to fly over the difficulties that we met in the previous section and provide a definition of a product in terms of functions and external diagrams.
To define products in terms of external diagrams, we must, given two sets, devise a way to pinpoint the set that is their product, by looking at the functions that come from/to them. To achieve that, we must first ask ourselves what functions are defined for all products. The answer: we have two of those — the functions for retrieving the two elements of the pair (the “getters”, so to say).
Formally, if we have a set $G$ which is the product of sets $Y$ and $B$, then we should also have functions which give us back the elements of the product, so $G → Y$ and $G → B$. Now let’s switch to the external view.
This diagram already provides a definition of what a product is, but not a complete definition, as the product of sets $Y$ and $B$ is not the only set for which such functions can be defined. For example, a set of triples $Y \times B \times R$ for any other set $R$ also qualifies. And if there is a function from $Y$ to $B$, then the set $Y$ itself meets our condition for being the product, because it is connected to $B$ (via that function) and to itself (via its identity function).
However, the set of triples $Y \times B \times R$ is connected to $Y$ and $B$ only because it can be converted to $Y \times B$: the arrow $Y \times B \times R \to B$ is just the composition of the arrow $Y \times B \times R \to Y \times B$ and the arrow $Y \times B \to B$. The same reasoning applies to all other objects that can take the place of our product.
(Intuitively, all such objects would be more complex than the pair objects and you can always have a function that converts a more complex structure, to a simpler one (we saw an example of this when we covered the functions that convert subsets to their supersets).)
More formally, if we suppose that there is a set $I$ that can serve as an impostor of the product of sets $B$ and $Y$ (i.e. that $I$ is such that there exist two functions, which we will call $ib: I → B$ and $iy: I → Y$, that allow us to derive elements of $B$ and $Y$ from it), then there must also exist a unique function with the type signature $I → B \times Y$ that converts the impostor to the real product, such that $ib$ and $iy$ are just the results of composing that function with the usual “getter” functions that go from the pair to the elements. In other words, whichever object we pick for $I$, this diagram would commute.
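A small sketch of this in Haskell, with triples standing in for the impostor $I$ (the names toPair, ib and iy are made up for illustration):

```haskell
-- The unique function that converts the "impostor" (a triple) to the real product:
toPair :: (b, y, r) -> (b, y)
toPair (b, y, _) = (b, y)

-- The impostor's own getters factor through that conversion,
-- i.e. ib = fst . toPair and iy = snd . toPair, so the diagram commutes.
ib :: (b, y, r) -> b
ib = fst . toPair

iy :: (b, y, r) -> y
iy = snd . toPair
```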
In category theory, this type of property that a given object might possess (participating in a structure such that all similar objects can be converted to/from it) is called a universal property, but we won’t go into more detail, as it is a bit early (after all we haven’t even yet said what category theory is).
One thing that we should point out is that this definition (as all the previous ones, by the way) does not rule out the sets which are isomorphic to the product — when we represent things using universal properties, an isomorphism is treated as equality. This is the same viewpoint that we adopt in programming, especially when we work at a higher level — there might be many different implementations of pair, but as long as they work in the same way (i.e. we can convert one to the other and vice versa), they are all the same for us.
We will now study a construct that is pretty similar to the product but at the same time is very different. Similar because, like the product, it is a relation between two sets which allows you to unite them into one, without erasing their structure. But different as it encodes a quite different type of relation — a product encodes an and relation between two sets, while the sum encodes an or relation.
The sum of two sets $B$ and $Y$, denoted $B + Y$ is a set that contains all elements from the first set combined with all elements from the second one.
We can immediately see the connection with the or logical structure: For example, because a parent is either a mother or a father of a child, the set of all parents is the sum of the set of mothers and the set of fathers, or $P = M + F$.
As with the product, representing sums in terms of sets is not so straightforward, e.g. when a given object is an element of both sets, it appears in the sum twice, which is not permitted, because a set cannot contain the same element twice.
As with the product, the solution is to add some extra structure.
And, as with the product, there is a low-level way to express a sum using sets alone. Incidentally, we can use pairs.
As you might already suspect, the interesting part is expressing the sum of two sets using functions. To do that, we have to go back to the conceptual part of the definition. We said that sums express an or relation between two things.
A property of every or relation is that if something is an $A$, then that something is also an $A \vee B$, and the same goes for $B$ (the $\vee$ symbol means or, by the way). For example, if my hair is brown, then my hair is also either blond or brown. This is what or means, right? This property can be expressed as a function, two functions actually — one for each set that takes part in the sum relation (for example, if parents are either mothers or fathers, then there surely exist functions $mothers → parents$ and $fathers → parents$).
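In programming, this is the familiar tagged (or discriminated) union. Here is a minimal sketch in TypeScript (the names Sum, left and right are made up for illustration): the two functions that embed each set into the sum are the union’s two “injections”.

type Sum<A, B> = { tag: 'left', value: A } | { tag: 'right', value: B } // an A or a B, kept apart by a tag
const left = <A, B>(a: A): Sum<A, B> => ({ tag: 'left', value: a }) // e.g. mothers -> parents
const right = <A, B>(b: B): Sum<A, B> => ({ tag: 'right', value: b }) // e.g. fathers -> parents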
As you might have already noticed, this definition is pretty similar to the definition of the product from the previous section. And the similarities don’t end here. As with products, we have sets that can be thought of as impostor sums — ones for which these functions exist, but which also contain additional information.
All these sets express relationships which are more vague than the simple sum, and therefore, given such a set, there would exist a unique function that connects it to the true sum. The only difference is that, unlike the functions that define products, this time this function goes from the sum to the impostor.
The concepts of product and sum might already look similar in a way when we view them through their internal diagrams. The external view makes this similarity precise — these two diagrams are one and the same diagram, only their arrows are flipped — many-to-one relationships become one-to-many and the other way around.
The universal properties that define the two constructs are the same as well — if we have a sum $Y + B$, for each impostor sum, such as $Y + B + R$, there exists a trivial function $Y + B \to Y + B + R$.
And, if you remember, with products the arrows go the other way around — the equivalent example for a product would be the function $Y \times B \times R \to Y \times B $
This fact uncovers a deep connection between the concepts of the product and sum, which is not otherwise apparent — they are each other’s opposites. Product is the opposite of sum and sum is the opposite of product.
In category theory, concepts that have such a relationship are said to be dual to each other. So, the concepts of product and sum are dual. This is why sums are known in a category-theoretic setting as converse product, or coproduct for short. This naming convention is used for all dual constructs in category theory.
Now let’s look at the concepts of product and sum from the viewpoint of logic. We mentioned that:
When we view those sets as propositions, we discover that the concept of the product ($\times$) corresponds exactly to the and relation in logic (denoted $\land$). From this viewpoint, the function $Y \times B \to Y$ can be viewed as an instance of a logical rule of inference called conjunction elimination (also called simplification), stating that $Y \land B \to Y$. For example, if your hair is partly blond and partly brown, then it is partly blond.
By the same token, the concept of a sum ($+$) corresponds to the or relation in logic (denoted $\lor$). From this viewpoint, the function $Y \to Y + B$ can be viewed as an instance of a logical rule of inference called disjunction introduction, stating that $Y \to Y \lor B$. For example, if your hair is blond, it is either blond or brown.
This means, among other things, that the concepts of and and or are also dual — an idea which was put forward in the 19th century by the mathematician Augustus De Morgan and has since been known as De Morgan duality, and which is a predecessor to the modern idea of duality in category theory that we examined before.
The duality of $\land$ and $\lor$ can be seen in the two formulas that are most often associated with De Morgan, known as De Morgan’s laws (although De Morgan didn’t actually discover those; they were previously formulated by William of Ockham, of “Ockham’s razor” fame, among other people).
$\neg(A \wedge B) = \neg{A} \vee \neg{B}$
$\neg(A \vee B) = \neg{A} \wedge \neg{B}$
You can read the second formula as: if it is not true that my hair is blond or brown, then my hair is not blond and it is not brown, and vice versa (the connection works both ways).
Now we will go through the formulas and try to show that they are actually a simple corollary of the duality between and and or.
Let’s say we want to find the statement that is opposite of “blond or brown”.
$A \vee B$
The first thing we want to do is to replace the statements that constitute it with their opposites, which would make the statement “not blond or not brown”
$\neg{A} \vee \neg{B}$
However, this statement is clearly not the opposite of “blond or brown” (saying that my hair is not blond or not brown does in fact allow it to be blond, and also allows it to be brown; it just doesn’t allow it to be both of these things).
So what have we missed? Simple: we replaced the propositions with their opposites, but we didn’t replace the operator that connects them — it is still or for both propositions. So we must replace it with its converse. As we said earlier, and as you can see by analyzing this example, this operator is and. So the formula becomes “not blond and not brown”.
$\neg{A} \wedge \neg{B}$
Saying that this formula is the opposite of “blond or brown” is the same thing as saying that it is equivalent to the negation of that statement, which is precisely what the second De Morgan law says.
$\neg(A \vee B) = \neg{A} \wedge \neg{B}$
And if we “flip” this whole formula (we can do that without changing the signs of the individual propositions, because the law is valid for all propositions), we get the first law.
$\neg(A \wedge B) = \neg{A} \vee \neg{B}$
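Since there are only two truth values, we can also verify both laws mechanically. Here is a small TypeScript check (just a sketch, enumerating all cases):

const bools = [true, false]
const lawsHold = bools.every(a => bools.every(b =>
  (!(a && b)) === (!a || !b) && // not (A and B) = (not A) or (not B)
  (!(a || b)) === (!a && !b)    // not (A or B) = (not A) and (not B)
))
console.log(lawsHold) // true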
This probably provokes a lot of questions and we have a whole chapter about logic to address those. But before we get to it, we have to see what categories (and sets) are.
So far in the book, we saw some amazing ways of defining set-theoretic constructs without looking at the set elements and by only using functions and external diagrams.
In the first chapter, we defined functions and functional composition with this diagram.
And now, we also defined products and sums.
What’s even more amazing, is that we can define all of set-theory, based just on the concept of functions, as discovered by the category theory pioneer Francis William Lawvere.
Traditionally, everything in set theory is defined in terms of two things: sets and elements, so, if we want to define it using sets and functions, we must define the concept of a set element in terms of functions.
To do so, we will use the singleton set.
OK, let’s start by taking a random set which we want to describe.
And let’s examine the functions from the singleton set, to that random set.
It’s easy to see that there would be exactly one such function for each element of the set, i.e. that each element of any set $X$ corresponds to a unique function $1 \to X$.
So, we can say that what we call “elements” of a set are the functions from the singleton set to it.
Now, after coming up with a definition of a set element, based on functions, we can try to draw the elements of our set as an external diagram.
However, our diagram is not yet fully external, as it depends on the idea of the singleton set, i.e. the set with one element. Furthermore, this makes the whole definition circular, as we cannot define the concept of a one-element set without the concept of an element.
To avoid these difficulties, we devise a way to define the singleton set, using just functions. We do it in the same way that we did for products and sums - by using a unique property that the singleton set has. In particular, there is exactly one function from any other set to the singleton set i.e. if $1$ is the singleton set, then we have $\forall X \exists! X \to 1$.
It turns out that this property defines the singleton set uniquely, i.e. there is no other set that has it, other than the sets that are isomorphic to the singleton set. This is simply because, if there were two sets that had it, those two sets would also have unique morphisms between themselves, so they would be isomorphic to one another. More formally, if two sets $X$ and $Y$ both have this property (i.e. $\forall Z\ \exists!\ Z \to X$ and $\forall Z\ \exists!\ Z \to Y$), then we also have $X \cong Y$.
And because there is no other set, other than the singleton set that has this property, we can use it as a definition of the singleton set and say that if we have $\forall X \exists! X \to 1$, then $1$ is the singleton set.
With this, we acquire a fully external definition (up to an isomorphism) of the singleton set, and thus a definition of a set element — the elements of a given set are just the functions from the singleton set to that set.
Note that from this property it follows that the singleton set has exactly one element, which confirms that our definition is correct.
Question: Why exactly does it follow (check the definition)?
The empty set is the set that has no elements, but how would we say this without referring to elements?
We said that there exists a unique function that goes from the empty set to any other set. But the reverse is also true: the empty set is the only set such that there exists a function from it to any other set.
Observant readers will notice the similarities between the diagrams depicting the initial and terminal object (yes the two concepts are, of course, dual of each other).
Some even more observant readers may also notice the similarities between the product/coproduct diagrams and the initial/terminal object diagrams.
To them, I would like to say: Folks, keep it down please, you are too observant — we have, like, 4 chapters to go until we get to this.
After seeing the functional definition of set elements, we might be inclined to ask the following: If elements are represented by functions, then how do you apply a given function to an element of a set, (and retrieve an element of another set)?
The answer is surprisingly simple — selecting an element from a set is the same as constructing a function from the singleton set to that element.
And then applying a function to an element is the same as composing the element function, with the function we want to apply.
The result is the function that represents the element that is the result of the application.
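Here is a small sketch of this idea in TypeScript (the type One stands in for the singleton set; all the names are made up for illustration): an element is a function from One, and applying a function to it is just composition.

type One = null // a stand-in for the singleton set
type Element<X> = (one: One) => X // an element of X is a function 1 -> X
const two: Element<number> = (_: One) => 2 // "selecting" the element 2
const square = (x: number) => x * x // a function we want to apply
const result: Element<number> = (one: One) => square(two(one)) // application = composition
console.log(result(null)) // 4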
If we had more time, we would cover the entirety of Lawvere’s Elementary Theory of the Category of Sets (or ETCS for short), and list all concepts and axioms that are required to define a rigorous set theory using functions, but this is enough for you to get the main idea: that these axioms constitute a definition of set theory that is based entirely on functions. This is a key idea, but there is an even bigger idea here: because it is more general than the traditional definition, this new definition also applies to objects that are not exactly sets but are like sets in some respects.
You may say that they apply to entirely different categories of objects.
Maybe it is about time to see what a category is. We will start with a short definition — a category consists of objects (an example of which are sets) and morphisms that go from one object to another (which behave as functions) and that are composable. We can say a lot more about categories, and even present a formal definition, but for now, it is sufficient for you to remember that sets are one example of a category and that categorical objects are like sets, except that we don’t see their elements. Or to put it another way, category-theoretic notions are captured by the external diagrams, while strictly set-theoretic notions can be captured by internal ones.
When we are within the realm of sets, we can view each set as a collection of individual elements. In category theory, we don’t have such a notion. However, taking this notion away allows us to define concepts such as the sum and product sets in a whole different and more general way. Plus we always have a way to “go back” to set theory, using the tricks from the last section.
Category Theory | Set theory | Programming Languages |
---|---|---|
Category | N/A | N/A |
Objects and Morphisms | Sets and Functions | Classes and methods |
N/A | Element | Object |
Notice the somewhat weird (but actually completely logical) symmetry (or perhaps “reverse symmetry”) between the world as viewed through the lens of set theory, and the way it is viewed through the lens of category theory:
Category Theory | Set theory |
---|---|
Category | N/A |
Objects and Morphisms | Sets and functions |
N/A | Element |
By switching to external diagrams, we lose sight of the particular (the elements of our sets), but we gain the ability to zoom out and see the whole universe where we have been previously trapped. In the same way that the whole realm of sets can be thought of as one category, a programming language can also be thought of as a category. The concept of a category allows us to find and analyze similarities between these and other structures.
NB: The word “Object” is used in both programming languages and in category theory, but has completely different meanings. A categorical object is equivalent to a type or a class in programming language theory.
One remark before we continue: the last section might make it seem like category theory and set theory are somehow competing with each other. Perhaps that notion would be somewhat correct if category and set theory were meant to describe concrete phenomena, in the way that the theory of relativity and the theory of quantum mechanics are both supposed to explain the physical world. Concrete theories are conceived mainly as descriptions of the world, and as such it makes sense for them to be connected in some sort of hierarchy.
In contrast, abstract theories, like category theory and set theory, are more like languages for expressing such descriptions — they still can be connected, and are connected in more than one way, but there is no inherent hierarchical relationship between the two and therefore arguing over which of the two is more basic, or more general, is just a chicken-and-egg problem, as you will see in the next chapter.
“…deal with all elements of a set by ignoring them and working with the set’s definition.” — Dijkstra (from “On the cruelty of really teaching computing science”)
All category theory books, including this one, start by talking about set theory. Looking back, I really don’t know why this is the case — books that focus on a given subject usually don’t start off by introducing an entirely different subject, (before even starting to talk about the main one). Perhaps the set-first approach is the best way to introduce people to categories. Or perhaps using sets to introduce categories is one of those things that people do just because everyone else does it. But, one thing is for certain — we don’t need to study sets in order to understand categories. So now I would like to start over and talk about categories as a foundational concept. So let’s pretend like this is a new book (I wonder if I can dedicate this to a different person).
A category is a collection of objects (things) where the “things” can be anything you want. Consider, for example, these colorful gray balls:
A category consists of a collection of objects as well as some arrows connecting objects to one another. We call the arrows morphisms.
Wait a minute, we said that all sets form a category, but at the same time, any one set can be seen as a category in its own right (just one which has no morphisms, apart from the identity morphisms that we will introduce shortly). This is true and an example of a phenomenon that is very characteristic of category theory — one structure can be examined from many different angles and may play many different roles, often in a recursive fashion.
This particular analogy (a set as a category with no morphisms) is, however, not very useful. Not because it’s incorrect in any way, but rather because category theory is all about the morphisms. If the arrows in set theory are nothing but a connection between the sets that serve as their source and destination, in category theory it’s the objects that are nothing but a source and destination for the arrows that connect them to other objects. This is why, in the diagram above, the arrows, and not the objects, are colored: if you ask me, the category of sets should really be called the category of functions.
Speaking of which, note that objects in a category can be connected by multiple arrows and that having the same source and target sets does not in any way make arrows equivalent.
Why that is true is pretty obvious if we go back to set theory for a second (OK, maybe we really have to do it from time to time). There are, for example, an infinite number of functions that go from number to boolean, and the fact that they have the same input type and the same output type (or the same type signature, as we like to say) does not in any way make them equivalent to one another.
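For example (a quick TypeScript sketch), here are three different functions with the very same type signature, which are clearly not equivalent to one another:

const isEven = (n: number): boolean => n % 2 === 0
const isPositive = (n: number): boolean => n > 0
const alwaysTrue = (_: number): boolean => true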
There are some types of categories in which only one morphism between two objects is allowed (or one in each direction), but we will talk about them later.
The most important requirement for a structure to be called a category is that two morphisms can make a third, or in other words, that morphisms are composable — given two successive arrows with appropriate type signatures, we can make a third one (in set theory, it is equivalent to the consecutive application of the first two).
Formally, this requirement says that there should exist an operation, usually denoted with the symbol $•$ such that for each pair of morphisms $g: A → B$ and $f: B → C$, there exists a morphism $(f • g): A → C$.
This is the most important part of the definition — if you remember, in set theory, we picked functions, as opposed to the other types of relations because they are composable, here we just invent the concept of a morphism and define them to be composable. You will see where this definition would get us.
NB: Note that functional composition is read from right to left, e.g. applying $g$ and then applying $f$ is written $f • g$ and not the other way around. (You can think of it as a shortcut to $f(g(a))$.)
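In the category of sets this operation is just a higher-order function. Here is a minimal sketch of it in TypeScript (the name compose is our own):

const compose = <A, B, C>(f: (b: B) => C, g: (a: A) => B) =>
  (a: A): C => f(g(a)) // apply g first, then f, i.e. f • g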
To have numbers, you have to have a zero. The zero of category theory is what we call the “identity morphism” for each object. In short, this is a morphism that doesn’t do anything.
It’s important to mark this morphism because there can be (let’s again add this very important (and by now probably also very boring) reminder) many morphisms that go from one object to the same object. For example, in the category of sets, we deal with a multitude of functions that have the set of numbers as source and target, such as $negate$, $square$, $add one$, which are not at all the identity morphism.
A structure must have an identity morphism for each object in order for it to be called a category — this is known as the law of identity.
Question: What is the identity morphism in the category of sets?
Composition is special not only because you can take any two morphisms with appropriate signatures and make a third, but because you can do so indefinitely, i.e. for each $n$ successive arrows, each of which has as a source object the target object of the previous, we can draw one (exactly one) arrow that is equivalent to the consecutive application of all $n$ arrows.
But let’s get back to the math. If we carefully review the definition above, we can see that it can be reduced to multiple applications of the following formula: given three morphisms $f$, $g$ and $h$ that can be composed one after the other (along with the objects that they connect), combining $h$ and $g$ and then combining the end result with $f$ should be the same as combining $h$ with the result of combining $g$ and $f$ (or simply $(h • g) • f = h • (g • f)$).
This formula can be expressed using the following diagram, which would only commute if the formula is true (given that all our category-theoretic diagrams are commutative, we can say, in such cases, that the formula and the diagram are equivalent).
This formula (and the diagram) is the definition of a property called $associativity$. Being associative is required for functional composition to really be called functional composition (and thus for a category to really be called a category). It is also required for our diagrams to work, as diagrams can only represent associative structures (imagine if the diagram above would not commute, it would be super weird).
Associativity is not just about diagrams. For example, when we express relations using formulas, associativity just means that brackets don’t matter in our formulas (as evidenced by the definition $(h • g) • f = h • (g • f)$).
And it is not only about categories either, it is a property of many other operations on other types of objects as well, e.g. if we look at numbers, we can see that the multiplication operation is associative, e.g. $(1 \times 2) \times 3 = 1 \times (2 \times 3)$, while division is not: $(1 / 2) / 3 \neq 1 / (2 / 3)$.
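Returning to functions, here is a small sanity check (a sketch with made-up functions) that grouping compositions differently yields the same function:

const addOne = (x: number) => x + 1 // f
const double = (x: number) => x * 2 // g
const negate = (x: number) => -x    // h
const hg = (x: number) => negate(double(x))         // h • g
const gf = (x: number) => double(addOne(x))         // g • f
const leftGrouping = (x: number) => hg(addOne(x))   // (h • g) • f
const rightGrouping = (x: number) => negate(gf(x))  // h • (g • f)
console.log(leftGrouping(3) === rightGrouping(3))   // true, and so for any input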
This approach (composing indefinitely many things) for building stuff is often used in programming. To see some examples, you don’t need to look further than the pipe operator in Unix (|), which feeds the standard output of a program into the standard input of another program. If you want to look further, note that there is a whole programming paradigm based on functional composition, called “concatenative programming”, utilized in languages like Forth and Factor.
The diagrams above use colours to illustrate the fact that the green morphism is equivalent to the other two (and not just some unrelated morphism), but in practice this notation is a little redundant — the only reason to draw diagrams in the first place is to represent paths that are equivalent to each other. All other paths would just belong in different diagrams.
As we mentioned briefly in the last chapter, all diagrams that are like that (ones in which any two paths between two objects are equivalent to one another) are called commutative diagrams (or diagrams that commute). All diagrams in this book (except the incorrect ones, nudge nudge) commute.
More formally, a commuting diagram is a diagram in which, given two objects $a$ and $b$ and two sequences of morphisms between those two objects, the two sequences are equivalent.
The diagram above is one of the simplest commuting diagrams.
For future reference, let’s restate what a category is:
A category is a collection of objects (we can think of them as points) and morphisms (or arrows) that go from one object to another, where:
- Each object has to have the identity morphism.
- There should be a way to compose two morphisms with an appropriate type signature into a third one, in a way that is associative.
This is it.
Why are categories defined by those two laws and not some other two (or one, three, four etc.) laws? From one standpoint, the answer to that seems obvious — we study categories because they work. I mean, look at how many applications there are.
But at the same time, category theory is an abstract theory, so everything about it is kinda arbitrary: you can remove a law — and you get another theory that looks similar to category theory (although it might actually turn out to be quite different in practice). Or you can add one more law and get yet another theory (by the way, there are indeed such laws and such theories), so if this specific set of laws works better than any other, then this fact demands an explanation. Not a mathematical explanation (e.g. we cannot prove that this theory is better than some other one), but an explanation nevertheless. What follows is my attempt to provide such an explanation, regarding the laws of identity and associativity.
The reason the identity law is required is by far the more obvious one. Why do we need to have a morphism that does nothing? It’s because morphisms are the basic building blocks of our language, we need the identity morphism to be able to speak properly. For example, once we have the concept of identity morphism defined, we can define a category-theoretic definition of an isomorphism, based on it (which is important, because the concept of an isomorphism is very important for category theory).
As we said in the previous chapter, an isomorphism between two objects ($A$ and $B$) consists of two morphisms ($A → B$ and $B → A$) such that their compositions are equivalent to the identity morphisms of the respective objects. Formally, objects $A$ and $B$ are isomorphic if there exist morphisms $f: A → B$ and $g: B → A$ such that $f \circ g = ID_{B}$ and $g \circ f = ID_{A}$.
And here is the same thing expressed with a commuting diagram.
Like the previous one, the diagram expresses the same (simple) fact as the formula, namely that going from one of the objects ($A$ or $B$) to the other one and then back again to the starting object is the same as applying the identity morphism, i.e. doing nothing.
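Here is a tiny TypeScript sketch of an isomorphism in the category of sets (the type Coin and the two functions are made up for illustration):

type Coin = 'heads' | 'tails'
const toCoin = (b: boolean): Coin => (b ? 'heads' : 'tails')
const toBool = (c: Coin): boolean => c === 'heads'
// toBool • toCoin and toCoin • toBool are both identities:
// toBool(toCoin(b)) === b and toCoin(toBool(c)) === c for all values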
If, in some cataclysm, all of scientific knowledge were to be destroyed, and only one sentence passed on to the next generations of creatures, what statement would contain the most information in the fewest words? I believe it is the atomic hypothesis (or the atomic fact, or whatever you wish to call it) that all things are made of atoms—little particles that move around in perpetual motion, attracting each other when they are a little distance apart, but repelling upon being squeezed into one another. In that one sentence, you will see, there is an enormous amount of information about the world, if just a little imagination and thinking are applied. — Richard Feynman
Associativity — what does it mean and why is it there? In order to tackle this question, we must first talk about another concept — the concept of reductionism:
Reductionism is the idea that the behaviour of complex phenomena can be understood in terms of a number of simpler and more fundamental phenomena. In other words, that things keep getting simpler and simpler as they get “smaller” (or when they are viewed from a lower level). An example of reductionism is the idea that the behaviour of matter can be understood completely by studying the behaviours of its constituents i.e. atoms (the word means “undividable”).
Whether the reductionist view is universally valid, i.e. whether it is possible to devise a theory of everything that describes the whole universe with a set of very simple laws, is a question over which we can argue until that universe’s inevitable collapse. What is certain, though, is that reductionism underpins all our understanding, especially when it comes to science and mathematics — each scientific discipline is based on a set of simple fundaments (e.g. elementary particles in particle physics, chemical elements in chemistry etc. ) on which it builds on its much more complex theories.
So, if this principle is so important, it would be beneficial to be able to formalize it (i.e. to translate it into mathematical language), and this is what we will try to do now. One way to state the principle of reductionism is to say that each thing is nothing but a sum of its parts, i.e. if we combine the same set of parts, we always get the same result. To formalize that, we get a set of objects (balls) and a way to combine them (which we will denote with a dot).
So, if we have a given “recipe”, for example
Then, we would also have
Or quite simply
Incidentally, this is the definition of a mathematical law called commutativity.
A simple context where this law applies — the natural numbers are commutative under the operation of addition, e.g. $1 + 2 = 2 + 1$ (we will learn more about this in the chapter on groups).
Question: If our objects are sets, what set operations can play the part of the dot in this example (i.e. which ones are commutative)?
Sometimes we observe phenomena that still can be represented as a combination of a given set of fundaments, but only when they are combined in a specific way (as opposed to any combination, as in commutative contexts) e.g. as any mechanic can confirm, a bicycle is indeed just the combination of wheels and frameset…
…but that does not mean that every combination of the above parts constitutes a bicycle. For example, placing the front wheel of the bicycle on the rear end of the frame would not result in a working bicycle. And combining two wheels with one another is just not possible.
And, to take a formal example, if function A can be combined with B to get C…
…would not automatically mean that B can be combined with A to get the same result
Side note: composing any function with any other is only possible for functions that have the same set, both as source and target, but even then the end result would not always be the same.
Anyway, we determined that functional composition (as well as bike building) does not obey the law of commutativity, i.e. not all functions (or bike parts) compose with all other ones. But still, we know that some of them do, i.e. if you align the function signatures (or, in the case of the bicycle, if you use the proper parts in the proper places) they would come together seamlessly. So, the order in which we combine the individual parts doesn’t matter for the final outcome, e.g. when we are assembling a bicycle, it doesn’t matter if we attach the front wheel to the frame first, or the back wheel, the result will be the same.
Similarly, when combining functions, each pair of functions can, at any time, be replaced by the function that we get by combining them.
This fact is captured by a more restrictive version of commutativity, which we call associativity, and which, for functions, is usually formulated like this.
Or more generally.
This is the essence of associativity — the ability to study complex phenomenon by zooming in on a part that you want to examine in a given moment, and looking at it in isolation.
Note that the operator we defined only allows for combining things in one dimension (you can attach a thing left and right, but not up or down). Later we will learn about an extension of the concept of a category (called a monoidal category) that “supports” working in 2 dimensions.
We said that associativity is a more restricted version of commutativity. This is true in a formal way, as we show here. By now, you might realize that a composition operator for morphisms in a given category (the thing we denote by the dot) is itself a morphism that accepts a pair of morphisms and returns yet another morphism (so $A \times B \to C$).
Given a composition operator for a category and one morphism from that same category (in this case the green ball), we can create two more morphisms (which we call the “curried” morphisms (more on this in the next chapter)), $A \to C$ and $B \to C$, by just fixing the second or the first argument of the operator, correspondingly, i.e. a morphism that attaches the green ball at the “front” of the morphism that we have as an argument (producing its source) and one that attaches it at the “end” (accepting its target as a source).
Now, take two such morphisms, with different directions and different objects and ask yourself: under what conditions would those two morphisms commute i.e. when would that formula be satisfied?
Expanding the formula gives us the answer: they commute when the original dot operator is associative.
Thus, we established a connection between associativity and commutativity.
Since we are done with categories, let’s look at some other structures that are also interesting — monoids. Like categories, monoids/groups are also abstract systems consisting of a set of elements and operations for manipulating these elements; however, the operations look different than the operations we have for categories. Let’s see them.
Monoids are simpler than categories. A monoid is defined by a collection/set of elements (called the monoid’s underlying set), together with a monoid operation — a rule for combining two elements that produces a third element of the same kind.
Let’s take our familiar colorful balls.
We can define a monoid based on this set by defining an operation for “combining” two balls into one. An example of such operation would be blending the colors of the balls, as if we are mixing paint.
You can probably think of other ways to define a similar operation. This will help you realize that there can be many ways to create a monoid from a given set of set elements i.e. the monoid is not the set itself, it is the set together with the operation.
The monoid operation should, like functional composition, be associative, i.e. applying it to the same sequence of elements, grouped in different ways, should make no difference.
When an operation is associative, we can apply all kinds of algebraic manipulations to any sequence of terms (or, in other words, apply equational reasoning), for example, we can replace any element with a combination of elements from which it is composed, or add a term that is present on both sides of an equation, while retaining the equality of the existing terms.
Actually, not every (associative) operation for combining elements makes the balls form a monoid (it makes them form a semigroup, which is also a thing, but that’s a separate topic). To be a monoid, a set must also feature what is called an identity element of the given operation, a concept with which you are already familiar from both sets and categories — it is an element that, when combined with any other element, gives back that same element (not the identity, but the other one). Or simply $x • i = x$ and $i • x = x$ for any $x$.
In the case of our color-mixing monoid the identity element is the white ball (or perhaps a transparent one, if we have one).
As you probably remember from the last chapter, functional composition is also associative and it also has an identity element, so you might start suspecting that it forms a monoid in some way. This is indeed the case, but with one caveat, which we will talk about later.
To keep the suspense, before we discuss the relationship between monoids and categories, we will go through some simple examples of monoids.
Mathematics is not only about numbers, however numbers do tend to pop up in most of its areas, and monoids are no exception. The set of natural numbers $\mathbb{N}$ forms a monoid when combined with the all too familiar operation of addition (or under addition, as it is traditionally said). This monoid is denoted $\left< \mathbb{N},+ \right>$ (in general, monoids and groups are denoted by specifying the set and the operation, enclosed in angle brackets).
If you see a $1 + 1 = 2$ in your textbook you know you are either reading something very advanced, or very simple, although I am not really sure which of the two applies in the present case.
Anyways, the natural numbers also form a monoid under multiplication as well.
Question: Which are the identity elements of those monoids?
Task: Go through other mathematical operations and verify that they are monoidal.
Thinking about the operations that we have covered, we may remember the boolean operations and and or. Both of them form monoids, which operate on the set consisting of just two values, ${ True, False }$.
Task: Prove that $\land$ is associative by expanding the formula $(A \land B) \land C = A \land (B \land C)$ with all possible values. Do the same for or.
Question: Which are the identity elements of the and and or operations?
We now know what the monoid operation is, and we even saw some simple examples. However, we never defined the monoid rule/operation formally i.e. using the language of set theory with which we defined everything else. Can we do that? Of course we can — everything can be defined in terms of sets.
We said that a monoid consists of two things a set (let’s call it $A$) and a monoid operation that acts on that set. Since $A$ is already defined in set theory (because it is just a set), all we have to do is define the monoid operation.
Defining the operation is not hard at all. Actually, we have already done it for the operation $+$ — in chapter 2, we said that addition can be represented in set theory as a function that accepts a product of two numbers and returns a number (formally $+: \mathbb{Z} \times \mathbb{Z} \to \mathbb{Z}$).
Every other monoid operation can also be represented in the same way — as a function that takes a pair of elements from the monoid’s set and returns one other monoid element.
Formally, we can define a monoid from any set $A$ by defining an (associative) function with type signature $A \times A \to A$. That’s it. Or, to be precise, that is one way to define the monoid operation. There is another way, which we will see next. But before that, let’s examine some more monoid-like structures.
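Here is a minimal sketch of this definition in TypeScript (the Monoid interface and the instance names are our own; the identity element is included as well, as per the full definition above):

interface Monoid<A> {
  combine: (x: A, y: A) => A // the monoid operation: A × A -> A
  identity: A                // the identity element
}
const concatenation: Monoid<string> = { combine: (x, y) => x + y, identity: '' }
const listConcat: Monoid<number[]> = { combine: (x, y) => [...x, ...y], identity: [] }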
Monoid operations obey two laws — they are associative and there is an identity element. In some cases, we come across operations that obey other laws that are also interesting. Imposing more (or fewer) rules on the way in which the elements (objects) are combined results in the definition of other monoid-like structures.
Looking at the monoid laws and the examples we gave so far, we observe that all of them obey one more rule (law) which we didn’t specify — the order in which the elements are combined is irrelevant to the end result.
Such operations (ones for which combining a given set of elements yields the same result no matter which one is first and which one is second) are called commutative operations. Monoids with operations that are commutative are called commutative monoids.
As we said, addition is commutative as well — it does not matter whether I have given you 1 apple and then 2 more, or 2 first and then 1 more.
All monoids that we examined so far are also commutative. We will see some non-commutative ones later.
A group is a monoid such that for each of its elements, there is another element which is the so called “inverse” of the first one where the element and its inverse cancel each other out when applied one after the other. Plain-English definitions like this make you appreciate mathematical formulas more — formally we say that for all elements $x$, there must exist $x’$ such that $x • x’ = i$ ( where $i$ is the identity element).
If we view monoids as a means of modeling the effect of applying a set of (associative) actions, we use groups to model the effects of actions that are also reversible.
A nice example of a monoid that we covered that is also a group is the set of integers under addition. The inverse of each number is its opposite number (positive numbers’ inverse are negatives and vice versa). The above formula, then, becomes $x + (-x) = 0$
The study of groups is a field that is much bigger than the theory of monoids (and perhaps bigger than category theory itself). And one of its biggest branches is the study of the “symmetry groups”, which we will look into next.
But before that, just a quick note — the algebraic structures that we saw can be summarized based on the laws that define them in this table.
 | Semigroups | Monoids | Groups |
---|---|---|---|
Associativity | X | X | X |
Identity | | X | X |
Invertibility | | | X |
And now for the symmetry groups.
An interesting kind of group/monoid is the group of symmetries of a geometric figure. Given some geometric figure, a symmetry is an action after which the figure is not displaced (e.g. it can fit into the same mold that it fit into before the action was applied).
We won’t use the balls this time, because in terms of symmetries they have just one position and hence just one action — the identity action (which is its own reverse, by the way). So let’s take this triangle, which, for our purposes, is the same as any other triangle (we are not interested in the triangle itself, but in its rotations).
Let’s first review the group of ways in which we can rotate our triangle, i.e. its rotation group. A geometric figure can be rotated without displacement into as many positions as it has sides, so for our triangle there are 3 positions.
Connecting the dots (or the triangles in this case) shows us that there are just two possible rotations that get us from any state of the triangle to any other one — a 120-degree rotation (i.e. turning the triangle one position) and a 240-degree rotation (i.e. turning it two positions (or equivalently, turning it once, but in the opposite direction)). Adding the identity action of 0-degree rotation makes for 3 rotations (objects) in total.
The rotations of a triangle form a monoid — the rotations are objects (of which the zero-degree rotation is the identity) and the monoid operation which combines two rotations into one is just the operation of performing the first rotation and then performing the second one.
NB: Note once again that the elements in the group are the rotations, not the triangles themselves, actually the group has nothing to do with triangles, as we shall see later.
The diagram that enumerates all the rotations of a more complex geometrical figure looks quite messy at first.
But it gets much simpler to grasp if we notice the following: although our group has many rotations, and there are more still for figures with more sides (if I am not mistaken, the number of rotations is equal to the number of the sides), all those rotations can be reduced to the repetitive application of just one rotation, (for example, the 120-degree rotation for triangles and the 45-degree rotation for octagons). Let’s make up a symbol for this rotation.
Symmetry groups that have such a “main” rotation, and, in general, groups and monoids that have an object that is capable of generating all other objects by its repeated application, are called cyclic groups. And such a rotation is called the group’s generator.
All rotation groups are cyclic groups. Another example of a cyclic group is, yes, the integers under addition. Here we can use $+1$ or $-1$ as a generator.
Wait, how can this be a cyclic group when there are no cycles? This is because the integers are an infinite cyclic group.
A number-based example of a finite cyclic group is the group of natural numbers under modular arithmetic (sometimes called “clock arithmetic”). Modular arithmetic’s operation is based on a number called the modulus (let’s take $12$, for example). In it, each number is mapped to the remainder of its division by the modulus.
For example: $1 \pmod{12} = 1$ (because $1/12 = 0$ with $1$ remainder) $2 \pmod{12} = 2$ etc.
But $13 \pmod{12} = 1$ (as $13/12 = 1$ with $1$ remainder) $14 \pmod{12} = 2$, $15 \pmod{12} = 3$ etc.
In effect, numbers “wrap around”, forming a group with as many elements as the modulus. For example, a group representation of modular arithmetic with modulus $3$ has 3 elements.
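Here is a quick sketch of the modulus-3 case in TypeScript (the function name is our own):

const addMod3 = (a: number, b: number): number => (a + b) % 3 // the group operation on {0, 1, 2}
console.log(addMod3(2, 2)) // 1 (the numbers "wrap around")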
All cyclic groups that have the same number of elements (or that are of the same order) are isomorphic to each other (careful readers might notice that we haven’t yet defined what a group isomorphism is; even more careful readers might already have an idea about what it is).
For example, the group of rotations of the triangle is isomorphic to the natural numbers under addition modulo $3$.
All cyclic groups are commutative (or “abelian” as they are also called).
Task: Show that there are no other groups with 3 objects, other than $Z_3$.
There are abelian groups that are not cyclic, but, as we shall see below, the concepts of cyclic groups and of abelian groups are deeply related.
We already mentioned group isomorphisms, but we didn’t define what they are. Let’s do that now — an isomorphism between two groups is an isomorphism ($f$) between their respective sets of elements, such that for any $a$ and $b$ we have $f(a \circ b) = f(a) \circ f(b)$. Visually, the diagrams of isomorphic groups have the same structure.
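As a small illustration (a TypeScript sketch with made-up names), we can check that mapping the triangle’s rotations, measured in degrees, to the numbers $\{0, 1, 2\}$ preserves the operation, i.e. that $f(a \circ b) = f(a) \circ f(b)$:

const rotations = [0, 120, 240] // the three rotations, in degrees
const composeRotations = (a: number, b: number) => (a + b) % 360 // do one rotation, then the other
const f = (degrees: number) => degrees / 120 // the candidate isomorphism into {0, 1, 2}
const preservesOperation = rotations.every(a => rotations.every(b =>
  f(composeRotations(a, b)) === (f(a) + f(b)) % 3))
console.log(preservesOperation) // true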
As in category theory, in group theory isomorphic groups are considered instances of one and the same group. For example, the one above is called $Z_3$.
Like with sets, the concept of an isomorphism in group theory allows us to identify common finite groups.
The smallest group is just the trivial group $Z_1$ that has just one element.
The smallest non-trivial group is the group $Z_2$ that has two elements.
$Z_2$ is also known as the boolean group, due to the fact that it is isomorphic to the ${ True, False }$ set under the exclusive or operation (combining a value with $True$ under this operation negates it).
Like $Z_3$, $Z_1$ and $Z_2$ are cyclic.
We already saw a lot of abelian groups that are also cyclic, but we didn’t see any abelian groups that are not cyclic. So let’s examine what those look like. This time, instead of looking into individual examples, we will present a general way of producing abelian non-cyclic groups from cyclic ones — by combining them using the group product.
Given any two groups, we can combine them to create a third group, comprised of all possible pairs of elements from the two groups and of the sum of all their actions.
To see what the product looks like, let’s take the following two groups (which, having just two elements and one operation each, are both isomorphic to $Z_2$). To make it easier to imagine them, we can think of the first one as based on the vertical reflection of a figure and the second one on the horizontal reflection.
We get the set of elements of the new group by taking the Cartesian product of the set of elements of the first group and the set of elements of the second one.
And the actions of a product group are comprised of the actions of the first group, combined with the actions of the second one, where each action is applied only on the element that is a member of its corresponding group, leaving the other element unchanged.
The product of the two groups we presented is called the Klein four-group and it is the simplest non-cyclic abelian group.
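Here is a minimal sketch of this product in TypeScript (the representation is our own): we can model the Klein four-group as pairs of booleans, recording whether each of the two reflections has been applied, combined component-wise.

type Klein = [boolean, boolean] // [vertical reflection applied?, horizontal reflection applied?]
const combineKlein = (a: Klein, b: Klein): Klein => [a[0] !== b[0], a[1] !== b[1]] // component-wise xor
const vertical: Klein = [true, false]   // one generator
const horizontal: Klein = [false, true] // the other generator
// every element is its own inverse, e.g. combineKlein(vertical, vertical) = [false, false]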
Another way to present the Klein-four group is the group of symmetries of a non-square rectangle.
Task: Show that the two representations are isomorphic.
The Klein-four group is non-cyclic (because there are not one, but two generators) — vertical and horizontal spin. It is, however, still abelian, because the ordering of the actions still does not matter for the end results. Actually, the Klein-four group is the smallest non-cyclic group.
Product groups are non-cyclic, provided that the number of elements of the groups that comprise them (or their orders) aren’t relatively prime (have some GCD other than 1).
If two groups have orders that aren’t relatively prime (like for example $2$ and $2$, which are both divisible by $2$, as with the groups that comprise the Klein four-group), then even if the two groups are cyclic and have just 1 generator each, their product would have 2 generators.
And if you combine two groups with orders that are relatively prime (like $2$ and $3$), the resulting group would be isomorphic to a cyclic group whose order is the product of the two orders; for example, the product of $Z_3$ and $Z_2$ is isomorphic to the group $Z_6$ ($Z_3 \times Z_2 \cong Z_6$).
This is a consequence of an ancient result, known as the Chinese Remainder theorem.
Product groups are abelian, provided that the groups that form them are abelian. We can see that this is true by noticing that, although the generators are more than one, each of them acts only on its own part of the group, so they don’t interfere with each other in any way.
Products provide one way to create non-cyclic abelian groups — by creating a product of two or more cyclic groups. The fundamental theorem of finite abelian groups is a result that tells us that this is the only way to produce non-cyclic abelian groups, i.e.
All abelian groups are either cyclic or products of cyclic groups.
We can use this law to gain an intuitive understanding of what abelian groups are, but also to test whether a given group can be broken down into a product of more elementary groups.
To see how we can use this theorem, let’s revisit our color mixing monoid that we saw earlier.
As there doesn’t exist a color that, when mixed with itself, can produce all other colors, the color-mixing monoid is not cyclic. However, the color mixing monoid is abelian. So according to the theorem of finite abelian groups (which is valid for monoids as well), the color-mixing monoid must be (isomorphic to) a product.
And it is not hard to find the monoids that form it — although there isn’t one color that can produce all other colors, there are three colors that can do that together — the primary colors. This observation leads us to the conclusion that the color-mixing monoid can be represented as the product of three monoids, corresponding to the three primary colors.
You can think of each color monoid as a boolean monoid, having just two states (colored and not-colored).
Or alternatively, you can view it as having multiple states, representing the different levels of shading.
In both cases the monoid would be cyclic.
Now, let’s finally examine a non-commutative group — the group of rotations and reflections of a given geometrical figure. It is the same as the last one, but here besides the rotation action that we already saw (and its composite actions), we have the action of flipping the figure vertically, an operation which results in its mirror image:
Those two operations and their composites result in a group called $Dih_3$ that is not abelian (and is furthermore the smallest non-abelian group).
Task: Prove that this group is indeed not abelian.
Question: Besides having two main actions, what is the defining factor that makes this and any other group non-abelian?
Groups that represent the set of rotations and reflections of any 2D shape are called dihedral groups.
We began by defining a monoid as a set of composable elements. Then we saw that for some groups, like the groups of symmetries and rotations, those elements can be viewed as actions. And this is actually true for all other groups as well, e.g. the red ball in our color-blending monoid can be seen as the action of adding the color red to the mix, the number $2$ in the monoid of addition can be seen as the operation $+2$ etc. This observation leads to a categorical view of the theory of groups and monoids.
When we defined monoids, we saw that their operations are two-argument functions. And we introduced a way of representing such functions using set theory — by uniting the two arguments into one using products, i.e. we showed that a function that accepts two arguments (say $A$ and $B$) and maps them into some result ($C$) can be thought of as a mapping from the product of the sets of the two arguments to the result. So $A\times B\to C$.
However, this is not the only way to represent multi-argument function set-theoretically — there is another, equally interesting way, that doesn’t rely on any data structures, but only on functions: that way is to have a function that maps the first of the two arguments (i.e. from $A$) to another function that maps the second argument to the final result (i.e. $B \to C$). So $A\to B \to C$.
The practice of transforming a function that takes a pair of objects to a function that takes just one object and returns a function that takes another one is called currying.
It is achieved by a higher-order function. Here is how such a function might be implemented.
const curry = <A, B, C> (f:(a:A, b:B) => C) => (a:A) => (b:B) => f(a, b)
And equally important is the opposite function, which maps a curried function to a multi-argument one, which is known as uncurry.
const uncurry = <A, B, C> (f:(a:A) => (b:B) => C) => (a:A, b:B ) => f(a)(b)
There is a lot to say about these two functions, starting with the fact that their existence gives rise to an interesting relationship between the concept of a product and the concept of a morphism in category theory, called an adjunction. But we will cover this later. For now, we are interested in the fact that the two function representations are isomorphic, formally $A\times B\to C\cong A\to B \to C$.
By the way, this isomorphism can be represented in terms of programming as well. It is equivalent to the statement that, for any two-argument function f, the following function always returns true, no matter the arguments:
(...args) => uncurry(curry(f))(...args) === f(...args)
This is one part of the isomorphism, the other part is the equivalent function for curried functions.
Task: Write the other part of the isomorphism.
Let’s take a step back and examine the groups/monoids that we covered so far in the light of what we learned. We started off by representing the group operation as a function from pairs. For example, the operation of a symmetry group (let’s take $Z_3$ as an example) is an action that converts a pair of rotations to another rotation.
Using currying, we can represent the elements of a given group/monoid as functions, by combining them with the group operation, and the group operation itself — as functional composition. For example, the 3 elements of $Z_3$ can be seen as 3 bijective (invertible) functions from a set of 3 elements to itself (in a group-theoretic context, these kinds of functions are called permutations, by the way).
We can do the same for the addition monoid — numbers can be seen not as quantities (as in two apples, two oranges etc.), but as operations, (e.g. as the action of adding two to a given quantity).
Formally, the operation of the addition monoid that we saw above has the following type signature.
$+: \mathbb{Z} \times \mathbb{Z} \to \mathbb{Z}$
Because of the isomorphism we presented above, this function is equivalent to the following function.
$+: \mathbb{Z} \to (\mathbb{Z} \to \mathbb{Z})$
When we apply an element of the monoid to that function (say $2$), the result is the function $+2$ that adds 2 to a given number.
$+2: \mathbb{Z} \to \mathbb{Z}$
And because the monoid operation is always a given in the context of a given monoid, we can view the element $2$ and the function $+2$ as equivalent in the context of the monoid.
$2 \cong +2$
In other words, in addition to representing the monoid elements in the set as objects that are combined using a function, we can represent them as functions themselves.
The functions that represent the monoid elements have the same set as source and target, or the same signature, as we say (formally, they are of the type $A \to A$ for some $A$). Because of that, they can all be composed with one another, using functional composition, resulting in functions that also have the same signature.
And the same is valid for the addition monoid — number functions can be combined using functional composition.
$+2 \circ +3 \cong +5$
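Here is the same picture as a TypeScript sketch (the names are our own): currying the addition operation turns the element $2$ into the function $+2$, and the monoid operation becomes plain functional composition.

const add = (a: number) => (b: number): number => a + b // the curried monoid operation
const plus2 = add(2) // the element 2, seen as the function +2
const plus3 = add(3) // the element 3, seen as the function +3
const plus5 = (n: number) => plus2(plus3(n)) // +2 composed with +3
console.log(plus5(10) === add(5)(10)) // true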
So, basically, the functions that represent the elements of a monoid also form a monoid under the operation of functional composition (and the functions that represent the elements of a group also form a group).
Question: Which are the identity elements of function groups?
Task: Show that the functions representing inverse group elements are also inverse.
Once we learn how to represent the elements of any monoid as permutations that also form a monoid, using currying, it isn’t too surprising to learn that this constructed permutation monoid is isomorphic to the original one (the one from which it is constructed). This is a result known as Cayley’s theorem:
Any group is isomorphic to a permutation group.
Formally, if we use $Perm(A)$ to denote the permutation group constructed from a group $A$, then $Perm(A) \cong A$ for any $A$.
Or in other words, representing the elements of a group as permutations actually yields a representation of the monoid itself (sometimes called its standard representation).
Cayley’s theorem may not seem very impressive, but that only shows how influential it has been as a result.
The first thing that you have to know about the symmetric groups is that they are not the same thing as symmetry groups. Once we have that out of the way, we can understand what they actually are: given a natural number $n$, the symmetric group of $n$, denoted $\mathrm{S}_n$ (symmetric group of degree $n$), is the group of all possible permutations of a set with $n$ elements. The number of elements of such a group is equal to $1\times 2\times 3 \times … \times n$, or $n!$ (n-factorial).
So, for example, the group $\mathrm{S}_1$ of permutations of the one-element set has just 1 element (because a 1-element set has no functions to itself other than the identity function).
The group $\mathrm{S}_2$ has $1 \times 2 = 2$ elements (by the way, the colors are there to give you some intuition as to why the number of permutations of an $n$-element set is $n!$).
And with $\mathrm{S}_3$ we are already feeling the power of exponential (and even faster than exponential!) growth of the factorial function — it has $1\times 2\times 3=6$ elements.
Each symmetric group $\mathrm{S}_n$ contains all groups of order $n$ — this is so because (as we saw in the previous section) every group with $n$ elements is isomorphic to a set of permutations of a set of $n$ elements, and the group $\mathrm{S}_n$ contains all such permutations that exist.
Here are some examples:
Based on this insight, we can state Cayley’s theorem in terms of symmetric groups in the following way:
All groups are isomorphic to subgroups of symmetric groups.
Task: Show how the two are equivalent.
Fun fact: the study of group theory actually started by examining symmetric groups, so this theorem was actually a prerequisite for the emergence of the normal definition of groups that we all know and love (OK, at least I love it) — it provided a proof that the notion described by this definition is equivalent to the already existing notion of symmetric groups.
We saw that converting the monoid’s elements to actions/functions yields an accurate representation of the monoid in terms of sets and functions.
However, it seems that the set part of the structure in this representation is kind of redundant — you have the same set everywhere — so it would be good if we could simplify it. And we can do that by depicting it as an external (categorical) diagram.
But wait, if the monoids’ underlying sets correspond to objects in category theory, then the corresponding category would have just one object. And so the correct representation would involve just one point from which all arrows come and to which they go.
The only difference between different monoids would be the number of morphisms that they have and the relationship between them.
The intuition behind this representation from a category-theoretic standpoint is encompassed by the law of closure that monoid and group operations have and that categories lack — the law stating that applying the operation (functional composition) to any two elements always yields an element of the same set, e.g. no matter how you flip a triangle, you’d still get a triangle.
 | Categories | Monoids | Groups |
---|---|---|---|
Associativity | X | X | X |
Identity | X | X | X |
Invertibility | | | X |
Closure | | X | X |
When we view a monoid as a category, this law says that all morphisms in the category should be from one object to itself - a monoid, any monoid, can be seen as a category with one object. The converse is also true: any category with one object can be seen as a monoid.
Let’s elaborate on this thought by reviewing the definition of a category from chapter 2.
A category is a collection of objects (we can think of them as points) and morphisms (arrows) that go from one object to another, where:
- Each object has to have the identity morphism.
- There should be a way to compose two morphisms with an appropriate type signature into a third one in a way that is associative.
Aside from the slightly confusing fact that the monoid’s elements are morphisms when viewed categorically, this describes exactly what monoids are.
Categories have an identity morphism for each object, so for a category with just one object there is exactly one identity morphism. And monoids do have an identity element, which, when viewed categorically, corresponds to that identity morphism.
Categories provide a way to compose two morphisms with an appropriate type signature, and for categories with one object this means that all morphisms should be composable with one another. And the monoid operation does exactly that — given any two elements (or two morphisms, if we use the categorical terminology), it produces a third one.
Philosophically, defining a monoid as a one-object category corresponds to the view of monoids as a model of how a set of (associative) actions that are performed on a given object alter its state. Provided that the object’s state is determined solely by the actions that are performed on it, we can leave the object out of the equation and concentrate on how the actions are combined. And, as per usual, the actions (and elements) can be anything, from mixing colors in paint, to adding quantities to a given set of things, etc.
When we view cyclic groups/monoids as categories, we would see that they correspond to categories that (besides having just one object) also have just one morphism (which, as we said, is called a generator) along with the morphisms that are created when this morphism is composed with itself. In fact the infinite cyclic monoid (which is isomorphic to the natural numbers), can be completely described by this simple definition.
This is so, because applying the generator again and again yields all elements of the infinite cyclic group. Specifically, if we view the generator as the action $+1$ then we get the natural numbers.
Finite cyclic groups/monoids are the same, except that their definition contains an additional law, stating that once you compose the generator with itself $n$ times, you get the identity morphism. For the cyclic group $Z_3$ (which can be visualized as the group of triangle rotations) this law states that composing the generator with itself $3$ times yields the identity morphism.
Composing the group generator with itself, and then applying the law, yields the three morphisms of $Z_3$.
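As a small TypeScript sketch of this (with made-up names), we can model the generator of $Z_3$ as rotating a triangle’s vertices by one position and check that composing it with itself three times gives the identity:

// The generator of Z3: rotate each vertex (0, 1 or 2) to the next position
const rotate = (vertex: number) => (vertex + 1) % 3

// Composing the generator with itself two and three times
const rotateTwice = (v: number) => rotate(rotate(v))
const rotateThrice = (v: number) => rotate(rotate(rotate(v)))

// The law of Z3: three rotations bring every vertex back to where it started
console.log([0, 1, 2].map(rotateThrice)) // [0, 1, 2], i.e. the identity morphism
console.log([0, 1, 2].map(rotateTwice))  // [2, 0, 1], a genuinely new element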
We can represent product groups this way too. Let’s take the Klein four-group as an example. The Klein four-group has two generators that it inherits from the groups that form it (which we viewed as the vertical and horizontal rotation of a non-square rectangle), each of which comes with one law.
To make the representation complete, we add the law for combining the two generators.
And then, if we start applying the two generators and follow the laws, we get the four elements.
The set of generators and laws that defines a given group is called the presentation of a group. Every group has a presentation.
We saw how picking a different selection of laws gives rise to different types of monoids. But what monoids would we get if we pick no laws at all? These monoids (we get a different one depending on the set we picked) are called free monoids (the word “free” is used in the sense that once you have the set, you can upgrade it to a monoid for free, i.e. without having to define anything else).
If you revisit the previous section you will notice that we already saw one such monoid — the free monoid with just one generator is isomorphic to the monoid of the natural numbers.
We can make a free monoid from the set of colorful balls — the monoid’s elements would be sequences of all possible combinations of the balls.
The free monoid is a special one — each element of the free monoid over a given set can be converted to a corresponding element in any other monoid that uses the same set of generators, by just applying that monoid’s laws. For example, here is how the elements above would look if we apply the laws of the color-mixing monoid.
Task: Write up the laws of the color-mixing monoid.
If we put on our programmers’ hats, we will see that the type of the free monoid over the set of generators T (which we can denote as FreeMonoid<T>) is isomorphic to the type List<T> (or Array<T>, if you prefer), and that the intuition behind the special property that we described above is actually very simple: keeping objects in a list allows you to convert them to any other structure, i.e. when we want to perform some manipulation on a bunch of objects, but we don’t know exactly what this manipulation is, we just keep a list of those objects until it’s time to do it.
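Here is a minimal TypeScript sketch of that intuition; the Monoid interface and the interpret helper below are hypothetical, written just for this illustration:

// A monoid: a type together with an identity element and a binary operation
interface Monoid<T> {
  identity: T
  combine: (a: T, b: T) => T
}

// The free monoid over T: its elements are lists, its operation is concatenation
type FreeMonoid<T> = T[]

// The special property: an element of the free monoid can be converted to an
// element of any other monoid over the same generators, by folding the list
const interpret = <T>(m: Monoid<T>, xs: FreeMonoid<T>): T =>
  xs.reduce(m.combine, m.identity)

// Example: interpreting the list [1, 2, 3] in the addition monoid
const addition: Monoid<number> = { identity: 0, combine: (a, b) => a + b }
console.log(interpret(addition, [1, 2, 3])) // 6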
While the intuition behind free monoids seems simple enough, the formal definition is not our cup of tea… yet, simply because we have to cover more stuff.
We understand that, being the most general of all monoids for a given set of generators, a free monoid can be converted to all of them, i.e. there exists a function from it to each of them. But what kind of function would that be? Tune in after a few chapters to find out.
Given a set of objects, there can be numerous criteria, based on which to order them (depending on the objects themselves) — size, weight, age, alphabetical order etc.
However, currently we are not interested in the criteria that we can use to order objects, but in the nature of the relationships that define order. Of which there can be several types as well.
Mathematically, the order as a construct is represented (much like a monoid) by two components.
One is a set of things (e.g. colorful balls) which we sometimes call the order’s underlying set.
And the other is a binary relation between these things, which is often represented by arrows.
Not all binary relationships are orders — only ones that fit certain criteria that we are going to examine as we review the different types of order.
Let’s start with an example — the most straightforward type of order that you can think of is the linear order, i.e. one in which every object has its place depending on every other object. In this case the ordering criterion is completely deterministic and leaves no room for ambiguity in terms of which element comes before which. For example, the order of colors, sorted by the length of their light waves (or by how they appear in the rainbow).
Using set theory, we can represent this order, as well as any other order, as a subset of the cartesian product of the order’s underlying set with itself.
And in programming, orders are defined by providing a function which, given two objects, tells us which one of them is “bigger” (comes first) and which one is “smaller”. It isn’t hard to see that such a function is actually a definition of exactly this kind of relation.
[1, 3, 2].sort((a, b) => {
  // A comparator must return a number: positive if a comes after b,
  // negative if a comes before b, and zero if the two are "equal"
  if (a > b) {
    return 1
  } else if (a < b) {
    return -1
  } else {
    return 0
  }
})
However (this is where it gets interesting) not all such functions (and not all cartesian products) define orders. To really define an order (e.g. give the same output every time, independent of how the objects were shuffled initially), functions have to obey several rules.
Incidentally, (or rather not incidentally at all), these rules are nearly equivalent to the mathematical laws that define the criteria of the order relationship i.e. those are the rules that define which element can point to which. Let’s review them.
Let’s get the most boring law out of the way — each object has to be bigger or equal to itself, or $a ≤ a$ for all $a$ (the relationship between elements in an order is commonly denoted as $≤$ in formulas, but it can also be represented with an arrow from first object to the second.)
There is no special reason for this law to exist, except that the “base case” should be covered somehow.
We can formulate it the opposite way too and say that each object should not have the relationship to itself, in which case we would have a relation that resembles bigger than, as opposed to bigger or equal to, and a slightly different type of order, sometimes called a strict order.
The second law is maybe the least obvious, (but probably the most essential) — it states that if object $a$ is bigger than object $b$, it is automatically bigger than all objects that are smaller than $b$ or $a ≤ b \land b ≤ c \to a ≤ c$.
This is the law that to a large extent defines what an order is: if I am better at playing soccer than my grandmother, then I would also be better at it than my grandmother’s friend, whom she beats, otherwise I wouldn’t really be better than her.
The third law is called antisymmetry. It states that the function that defines the order should not give contradictory results (or in other words you have $x ≤ y$ and $y ≤ x$ only if $x = y$).
It also means that no ties are permitted — either I am better than my grandmother at soccer or she is better at it than me.
The last law is called totality (or connexity) and it mandates that all elements that belong to the order should be comparable ($a ≤ b \lor b ≤ a$). That is, for any two elements, one would always be “bigger” than the other.
By the way, this law makes the reflexivity law redundant, as reflexivity is just a special case of totality when $a$ and $b$ are one and the same object, but I still want to present it for reasons that will become apparent soon.
Actually, here are the reasons: this law does not look as “set in stone” as the rest of them, i.e. we can probably think of some situations in which it does not apply. For example, if we aim to order all people based on soccer skills, there are many ways in which we can rank a person compared to their friends, their friends’ friends etc., but there isn’t a way to order groups of people who have never played with one another.
Orders, like the order of people based on their soccer skills, that don’t follow the totality law are called partial orders (and linear orders are also called total orders.)
Question: Previously, we covered a relation that is pretty similar to this. Do you remember it? What is the difference?
Task: Think about some orders that you know about and figure out whether they are partial or total.
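If we want to check these laws programmatically, a relation on a finite set can be represented as a list of pairs. Here is a minimal TypeScript sketch (the helper names and the sample data are made up for this illustration):

type Pair = [string, string]

// Does the relation contain the pair (a, b), i.e. does "a ≤ b" hold?
const holds = (rel: Pair[], a: string, b: string) =>
  rel.some(([x, y]) => x === a && y === b)

const isReflexive = (rel: Pair[], els: string[]) =>
  els.every(a => holds(rel, a, a))

const isTransitive = (rel: Pair[], els: string[]) =>
  els.every(a => els.every(b => els.every(c =>
    !(holds(rel, a, b) && holds(rel, b, c)) || holds(rel, a, c))))

const isAntisymmetric = (rel: Pair[], els: string[]) =>
  els.every(a => els.every(b =>
    !(holds(rel, a, b) && holds(rel, b, a)) || a === b))

const isTotal = (rel: Pair[], els: string[]) =>
  els.every(a => els.every(b => holds(rel, a, b) || holds(rel, b, a)))

// A linear order on three elements: a ≤ b ≤ c, plus everything the laws require
const els = ["a", "b", "c"]
const rel: Pair[] = [["a", "a"], ["b", "b"], ["c", "c"], ["a", "b"], ["b", "c"], ["a", "c"]]
console.log(isReflexive(rel, els), isTransitive(rel, els),
  isAntisymmetric(rel, els), isTotal(rel, els)) // true true true true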
Partial orders are actually much more interesting than linear/total orders. But before we dive into them, let’s say a few things about numbers.
Natural numbers form a linear order under the operation bigger or equal to (the symbol of which we have been using in our formulas.)
In many ways, numbers are the quintessential order — every finite linear order of objects is isomorphic to a subset of the order of the natural numbers, as we can map the first element of any such order to the number $1$, the second one to the number $2$ etc. (and we can do the opposite operation as well).
If we think about it, this isomorphism is actually closer to the everyday notion of a linear order, than the one defined by the laws — when most people think of order, they aren’t thinking of a transitive, antisymmetric and total relation, but are rather thinking about criteria based on which they can decide which object comes first, which comes second etc. So it’s important to notice that the two are equivalent.
From the fact that any finite linear order is isomorphic to a subset of the natural numbers, it also follows that all linear orders of the same magnitude are isomorphic to one another.
So, the linear order is simple, but it is also (and I think this isomorphism proves it) the most boring order ever, especially when looked at from a category-theoretic viewpoint — all finite linear orders (and most infinite ones) are just isomorphic to (subsets of) the natural numbers, and so all of their diagrams look the same way.
However, this is not the case with partial orders that we will look into next.
Like a linear order, a partial order consists of a set plus a relation, with the only difference that, although the relation still obeys the reflexive, transitive and antisymmetric laws, it does not have to obey the law of totality, that is, not all of the set’s elements are necessarily ordered. I say “necessarily” because even if all elements are ordered, it is still a partial order (just as a group is still a monoid) — all linear orders are also partial orders, but not the other way around. We can even create an order of orders, based on which is more general.
Partial orders are also related to the concept of equivalence relations that we covered in chapter 1, except that the symmetry law is replaced with antisymmetry.
If we revisit the example of the soccer players rank list, we can see that the first version that includes just myself, my grandmother and her friend is a linear order.
However, including this other person whom none of us played yet, makes the hierarchy non-linear i.e. a partial order.
This is the main difference between partial and total orders — partial orders cannot provide us with a definite answer of the question who is better than who. But sometimes this is what we need — in sports, as well as in other domains, there isn’t always an appropriate way to rate people linearly.
Earlier we said that all linear orders can be represented by the same chain-like diagram. We can reverse this statement and say that any diagram that looks different from that chain represents a partial order. An example of this is a partial order that contains a bunch of linearly-ordered subsets, e.g. in our soccer example we can have separate groups of friends who play together and are ranked with each other, but not with anyone from other groups.
The different linear orders that make up the partial order are called chains. There are two chains in this diagram $m \to g \to f$ and $d \to o$.
The chains in an order don’t have to be completely disconnected from each other in order for it to be partial. They can be connected, as long as the connections are not one-to-one, i.e. not ones where the last element from one chain is connected to the first element of the other one (this would effectively unite them into one chain.)
The above set is not linearly ordered. This is because, although we know that $d ≤ g$ and that $f ≤ g$, the relationship between $d$ and $f$ is not known — either one of them could be bigger than the other.
Although partial orders don’t give us a definitive answer to “Who is better than who?”, some of them still can give us an answer to the more important question (in sports, as well as in other domains), namely “Who is number one?” i.e. who is the champion, the player who is better than anyone else. Or, more generally, the element that is bigger than all other elements.
We call such element the greatest element. Some (not all) partial orders do have such element — in our last diagram $m$ is the greatest element, in this diagram, the green element is the biggest one.
Sometimes we have more than one element that is not smaller than any other element (i.e. several maximal elements); in this case none of them is the greatest.
In addition to the greatest element, a partial order may also have a least (smallest) element, which is defined in the same way.
The least upper bound of two elements that are connected as part of an order is called the join of these elements, e.g. the green element is a join of the other two.
There can be multiple elements bigger than $a$ and $b$ (all elements that are bigger than $c$ are also bigger than $a$ and $b$), but only one of them is a join. Formally, the join of $a$ and $b$ is defined as the smallest element that is bigger than both $a$ and $b$ (i.e. smallest $c$ for which $a ≤ c$, and $b ≤ c$.)
Given any two elements in which one is bigger than the other (e.g. $a ≤ b$), the join is the bigger element (in this case $b$).
In a linear order, the join of any two elements is just the bigger element.
Like with the greatest element, if two elements have several upper bounds none of which is smaller than the rest, then none of those upper bounds is a join (a join must be unique).
If, however, one of those elements is established as smaller than the rest of them, it immediately qualifies.
Question: Which concept in category theory reminds you of joins?
Given two elements, the biggest element that is smaller than both of them is called the meet of these elements.
The same rules as for the joins apply.
The diagrams that we use in this section are called “Hasse diagrams” and they work much like our usual diagrams, however they have an additional rule that is followed — “bigger” elements are always positioned above smaller ones.
In terms of arrows, the rule means that if you add an arrow to a point, the point to which the arrow points must always be above the one from which it points.
This arrangement allows us to compare any two points by just seeing which one is above the other e.g. we can determine the join of two elements, by just identifying the elements that they connect to and see which one is lowest.
We all know many examples of total orders (any form of chart or ranking is a total order), but there are probably not so many obvious examples of partial orders that we can think of off the top of our heads. So let’s see some. This will give us some context, and will help us understand what joins are.
To stay true to our form, let’s revisit our color-mixing monoid and create a color-mixing partial order in which all colors point to colors that contain them.
If you go through it, you will notice that the join of any two colors is the color that they make up when mixed. Nice, right?
We saw that when we order numbers by “bigger or equal to”, they form a linear order (the linear order even.) But numbers can also form a partial order, for example they form a partial order if we order them by which divides which, i.e. if $a$ divides $b$, then $a$ is before $b$ e.g. because $2 \times 5 = 10$, $2$ and $5$ come before $10$ (but $3$, for example, does not come before $10$.)
And it so happens (actually for very good reason) that the join operation again corresponds to an operation that is relevant in the context of the objects — the join of two numbers in this partial order is their least common multiple.
And the meet (the opposite of join) of two numbers is their greatest common divisor.
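Here is a small TypeScript sketch of this order; the gcd and lcm helpers are written out so the example is self-contained:

// In the divisibility order, "a ≤ b" means "a divides b"
const divides = (a: number, b: number) => b % a === 0

const gcd = (a: number, b: number): number => (b === 0 ? a : gcd(b, a % b))
const lcm = (a: number, b: number) => (a * b) / gcd(a, b)

// The join of 4 and 6 is their least common multiple...
console.log(lcm(4, 6)) // 12, the smallest number that both 4 and 6 divide
// ...and their meet is their greatest common divisor
console.log(gcd(4, 6)) // 2, the biggest number that divides both 4 and 6
console.log(divides(4, 12), divides(6, 12)) // true true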
Given a collection of all possible sets containing a combination of a given set of elements…
…we can define what is called the inclusion order of those sets, in which $a$ comes before $b$ if $a$ includes $b$, or in other words if $b$ is a subset of $a$.
In this case the join operation of two sets is their union, and the meet operation is their set intersection.
This diagram might remind you of something — if we take the colors that are contained in each set and mix them into one color, we get the color-blending partial order that we saw earlier.
The order example with the number divisors is also isomorphic to an inclusion order, namely the inclusion order of all possible sets of prime numbers, including repeating ones (or alternatively the set of all prime powers). This is confirmed by the fundamental theorem of arithmetic, which states that every number can be written as a product of primes in exactly one way.
We mentioned order isomorphisms several times already, so this is about time to elaborate on what they are. Take the isomorphism between the number partial order and the prime inclusion order as an example. Like an isomorphism between any two sets, it is comprised of two functions: one that maps each number to a set of primes (its prime factorization), and one that maps each set of primes back to a number (their product).
An order isomorphism is essentially an isomorphism between the orders’ underlying sets (an invertible function). However, besides their underlying sets, orders also have the arrows that connect the elements, so there is one more condition: in order for an invertible function to constitute an order isomorphism, it has to respect those arrows, in other words it should be order-preserving. More specifically, applying this function (let’s call it $F$) to any two elements in one set ($a$ and $b$) should result in two elements that have the same corresponding order in the other set (so $a ≤ b$ if and only if $F(a) ≤ F(b)$).
So far, we saw two different partial orders, one based on color mixing, and one based on number division, that can be represented by the inclusion orders of all possible combinations of sets of some basic elements (the primary colors in the first case, and the prime numbers (or prime powers) in the second one.) Many other partial orders can be defined in this way. Which ones exactly is a question that is answered by an amazing result called Birkhoff’s representation theorem. They are the finite partial orders that meet the following two criteria:
- every two elements have a join and a meet, and
- the join and meet operations distribute over one another.
The partial orders that meet the first criterion are called lattices. The ones that meet the second one as well are called distributive lattices.
And the “prime” elements which we use to construct the inclusion order are the elements that are not the join of any other elements. They are also called join-irreducible elements.
By the way, the partial orders that are not distributive lattices are also isomorphic to inclusion orders, it is just that they are isomorphic to inclusion orders that do not contain all possible combinations of elements.
We will now review the orders for which Birkhoff’s theorem applies, i.e. the lattices. Lattices are partial orders in which every two elements have a join and a meet. So every lattice is also a partial order, but not every partial order is a lattice (we will see even more members of this hierarchy).
Most partial orders that are created based on some sort of rule are distributive lattices. For example, the partial orders from the previous section are also distributive lattices when they are drawn in full, as is the case for the color-mixing order.
Notice that we added the black ball at the top and the white one at the bottom. We did that because otherwise the top three elements wouldn’t have a join element, and the bottom three wouldn’t have a meet.
Our color-mixing lattice has a greatest element (the black ball) and a least element (the white one). Lattices that have a least and a greatest element are called bounded lattices. It isn’t hard to see that all finite lattices are also bounded.
Task: Prove that all finite lattices are bounded.
Lattices are partial orders that have both join and meet for each pair of elements. Partial orders that just have join (and no meet), or just have meet and no join are called semilattices. More specifically, partial orders that have meet for every pair of elements are called meet-semilattices.
A structure that is similar to a semilattice (and probably more famous than it) is the tree.
The difference between the two is small but crucial: in a tree, each element can only be connected to just one other element (although it can have multiple elements connected to it). If we represent a tree as an inclusion order, each set would “belong” to only one superset, whereas with semilattices there would be no such restriction.
A good intuition for the difference between the two is that a semilattice is capable of representing much more general relations, so for example, the mother-child relation forms a tree (a mother can have multiple children, but a child can have only one mother), but the “older sibling” relation forms a lattice, as a child can have multiple older siblings and vice versa.
Why am I speaking about trees? It’s because people tend to use them for modeling all kinds of phenomena and to imagine everything as trees. The tree is the structure that all of us understand, that comes to us naturally, without us even realizing that we are using a structure — most human-made hierarchies are modeled as trees. A typical organization of people is modeled as a tree: you have one person at the top, a couple of people who report to them, and then even more people who report to this couple of people.
(Contrast this with informal social groups, in which more or less everyone is connected to everyone else.)
And cities (ones that are designed rather than left to evolve naturally) are also modeled as trees: you have several neighborhoods, each of which has a school, a department store etc., connected to each other and (in bigger cities) organized into bigger living units.
The implications of the tendency to use trees, as opposed to lattices, as models are examined in the ground-breaking essay “A City is Not a Tree” by Christopher Alexander.
In simplicity of structure the tree is comparable to the compulsive desire for neatness and order that insists the candlesticks on a mantelpiece be perfectly straight and perfectly symmetrical about the center. The semilattice, by comparison, is the structure of a complex fabric; it is the structure of living things, of great paintings and symphonies.
In general, it seems that hierarchies that are specifically designed by people, such as cities, tend to come out as trees, whereas hierarchies that arise naturally, such as the hierarchy of colors, tend to be lattices.
In the previous section we (along with Christopher Alexander) argued that lattice-based hierarchies are “natural”, that is, they arise in nature. Now, we will see an interesting way to uncover such hierarchies, given a set of objects that share some attributes. This is an overview of a mathematical method called formal concept analysis.
The data structure that we will be analyzing, called a formal context, consists of 3 sets. Firstly, the set containing all objects that we will be analyzing (denoted as $G$).
Secondly, a set of some attributes that these objects might have (denoted as $M$). Here we will be using the 3 base colors.
And finally, a relation (called incidence) that expresses which objects have which attributes, given as a set of pairs, i.e. a subset of $G × M$. So, having a pair containing a ball and a base color (yellow, for example) would indicate that the color of the ball contains (i.e. is composed of) the color yellow (among other colors).
Now, let’s use these sets to build a lattice.
First step: from the set of pairs, we build a function, connecting each attribute with the set of objects that share this attribute (we can do this because functions are relations, and relations are expressed by pairs).
Take a look at the target of this function, which is a set of sets. Is there some way to order those sets so that we can visualize them better? Of course, we can order them by inclusion. In this way, each attribute would be connected to an attribute that is shared by a similar set of objects.
Add top and bottom values, and we get a lattice.
Ordering the concepts as a lattice might help us see connections between the concepts in the context, e.g. we see that all balls in our set that contain the color yellow also contain the color red.
Task: Take a set of objects and a set of attributes and create your own concept lattice. Example: the objects can be lifeforms (fish, frog, dog, water weed, corn etc.) and the attributes can be their characteristics (“lives in water”, “lives on land”, “can move”, “is a plant”, “is an animal” etc.).
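For the programmers, here is a minimal TypeScript sketch of the first steps of this construction (the context itself, i.e. the balls and their base colors, is made up for the example):

// The incidence relation of a small formal context: pairs of (object, attribute)
const incidence: [string, string][] = [
  ["orange ball", "red"], ["orange ball", "yellow"],
  ["purple ball", "red"], ["purple ball", "blue"],
  ["red ball", "red"],
]

// Step one: connect each attribute with the set of objects that have it
const objectsWith = (attribute: string) =>
  new Set(incidence.filter(([, attr]) => attr === attribute).map(([obj]) => obj))

// Step two: order those sets by inclusion
const isSubset = (a: Set<string>, b: Set<string>) => [...a].every(x => b.has(x))

console.log(objectsWith("red")) // all three balls
console.log(isSubset(objectsWith("yellow"), objectsWith("red"))) // true: every ball containing yellow also contains red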
In the previous section, we saw how removing the law of totality from the laws of (linear) order produces a different (and somewhat more interesting) structure, called a partial order. Now let’s see what will happen if we remove another one of the laws, namely the antisymmetry law. If you recall, the antisymmetry law mandated that you cannot have two different objects each of which is smaller than the other (or that if $a ≤ b$ and $b ≤ a$, then $a = b$).
 | Linear order | Partial order | Preorder |
---|---|---|---|
 | $a ≤ b$ or $b ≤ a$ | $a ≤ b$ or $b ≤ a$ or neither | $a ≤ b$ or $b ≤ a$ or neither or both |
Reflexivity | X | X | X |
Transitivity | X | X | X |
Antisymmetry | X | X | |
Totality | X | | |
The result is a structure called a preorder, which is not exactly an order in the everyday sense — it can have arrows coming from any point to any other: if a partial order can be used to model who is better than whom at soccer, then a preorder can be used to model who has beaten whom, either directly (by playing against them) or indirectly.
Preorders have just one law — transitivity: $a ≤ b \land b ≤ c \to a ≤ c$ (two, if we count reflexivity). The part about the indirect wins is a result of this law. Due to it, all indirect wins (ones where you haven’t beaten the player directly, but you have beaten someone who has beaten them) are added as a direct result of its application, as seen here (we show indirect wins in a lighter tone).
And as a result of that, all “circle” relationships (e.g. where you have a weaker player beating a stronger one) result in just a bunch of objects that are all connected to one another.
All of that structure arises naturally from the simple law of transitivity.
Preorders may be viewed as a middle ground between partial orders and equivalence relations, as they are missing exactly the property on which those two structures differ — (anti)symmetry. Because of that, if a bunch of objects in a preorder follow the law of symmetry, those objects form an equivalence relation. And if they follow the reverse law of antisymmetry, they form a partial order.
Equivalence relation | Preorder | Partial order |
---|---|---|
Reflexivity | Reflexivity | Reflexivity |
Transitivity | Transitivity | Transitivity |
Symmetry | - | Antisymmetry |
In particular, any subset of objects that are connected with one another both ways (like in the example above) follows the symmetry requirement. So if we group all elements that have such connection, we would get a bunch of sets, all of which define different equivalence relations based on the preorder, called the preorder’s equivalence classes.
And, even more interestingly, if we transfer the preorder connections between the elements of these sets to connections between the sets themselves, these connections would follow the antisymmetry requirement, which means that they would form a partial order.
In short, for every preorder, we can define the partial order of the equivalence classes of this preorder.
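Here is a minimal TypeScript sketch of that idea, assuming the preorder is already given as a transitively-closed “has beaten” relation (the names and data are made up):

type Beats = [string, string][]

// The preorder: a ≤ b means "a has beaten b", directly or indirectly
// (we assume the relation below is already transitively closed)
const beats: Beats = [
  ["ann", "bob"], ["bob", "ann"], // ann and bob have beaten each other
  ["ann", "joe"], ["bob", "joe"], // and both of them have beaten joe
]

const related = (rel: Beats, a: string, b: string) =>
  a === b || rel.some(([x, y]) => x === a && y === b)

// Two elements are equivalent when the relation holds in both directions
const equivalent = (rel: Beats, a: string, b: string) =>
  related(rel, a, b) && related(rel, b, a)

console.log(equivalent(beats, "ann", "bob")) // true: same equivalence class
console.log(equivalent(beats, "ann", "joe")) // false: ann's class sits above joe's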
We use maps to get around all the time, often without thinking about the fact that they are actually diagrams. More specifically, some of them are preorders — the objects represent cities or intersections, and the relations represent the roads.
Transitivity reflects the fact that if you have a route allowing you to get from point $a$ to point $b$ and one that allows you to go from $b$ to $c$, then you can go from $a$ to $c$ as well. Two-way roads may be represented by two arrows that form an isomorphism between objects. Objects between which you can always travel in both directions form equivalence classes (ideally all intersections would be in one equivalence class, otherwise there would be places you could get to, but not come back from).
However, maps that contain more than one road (and even more than one route) connecting two intersections, cannot be represented using preorders. For that we would need categories (don’t worry, we will get there).
Let’s now reformat the preorder that we used in the previous two examples as a Hasse diagram that goes from left to right. Now it (hopefully) doesn’t look so much like a hierarchy, nor like a map, but like a description of a process (which, if you think about it, is also a map, just one that is temporal rather than spatial.) This is actually a very good way to describe a computational model known as a finite state machine.
A specification of a finite state machine consists of a set of states that the machine can have, which, as the name suggests, must be finite, and a bunch of transition functions that specify which state we transition to (often expressed as tables.)
But as we saw, a finite state machine is similar to a preorder with a greatest and least object, in which the relations between the objects are represented by functions.
Finite state machines are used in organization planning e.g. imagine a process where a given item gets manufactured, gets checked by a quality control person, who, if they find some deficiencies, pass it to the necessary repairing departments and then they check it again and send it for shipping. This process can be modeled by the above diagram.
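A minimal TypeScript sketch of such a machine, with the states and transitions made up to match the manufacturing example (a real specification would list them in a table):

type State = "in quality control" | "in repairs" | "shipped"
type Event = "check passed" | "defect found" | "repaired"

// The transition function: given a state and an event, return the next state
const transition = (state: State, event: Event): State => {
  if (state === "in quality control" && event === "check passed") return "shipped"
  if (state === "in quality control" && event === "defect found") return "in repairs"
  if (state === "in repairs" && event === "repaired") return "in quality control"
  return state // any other event leaves the state unchanged
}

// Running the machine over a sequence of events is just a fold
const run = (events: Event[]): State =>
  events.reduce(transition, "in quality control" as State)

console.log(run(["defect found", "repaired", "check passed"])) // "shipped"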
We saw that preorders are a powerful concept, so let’s take a deeper look at the law that governs them — the transitivity law. What this law tells us is that if we have two relationships $a ≤ b$ and $b ≤ c$, then we automatically have a third one, $a ≤ c$.
In other words, the transitivity law tells us that the $≤$ relationship composes i.e. if we view the “bigger than” relationship as a morphism we would see that the law of transitivity is actually the categorical definition of composition.
(we have to also verify that the relation is associative, but that’s easy)
So let’s review the definition of a category again.
A category is a collection of objects (we can think of them as points) and morphisms (arrows) that go from one object to another, where:
- Each object has to have the identity morphism.
- There should be a way to compose two morphisms with an appropriate type signature into a third one in a way that is associative.
Looks like we have law number 2 covered. What about that other one — the identity law? We have it too, under the name reflexivity.
So it’s official — preorders are categories (sounds kinda obvious, especially after we also saw that orders can be reduced to sets and functions using the inclusion order, and sets and functions form a category in their own right.)
And since partial orders and total orders are preorders too, they are categories as well.
When we compare the categories of orders to other categories (like the quintessential category of sets), we see one thing that immediately sets them apart: in other categories there can be many different morphisms (arrows) between two objects, while orders can have at most one morphism between two objects, that is, we either have $a ≤ b$ or we do not.
In contrast, in the category of sets there is a potentially infinite number of functions from, say, the set of integers to the set of boolean values, as well as a lot of functions that go the other way around, and the existence of either of these functions does not imply that one set is “bigger” than the other one.
So, an order is a category that has at most one morphism between two objects. But the converse is also true — every category that has at most one morphism between objects is an order (or a thin category as it is called in category-theoretic terms).
An interesting fact that follows trivially from their having at most one morphism between any two given objects is that in thin categories all diagrams commute.
Task: Prove this.
While we are rehashing diagrams from the previous chapters, let’s look at the diagram defining the coproduct of two objects in a category, from chapter 2.
If you recall, this is an operation that corresponds to set inclusion in the category of sets.
But wait, wasn’t there some other operation that corresponded to set inclusion? Oh yes, the join operation in orders. And not merely that, but joins in orders are defined in exactly the same way as categorical coproducts.
In category theory, an object $G$ is the coproduct of objects $Y$ and $B$ if the following two conditions are met:
There are morphisms from both of these objects to it, i.e. $Y \to G$ and $B \to G$.
For any other object $P$ that also has morphisms $Y \to P$ and $B \to P$, there is a unique morphism $G \to P$ that makes the diagram commute.
In the realm of orders, we say that $G$ is the join of objects $Y$ and $B$ if:
It is bigger than both of these objects, so $Y ≤ G$ and $B ≤ G$.
It is smaller than any other object that is bigger than both of them, i.e. for any other object $P$ such that $Y ≤ P$ and $B ≤ P$, we should also have $G ≤ P$.
We can see that the two definitions and their diagrams are the same. So, speaking in category theoretic terms, we can say that the categorical coproduct in the category of orders is the join operation. Which of course means that products correspond to meets.
Overall, orders (thin categories) are often used for exploring categorical concepts in a context that is easier to understand, e.g. understanding the order-theoretic concepts of meets and joins would help you better understand the more general categorical concepts of products and coproducts.
Orders are also helpful when they are used as thin categories, i.e. as an alternative to “full-fledged” categories, in contexts where we aren’t particularly interested in the difference between the morphisms that go from one object to another. We will see an example of that in the next chapter.
Now let’s talk about one more seemingly unrelated topic just so we can “surprise” ourselves when we realize it’s category theory. By the way, in this chapter there will be another surprise in addition to that, so don’t fall asleep.
Also, I will not merely transport you to a different branch of mathematics, but to an entirely different discipline - logic.
Logic is the science of the possible. As such, it is at the root of all other sciences, all of which are sciences of the actual, i.e. that which really exists. For example, if science explains how our universe works then logic is the part of the description which is also applicable to any other universe that is possible to exist. A scientific theory aims to be consistent with both itself and observations, while a logical theory only needs to be consistent with itself (and true regardless of observations).
Logic studies the rules by which knowing one thing leads you to conclude (or prove) that some other thing is also true, regardless of the things’ domain (e.g. scientific discipline) and by only referring to their form.
On top of that, it (logic) tries to organize those rules in logical systems (or formal systems as they are also called).
Seeing this description, we might think that the subject of logic is quite similar to the subject of set theory and category theory, as we described it in the first chapter - instead of the word “formal” we used another similar word, namely “abstract”, and instead of “logical system” we said “theory”. This observation would be quite correct - today most people agree that every mathematical theory is actually logic plus some additional definitions added to it. For example, part of the reason why set theory is so popular as a theory for the foundations of mathematics is that it can be defined by adding just one single primitive to the standard axioms of logic which we will see shortly - the binary relation that indicates set membership. Category theory is close to logic too, but in a quite different way.
A consequence of logic being the science of the possible is that in order to do anything at all in it, we should have an initial set of propositions that we accept as true or false. These are also called “premises”, “primary propositions” or “atomic propositions” as Wittgenstein dubbed them.
In the context of logic itself, these propositions are abstracted away (i.e. we are not concerned about them directly) and so they can be represented with the colorful balls that you are familiar with.
At the heart of logic, as in category theory, is the concept of composition - if we have two or more propositions that are somehow related to one another, we can combine them into one using logical operators, like “and”, “or”, “follows” etc. The results would be new propositions, which we might call composite propositions to emphasize the fact that they are not primary.
This composition resembles the way in which two monoid objects are combined into one using the monoid operation. Actually, some logical operations do form monoids, like for example the operation and, with the proposition $true$ serving as the identity element.
However, unlike monoids/groups, logic studies combinations not just with one but with many logical operations and the ways in which they relate to one another, for example, in logic we might be interested in the law of distributivity of the and and or operations and what it entails.
It is important to note that $∧$ is the symbol for and and $∨$ is the symbol for or (although the law above is actually valid even if and and or are flipped).
When looking at the last diagram, it is important to emphasize that, propositions that are composed of several premises (symbolized by gray balls, containing some other balls) are not in any way different from “primary” propositions (single-color balls) and that they compose in the same way (although in the leftmost proposition the green ball is wrapped in a gray ball to make the diagram prettier).
As an example of a proposition that contains multiple levels of nesting (and also as a great introduction to the subject of logic in its own right), consider one of the oldest (it was already known to the Stoics in the 3rd century B.C.) and most famous propositions ever, namely modus ponens.
Modus ponens is a proposition that is composed of two other propositions (which here we denote $A$ and $B$) and it states that if proposition $A$ is true and also if proposition $(A → B)$ is true (that is if $A$ implies $B$), then $B$ is true as well. For example, if we know that “Socrates is a human” and that “humans are mortal” (or “being human implies being mortal”), we also know that “Socrates is mortal.”
Let’s dive into this proposition. We can see that it is composed of two other propositions in a $follows$ relation, where the proposition that follows ($B$) is primary, but the proposition from which $B$ follows is not primary (let’s call that one $C$ - so the whole proposition becomes $C → B$.)
Going one more level down, we notice that the $C$ proposition is itself composed of two propositions in an and relationship - $A$ and, let’s call the other one, $D$ (so $A ∧ D$), where $D$ is itself composed of two propositions, this time in a $follows$ relationship - $A → B$. But all of this is better visualized in the diagram.
We often cannot tell whether a given composite proposition is true or false without knowing the values of the propositions that it is made of. However, with propositions such as modus ponens we can: modus ponens is always true, regardless of whether the propositions that form it are true or false. If we want to be fancy, we can also say that it is true in all models of the logical system, a model being a set of real-world premises that are taken to be signified by our propositions.
For example, our previous example will not stop being true if we substitute “Socrates” with any other name, nor if we substitute “mortal” for any other quality that humans possess.
Propositions that are always true are called tautologies. And their more-famous counterparts that are always false are called contradictions. You can turn each tautology into contradiction, or the other way around, by adding a “not”.
The other statements, ones which may be true or false depending on the values of some other propositions, are called “contingent statements”. In logic, we don’t care about contingent statements — after all, those are studied in all other sciences (and we are not like other sciences).
The simplest tautology is the so called law of identity, the statement that each proposition implies itself (e.g. “All bachelors are unmarried”). It may remind you of something.
Here are some more complex (less boring) tautologies (the symbol $¬$ means “not”/negation).
We will learn how to determine which propositions are tautologies shortly, but first let’s see why tautologies are important in the first place.
Tautologies are useful because they are the basis of axiom schemas and rules of inference. And axiom schemas and rules of inference serve as a starting point from which we can generate other true logical statements by means of substitution.
Realizing that the colors of the balls in modus ponens are superficial, we may want to represent the general structure of modus ponens that all of its variations share.
This structure (the one that looks like a coloring book in our example) is called axiom schema. And the propositions that are produced by it are axioms.
Note that the propositions that we plug into the schema don’t have to be primary. For example, having the proposition $a$ (that is symbolized below by the orange ball) and the proposition stating that $a$ implies $a \lor b$ (which is one of the tautologies that we saw above), we can plug those propositions into the modus ponens and prove that $a \lor b$ is true.
Axiom schemas and rules of inference are almost the same thing, except that rules of inference allow us to actually distill the conclusion from the premises. For example, in the case above, we can use modus ponens as a rule of inference to prove that $a \lor b$ is true.
All axiom schemas can be easily applied as rules of inference and the other way around.
Knowing that we can use axiom schemas/rules of inference to generate new propositions, we might ask whether it is possible to create a small collection of such schemas/rules that is curated in such a way that it enables us to generate all other propositions. You would be happy (although a little annoyed, I imagine) to learn that there exist not only one, but many such collections. And yes, collections of this sort are what we call logical systems.
Here is one such collection, which consists of the following five axiom schemas in addition to the inference rule modus ponens (these are axiom schemas, even though we use colors).
Proving that this and other similar logical systems are complete (can really generate all other valid propositions) is due to Gödel and is known as “Gödel’s completeness theorem” (Gödel is so important that I specifically searched for the “ö” letter so I can spell his name right).
We now have an idea about how some of the main logical constructs (axioms, rules of inference) work. But in order to prove that they are true, and to understand what they are, we need to do so through a specific interpretation of those constructs.
We will look into two interpretations - one very old and the other relatively recent. This will be a slight detour from our usual subject matter of points and arrows, but I assure you that it will be worth it. So let’s start.
Beyond the world that we inhabit and perceive every day, there exists the world of forms, where reside all the ideas and concepts that manifest themselves in the objects that we perceive, e.g. beyond all the people that have ever lived, there lies the prototypical person, and we are people only insofar as we resemble that person; beyond all the things in the world that are strong lies the ultimate concept of strength, from which all of them borrow, etc. And although, as mere mortals, we live in the world of appearances and cannot perceive the world of forms, we can, through philosophy, “recollect” it and know some of its features.
The above is a summary of a worldview that is due to the Greek philosopher Plato and is sometimes called Plato’s theory of forms. Originally, the discipline of logic represents an effort to think and structure our thoughts in a way that they apply to this world of forms i.e. in a “formal” way. Today, this original paradigm of logic is known as “classical logic”. Although it all started with Plato, most of it is due to the 20th century mathematician David Hilbert.
The existence of the world of forms implies that, even if there are many things that we, people, don’t know and would not ever know, at least somewhere out there there exists an answer to every question. In logic, this translates to the principle of bivalence that states that each proposition is either true or false. And, due to this principle, propositions in classical logic can be aptly represented in set theory by the boolean set, which contains those two values.
According to the classical interpretation, you can think of primary propositions as just a bunch of boolean values. Logical operators are functions that take one or several boolean values and return another boolean value (and composite propositions are just the results of the application of these functions).
Let’s review all logical operators in this semantic context.
Let’s begin with the negation operation. Negation is a unary operation, which means that it is a function that takes just one argument and (like all other logical operators) returns one value, where both the arguments and the return type are boolean values.
The same function can also be expressed in a slightly less-fancy way by this table.
p | ¬p |
---|---|
True | False |
False | True |
Tables like this one are called truth tables and they are ubiquitous in classical logic. They can be used not only for defining operators but for proving results as well.
Having defined the negation operator, we are in a position to prove the first of the axioms of the logical system we saw, namely double negation elimination. In natural language, this axiom is equivalent to the observation that saying “I am not unable to do X” is the same as saying “I am able to do X”.
(Despite its triviality, the double negation axiom is probably the most controversial result in logic; we will see why later.)
If we view logical operators as functions from and to the set of boolean values, then proving axioms involves composing several of those functions into one function and observing its output. More specifically, the proof of the formula above involves just composing the negation function with itself and verifying that it leaves us in the same place from which we started.
If we want to be formal about it, we might say that applying negation two times is equivalent to applying the identity function.
If we are tired of diagrams, we can represent the composition diagram above as table as well.
p | ¬p | ¬¬p |
---|---|---|
True | False | True |
False | True | False |
Each proposition in classical logic can be proved with such diagrams/tables.
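For programmers, the table above is just a boolean function, and “proving” the axiom amounts to checking every possible input, as in this small TypeScript sketch:

// Negation as a function from booleans to booleans
const not = (p: boolean) => !p

// Double negation elimination: ¬¬p has the same value as p for every p
const doubleNegationHolds = [true, false].every(p => not(not(p)) === p)

console.log(doubleNegationHolds) // true: composing ¬ with itself acts as the identity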
OK, you know what and means and I know what it means, but what about those annoying people who want everything to be formally specified (nudge, nudge)? Well, we already know how we can satisfy them - we just have to construct the boolean function that represents and.
Because and is a binary operator, instead of a single value the function would accept a pair of boolean values.
Here is the equivalent truth-table (in which $∧$ is the symbol for and.)
p | q | p ∧ q |
---|---|---|
True | True | True |
True | False | False |
False | True | False |
False | False | False |
We can do the same for $or$, here is the table.
p | q | p ∨ q |
---|---|---|
True | True | True |
True | False | True |
False | True | True |
False | False | False |
Task: Draw the diagram for or.
Using those tables, we can also prove some axiom schemas we can use later:
Let’s now look into something less trivial: the implies operation (also known as the material conditional). This operation binds two propositions in a way that the truth of the first one implies the truth of the second one (or that the first proposition is a sufficient condition for the second.) You can read $p → q$ as “if $p$ is true, then $q$ must also be true”.
Implies is also a binary function - it is represented by a function from an ordered pair of boolean values, to a boolean value.
p | q | p → q |
---|---|---|
True | True | True |
True | False | False |
False | True | True |
False | False | True |
Now there are some aspects of this which are non-obvious so let’s go through every case.
It might help you to remember that in classical logic $p → q$ ($p$ implies $q$) is true when $\neg p ∨ q$ (either $p$ is false or $q$ is true.)
Now, let’s review the operation that indicates that two propositions are equivalent (or, when one proposition is a necessary and sufficient condition for the other (which implies that the reverse is also true.)) This operation yields true when the propositions have the same value.
p | q | p ↔ q |
---|---|---|
True | True | True |
True | False | False |
False | True | False |
False | False | True |
But what’s more interesting about this operation is that it can be constructed using the implies operation - it is equivalent to each of the propositions implying the other one (so $p \leftrightarrow q$ is the same as $(p \to q) \land (q \to p)$) - something which we can easily prove by comparing truth tables.
p | q | p → q | q → p | (p → q) ∧ (q → p) |
---|---|---|---|---|
True | True | True | True | True |
True | False | False | True | False |
False | True | True | False | False |
False | False | True | True | True |
Because of this, the equivalence operation is called “if and only if”, or “iff” for short.
Let’s examine the above formula, stating that $p → q$ is the same as $¬p ∨ q$.
We can easily prove this by using truth tables.
p | q | p → q | ¬p | q | ¬p ∨ q |
---|---|---|---|---|---|
True | True | True | False | True | True |
True | False | False | False | False | False |
False | True | True | True | True | True |
False | False | True | True | False | True |
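The same table can be checked mechanically; here is a small TypeScript sketch that enumerates all of its rows:

// The implies operation, defined directly from its truth table
const implies = (p: boolean, q: boolean) => (p ? q : true)

// Check that p → q and ¬p ∨ q agree on every row of the truth table
const formulasAgree = [true, false].every(p =>
  [true, false].every(q => implies(p, q) === (!p || q)))

console.log(formulasAgree) // true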
But it would be much more intuitive if we do it using axioms and rules of inference. To do so, we start with the formula we have ($p → q$) plus the axiom schemas, and arrive at the formula we want to prove ($¬p ∨ q$).
Here is one way to do it. The formulas that are used at each step are specified at the right-hand side, the rule of inference is modus ponens.
Note that to really prove that the two formulas are equivalent, we have to also do it the other way around (start with $¬p ∨ q$ and arrive at $p → q$).
[…] logic is life in the human brain; it may accompany life outside the brain but it can never guide it by virtue of its own power. — L.E.J. Brouwer
I don’t know about you, but I feel that the classical truth-functional interpretation of logic (although it works and is correct in its own right) doesn’t fit well with the categorical framework that we are using here: it is too “low-level”, as it relies on manipulating the values of the propositions. According to it, the operations and and or are just 2 of the 16 possible binary logical operations, and they are not really connected to each other (but we know that they actually are.)
For these and other reasons, in the 20th century a whole new school of logic was founded, called intuitionistic logic. If we view classical logic as based on set theory, then intuitionistic logic would be based on category theory and its related theories. If classical logic is based on Plato’s theory of forms, then intuitionism began with a philosophical idea originating from Kant and Schopenhauer: the idea that the world as we experience it is largely predetermined by our perceptions of it. Thus, without absolute standards for truth, a proof of a proposition becomes something that you construct, rather than something you discover.
Classical and intuitionistic logic diverge from one another right from the start: because according to intuitionistic logic we are constructing proofs rather than discovering them as some universal truth, we do away with the principle of bivalence. That is, in intuitionistic logic we have no basis to claim that each statement is necessarily true or false. For example, there might be statements that are not provable, not because they are false, but simply because they fall outside of the domain of a given logical system (the twin-prime conjecture is often given as an example of this.)
Anyway, intuitionistic logic is not bivalent, i.e. we cannot have all propositions reduced to true and false.
One thing that we still do have are propositions that are "true" in the sense that a proof for them is given - the primary propositions. So, with some caveats (which we will see later), the bivalence between true and false propositions might be thought of as similar to the bivalence between the existence and absence of a proof for a given proposition - there either is a proof of it or there isn't.
This bivalence is at the heart of what is called the Brouwer–Heyting–Kolmogorov (BHK) interpretation of logic, something that we will look into next.
The original formulation of the BHK interpretation is not based on any particular mathematical theory. Here, we will first illustrate it using the language of set theory (just so we can abandon it a little later).
As the existence of a proof of a proposition is taken to mean that the proposition is true, the definition of and is rather simple - the proof of $A ∧ B$ is just a pair containing a proof of $A$ and a proof of $B$, i.e. a set-theoretic product of the two (see chapter 2). The principle for determining whether the proposition is true or false is similar to that of primary propositions - if the pair of proofs of $A$ and $B$ exists (i.e. if both proofs exist), then the proof of $A \land B$ can be constructed (and so $A \land B$ is "true").
Question: what would be the or operation in this case?
Now for the punchline: in the BHK interpretation, the implies operation is just a function between proofs. Saying that $A$ implies $B$ ($A \to B$) would just mean that there exists a function which can convert a proof of $A$ to a proof of $B$.
And (it gets even more interesting) the modus ponens rule of inference is nothing more than the process of function application, i.e. if we have a proof of $A$ and a function $A \to B$, we can call this function to obtain a proof of $B$.
(In order to define this formally, we also need to define functions in terms of sets i.e. we need to have a set representing $A \to B$ for each $A$ and $B$. We will come back to this later.)
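For the programmers among you, here is a minimal sketch of these ideas in TypeScript (my own illustration, not an official rendering of BHK): a proof of $A ∧ B$ is a pair, a proof of $A \to B$ is a function, and modus ponens is just function application.

// A proof of X ∧ Y is a pair: a proof of X together with a proof of Y.
type And<X, Y> = [X, Y];
// A proof of X → Y is a function converting proofs of X into proofs of Y.
type Implies<X, Y> = (x: X) => Y;
// Modus ponens is function application: given a proof of X and a proof of X → Y,
// we obtain a proof of Y by calling the function.
const modusPonens = <X, Y>(proofOfX: X, xImpliesY: Implies<X, Y>): Y =>
  xImpliesY(proofOfX);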
In the section on classical logic, we proved that two propositions $A$ and $B$ are equivalent if $A$ implies $B$ and $B$ implies $A$. But if the implies operation is just a function, then propositions are equivalent precisely when there are two functions converting each of them to the other, i.e. when the sets representing the propositions are isomorphic.
(Perhaps we should note that not all set-theoretic functions are proofs, only a designated set of them (which we call canonical functions) i.e. in set theory you can construct functions and isomorphisms between any pair of singleton sets, but that won’t mean that all proofs are equivalent.)
So, according to the BHK interpretation, saying that $A$ is true means that we possess a proof of $A$ - simple enough. But it's a bit harder to express the fact that $A$ is false: it is not enough to say that we don't have a proof of $A$ (the fact that we don't have it doesn't mean it doesn't exist). Instead, we must show that claiming that $A$ is true leads to a contradiction.
To express this, intuitionistic logic defines the constant $⊥$, which plays the role of False (also known as the "bottom value"). $⊥$ is defined as a proposition that has no proofs. And the equivalents of false propositions are the ones that imply that the bottom value is provable (which is a contradiction). So $¬A$ is $A \to ⊥$.
In set theory, the $⊥$ constant is expressed by the empty set.
And the observation that propositions that are connected to the bottom value are false is expressed by the fact that if a proposition is true, i.e. there exists a proof of it, then there can be no function from it to the empty set.
The only way for there to be such a function is if the set of proofs of the proposition is empty as well.
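In programming terms (again, a sketch of mine), the empty set corresponds roughly to a type with no values, such as TypeScript's `never`, so a proof of $¬A$ is a function from $A$ to `never`:

// `never` has no values, so it plays the role of ⊥, the proposition with no proofs.
type Not<X> = (x: X) => never;
// If we somehow had both a proof of X and a proof of ¬X, we could produce
// a value of `never` - a contradiction.
const contradiction = <X>(proofOfX: X, proofOfNotX: Not<X>): never =>
  proofOfNotX(proofOfX);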
Task: Look up the definition of function and verify that there cannot exist a function from any non-empty set to the empty set.
Task: Look up the definition of function and verify that there does exist a function from the empty set to itself (in fact, there exists a function from the empty set to any other set).
Although intuitionistic logic differs a lot from classical logic when it comes to its semantics, i.e. in the way the whole system is built (which we described above), it actually doesn’t differ so much in terms of syntax, i.e. if we try to deduce the axiom schemas/rules of inference that correspond to the definitions of the structures outlined above, we would see that they are virtually the same as the ones that define classical logic. There is, however, one exception concerning the double negation elimination axiom that we saw earlier, a version of which is known as the law of excluded middle.
This law is valid in classical logic, and is true when we look at its truth tables, but there is no justification for it in terms of the BHK interpretation. Why? In intuitionistic logic, saying that something is false amounts to constructing a proof that it is false (that it implies the bottom value), and there is no method/function/algorithm that can, for an arbitrary proposition, produce either a proof that it is true or a proof that it is false.
The question of whether you can use the law of excluded middle spawned a heated debate between the classical logic proponent David Hilbert and the intuitionistic logic proponent L.E.J. Brouwer, known as the Brouwer–Hilbert controversy.
Leaving the differences between intuitionistic and classical logic aside, the BHK interpretation is interesting because it provides the higher-level view of logic that we need in order to construct an interpretation of it based on category theory.
Such higher-level interpretations of logic are sometimes called algebraic interpretations, algebraic being an umbrella term describing all structures that can be represented using category theory, like groups and orders.
Programmers might find the definition of the BHK interpretation interesting for another reason - it is very similar to the definition of a programming language: propositions are types, implies operations are functions, and operations are composite types (objects), and or operations are sum types (which are currently not supported in most programming languages, but that's a separate topic). Finally, a proof of a given proposition is represented by a value of the corresponding type.
This similarity is known as the Curry-Howard isomorphism.
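To make the correspondence a bit more tangible, here is a small sketch of mine, using TypeScript's tagged unions as a stand-in for sum types:

// "or" as a sum type: a proof of X ∨ Y is either a proof of X or a proof of Y,
// tagged so that we know which of the two we have.
type Or<X, Y> =
  | { tag: "left"; value: X }
  | { tag: "right"; value: Y };
// A value of a type is a proof of the corresponding proposition.
// For example, here is a "proof" that X implies (X ∨ Y):
const orIntroduction = <X, Y>(x: X): Or<X, Y> => ({ tag: "left", value: x });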
Task: The Curry-Howard isomorphism is also the basis of special types of programming languages called “proof assistants” which help you verify logical proofs. Install a proof assistant and try to see how it works (I recommend the Coq Tutorial by Mike Nahas).
Knowing about the Curry-Howard isomorphism and knowing also that programming languages can be described by category theory may lead us to think that category theory is part of this isomorphism as well. And we would be quite correct — this is why it is sometimes known as the Curry-Howard-Lambek isomorphism (Joachim Lambek being the person who formulated the categorical side). So let’s examine this isomorphism. As all other isomorphisms, it comes in two parts:
The first part is finding a way to convert a logical system into a category - this would not be hard for us, as sets form a category and the flavor of the BHK interpretation that we saw is based on sets.
Task: See whether you can prove that logic propositions and the “implies” relation form a category. What is missing?
The second part involves converting a category into a logical system - this is much harder. To do it, we have to enumerate the criteria that a given category has to adhere to, in order for it to be “logical”. These criteria have to guarantee that the category has objects that correspond to all valid logical propositions and no objects that correspond to invalid ones.
Categories that adhere to these criteria are called cartesian closed categories. We won't describe them here directly; instead, we will start with similar but simpler structures that we already examined - orders.
So, we already saw that a logical system along with a set of primary propositions forms a category.
If we assume that there is only one way to go from proposition $A$ to proposition $B$ (or that there are many ways, but we are not interested in the difference between them), then logic is not only a category, but a preorder, in which the order relation is taken to mean "implies", so $A \to B$ means that $A$ is below $B$.
Furthermore, if we count propositions that follow from each other (or sets of propositions that are proven by the same proof) as equivalent, then logic is a proper partial order.
And so it can be represented by a Hasse diagram, in which $A \to B$ only if $A$ is below $B$ in the diagram.
This is something quite characteristic of category theory — examining a concept in a more limited version of a category (in this case orders), in order to make things simpler for ourselves.
Now let's examine the question that we asked before - exactly which orders represent logic, and what laws does an order have to obey so that it is isomorphic to a logical system? We will attempt to answer this question as we examine the elements of logic again, this time in the context of orders.
By now you probably realized that the and and or operations are the bread and butter of logic (although it’s not clear which is which). As we saw, in the BHK interpretation those are represented by set products and sums. The equivalent constructs in the realm of order theory are meets and joins (in category-theoretic terms products and coproducts.)
Logic allows you to combine any two propositions in an and or an or relationship, so, in order for an order to be "logical" (to be a correct representation of a logical system), it has to have meet and join operations for all elements. Incidentally, we already know what such orders are called - they are called lattices.
And there is one important law of the and and or operations that is not always present in all lattices. It concerns the connection between the two, i.e. the way that they distribute over one another: $A \land (B \lor C) = (A \land B) \lor (A \land C)$ (and the same with the two operations swapped).
Lattices that obey this law are called distributive lattices.
Wait, where have we heard about distributive lattices before? In the previous chapter we said that they are isomorphic to inclusion orders, i.e. orders made of sets that contain elements of a given collection, including all combinations of those elements. The fact that they popped up again is not coincidental — "logical" orders are isomorphic to inclusion orders. To understand why, you only need to think about the BHK interpretation — the elements which participate in the inclusion are our primary propositions. And the included sets are all combinations of these elements, joined in an or relationship (for simplicity's sake, we are ignoring the and operation.)
The or and and operations (or, more generally, the coproduct and the product) are, of course, categorically dual, which would explain why the symbols that represent them, $\lor$ and $\land$, are one and the same symbol, but flipped vertically.
And the symbol itself even looks like a representation of the way the arrows converge. This is probably not the case, as the symbol was in use way before Hasse diagrams were a thing — for all we know, the $\lor$ symbol probably comes from the "v" in "vel" (the Latin word for "or"), and the and symbol is just that, flipped — but I still find the similarity fascinating.
In order for a distributive lattice to represent a logical system, it has to also have objects that correspond to the values True and False (which are written $\top$ and $\bot$). But, to mandate that these objects exist, we must first find a way to specify what they are in order/category-theoretic terms.
A well-known result in logic, called the principle of explosion, states that if we have a proof of False (which we write as $\bot$), i.e. if we have the statement "False is true" (to use the terminology of classical logic), then any and every other statement can be proven. And we also know that no true statement implies False (in fact, in intuitionistic logic this is the definition of a true statement). Based on these criteria, we know that the False object would look like this when compared to other objects:
Circling back to the BHK interpretation, we see that the empty set fits both of these conditions.
Conversely, the proposition True, which we write as $\top$, expressing the statement that "True is true", is trivial and doesn't say anything, so nothing follows from it, but at the same time it follows from every other statement.
So True and False are just the greatest and least objects of our order (in category-theoretic terms terminal and initial object.)
This is another example of the categorical concept of duality - $\top$ and $\bot$ are dual to each other, which makes a lot of sense if you think about it, and also helps us remember their symbols (although, if you are like me, you'll spend a year wondering which one is which every time you see them).
So in order to represent logic, our distributive lattice has to also be bounded i.e. it has to have greatest and least elements (which play the roles of True and False).
As we said, every lattice has representations of propositions implying one another (i.e. it has arrows), but to really represent a logical system it also has to have function objects, i.e. there needs to be a rule that identifies a unique object $A → B$ for each pair of objects $A$ and $B$, such that all axioms of logic are followed.
We will describe this object in the same way we described all other operations — by defining a structure consisting of objects and arrows in which $A → B$ plays a part. And this structure is actually a categorical reincarnation of our favorite rule of inference, modus ponens.
Modus ponens is the essence of the implies operation, and, because we already know how the operations that it contains (and and implies) are represented in our lattice, we can directly use it as a definition by saying that the object $A → B$ is the one for which modus ponens rule holds.
The function object $A → B$ is an object which is related to the objects $A$ and $B$ in such a way that $A ∧ (A → B) → B$.
This definition is not complete, however, because (as usual) $A → B$ is not the only object that fits in this formula. For example, the object $A → B ∧ C$ is also one such object, as is $A → B ∧ C ∧ D$ (not going to draw all the arrows here, because it will get too (and I mean too) messy).
So how do we set apart the real formula from all those “imposter” formulas? If you remember the definitions of the categorical product (or of its equivalent for orders, the meet operation) you would already know where this is going: we recognize that $A \to B$ is the upper limit of $A → B ∧ C$ and $A → B ∧ C ∧ D$ and all other imposter formulas that can be in the place of $X$ in $A ∧ X → B$. The relationship can be described in a variety of ways:
So, after choosing the best way to express the relationship (they are all equivalent) we are ready to present our final definition:
The function object $A → B$ is the topmost object which is related to objects $A$ and $B$ in such a way that $A ∧ (A → B) → B$.
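A compact way to state the same thing (my phrasing, not an additional requirement): for every object $X$ of our lattice, $X \to (A → B)$ holds precisely when $A ∧ X \to B$ holds. Taking $X = A → B$ itself recovers the modus ponens formula above, and every "imposter" $X$ is forced to sit below $A → B$.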
The existence of this function object (called exponential object in category-theoretic terms) is the final condition for an order/lattice to be a representation of logic.
Note, by the way, that this definition of the function object is valid specifically for intuitionistic logic. For classical logic, the definition is simpler — there, $A → B$ is just $\lnot A ∨ B$, because of the law of excluded middle.
When we examined classical logic, we saw that the if and only if operation can be defined in terms of implies, that is, $A \leftrightarrow B$ is equivalent to $(A \to B) \land (B \to A)$.
We have something similar in categorical logic as well — when two propositions imply one another then, particularly when we speak of orders, the corresponding objects are isomorphic.
In the previous section we saw some definitions, here we will convince ourselves that they really capture the concept of logic correctly, by proving some results using categorical logic.
The join (or least upper bound) of the topmost object $\top$ (which plays the role of the value True) and any other object that you can think of is… $\top$ itself (or something isomorphic to it, which, as we said, is the same thing). This follows trivially from the fact that the join of two objects must be bigger than or equal to both of them, and that the only object that is bigger than or equal to $\top$ is $\top$ itself. This is simply because $\top$ (like any other object) is equal to itself, and because there is by definition no object that is bigger than it.
This corresponds to the logical statement that $A \lor \top$ is equal to $\top$ i.e. it is true. Hence, the above observation is a proof of that statement, (an alternative to truth tables).
Task: Think of the dual situation, with False. What does it imply, logically?
Above, we saw that the join between any random object and the top object is the top object itself. But does this situation only occur when the second object is $\top$? Wouldn't the same thing happen if $\top$ were replaced by any other object that is higher than the first one?
The answer is "Yes": when we are looking for the join of two objects, we are looking for the least upper bound, i.e. the lowest object that is above both of them. So, any time we have two objects and one is higher than the other, their join would be (isomorphic to) the higher object.
In other words, if $A \to B$, then $A \lor B \leftrightarrow B$.
For our first example with implies, let's take the formula $A → B$ and examine the case when $A$ and $B$ are the same object. We said that $A → B$ ($A → A$ in our case) is the topmost object $X$ for which the criterion given by the formula $A ∧ X → B$ is satisfied. But in this case, the formula is satisfied for any $X$ (because it evaluates to $A ∧ X → A$, which is always true), i.e. the topmost object that satisfies it is… the topmost object there is, i.e. (an object isomorphic to) $True$.
Does this make sense? Of course it does: in fact, we just proved one of the most famous laws in logic (called the law of identity, as per Aristotle), namely that $A → A$ is always true, or that everything follows from itself.
And what happens if $A$ implies $B$ in any model, i.e. if $A \models B$ (semantic consequence)? In this case, $A$ would be below $B$ in our Hasse diagram (e.g. $A$ is the blue ball and $B$ is the orange one). Then the situation is somewhat similar to the previous case: $A ∧ X → B$ will be true, no matter what $X$ is (simply because $A$ already implies $B$, by itself). And so $A → B$ will again correspond to the $\top$ object.
This is again a well-known result in logic (if I am not mistaken, it would be a deduction theorem of some sort): if $A \models B$, then the statement $(A → B)$ will always be true.
Let's recap: we saw that logic consists of the values True and False and the operations and, or and implies.
So, in order to represent logic, an order has to have greatest and least objects, it has to have a meet and a join for every two objects, and it also has to have function objects (the law of distributivity which we mentioned earlier always holds for lattices that have function objects).
In other words it has to be a bounded ($\top$ and $\bot$) lattice ($\land$ and $\lor$) that has function objects ($\to$). Such lattices are called Heyting algebras.
By the way, a lattice can follow the laws of classical logic as well: it has to be bounded and distributive, and in addition to that it has to be complemented, which is to say that for each proposition $A$ there exists a unique proposition $\neg A$ such that $A ∨ \neg A = 1$ and $A ∧ \neg A = 0$. These lattices are called boolean algebras.
<!--
When thinking of logic from this perspective (algebraically, constructively, categorically, there are many ways to call it) you can view the logic as a map of all possible propositions and the connections between them.
Axioms are starting points.
and finding a proof is just finding a path from one point to another.
If classical logic is based on the belief that everything is either true or false, intuitionistic logic gives precedence to the famous common-sense principle that you cannot prove a negative.
which means that while you can given a true statement and follow the arrows to reach other true statements, false statements would remain unreachable.
Given a logical system, consisting of axioms and rules of inference, I define positive statements as statements of the type “X follows from the axioms” and negative statements as statements of type “X does not follow from the axioms”.
Given those definitions, a positive statement is proven by just applying the rules of inference to the axioms until you reach the statement you want to prove, while there is no general way to prove a negative statement.
https://www.algebraicjulia.org/blog/post/2021/09/cset-graphs-4/
https://personal.math.ubc.ca/~cytryn/teaching/scienceOneF10W11/handouts/OS.proof.4methods.html
https://en.wikibooks.org/wiki/Mathematical_Proof/Methods_of_Proof#Direct_proof
-->
From this chapter on, we will change the tactic a bit (as I am sure you are tired of jumping through different subjects) and we will dive at full throttle into the world of categories, using the structures that we saw so far as context. This will allow us to generalize some of the concepts that we examined in these structures and thus make them valid for all categories.
So far, we saw many different categories and category types. Let’s review them once more:
We began by reviewing the mother of all categories — the category of sets.
We also saw that it contains within itself many other categories, such as the category of types in programming languages.
We also learned about other algebraic objects that turned out to be just special types of categories, like categories that have just one object (monoids, groups) and categories that have only one morphism between any two objects (preorders, partial orders).
We also defined a lot of categories based on different concepts, like the ones based on logics/programming languages, but also some “less-serious ones”, as for example the color-mixing partial order/category.
And most importantly, we saw some categories that are completely made up, such as my soccer player hierarchy. Those are formally called finite categories.
Although they are not useful by themselves, the idea behind them is important — we can draw any combination of points and arrows and call it a category, in the same way that we can construct a set out of every combination of objects.
For future reference, let’s see some important finite categories.
The simplest category is $0$ (enjoy the minimalism of this diagram).
The next simplest category is $1$ — it is comprised of one object and no morphisms besides its identity morphism (which we don't draw, as usual).
If we increment the number of objects to two, we see a couple of more interesting categories, like for example the category $2$ containing two objects and one morphism.
Task: There are just two more categories that have 2 objects and at most one morphism between two objects, draw them.
And finally the category $3$ has 3 objects and also 3 morphisms (one of which is the composition of the other two).
Many of the categories that we saw are similar to one another, as for example, both the color-mixing order and categories that represent logic have a greatest and a least object. To pinpoint such similarities, and understand what they mean, it is useful to have formal ways to connect categories with one another. The simplest type of such connection is the good old isomorphism.
In chapter 1 we talked about set isomorphisms, which establish an equivalence between two sets. In case you have forgotten, a set isomorphism is a two-way function between two sets.
It can alternatively be viewed as two "twin" functions, each of which equals identity when composed with the other one.
Then, in chapter 4, we encountered order isomorphisms, and we saw that they are like set isomorphisms, but with one extra condition — aside from just being there, the functions that define the isomorphism have to preserve the order of the objects, e.g. the greatest object of one order should be connected to the greatest object of the other one, the least object of one order should be connected to the least object of the other one, and the same goes for all objects that are in between.
Or, more formally put: for any $a$ and $b$, if we have $a ≤ b$, we should also have $F(a) ≤ F(b)$ (and vice versa).
Now, we will generalize the definition of an order isomorphism, so it also applies to all other categories (i.e. to categories that may have more than one morphism between two objects):
Given two categories, an isomorphism between them is an invertible mapping between the underlying sets of objects, and an invertible mapping between the morphisms that connect them, which maps each morphism from one category to a morphism with the same signature.
After examining this definition closely, we realize that, although it sounds a bit more complex (and looks a bit messier) than the one we have for orders, it is actually the same thing.
It is just that the so-called "morphism mapping" between categories that have just one morphism for any two objects is trivial, and so we can omit it.
Question: What are the morphism functions for orders?
However, when we can have more than one morphism between two given objects, we need to make sure that each morphism in the source category has a corresponding morphism in the target one, and for this reason we need not only a mapping between the categories’ objects, but one between their morphisms.
By the way, what we just did (taking a concept that is defined for a narrower structure (orders) and redefining it for a broader one (categories)) is called generalizing the concept.
By examining them more closely, we realize that categorical isomorphisms are not so hard to define. However, there is another issue with them, namely that they don't capture the essence of what categorical equality should be. I have devised a very good and intuitive explanation of why this is the case, which this margin is too narrow to contain. So we will leave it for the next chapter, where we will also devise a more apt way to define a two-way connection between categories.
But first, we need to examine one-way connections between them, i.e. functors.
PS: Categorical isomorphisms are also very rare in practice — the only one that comes to my mind is the Curry-Howard-Lambek isomorphism from the previous chapter. That's because if two categories are isomorphic, then there is no reason at all to treat them as different categories — they are one and the same.
The logician Rudolf Carnap coined the term "functor" as part of his project to formalize the syntax of natural languages such as English, in order to create a precise way for us to talk about science. Originally, a functor meant a word or phrase whose meaning can be customized by combining it with a numerical value, such as the phrase "the temperature at $x$ o'clock", which has a different meaning depending on the value of $x$.
In other words, a functor is a phrase that acts as a function, only not a function between sets, but one between linguistic concepts (such as times and temperature).
Later, one of the inventors of category theory, Saunders Mac Lane, borrowed the word to describe something that acts as a function between categories, which he defined in the following way:
A functor between two categories (let’s call them $A$ and $B$) consists of two mappings — a mapping that maps each object in $A$ to an object in $B$ and a mapping that maps each morphism between any objects in $A$ to a morphism between objects in $B$, in a way that preserves the structure of the category.
Now let’s unpack this definition by going through each of its components.
In the definition above, we use the word “mapping” to avoid misusing the word “function” for something that isn’t exactly a function. But in this particular case, calling the mapping a function would barely be a misuse — if we forget about morphisms and treat the source and target categories as sets, the object mapping is nothing but a regular old function.
A more formal definition of object mapping involves the concept of an underlying set of a category: Given a category $A$, the underlying set of $A$ is a set that has the objects of $A$ as elements. Utilizing this concept, we say that the object mapping of a functor between two categories is a function between their underlying sets. The definition of a function is still the same:
A function is a relationship between two sets that matches each element of one set, called the source set of the function, with exactly one element from another set, called the target set of the function.
The second mapping that forms the functor is a mapping between the categories' morphisms. This mapping resembles a function as well, but with the added requirement that each morphism in $A$ with a given source and target must be mapped to a morphism with the corresponding source and target in $B$, as per the object mapping.
A more formal definition of a morphism mapping involves the concept of the homomorphism set: this is a set that contains all morphisms that go between two given objects in a given category. When utilizing this concept, we say that a mapping between the morphisms of two categories consists of a set of functions between their respective homomorphism sets.
(Notice how the concepts of homomorphism set and of underlying set allowed us to “escape” to set theory when defining categorical concepts and define everything using functions.)
So these are the two mappings (one between objects and one between morphisms) that constitute a functor. But not every pair of such mappings is a functor. As we said, in addition to existing, the mappings should preserve the structure of the source category in the target category. To see what that means, we revisit the definition of a category from chapter 2:
A category is a collection of objects (we can think of them as points) and morphisms (arrows) that go from one object to another, where:
- Each object has to have the identity morphism.
- There should be a way to compose two morphisms with an appropriate type signature into a third one, in a way that is associative.
So this definition translates to the following two functor laws:
The morphism mapping should preserve identities, i.e. all identity morphisms should be mapped to identity morphisms.
Functors should also preserve composition, i.e. for any two morphisms $f$ and $g$, the morphism that corresponds to their composition $g•f$ in the source category should be mapped to the morphism that corresponds to the composition of their counterparts in the target category, so $F(g•f) = F(g)•F(f)$.
And these laws conclude the definition of functors — a simple but, as we will see shortly, very powerful concept.
To see why it is so powerful, let's check some examples.
There is a common figure of speech (which is used all the time in this book) which goes like this:
If $a$ is like $F a$, then $b$ is like $F b$.
Or “$a$ is related to $F a$, in the same way as $b$ is related to $F b$,” e.g. “If schools are like corporations, then teachers are like bosses”.
This figure of speech is nothing but a way to describe a functor in our day-to-day language: what we mean by it is that there is a certain set of connections (or, in category-theoretic terms, "morphisms") between schools and teachers that is similar to the connections between corporations and bosses, i.e. that there is some kind of structure-preserving map that connects the category of school-related things to the category of work-related things, which maps schools ($a$) to corporations ($F a$) and teachers ($b$) to bosses ($F b$), and which is such that the connections between schools and teachers ($a \to b$) are mapped to the connections between corporations and bosses ($F a \to F b$).
“A sign is something by knowing which we know something more.” — Charles Sanders Peirce
We will start with an example of a functor that is very meta — the diagrams/illustrations in this book.
You might have noticed that diagrams play a special role in category theory — while in other disciplines their function is merely complementary i.e. they only show what is already defined in another way, here the diagrams themselves serve as definitions.
For example, in chapter 1 we presented the following definition of functional composition.
The composition of two functions $f$ and $g$ is a third function $h$ defined in such a way that this diagram commutes.
We all see the benefit of defining stuff by means of diagrams as opposed to writing lengthy definitions like
“Suppose you have three objects $a$, $b$ and $c$ and two morphisms $f: b \to c$ and $g: a \to b$…”
However, it (defining stuff by means of diagrams) presents a problem — definitions in mathematics are supposed to be formal, so if we want to use diagrams as definitions we must first formalize the definition of a diagram itself.
So how can we do that? One key observation is that diagrams look like finite categories; for example, the above definition looks just like the category $3$.
However, this is only part of the story as finite categories are just structures, whereas diagrams are signs. They are “something by knowing which we know something more.”, as Peirce famously put it (or “…which can be used in order to tell a lie”, in the words of Umberto Eco).
For this reason, aside from a finite category that encodes the diagram's structure, the definition of a diagram must also include a way of "interpreting" this category in some other context, i.e. it must include a functor.
This is how the concept of functors allows us to formalize the notion of diagrams:
A diagram is comprised of one finite category (called an index category) and a functor from it to some other category.
If you know about semiotics, you may view the source and target categories of the functor as signifier and signified.
And so, you can already see that the concept of a functor plays a very important role in category theory. Because of it, diagrams in category theory can be specified formally i.e. they are categorical objects per se.
You might even say that they are categorical objects par excellence (TODO: remove that last joke).
A map is not the territory it represents, but, if correct, it has a similar structure to the territory, which accounts for its usefulness. — Alfred Korzybski
Functors are sometimes called "maps" for a good reason — maps, like all other diagrams, are functors. If we consider some space containing cities and roads that we travel by as a category, in which the cities are objects and the roads between them are morphisms, then a road map can be viewed as a category that represents some region of that space, together with a functor that maps the objects in the map to real-world objects.
In maps, morphisms that are a result of composition are often not displayed, but we use them all the time — they are called routes. And the law of preserving composition tells us that every route that we create on a map corresponds to a real-world route.
Notice that in order to be a functor, a map does not have to list all the roads that exist in real life, nor all traveling options ("the map is not the territory"); the only requirement is that the roads it does list should be actual roads — this is a characteristic shared by all many-to-one relationships (i.e. functions).
As we saw, in addition to category theory, functors appear in many disciplines that study the human mind, such as logic, linguistics, semiotics and the like. I thought about why that is so in a blog post that I wrote. My response to that question is that human perception, human thinking, is itself functorial: to perceive the world around us, we go through a bunch of functors that lead from more raw, "low-level" mental models to more abstract, "high-level" ones.
We may say that perception starts with raw sensory data. From it, we go (using a functor) to a category containing some basic model of how the world works, mapping the objects we are seeing to some concepts that we have formed in our mind.
Then we are connecting this model to another, even more abstract model (or models), which provide us with a higher-level overview of the situation that we are in, again using functorial connection (technically these are functors from subcategories).
You can view this as a progression of connections that go from simpler to more abstract (i.e. from categories with fewer morphisms to categories with more morphisms).
These connections are functorial in nature, because they work purely in terms of structure: the idea of a given object has nothing in common with the object itself (e.g. the idea of a biscuit isn't round, sweet etc.). What ideas have in common with the objects they represent is that the connections they have between one another mimic some connections that a real-life biscuit has with other objects.
All this is, of course, just speculation, but how else would we be capable of forming thoughts, e.g. of imagining a person riding a bike and bumping into a tree, when no part of our brains (nor the impulses they create) resembles in any way people/bikes/trees?
The answer is structure — thoughts have the structure of the situation, that’s why they refer to it (although they cannot refer to any of the elements by itself).
So, after this slight detour, we will return to our usual modus operandi.
Hey, do you know that in group theory, there is this cool thing called group homomorphism (or monoid homomorphism when we are talking about monoids) — it is a function between the groups’ underlying sets which preserves the group operation.
So, for example, if the time of day right now is 00:00 o'clock (i.e. midnight), then what would the time be after $n$ hours? The answer to this question can be expressed as a function with the set of integers as its source and the hours on the clock as its target.
This function is interesting — it preserves the operation of (modular) addition: if 13 hours from now the time will be 1 o'clock, and if 14 hours from now it will be 2 o'clock, then the time after (13 + 14) hours will be (1 + 2) o'clock.
Or, to put it formally, if we call the function $F$, then we have the following equation: $F(a + b) = F(a) + F(b)$ (where the $+$ on the right-hand side of the equation means modular addition). Because this equation holds, the $F$ function is a group homomorphism between the group of integers under addition and the group of modular arithmetic with base 12 under modular addition (where you can replace 12 with any other number).
The groups don't have to be so similar for there to be a homomorphism between them. Take, for example, the function that maps any number $n$ to 2 to the power of $n$, i.e. $n \to 2^n$ (here, again, you can replace 2 with any other number). This function gives rise to a group homomorphism between the group of integers under addition and the positive numbers under multiplication, or $F(a + b) = F(a) \times F(b)$.
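Here is a quick numerical sanity check of both homomorphism equations (a sketch of mine, not something from the text):

// Clock arithmetic: F maps an integer number of hours to a position on the clock.
const F = (n: number): number => ((n % 12) + 12) % 12;
// F(a + b) equals F(a) + F(b), where the + on the right is modular addition
// (i.e. we reduce the sum mod 12 again).
console.log(F(13 + 14) === F(F(13) + F(14))); // true

// Exponentiation: G translates addition into multiplication.
const G = (n: number): number => 2 ** n;
console.log(G(3 + 4) === G(3) * G(4)); // true: G(a + b) = G(a) × G(b)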
Wait, what were we talking about, again? Oh yeah — group homomorphisms are functors. To see why, we switch to the category-theoretic representation of groups and revisit our first example (to make the diagram simpler, we use $mod2$ instead of $mod12$).
It seems that when we view groups/monoids as one-object categories, a group/monoid homomorphism is just a functor between these categories. Let's see if that is the case.
Groups/monoids have just one object when viewed as categories, so there is also only one possible object mapping between any couple of groups/monoids — one that maps the (one and only) object of the source group to the object of the target group (not depicted in the diagram).
Because of the above, the morphism mapping is the only relevant component of the group homomorphism. From the category-theoretic perspective, group elements (like $1$, $2$, $3$ etc.) correspond to morphisms (like $+1$, $+2$, $+3$ etc.), and so the morphism mapping is just a mapping between the groups' elements, as we can see in the diagram.
The first functor law is trivial: it just says that the identity element of the source group (which corresponds to the identity morphism of its one and only object) should be mapped to the identity element of the target group.
And if we remember that the group operation of combining two objects corresponds to functional composition if we view groups as categories, we realize that the group homomorphism equation $F(a + b) = F(a) \times F(b)$ is just a formulation of the second functor law: $F(g•f) = F(g)•F(f)$.
And many algebraic operations satisfy this equation; for example, the functor law for the group homomorphism $n \to 2^n$ is just the famous algebraic rule stating that $g^a g^b = g^{a+b}$.
Task: Although it's trivial, we didn't prove that the first functor law (the one about the preservation of identities) always holds. Interestingly enough, for groups/monoids it actually follows from the second law. Try to prove that. Start with the definition of the identity function.
And now let's talk about a concept that is completely unrelated to functors, nudge-nudge (hey, bad jokes are better than no jokes at all, right?). In the theory of orders, we have the concept of functions between orders (which is unsurprising, given that orders, like monoids/groups, are based on sets), and one very interesting type of such function, which has applications in calculus and analysis, is the monotonic function (also called a monotone map). This is a function between two orders that preserves the order of the objects of the source order in the target order. So a function $F$ is monotonic when for every $a$ and $b$ in the source order, if $a ≤ b$ then $F(a) ≤ F(b)$.
For example, the function that maps the current time to the distance traveled by some object is monotonic because the distance traveled increases (or stays the same) as time increases.
If we plot this or any other monotonic function on a line graph, we see that it goes just one direction (i.e. just up or just down).
Now we are about to prove that monotonic functions are functors too, ready?
Like with categories, the object mapping of an order is represented by a function between the orders’ underlying sets.
With monoids, the object mapping component of the functor was trivial. Here it is the reverse: the morphism mapping is trivial - given a morphism between two objects of the source order, we map it to the morphism between their corresponding objects in the target order. The fact that the monotonic function respects the order of the elements ensures that the latter morphism exists.
It is not hard to see that monotone maps obey the first functor law as identities are the only morphisms that go between a given object and itself.
And the second law ($F(g•f) = F(g)•F(f)$) also follows trivially: both morphisms $F(g•f)$ and $F(g)•F(f)$ have the same type signature. But because in orders there can be just one morphism with a given type signature, these two morphisms must be equal to one another.
Task: Expand the proof.
OK, enough with this abstract nonsense, let’s talk about “normal” functions — ones between numbers.
In calculus, there is this concept of linear functions (also called “degree one polynomials”) that are sometimes defined as functions of the form $f(x) = xa$ i.e. ones that contain no operations other than multiplying the argument by some constant (designated as $a$ in the example).
But if we start plotting some such functions we will realize that there is another way to describe them — their graphs are always comprised of straight lines.
Question: Why is that?
Another interesting property of these functions is that most of them preserve addition, that is, for any $x$ and $y$, you have $f(x) + f(y) = f(x + y)$. We already know that this equation is equivalent to the second functor law. So linear functions are just functors from the group of natural numbers under addition to itself. As we will see later, they are examples of functors in the category of vector spaces.
Question: Are the two formulas we presented to define linear functions completely equivalent?
And if we view the natural numbers as an order, linear functions are functors there as well, as all functions that are plotted with straight lines are obviously monotonic.
Note, however, that not all functions that are plotted as straight lines preserve addition — functions of the form $f(x) = x * a + b$ in which $b$ is non-zero are also straight lines (and are also called linear), but they don't preserve addition.
For those, the preservation-of-addition equation would have to read $f(x) + b + f(y) + b = f(x + y) + b$, which cannot hold, as there is an extra $b$ on the left-hand side.
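A quick numerical illustration of the difference (my own example, with $a = 3$ and $b = 5$):

const f = (x: number): number => 3 * x;     // preserves addition
const g = (x: number): number => 3 * x + 5; // a straight line, but does not
console.log(f(1) + f(2) === f(1 + 2)); // true:  3 + 6 === 9
console.log(g(1) + g(2) === g(1 + 2)); // false: 8 + 11 !== 14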
Types in a programming language form a category, and in that category there are some functors that programmers use every day, such as the list functor, which we will use as an example. The list functor is an example of a functor that maps the realm of simple (primitive) types and functions to the realm of more complex (generic) types and functions.
But let’s start with the basics: defining the concept of a functor in programming context is as simple as changing the font we use in our formulas from “modern” to “monospaced”. And also, using the more “programmy” terms listed in the table in chapter 2.
A functor between two categories (let's call them `A` and `B`) consists of a mapping that maps each object (i.e. each type) in `A` to a type in `B` and a mapping that maps each morphism (i.e. each function between types) in `A` to a function between types in `B`, in a way that preserves the structure of the category.
Comparing these definitions makes us realize that mathematicians and programmers are two very different communities, that are united by the fact that they both use functors (and by their appreciation of peculiar typefaces).
The first component of a functor is a mapping that converts one type (let's call it `A`) to another type (`B`). So it is like a function, but between types. Such constructions are supported by almost all programming languages that have static type checking in the first place — they go by the name of generic types. A generic type is nothing but a function that maps one (concrete) type to another (this is why generic types are sometimes called type-level functions).
Note that, although their diagrams look similar, a type-level function is completely different from a value-level function. For example, a value-level function from `a` to `List<a>` (or, in mathy Haskell/ML-inspired notation, $a \to List\ a$) converts a value of type `a` to a value of type `List<a>`, and is different from the type-level function `List<A>`, which converts a type $a$ to a type $List\ a$ (e.g. the type `string` to the type $List\ string$, $number$ to $List\ number$ etc.).
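To make the distinction concrete, here is a small sketch of mine, using the built-in `Array` generic:

// A value-level function: takes a value of type string, returns a value of type string[].
const singleton = (a: string): string[] => [a];
// A type-level "function": the generic type Array takes the type string and produces
// the type Array<string>, takes the type number and produces Array<number>, etc.
type Strings = Array<string>;
type Numbers = Array<number>;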
So the type mapping of a functor is simply a generic type in a programming language (we can also have functors between two generic types, but we will review those later). So what is the function mapping? It is a mapping that converts any function operating on simple types, like $string \to number$, to a function between their more complex counterparts, e.g. $List\ string \to List\ number$.
In programming languages, this mapping is represented by a higher-order function called `map`, with the signature (using Haskell notation) $(a \to b) \to (Fa \to Fb)$, where $F$ represents the generic type.
Note that, although any possible function that has this type signature (and that obeys the functor laws) gives rise to a functor, not all such functors are useful. Usually, there is only one of them that makes sense for a given generic type, and that's why we talk about the list functor and define `map` directly in the generic datatype, as a method.
In the case of lists and similar structures, the useful implementation of `map` is the one that applies the original (simple) function to all elements of the list.
class List<A> {
  constructor(private readonly items: A[]) {}

  // apply the simple function f to every element, producing a List of the results
  map<B>(f: (a: A) => B): List<B> {
    const result: B[] = [];
    for (const item of this.items) {
      result.push(f(item));
    }
    return new List(result);
  }
}
Aside from facilitating code reuse by bringing all the standard functions on simple types into a more complex context, `map` allows us to work in a way that is predictable, courtesy of the functor laws, which in a programming context look like this:
Identity law:
a.map(a => a) == a
Composition law:
a.map(f).map(g) == a.map((a) => g(f(a)))
Task: Use examples to verify that the laws are followed.
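To get you started, here is one possible spot check, with `f` and `g` picked arbitrarily:

const a = [1, 2, 3];
const f = (x: number): number => x + 1;
const g = (x: number): number => x * 2;
// Identity law: mapping the identity function changes nothing.
console.log(JSON.stringify(a.map(x => x)) === JSON.stringify(a)); // true
// Composition law: mapping f and then g is the same as mapping their composition.
console.log(JSON.stringify(a.map(f).map(g)) === JSON.stringify(a.map(x => g(f(x))))); // true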
Now that we have seen so many examples of functors, we can finally attempt to answer the million-dollar question, namely: what are functors for, and why are they useful? (Often also formulated as "Why are you wasting your/my time with this (abstract) nonsense?")
Well, we saw that maps are functors and we know that maps are useful, so let’s start from there.
So, why is a map useful? Well, it obviously has to do with the fact that the points and arrows of the map correspond to the cities and the roads in the place you are visiting, i.e. due to the very fact that it is a functor, but there is a second aspect as well - maps (or at least those of them that are useful) are simpler to work with than the actual things they represent. For example, road maps are useful because they are smaller than the territory they represent, so it is much easier to look up the route between two given places by following a map than to actually travel through all the options in real life.
And functors in programming are used for a similar reason - functions that involve simple types like `string`, `number`, `boolean` etc. are… simple, at least when compared with functions that work with lists and other generic types. Using the `map` function allows us to operate on such types without having to think about them, and to derive functions that transform them from functions that transform simple values. In other words, functors are a means of abstraction.
Of course, not all routes on the map, and not all functions between generic datatypes, can be derived just from functions between the types they contain. This is generally true for many "useful" functors: because their source categories are "simpler" than their targets, some of the morphisms in the target have no equivalents in the source, i.e. making the model simpler inevitably results in losing some of its capabilities. This is a consequence of "the map is not the territory" principle (or "every abstraction is a leaky abstraction", as Joel Spolsky puts it).
Now, before we close it off, we will review one more functor-related concept that is particularly useful in programming - pointed endofunctors.
To understand what pointed endofunctors are, we first have to understand what endofunctors are, and we already saw some examples of those in the last section. Let me explain: from the way the diagrams there looked, we might get the impression that different type families belong to different categories.
But that is not the case - all type families from a given programming language are actually part of one and the same category - the category of types.
Wait, so this is permitted? Yes, these are exactly what we call endofunctors i.e. ones that have one and the same category as source and target.
So, what are some examples of endofunctors? I want to focus on one that will probably look familiar to you - it is the identity functor of each category, the one that maps each object and morphism to itself.
And it might be familiar, because an identity functor is similar to an identity morphism - it allows us to talk about value-related stuff without actually involving values.
Finally, the identity functor, together with all other functors to which the identity functor can be naturally transformed, are called pointed functors (i.e. a functor is pointed if there exists a morphism from the identity functor to it). As we will see shortly, the list functor is a pointed functor.
We still haven't discussed what it means for one functor to be naturally transformed into another one (although the commuting diagram above can give you some idea). This is a complex concept and we have a whole chapter about it (the next one).
However, if we concentrate solely on the category of types in programming languages, then a natural transformation is just a function that translates each value of what we called the "simple types" to a value of the functor's generic type (i.e. $a \to F\ a$), in a way that makes this diagram commute.
What does it take for this diagram to commute? It means that the two routes from the top-left corner to the bottom-right corner are equivalent, i.e. that applying any function between any two types ($a \to b$), followed by the lifting function ($b \to F\ b$), is equivalent to applying the lifting function first ($a \to F\ a$) and then the mapped version of the original function second ($F\ a \to F\ b$).
The list functor is pointed, because such a function exists for it - it is the function $a \to [\ a\ ]$ that puts every value in a "singleton" list. So, for every function between simple types, such as the function $length:\ string \to number$, we have a square like this one.
And the fact that the square commutes is expressed by the following equality:
[a].map(f) = [f(a)]
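In code, the lifting function and the two routes around the square look something like this (a sketch of mine, with `length` as an arbitrarily chosen simple function):

// The "lifting" function for the list functor puts a value in a singleton list.
const pure = <A>(a: A): A[] => [a];
// An arbitrary function between simple types.
const length = (s: string): number => s.length;
// The two routes around the square agree:
console.log(pure("hello").map(length)); // [5] - lift first, then map the function
console.log(pure(length("hello")));     // [5] - apply the function first, then lift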
By the way, it may not look like it right now, but this commuting square might be one of the most important diagrams in category theory, second only to the triangle of functional composition.
Ha, I got you this time (or at least I hope I did) - you probably thought that I won’t introduce another category in this chapter, but this is exactly what I am going to do now. And (surprise again) the new category won’t be the category of functors (don’t worry, we will introduce that in the next chapter). Instead, we will examine the category of (small) categories, that has all the categories that we saw so far as objects and functors as its morphisms, like $Set$ - the category of sets, $Mon$, the category of monoids, $Ord$, the category of orders etc.
We haven't yet mentioned the fact that functors compose (and in an associative way at that), but since a functor is just a bunch of functions, it is no wonder that they do.
Task: Go through the functor definition and see how they compose.
Question: What are the initial and terminal objects of the category of small categories?
The recursive nature of category theory might sometimes leave us confused: we started by saying that categories are composed of objects and morphisms, but now we are saying that there are morphisms between categories (functors). And on top of that, there is a category in which the objects are categories themselves. Does that mean that categories are an example of… categories? It sounds a bit weird on an intuitive level (for example, biscuits don't contain other biscuits and houses don't use houses as building material), but it is actually the case. For example, every monoid is a category with just one object, but at the same time, monoids can be seen as belonging to one category - the category of monoids, where they are connected by monoid homomorphisms. We also have, for example, the category of monoids, which contains the category of groups as a subcategory, as all groups are monoids, etc.
Category theory does categorize everything, so, from a category-theoretic standpoint, all of maths is categories all the way down. Whether you treat a given category as a universe or as a point depends solely on the context. Category theory is an abstract theory. That is, it does not seek to represent an actual state of affairs, but to provide a language that you can use to express many different ideas.