Praise

“The range of applications for category theory is immense, and visually conveying meaning through illustration is an indispensable skill for organizational and technical work. Unfortunately, the foundations of category theory, despite much of their utility and simplicity being on par with Venn Diagrams, are locked behind resources that assume far too much academic background.

Should category theory be considered for this academic purpose or any work wherein clear thinking and explanations are valued, beginner-appropriate resources are essential. There is no book on category theory that makes its abstractions so tangible as “Category Theory Illustrated” does. I recommend it for programmers, managers, organizers, designers, or anyone else who values the structure and clarity of information, processes, and relationships.”

Evan Burchard, Author of “The Web Game Developer’s Cookbook” and “Refactoring JavaScript”

“The clarity, consistency and elegance of diagrams in ‘Category Theory Illustrated’ has helped us demystify and explain in simple terms a topic often feared.”

Gonzalo Casas, Software developer and lecturer at ETH Zurich

In memory of

Francis William Lawvere

1937 - 2023

\pagebreak

"Try as you may,

you just can't get away,

from mathematics"

Tom Lehrer

\pagebreak

The story behind this book

I was interested in math as a kid, but was always messing up calculations, so I decided it was not my thing and started pursuing other interests, like writing and visual art.

A little later I got into programming and I found that this was similar to the part of mathematics that I enjoyed. I started using functional programming in an effort to explore the similarity and to improve myself as a developer. I discovered category theory a little later.

Some 5 years ago I found myself jobless for a few months and decided to publish some of the diagrams that I drew as part of the notes I kept when was reading “Category Theory for Scientists” by David Spivak. The effort resulted in a rough version of the first two chapters of this book, which I published online.

A few years after that some people found my notes and encouraged me write more. They were so nice that I forgot my imposter syndrome and got to work on the next several chapters.

On math

Ever since Newton’s Principia, the discipline of mathematics is viewed in the somewhat demeaning position of “science and engineering’s workhorse” — only “useful” as a means for helping scientists and engineers to make technological and scientific advancements, i.e., it is viewed as just a tool for solving “practical” problems.

Because of this, mathematicians are in a weird and, I’d say, unique position of always having to defend what they do with respect to its value for other disciplines. I again stress that this is something that would be considered absurd when it comes to any other discipline.

People don’t expect any return on investment from physical theories, e.g., no one bashes a physical theory for having no utilitarian value.

And bashing philosophical theories for being impractical would be even more absurd — imagine bashing Wittgenstein, for example:

“All too well, but what can you do with the picture theory of language?” “Well, I am told it does have its applications in programming language theory…”

Or someone being sceptical to David Hume’s scepticism:

“That’s all fine and dandy, but your theory leaves us at square one in terms of our knowledge. What the hell are we expected to do from there?”

Although many people don’t necessarily subscribe to this view of mathematics as a workhorse, we can see it encoded inside the structure of most mathematics textbooks — each chapter starts with an explanation of a concept, followed by some examples, and then ends with a list of problems that this concept solves.

There is nothing wrong with this approach, but mathematics is so much more than a tool for solving problems. It was the basis of a religious cult in ancient Greece (the Pythagoreans), it was seen by philosophers as means to understanding the laws which govern the universe. It was, and still is, a language which can allow for people with different cultural backgrounds to understand each other. And it is also art and a means of entertainment. It is a mode of thinking, Or we can even say it is thinking itself. Some people say that “writing is thinking”, but I would argue that writing, when refined enough, and free from any kind of bias in on the side of the author, automatically becomes mathematical writing — you can almost convert the words into formulas and diagrams.

Category theory embodies all these aspects of mathematics, so I think it’s very good grounds to writing a book where all of them shine — a book that isn’t based on solving of problems, but exploring concepts and seeking connections between them. A book that is, overall, pretty.

Who is this book for

So, who is this book for? Some people would phrase the question as “Who should read this book”, but if you ask it this way, then the answer is “nobody”. Indeed, if you think in terms of “should”, mathematics (or at least the type of mathematics that is reviewed here) won’t help you much, although it is falsely advertised as a solution to many problems (whereas it is, in fact, (as we established) something much more).

Let’s take an example — many people claim that Einstein’s theories of relativity are essential for GPS-es to work properly. Due to relativistic effects, the clocks on GPS satellites tick faster than identical clocks on the ground.

They seem to think that if the theory didn’t exist, the engineers that developed the GPSes would have faced this phenomenon in the following way:

Engineer 1: Whoa, the clocks on the satellites are off by X nanoseconds!

Engineer 2: But that’s impossible! Our mathematical model predicts that they should be correct.

Engineer 1: OK, so what do we do now?

Engineer 2: I guess we need to drop this project until we have a viable mathematical model that describes time in the universe.

Although I am not an expert in special relativity, I suspect that the way this conversation would have developed would be closer to the following:

Engineer 1: Whoa, the clocks on the satellites are off by X nanoseconds!

Engineer 2: This is normal. There are many unknowns.

Engineer 1: OK, so what do we do now?

Engineer 2: Just adjust it by X and see if it works. Oh, and tell that to some physicist. They might find it interesting.

In other words, we can solve problems without any advanced math, or with no math at all, as evidenced by the fact that the Egyptians were able to build the pyramids without even knowing Euclidean geometry. And with that I am not claiming that math is so insignificant, that it is not even good enough to serve as a tool for building stuff. Quite the contrary, I think that math is much more than just a simple tool. So going through any math textbook (and of course especially this one) would help you in ways that are much more vital than finding solutions to “complex” problems.

Some people say that we don’t use maths in our daily life. But, if true, that is only because other people have solved all hard problems for us and the solutions are encoded on the tools that we use, however not knowing math means that you will be forever a consumer, bound to use those existing tools and solutions and thinking patterns, not being able to do anything on your own.

And so “Who is this book for” is not to be read as who should, but who can read it. Then, the answer is “everyone”.

About category theory

Like we said, the fundaments of mathematics are the fundaments of thought. Category theory allows us to formalize those fundaments that we use in our daily (intellectual) lives.

The way we think and talk is based on intuition that develops naturally and is a very easy way to get our point across. However, intuition also makes it easy to be misunderstood — what we say usually can be interpreted in many ways, some of which are wrong. Misunderstanding of these kinds are the reason why biases appear. Moreover, certain people (called “sophists” in ancient Greece) would introduce biases on purpose in order to twist the discourse in the direction that suits them.

It’s in such situations, that people often resort to formulas and diagrams to refine their thoughts. Diagrams (even more than formulas) are ubiquitous in science and mathematics.

Category theory formalizes the concept of diagrams and their components — arrows and objects — to create a language for presenting all kinds of ideas. In this sense, category theory is a way to unify knowledge, both mathematical and scientific, and to unite various modes of thinking with common terms.

As a consequence of that, category theory and diagrams are also a very understandable way to communicate a formal concept clearly, something I hope to demonstrate in the following pages.

Summary

In this book we will visit various such modes of knowledge and along the way, see all kinds of mathematical objects, viewed through the lens of categories.

We start with set theory in chapter 1, which is the original way to formalize different mathematical concepts.

Chapter 2 we will make a (hopefully) gentle transition from sets to categories while showing how the two compare and (finally) introducing the definition of category theory.

In the next two chapters, 3 and 4, we jump into two different branches of mathematics and introduce their main means of abstraction, groups and orders, observing how they connect to the core category-theoretic concepts that we introduced earlier.

Chapter 5 also follows the main formula of the previous two chapters, getting to the heart of the matter of why category theory is a universal language, by showing its connection with the ancient discipline of logic. As in chapters 3 and 4, we start with a crash course in logic itself.

The connection between all these different disciplines is examined in chapter 6, using one of the most interesting category-theoretical concepts — the concept of a functor.

In chapter 7 we review another more interesting and more advanced categorical concept, the concept of a natural transformation.

Acknowledgments

Thanks to my wife Dimitrina, for all her support.

My daughter Daria, my “anti-author” who stayed seated on my knees when I was writing the second and third chapters and mercilessly deleted many sentences, most of them bad.

Thanks to my high-school arts teacher, Mrs Georgieva who told me that I have some talent, but I have to work.

Thanks to Prathyush Pramod who encouraged me to finish the book and is also helping me out with it.

And also to everyone else who submitted feedback and helped me fix some of the numerous errors that I made — knowing myself, I know that there are more.

Sets

Let’s begin our inquiry by looking at the basic theory of sets. Set theory and category theory share many similarities. We can view category theory as a generalization of set theory. That is, it’s meant to describe the same thing as set theory (everything?), but to do it in a more abstract manner, one that is more versatile and (hopefully) simpler.

In other words, sets are an example of a category (the proto-example, we might say), and it is useful to have examples.

What is an Abstract Theory

Instead of asking what can be defined and deduced from what is assumed to begin with, we ask instead what more general ideas and principles can be found, in terms of which what was our starting-point can be defined or deduced. — Bertrand Russell, from “Introduction to Mathematical Philosophy”

Most scientific and mathematical theories have a specific domain, which they are tied to, and in which they are valid. They are created with this domain in mind and are not intended to be used outside of it. For example, Darwin’s theory of evolution is created in order to explain how different biological species came to evolve using natural selection, quantum mechanics is a description of how particles behave at a specific scale, etc.

Even mathematical theories, although they are not inherently bound to a specific domain (like the scientific theories) are at least strongly related to some domain, as for example differential equations are created to model how events change over time.

Set theory and category theory are different, they are not created to provide a rigorous explanation of how a particular phenomenon works, instead they provide a more general framework for explaining all kinds of phenomena. They work less like tools and more like languages for defining tools. Such theories are called abstract theories.

The borders of the two are sometimes blurry. All theories use abstraction, otherwise they would be pretty useless: without abstraction Darwin would have to speak about specific animal species or even individual animals. The difference is that abstract theories have core concepts that don’t refer to anything in particular, and are instead left for people to generalize on. All theories are applicable outside of their domains, but set theory and category theory do not have a domain to begin with.

Concrete theories, like the theory of evolution, are composed of concrete concepts. For example, the concept of a population, also called a gene-pool, refers to a group of individuals that can interbreed. Abstract theories, like set theory, are composed of abstract concepts, like the concept of a set. The concept of a set by itself does not refer to anything. However, we cannot say that it is an empty concept, as there are countless things that can be represented by sets, for example, gene pools can be (very aptly) represented by sets of individual animals. Animal species can also be represented by sets — a set of all populations that can theoretically interbreed.

You’ve already seen how abstract theories may be useful. Because they are so simple, they can be used as building blocks to many concrete theories. Because they are common, they can be used to unify and compare different concrete theories, by putting these theories in common grounds (this is very characteristic of category theory, as we will see later). Moreover, good (abstract) theories can serve as mental models for developing our thoughts.

Sets

“A set is a gathering together into a whole of definite, distinct objects of our perception or of our thought—which are called elements of the set.” – Georg Cantor

Perhaps unsurprisingly, everything in set theory is defined in terms of sets. A set is a collection of things where the “things” can be anything you want (like individuals, populations, genes, etc.) Consider, for example, these balls.

Balls

Let’s construct a set, call it $G$ (as gray) that contains all of them as elements. There can only be one such set: because a set has no structure (there is no order, no ball goes before or after another, there are no members which are “special” with respect to their membership of the set.) Two sets that contain the same elements are just two pictures of the same set.

The set of all balls

This example may look overly-simple, but in fact, it’s just as valid as any other.

The key insight that makes the concept useful is the fact that it enables you to reason about several things as if they were one.

Subsets

Let’s construct one more set. The set of all balls that are warm in color. Let’s call it $Y$ (because in the diagram, it’s colored in yellow).

The set of all balls of warm colors

Notice that $Y$ contains only elements that are also present in $G$. That is, every element of the set of $Y$ is also an element in the set $G$. When two sets have this relation, we may say that $Y$ is a subset of $G$ (or $Y \subseteq G$). A subset resides completely inside its superset when the two are drawn together.

Y and G together

Singleton Sets

The set of all red balls contains just one ball. We said above that sets summarize several elements into one. Still, sets that contain just one element are perfectly valid — simply put, there are things that are one of a kind. The set of kings/queens that a given kingdom has is a singleton set.

The singleton set of red balls

What’s the point of the singleton set? Well, it is part of the language of set theory, e.g., if we have a function which expects a set of given items, but if there is only one item that meets the criteria, we can just create a singleton set with that item.

The Empty set

Of course if one is a valid answer, zero can be also. If we want a set of all black balls $B$ or all the white balls, $W$, the answer to all these questions is the same — the empty set.

The empty set

Because a set is defined only by the items it contains, the empty set is unique — there is no difference between the set that contains zero balls and the set that contains zero numbers, for instance. Formally, the empty set is marked with the symbol $\varnothing$ (so $B = W = \varnothing$).

The empty set has some special properties, for example, it is a subset of every other set. Mathematically speaking, $\forall A \to \varnothing \subseteq A$ ($\forall$ means “for all”)

Functions

“By function I mean the unity of the act of arranging various representations under one common representation.” — Immanuel Kant, from “The Critique of Pure Reason”

A function is a relationship between two sets that matches each element of one set, called the source set of the function, with exactly one element from another set, called the target set of the function.

These two sets are also called the domain and codomain of the function, or its input and output. In programming, they go by the name of argument type and return type. In logic, they correspond to the premise and conclusion (we will get there). We might also say, depending on the situation, that a given function goes from this set to that other one, connects this set to the other, or that it converts a value from this set to a value from the other one. These different terms demonstrate the multifaceted nature of the concept of function.

Different types of functions

Here is a function $f$, which converts each ball from the set $R$ to the ball with the opposite color in another set $G$ (in mathematics a function’s name is often accompanied by the names of its source and target sets, like this: $f: R → G$)

Opposite colors

This is probably one of the simplest type of function that exists — it encodes a one-to-one relationship between the sets. That is to say, one element from the source is connected to exactly one element from the target (and the other way around).

But functions can also express relationships of the type many-to-one, where many elements from the source might be connected to one element from the target (but not the other way around). Below is one such function.

Function from a bigger set to a smaller one

Such functions might represent operations such as categorizing a given collection of objects by some criteria, or partitioning them, based on some property that they might have.

A function can also express relationships in which some elements from the target set do not play a part.

Function from a smaller set to a bigger one

An example might be the relationship between some kind of pattern or structure and the emergence of this pattern in some more complicated context.

We saw how versatile functions are, but there is one thing that you cannot have in a function. You cannot have a source element that is not mapped to anything, or that is mapped to more than one target element — that would constitute a many-to-many relationship and as we said functions express many-to-one relationships. There is a reason for that “design decision”, and we will arrive at it shortly.

Functions in everyday life

Sets and functions can express relationships between all kinds of objects, and even people. Every question that you ask that has an answer can be expressed as a function.

The question “How far are we from New York?” is a function that has the set of all places in the world as source set and its target set consisting of the set of all natural numbers.

The question “Who is my father?” is a function whose source is the set of all people in the world.

Task 1: What is the target of this function?

Note that the question “Who is my child?” is NOT a straightforward function, because a person can have no children, or can have multiple children. We will learn to represent such questions as functions later.

Task 2: Do all functions that we drew at the beginning express something? Do you think that a function should express something in order to be valid?

The Identity Function

For every set $G$, no matter what it represents, we can define the function that does nothing — a function which maps every element of $G$ to itself. It is called the identity function of $G$ or $ID_{G}: G → G$.

The identity function

You can think of $ID_{G}$ as a function which represents the set $G$ in the realm of functions. Its existence allows us to prove many theorems, that we “know” by intuition, formally.

Functions and Subsets

For each set and subset, no matter what they represent, we can define a function (called the image of the subset) that maps each element of the subset to itself:

Function from a smaller set to a bigger one

Every set is a subset of itself, in which case this function is the same as the identity.

Functions and the Empty Set

Although it doesn’t look like it, there is a unique function from the empty set to any other set.

Function with empty set

Task 3: Is this really valid? Why? Check the definition.

Task 4: What about the other way around. Are there functions with the empty set as a target as opposed to its source?

Functions and Singleton Sets

There is a unique function from any set to any singleton set.

Function with a singleton set

Task 5: Is this really the only way to connect any set to a singleton set in a valid way?

Task 6: Again, what about the other way around?

Sets and numbers

All numerical operations can be expressed as functions acting on the set of (different types of) numbers.

Number sets

Because not all functions work on all numbers, we separate the set of numbers to several sets, many of which are subsets to one another, such the set of whole numbers $\mathbb{Z} := {… -3 -2, -1, 0, 1, 2, 3… }$, the set of positive whole numbers, (also called “natural” numbers), $\mathbb{N} := {1, 2, 3… }$. We also have the set of Real numbers $\mathbb{R}$, which includes almost all numbers and the set of positive real numbers (or $\mathbb{R}_{>0}$).

Number functions

Each numerical operation is a function between two of these sets. For example, squaring a number is a function from the set of real numbers to the set of real non-negative numbers (because both sets are infinite, we cannot draw them in their entirety, however we can draw a part of them).

The square function

Let’s reiterate some of the more important characteristics of functions:

All numbers in the target have (or should have) two arrows pointing at them (one for the positive square root and one for the negative one), and that is OK.
Zero from the source set is connected to itself in the target set — that is permitted.
Some numbers aren’t the square of any other number — that is also permitted.

Overall everything is permitted, as long as you can always provide exactly one result (also known as The result™) per value. For numerical operations, this is always true, simply because math is designed this way.

Every generalization of number has first presented itself as needed for some simple problem: negative numbers were needed in order that subtraction might be always possible, since otherwise a − b would be meaningless if a were less than b; fractions were needed in order that division might be always possible; and complex numbers are needed in order that extraction of roots and solution of equations may be always possible. — Bertrand Russell, from “Introduction to Mathematical Philosophy”

Note that most mathematical operations, such as addition, multiplication, etc. require two numbers in order to produce a result. This does not mean that they are not functions, it just means they’re a little fancier. Depending on what we need, we may present those operations as functions from the sets of tuples of numbers to the set of numbers, or we may say that they take a number and return a function. More on that later.

Sets and Functions in Programming

Sets are used extensively in programming, especially in their incarnation as types (also called classes). All sets of numbers that we discussed earlier also exist in most languages as types.

Sets and types

Sets are not exactly the same thing as types, but all types are (or can be seen as) sets. For example, we can view the Boolean type as a set containing two elements — true and false.

Set of boolean values

Another very basic set that is used in programming is the set of keyboard characters, or Char.

Set of characters

Characters are actually used rarely by themselves and mostly as parts of sequences. Most of the types that are used in programming are composite types — they are a combination of the primitive ones that are listed here. Again, we will cover these later.

Task 7: What is the type equivalent of subsets in programming?

Functions and methods/subroutines

Some functions in programming (also called methods, subroutines, etc.) kinda resemble mathematical functions &mdash — they take one value of a given type (or in other words, an element that belongs to a given set) and always return exactly one element which belongs to another type (or set). For example, here is a function that takes an argument of type Char and returns a Boolean, indicating whether the character is a letter.

A function from Char to Boolean

However functions in most programming languages can also be quite different from mathematical functions — they can perform various operations that have nothing to do with returning a value. These operations are sometimes called side effects.

Why are functions in programming different? Well, figuring a way to encode effectful functions in a way that is mathematically sound isn’t trivial and at the time when most programming paradigms that are in use today were created, people had bigger problems than the functions not being mathematically sound (e.g. actually being able to run any program at all).

Nowadays, many people feel that mathematical functions are too limiting and hard to use. And they might be right. But mathematical functions have one big advantage over non-mathematical ones — their type signature tells you almost everything about what the function does (this is probably the reason why most functional languages are strongly-typed).

Purely-functional programming languages

We said that while all mathematical functions are also programming functions, the reverse is not true for most programming languages. However, there are some languages that only permit mathematical functions, and for which this equality holds. They are called purely-functional programming languages.

Such languages don’t support functions that perform operations like rendering stuff on screen, doing I/O, etc. (in this context, such operations are called “side effects”.

In purely functional programming languages, such operations are outsourced to the language’s runtime. Instead of writing functions that directly perform a side effect, for example console.log('Hello'), we write functions that return a type that represents that side effect (for example, in Haskell side effects are handled by the IO type) and the runtime then executes those functions for us.

We then link all those functions into a whole program, often by using a thing called continuation passing style.

Functional Composition

Now, we were just about to reach the heart of the matter regarding the topic of functions. And that is functional composition. Assume that we have two functions, $g: Y → P$ and $f: P → G$ and the target of the first one is the same set as the source of the second one.

Matching functions

If we apply the first function $g$ to some element from set $Y$, we will get an element of the set $P$. Then, if we apply the second function $f$ to that element, we will get an element from type $G$.

Applying one function after another

We can define a function that is the equivalent to performing the operation described above. That would be a function such that, if you follow the arrow $h$ for any element of set $Y$ you will get to the same element of the set $G$ as the one you will get if you follow both the $g$ and $f$ arrows. Let us call it $h: Y → G$.

Functional composition

We may say that $h$ is the composition of $g$ and $f$, or $h = f \circ g$ (notice that the first function is on the right, so it’s similar to $b = f(g(a)$).

Composition is the essence of all things categorical. The key insight is that the sum of two parts is no more complex than the parts themselves (and therefore can be summed again).

Task 8: Think about which qualities of a function make composition possible, e.g., does it work with other types of relationships, like many-to-many and one-to-many.

Composition of relationships

To understand how powerful composition is, consider the following: one set being connected to another means that each function from the second set can be transferred to a corresponding function from the first one e.g if we have a function $g: P → Y$ from set $P$ to set $Y$, then for every function $f$ from the set $Y$ to any other set, there is a corresponding function $f \circ g$ from the set $P$ to the same set. In other words, every time you define a new function from $Y$ to some other set, you gain one function from $P$ to that same set for free.

Functional composition connect

For example, if we again take the relationship between a person and his mother as a function with the set of all people in the world as source, and the set of all people that have children as its target, composing this function with other similar functions would give us all relatives on a person’s mother side.

Although you might be seeing functional composition for the first time, the intuition behind it is there — we all know that each person whom our mother is related to is automatically our relative as well — our mother’s father is our grandfather, our mother’s partner is our father, etc.

Composition in engineering

Besides being useful for analyzing relationships that already exist, the principle of composition can help you in the practice of building objects that exhibit such relationships i.e. engineering.

One of the main ways in which modern engineering differs from ancient craftsmanship is the concept of a part/module/component - a product that performs a given function that is not made to be used directly, but is instead optimized to be combined with other such products in order to form a “end-user” product. For example, an espresso machine is just a combination of the components, such as , pump, heater, grinder group etc, when composed in an appropriate way.

A espresso machine

Task 9: Think about what would be those functions’ sources and targets.

By the way, diagrams that are “zoomed out” that show functions without showing set elements are called external diagrams (and the ones we used so far are internal).

Composition and external diagrams

Let’s look at the diagram that demonstrates functional composition, in which we showed that successive application of the two composed functions ($f \circ g$) and the new function ($h$) are equivalent. We showed this equivalence by drawing an internal diagram, and explicitly drawing the elements of the functions’ sources and targets in such a way that the two paths are equivalent.

Functional composition

Alternatively, we can just say that the arrow paths are all equivalent (all arrows starting from a given set element ultimately lead to the same corresponding element from the resulting set) and draw the equivalence as an external diagram.

An external diagram, showing functional composition of two functions

Or alternatively, if you want to express it as a formula (where $\circ$ is the composition operator).

An external diagram, showing functional composition of two functions, as a fformula

The external diagram is a more appropriate representation of the concept of composition, as it is more general. In fact, it is so general that it can actually serve as a definition of functional composition.

The composition of two functions $f$ and $g$ is a third function $h$ defined in such a way that all the paths in this diagram are equivalent.

Functional composition - general definition

If you continue reading this book, you will hear more about diagrams in which all paths are equivalent (they are called commuting diagrams).

Associativity

If we want compose more than two functions we might wonder if the order in which we compose the functions matters for the final outcome i.e. whether combining two functions and then combining the result with a third function…

Composing functions (f and g) and c

…would yield the same result as composing the second and the third functions, before adding the first one.

Composing functions f and (g and c)

The answer is yes — as long as the order is maintained, the result would always be the same. This property of functions is called *associativity.

Task 10: Draw the above diagrams as internal diagrams: define three functions that compose with one another (you can use the two functions that we defined earlier, you only would have to make a third one) compose them in the two ways shown above and check if the result is the same.

Category theory — a hint for the definition

At this point you might be worried that I had forgotten that I am supposed to talk about category theory and I am just presenting a bunch of irrelevant concepts. I may indeed do that sometimes, but not right now — the fact that functional composition can be presented without even mentioning category theory doesn’t stop it from being one of category theory’s most important concepts.

In fact, we can say (although this is not an official definition) that category theory is the study of things that are function-like (we call them morphisms) i.e. things that have a source and a target, they compose with one another (associatively) and they can be represented by external diagrams.

Another way of defining category theory (without actually defining it) is saying it is what you get if you replace the concept of equality with the concept of isomorphism. We haven’t talked about isomorphisms yet, but this is what we will be doing for the rest of this chapter.

Isomorphism

One of the relationships that functions can represent is the one-to-one relationship, in which we have one element from the source set, pointing to one element from the target set. But for one-to-one functions the reverse is also true — exactly one element from the target set points to one element from the source.

Opposite colors

If we have a one-to-one-function that connects sets that are of the same size (as is the case here), then this function has the following property: all elements from the target set have exactly one arrow pointing at them. In this case, the function is invertible. That is, if you flip the arrows of the function and its source and target, you get another valid function.

Opposite colors

Invertible functions are called isomorphisms. When there exists an invertible function between two sets, we say that the sets are isomorphic. For example, because we have an invertible function that converts the temperature measured in Celsius to temperature measured in Fahrenheit, and vise versa, we can say that temperatures measured in Celsius and Fahrenheit are isomorphic.

Isomorphism means “same form” in Greek (although actually their form is the only thing which is different between two isomorphic sets).

More formally, two sets $A$ and $B$ are isomorphic (or $A ≅ B$) if there exist functions $f: A \to B$ and its reverse $g: B \to A$, such that $f \circ g = ID_{B}$ and $g \circ f = ID_{A}$.

Notice how the identity function comes in handy.

Isomorphism and identity

If you look closely, you would see that the identity function is invertible too, its reverse is itself, so each set is isomorphic to itself in that way.

The identity function

Therefore, the concept of an isomorphism contains the concept of equality — all equal things are also isomorphic.

Isomorphism and composition

An interesting fact about isomorphisms is that if we have functions that convert a member of set $A$ to a member of set $B$, and the other way around, then, because of functional composition, we know that any function from/to $A$ has a corresponding function from/to $B$.

The architecture of isomorphism

For example, if you have a function “is the partner of” that goes from the set of all married people to the same set, then that function is invertible. That is not to say that you are the same person as your significant other, but rather that every statement about you, or every relation you have to some other person or object is also a relation between them and this person/object, and vice versa.

Composing isomorphisms

Another interesting fact about isomorphisms is that if we have two isomorphisms that have a set in common, then we can obtain a third isomorphism between the other two sets that would be the result of their (the isomorphisms) composition.

Composing two isomorphisms into another isomorphism is possible by composing the two pairs of functions that make up the isomorphism in the two directions.

Composing isomorphisms

Informally, we can see that the two morphisms are indeed reverse to each other and hence form an isomorphism.

Let’s prove this fact formally:

Given that if two functions are reverses of each other, then their composition is equal to an identity function, proving that functions $g \circ f$ and $f’ \circ g’$, are reverses is equivalent to proving that their composition is equal to identity.

$g \circ f \circ f’ \circ g’ = id$

But we know already that $f$ and $f’$ are reverses and hence $f\circ f’ = id$, so the above formula is equivalent to (you can reference the diagram to see what that means):

$g \circ id \circ g’ = id$

And we know that anything composed with $id$ is equal to itself, so it is equivalent to:

$g \circ g’ = id$

which is true, because $g$ and $g’$ are reverses and reverse functions composed are equal to identity.

By the way, there is another way to obtain the isomorphism — by composing the two morphisms one way in order to get the third function and then taking its reverse. But to do this, we have to prove that the function we get from composing two one-to-one functions is also one-to-one.

Isomorphisms Between Singleton Sets

Between any two singleton sets, we may define the only possible function.

The only possible function between singletons

The function is invertible, which means that all singleton sets are isomorphic to one another, and furthermore (which is important) they are isomorphic in one unique way.

Isomorphic singletons

Following the logic from the last paragraph, each statement about something that is one of a kind can be transferred to a statement about another thing that is one of a kind.

Equivalence relations and isomorphisms

We said that isomorphic sets aren’t necessarily the same set (although the reverse is true). However, it is hard to get away from the notion that being isomorphic means that they are equal or equivalent in some respect. For example, all people who are connected by the isomorphic mother/child relationship share some of the same genes.

And in computer science, if we have functions that convert an object of type $A$ to an object of type $B$ and the other way around, we also can pretty much regard $A$ and $B$ as two formats of the same thing, as having one means that we can easily obtain the other.

Equivalence relations

What does it mean for two things to be equivalent? The question sounds quite philosophical, but there is actually is a formal way to answer it, i.e., there is a mathematical concept that captures the concept of equality in a rather elegant way — the concept of an equivalence relation.

We already know what a relation is — it is a connection between two sets (an example of which is function). And a relation is an equivalence relation, when it follows three laws, which correspond to three intuitive ideas about equality. Let’s review them.

Reflexivity

The first idea that defines equivalence, is that everything is equivalent with itself.

Reflexivity

This simple principle translates to the equally simple law of reflexivity: for all sets $A$, $A=A$.

Transitivity

According to the Christian theology of the Holy Trinity, the Jesus’ Father is God, Jesus is God, and the Holy Spirit is also God, however, the Father is not the same person as Jesus (neither is Jesus the Holy Spirit). If this seems weird to you, that’s because it breaks the second law of equivalence relations, transitivity. Transitivity is the idea that things that are both equal to a third thing must also equal between themselves.

Transitivity

Mathematically, for all sets $A$ $B$ and $C$, if $A=B$ and $B=C$ then $A=C$.

Note that we don’t need to define what happens in similar situations that involve more than three sets, as they can be settled by just multiple application of this same law.

Symmetry

If one thing is equal to another, the reverse is also true, i.e, the other thing is also equal to the first one. This idea is called symmetry. Symmetry is probably the most characteristic property of the equivalence relation, which is not true for almost any other relation.

symmetry

In mathematical terms: if $A=B$ then $B=A$.

Isomorphisms as equivalence relations

Isomorphisms are indeed equivalence relations. And “incidentally”, we already have all the information needed to prove it (in the same way in which James Bond seems to always incidentally have exactly the gadgets that are needed to complete his mission).

We said that the most characteristic property of the equivalence relation is its symmetry. And this property is satisfied by isomorphisms, due to the isomorphisms’ most characteristic property, namely the fact that they are invertible.

Symmetry of isomorphisms

Task 11: One law down, two to go: Go through the previous section and verify that isomorphisms also satisfy the other equivalence relation laws.

The practice of using isomorphisms to define an equivalence relation is very prominent in category theory where isomorphisms are denoted with $≅$, which is almost the same as $=$ (and is also similar to having two opposite arrows connecting one set to the other).

Addendum: The case of composition in software development

An unstructured monolithic design is not a good idea, except maybe for a tiny operating system in, say, a toaster, but even there it is arguable.— Andrew S. Tanenbaum

Software development is a peculiar discipline — in theory, it should be just some sort of engineering, however the way it is executed in practice is sometimes closer to craftsmanship, with the principle of composition not being utilized to the fullest.

To see why, imagine a person (e.g. me), tinkering with some sort of engineering problem e.g. trying to fix a machine or modify it to serve a new purpose. If the machine in question is mechanical or electrical, this person will be forced to pretty much make due with the components that already exist, simply because they can rarely afford to manufacture new components themselves (or at least they would avoid it if possible). This limitation, forces component manufacturers to create components that are versatile and that work well together, in the same way in which pure functions work well together. And this in turn makes it easy for engineers to create better machines without doing all the work themselves.

But things are different if the machine in question is software-based — due to the ease with which new software components can be rolled out, our design can blur the line that separates some of the components or even do away with the concept of component altogether and make the whole program one giant component (monolithic design). Worse, when no ready-made components are available, this approach is actually easier than the component-based approach that we described in the previous paragraph, and so many people use it.

This is bad, as the benefits of monolithic design are mostly short-term — not being separated to components makes programs harder to reason about, harder to modify (e.g. you cannot replace a faulty component with a new one) and generally more primitive than component-based programs. For these reasons, I think that programmers are losing out if they are not utilizing the principles of functional composition. In fact, I was so unhappy with the situation that I decided to write a whole book on applied category theory to help people understand the principles of composition better, it’s called Category Theory Illustrated (Oh wait, I am writing that right now, aren’t I?)

Anyway, the composition approach is sometimes used in programming, and when it is used, it tends to work rather well. To see some examples, you don’t need to look further than the pipe operator in Unix (|), which feeds the standard output of a program into the standard input of another program.

If you want to look further, look at the Haskell programming language, or some of the numerous libraries for functional programming for other languages. There is also a whole programming paradigm based on functional composition, called “concatenative programming” utilized in languages like Forth and Factor.

From Sets to Categories

In this chapter, we will see some more set-theoretic constructs, but we will also introduce their category-theoretic counterparts in an effort to gently introduce the concept of a category itself.

When we are finished with that, we will try (and almost succeed) to define categories from scratch, without actually relying on set theory.

Products

In the previous chapter, we needed a way to construct a set whose elements are composite of the elements of some other sets e.g. when we discussed mathematical functions, we couldn’t define $+$ and $-$ because we could only formulate functions that take one argument. Similarly, when we introduced the primitive types in programming languages, like Char and Number, we mentioned that most of the types that we actually use are composite types. So how do we construct those?

So, consider the set $A$ (containing $a$’s) and the set $B$ (containing $b$’s) Product parts

The Cartesian product (or tuple) of sets $A$ and $B$ (denoted $A \times B$) is the set of ordered pairs that contain one element of the set $A$ and one element of the set $B$. Or formally speaking: $A \times B = { (a, b) }$ where $a ∈ A, b ∈ B$ ($∈$ means “is an element of”).

Product

Task 1: Why is this called a product? Hint: How many elements does it have?

Naturally, the product comes equipped with two functions, one for each property, which take a pair and extracts the value of the property, so $C \to A$ and $C \to B$, called the product’s projections (in programming terms, we would dub these the “getters”) — the functions for retrieving back it’s constituent values.

Product

Triple product

There are occasions where we want to combine not two, but three sets into a product (e.g. $A \times B \times C$). We can achieve that by combining the first and second one into a product and then combining their product with the third set, (so it will be $(A \times B) \times C$.

Triple product

There is another way to make a triple product of three sets — combining the second and the third one and then combining the result with the first one (so $A \times (B \times C)$, but it doesn’t actually matter which one you use, as the end results would be isomorphic $(A \times B) \times C \cong A \times (B \times C)$.

Triple product

You might recognize this isomorphism, from the definition of functional composition. It means that the Cartesian product operation is (like functional composition), associative.

Products as Objects

In the previous chapter, we established the correspondence of various concepts in programming languages and set theory — sets resemble types, and functions resemble methods/subroutines. This picture is made complete with products, that are like stripped-down classes (also called records or structs) — the sets that form the product correspond to the class’s properties (also called members) and the functions for accessing them are like what programmers call getter methods e.g. the famous example of object-oriented programming of a Person class with name and age fields is nothing more than a product of the set of strings, and the sets of numbers. And objects with more than two values can be expressed as compositions of nested products e.g. a record with 3 members $a$, $b$ and $c$ could be expressed as nested tuples ($a$, ($b$, $c$)), or more formally $a \times b \times c$.

Using Products to Define Numeric Operations

Products can also be used for expressing functions that take more than one argument (and this is indeed how multi-param functions are implemented in languages that actually have tuples, like the ones from the ML family). For example, “plus” is a function from the set of products of two numbers to the set of numbers, so, $+: \mathbb{Z} \times \mathbb{Z} → \mathbb{Z}$.

The plus function

By the way, such functions (ones that take two objects of one type and return a third object of the same type) are called operations.

Defining products in terms of sets

A product is, as we said, a set of ordered pairs (formally speaking $A \times B ≠ B \times A$). So, to define a product we must define the concept of an ordered pair. So how can we do that?

A pair

Note that an ordered pair of elements is not just a set containing the two elements (that would be an _ unordered pair_) but it also contains information about which of those objects comes first and which one goes second in the pair. In programming, we have the ability to assign names to each member of an object, which accomplishes the same purpose.

The order of elements in the pair is important. While some mathematical operations (such as addition) don’t care about order, others (such as subtraction) do. In programming, when we manipulate an object we obviously want to access a specific property of the object, not just any random property.

So does that mean that we have to define ordered pairs as a “primitive” type like we defined sets if we want to use them? That’s possible, but there is another approach if we can define a construct that is isomorphic to the ordered pair, using only sets, we can use that construct instead of them. And mathematicians have come up with multiple ingenious ways to do that. Here is the first one, which was suggested by Norbert Wiener in 1914. Note the smart use of the fact that the empty set is unique.

A pair, represented by sets

The next one was suggested by Felix Hausdorff in the same year. In order to use that one, we just have to define $1$, and $2$ first.

A pair, represented by sets

Suggested in 1921 by Kazimierz Kuratowski, this one uses just the component of the pair.

A pair, represented by sets

Defining products in terms of functions

The product definitions presented in the previous section worked by zooming in into the individual elements of the product and seeing what they are made of. We may think of this as a low-level approach to the definition. This time (and throughout the most of this book) we will do the opposite — we will try to be as oblivious to the contents of our sets as possible i.e. instead of zooming in we will zoom out and attempt to fly over the difficulties that we met in the previous section by providing a definition of a product in terms of functions and external diagrams.

To define products in terms of external diagrams, we must, given two sets, devise a way to pinpoint the set that is their product, by looking at the functions that come from/to them.

And what are the functions are guaranteed to exist for all products? Of course that would be the projections, the functions for retrieving back the two elements of the product $A \times B \to A$ and $A \times B \to B$. What would a product be without them?

Product

Now if we switch to the (semi) external view, this diagram already provides some definition of what a product is: if we have an object $C$ for which there are functions $C \to A$ and $C \to B$, then $C$ can potentially be the product of $A$ and $B$ ($A \times B$).

Product, external diagram

However, this definition is not complete, as the product $A$ and $B$, is not the only set for which such functions can be defined. For example, a set of triples (which is like a product, but has three elements) $A \times B \times X$ for any element $X$ also qualifies. Any other set that would happen to have some functions to $A$ and $B$ would, by this definition, be “impostor products”.

Product, external diagram

Upon further inspection we discover that there are ways to expose those impostors. Take the set of triples, $A \times B \times X$ as an example. Looking at the canonical function that converts a triple to a product $A \times B \times X \to A \times B$, we realize that $A \times B \times X$ is only connected to $A$ and $B$ because of this function. That is, if we dub it $g: A \times B \times X \to A \times B$ and let $f^{1}$ and $f^{2}$ be the arrows for retrieving elements of a product ($f^{1} : A \times B \to A$ and $f^{2} : A \times B \to B$), then, the arrow that connects the triple $A \times B \times X$ to $A$ and $B$ are just the compositions $f^{1} g$ and $f^{2}g$.

Product, external diagram

We claim that the same reasoning applies to all other objects that can take the place of our product — they all are connected to the product, and their getters are just the result of this connection. Why? Intuitively, all such objects would be more complex than the product and you can always have a function that converts a more complex structure to a simpler one by just throwing information away (in the same way in which we threw the third element of the triple).

Product, external diagram

More formally, if we suppose that there is a set $I$ that can serve as an impostor of the product of sets $A$ and $B$ (i.e. that $I$ is such that there exists two functions $I \to A$ and $I \to B$, then there must also exist a unique function with the type signature $g: I \to A \times B$, that converts the impostor product to the real product, such that the above two functions would be just the composition of $g$ with the usual “getter” functions of the product ($f^{1} : A \times B \to A$ and $f^{2} : A \times B \to B$). In other words, whichever object we pick for $I$, this diagram would commute (oh no, not this diagram again).

Product, universal property

You would see a lot of similar diagrams in this book. In category theory, we often (always) define properties that a given object might possess, by defining a structure such that all similar objects can be converted to it. This is what we call a universal property, but it is too early to go into more detail, (after all we haven’t even yet said what category theory is).

Isomorphism and equality

One thing that we should point out, is that this definition (as all the previous ones, by the way) does not rule out the sets which are isomorphic to the product. When we represent things using universal properties, an isomorphism is treated as equality.

This is the same viewpoint that we adopt in programming, especially when we work on the higher level — there might be many different implementations of pair, but as long as they work in the same way (i.e. we can convert one to the other and vice versa) they are all the same to us.

Sums

We will now study a construct that is pretty similar to the product but at the same time is very different. Similar because, like the product, it is a relation between two sets which allows you to unite them into one, without erasing their structure. But different as it encodes a very different type of relation — a product encodes an and relation between two sets, while the sum encodes an or relation.

The sum of two sets $B$ and $Y$, denoted $B + Y$ is a set that contains all elements from the first set combined with all elements from the second one.

Sum or coproduct

We can immediately see the connection with the or logical structure: For example, because a parent is either a mother or a father of a child, the set of all parents is the sum of the set of mothers and the set of fathers, or $P = M + F$.

Defining Sums in Terms of Sets

As with the product, representing sums in terms of sets is not so straightforward e.g. when a given object is an element of both sets, then it appears in the sum twice which is not permitted, because a set cannot contain the same element twice.

As with the product, the solution is to put some extra structure.

A member of a coproduct

And, as with the product, there is a low-level way to express a sum using sets alone. Incidentally, we can use pairs.

A member of a coproduct, examined

Defining sums in terms of functions

As you might already suspect, the interesting part is expressing the sum of two sets using functions. To do that, we have to go back to the conceptual part of the definition. We said that sums express an or relation between two things.

A property of every or relation is that if something is an $A$ that something is also an $A \vee B$ (The $\vee$ symbol means or by the way). For example, if my hair is brown, then my hair is also either blond or brown. This is what or means, right? This property can be expressed as a function, two functions actually — one for each set that takes part in the sum relation (for example, if parents are either mothers or fathers, then there surely exist functions $mothers → parents$ and $fathers → parents$).

Coproduct, external diagram

As you might have already noticed, this definition is pretty similar to the definition of the product from the previous section — the difference being reversed arrows. And the similarities don’t end here. As with products, we have sets that can be thought of as impostor sums — ones for which these functions exist, but which also contain additional information.

Coproduct, external diagram

All these sets express relationships which are more vague than the simple sum, and therefore given such a set, there would exist a unique function that would distinguish it from the true sum. The only difference is that, unlike the functions that define products, this time this function goes from the sum to the impostor.

Coproduct, external diagram

Interlude: Categorical Duality

The concepts of product and sum might already look similar in a way when we view them through their internal diagrams. The external view makes this similarity precise — these two diagrams are one and the same diagram, only their arrows are flipped — many-to-one relationships become one-to-many and the other way around.

Coproduct and product

The universal properties that define the two constructs are the same as well — if we have a sum $A + B$, for each impostor sum, such as $A + B + X$, there exists a trivial function $A + B \to A + B + R$.

And, if you remember, with products the arrows go the other way around — the equivalent example for a product would be the function $A \times B \times R \to A \times B $

This fact uncovers a deep connection between the concepts of the product and sum, which is not otherwise apparent — they are each other’s opposites. Product is the opposite of sum and sum is the opposite of product.

In category theory, concepts that have such a relationship are said to be dual to each other. So, the concepts of product and sum are dual. This is why sums are known in a category-theoretic setting as converse product, or coproduct for short. This naming convention is used for all dual constructs in category theory.

Defining the rest of set theory using functions

So far in the book, we saw some amazing ways of defining set-theoretic constructs without looking at the set elements and by only using functions and external diagrams.

In the first chapter, we defined functions and functional composition with this diagram.

Functional composition

And now, we also defined products and sums.

Coproduct and product

What’s even more amazing, is that we can define all of set-theory, based just on the concept of functions, as discovered by the category theory pioneer Francis William Lawvere.

Defining set elements using functions

Traditionally, everything in set theory is defined in terms of two things: sets and elements, so, if we want to define it using sets and functions, we must define the concept of a set element in terms of functions.

To do so, we will use the singleton set.

The singleton set

OK, let’s start by taking a random set which we want to describe.

A set of three elements

And let’s examine the functions from the singleton set, to that random set.

Functions from the singleton set

It’s easy to see that there would be exactly one function for each element of the set i.e. that each element of any set $X$ is isomorphic to a function $1 \to X$ (where $1$ means the singleton set).

So, we can say that what we call “elements” of a set are the functions from the singleton set to it.

Defining the singleton set using functions

Now, after coming up with a definition of a set element, based on functions, we can try to draw the elements of our set as an external diagram.

Functions from the singleton set

However, our diagram is not yet fully external, as it depends on the idea of the singleton set, i.e. the set with one element. Furthermore, this makes the whole definition circular, as we cannot define the concept of a one-element set, without the concept of element.

To avoid these difficulties, we devise a way to define the singleton set, using just functions. We do it in the same way that we did for products and sums - by using a unique property that the singleton set has. In particular, there is exactly one function from any other set to the singleton set i.e. if $1$ is the singleton set, then we have exactly one function $X \to 1$ for all objects $X$ i.e. $\forall X \exists! (X \to 1)$ (where $\exists!$ means “Exists unique”).

Terminal object

It turns out that this property defines the singleton set uniquely i.e. there is no other set that has it, other than the sets that are isomorphic to the singleton set. This is simply because, if there are two sets that have it, those two sets would also have unique functions between themselves so they would be isomorphic to one another. More formally, if we have two sets $X$ and $Y$ such that $\exists!X \to 1 \land \exists!Y \to 1$ and they both hold this property (“exactly one function from any other set to this set”) then we also have $X \cong Y$.

Terminal object

And because there is no other set, other than the singleton set that has this property, we can use it as a definition of the singleton set and say that if we have $\forall X \exists! X \to 1$, then $1$ is the singleton set.

Terminal object

With this, we acquire a fully external definition (up to an isomorphism) of the singleton set, and thus a definition of a set element — the elements of a given set are just the functions from the singleton set to that set.

Functions from the singleton set

Note that from this property it follows that the singleton set has exactly one element, which confirms that our definition is correct.

Functions from the singleton set

Task 2: Why exactly does it follow (check the definition)?

Defining the empty set using functions

The empty set is, of course, the set that has no elements, but how would we say this without referring to elements?

In the previous chapter, we noted an interesting property of the empty set:

there is a unique function from the empty set to any other set.

And, again, since the empty set is the only set that has this property, we can reverse the above statement and use it as a definition:

the empty set is the set such that there exists a function from it to any other set.

Task 3: why are the functions to the empty set unique?

Initial object

Observant readers will notice the similarities between the diagrams depicting the initial and terminal object (yes the two concepts are, of course, dual of each other).

Initial terminal duality

Some even more observant readers (folks, keep it down please, you are too observant) may also notice the similarities between the product/coproduct diagrams and the initial/terminal object diagrams.

Coproduct and product

The similarity of the diagrams, is due to a similar general approach of defining things — in both cases we find the property that makes a given concept useful and then define the concept so it has this property*.

Functional application

After seeing the functional definition of set elements, we might be inclined to ask the following: If elements are represented by functions, then how do you apply a given function to an element of a set, (and retrieve an element of another set)?

The answer is surprisingly simple — selecting an element from a set is the same as constructing a function from the singleton set to that element.

Functional application - internal diagram

And then applying a function to an element is the same as composing the element function, with the function we want to apply.

Functional application - external diagram

The result is the function that represents the element returned by the applied function.

Conclusion

This was a taste of Lawvere’s Elementary Theory of the Category of Sets (ETCS) which constitutes a rigorous definition of set theory (equivalent to ZFC set theory) using only the concept of a function.

We can cover this theory in it’s entirety, listing all axioms that are needed, but for now it is probably more important to understand why do we need in the first place?

The short answer: because it is more general than the traditional definition, this new definition also applies to objects that are not exactly sets but are like sets in some respects.

You may say that they apply to entirely different categories of objects (nudge, nudge).

Categories briefly

Maybe it is about time to see what a category is. Here is a short definition: a category consists of objects (an example of which are sets) and morphisms that go from one object to another (which behave as functions) and that are composable. We can say a lot more about categories, and even present a formal definition, but for now, it is sufficient for you to remember that sets are one example of a category and that categorical objects are like sets, except that we don’t see their elements i.e. category-theoretic notions are captured by the external diagrams, while strictly set-theoretic notions can be captured by internal ones.

Category theory and set theory compared

When we are within the realm of sets, we can view each set as a collection of individual elements. In category theory, we don’t have such a notion. However, taking this notion away allows us to define concepts such as the sum and product sets in a whole different and more general way. Plus we always have a way to “go back” to set theory, using the tricks from the last section.

Category Theory	Set theory	Programming Languages
Category	N/A	N/A
Objects and Morphisms	Sets and Functions	Classes and methods
N/A	Element	Object

Notice the somehow weird, (but actually completely logical) symmetry (or perhaps “reverse symmetry”) between the world as viewed through the lenses of set theory, and the way it is viewed through the lens of category theory:

Category Theory	Set theory
Category	N/A
Objects and Morphisms	Sets and functions
N/A	Element

By switching to external diagrams, we lose sight of the particular (the elements of our sets), but we gain the ability to zoom out and see the whole universe where we have been previously trapped. In the same way that the whole realm of sets can be thought of as one category, a programming language can also be thought of as a category. The concept of a category allows us to find and analyze similarities between these and other structures.

NB: The word “Object” is used in both programming languages and in category theory, but has completely different meanings. A categorical object is equivalent to a type or a class in programming language theory.

Sets VS Categories

One remark before we continue: in the last section, we may have made it seem like category theory and set theory are somehow competing with each other. Perhaps that notion would be somewhat correct if category and set theory were meant to describe concrete phenomena, in the way that the theory of relativity and the theory of quantum mechanics are both supposed to explain the physical world. Concrete theories are conceived mainly as descriptions of the world, and as such it makes sense for them to be connected in some sort of hierarchy.

In contrast, abstract theories, like category theory and set theory, are more like languages for expressing such descriptions — they still can be connected, and are connected in more than one way, but there is no inherent hierarchical relationship between the two and therefore arguing over which of the two is more basic, or more general, is just a chicken-and-egg problem, as you will see in the next chapter.

Categories (again)

“…deal with all elements of a set by ignoring them and working with the set’s definition.” — Dijkstra (from “On the cruelty of really teaching computing science”)

All category theory books, including this one, start by talking about set theory. Looking back, I really don’t know why this is the case — books that focus on a given subject usually don’t start off by introducing an entirely different subject, (before even starting to talk about the main one). Perhaps the set-first approach is the best way to introduce people to categories. Or perhaps using sets to introduce categories is one of those things that people do just because everyone else does it. But, one thing is for certain — we don’t need to study sets in order to understand categories. So now I would like to start over and talk about categories as a foundational concept. So let’s pretend like this is a new book (I wonder if I can dedicate this to a different person).

A category is a collection of objects (things) where the “things” can be anything you want. Consider, for example, these ~~colorful~~ gray balls:

Balls

A category consists of a collection of objects as well as some arrows connecting objects to one another. We call the arrows morphisms (for now you can think of them as functions).

A category

Wait a minute, we said that all sets form a category, but at the same time, any one set can be seen as a category in its own right (just one which has no morphisms). This is true and very characteristic of category theory — one structure can be examined from many different angles and may play many different roles, often in a recursive fashion.

This particular equivalence (a set as a category with no morphisms) is, however, rarely useful. Not because it’s incorrect in any way, but rather because category theory is all about the morphisms — if the arrows in set theory are nothing but a connection between the sets that serve as their source and a destination, in category theory it’s the objects that are nothing but a source and destination for the arrows that connect them to other objects. This is why, in the diagram above, the arrows, and not the objects, are colored: if you ask me, the category of sets should really be called the category of functions.

Speaking of which, note that objects in a category can be connected by multiple arrows and that having the same source and target sets does not in any way make arrows equivalent.

Two objects connected with multiple arrows

Why that is true is pretty obvious if we go back to set theory for a second (OK, maybe we really have to do it from time to time). There are, for example, an infinite number of functions that go from number to boolean, and the fact that they have the same input type and the same output type (or the same type signature, as we like to say) does not in any way make them equivalent to one another.

Two sets connected with multiple functions

There are some types of categories that have only one morphism between two objects (in each direction), but we will talk about them later.

Composition

The most important requirement for a structure to be called a category is that two morphisms can make a third, or in other words, that morphisms are composable.

Given three objects and two successive arrows with between them, we can make a third arrow (in set theory, it is equivalent to the consecutive application of the first two).

Composition of morphisms

Formally, this requirement says that there should exist an operation, usually denoted with the symbol $\circ$ such that for each pair of morphisms $g: A \to B$ and $f: B \to C$, there exists a morphism $(f \circ g): A \to C$.

Composition of morphisms in the context of additional morphism

If you remember, in set theory, we picked functions, as opposed to the other types of relations because they are composable. Here we just invent the concept of a morphism and define it to be composable (in the same way as we invented the (co)products and later the empty and singleton set). Let’s see where this definition gets us.

NB: Note, that functional composition is read from right to left. e.g. applying $g$ and then applying $f$ is written $f \circ g$ and not the other way around. (You can think of it as a shortcut to $f(g(a))$). Some may find it useful to pronounce “$\circ$” as “after”, e.g. $f \;\text{after}\; $g.

The law of identity

To have numbers, you have to have a zero. The zero of category theory is what we call the “identity morphism” for each object. In short, this is a morphism that doesn’t do anything.

The identity morphism (but can also be any other morphism)

It’s important to mark this morphism because there can be (let’s again add this very important, and by now probably also very boring, reminder) many morphisms that go from one object to the same object. For example, in the category of sets, we deal with a multitude of functions that have the set of numbers as source and target, such as $\operatorname{negate}$, $\operatorname{square}$, $\operatorname{add\ one}$, and are not at all the identity morphism.

A structure must have an identity morphism for each object in order for it to be called a category — this is known as the law of identity.

Task 4: What is the identity morphism in the category of sets?

The law of associativity

Composition is special not only because you can take any two morphisms with appropriate signatures and make a third, but because you can do so indefinitely, i.e. for each $n$ successive arrows, each of which has as a source object the target object of the previous, we can draw one (exactly one) arrow that is equivalent to the consecutive application of all $n$ arrows.

Composition of morphisms with many objects

If we carefully review the definition above, we can see that it can be reduced to multiple applications of the following formula: given 3 objects and 2 morphisms between them $f$ $g$ $h$, combining $h$ and $g$ and then combining the end result with $f$ should be the same as combining $h$ to the result of $g$ and $f$ (or simply $(h \circ g) \circ f = h \circ (g \circ f)$).

This formula can be expressed using the following diagram, which would only commute if the formula is true (given that all our category-theoretic diagrams are commutative, we can say, in such cases, that the formula and the diagram are equivalent).

Composition of morphisms with many objects

This formula (and the diagram) is the definition of a property called associativity. Being associative is required for functional composition to really be called functional composition (and thus for a category to really be called a category). It is also required for our diagrams to work, as diagrams can only represent associative structures (imagine if the diagram above would not commute, that would be super weird).

Associativity is not just about diagrams. For example, when we express relations using formulas, associativity just means that brackets don’t matter in our formulas (as evidenced by the definition $(h \circ g) \circ f = h \circ (g \circ f)$).

And it is not only about categories either, it is a property of many other operations on other types of objects as well e.g. if we look at numbers, we can see that the multiplication operation is associative e.g. $(1 \times 2) \times 3 = 1 \times (2 \times 3)$. While division is not $(1 / 2) / 3 \neq 1 / (2 / 3)$.

Commuting diagrams

The diagrams above use colours to illustrate the fact that the green morphism is equivalent to the other two (and not just some unrelated morphism), but in practice this notation is a little redundant, as the only reason to draw diagrams in the first place is to represent paths that are equivalent to each other. All other paths would just belong in different diagrams.

Composition of morphisms - a commuting diagram

As we mentioned briefly in the last chapter, all diagrams that are like that (ones in which any two paths between two objects are equivalent to one another) are called commutative diagrams (or diagrams that commute). All diagrams in this book (except the incorrect ones, nudge nudge) commute.

More formally, a commuting diagram is a diagram in which given two objects $a$ and $b$ and two sequences of morphisms between those two objects, we can say that those sequences are equivalent.

The diagram above is one of the simplest commuting diagrams.

NB: Despite the fact that all diagrams in books commute, in general, not all diagrams commute. That is, there are many morphisms with the same type signature that are not equivalent to one another.

Summary

For future reference, let’s restate what a category is:

A category is a collection of objects (we can think of them as points) and morphisms (or arrows) that go from one object to another, where:

Each object has to have the identity morphism.
There should be a way to compose two morphisms with an appropriate type signature into a third one, in a way that is associative.

This is it.

Addendum: Why are categories like that?

Why are categories defined by those two laws and not some other two (or one, three, four etc.). laws? From one standpoint, the answer to that seems obvious — we study categories because they work. I mean, look at how many applications there are… But at the same time, category theory is an abstract theory, so everything about it is kinda arbitrary: you can remove a law — and you get another theory that looks similar to category theory (although it might actually turn out to be quite different in practice). Or you can add one more law and get yet another theory (there are indeed such laws and such theories, and we will cover them later). So if this specific set of laws works better than any other, then this fact demands an explanation. Not a mathematical explanation (e.g. we cannot in any way prove that this theory is better than some other one), but an explanation nevertheless. What follows is my attempt to provide such an explanation, regarding the laws of identity and associativity.

Identity and isomorphisms

The reason the identity law is required is by far the more obvious one. Why do we need to have a morphism that does nothing? It’s because morphisms are the basic building blocks of our language, we need the identity morphism to be able to speak properly. For example, once we have the concept of identity morphism defined, we can define a category-theoretic definition of an isomorphism, based on it (which is important, because the concept of an isomorphism is very important for category theory).

As we said in the previous chapter, an isomorphism between two objects ($A$ and $B$) consists of two morphisms — ($A → B$ and $B → A$) such that their compositions are equivalent to the identity functions of the respective objects. Formally, objects $A$ and $B$ are isomorphic if there exist morphisms $f: A → B$ and $g: B → A$ such that $f \circ g = ID_{B}$ and $g \circ f = ID_{A}$.

And here is the same thing expressed with a commuting diagram.

Isomorphism

Like the previous one, the diagram expresses the same (simple) fact as the formula, namely that going from one object ($A$ or $B$) to the other and then back again to the starting object is the same as applying the identity morphism i.e. doing nothing.

Associativity and reductionism

If, in some cataclysm, all of scientific knowledge were to be destroyed, and only one sentence passed on to the next generations of creatures, what statement would contain the most information in the fewest words? I believe it is the atomic hypothesis (or the atomic fact, or whatever you wish to call it) that all things are made of atoms—little particles that move around in perpetual motion, attracting each other when they are a little distance apart, but repelling upon being squeezed into one another. In that one sentence, you will see, there is an enormous amount of information about the world, if just a little imagination and thinking are applied. — Richard Feynman

Associativity — what does it mean and why is it there? In order to tackle this question, we must first talk about another concept — the concept of reductionism:

Reductionism is the idea that the behaviour of complex phenomena can be understood in terms of a number of simpler and more fundamental phenomena. In other words, that things keep getting simpler and simpler as they get “smaller” (or when they are viewed from a lower level). An example of reductionism is the idea that the behaviour of matter can be understood completely by studying the behaviours of its constituents i.e. atoms (the word means “undividable”).

Whether the reductionist view is universally valid, i.e. whether it is possible to devise a theory of everything that describes the whole universe with a set of very simple laws, is a question over which we can argue until that universe’s inevitable collapse. What is certain, though, is that reductionism underpins all our understanding, especially when it comes to science and mathematics — each scientific discipline is based on a set of simple fundaments (e.g. elementary particles in particle physics, chemical elements in chemistry etc.) on which it builds its much more complex theories.

Commutativity

So, if this principle is so important, it would be beneficial to be able to formalize it (i.e. to translate it into mathematical language), and this is what we will try to do now. One way to state the principle of reductionism is to say that each thing is nothing but a sum of its parts i.e. if we combine the same set of parts, we always get the same result. To formalize that, we get a set of objects (balls) and a way to combine them (which we will denote with a dot).

So, if we have a given “recipe”, for example

Commutativity

Then, we would also have

Commutativity

Or quite simply

Commutativity

Incidentally, this is the definition of a mathematical law called commutativity.

A simple context where this law applies — the natural numbers are commutative under the operation of addiction, e.g. 1 + 2 = 2 + 1 (we will learn more about this in the chapter on groups).

Task 5: If our objects are sets, what set operations can play the part of the dot in this example (i.e. which ones are commutative)?

Associativity

Sometimes we observe phenomena that still can be represented as a combination of a given set of fundaments, but only when they are combined in a specific way (as opposed to any combination, as in commutative contexts) e.g. as any mechanic can confirm, a bicycle is indeed just the combination of wheels and frameset…

bikes are not commutative

…but that does not mean that every combination of the above parts constitutes a bicycle. For example, placing the front wheel of the bicycle on the rear end of the frame would result would not result in a working bicycle. And combining two wheels one with another is just not possible.

bikes are not commutative

And, to take a formal example, if function A can be combined with B to get C…

functions are not commutative

…would not automatically mean that B can be combined with A to get the same result

functions are not commutative - 2

Side note: composing any function with any other is only possible for functions that have the same set, both as source and target, but even then the end result would not always be the same.

functions are sometimes commutative - 2

Anyway, we determined that functional composition (as well as bike building) does not obey the law of commutativity i.e. not all functions (or bike parts) compose with all other ones. But still, we know that some of them do, i.e. if you align the function signatures (or in the case of the bicycle if you use the proper parts in the proper places) they would come together seamlessly. So, the order by which we combine the individual parts doesn’t matter for the final outcome e.g. when we are assembling a bicycle, it doesn’t matter if we attach the front wheel to the frame first, or the back wheel, the result will be the same.

bikes are associative

Similarly, when combining functions, each pair of functions can, at any time, be replaced by the function that we get by combining them.

A -> X . (X -> B . B -> C) = D, (A -> X . X -> B) . B -> C = D

In other words, all different ways of combining a given set of functions at the end converge in into one and the same result.

A . (B . C) = D, (A . B) . C = D

This fact is captured by a more restrictive version of commutativity, that we call associativity, which for functions, is usually formulated like this.

A -> X . (X -> B . B -> C) = (A -> X . X -> B) . B -> C

Or more generally (for any operation).

A . (B . C) = D, (A . B) . C = D

This is the essence of associativity — it gives us the ability to study complex phenomenon by zooming in on a part that you want to examine in a given moment, and looking at it in isolation.

Note that the operator we defined only allows for combining things in one dimension (you can attach a thing left and right, but not up or down). Later we will learn about an extension of the concept of a category theory (called monoidal category) that “supports” working in 2 dimensions.

Task 6: Actually, in this chapter, we already defined a case of 2-dimentional composition, only we didn’t say it is 2-dimentional composition. Did you see it?

Monoids etc

Since we are done with categories, let’s look at some other structures that are also interesting — monoids. Like categories, monoids/groups are abstract systems consisting of a set of elements and operations for manipulating these elements, however, the operations look different than the operations we have for categories. Let’s see them.

What are monoids

Monoids are simpler than categories. A monoid is defined by a collection/set of elements (called the monoid’s underlying set, together with a monoid operation — a rule for combining two elements that produces a third element one of the same kind.

Let’s take our familiar colorful balls.

Balls

We can define a monoid based on this set by specifying an operation for “combining” two balls into one. An example of such an operation would be blending the colours of the balls as if we are mixing paint.

An operation for combining balls

You can probably think of other ways to define a similar operation. This will help you realize that there can be many ways to create a monoid from a given set of set elements i.e. the monoid is not the set itself, it is the set together with the operation.

Associativity

The monoid operation should, like functional composition, be associative i.e. the way in which elements are grouped when applying the operation does not make any difference.

Associativity in the color mixing operation

When an operation is associative, this means we can use all kinds of algebraic operations to any sequence of terms (or in other words to apply equation reasoning), like for example we can replace any element with a set of elements from which it is composed, or add a term that is present at both sides of an equation and retain the equality of the existing terms.

Associativity in the color mixing operation

The identity element

Actually, not any (associative) operation for combining elements makes for a monoid (it makes for a semigroup, which is also a thing, but that’s a separate topic). To be a monoid, a set must feature what is called an identity element of the operation, a concept of which you are already familiar from both sets and categories — it is an element that when combined with any other element gives back that same element (not the identity but the other one). Or simply $x • i = x$ and $i • x = x$ for any $x$.

In the case of our color-mixing monoid, the identity element is the white ball (or perhaps a transparent one, if we have one).

The identity element of the color-mixing monoid

As you probably remember from the last chapter, functional composition is also associative and it also contains an identity element, so you might start suspecting that it forms a monoid in some way. This is indeed the case, but with one caveat, which we will talk about later.

Basic monoids

To keep the suspense, before we discuss the relationship between monoids and categories, we are going through see some simple examples of monoids.

Monoids from numbers

Mathematics is not only about numbers, however, numbers do tend to pop up in most of its areas, and monoids are no exception. The set of natural numbers $\mathbb{N}$ ($\{ 0, 1, 2, 3 ...\}$) forms a monoid when combined with the all too familiar operation of addition (or under addition as it is traditionally said). This monoid is denoted $\left< \mathbb{N},+ \right>$ (in general, all monoids are denoted by specifying the set and the operation, enclosed in angle brackets).

The monoid of numbers under addition

If you see a $1 + 1 = 2$ in your textbook you know you are either reading something very advanced, or very simple, although I am not really sure which of the two applies in the present case.

Anyways, the natural numbers also form a monoid under multiplication as well.

The monoid of numbers under multiplication

Task 1: Which are the identity elements of those monoids?

Task 2: Go through other mathematical operations and verify that they are monoidal.

Task 3: The natural numbers form a monoid under multiplication, but not a group. Find out why.

Monoids from boolean algebra

Thinking about operations that we covered, we may remember the boolean operations and and or. Both of them form monoids, which operate on the set, consisting of just two values ${ True, False }$.

Task 4: Prove that AND $\land$ is associative by expanding the formula $(A \land B) \land C = A \land (B \land C)$ with all possible values. Do the same for or.

Task 5: Which are the identity elements of the and and or operations?

Monoid operations in terms of set theory

We now know what the monoid operation is, and we even saw some simple examples. However, we never defined the monoid rule/operation formally i.e. using the language of set theory with which we defined everything else. Can we do that? Of course we can — everything can be defined in terms of sets.

We said that a monoid consists of two things: a set (let’s call it $A$), and a monoid operation that acts on that set. Since $A$ is already defined in set theory (because it is just a set), all we have to do is define the monoid operation.

Defining the operation is not hard at all. Actually, we have already done it for the operation $+$ — in Chapter 2, we said that addition can be represented in set theory as a function that accepts a product of two numbers and returns a number (formally $+: \mathbb{Z} \times \mathbb{Z} \to \mathbb{Z}$).

The plus operation as a function

Every other monoid operation can also be represented in the same way — as a function that takes a pair of elements from the monoid’s set and returns one other monoid element.

The color-mixing operation as a function

Formally, we can define a monoid from any set $A$, by defining an (associative) function with type signature $A \times A \to A$. That’s it. Or to be precise, that is one way to define the monoid operation. And there is another way, which we will see next. Before that, let’s examine some other types of structures.

Other monoid-like objects

Monoid operations obey two laws — they are associative and there exists an identity element. In some cases, we come across operations that also obey other laws that are also interesting. Imposing more (or less) rules to the way in which objects are combined results in the definition of other monoid-like structures.

Commutative (abelian) monoids

Looking at the monoid laws and the examples we gave so far, we observe that all of them obey one more rule (law) which we didn’t specify — the order in which the operations are applied is irrelevant to the end result.

Commutative monoid operation

Such operations (ones for which combining a given set of elements yields the same result no matter which one is first and which one is second) are called commutative operations. Monoids with operations that are commutative are called commutative monoids.

As we said, addition is commutative as well — it does not matter whether I have given you 1 apple and then 2 more, or if I have given you 2 first and then 1 more.

Commutative monoid operation

All monoids that we examined so far are also commutative. We will see some non-commutative ones later.

Groups

A group is a monoid such that for each of its elements, there is another element which is the so-called “inverse” of the first one where the element and its inverse cancel each other out when applied one after the other. Plain-English definitions like this make you appreciate mathematical formulas more — formally we say that for all elements $x$, there must exist $x’$ such that $x • x’ = i$ (where $i$ is the identity element).

If we view monoids as a means of modelling the effect of applying a set of (associative) actions, we use groups to model the effects of actions which are also reversible.

A nice example of a group, which is related to a monoid we covered, is the set of integers under addition — the operation is again ($+$), but the objects are the integers $\mathbb{Z}$, not the natural numbers $\mathbb{N}$ (so it’s not $\{ 0, 1, 2, 3 ...\}$, but $\{... -3, -2 -1, 0, 1, 2, 3 ...\}$). The negative numbers are added, as the natural numbers don’t have inverses. The inverse of each number is its opposite number (positive numbers’ inverse are negatives and vice versa).

In this instance, the above formula becomes $x + (-x) = 0$

The study of groups is a field that is much bigger than the theory of monoids (and perhaps bigger than category theory itself). And one of its biggest branches is the study of “symmetry groups” which we will look into next.

Summary

Before we move on — the algebraic structures that we saw above can be summarized based on the laws that define them in this table:

	Semigroups	Monoids	Groups
Associativity	X	X	X
Identity		X	X
Invertability			X

And now on to symmetry groups.

Symmetry groups and group classifications

An interesting kind of groups/monoids are the groups of symmetries of geometric figures. Given some geometric figure, a symmetry is an action after which the figure is not displaced (e.g. it can fit into the same mold that it fitted before the action was applied).

We won’t use the balls this time, because in terms of symmetries, they have just one position and hence just one action — the identity action (which is its own reverse, by the way).

Instead, let’s take this triangle, which, for our purposes, is the same as any other triangle. We are not interested in the triangle itself, but in its rotations. The only thing we need to make ourselves believe is that this is an “unrotated” triangle i.e. the one which represents the identity rotation.

A triangle

Groups of rotations

Let’s first review the group of ways in which we can rotate our triangle i.e. its rotation group. A geometric figure can be rotated without displacement in positions equal to the number of its sides, so, for our triangle, there are 3 positions.

The group of rotations in a triangle

Connecting the dots (or the triangles in this case) shows us that there are just 3 possible rotations that get us from any state of the triangle to any other one — a 120-degree rotation (i.e. flipping the triangle one time) and a 240-degree rotation (i.e. flipping it twice, or equivalently, flipping it once in the opposite direction) and the identity action of 0-degree rotation.

The group of rotations in a triangle

The rotations of a triangle form a monoid — the rotations are objects (of which the zero-degree rotation is the identity) and the monoid operation which combines two rotations into one is just the operation of performing the first rotation and then performing the second one.

NB: Note once again that the elements in the group are the rotations, not the triangles themselves, actually the group has nothing to do with triangles, as we shall see later.

Cyclic groups/monoids

The diagram that enumerates all the rotations of a more complex geometrical figure looks quite messy at first.

The group of rotations in a more complex figure

But it gets much simpler to grasp if we notice the following: although our group has many rotations, and there are more still for figures with more sides (if I am not mistaken, the number of rotations is equal to the number of the sides), all those rotations can be reduced to the repetitive application of just one rotation, (for example, the 120-degree rotation for triangles and the 45-degree rotation for octagons). Let’s make up a symbol for this rotation.

The group of rotations in a triangle

Symmetry groups that have such “main” rotation, and in general, groups and monoids that have an object that is capable of generating all other objects by its repeated application, are called cyclic groups. The “main” rotation is called the group’s generator.

All rotation groups/monoids are cyclic groups. Another example of a cyclic monoid is, yes, the natural numbers under addition, with $+1$ as the generator.

The monoid of natural numbers under addition

The group of integers under addition is cyclic too — here we can use $+1$ or $-1$ as the generator (as whichever of the two we choose, we would get the other one by applying the inverse wall).

The group of integers under addition

Wait, how can this be a cyclic group when there are clearly no cycles? This is because the integers are an infinite cyclic group.

A number-based example of a finite cyclic group is the group of integers under modular arithmetic (sometimes called “clock arithmetic”). Modular arithmetic’s operation is based on a number called the modulus (let’s take $12$ for example). In it, each number is mapped to the remainder of the integer division of that number and the modulus.

For example: $1 \pmod{12} = 1$ (because $1/12 = 0$ with $1$ remainder) $2 \pmod{12} = 2$ etc.

But $13 \pmod{12} = 1$ (as $13/12 = 1$ with $1$ remainder) $14 \pmod{12} = 2$, $15 \pmod{12} = 3$ etc.

In effect, numbers “wrap around” forming a group with as many elements as the modulus number. For example, a group representation of modular arithmetic with modulus $3$ has 3 elements.

The group of numbers under addition

All cyclic groups that have the same number of elements (or that are of the same order) are isomorphic to each other (careful readers might notice that we haven’t yet defined what a group isomorphism is, even more careful readers might already have an idea about what it is).

For example, the group of rotations of the triangle is isomorphic to the group of integers under the addition with modulo $3$.

The group of numbers under addition

All cyclic groups are commutative (or “abelian” as they are also called).

Task 6: Show that there are no other groups with 3 objects, other than $Z_3$.

There are abelian groups that are not cyclic, but, as we shall see below, the concepts of cyclic groups and abelian groups are deeply related.

Group isomorphisms

We already mentioned group isomorphisms, but we didn’t define what they are. Let’s do that now — an isomorphism between two groups is an isomorphism ($f$) between their respective sets of elements, such that for any $a$ and $b$ we have $f(a \circ b) = f(a) \circ f(b)$. Visually, the diagrams of isomorphic groups have the same structure.

Group isomorphism between different representations of S3

As in category theory, in group theory isomorphic groups are considered instances of one and the same group. For example, the one above is called $Z_3$.

Finite groups

Like with sets, the concept of an isomorphism in group theory allows us to identify common finite groups.

The smallest group is just the trivial group $Z_1$ that has just one element.

The smallest group

The smallest non-trivial group is the group $Z_2$ which has two elements.

The smallest non-trivial group

$Z_2$ is also known as the boolean group, due to the fact that it is isomorphic to the ${ True, False }$ set under the operation that negates a given value.

Like $Z_3$, $Z_1$ and $Z_2$ are cyclic.

Group/monoid products

We already saw a lot of abelian groups that are also cyclic, but we didn’t see any abelian groups that are not cyclic. So let’s examine what these look like. This time, instead of looking into individual examples, we will present a general way for producing abelian non-cyclic groups from cyclic ones — it is by uniting them by using group product.

Given any two groups, we can combine them to create a third group, comprised of all possible pairs of elements from the two groups and of the sum of all their actions.

Let’s see how the resulting group looks after taking the product of the following two groups (which, having just two elements and one operation, are both isomorphic to $Z2$). To make it easier to imagine them, we can think of the first one as based on the vertical reflection of a figure and the second, as the horizontal reflection.

Two trivial groups

(again we have to pretend that the left versions of the figures are the “unflipped” versions, while the right ones are flipped (although it can work the other way around too))

We get the set of elements of the new group by taking the Cartesian product of the set of elements of the first group and the set of elements of the second.

Two trivial groups

And the actions of a product group are comprised of the actions of the first group, combined with the actions of the second, where each action is applied only to the element that is a member of its original group, leaving the other element unchanged.

Klein four

The product of the two groups presented is called the Klein four-group and it is the simplest non-cyclic Abelian group.

Another way to present the Klein four-group is the group of symmetries of a non-square rectangle.

Klein four

Task 7: Show that the two representations are isomorphic.

Here are some examples of how elements of the Klein four-group are combined.

Klein four

(i.e. horizontal/vertical rotations cancel each other out, while a horizontal rotation doesn’t cancel out a vertical one.)

The Klein four-group is non-cyclic (because there are not one, but two generators) — vertical and horizontal spin. It is, however, still abelian, because the ordering of the actions still does not matter for the end result. Actually, the Klein four-group is the smallest non-cyclic group.

Cyclic product groups

In the previous chapter, we saw one non-cyclic product group (the Klein four-group), which was a product of cyclic groups. Most product groups (even the product of cyclic groups) would be non-cyclic, because it would have the generators of both groups that comprise it, i.e. even if the two original groups are cyclic and thus have 1 generator each, their product would still have 2 generators. But the product of two cyclic groups would still be cyclic if the number of elements of those groups (their orders) don’t have a common divisor other than 1 (i.e. if they are relatively prime numbers).

So, if you combine two groups with orders that have some common divisor (as $2$ and $2$, which are both divided by 2), then, their product would not be cyclic. But, if you combine two groups with orders that are relatively prime, (like $2$ and $3$) you would get a cyclic group.

Furthermore, the product of two relatively prime groups would be isomorphic to a cyclic group of the same order, as the product of the orders of its components e.g. the product of $Z_3$ and $Z_2$ is isomorphic to the group $Z_6$ ($Z_3 \times Z_2 \cong Z_6$)

Chinese reminder theorem

This is a consequence of an ancient result, known as the Chinese Remainder theorem.

Abelian product groups

Product groups are abelian, provided that the groups that form them are abelian. We can see that this is true by noticing that, although there is more than one generator, each acts only on its own part of the group, and so doesn’t interfere with any others.

Fundamental theorem of Finite Abelian groups

Products provide one way to create non-cyclic abelian groups — by creating a product of two or more cyclic groups. The fundamental theory of finite abelian groups is a result that tells us that this is the only way to produce non-cyclic abelian groups i.e.

All abelian groups are either cyclic or products of cyclic groups.

We can use this law to gain an intuitive understanding of what Abelian groups are, but also to test whether a given group can be broken down to a product of more elementary groups.

Dihedral groups

Now, let’s finally examine a non-commutative group — the group of rotations and reflections of a given geometrical figure. It is the same as the last one, but here besides the rotation action that we already saw (and its composite actions), we have the action of flipping the figure vertically, an operation which results in its mirror image:

Reflection of a triangle

Those two operations and their composite results in a group called $Dih3$ that is not abelian (and is furthermore the smallest non-abelian group).

The group of rotations and reflections in a triangle

Task 8: Prove that this group is indeed not abelian.

Task 9: Besides having two main actions, what is the defining factor that makes this and any other group non-abelian?

Groups that represent the set of rotations and reflections of any 2D shape are called dihedral groups.

Groups/monoids categorically

Now it’s the place for the grand reveal — groups/monoids are categories. More precisely, monoids are a specific type of categories, (and groups too).

This is not to say that the definition that we examined, where we describe them as sets and binary operations, is a lie. It just says that there is an alternative, categorical definition, which is equivalent to it. Let’s dive in.

Monoid elements as objects

When we defined monoids, we presented their elements as objects and their operation — as a function/morphism that converts two objects into a third one. Then, we introduced a way for representing such operations using set theory — as functions that take a pair of elements from the monoid’s set and return one other monoid element.

The color-mixing operation as a function

Under this correspondence, this specific mixing in the color-mixing monoid…

Monoid operation

…corresponds to this specific point in the above function (point, being a mapping of a specific element of the set).

Monoid operations as functions from pair of objects to a third object: (A X B) -> C)

However, this is not the only way to represent multi-argument functions set-theoretically — there is another, equally interesting way, that doesn’t rely on any data structures, but only on functions.

Monoid elements as morphisms

We saw that for some groups, like the groups of symmetries and rotations, the group elements can be understood not as objects but as actions. This is actually true for all other groups as well, e.g. the red ball in our color-blending monoid can be seen as the action of adding the color red to the mix, the number $2$ in the monoid of addition can be seen as the operation $+2$ etc.

Formally, any function that takes a pair of objects, can be transformed to a function that takes one object and returns a function that takes the other one and returns the result.

Monoid operations as functions (A X B) -> C) = A -> B -> C

This transformation is called currying in the name of Haskell Curry, although it was invented some years earlier by Moses Schönfinkel (Schönfinkelisation didn’t stick out for some reason). So, Schönfinkel discovered that the following two expressions are isomorphic.

The equivalence of curried and uncurried functions

Let’s take a step back and examine the groups/monoids that we covered so far in light of this equivalence e.g. let’s examine the symmetric group $Z_3$:

The group of rotations in a triangle - group notation

The elements of this group can be viewed as functions which take a figure and rotate it a given amount of degrees.

The group of rotations in a triangle - set notation

And, we can represent the group operation itself as functional composition.

The group of rotations in a triangle - set notation and normal notation

Formally, the 3 elements of $Z_3$ can be seen as 3 bijective (invertible) functions from a set of 3 elements to itself (in group-theoretic context, these kinds of functions are called permutations, by the way).

We can do the same for the addition monoid — numbers can be seen not as quantities (as in two apples, two oranges etc.), but as operations, (e.g. as the action of adding two to a given quantity).

Formally, the operation of the addition monoid, that we saw above has the following type signature.

$+: \mathbb{Z} \times \mathbb{Z} \to \mathbb{Z}$

Because of the isomorphism we presented above, this function is equivalent to the following function.

$+: \mathbb{Z} \to (\mathbb{Z} \to \mathbb{Z})$

When we apply an element of the monoid to that function (say $2$), the result is the function $+2$ that adds 2 to a given number.

$+2: \mathbb{Z} \to \mathbb{Z}$

And because the monoid operation is always given in the context of a given monoid, we can view the element $2$ and the function $+2$ as equivalent in the context of the monoid.

$2 \cong +2$

In other words, in addition to representing the monoid elements in the set as objects that are combined using a function, we can represent them as functions themselves.

Monoid operations as functional composition

As we said, when monoid elements are represented as functions, the monoid operation is represented as functional composition. The functions that represent the monoid elements have the same set as source and target, or the same signature, as we say (formally, they are of the type $A \to A$ for some $A$). Because of that, they all can be composed with one another, and the result of such compositions would also have the same signature.

The group of rotations in a triangle - set notation

This is true for all monoids, e.g. number functions can also be combined using functional composition.

$+2 \circ +3 \cong +5$

So, basically, the functions that represent the elements of a monoid also form a monoid, under the operation of functional composition (and the functions that represent the elements that form a group also form a group).

Task 10: Which are the identity elements of function groups?

Task 11: Show that the functions representing inverse group elements are also inverse.

Interlude: Currying

Take any function that accepts a pair of arguments of a given type (say $A$ and $B$) and maps them into some result of type $C$, so $A\times B\to C$ (in the case of monoids, the signature would be $A \times A \to A$, as all monoid objects are of the same type).

Schönfinkel showed that for each such function, there exists a function that maps the first of the two arguments (i.e. from $A$) to another function that maps the second argument to the final result (i.e. $B \to C$). So $A\to (B \to C)$, and vice versa.

The equivalence of curried and uncurried functions

In programming, currying is achieved by a higher-order function. Here is how such a function might be implemented.

const curry = <A, B, C> (f:(a:A, b:B) => C) => (a:A) => (b:B) => f(a, b)

And equally important is the opposite function, which maps a curried function to a multi-argument one, which is known as uncurry.

const uncurry = <A, B, C> (f:(a:A) => (b:B) => C) => (a:A, b:B) => f(a)(b)

There is a lot to say about these two functions, starting from the fact that their existence gives rise to an interesting relationship between the concept of a product and the concept of a morphism in category theory, called an adjunction. But we will cover this later. For now, we are interested in the fact the two function representations are isomorphic, formally $A\times B\to C\cong A\to B \to C$.

By the way, this isomorphism can be represented in terms of programming as well. It is equivalent to the statement that the following function always returns true for any arguments,

(...args) => uncurry(curry(f))(...args) === f(...args)

This is one part of the isomorphism, the other part is the equivalent function for curried functions.

Task 12: Write the other part of the isomorphism.

Cayley’s theorem

Once we learn how (using currying) to represent the elements of any monoid as permutations that also form a monoid, it isn’t too surprising to learn that this constructed permutation monoid is isomorphic to the one from which it was constructed. This is a result known as the Cayley’s theorem:

Any group is isomorphic to its corresponding permutation group.

Formally, if we use $Perm$ to denote the permutation group then $Perm(A) \cong A$ for any $A$.

The group of rotations in a triangle --- set notation and normal notation

Or in other words, representing the elements of a monoid/group as permutations actually yields a representation of the monoid itself (sometimes called its standard representation).

Cayley’s theorem may not seem very impressive, but that only shows how influential it has been as a result (and how much we learned).

Monoids as categories

We saw that converting the monoid’s elements to actions/functions yields an accurate representation of the monoid in terms of sets and functions.

The group of rotations in a triangle - set notation and normal notation

However, it seems that the set part of the structure in this representation is kinda redundant — you have the same set everywhere — so, it would do good if we can simplify it. And we can do that by depicting it as an external (categorical) diagram, like this one.

The group of rotations in a triangle - categorical notation

But wait, if the monoids’ underlying sets correspond to objects in category theory, then the corresponding category would have just one object. And so the correct representation would involve just one point from which all arrows come and to which they go.

The group of rotations in a triangle - categorical notation

The only difference between different monoids would be the number of morphisms that they have and the relationship between them.

The intuition behind this representation from a category-theoretic standpoint is encompassed by the law of closure that monoid and group operations have and that categories lack — it is the law stating that applying the operation (functional composition) on any two objects always yields the same object, e.g. no matter how you flip a triangle, you still get a triangle. As Tom Lehrer sings, “Try as you may, you just can’t get away from mathematics”. In the case of monoids, we can’t get away from the object.

	Categories	Monoids	Groups
Associativity	X	X	X
Identity	X	X	X
Invertibility			X
Closure		X	X

When we view a monoid as a category, this law says that all morphisms in the category should be from one object to itself - a monoid, any monoid, can be seen as a category with one object. The converse is also true: any category with one object can be seen as a monoid.

Let’s elaborate on this thought by reviewing the definition of a category from chapter 2.

A category is a collection of objects (we can think of them as points) and morphisms (arrows) that go from one object to another, where:

Each object has to have an identity morphism.

There should be a way to compose two morphisms with an appropriate type signature into a third one in a way that is associative.

Aside from the little-confusing fact that monoid objects are morphisms when viewed categorically, this describes exactly what monoids are.

Categories have an identity morphism for each object, so for categories with just one object, there should also be exactly one identity morphism. And monoids do have an identity object, which when viewed categorically corresponds to that identity morphism.

Categories provide a way to compose two morphisms with an appropriate type signature, and for categories with one object, this means that all morphisms should be composable with one another. And the monoid operation does exactly that — given any two objects (or two morphisms, if we use the categorical terminology), it creates a third.

Philosophically, defining a monoid as a one-object category corresponds to the view of monoids as a model of how a set of (associative) actions that are performed on a given object alter its state. Provided that the object’s state is determined solely by the actions that are performed on it, we can leave it out of the equation and concentrate on how the actions are combined. And as per usual, the actions (and elements) can be anything, from mixing colors, to adding quantities to a given set of things etc.

Group/monoid presentations

When we view cyclic groups/monoids as categories, we would see that they correspond to categories that (besides having just one object) also have just one morphism (which, as we said, is called a generator) along with the morphisms that are created when this morphism is composed with itself. In fact, the infinite cyclic monoid (which is isomorphic to the natural numbers), can be completely described by this simple definition.

Presentation of an infinite cyclic monoid

This is so because applying the generator again and again yields all elements of the infinite cyclic group. Specifically, if we view the generator as the action $+1$ then we get the natural numbers.

Presentation of an infinite cyclic monoid

Finite cyclic groups/monoids are the same, except that their definition contains an additional law, stating that that once you compose the generator with itself $n$ number of times, you get identity morphism. For the cyclic group $Z_3$ (which can be visualized as the group of triangle rotations), this law states that composing the generator with itself $3$ times yields the identity morphism.

Presentation of a finite cyclic monoid

Composing the group generator with itself, and then applying the law, yields the three morphisms of $Z_3$.

Presentation of a finite cyclic monoid

We can represent product groups this way too. Let’s take Klein four-group as an example, The Klein four-group has two generators that it inherits from the groups that form it (which we considered as vertical and horizontal rotation of a non-square rectangle) each of which comes with one law.

Presentation of Klein four

To make the representation complete, we add the law for combining the two generators.

Presentation of Klein four - third law

And then, if we start applying the two generators and follow the laws, we get the four elements.

The elements of Klein four

The set of generators and laws that defines a given group is called the presentation of a group. Every group has a presentation.

Free monoids

We saw how picking a different selection of laws gives rise to various types of monoids. But what monoids would we get if we pick no laws at all? These monoids (we get a different one depending on the set we pick) are called free monoids. The word “free” is used in the sense that once you have the set, you can upgrade it to a monoid for free (i.e. without having to define anything else).

If you revisit the previous section you will notice that we already saw one such monoid. The free monoid with just one generator is isomorphic to the monoid of natural numbers.

The free monoid with one generator

We can make a free monoid from the set of colorful balls — the monoid’s elements would be sequences of all possible combinations of the balls.

The free monoid with the set of balls as a generators

The free monoid is a special one — each element of the free monoid over a given set can be converted to a corresponding element in any other monoid that uses the same set of generators by just applying the monoid’s laws. For example, here is how the elements above would look if we apply the laws of the color-mixing monoid.

Converting the elements of the free monoid to the elements of the color-mixing monoid

Task 14: Write up the laws of the color-mixing monoid.

If we put on our programmers’ hat, we will see that the type of the free monoid under the set of generators T (which we can denote as FreeMonoid<T>) is isomorphic to the type List<T> (or Array<T>, if you prefer) and that the intuition behind the special property that we described above is actually very simple: keeping objects in a list allows you to convert them to any other structure i.e. when we want to perform some manipulation on a bunch of objects, but we don’t know exactly what this manipulation is, we just keep a list of those objects until it’s time to do it.

While the intuition behind free monoids seems simple enough, the formal definition is not easily written… yet, simply because we have to cover more stuff.

We understand that being the most general of all monoids for a given set of generators, a free monoid can be converted to all of them. i.e. there exists a function from it to all of them. But what kind of function would that be? Tune in after a few chapters to find out.

Orders

Given a set of objects, there can be numerous criteria, based on which to order them (depending on the objects themselves) — size, weight, age, alphabetical order etc.

However, currently we are not interested in the criteria that we can use to order objects, but in the nature of the relationships that define order. Of which there can be several types as well.

Mathematically, the order as a construct is represented (much like a monoid) by two components.

One is a set of things (e.g. colorful balls) which we sometimes call the order’s underlying set.

Balls

And the other is a binary relation between these things, which are often represented as arrows.

Binary relation

Not all binary relationships are orders — only ones that fit certain criteria that we are going to examine as we review the different types of order.

Linear order

Let’s start with an example — the most straightforward type of order that you think of is linear order i.e. one in which every object has its place depending on every other object. In this case the ordering criteria is completely deterministic and leaves no room for ambiguity in terms of which element comes before which. For example, order of colors, sorted by the length of their light-waves (or by how they appear in the rainbow).

Linear order

Using set theory, we can represent this order, as well as any other order, as a cartesian products of the order’s underlying set with itself.

Binary relation as a product

And in programming, orders are defined by providing a function which, given two objects, tells us which one of them is “bigger” (comes first) and which one is “smaller”. It isn’t hard to see that this function is actually a definition of a cartesian product.

[1, 3, 2].sort((a, b) => { 
  if (a > b) {
    return true 
  } else {
    return false
  } 
})

However (this is where it gets interesting) not all such functions (and not all cartesian products) define orders. To really define an order (e.g. give the same output every time, independent of how the objects were shuffled initially), functions have to obey several rules.

Incidentally, (or rather not incidentally at all), these rules are nearly equivalent to the mathematical laws that define the criteria of the order relationship i.e. those are the rules that define which element can point to which. Let’s review them.

Reflexivity

Let’s get the most boring law out of the way — each object has to be bigger or equal to itself, or $a ≤ a$ for all $a$ (the relationship between elements in an order is commonly denoted as $≤$ in formulas, but it can also be represented with an arrow from first object to the second.)

Reflexivity

There is no special reason for this law to exist, except that the “base case” should be covered somehow.

We can formulate it the opposite way too and say that each object should not have the relationship to itself, in which case we would have a relation than resembles bigger than, as opposed to bigger or equal to and a slightly different type of order, sometimes called a strict order.

Transitivity

The second law is maybe the least obvious, (but probably the most essential) — it states that if object $a$ is bigger than object $b$, it is automatically bigger than all objects that are smaller than $b$ or $a ≤ b \land b ≤ c \to a ≤ c$.

Transitivity

This is the law that to a large extend defines what an order is: if I am better at playing soccer than my grandmother, then I would also be better at it than my grandmother’s friend, whom she beats, otherwise I wouldn’t really be better than her.

Antisymmetry

The third law is called antisymmetry. It states that the function that defines the order should not give contradictory results (or in other words you have $x ≤ y$ and $y ≤ x$ only if $x = y$).

antisymmetry

It also means that no ties are permitted — either I am better than my grandmother at soccer or she is better at it than me.

Totality

The last law is called totality (or connexity) and it mandates that all elements that belong to the order should be comparable ($a ≤ b \lor b ≤ a$). That is, for any two elements, one would always be “bigger” than the other.

By the way, this law makes the reflexivity law redundant, as reflexivity is just a special case of totality when $a$ and $b$ are one and the same object, but I still want to present it for reasons that will become apparent soon.

connexity

Actually, here are the reasons: this law does not look so “set in stone” as the rest of them i.e. we can probably think of some situations in which it does not apply. For example, if we aim to order all people based on soccer skills there are many ways in which we can rank a person compared to their friends their friend’s friends etc. but there isn’t a way to order groups of people who never played with one another.

Orders, like the order people based on their soccer skills, that don’t follow the totality law are called partial orders, (and linear orders are also called total orders.)

Task 1: Previously, we covered a relation that is pretty similar to this. Do you remember it? What is the difference?

Task 2: Think about some orders that you know about and figure out whether they are partial or total.

Partial orders are actually much more interesting than linear/total orders. But before we dive into them, let’s say a few things about numbers.

The order of natural numbers

Natural numbers form a linear order under the operation bigger or equal to (the symbol of which we have been using in our formulas.)

numbers

In many ways, numbers are the quintessential order — every finite order of objects is isomorphic to a subset of the order of numbers, as we can map the first element of any order to the number $1$, the second one to the number $2$ etc (and we can do the opposite operation as well).

If we think about it, this isomorphism is actually closer to the everyday notion of a linear order, than the one defined by the laws — when most people think of order, they aren’t thinking of a transitive, antisymmetric and total relation, but are rather thinking about criteria based on which they can decide which object comes first, which comes second etc. So it’s important to notice that the two are equivalent.

Linear order isomorphisms

From the fact that any finite order of objects is isomorphic to the natural numbers, it also follows that all linear orders of the same magnitude are isomorphic to one another.

So, the linear order is simple, but it is also (and I think that this isomorphism proves it) the most boring order ever, especially when looked from a category-theoretic viewpoint — all finite linear orders (and most infinite ones) are just isomorphic to the natural numbers and so all of their diagrams look the same way.

Linear order (general)

However, this is not the case with partial orders that we will look into next.

Partial order

Like a linear order, a partial order consists of a set plus a relation, with the only difference that, although it still obeys the reflexive, transitive and the antisymmetric laws, the relation does not obey the law of totality, that is, not all of the sets elements are necessarily ordered. I say “necessarily” because even if all elements are ordered, it is still a partial order (just as a group is still a monoid) — all linear orders are also partial orders, but not the other way around. We can even create an order of orders, based on which is more general.

Partial orders are also related to the concept of an equivalence relations that we covered in chapter 1, except that symmetry law is replaced with antisymmetry.

If we revisit the example of the soccer players rank list, we can see that the first version that includes just myself, my grandmother and her friend is a linear order.

Linear soccer player order

However, including this other person whom none of us played yet, makes the hierarchy non-linear i.e. a partial order.

Soccer player order - leftover element

This is the main difference between partial and total orders — partial orders cannot provide us with a definite answer of the question who is better than who. But sometimes this is what we need — in sports, as well as in other domains, there isn’t always an appropriate way to rate people linearly.

Chains

Before, we said that all linear orders can be represented by the same chain-like diagram, we can reverse this statement and say that all diagrams that look something different than the said diagram represent partial orders. An example of this is a partial order that contains a bunch of linearly-ordered subsets, e.g. in our soccer example we can have separate groups of friends who play together and are ranked with each other, but not with anyone from other groups.

Soccer order - two hierarchies

The different linear orders that make up the partial order are called chains. There are two chains in this diagram $m \to g \to f$ and $d \to o$.

The chains in an order don’t have to be completely disconnected from each other in order for it to be partial. They can be connected as long as the connections are not all one-to-one i.e. ones when the last element from one chain is connected to the first element of the other one (this would effectively unite them into one chain.)

Soccer order - two hierarchies and a join

The above set is not linearly-ordered. This is because, although we know that $d ≤ g$ and that $f ≤ g$, the relationship between $d$ and $f$ is not known — any element can be bigger than the other one.

Greatest and least objects

Although partial orders don’t give us a definitive answer to “Who is better than who?”, some of them still can give us an answer to the more important question (in sports, as well as in other domains), namely “Who is number one?” i.e. who is the champion, the player who is better than anyone else. Or, more generally, the element that is bigger than all other elements.

We call such element the greatest element. Some (not all) partial orders do have such element — in our last diagram $m$ is the greatest element, in this diagram, the green element is the biggest one.

Join diagram with one more element

Sometimes we have more than one elements that are bigger than all other elements, in this case none of them is the greatest.

A diagram with no greatest element

In addition to the greatest element, a partial order may also have a least (smallest) element, which is defined in the same way.

Joins

The least upper bound of two elements that are connected as part of an order is called the join of these elements, e.g. the green element is a join of the other two.

Join

There can be multiple elements bigger than $a$ and $b$ (all elements that are bigger than $c$ are also bigger than $a$ and $b$), but only one of them is a join. Formally, the join of $a$ and $b$ is defined as the smallest element that is bigger than both $a$ and $b$ (i.e. smallest $c$ for which $a ≤ c$, and $b ≤ c$.)

Join with other elements

Given any two elements in which one is bigger than the other (e.g. $a ≤ b$), the join is the bigger element (in this case $b$).

In a linear orders, the join of any two elements is just the bigger element.

Like with the greatest element, if two elements have several upper bounds that are equally big, then none of them is a join (a join must be unique).

A non-join diagram

If, however, one of those elements is established as smaller than the rest of them, it immediately qualifies.

A join diagram

Task 3: Which concept in category theory reminds you of joins?

Meets

Given two elements, the biggest element that is smaller than both of them is called the meet of these elements.

Meet

The same rules as for the joins apply.

Hasse diagrams

The diagrams that we use in this section are called “Hasse diagrams” and they work much like our usual diagrams, however they have an additional rule that is followed — “bigger” elements are always positioned above smaller ones.

In terms of arrows, the rule means that if you add an arrow to a point, the point to which the arrow points must always be above the one from which it points.

A join diagram

This arrangement allows us to compare any two points by just seeing which one is above the other e.g. we can determine the join of two elements, by just identifying the elements that they connect to and see which one is lowest.

Color order

We all know many examples of total orders (any form of chart or ranking is a total order), but there are probably not so many obvious examples of partial orders that we can think of off the top of our head. So let’s see some. This will gives us some context, and will help us understand what joins are.

To stay true to our form, let’s revisit our color-mixing monoid and create a color-mixing partial order in which all colors point to colors that contain them.

A color mixing poset

If you go through it, you will notice that the join of any two colors is the color that they make up when mixed. Nice, right?

Join in a color mixing poset

Numbers by division

We saw that when we order numbers by “bigger or equal to”, they form a linear order (the linear order even.) But numbers can also form a partial order, for example they form a partial order if we order them by which divides which, i.e. if $a$ divides $b$, then $a$ is before $b$ e.g. because $2 \times 5 = 10$, $2$ and $5$ come before $10$ (but $3$, for example, does not come before $10$.)

Divides poset

And it so happens (actually for very good reason) that the join operation again corresponds to an operation that is relevant in the context of the objects — the join of two numbers in this partial order is their least common multiple.

And the meet (the opposite of join) of two numbers is their greatest common divisor.

Divides poset

Inclusion order

Given a collection of all possible sets containing a combination of a given set of elements…

A color mixing poset, ordered by inclusion

…we can define what is called the inclusion order of those sets, in which $a$ comes before $b$ if $a$ includes $b$, or in other words if $b$ is a subset of $a$.

A color mixing poset, ordered by inclusion

In this case the join operation of two sets is their union, and the meet operation is their set intersection.

This diagram might remind you of something — if we take the colors that are contained in each set and mix them into one color, we get the color-blending partial order that we saw earlier.

A color mixing poset, ordered by inclusion

The order example with the number dividers is also isomorphic to an inclusion order, namely the inclusion order of all possible sets of prime numbers, including repeating ones (or alternatively the set of all prime powers). This is confirmed by the fundamental theory of arithmetic, which states that every number can be written as a product of primes in exactly one way.

Divides poset

Order isomorphisms

We mentioned order isomorphisms several times already so this is about time to elaborate on what they are. Take the isomorphism between the number partial order and the prime inclusion order as an example. Like an isomorphism between any two sets, it is comprised of two functions:

One function from the prime inclusion order, to the number order (which in this case is just the multiplication of all the elements in the set)
One function from the number order to the prime inclusion order (which is an operation called prime factorization of a number, consisting of finding the set of prime numbers that result in that number when multiplied with one another).

Divides poset

An order isomorphism is essentially an isomorphism between the orders’ underlying sets (invertible function). However, besides their underlying sets, orders also have the arrows that connect them, so there is one more condition: in order for an invertible function to constitute an order isomorphism, it has to respect those arrows, in other words it should be order preserving. More specifically, applying this function (let’s call it $F$) to any two elements in one set ($a$ and $b$) should result in two elements that have the same corresponding order in the other set (so $a ≤ b$ if and only if $F(a) ≤ F(b)$). Birkhoff’s representation theorem —

So far, we saw two different partial orders, one based on color mixing, and one based on number division, that can be represented by the inclusion orders of all possible combinations of sets of some basic elements (the primary colors in the first case, and the prime numbers (or prime powers) in the second one.) Many other partial orders can be defined in this way. Which ones exactly, is a question that is answered by an amazing result called Birkhoff’s representation theorem. They are the finite partial orders that meet the following two criteria:

All elements have joins and meets.
Those meet and join operations distribute over one another, that is if we denote joins as meets as $∨$ or $∧$, then $x ∨ (y ∧ z) = (x ∨ y) ∧ (x ∨ z)$.

The partial orders that meet the first criteria are called lattices. The ones that meet the second one are called distributive lattices.

And the “prime” elements which we use to construct the inclusion order are the elements that are not the join of any other elements. They are also called join-irreducible elements.

By the way, the partial orders that are not distributive lattices are also isomorphic to inclusion orders, it is just that they are isomorphic to inclusion orders that do not contain all possible combinations of elements.

Lattices

We will now review the orders for which Birkhoff’s theorem applies i.e. the lattices. Lattices are partial orders, in which every two elements have a join and a meet. So every lattice is also partial order, but not every partial order is a lattice (we will see even more members of this hierarchy).

Most partial orders that are created based on some sort of rule are distributive lattices, like for example the partial orders from the previous section are also distributive lattices when they are drawn in full, for example the color-mixing order.

A color mixing lattice

Notice that we added the black ball at the top and the white one at the bottom. We did that because otherwise the top three elements wouldn’t have a join element, and the bottom three wouldn’t have a meet.

Bounded lattices

Our color-mixing lattice, has a greatest element (the black ball) and a least element (the white one). Lattices that have a least and greatest elements are called bounded lattices. It isn’t hard to see that all finite lattices are also bounded.

Task 4: Prove that all finite lattices are bounded.

Interlude: Formal concept analysis

In the previous section we (along with Christopher Alexander) argued that lattice-based hierarchies are “natural”, that is, they arise in nature. Now, we will see an interesting way to uncover such hierarchies, given a set of objects that share some attributes. This is an overview of a mathematical method, called formal context analysis.

The data structure that we will be analyzing, called formal context consists of 3 sets. Firstly, the set containing all objects that we will be analyzing (denoted as $G$).