I’ve always wondered how did we formally define the numbers starting from the ZFC (Zermelo-Fraenkel-Choice) set theory. However, I also knew that Zermelo’s axioms sets were sufficient to define “arithmetic”, as it is at least enough to prove its incompleteness (Gödel’s incompleteness theorem).

So in this article, I want to write about how I managed to formally prove any of the usual number properties, based only on Zermelo’s axioms.

*Disclaimer:* I have never formally learned these reasoning, so what I say may be a little off. Also, while I tried to be as beginner-friendly as possible, I am still redefining everything, so it may be a little abrupt for a beginner (also, the fact that I try to be as formal as possible may not help).

NB: the convention in this article is that 0 \in \N.

## The starting point

Obviously, Zermelo’s axioms are based on some logic: here it is the classical logic. What this means is that it uses different rules, such as double negation elimination:

\neg \neg A \equiv A

and quantifiers:

\exists, \forall

And of course, being a *set* theory, every object is, well, a set. Note that a *set* is to be distinguished from a *class*: any set is a class, but there exists classes that are not sets. For example, the class of all sets that do not contain themselves: if it were a set, then it would contain itself, in which case it wouldn’t, in which case it would… Such classes (“collections of objects” that are not sets) are called “proper classes”.

The only operator defined are thus quantifiers, boolean relations (and, or, implies, equivalent), set equality, and membership: a \in b.

### The axioms

Here is a list of Zermelo’s axioms, though rephrased as I understand them.

#### Axiom of extensionality

If two sets contain the same elements, they are the same:

\forall a \forall b [(\forall x \enspace x \in a \Leftrightarrow x \in b) \Rightarrow a = b]

Of course, if two sets are equal they contain the same elements. This axiom means that you cannot distinguish two sets that have the same elements (eg, they are not ordered, for example).

#### Axiom of pairing

For every two sets, there exists a set that only contains the two of them:

\forall a \forall b \exists c \forall x [x \in c \Leftrightarrow (x = a \lor x = b)]

Such a set will be denoted as \{a, b\}.

It also implies the existence of the singleton \{a\}: if we set b = a

\forall a \exists c \forall x [x \in c \Leftrightarrow (x=a \lor x=a)] \\ \forall a \exists c \forall x [x\in c \Leftrightarrow x=a]

#### Axiom of union

For every set, there exists the set of the union of its elements:

\forall a \exists c \forall x [x \in c \Leftrightarrow (\exists y \enspace x \in y \land y \in a)]

Note that in ZF, a weaker version of this axiom is usually given according to Wikipedia, but because ZF also contains another axiom schema (replacement) which is not in Z, I cannot use the weaker version.

Such a set will be denoted \displaystyle\bigcup_{y\in a} y.

#### Axiom of power set

The class of all subsets of a set is also a set:

\forall a \exists c \forall x [x \in c \Leftrightarrow x \sub a]

where inclusion is defined as:

\forall a \forall b [a \sub b \Leftrightarrow \forall x (x \in a \Rightarrow x \in b)]

The above set will be denoted as \mathcal{P}(a).

#### Axiom of the empty set

There exists a set:

\exists c (c = c)

Yup, that’s it! The existence of the empty set is due to another axiom (see later).

#### Axiom schema of specification/separation/restricted comprehension

The class of elements of a set which respect a given property is a set: given a property P,

\forall a_1 \cdots \forall a_n \; \forall b \exists c \forall x [x \in c \Leftrightarrow (x \in b \land P\:x\:a_1\cdots a_n)]

Such a set will be denoted \{x \in b | P\:x\:a_1\cdots a_n\}.

#### Axiom of infinity

The class of “all natural numbers”, for a certain sense of natural numbers, is a set: if we (temporarily) denote

S(a) \equiv \varnothing \in a \land \forall x [ x \in a \Rightarrow x \cup \{x\} \in a]

then:

\exists \mathbb N \forall a [S(\mathbb N) \land (S(a) \Rightarrow \mathbb N \sub a)]

And yes, my use of \mathbb N is the one I will continue to use for this set. Remember, everything is sets, even numbers.

#### Fraenkel’s axioms and AC

The three other axioms from ZFC are:

- Axiom of foundation/regularity: any non-empty set contains a set that has no element in common: \forall a \exists x [x \in a \land x \cap a = \varnothing]
- Axiom schema of replacement: the image of a set by a partial application is also a set, for a given definition of “partial application” (the formalization is a bit too big to write)
- Axiom of choice: there are three versions of this one, but the one used in ZFC is the most general and controversial: for every non-empty set X of sets, there exists a function that maps any element A of this set X to one of its (A) element.

However, those axioms are not required to define the natural numbers, as will be demonstrated next (though the axiom of regularity would be a shortcut for a demonstration, for example).

### First conclusions

There are some useful constructs that are needed first.

#### For all in, exists in

I will use the following notations:

\forall x \in a\;P\:x \equiv \forall x [x \in a \Rightarrow P\:x] \\ \exists x \in a\;P\:x \equiv \exists x [x \in a \land P\:x] \\ \exists! x \;P\:x \equiv \exists x \forall y [P\:x \land (P\:y \Rightarrow y = x)] \\ \exists! x \in a\;P\:x \equiv \exists! x [x \in a \land P\:x] \\ \nexists x \;P\:x \equiv \forall x \;\lnot P\:x \\ \nexists x \in a\;P\:x \equiv \forall x[x \in a \Rightarrow \lnot P\:x]

#### Set operations

The existence of the empty set is given by the axiom of the empty set and the axiom schema of comprehension: given any set s (s exists due to the axiom of the empty set),

\exists \varnothing \forall x [x \in \varnothing \Leftrightarrow (x \in s \land x \ne x)]

Given two sets s_1 and s_2, the defined empty set (subscripted by 1 and 2) are equal as they have the same elements:

\forall x [x \in \varnothing_1 \Leftrightarrow (x \in s_1 \land x \ne x) \\ \Leftrightarrow (x \in s_2 \land x \ne x) \\ \Leftrightarrow x \in \varnothing_2]

Given two sets a and b, their union is:

a \cup b = \bigcup_{y \in \{a, b\}}y

Any finite union of sets thus stays a set.

The intersection of two sets is:

a \cap b = \{x \in a | x \in b\}

Any finite intersection of sets thus stays a set.

The difference between two sets is:

a \backslash b = \{x \in a | x \notin b\}

The intersection of the elements of a non-empty set exists (the set c):

\forall a[a \ne \varnothing \Rightarrow \exists s \:s\in a \land \exists c \forall x (x\in c \Leftrightarrow (x \in s \land \forall b \in a \: x \in b))]

The convoluted way of giving the existence is to render more explicitly the use of the axiom schema of comprehension: the intersection of all elements in a is, given an element s of a, all elements of s that are also in all elements of a.

(\forall x \;x \in a \Rightarrow x \in a) \therefore a \sub a \\ a \sub b \sub c \Leftrightarrow \forall x [(x \in a \Rightarrow x \in b) \land (x \in b \Rightarrow x \in c)] \\\Rightarrow \forall x [x\in a \Rightarrow x \in c] \Leftrightarrow a \sub c \\ a \sub b \sub a \Leftrightarrow (\forall x [x \in a \Rightarrow x \in b] \land \forall x [x \in b \Rightarrow x \in a]) \\\Leftrightarrow \forall x [x \in a \Leftrightarrow x \in b] \Leftrightarrow (a = b)

Set inclusion is a partial order (as if A = \{\varnothing, \{\varnothing\}\} and B = \{\{\{\varnothing\}\}\}, then A \not\sub B and B \not\sub A).

#### Pairs

Given two elements a and b, how do you define the *ordered* pair (a, b)? Well, like this:

(a, b) = \{\{a\}, \{a, b\}\}

And then, (a, b) = (c, d) \text{ iff } a = c \land c=d.

Indeed, if a = c and b = d then (a, b) = (c, d) becomes trivial, and for the other direction:

\{\{a\}, \{a, b\}\} = \{\{c\}, \{c, d\}\} \\ \begin{align*} \Leftrightarrow &(\{a\} = \{c\} \lor \{a\} = \{c, d\}) \\\land& (\{a, b\} = \{c\} \lor \{a, b\} = \{c, d\}) \\\land& (\{c\} = \{a\} \lor \{c\} = \{a, b\}) \\\land& (\{c, d\} = \{a\} \lor \{c, d\} = \{a, b\}) \end{align*}

If {a} = {c}, then a = c; if {a} = {c, d}, then a = c = d. Then, it follows that b = c or b = d; if b = c, then a = b = c, and \{c, d\} = \{a\} \lor \{c, d\} = \{a\}: d = a = b. In any case, a = c and b = d.

Then, to extract a or b, we can use \pi_1 and \pi_2, defined as followed:

\forall p \forall a [\pi_1 \;p \;a \Leftrightarrow \exists b (p = \{\{a\}, \{a, b\}\})] \\ \forall p \forall b [\pi_2 \;p \;b \Leftrightarrow \exists a (p = \{\{a\}, \{a, b\}\})] \\

Of course, under such a definition, a and b are unique as p is the *p*air (a, b).

Also, here is a definition of the set of all pairs from sets a and b:

a \times b = \{p \in \mathcal P(\mathcal P(a) \cup \mathcal P(a \cup b)) | \exists a \exists b \; p = (a, b)\}

#### Involution

Let P be a property such that P(\varnothing) and \forall x [P\:x \Rightarrow P\:(x \cup \{x\})]

Let’s look at S = \{n \in \mathbb N | P\:n\}. We know that S follows the property S(a) defined when defining \mathbb N, and we know that S \sub \mathbb N by definition.

So, \mathbb N \sub S \sub \mathbb N: S is the entire set of natural numbers.

Meaning, any proof by involution works. This may be the single most useful consequence of the axioms, as it will be used everywhere.

#### Applications

Given two sets a and b, we can define a partial application f from a to b in two ways:

\forall x \forall y [f \:x\:y \Leftrightarrow x \in a \land y = f(x)]

An application is a binary relation between x and y. We denote f(x) the only y that is in relation with x. In this case, a and b can also be proper classes.

f = \{p \in a \times b | \pi_2(p) = f(\pi_1(p))\}

An application is a set of pairs of antecedent and its corresponding image. We then denote f(x) = \pi_2(e), where e is the only element of f such that \pi_1(e) = x.

Notice that in this case, we used the partial applications \pi_1 and \pi_2, defined earlier.

The set of all partial applications, which to my knowledge has no official notation, is:

\mathcal F_p(A, B) = \{f \in \mathcal P(A \times B) | \\\forall a \exists b \;(a, b) \in f \Rightarrow (\forall c \;(a, c) \in f \Rightarrow c = b)\}

Any element of A has at most one image. The antecedent and image are implicitly in the set resp. A and B.

The set of all functions from A to B is:

\mathcal F(A, B) = \{f \in \mathcal F_p(A, B) | \forall a \in A \exists b \;(a, b) \in f\} \\ \mathcal F(A, B) = \{f \in \mathcal P(A \times B) | \forall a \in A \exists! b \;(a, b) \in f\}

One last thing: composition. In the rest of the document, I will write for any application f: a \to b where b \subseteq a:

f^{\circ 0}(x) = x \\ f^{\circ (n+1)}(x) = f^{\circ s(n)}(x) = f(f^{\circ n}(x)) = f^{\circ n}(f(x))

where s is the successor function (see below). Of course the “well-defineness” of such an expression can be proven by a simple induction, left for the reader as an exercise.

## The set of natural numbers

Finally, after all this preliminary work, we can start defining the operations in the set of natural numbers!

Starting with the start…

### The successor function

This function associates to a natural number its successor. To be more exact:

\begin{align*} s: \mathbb N &\to \mathbb N \\ n &\mapsto n \cup \{n\} \end{align*}

and

0 \colonequals \varnothing

As such, we also define

1 \colonequals s(0) = \{\varnothing\}

Now, a quite useful property of the successor function would be that it is injective, meaning:

\forall x\in\mathbb N \quad \forall y \in \N \quad (s(x) = s(y) \Rightarrow x = y)

However, while it would seem logical, remember that in Zermelo’s axioms there is not the axiom of foundation, and so nothing prevents a set from being in itself… What we actually need to do is a proof by induction that this axiom holds for any set in \N.

For this proof, let the following sets be defined recursively:

\begin{align*} \N_0 &\colonequals \N \times \{\varnothing\} \\ \forall n \in \N, \N_{s(n)} &\colonequals \{u \in \N \times \N_n | \pi_1(\pi_2(u)) \in \pi_1(u)\} \end{align*}

We first want to prove the following:

\mathcal P(n) : \text{\textquotedblleft}\forall m \in n, \forall p \in m, p \in n\text{\textquotedblright}

For \mathcal P(0), we get that for everything in the empty set, something must be true: it is true. Let’s suppose now that \mathcal P(n) is true.

Let m \in s(n) = n \cup \{n\}\text{, and } p \in m.

- If m \in n, then by inductive hypothesis, p \in n \sub n \cup \{n\} = s(n).
- Otherwise, m = n: p \in n \sub n \cup \{n\} = s(n).

We have proven that \mathcal P(n) \Rightarrow \mathcal P(s(n)), and as such the property is true for any element of \N.

Now we can prove the following:

\mathcal P(n) : \text{\textquotedblleft}\exists! e \in \N_n, \pi_1(e) = n \land \nexists e \in \N_n, \pi_1(e) \in n\text{\textquotedblright}

and we can explicitly give the unique element e that respect the first half.

For n = 0, \mathcal P(0) becomes:

\exists! e \in \N_0, \pi_1(e) = 0 \land \nexists e \in \N_0, \pi_1(e) \in 0 \\ \Leftrightarrow \exists!(e, \varnothing) \in \N \times \{\varnothing\}, e = \varnothing \\\land \nexists (e, \varnothing) \in \N \times \{\varnothing\}, e \in \varnothing \\

which is obviously true. Here the element is (0, \varnothing).

Suppose \mathcal P(n) for a given n. Let’s prove \mathcal P(s(n)).

Let e be in \N_{s(n)}, such that \pi_1(e) = s(n). Then:

\pi_1(e) = n \cup \{n\} \\ \pi_1(\pi_2(e)) \in n \cup \{n\}

However, since \pi_2(e) \in \N_n, it is impossible for \pi_1(\pi_2(e)) to be included in n. It is equal to n, and by inductive hypothesis, such an element \pi_2(e) is unique. As such, e is unique and exists. By induction, it is equal to (s(n), (n, (\cdots, (1, (0, \varnothing))\cdots))).

Now, let e be in \N_{s(n)}, such that \pi_1(e) \in s(n). Then:

\pi_1(e) \in n \lor \pi_1(e) = n \\ \pi_2(e) \in \N_n \land \pi_1(\pi_2(e)) \in \pi_1(e)

If \pi_1(e) = n, then we have a contradiction by induction, since there is no element of \N_n that starts with something contained in n, and if \pi_1(e) \in n, then \pi_1(\pi_2(e)) \in \pi_1(e) \in n \Rightarrow \pi_1(\pi_2(e)) \in n, which also leads to the same contradiction. We are done.

Now we show that s is injective:

\begin{align*} \forall n \in \N, \forall m \in \N,&\\ s(n)=s(m) \Leftrightarrow& n \cup \{n\} = m \cup \{m\} \\ \Leftrightarrow& (n \sub m \cup \{m\}) \land (n \in m \lor n = m) \\&\land (m \sub n \cup \{n\}) \land (m \in n \lor m = n) \\ \Leftrightarrow& (n = m) \lor (n \in m \land m \in n) \end{align*}

This second condition is impossible as otherwise, (m, (n, (\cdots, (n, \varnothing))\cdots)) \in \N_n and starts with something contained in n but no such element exist.

As such, the successor function is injective.

### Addition

Let + : \N^2 \to \N be the following infix function:

\begin{align*} n + 0 &= n \\ n + s(m) &= s(n + m) \end{align*}

This function is well-defined over all natural numbers, as when looking at the set D = \{n \in \N | m + n\text{ finishes } \forall m \in \N\} we have \N \sub D \sub \N.

I will next prove in order that + is associative, 0 is the neutral, 1 commutes with any number, thus + is commutative, and finally that + is cancellative, meaning a + b = a + c \Rightarrow b = c, or rather b + a = c + a \Rightarrow b = c.

\mathcal P(n) : \text{\textquotedblleft}\forall a \in \N, \forall b \in \N, a + (b + n) = (a + b) + n\text{\textquotedblright} \\\ \\ \forall a \in \N, \forall b \in \N, a + (b + 0) = a + b = (a + b) + 0: \\\mathcal P(0)\\\ \\ \begin{align*} \forall n \in \N, \forall a \in \N, \forall b \in \N,&\\ a + (b + s(n)) &= a + s(b + n) \\&= s(a + (b + n)) \\&= s((a+b)+n) \text{ [I.H.]} \\&= (a+b)+s(n) \\ \mathcal P(n) &\Rightarrow \mathcal P(s(n)) \end{align*}

\mathcal P(n) : \text{\textquotedblleft}n + 0 = n = 0 + n\text{\textquotedblright} \\\ \\ \forall n \in \N, n + 0 = n \text{ by definition} \\ 0 + 0 = 0: \mathcal P(0)\\ \begin{align*} \forall n \in \N, 0 + s(n) &= s(0 + n) \\&= s(n) \text{ [I.H.]:} \\ \mathcal P(n) &\Rightarrow \mathcal P(s(n)) \end{align*}

\mathcal P(n) : \text{\textquotedblleft}1 + n = n + 1\text{\textquotedblright} \\\ \\ 1 + 0 = 1 = 0 + 1: \mathcal P(0)\\ \begin{align*} \forall n \in \N, 1 + s(n) &= s(1 + n) \\&= s(n + 1) \text{ [I.H.]} \\&= n + s(1) \\&= n + (1 + 1) \\&= (n+1)+1 \\&= s(n) + 1: \\\mathcal P(n) &\Rightarrow \mathcal P(s(n)) \end{align*}

\mathcal P(n) : \text{\textquotedblleft}\forall m \in \N, m + n = n + m\text{\textquotedblright} \\\ \\ m + 0 = m = 0 + m: \mathcal P(0)\\ \begin{align*} \forall n \in \N, \forall m \in \N, m + s(n) &= s(m + n) \\&= s(n + m) \text{ [I.H.]} \\&= n + s(m) \\&= n + (m + 1) \\&= n + (1 + m) \\&= (n+1)+m \\&= s(n) + m: \\\mathcal P(n) &\Rightarrow \mathcal P(s(n)) \end{align*}

As an immediate consequence, \forall n\in \N, \forall m \in \N, n + s(m) = n + (m + 1) = n + (1 + m) = (n + 1) + m = s(n) + m.

\mathcal P(n) : \text{\textquotedblleft}\forall a \in \N, \forall b \in \N, a + n = b + n \Rightarrow a = b\text{\textquotedblright} \\\ \\ \forall a \in \N, \forall b \in \N, a = a + 0 = b + 0 = b: \mathcal P(0)\\ \begin{align*} \forall n \in \N, \forall a \in \N, \forall b \in \N,&\\ a + s(n) = b + s(n) &\Rightarrow s(a + n) = s(b + n) \\&\Rightarrow a+n = b+n \text{ as $s$ is injective} \\&\Rightarrow a = b \text{ [I.H.]:} \\\mathcal P(n) &\Rightarrow \mathcal P(s(n)) \end{align*}

### Less than relation

Let \leqslant be the following binary relation:

\leqslant \;= \{(a, b) \in \N^2 | \exists c \in \N, a + c = b\}

I need to prove that this relation is reflexive, transitive, antisymmetric and (strongly) connected.

\forall n \in \N, n + 0 = n: n \leqslant n \\\,\\ \begin{align*} \forall n \in \N, \forall m \in \N, &\forall l \in \N, [(n \leqslant m) \land (m \leqslant l) \\\Rightarrow \exists c_1 \in \N, \exists c_2 \in \N, &(n + c_1 = m) \land (m + c_2 = l) \\\Rightarrow \exists c_1 \in \N, \exists c_2 \in \N,& n + (c_1 + c_2) \\&\qquad= (n + c_1) + c_2 = m + c_2 = l \\&\Rightarrow n \leqslant l] \\ \end{align*}

Let n \in \N, m \in \N such that (n \leqslant m) \land (m \leqslant n). Let c_1 \in \N, c_2 \in \N such that n + c_1 = m and m + c_2 = n.

Then, we have:

n + (c_1 + c_2) = (n + c_1) + c_2 = m + c_2 = n

Since addition is cancellative, it means c_1 + c_2 = 0. The only way to be so is that c_1 = c_2 = 0, as otherwise, if c_1 = s(c_0) (for example), then c_1 + c_2 = s(c_0 + c_2) \ne 0 as no element is an antecedent of 0 by the successor function (for any set e, e \in s(e): s(e) \ne \varnothing).

Now we prove that \leqslant is (strongly) connected: let m \in \N. Lets prove

\mathcal P_m(n) : \text{\textquotedblleft}(n \leqslant m) \lor (m \leqslant n)\text{\textquotedblright} \\\ \\ 0 + m = m: 0 \leqslant m, \mathcal P(0)\\\,\\\text{Let $n \in \N$ such that $\mathcal P(n)$.}\\ \begin{align*} \text{If }m \leqslant n,\;&\exists c \in \N, \\&[m + c = n \Rightarrow m + s(c) = s(n)]: m \leqslant s(n)\\\text{else }(n \leqslant m) &\land (n \ne m) \text{ [I.H.]:}\\\exists c \in \N,\;&n + s(c) = m: s(n) + c = m, s(n) \leqslant m \\ \end{align*} \\ \mathcal P(n) \Rightarrow \mathcal P(s(n))

Next, I will show that adding a with b and c with d, where a \leqslant c and b \leqslant d, yields the inequality a + b \leqslant c + d: under those notations, there exists c_1 and c_2 such that

(a + c_1 = c) \land (b + c_2 = d): \\ c + d = (a + c_1) + (b + c_2) = (a + b) + (c_1 + c_2): \\ a + b \leqslant c + d

We can also have alternative definitions to this \leqslant relation:

n \leqslant m \Leftrightarrow n \sub m \Leftrightarrow (n \in m) \lor (n = m)

(Note: this is still a work in progress — expect some update in the future.)