Yes, I am the same guy as the one who runs Artistic Translations of Poetry and Songs. That is my hobby: fiddling with languages and a tad of linguistics. My academic career, however, is entirely different. I have a bachelor's degree and a master's degree in Maths, and am now hunting for a Ph.D. position.
[Um, newsflash, this blog, with its three posts plus index, and the "Posts on the blog's Facebook page" I will soon make, is on standby until I find time to dedicate to it, what with the translations blog and the YT channel taking up a crazy amount of time and me being in the middle of a post-doc – yeah, so much for the Ph.D., it's already over and done :).]
So, after half a year of the other blog, I thought it might be a good idea to start a new one to show off my Math abilities. The purpose of this blog is to convert my University notes into posts that spread my knowledge in a hopefully accessible way. The blog is organized into three levels of difficulty, distinguished by font size: noob, med, master.
Things at noob level should be understandable by basically anyone who speaks English – oh yes, the blog is going to be entirely in English.
Things at med level will be medium-difficulty.
Things at master level will get pretty complex or technical, and should only be tackled by the bravest, say those who study Maths or Physics and are already well-versed in the subject.
Of course, I may see things as easier than they actually are, so comments are welcome. At the moment, the blog has two posts beside this index. I plan to post the next post within the next week, though the "monthly" part would demand I post it today; but writing a post is a long task, and there simply isn't enough time today for it. [And in the end, I posted it the next day :).]
I will keep posting at least once a month, at pretty much random intervals, so keep an eye on this index, because that is how you will know something new's out: I will add new posts to it as soon as I post. This index includes a spoiler with a [very incomplete] post plan, like the index of the other blog, but this one only has titles, because the dates are random, and the topics should be inferrable from the titles. Enjoy!
UPDATE And then ordinals and company were like "Yeah, you wish! One post per month? I'm gonna give you one that takes a month to get ready, so that after waiting the month of freedom you have to delay by another month! Mwahahahahah!". So I revoke the promise, and just plainly state that the posting times will be random. Sorry guys, these posts take forever to write :). The title is too cool to change accordingly though…
[Main body written 27/4/18, UPDATE written 30/6/18, comments in [] written 20/4/24. I held off on this update for the longest time because I didn't want to post on the page, but it's well overdue now.]
This, like its twins on the other blogs, is a list of all the posts made on the blog's Facebook page (as if the title didn't make that clear :) ), with an indication of their contents. I'm quite surprised I never made one back when I was still posting here.
Edit notice for Sets and Relations: a leap of abstraction: Made some tweaks and added a definition (actually expanded a definition to contain another one), 28/6/18 19:08; «PS Don't worry, I will soon post my monthly post, I just had some holdups lately. In fact I plan to ready many drafts so I can post more regularly in the near future. Then my PhD will probably hold me up a lot»; post dated 28/6/18 19:12;
Edit notice for Sets and Relations: a leap of abstraction: The Fundamental Lemma should now be fixed, 30/6/18 16:03; «Actually took another retouch at 16:12 to have all the necessary details in, but now it works»; post dated 30/6/18 16:03;
Edit notice for Introduction and Blog Index: «I looked at the index and thought an edit was in order», «Edited 30/6/18 16:37. / Um yeah, just realized no spoiler was there, made it appear at 16:46»; post dated 30/6/18 16:38;
EDIT: Added bracketed comments, broke the first part into paragraphs and made the bulleted list, and fixed 247572018 to 24/5/2018, all at Introduction and blog index, 20/4/24 15:49.
«Related Facebook page post update (i.e. update of the new post here): 20/4/24 16:19», 20/4/24 16:16.
This may be where I actually start using all three levels distinctly. There will definitely be noob-level stuff, and a lot of master-level stuff, but perhaps I will find some med-level stuff too, who knows. The post has yet to be written. This post is also going to feature one or two spoilers for things to omit on first reading which are master-level or even above that, like, for masters among masters who are brave enough to wade through the ocean of abstraction that those things are.
OK, so in the last post we talked about relations on sets. Now consider the relation of ownership. What set is it defined on? If we want a single set to define it on, it would have to be the set of people, pets, plants, and things. But that sounds like a big mishmash. It would make way more intuitive sense to consider the set of people and the set of ownable entities (i.e. pets, plants, and things). But we do not have a way to define a relation between different sets… just yet. Of course, the definition is just an adaptation of the definition of relation on a set: couldn't be easier.
Definition 1: Relation between sets. Given two sets \(X,Y\), a relation between \(X\) and \(Y\), or of \(X\) with \(Y\) (in that order), is just a subset of \(X\times Y\). Again, we say \(x\operatorname{R}y\) if \((x,y)\in R\).
Obviously a relation on \(X\) is now a relation of \(X\) with itself.
Now, clearly the definitions of particular classes of relations on sets that we saw last time cannot be reintroduced here. But there is something we can do with this new definition of relations between sets. In the last post (by the way, sorry I'm posting this late: I got caught up with other stuff first, and then with research for this post), we mentioned "functions" without ever defining them. I'm sure all of you have heard the word, and have an idea of a function that probably has nothing to do with relations and can be formulated as follows.
Definition 2: Function (naïve definition). A function between two sets \(X\) and \(Y\) is a law that to every \(x\in X\) associates one and one only \(y\in Y\), typically denoted \(f(x)\).
This is a vague definition: what is a "law", after all? Nowhere near as vague as what we gave for "sets", admittedly, but getting formal there requires a whole new level of abstraction, way above med level, so we'll get to it later in the post (in a spoiler I advise you to skip unless you're very well-versed in pure maths, as I consider myself to be). Anyway, it isn't hard to give a more formal definition of a function. If we have that kind of "law", we can easily associate to it a subset \(F\subseteq X\times Y\): just say \((x,y)\in F\iff y=f(x)\). All we then need is to translate the "one and one only" part of the above definition, which is done as follows.
Definition 3: function (formal definition). A function between two sets \(X,Y\) is a relation \(F\subseteq X\times Y\) such that for all \(x\in X\) there exists a unique \(y=:f(x)\) such that \((x,y)\in F\).
When functions are thought of as laws, the set in Definition 3 is usually called the graph of the function. So you may say the formal definition simply identifies a function with its graph, which makes total sense, since the graph determines the law and vice versa. Indeed, the graph is often used to paint a picture of the function, especially when it can be drawn on paper, as is done in math classes from middle school onwards.
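To make the graph picture concrete, here is a small Python sketch (my own illustration, not part of the original notes): a relation is stored as a set of pairs, and we check Definition 3's "one and one only" condition on finite sets.

```python
def is_function(rel, X, Y):
    """Definition 3 on finite sets: rel is a function between X and Y
    iff it is a subset of X×Y and every x in X appears in exactly one pair."""
    if not all(x in X and y in Y for (x, y) in rel):
        return False  # not even a relation between X and Y
    return all(sum(1 for (a, _) in rel if a == x) == 1 for x in X)

X = {1, 2, 3}
Y = {"a", "b"}
F = {(1, "a"), (2, "a"), (3, "b")}  # each x has exactly one image: a function
R = {(1, "a"), (1, "b"), (2, "a")}  # 1 has two images and 3 has none: not one
```

Note that \(F\) above is not injective (1 and 2 share the image "a"): Definition 3 only constrains first coordinates.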
From now on, outside the foundational master-level spoiler, the naïve definition will suffice, so we will use that as a basis for properties of functions. I will now start following chapter 2 of Soardi very closely.
Definition 4: Various terms related to functions. Let \(f:X\to Y\). Then:
\(f(x)\) will be called image of \(x\) via \(f\);
Map will be a synonym for function;
Given \(A\subseteq X\), \(f(A):=\{f(x):x\in A\}=\{y\in Y:\exists x\in A:f(x)=y\}\) will be called image of \(A\) via \(f\); it goes without saying that \(f(\varnothing)=\varnothing\), but let's specify it just for the heck of it; if a function is defined on a set which contains sets as members, it might be wise to distinguish images of sets and of elements via notation other than capitalization, so I introduce the notation \(\operatorname{Im}_f(A):=\{f(x):x\in A\}\);
Given \(B\subseteq Y\), we call \(f^{-1}(B):=\{x\in X:f(x)\in B\}\) the preimage of \(B\) via \(f\); again, \(f^{-1}(\varnothing)=\varnothing\); as in the item above, I introduce \(\operatorname{Im}^{-1}_f(B):=\{x:f(x)\in B\}\) as a disambiguating notation;
\(\operatorname{Dom}f:=X\) is called the domain or definition set of \(f\), and \(\operatorname{Cod}f:=Y\) is the codomain; the image of \(f\) is \(\operatorname{Im}f:=f(X)=\operatorname{Im}_f(\operatorname{Dom}f)\);
\(f\) is injective or one-to-one if \(f(x)=f(y)\iff x=y\);
\(f\) is surjective or onto if \(f(X)=Y\), or in other words, \(\operatorname{Cod}f=\operatorname{Im}f\);
\(f\) is bijective if it is both injective (one-to-one) and surjective (onto);
For any set \(X\), the function from \(X\) to \(X\) which maps each element to itself is called identity on \(X\) and denoted by \(\operatorname{id}_X\);
If \(f:X\to Y\) and \(g:Y\to Z\), the composition of \(g\) and \(f\), denoted \(g\circ f\), is the function \(X\to Z\) such that \(g\circ f(x)=g(f(x))\);
If \(f\) is bijective, we will see that it has an inverse, that is, there exists \(g:Y\to X\) such that \(f\circ g=\operatorname{id}_Y\) and \(g\circ f=\operatorname{id}_X\); such a \(g\) is denoted \(f^{-1}\), so that \(f^{-1}(y)\), which would normally be the set \(\{x\}\) (where \(f(x)=y\)), is also used to mean just the element \(x\);
If \(A\subseteq X\), the restriction of \(f\) to \(A\), denoted \(f|_A\), is the function \(A\to Y\) such that \(f|_A(x)=f(x)\) for all \(x\in A\).
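Images and preimages can be computed straight from Definition 4 on finite sets; here is a quick Python sketch (the example function is my own, hypothetical):

```python
def image(f, A):
    """Im_f(A) = {f(x) : x in A}."""
    return {f(x) for x in A}

def preimage(f, domain, B):
    """f^{-1}(B) = {x in the domain : f(x) in B}."""
    return {x for x in domain if f(x) in B}

def square(n):
    """A hypothetical f : {-2,...,2} -> N."""
    return n * n

X = {-2, -1, 0, 1, 2}
image(square, X)          # the image of f: {0, 1, 4}
preimage(square, X, {1})  # {-1, 1}: two preimages, so f is not injective
```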
Now we have some simple properties to prove for functions.
Theorem 1: Elementary properties of functions, images, preimages, and compositions. Let \(f:X\to Y\), \(g:Y\to Z\), and \(h:Z\to W\); let \(A,B\subseteq X\) and \(C,D\subseteq Y\) (in the items involving \(f^{-1}(B)\), \(B\) denotes a subset of \(Y\) instead). Then:
\(f^{-1}(C\cup D)=f^{-1}(C)\cup f^{-1}(D)\);
\(f^{-1}(C\cap D)=f^{-1}(C)\cap f^{-1}(D)\);
\(f(A\cup B)=f(A)\cup f(B)\);
\(f(A\cap B)\subseteq f(A)\cap f(B)\), and if \(f\) is injective equality is guaranteed;
\(A\subseteq f^{-1}(f(A))\) and equality holds if \(f\) is injective;
\(B\supseteq f(f^{-1}(B))\) and equality holds if \(f\) is surjective;
\(f^{-1}(B^C)=(f^{-1}(B))^C\);
If \(f\) is injective, \(f(A^C)\subseteq(f(A))^C\);
If \(f\) is surjective, \((f(A))^C\subseteq f(A^C)\);
If \(f\) is bijective, \(f(A^C)=(f(A))^C\);
The restriction of an injective \(f\) is still injective;
If we replace the codomain with the image, any \(f\) becomes surjective; in short, any \(f\) is surjective on its image;
If \(f,g\) are bijective, then \(g\circ f\) is also bijective; in fact, injectivity and surjectivity are separately preserved by composition, that is, if \(f,g\) are injective, the composition also is, and if \(f,g\) are surjective, so is the composition;
Composition of functions is associative, that is \((h\circ g)\circ f=h\circ(g\circ f)\);
\(f\) is injective if and only if it has a left inverse, i.e. there is \(g:Y\to X\) such that \(g\circ f=\operatorname{id}_X\) (for the "only if" direction, assume \(X\neq\varnothing\));
\(f\) is surjective if and only if it has a right inverse, i.e. there is \(g:Y\to X\) such that \(f\circ g=\operatorname{id}_Y\); note: this requires the axiom of choice;
\(f:X\to Y\) is bijective if and only if it has an inverse.
Proof.
Let \(x\) be on the left side. Then either \(f(x)\in C\), in which case \(x\in f^{-1}(C)\), or \(f(x)\in D\), in which case \(x\in f^{-1}(D)\). In both cases, \(x\in f^{-1}(C)\cup f^{-1}(D)\). The converse is similarly easy.
If \(x\) is on the left side, \(f(x)\in C\cap D\), which implies \(f(x)\in C\) and \(f(x)\in D\), that is \(x\in f^{-1}(C)\) and \(x\in f^{-1}(D)\), putting it on the right side and proving \(\subseteq\). The reverse inclusion is similarly simple.
Assume \(x\) is on the left. Then there is \(y\in A\cup B\) such that \(f(y)=x\). If \(y\in A\), \(x\in f(A)\), and if \(y\in B\), \(x\in f(B)\). In all cases, \(x\) is on the right, giving \(\subseteq\). If \(x\) is on the right, then either there is \(y\in A:f(y)=x\), or there is \(y\in B:f(y)=x\). In both cases, \(x\) is on the left, giving the desired equality.
If \(x\) is on the left, there is \(y\in A\cap B:f(y)=x\), which puts \(x\) in both \(f(A)\) and \(f(B)\), giving the desired inclusion. If \(f\) is not injective, there may be \(x\in A,y\in B\) such that \(f(x)=f(y)\), so that \(f(x)\in f(A)\cap f(B)\), but \(x\in A\smallsetminus B,y\in B\smallsetminus A\), and if every element in \(f^{-1}(f(x))\) is like that, then \(f(x)\notin f(A\cap B)\). This is clearly the only way to avoid an equality here, so injectivity, which excludes this option, guarantees equality.
If \(x\in A\), \(f(x)\in f(A)\), and any \(y:f(y)=f(x)\) will be in \(f^{-1}(f(A))\), including \(x\), thus proving the desired inclusion. If \(f\) is injective, assume \(x\in f^{-1}(f(A))\). This means \(f(x)\in f(A)\), making it so that there is \(y\in A:f(y)=f(x)\), but injectivity then forces \(x=y\), thus placing \(x\in A\) and proving the reverse inclusion.
If \(x\) is on the right, we have \(y\in f^{-1}(B):f(y)=x\), but then \(f(y)\in B\), which puts \(x\) on the left. If \(x\) is on the left, as soon as there is \(y:f(y)=x\), \(y\) ends up in \(f^{-1}(B)\), so that \(x\) is on the right. This means that the reverse inclusion is guaranteed if \(f\) is surjective, or if \(B\subseteq f(X)\).
If \(x\in f^{-1}(B^C)\), then \(f(x)\notin B\), so \(x\notin f^{-1}(B)\), hence \(f^{-1}(B^C)\subseteq(f^{-1}(B))^C\). If \(x\) is on the right, then \(x\notin f^{-1}(B)\), so \(f(x)\notin B\), hence \(f(x)\in B^C\), which means \(x\in f^{-1}(B^C)\) by definition of preimage, completing the proof.
If \(x\) is on the left, there is \(y\notin A:f(y)=x\), but there cannot be \(z\in A:f(z)=x\) because any such \(z\) would equal \(y\) and lie outside \(A\) by injectivity, so \(x\notin f(A)\), as desired.
If \(x\) is on the left, there is no \(y\in A:f(y)=x\). However, a \(y:f(y)=x\) exists by surjectivity, so such a \(y\) must be in \(A^C\), placing \(x\in f(A^C)\), as desired.
Follows from the two above.
If \(f|_A(x)=f|_A(y)\), then \(f(x)=f(y)\), but \(f\) is injective, so \(x=y\), proving \(f|_A\) is also injective.
By definition of image, for all \(y\in\operatorname{Im}(f)\) there is \(x\in X:f(x)=y\), so \(f:X\to\operatorname{Im}(f)\) is surjective.
Assume \(f,g\) injective. Suppose \(g(f(x))=g(f(y))\). By injectivity of \(g\), we have \(f(x)=f(y)\), but by injectivity of \(f\), \(x=y\), proving injectivity of \(g\circ f\). Assume \(f,g\) surjective. Fix \(z\in Z\). There is \(y\in Y:g(y)=z\), by surjectivity of \(g\). We then have \(x\in X:f(x)=y\), by surjectivity of \(f\). But then \(g(f(x))=g(y)=z\), proving surjectivity of \(g\circ f\). If \(f,g\) are bijective, they are injective and surjective, allowing the previous arguments and making \(g\circ f\) injective and surjective, and thus bijective.
If \(f\) has a left inverse, suppose \(f(x)=f(y)\), then \(x=g(f(x))=g(f(y))=y\), proving injectivity. If \(f\) is injective, define \(g(y)\) to be the unique element \(x:f(x)=y\) if there exists one, and any element of \(X\) otherwise (which is where \(X\neq\varnothing\) is needed). \(g\) is a left inverse of \(f\).
If \(f\) is surjective, for any \(y\in Y\) there is \(x\in X:f(x)=y\). Set \(g(y):=x\) for one such \(x\). Then \(g\) is a right inverse of \(f\). Choice is required here if \(Y\) is infinite, since you need to choose an \(x\) from every preimage \(f^{-1}(\{y\})\). If \(f\) has a right inverse, pick \(y\in Y\): then \(f(g(y))=y\), so that \(f(X)=Y\), i.e. \(f\) is surjective.
Since \(f\) is bijective, it is in particular injective, so by the left-inverse item above it has a left inverse \(g\): for all \(x\in X\), \(g(f(x))=x\). Pick \(y\in Y\). What is \(f(g(y))\)? We know \(g\) sends it to \(g(f(g(y)))=g(y)\), so to conclude it is \(y\), and thus that \(g\) is also a right inverse of \(f\) and hence an inverse, we just need to prove \(g\) is injective. Suppose \(g(y)=g(y')\). We certainly have \(y=f(x),y'=f(x')\) by surjectivity of \(f\). By the left-inverse property, \(x=g(f(x))=g(y)=g(y')=g(f(x'))=x'\), but then \(y=f(x)=f(x')=y'\). Hence \(g\) is injective, so \(g(f(g(y)))=g(y)\implies f(g(y))=y\), and thus \(g\) is an inverse of \(f\). Oh yes, the inverse, because if \(g,g'\) are inverses of \(f\), by associativity of composition we have \(g=g\circ(f\circ g')=(g\circ f)\circ g'=g'\).
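For finite sets, the left- and right-inverse constructions in the last few items can be carried out explicitly. Here is a Python sketch of them (functions stored as dicts; all names are mine, just for illustration):

```python
def left_inverse(f, X, Y):
    """For an injective f (a dict X -> Y), build g: Y -> X with g(f(x)) = x.
    Unmatched elements of Y go to an arbitrary element of X (X nonempty)."""
    arbitrary = next(iter(X))
    g = {y: arbitrary for y in Y}
    for x, y in f.items():
        g[y] = x  # well-defined precisely because f is injective
    return g

def right_inverse(f):
    """For a surjective f, choose one preimage for each value: the finitely
    many 'choices' that the axiom of choice handles in the infinite case."""
    g = {}
    for x, y in f.items():
        g.setdefault(y, x)  # keep the first preimage encountered
    return g

f = {1: "a", 2: "b"}          # injective {1, 2} -> {"a", "b", "c"}
g = left_inverse(f, {1, 2}, {"a", "b", "c"})
h = {1: "a", 2: "a", 3: "b"}  # surjective {1, 2, 3} -> {"a", "b"}
r = right_inverse(h)
```

One can check that `g[f[x]] == x` for every `x` and `h[r[y]] == y` for every `y`, matching the left- and right-inverse items of the theorem.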
Definition 5: indexations and sequences. Another term used for functions is indexing or indexation. The domain is then called the set of indices, the images \(f(i)\) are denoted \(x_i\), and \(\{x_i\}_{i\in\operatorname{Dom}}\) denotes both the image and the function itself. If the domain is the natural numbers, the function is called a sequence.
This is where I stop following Soardi: unions and intersections are already defined in the last post, and I mean to save cardinality for later. So let me talk about Mathematical Induction first, then we will delve into axiomatic set theory (in the master-level spoiler), and finally we will get to cardinality and cardinal arithmetic.
Induction, in its two forms, is a great tool to prove a result one has obtained previously via manipulations or conjectures.
Principle of Induction. Suppose \(P(n)\) is a proposition that can be checked for natural numbers. Assume \(P(n_0)\) holds for some \(n_0\in\mathbb N\), and that whenever \(P(n)\) holds, \(P(n+1)\) can also be proved. Then, \(P(n)\) holds for any \(n\geq n_0\).
Principle of Complete Induction. Suppose \(P(n)\) is a proposition that can be checked for natural numbers. Assume \(P(n_0)\) holds for some \(n_0\in\mathbb N\) and that whenever \(P(k)\) holds for \(n_0\leq k\leq n\), then \(P(n+1)\) also holds. Then \(P(n)\) holds whenever \(n\geq n_0\).
Let us see some applications of these principles. Since this is child's play for the average mathematician (and maybe not only for them), I thought I'd leave the option to just hide it and skip to more interesting stuff. Clicking the button just below will hide the applications of induction I propose in this post.
Theorem 2: sum of the first natural numbers. For any \(n\in\mathbb N\), the sum of the first \(n\) natural numbers is \(\frac{n(n+1)}{2}\). In other words, \(\sum_{k=1}^nk=\frac{n(n+1)}{2}\).
Proof.
We use the principle of induction with \(n_0=1\). For \(n=1\), the sum is just 1, and \(\frac{1(1+1)}{2}=\frac{1\cdot2}{2}=\frac22=1\). Suppose the property holds for \(n\). Then:
$$\sum_{k=1}^{n+1}k=\sum_{k=1}^nk+n+1=\frac{n(n+1)}{2}+n+1=\frac{n^2+n+2n+2}{2}=\frac{n^2+3n+2}{2}=\frac{(n+1)(n+2)}{2},$$
which is what we want. Induction therefore gives us the theorem.
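A brute-force check of Theorem 2 in Python never hurts (a sanity check, not a proof, since it only tests finitely many cases):

```python
def brute_sum(n):
    """Sum 1 + 2 + ... + n directly."""
    return sum(range(1, n + 1))

def closed_form(n):
    """The formula n(n+1)/2 from Theorem 2 (integer division is exact here)."""
    return n * (n + 1) // 2

# The two agree for every n we test, as the theorem predicts:
all(brute_sum(n) == closed_form(n) for n in range(1, 1001))
```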
Theorem 3: Sum of the first \(i\cdot2^i\). Let \(n\in\mathbb N\). Then:
$$\sum_{i=1}^ni\cdot2^i=2^{n+1}(n-1)+2.$$
Proof.
We use the principle of induction with \(n_0=1\). For \(n=1\) (and, as an extra check, \(n=2\)), the formula is easily verified. Suppose it holds for \(n\). Then:
$$\sum_{i=1}^{n+1}i2^i=\sum_{i=1}^ni2^i+(n+1)2^{n+1}=2^{n+1}(n-1)+2+(n+1)2^{n+1}=2^{n+1}(n-1+n+1)+2=2^{n+2}n+2,$$
which is the formula for \(n+1\), since \(2^{n+2}n+2=2^{n+2}((n+1)-1)+2\). Induction then gives us the theorem.
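Theorem 3 passes the same kind of finite sanity check in Python:

```python
def brute_weighted_sum(n):
    """Sum i * 2^i for i = 1, ..., n directly."""
    return sum(i * 2**i for i in range(1, n + 1))

def closed_weighted(n):
    """The closed form 2^(n+1) (n-1) + 2 from Theorem 3."""
    return 2**(n + 1) * (n - 1) + 2
```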
Theorem 4: Sums of cubes. Let \(n\in\mathbb N\). Then:
$$\sum_{k=1}^nk^3=\left(\frac{n(n+1)}{2}\right)^{\!\!2}.$$
I found this online via googling. Proof.
Let me try with induction, \(n_0=1\). That case is easy, just like \(n=0\). Suppose the formula holds for \(n\). Then:
\begin{align*}
\sum_{k=1}^{n+1}k^3={\!\!}&\left(\frac{n(n+1)}{2}\right)^{\!\!2}+(n+1)^3=(n+1)^2\left(\frac{n^2}{4}+n+1\right)=(n+1)^2\frac{n^2+4n+4}{4}=\frac{(n+1)^2(n+2)^2}{4}={} \\
{}={}&\left(\frac{(n+1)(n+2)}{2}\right)^{\!\!2},
\end{align*}
as we wanted.
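Theorem 4 can be sanity-checked the same way; note the pleasant consequence (known as Nicomachus's theorem) that the sum of the first \(n\) cubes is exactly the square of the sum of the first \(n\) naturals.

```python
def sum_of_cubes(n):
    """Sum 1^3 + 2^3 + ... + n^3 directly."""
    return sum(k**3 for k in range(1, n + 1))

def squared_triangular(n):
    """(n(n+1)/2)^2, the right-hand side of Theorem 4."""
    return (n * (n + 1) // 2) ** 2
```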
The last application of induction we see is one of complete induction.
Fundamental Theorem of Arithmetic. Let \(n\in\mathbb N\), \(n\geq2\). Then there is a way to decompose \(n\) as a product of prime numbers.
For algebraists: 1) I am aware of the distinction between prime and irreducible, and that what I should write here is irreducible, but we'll get to that in another post; 2) Yes, the FTOA has a uniqueness part, but induction won't prove that, or at least not as easily as apt for this post. Proof.
For \(n_0=2\), the statement is obvious. We use complete induction here. Suppose any \(2\leq k\leq n\) admits a decomposition, and look at \(n+1\). If it is prime, it is already decomposed. Otherwise, it can be written \(n+1=hk\) with \(h,k<n+1\). If either of those were less than 2, it would be \(1\), and the other would be \(n+1\), contradicting the inequality \(h,k<n+1\). Hence \(2\leq h,k\leq n\), so by the induction hypothesis \(h,k\) can be decomposed. The decomposition of \(n+1\) is then the concatenation of those of \(h\) and \(k\). Complete induction then gives the theorem.
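The complete-induction proof is essentially an algorithm: if the number is not prime, split it and recurse on the smaller factors. A Python sketch of that recursion (my own; trial division is just the simplest way to find a split):

```python
def factorize(n):
    """Decompose n >= 2 into a list of primes, mirroring the proof:
    find a split n = h * k with h, k < n and recurse on both; if no
    divisor exists, n is prime and is its own decomposition."""
    for h in range(2, int(n**0.5) + 1):
        if n % h == 0:
            return factorize(h) + factorize(n // h)
    return [n]  # n is prime
```

For instance, `factorize(60)` gives `[2, 2, 3, 5]`, the concatenation of the decompositions of the factors found along the way.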
But can we prove those two forms of induction? Let's start by equating them with a property of the natural numbers.
Theorem 5: Induction and the well-ordering of the naturals. Both the above forms of induction are equivalent to the statement that \(\mathbb N\) is well-ordered under the normal ordering (a statement I'll call WON, Well-Ordering of the Naturals). More specifically, WON implies induction which implies complete induction which implies WON.
Proof.
Assume WON. Suppose the hypotheses of induction are verified for a property \(P\), with base case \(n_0\). By contradiction, suppose the conclusion of induction doesn't hold. The set \(S:=\{n\in\mathbb N:P(n)\text{ is false}\wedge n\geq n_0\}\) is then nonempty, and by WON there is a minimum element \(m\in S\). Since \(P(n_0)\) holds by assumption, \(m>n_0\). This implies \(P(m-1)\) holds: indeed \(m-1\geq n_0\), so otherwise \(m-1\in S\), contradicting the minimality of \(m\). But the assumption is that \(P(m-1)\implies P(m)\), so \(m\notin S\), contradiction. Thus \(S=\varnothing\), which is what induction would have us conclude. Induction is thus proved.
If induction holds, given that complete induction has stronger assumptions than induction, complete induction holds too.
If complete induction holds, consider \(S\subseteq\mathbb N\) a nonempty subset of the naturals. Let \(P(n)\) be "\(n\notin S\)". If \(P(0)\) doesn't hold, we have our minimum: zero. If \(P(0)\) holds, let's prove the induction step. Assume \(k\notin S\) for \(k\leq n\). If \(n+1\in S\), then we have our minimum. If \(n+1\notin S\), then we just showed that, if \(P(k)\) holds for \(k\leq n\), then \(P(n+1)\) holds. By complete induction, \(P(n)\) holds for all \(n\). But then \(S\) is empty, contradiction. The contradiction stemmed from eliminating every chance we had to find a minimum. This implies a minimum must exist, and WON is thus proved.
Theorem 6: Well-ordering of the naturals. The natural numbers are well-ordered under the usual relation. In other words, given \(S\subseteq\mathbb N\) nonempty, there is an element \(x\in S\), called minimum of \(S\) and denoted \(\min S\), such that \(x\leq s\) for all \(s\in S\).
Proof.
\(S\) is nonempty, so there exists \(y\in S\). Consider \(S_y:=\{s\in S:s\leq y\}\). Suppose \(0\in S_y\). Then we have a minimum. Otherwise, if \(1\in S_y\), we have a minimum. Go on like that for at most \(y+1\) steps, and you will find a minimum. Indeed, after \(y+1\) tests, you will be testing \(y\), and you know \(y\in S\), so if \(y\) is not the minimum, you will have found another element of \(S\) before reaching \(y\). This means you will have stopped before getting to \(y\), because as soon as a number is in \(S\), you stop because you have the minimum. So for any \(S\), after a finite number of tests and a single choice, you have a minimum. WON is proved.
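The proof is effectively a terminating search procedure, which can be sketched in Python (again my own illustration):

```python
def natural_min(S):
    """Minimum of a nonempty set S of naturals, by the proof's procedure:
    pick any y in S, then test 0, 1, ..., y in order and stop at the
    first member of S found."""
    y = next(iter(S))  # the single arbitrary choice made in the proof
    for candidate in range(y + 1):
        if candidate in S:
            return candidate  # the first member found is the minimum
```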
At this point, you may be wondering: can we not formulate a "principle of induction" with any well-ordered set? Oh yes we can. In fact, we don't even need a well-order.
Definition 6: Well-founded relation. A relation \(R\subseteq X\times X\) is called well-founded if any nonempty \(S\subseteq X\) has an \(R\)-minimal element, that is an \(s\in S\) such that no \(x\in S\smallsetminus\{s\}\) satisfies \(x\operatorname{R}s\).
Theorem 7: Well-founded induction. Let \(R\) be a well-founded relation on a set \(X\). If a property \(P\) can be proven for \(x_0\), and moreover it can be proven that, whenever \(P(y)\) holds for all \(y\neq x\) with \(x_0\operatorname{R}y\operatorname{R}x\), then \(P(x)\) holds, we can conclude that \(P(x)\) holds for all \(x:x_0\operatorname{R}x\).
The proof below is just a rehashing of WON implies induction.
Proof.
Suppose there is a set \(X\) with a well-founded relation \(R\) that satisfies the assumptions of well-founded induction but falsifies its conclusion. Then \(S:=\{x:x_0\operatorname{R}x\wedge\lnot P(x)\}\) is nonempty, so it must have a minimal element \(y_0\). It cannot be \(x_0\), because the assumptions include \(P(x_0)\). Since any \(y\neq y_0:x_0\operatorname{R}y\operatorname{R}y_0\) is outside \(S\) by minimality of \(y_0\), by the induction assumption we must conclude \(P(y_0)\) holds, which contradicts \(y_0\in S\). This implies \(S=\varnothing\), so that well-founded induction is proved.
Now there are two "relations" that we have been using all the time and haven't talked about at all: inclusion and membership (\(\subseteq\) and \(\in\)). Are those well-founded? Wait wait. What are those defined on again? Well, the set of all sets, of course. Wait. First of all, membership is not defined there… is it? There are elements that are not sets, and sets are not elements, right? Wrong. Or not. It depends. The second thing is definitely wrong: any set is a member of its power set. The first thing may be right or wrong depending on whether or not your theory of sets postulates urelements. You will probably be thinking: what about numbers? Those are bound to be urelements, right? Wrong! Feel mindblown yet? You have no idea what I'm going into now. Let Vegeta warn you about it. But don't be scared just yet: there is still one thing that should be noob-level here, and it is why "set of all sets" can lead to problems. You see, when we defined a set in the last post, we were very vague. As far as we know, any collection of things we can either enumerate or characterize is valid as a set. So let me tell you why this is no bueno.
Russell's paradox. Let \(S\) be the set of all sets that are not a member of themselves. Then \(S\in S\iff S\notin S\).
No, I'm not going crazy. No, I'm not joking or mistyping. This set evidently cannot be a valid set. Indeed, let's prove the paradox. Proof.
Suppose \(S\in S\). Then, by definition of \(S\), \(S\) is not a member of itself, so \(S\notin S\). Vice versa, if \(S\notin S\), then \(S\) satisfies the defining property of \(S\), so \(S\in S\).
And yeah, that really shows the level of contradictoriness this set has by definition: it is a member of itself precisely when it isn't.
A real-life version of the paradox is the following. Suppose a certain barber decides to shave every person in his village who doesn't shave themselves. Will he shave himself? And the answer is: he will if and only if he won't, just like in the mathematical version \(S\in S\iff S\notin S\).
So how do we get out of this paradox? The answer is axioms. You may have heard this word in Geometry: axioms are truths assumed as self-evident, which serve as a starting point for a theory. Just as points and lines are not defined in Geometry, but assumed as primitive notions satisfying certain postulated relations, the same is done with sets.
For the following part, I will rearrange material from the Wikipedia references in the bibliography, adding in the proofs Wikipedia doesn't include (mine where I can, properly credited otherwise). So, which axioms do we take? First of all I will need a couple definitions.
Definition 7: Formulas defining functions and images via functions. If \(\phi(w_1,\dotsc,w_n,x,y)\) is a formula, we say it defines a function over a set \(A\) if \(\forall w_1,\dotsc,w_n,\,x\in A\implies\exists!y:\phi(w_1,\dotsc,w_n,x,y)\). In that case, we will denote \(y\) as \(F_\phi(w_1,\dotsc,w_n,x)\). If \(\phi\) defines a function over \(A\), the \(\phi\)-image of \(A\) is the "collection" \(\{F_\phi(w_1,\dotsc,w_n,x):x\in A\}=\{y:\exists w_1,\dotsc,w_n,\exists x\in A:\phi(w_1,\dotsc,w_n,x,y)\}\).
Zermelo-Fraenkel axioms.
Axiom of extensionality. If two sets have exactly the same members, they are the same set. In symbols: \(\forall a(a\in x\iff a\in y)\implies x=y\).
Axiom of (epsilon-)induction. If \(P\) is a property, and it can be proved that \(P(y)\) being true for all \(y\in x\) implies \(P(x)\) is true, then \(P\) holds for any set. In symbols \([(y\in x\implies P(y))\implies P(x)]\implies\forall x P(x)\).
Axiom of the empty set. There exists a set \(\varnothing\) such that it contains no element. In symbols \(\exists\varnothing:\nexists y:y\in\varnothing\).
Axiom of union. For any set of sets \(F\) there is a set \(A\) containing every member of any member of \(F\). In symbols \(\forall F\exists A:Y\in F\wedge x\in Y\implies x\in A\).
Axiom schema of replacement. For any formula \(\phi\) defining a function over a set \(A\), the \(\phi\)-image of \(A\) falls inside a set. In symbols \(\forall\phi\forall A[(\forall w_1,\dotsc,w_n,\forall x\in A\exists!y:\phi(w_1,\dotsc,w_n,x,y))\implies\exists B:(x\in A\wedge\exists w_1,\dotsc,w_n:\phi(w_1,\dotsc,w_n,x,y))\implies y\in B]\).
Axiom of infinity. Define \(S(w):=w\cup\{w\}\). The union is valid by the above axiom, and \(\{w\}\) is valid as we shall see below. Then there exists a set \(X\) such that \(\varnothing\in X\) and, for all \(x\in X\), \(S(x)\in X\). In symbols \(\exists X:\varnothing\in X\wedge(x\in X\implies S(x)\in X)\).
Axiom of power set. Define \(x\subseteq y\iff(a\in x\implies a\in y)\). Then for any set \(x\) there exists a set \(y\) such that any \(t\subseteq x\) is a member of \(y\). In symbols, \(\forall x\exists y:t\subseteq x\implies t\in y\).
Comments and consequences.
Axiom 1 can even be viewed as a definition of set equality. The name "axiom of extensionality" is because it defines equality in an "extensional" way, that is, via properties of the things being compared. For example, \(f(x):=2x+10\) and \(g(x):=2(x+5)\) are two extensionally equal functions, because they output the same number for any input, but are defined differently and are thus intensionally different. Cf. Wikipedia on this.
Since \(x=y\) implies, by logic, that any property is true of \(x\) if and only if it's true for \(y\), the implication \(\impliedby\) in Axiom 1 is obvious.
The axiom of extensionality clearly implies the uniqueness of the empty set, because if \(\emptyset,\varnothing\) are two sets satisfying the property that they have no members, then no member can be in one and not in the other, giving \(\varnothing=\emptyset\) by axiom 1.
Given the other axioms above, the axiom of the empty set is equivalent to the axiom schema of specification, which states that, given a property \(P\) and a set \(x\), the subset \(\{y\in x:P(y)\}\) is a valid set. In symbols, \(\forall P\forall x,\exists y:(z\in y\iff(z\in x\wedge P(z)))\).
The axiom of epsilon-induction is equivalent, given the other axioms above, to the axiom of regularity (aka axiom of foundation), which states that, for any nonempty set \(x\), there exists \(y\in x:x\cap y=\varnothing\). Obviously, if the axiom of the empty set is not assumed, it may be opportune to reformulate that as \(\forall x,\,\exists y\in x\implies\exists y\in x:\nexists t\in y\cap x\), to avoid using the empty set when it is not assumed to exist.
The axioms of union and of power set do not per se give us the validity of the union and of the power set: they just give a set in which those two, if they are valid sets, are contained. Specification can then extract the union and the power set out of those supersets.
Applying the axiom of power set to the empty set, which exists by the axiom of the empty set, we get a set \(x\) with at least one element \(y\) (namely \(y=\varnothing\), since \(\varnothing\subseteq\varnothing\)). The axiom schema of specification, which as we noted is equivalent to the axiom of the empty set, yields that \(\{z\in x:z=y\}\) is a valid set, and that is of course simply \(\{y\}\). Applying the axiom of power set again to \(\{y\}\), we get a set with at least two elements, since \(\{y\}\) has two distinct subsets, \(\varnothing\) and \(\{y\}\).
Let \(x\) now be a set with at least two distinct elements \(z\neq y\), as is guaranteed to exist by the previous comment. Given two sets \(a,b\), consider \(\phi(c,d):=(c=z\wedge d=a)\vee(c\neq z\wedge d=b)\). This is a valid formula with the properties that \(\phi(z,w)\) only holds if \(w=a\), and \(\phi(t,w)\) with \(t\neq z\) only holds if \(w=b\). Thus, it defines a function over \(x\) (and over any superset of it), and by the axiom schema of replacement we get a set \(e\) such that, for any \(c\in x\), the \(d\) such that \(\phi(c,d)\) holds falls within it. But those \(d\)s can only be \(a,b\). We thus have that, for any \(a,b\), there is a set containing both as members. This is called the axiom of pairing.
Let \(x\) be a set as above, containing \(a,b\). The axiom schema of specification then gives us that \(\{a,b\}=\{y\in x:(y=a)\vee(y=b)\}\) is a valid set. Suppose now \(a=b\). This yields that, for any set \(a\), \(\{a\}\) is a valid set, which was of course already true from item 7 above.
Naturally, with arguments similar to the above, given any finite number of sets \(x_1,\dotsc,x_n\), the set \(\{x_1,\dotsc,x_n\}\) will be valid.
Given the axiom of regularity (deduced from the axiom of induction as we will do below) and the comment two items above, we can prove that no set is an element of itself, or in symbols \(\forall x,x\notin x\).
We can also prove no infinite strictly descending sequence of sets exists, by using regularity and replacement.
Regularity, together with pairing from above, can be used to prove that, for any two sets \(x,y\), it cannot hold that \(x\in y\) and \(y\in x\). That is, there are no two sets such that either is an element of the other.
Let us now go over the proofs of the claims above. Proof.
Assuming specification, take any set \(x\), and consider \(\{y:y\in x\wedge y\notin x\}\). Since the two statements are contradictory, the given set has no elements, thus proving the axiom of the empty set.
Vice versa, assume the axiom of the empty set. We want to prove that, for any \(\phi\), \(\{y\in x:\phi(y)\}\) is valid. If \(\nexists y\in x:\phi(y)\), then \(\{y\in x:\phi(y)\}\) is just the empty set we are assuming to exist. If at least one element of \(x\) satisfies \(\phi\), fix \(y_0\in x:\phi(y_0)\), and consider \(\psi(a,b):=(a=b\wedge\phi(a))\vee(\lnot\phi(a)\wedge b=y_0)\). Pick an element \(t\in x\). If \(\phi(t)\) holds, then to satisfy \(\psi(t,?)\) we must feed in \(?=t\). Otherwise, we must feed in \(?=y_0\). Hence, for any element \(t\in x\), there is one and only one element \(t'\) that satisfies \(\psi(t,t')\) (and it happens to be in \(x\), but that is irrelevant). Thus, \(\psi\) defines a function over \(x\), and by replacement the \(\psi\)-image of \(x\) is a set. But that image is evidently \(\{y\in x:\phi(y)\}\), so specification is proved.
First of all, out of curiosity: can we exclude \(x\in x\) without resorting to regularity? Our statement is \(\forall x P(x)\), where \(P(x):=x\notin x\). Let's try using \(\in\)-induction on this. Suppose then that \(P(y)\) holds for all \(y\in x\), but \(P(x)\) doesn't hold. This means \(x\in x\), but then we can set \(y=x\) and find that \(P(y)\) must hold since \(y\in x\), yet it can't hold since \(y=x\in x=y\). Thus, \(P(x)\) must hold, and \(\in\)-induction concludes our proof. The following proof that \(\in\)-induction implies regularity is expanded on from this answer. Consider \(P(x):=x\in u\implies(\exists y\in u:y\cap u=\varnothing)\). Suppose \(y\in x\implies P(y)\). Can we prove \(P(x)\)? Unless \(x\cap u=\varnothing\), which implies \(P(x)\) by itself, there must be \(y\in x\cap u\), and \(P(y)\) holds, so that there is indeed \(z\in u:z\cap u=\varnothing\). So \(P(x)\) holds for all sets by \(\in\)-induction. Now note that the choice of \(u\) was arbitrary in \(P\). So for all \(u\) we have a \(P\) that holds for all sets. This means that, as soon as \(u\neq\varnothing\), we have an item \(x\in u\) for which \(P(x)\) holds, so there is \(y\in u:y\cap u=\varnothing\). Since \(\varnothing\) is excluded from the scope of regularity, we are done.
Conversely, assume regularity and suppose epsilon-induction is violated, that is, there is a property \(P\) such that the inductive hypothesis holds (for every set \(x\), if \(P(y)\) holds for all \(y\in x\), then \(P(x)\) holds), and yet \(P(s)\) fails for some set \(s\). The rest of the proof is copied from this answer. By the axiom of union, we can define \(s_1:=\bigcup s,s_2:=\bigcup s_1,\dotsc\). Given that \(\mathbb N\) is a valid set (and we can construct it from the empty set as will be shown below), let \(\phi(a,b):=\exists i\in\mathbb N:a=i\wedge b=s_i\), with \(s_0:=s\). This defines a function on \(\mathbb N\), and the \(\phi\)-image of \(\mathbb N\) is \(\{s,s_1,s_2,\dotsc,s_n,\dotsc\}\). Consider the union of that together with \(\{s\}\), and call it \(\operatorname{Tr}s\), so that \(s\in\operatorname{Tr}s\). This (or rather its version without the extra \(\{s\}\)) has a name, it's the "transitive closure" of \(s\), but let's not dive into that. Specification is available (and was implicitly used to extract the unions above), so \(A:=\{x\in\operatorname{Tr}s:\lnot P(x)\}\) is a valid set. \(s\in\operatorname{Tr}s\), and \(\lnot P(s)\) by assumption, so \(s\in A\). By regularity, there is \(q\in A:q\cap A=\varnothing\). Since the inductive hypothesis holds for \(P\), there must be \(p\in q:\lnot P(p)\), or \(P(q)\) would hold and \(q\notin A\). But \(p\in q\) and \(q\in\operatorname{Tr}s\) implies \(p\in\operatorname{Tr}s\), and then \(\lnot P(p)\) yields \(p\in A\), except then \(p\in q\cap A=\varnothing\), which is a contradiction.
Suppose there exists \(x:x\in x\). Consider \(\{x\}\), which is valid by item 7. By regularity, this must be disjoint from one of its members, but \(x\) is its only member, so it must be disjoint from \(x\). Except \(x\in x\cap\{x\}\), contradiction.
Suppose there is a formula \(f\) which defines a function on \(\mathbb N\) and is such that the unique \(x\) for which \(f(k+1,x)\) holds is an element of the unique \(y\) for which \(f(k,y)\) holds (a "strictly descending sequence" of sets). Don't worry, we will construct \(\mathbb N\) as a valid set later on. By replacement and specification, \(F:=\{y:\exists k\in\mathbb N:f(k,y)\}\) is a valid set. Hence, by regularity, it must be disjoint from some \(B\in F\). Such a \(B\) must satisfy \(f(k,B)\) for some \(k\in\mathbb N\). Then again, if \(x\) satisfies \(f(k+1,x)\), then \(x\in B\), and of course \(x\in F\), contradicting that \(B\cap F=\varnothing\). Hence, an \(f\) as taken here cannot exist.
Given two sets \(x,y\), consider \(\{x,y\}\), which is a valid set by pairing and specification. Then, it must be disjoint from one of its members, either \(x\) or \(y\). In the former case, \(y\notin x\), or else \(y\in x\cap\{x,y\}\neq\varnothing\). In the latter case, by similar arguments, \(x\notin y\).
So we did not prove that super-general induction statement, we took it as an axiom. How does all of this solve Russell's paradox? Well, consider again our "set of non-self-containing sets" \(R\). Given that no set contains itself as a member, this would be the set of all sets. But that is not a valid set, as it would contain itself, which is not allowed (see the third-to-last bullet above). So we restricted the class of valid sets in order to avoid the paradox.
But what about the naturals? Is \(\mathbb N\) a valid set? Well, let us construct it from… nothing. Yep, you read that right: we will construct the set of natural numbers from nothing, i.e. from the empty set, which we have assumed to exist as an axiom.
Let us take the empty set as our zero: \(0=\varnothing\). Makes sense, right? 0 elements, call it zero. What about 1? We introduced, in the axiom of infinity, the "successor function" \(S(w)=w\cup\{w\}\). Define \(1:=S(0)=\{\varnothing\}\). 1 element, so it's a sensible choice. Define \(2:=S(1)=\{\varnothing,\{\varnothing\}\}\). And so on, and so on. Any number \(n\) is the successor of \(n-1\), so any number can be identified with a set in this fashion. But is the set of all naturals a valid set? First of all, invoke the axiom of infinity: a valid set \(X\) exists that contains this wannabe set \(\mathbb N\). Finally, we invoke specification to produce \(\mathbb N\) proper. What property do we use to identify \(\mathbb N\)? Well, every number \(n\neq0\) is the successor of another number \(n-1\), and every nonzero \(k\leq n\) satisfies \(k=S(k-1)\) with \(k-1\leq n\). How do we define \(\leq\), though? That is actually easy: the construction above tells us that \(k<n\) is rendered by \(k\in n\). In particular, any \(n\) is the set of all numbers strictly less than \(n\). So here is our set of naturals:
$$\mathbb N:=\{x\in X:(x=\varnothing\vee\exists y\in x:x=S(y))\wedge(\forall y\in x:y=\varnothing\vee\exists z\in x:y=S(z))\}.$$
Such a lot of abstraction and apparent gibberish, and nothing (quite literally: the empty set) has produced the naturals! That is some fine Mathemagic here, don't you think? By the way, the axiom of induction now automatically implies the principle of complete induction, and the well-ordering of the naturals as a consequence.
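The construction just described can be mimicked concretely in Python, at finite levels and purely as an illustration (not as a replacement for the axioms), with frozenset standing in for hereditarily finite sets; the names S, zero and so on are mine:

```python
# Von Neumann naturals at finite levels: frozenset plays the role of a set.
def S(w):
    """Successor function S(w) = w ∪ {w}, as in the axiom of infinity."""
    return w | frozenset({w})

zero = frozenset()            # 0 = ∅
one = S(zero)                 # 1 = {∅}
two = S(one)                  # 2 = {∅, {∅}}
three = S(two)

# Each natural n so built has exactly n elements...
assert len(three) == 3
# ...k < n is rendered by k ∈ n...
assert one in three and two in three
# ...and n is the set of all naturals strictly below it.
assert three == frozenset({zero, one, two})
```

The `len` count is exactly the "0 elements, call it zero; 1 element, so it's a sensible choice" intuition from the text.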
Can we define the addition of the naturals? Absolutely. A union will be involved, but just defining \(n+m:=n\cup m\) won't do, because \(n\subseteq m\) if \(n\leq m\). So we will need to make the two disjoint somehow. This is done by cartesian products, so let's define those. All we need is the product of two sets, so that we can define \(n+m\) by starting from \((n\times1)\cup(m\times\{1\})\). First, we need to define an ordered pair. Many definitions exist, but I will take the Kuratowski route:
$$(a,b):=\{\{a\},\{a,b\}\}.$$
This makes the pair a valid set, since \(\{a\},\{a,b\}\) are valid sets by pairing, and then so is our pair, again by pairing (pun accidental). We can even define how to extract the elements of the pair from the pair:
\begin{align*}
\pi_1((a,b)):={}&\bigcup\bigcup\{x\in(a,b):\forall y\in(a,b):x\subseteq y\}; \\
\pi_2((a,b)):={}&\bigcup\left\{z\in\bigcup\{x\in(a,b):\forall y\in(a,b):y\subseteq x\}:(\exists c,d\in(a,b):c\neq d\implies z\neq\pi_1((a,b)))\right\}.
\end{align*}
\(\pi_1\)'s definition works as expected because \(\{x\in(a,b):\forall y\in(a,b):x\subseteq y\}=\{\{a\}\}\) (only \(\{a\}\) is contained in both \(\{a\}\) and \(\{a,b\}\), whether or not \(a=b\)); this is a valid set because it is produced from the valid set \((a,b)\) via specification, and \(\bigcup(\bigcup\{\{a\}\})=\bigcup\{a\}=a\).
\(\pi_2\) works as expected because the valid-by-specification set \(\{x\in(a,b):\forall y\in(a,b):y\subseteq x\}=\{\{a,b\}\}\), and its union is \(\{a,b\}\). The rest extracts just \(b\) out of that. If the pair has two distinct elements, that is \(\{a,b\}\neq\{a\}\), that is \(a\neq b\), then the condition requires \(z\neq\pi_1((a,b))=a\), so the set whose union is \(\pi_2((a,b))\) reduces to \(\{b\}\), and \(\pi_2((a,b))=\bigcup\{b\}=b\). If instead \(a=b\), the \(\implies\) is vacuous, so \(\pi_2((a,b))=\bigcup\{a,b\}=\bigcup\{b\}=\bigcup\{a\}=b\) again.
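These definitions can be tried out concretely in Python, again with frozenset as a stand-in; the helper `union` mimics the axiom of union applied to a family, and subset is the `<=` operator on frozensets (all names here are mine):

```python
# Kuratowski pairs and the two projections, at finite levels.
def union(family):
    """Union of a family of sets, as in the axiom of union."""
    return frozenset(x for s in family for x in s)

def pair(a, b):
    """Kuratowski pair (a, b) := {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})

def pi1(p):
    # {x in p : x is contained in every member of p} = {{a}}; two unions give a.
    return union(union(frozenset(x for x in p if all(x <= y for y in p))))

def pi2(p):
    # {x in p : x contains every member of p} = {{a, b}}; its union is {a, b}.
    big = union(frozenset(x for x in p if all(y <= x for y in p)))
    distinct = any(c != d for c in p for d in p)
    # Keep only b (drop pi1(p) = a) when the pair has two distinct members.
    return union(frozenset(z for z in big if (not distinct) or z != pi1(p)))

# Try it on the von Neumann naturals 0 = {} and 1 = {0}:
zero = frozenset()
one = frozenset({zero})
p = pair(zero, one)
assert pi1(p) == zero and pi2(p) == one
assert pi1(pair(one, one)) == one and pi2(pair(one, one)) == one
```

Note that the degenerate pair \((a,a)=\{\{a\}\}\) is handled correctly, matching the case analysis in the text.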
Lemma 1: Expected property of ordered pairs. Let \(a,b,c,d\) be sets. Then \((a,b)=(c,d)\iff a=c\wedge b=d\).
Proof.
If \(a=c\wedge b=d\), then \(\{a\}=\{c\}\) and \(\{a,b\}=\{c,d\}\) by extensionality, and by extensionality again \((a,b)=\{\{a\},\{a,b\}\}=\{\{c\},\{c,d\}\}=(c,d)\).
Suppose \((a,b)=(c,d)\) and \(a=b\). Then \(\{\{c\},\{c,d\}\}=\{\{a\},\{a,a\}\}=\{\{a\}\}\). Then \(\{c\}=\{c,d\}=\{a\}\), so that \(c=a=d\). By hypothesis \(b=a\) so that \(b=d\).
Suppose \((a,b)=(c,d)\) and \(a\neq b\). Then either \(\{a\}=\{c\}\wedge\{a,b\}=\{c,d\}\), or \(\{a\}=\{c,d\}\wedge\{a,b\}=\{c\}\). The latter would imply \(a=c=b\), but \(a\neq b\), contradiction. The former is then the case, and so we have \(a=c\). By \(\{a,b\}=\{c,d\}\) we conclude either \(a=c\wedge b=d\) or \(b=c\wedge a=d\). We already have \(a=c\), so \(a=d\) would imply \(c=d\), but then from \(b=c\) we would conclude \(a=b\), contradiction. So the proof is complete.
Is the set of all ordered pairs of an element in \(X\) and one in \(Y\) a valid set? Let's see. We know \(2^{2^{X\cup Y}}\) is valid because of the axiom of union and two applications of the axiom of power set. We can then define:
$$X\times Y:=\{x\in 2^{2^{X\cup Y}}:\exists a\in X,b\in Y:x=\{\{a\},\{a,b\}\}\},$$
which is just specification applied to \(2^{2^{X\cup Y}}\) with the formula \(\phi(x):=\exists a\in X,b\in Y:x=\{\{a\},\{a,b\}\}\), which is a valid formula because \(\{\{a\},\{a,b\}\}\) is a valid set by pairing and whatnot (see comments on the definition of ordered pair). Indeed, \(\{a\}\) and \(\{a,b\}\) are subsets of \(X\cup Y\), hence elements of \(2^{X\cup Y}\), making the pair a subset of \(2^{X\cup Y}\), that is, an element of \(2^{2^{X\cup Y}}\).
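As a quick sanity check in Python (my own illustration: enumerating the pairs directly produces the same set that specification carves out), distinct coordinate pairs give distinct Kuratowski pairs by Lemma 1, so the product of a 3-element set and a 2-element set has 6 elements:

```python
def pair(a, b):
    """Kuratowski pair, as defined above."""
    return frozenset({frozenset({a}), frozenset({a, b})})

def product(X, Y):
    """X × Y as the set of Kuratowski pairs with first coordinate in X
    and second coordinate in Y."""
    return frozenset(pair(a, b) for a in X for b in Y)

X = frozenset({1, 2, 3})
Y = frozenset({4, 5})
# Lemma 1 guarantees no two of these pairs collide:
assert len(product(X, Y)) == len(X) * len(Y)
```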
We then need to define functions between sets, to then say that \(n+m\) is the number that is in a bijection with \((n\times1)\cup(m\times\{1\})\), and to prove that such a number exists and is unique. But we already have all the definitions and terminology for that: a relation between \(X,Y\) is an \(R\subseteq X\times Y\), the right-hand side is now a valid set, and any subset of it defined by a formula is again valid by specification; moreover, the definition of function from relation is a valid formula, so we have functions and all the terminology of the beginning of this post. Is the set of functions from \(X\) to \(Y\) valid? Yes: the power set \(2^{X\times Y}\) is valid by the homonymous axiom, and specification allows us to isolate those relations that are functions.
And this is where noobs and meds join back in. So the spoiler can be summed up as "Let there be nothing! Let there be possibilities of shenanigans with sets! Oh hello natural numbers!". What follows should be noob-level, if a little more complicated than the average noob-level stuff. While we did those things, we defined the natural numbers in such a way that \(0=\varnothing\) and \(n=\{0,\dotsc,n-1\}\) for any \(n\). Keep that in mind. We are also trying to define addition by \(n+m\) being the \(k\) such that there is a bijection \(f:k\to(n\times1)\cup(m\times\{1\})\), where 1 means \(\{\varnothing\}\) and \(\{1\}=\{\{\varnothing\}\}\). The notation \(S(x)\) means \(x\cup\{x\}\), which is just \(x+1\) when \(x\) is a natural.
Two loose ends are left here (that is, in what I was doing just before the spoiler ended): that for all \(m,n\) there is \(k\) in bijection with \((n\times1)\cup(m\times\{1\})\), and that it is unique.
To prove uniqueness, note that, if \(k\neq h\), we can assume modulo renaming that \(h\in k\), and then, via a sufficient number of applications of \(S\), we get \(k=S(S(\dots(S(h))))\), meaning \(h\subseteq k\) and \(h\in k\), but \(h\notin h\) as seen above. OK, that is an interesting property, but it is not what we need: uniqueness will follow from the lemma below. Let me just leave it there because why not :).
Lemma 2: Naturals and bijections. Let \(h,k\in\mathbb N\). Then there exists a bijection \(f:h\to k\) if and only if \(h=k\).
Proof. If \(h=k\), pick the identity as a bijection.
Suppose, by way of contradiction, that there are \(h\neq k\) in bijection. Define \(P(k):=\exists h\neq k:h,k\text{ are in bijection}\). This is a valid property, so the set \(\{k\in\mathbb N:P(k)\}\) is valid, it is nonempty by assumption, and so, by the well-ordering we proved above, it must have a minimal element \(k_0\). If \(k_0\) were empty, it could not be in bijection with anything other than itself, but by assumption this is not the case, so \(k_0\neq\varnothing\). Thus, we have \(k_0':k_0=S(k_0')\), and there exists \(\tilde k\neq k_0\) such that \(\tilde k\) and \(k_0\) are in bijection; \(\tilde k\) is nonempty too, being in bijection with the nonempty \(k_0\), so \(\tilde k=S(k')\) for some \(k'\). Restrict the bijection \(f:k_0\to\tilde k\) to one \(f|_{k_0'}:k_0'\to\tilde k\smallsetminus\{f(k_0')\}\), which is valid because \(\tilde k\smallsetminus\{f(k_0')\}=\{x\in\tilde k:x\neq f(k_0')\}\). Define now \(g:\tilde k\smallsetminus\{f(k_0')\}\to k'\) by setting \(g(x)=x\) for \(x\neq k'\) and, if \(k'\) is in the domain (that is, if \(f(k_0')\neq k'\)), \(g(k'):=f(k_0')\). Now, \(g\circ f|_{k_0'}:k_0'\to k'\) is a bijection, which violates the minimality of \(k_0\): indeed \(k_0'\neq k'\) must hold, since \(k_0'=k'\) would imply \(k_0=S(k_0')=S(k')=\tilde k\), which we assumed not to be true. So, we have a contradiction, which gives us that, whenever \(k,h\) are in bijection, they must be equal. Simplified sum-up: Suppose there are two distinct "numbers" in bijection with each other. If one is 0, we clearly have a contradiction. Otherwise, we can construct a bijection between their predecessors (that is, those numbers minus 1), which means that, if there is no bijection between \(k-1\) and something different from it, \(k\) satisfies the same property. By induction, we conclude our claim holds true.
Lemma 3: Existence of \(k\) in bijection with \((n\times1)\cup(m\times\{1\})\). Suppose \(n,m\in\mathbb N\). Then there is \(k=:m+n\) such that \(k\) is in bijection with \((n\times1)\cup(m\times\{1\})\).
Proof.
Define \(f((k,0)):=k\) for \((k,0)\in n\times1\) (recall \(1=\{0\}\), so the elements of \(n\times1\) are the pairs \((k,0)\) with \(k\in n\)). Define \(f((j,1)):=S(S(\dots(S(j))))\) for \((j,1)\in m\times\{1\}\), the successor being taken \(n\) times (that is, the number of times needed to make \(f((0,1))=n\)). This is clearly an injective function \(f:(n\times1)\cup(m\times\{1\})\to\mathbb N\), and its image is \(0,\dotsc,n-1\) followed by the next \(m\) naturals: an initial segment of \(\mathbb N\) with no gaps, which is precisely a natural number \(k\). That \(k\) is the number we sought.
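The tagging trick in this proof can be checked concretely in Python; here I use ordinary Python tuples \((k,0)\) and \((j,1)\) as stand-ins for the set-theoretic pairs (recall \(1=\{0\}\), so \(n\times1\) tags with 0), and ordinary integers for the naturals, so this illustrates the counting rather than the encoding:

```python
def add(n, m):
    """n + m via a bijection with (n × {0}) ∪ (m × {1}), as in Lemma 3."""
    tagged = {(k, 0) for k in range(n)} | {(j, 1) for j in range(m)}
    # f((k, 0)) = k; f((j, 1)) = j's n-th successor, i.e. n + j.
    image = {k if tag == 0 else n + k for (k, tag) in tagged}
    assert image == set(range(n + m))  # gap-free initial segment: a natural
    return len(image)

assert add(3, 4) == 7
assert add(0, 5) == 5
```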
Let us now define the cartesian product of a family.
Definition 7: cartesian products of families. Let \(\mathcal F\) be a family of sets. We define:
$$\prod\mathcal F:=\left\{f:\mathcal F\to\bigcup\mathcal F:x\in\mathcal F\implies f(x)\in x\right\},$$
and call \(\prod\mathcal F\) the (cartesian) product of the family \(\mathcal F\).
And now the question arises: is this empty? Clearly, if \(\varnothing\in\mathcal F\), it is, because \(f(\varnothing)\) would need to be in \(\varnothing\), but that is impossible. But what if \(\varnothing\notin\mathcal F\)? If you read the last post, you will no doubt be thinking: "isn't an element of the product thus defined simply a choice function for \(\mathcal F\)?" And you are totally right. So we finally get to introduce the axiom of choice into this post.
Axiom of Choice - revisited. Let \(\mathcal F\) be a family of sets such that \(\varnothing\notin\mathcal F\). Then \(\prod\mathcal F\neq\varnothing\).
Clearly if \(\mathcal F\) is in bijection with some \(n\), aka is finite, we can construct an element of the product by choosing a finite number of elements. The axiom of choice is required for generality. Of course, we can have a weaker form of choice for "countable" families.
Definition 8: Countably infinite set. Let \(x\) be a set. We say \(x\) is countably infinite if it is in bijection with \(\mathbb N\).
It is easy to see any subset \(\mathbb N_k:=\{x\in\mathbb N:k\subseteq x\}\) (naturals no smaller than \(k\)) is countably infinite: just shift up \(k\) times and you have your bijection. Obviously all sets of numbers with fixed remainder \(r\) when divided by \(k\) are countably infinite by \(f_{k,r}(x):=kx+r\). Can we prove that any subset of \(\mathbb N\) is countably infinite (or finite)? As far as I know, we need the following extra axiom, and the Schröder-Bernstein theorem.
Axiom of Countable Choice. Let \(\mathcal F\) be a countably infinite family of sets. Then \(\prod\mathcal F\neq\varnothing\).
In the terminology of the last post, all countable families of sets admit at least one choice function.
Theorem 8: Schröder-Bernstein theorem. Let \(X,Y\) be sets such that there exist \(f:X\to Y\) and \(g:Y\to X\) both injective. Then there exists \(h:X\to Y\) bijective.
Proof.
I simply copy the Wikipedia proof attributed to Julius König.
For any \(a\in X\) or \(b\in Y\), we can produce a unique double-sided sequence:
$$\dots,f^{-1}(g^{-1}(a)),g^{-1}(a),a,f(a),g(f(a)),\dots$$
Such a sequence falls into exactly one of four cases:
Terminate on the left with an element of \(X\smallsetminus g(Y)\), in which case we call it an \(X\)-stopper.
Terminate on the left with an element of \(Y\smallsetminus f(X)\), in which case we call it a \(Y\)-stopper.
Repeat itself, in which case we call it cyclic.
Not terminate on the left and not repeat itself, in which case we call it doubly infinite.
By injectivity of \(f,g\), for any \(a\in X\) or \(b\in Y\) there is a unique such sequence to which \(a\) or \(b\) belongs. The sequences then form a partition of the disjoint union of \(X,Y\), or of \((X\times1)\cup(Y\times\{1\})\) if we want. To produce our bijection \(h\), we just have to define it on the elements of \(X\) and \(Y\) of each sequence. Within an \(X\)-stopper, \(f\) defines a bijection between its elements of \(X\) and its elements of \(Y\), so if \(a\in X\) belongs to an \(X\)-stopper, we define \(h(a):=f(a)\). With analogous arguments, for \(a\in X\) belonging to a \(Y\)-stopper, we can define \(h(a):=g^{-1}(a)\), and for \(a\) in a doubly infinite or cyclic sequence we can use either \(h(a):=f(a)\) or \(h(a):=g^{-1}(a)\), provided we make a single choice for each such sequence.
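The chain-following construction can be sketched in Python. Everything here is my own illustration: `f_inv` and `g_inv` are assumed to return `None` where no preimage exists, and the walk assumes every leftward chain either stops or cycles (a doubly infinite chain would make it run forever), so this is a sketch of the proof, not a general implementation:

```python
def sb_h(a, f, f_inv, g, g_inv):
    """h(a) for a in X, following the chain classification above."""
    x, side, seen = a, "X", set()
    while (x, side) not in seen:
        seen.add((x, side))
        if side == "X":
            p = g_inv(x)
            if p is None:               # chain stops in X \ g(Y): X-stopper
                return f(a)
            x, side = p, "Y"
        else:
            p = f_inv(x)
            if p is None:               # chain stops in Y \ f(X): Y-stopper
                return g_inv(a)
            x, side = p, "X"
    return f(a)                          # cyclic chain: use f

# Example: X = even naturals, Y = all naturals, f = inclusion, g(y) = 2y.
f = lambda x: x
f_inv = lambda y: y if y % 2 == 0 else None
g = lambda y: 2 * y
g_inv = lambda x: x // 2
# The resulting bijection (evens -> naturals) turns out to be halving:
assert [sb_h(a, f, f_inv, g, g_inv) for a in range(0, 20, 2)] == list(range(10))
```

In this example the chain through 0 is cyclic (0 maps to itself both ways) and every other chain is a \(Y\)-stopper, ending at an odd number with no \(f\)-preimage.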
Theorem 9: Subsets of the naturals. Let \(A\subseteq\mathbb N\). Then \(A\) is either finite or countably infinite.
Proof.
If \(A\) is finite, we are done. If \(A\) is infinite, clearly \(\operatorname{id}_{\mathbb N}|_A:A\to\mathbb N\) is an injection, so by Schröder-Bernstein we simply need to construct an injection \(f:\mathbb N\to A\) to conclude \(A\) is countably infinite. Since \(A\) is not finite, it is not empty, so we can pick \(f(1)\in A\). Since \(A\) is not finite, we can pick \(f(2)\in A\smallsetminus\{f(1)\}\), for if this were empty, \(A=\{f(1)\}\) would be finite, which it isn't by assumption. Rinse and repeat, and for any \(n\) you have \(f(n)\). Here is your injection.
As was pointed out to me here, I am trying to simultaneously choose all of the countably infinitely many \(f(i)\), which is not allowed by ZF with no choice. So the above proof doesn't work. However, it is a nice start. What we take from it is that, for any \(n\), \(A(n):=\{x\subseteq A:\exists b:x\to n\text{ bijection}\}\) is nonempty. By countable choice, we pick a choice function for \(\{A(2^n):n\in\mathbb N\}\), which is a valid set by replacement, since \(A(n)\subseteq 2^A\) for all \(n\) and \(2^A\) is valid by power set (and specification). We have then isolated \(B(n)\subseteq A\) in bijection with \(2^n\) for any \(n\). These are not necessarily disjoint, so picking a choice function for \(\{B(n):n\in\mathbb N\}\), which is a valid set, may not produce an injection. Set:
$$C(0):=B(0)\qquad\qquad\qquad C(k):=B(k)\smallsetminus\bigcup\{B(j):j<k\}.$$
Clearly \(C(k)\neq\varnothing\) for all \(k\), because \(\bigcup\{B(j):j<k\}\) has at most \(1+2+\dotso+2^{k-1}\) elements, that is \(2^k-1\) elements. Thus, our injection will be a choice function for the \(C(k)\).
If you're wondering why I picked \(\{A(2^n)\}\), it's because with \(\{A(n)\}\) we cannot guarantee the \(C(k)\) are nonempty since \(\{B(j):j<k\}\) could exhaust \(B(k)\), in principle.
There is a more straightforward way of proving the above, but it requires a stronger form of choice. And noobs can skip ahead of this, which is going to be med-level.
Definition 9: Entire binary relation. A binary relation \(R\subseteq X\times X\) on a set \(X\) is called entire (or serial or total) if for all \(x\in X\) there is at least one element \(y\in X:x\operatorname{R}y\). In other words, if every element is related to at least one element.
Clearly, a reflexive relation is serial: just take \(y=x\). The converse isn't true: \(<\) is serial on \(\mathbb N\) (take \(y:=S(x)=x+1\)), but it is not reflexive. However, a serial relation that is symmetric and transitive can, according to Wikipedia, be shown to be reflexive.
Axiom of Dependent Choice. Given a nonempty set \(X\) with a relation \(\preceq\), if \(\preceq\) is serial, there exists a sequence \(\{x_i\}_{i\in\mathbb N}\) such that \(x_i\preceq x_{i+1}\) for all \(i\in\mathbb N\).
Lemma 4: The various forms of choice. The full Axiom of Choice implies the Axiom of Dependent Choice which in turn implies the Axiom of Countable Choice.
Proof.
Assume Choice. Let \(X\) be a nonempty set and \(R\) an entire relation on \(X\). For all \(x\in X\), we know \(R_x:=\{t\in X:x\operatorname{R}t\}\) is nonempty. Therefore, by Choice, there is a choice function for \(\{R_x:x\in X\}\). We now have \(f:X\to X\) such that \(x\operatorname{R}f(x)\) for all \(x\): just map \(x\) to \(R_x\) first, and then use the choice function. With this, we pick any \(x\in X\) and consider \(x,f(x),f(f(x)),\dots\). This is our sequence. Dependent Choice is then proved.
Assume Dependent Choice, and suppose we have that \(A(n)\) is a nonempty set for all \(n\in\mathbb N\). Consider the set of all finite choice functions \(f:n\to\bigcup\{A(0),\dotsc,A(n-1)\}\) with \(f(k)\in A(k)\) for all \(k<n\). This is a valid set: \(\phi(x,y):=x=n\wedge y=A(n)\) is a valid formula, and the \(\phi\)-image of \(\mathbb N\) is \(\{A(n):n\in\mathbb N\}\), which makes \(\bigcup\{A(n):n\in\mathbb N\}\) a valid set by union; the functions from \(n\) to that union form a valid set as seen above, the choice functions defined on \(n\) are then a valid set by specification, and by union and specification again the set of all finite choice functions is valid. Define the relation \(f\preceq g\) if \(f\) is defined on \(m\) and \(g\) is defined on some \(n>m\) with \(g|_m=f\). This relation is clearly serial because, given \(f\) defined on \(n\), we can always define \(g\) on \(n+1\) so that \(g|_n=f\), by choosing one element of \(A(n)\) as \(g(n)\) (a single choice, which requires no axiom). Hence, by Dependent Choice, we have a sequence \(f_0\preceq f_1\preceq\dotso\), whose domains strictly increase, so that \(f_{i+1}\) is defined at least on \(i+1\). Now define \(f(i):=f_{i+1}(i)\). This is a choice function for \(\{A(n)\}\). Countable Choice is thus proved.
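The extend-then-read-the-diagonal argument can be simulated in Python for the first few levels. The family A and all names here are mine and hypothetical; the point is that each extension step makes a single choice (here `min`, but any element would do), mirroring seriality, and the final function reads the diagonal \(f(i):=f_{i+1}(i)\):

```python
# Finite simulation of "Dependent Choice implies Countable Choice".
A = lambda n: {10 * n, 10 * n + 1}        # hypothetical nonempty sets A(n)

def extend(finite_f):
    """One application of seriality: a strictly longer choice function."""
    n = len(finite_f)                      # current domain is {0, ..., n-1}
    return finite_f + [min(A(n))]          # one single choice at level n

chain = [[]]                               # f_0 = the empty choice function
for _ in range(6):                         # f_0 ≼ f_1 ≼ ... (first terms)
    chain.append(extend(chain[-1]))

f = [chain[i + 1][i] for i in range(5)]    # the diagonal f(i) := f_{i+1}(i)
assert all(f[i] in A(i) for i in range(5))
```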
I introduced Dependent Choice as a simpler way to prove Theorem 9. I will do much more with it, though the much more can be done with Countable Choice too.
Theorem 10: Every infinite set is at least countable. For any \(X\) that is not finite, there exists an injection \(f:\mathbb N\to X\).
Noobs, you can try cracking Dependent Choice and the Lemma above, if you want, and then you should be able to understand the following proof. Of course, you can simply work this out in a way similar to what I did in Theorem 9.
Proof.
We know we can build injections from \(n\) to \(X\) for any \(n\) by the first argument in Theorem 9. Consider the set of all such injections, which is valid by arguments similar to those used for finite choice functions in Lemma 4. Define on it the relation \(f\preceq g\) if \(g|_{\operatorname{Dom}f}=f\). By Dependent Choice we have a sequence of such functions, and the conclusion is obtained like in Lemma 4.
Corollary 1: Infinity and bijections. Let \(X\) be an infinite set. Then:
There is a countable subset \(A\subseteq X\) such that \(X\smallsetminus A\) is infinite;
For any countable \(A\subseteq X\) with \(X\smallsetminus A\) infinite, \(X\) is in bijection with \(X\smallsetminus A\); in particular, every infinite set is in bijection with a proper subset of itself;
A set \(X\) is infinite if and only if it is in bijection with a proper subset of itself.
This proof is from Soardi's book (see Bibliography), Lemma 2.8.1 and Corollary 2.8.2. Proof.
We know there is an infinite countable subset \(A\subseteq X\), by Theorem 10. If \(X\smallsetminus A\) is infinite, we are done. If not, fix a bijection \(f:\mathbb N\to A\) and consider \(B:=f(2\mathbb N)\), the image of the even numbers. \(B\) is countable, and \(X\smallsetminus B\) contains \(A\smallsetminus B\), which is in bijection with the odd numbers, hence infinite. Thus, 1 is proved.
Take \(A\) as in 1. By Theorem 10, there is \(B\subseteq X\smallsetminus A\) which is countable. Fix a bijection \(f:\mathbb N\to A\) and one \(g:\mathbb N\to B\). Define:
$$h(x):=\left\{\begin{array}{cc}
x & x\in X\smallsetminus(A\cup B) \\
g(\frac{n}{2}) & x=g(n)\text{ with \(n\) even} \\
f(\frac{n-1}{2}) & x=g(n)\text{ with \(n\) odd}
\end{array}\right..$$
This is a bijection \(h:X\smallsetminus A\to X\), as desired for 2.
If \(X\) is infinite, there is a proper subset \(A\subset X\) in bijection with \(X\) by 2. If \(X\) is finite, it is in bijection with some \(n\), and any subset \(A\subseteq X\) is in bijection with some \(m\leq n\). (To see this, run over the numbers from 0 to \(n-1\) and, whenever \(A\) skips one, consider the function that shifts all bigger numbers down by 1; compose all the functions so considered: the image of \(A\) via that composition has no gaps, and the composition is injective over \(A\).) A bijection of \(X\) with a proper subset \(A\) would then give a bijection between \(n\) and the \(m\) that is in bijection with \(A\), so \(m=n\) by Lemma 2, and \(A\) would be all of \(X\). So the subset can't be proper, and the corollary is thus proved.
It is now time to define cardinals. And also, to define finite and infinite sets, because I don't think I've done that just yet.
Definition 10: Equal cardinality and cardinals. If \(X,Y\) are sets, we say they are equinumerous or equipotent or equipollent, or that they have the same cardinality, if there exists a bijection \(f:X\to Y\). A cardinal is a class of sets with the same cardinality. We write \(|X|=|Y|\) for same cardinality. We say \(|X|\leq|Y|\) if there exists an injection \(f:X\to Y\). We say \(|X|<|Y|\) if \(|X|\leq |Y|\) and \(|X|\neq|Y|\). We say two cardinals \(\kappa,\kappa'\) satisfy \(\kappa\leq\kappa'\) if for all \(X\in\kappa,Y\in\kappa'\) we have \(|X|\leq|Y|\). The same trick extends \(<\) and \(=\) to cardinals. Naturally, this is the same as having the relation for some \(X\in\kappa\) and some \(Y\in\kappa'\), since then it holds for all such \(X,Y\) by using the bijections between elements of the same cardinal.
Naturally, we will call the cardinals of natural numbers the same as the natural numbers. So \(n\) is both the natural number and the matching cardinal. We have proved in Lemma 2 that each natural number has its own cardinal.
Definition 11: Finite cardinals, the cardinal of countability, the cardinal of continuum. The cardinals associated to natural numbers are called finite cardinals, and sets belonging to finite cardinals are called finite. The cardinal of \(\mathbb N\) is called cardinal of countability and denoted \(\aleph_0\) (aleph-zero or aleph-naught or aleph-null), and sets in \(\aleph_0\) are said to be countable (Definition 8) or to have cardinality \(\aleph_0\). The cardinal of \(2^{\mathbb N}\) is called cardinal of continuum, and sets belonging to it have the cardinality of continuum.
Corollary 2: Subsets of countable sets are finite or countable. If \(|X|=\aleph_0\) and \(Y\subseteq X\), then either \(Y\) is finite or \(|Y|=\aleph_0\).
Proof.
Pick a bijection \(f:X\to\mathbb N\), which exists by \(|X|=\aleph_0\), and set \(A:=f(Y)\). Then, \(A\subseteq\mathbb N\). By Theorem 9, either \(A\) is finite, or it is countable. \(f|_Y:Y\to A\) is a bijection, so \(|Y|=|A|\), completing the proof.
Corollary 3: \(\aleph_0\) and order. \(\aleph_0\) is the smallest infinite cardinal. In other words, if a cardinal \(\kappa\) satisfies \(\kappa\geq n\) for all \(n\) finite cardinal, then \(\kappa\geq\aleph_0\).
Proof.
Given \(\kappa\geq n\) for all \(n\), it cannot be finite, as if \(\kappa=n\), we would need \(n=\kappa\geq n+1\) which is not the case. Theorem 10 then tells us any \(X\in\kappa\) has a countable subset \(A\). Pick a bijection \(f:\mathbb N\to A\). This is an injection \(\mathbb N\to X\), proving \(|X|\geq|\mathbb N|\), and therefore \(\kappa\geq\aleph_0\).
So, from Theorem 9 on, all the proofs depended on some form of choice, typically via Theorem 9 itself. The question then comes naturally: is choice provable in ZF? And the answer is, sadly, no.
Theorem 11: Choice is independent of ZF. The Zermelo-Fraenkel axioms introduced in the spoiler imply neither the axiom of choice nor its opposite. In fact, one can build models of set theory that satisfy ZF, but either satisfy or not satisfy the full Axiom of Choice.
And of course, the proof is done by constructing said models, but that is getting a bit too complex for me, so I just leave you this reference.
Another question comes to mind now: can there be sets with cardinality strictly bigger than countable? And is \(\mathcal P(\mathbb N)\) of a different cardinality than \(\mathbb N\)? Well, yes to both. In fact, there are cardinals as big as you want.
Theorem 12: Power set and cardinality. \(\mathcal P(X)\) always has a strictly greater cardinality than \(X\), whatever set \(X\) we pick. In the case of \(X\) finite, \(|\mathcal P(X)|=2^{|X|}\).
The proof is taken again from Soardi, Theorem 2.8.6 and Corollary 2.8.7. Warning for noobs: this might be a bit above noob level, but it should still be within your reach. Proof
First of all, define \(f:\mathcal P(X)\to\{0,1\}^X=\{g:X\to\{0,1\}\text{ functions}\}\) by setting \(f(A)(x)\) to be \(1\) if \(x\in A\) and \(0\) otherwise. In short, we are mapping each subset to its indicator function. \(f\) is a bijection, as can easily be proved. So we can work with that function set for our proof. Suppose that, for each \(x\in X\), we are given \(f_x:X\to\{0,1\}\), i.e. a map \(X\to\{0,1\}^X,x\mapsto f_x\). Define \(g(x)=0\) if \(f_x(x)=1\) and \(g(x)=1\) otherwise. This \(g\) is not \(f_x\) for any \(x\in X\), since \(g(x)\neq f_x(x)\). In other words, no map \(X\to\{0,1\}^X\) is surjective, much less a bijection. Now consider \(x\mapsto\{x\}\). This is clearly an injection \(X\to\mathcal P(X)\), so our first claim is proved.
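To see the diagonal construction in action, here is a small illustrative sketch of my own (not from Soardi): on a finite \(X\), given any family \(\{f_x\}_{x\in X}\), the diagonal function \(g\) disagrees with every \(f_x\) at the point \(x\) itself.

```python
# Illustrative sketch (my own data): the diagonal construction on a finite X,
# showing the diagonal function g differs from every f_x in the family.
X = [0, 1, 2, 3]

# An arbitrary family assigning to each x in X a function f_x : X -> {0, 1},
# represented as a dict of dicts.
family = {
    0: {0: 1, 1: 0, 2: 1, 3: 0},
    1: {0: 0, 1: 1, 2: 1, 3: 1},
    2: {0: 1, 1: 1, 2: 0, 3: 0},
    3: {0: 0, 1: 0, 2: 0, 3: 1},
}

# g(x) := 1 - f_x(x) flips the diagonal, so g disagrees with f_x at x.
g = {x: 1 - family[x][x] for x in X}

# g is not f_x for any x: they differ at the point x itself.
assert all(g[x] != family[x][x] for x in X)
assert all(g != family[x] for x in X)
```

Of course, on a finite \(X\) no family can exhaust \(\{0,1\}^X\) anyway; the point is only that the same flipping recipe works verbatim for any \(X\).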
For the second claim, pick \(f\in\{0,1\}^n\) and consider the sequence \(f(0),f(1),\dotsc,f(n-1)\), which uniquely identifies the function \(f\). How many such sequences are there? We have two choices for \(f(0)\), two choices for \(f(1)\), and so on. In the end, we conclude there are \(2^n\) such sequences, which proves the second claim.
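Both halves of the proof can be checked by brute force on a small finite set. The sketch below (my illustration) runs every subset of a 3-element \(X\) through the indicator map and confirms that the image is exactly \(\{0,1\}^X\), of size \(2^3\).

```python
from itertools import product

# Finite sanity check (my own data): the subset -> indicator-function map is a
# bijection from P(X) onto {0,1}^X, which has 2^n elements.
X = ["a", "b", "c"]
n = len(X)

def indicator(A):
    """Map a subset A of X to its indicator function, as a tuple of 0/1 values."""
    return tuple(1 if x in A else 0 for x in X)

# All subsets of X (via bitmasks), sent through the indicator map.
subsets = [{x for i, x in enumerate(X) if mask >> i & 1} for mask in range(2 ** n)]
images = {indicator(A) for A in subsets}

# The indicator map hits every element of {0,1}^X exactly once ...
assert images == set(product([0, 1], repeat=n))
# ... so |P(X)| = |{0,1}^X| = 2^n.
assert len(subsets) == len(images) == 2 ** n
```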
Corollary 4: Another continuum. The set of sequences of zeros and ones has the cardinality of continuum.
Proof.
Well, in the last proof, we proved that \(\mathcal P(X)\) is in bijection with \(\{0,1\}^X\). Apply that to \(\mathbb N\), and \(\{0,1\}^{\mathbb N}\) has the cardinality of continuum, but that is exactly the set in the statement.
At this point, one wonders: is it possible, with or without choice, to establish whether there is any cardinal between \(\aleph_0\) and continuum?
Continuum hypothesis. There is no cardinal satisfying \(\aleph_0<\kappa<|\mathcal P(\mathbb N)|\).
In fact, why stop there?
Generalized continuum hypothesis. Given a cardinal \(\kappa\), there is no cardinal \(\kappa'\) satisfying \(\kappa<\kappa'<2^\kappa\), where \(2^\kappa\) is the cardinal of the power set of any \(X\in\kappa\).
Turns out, unfortunately, that both of these statements are independent of ZF and of choice, so no, we can't prove any of this in ZF, whether with or without choice. This independence is proved by forcing, a method developed precisely to prove this independence, which I won't get into because this post is already too long as it is.
It is now time to discuss some properties of cardinal addition, which we only defined for finite cardinals, but can obviously be defined the same way for infinite cardinals. Actually, not quite yet. First, I have some other properties to show, and then, since I want to make a single big theorem about the whole of cardinal arithmetic, I will first define products and powers of cardinals.
Theorem 13: Properties of the cardinal ordering. The following hold, where the ones involving choice are marked with "CW" (Choice Warning):
The order of cardinals has the properties of an order relation.
Cardinal equality has the properties of an equivalence relation.
(CW) The statement "Given any two cardinals \(\kappa,\lambda\), they are comparable, i.e. either \(\kappa\leq\lambda\) or \(\lambda\leq\kappa\)" is equivalent to the axiom of choice.
(CW) \(|A|\leq|B|\) is equivalent to the existence of a surjective \(f:B\to A\).
Proof.
Given any cardinal \(\kappa\), it is evident that \(\kappa\leq\kappa\), since this means that \(X\in\kappa\implies\exists f:X\to X\text{ injective}\) but we know we can find one that is bijective: the identity. If \(\kappa\leq\lambda\wedge\lambda\leq\kappa\), it means that, given \(X\in\kappa,Y\in\lambda\), we have injections \(f:X\to Y,g:Y\to X\), but by Theorem 8 that implies there is a bijection \(h:X\to Y\), so that \(\kappa=\lambda\). If \(\kappa\leq\lambda\wedge\lambda\leq\mu\), given \(X\in\kappa,Y\in\lambda,Z\in\mu\), we have injections \(f:X\to Y,g:Y\to Z\), but then we have seen in Theorem 1 that \(g\circ f:X\to Z\) is an injection, which means \(\kappa\leq\mu\).
\(\kappa=\kappa\) is trivial: pick \(X\in\kappa\) and consider the identity. If \(\kappa=\lambda\), pick \(X\in\kappa,Y\in\lambda\), then there is a bijection \(f:X\to Y\), but we saw in Theorem 1 that \(f\) has an inverse, and that is a bijection \(f^{-1}:Y\to X\), giving \(\lambda=\kappa\). If \(\kappa=\lambda\wedge\lambda=\mu\), then for \(X\in\kappa,Y\in\lambda,Z\in\mu\) we find bijections \(f:X\to Y,g:Y\to Z\), but then \(g\circ f:X\to Z\) is a bijection by Theorem 1, and this gives \(\kappa=\mu\).
Assume choice. We saw in last post that choice is equivalent to Zorn's lemma, so let's use that. Pick \(A\in\kappa,B\in\lambda\). Consider the set of all partial injections \(A\to B\), that is all injections from subsets of \(A\) into \(B\). Is this a valid set? For any subset \(C\subseteq A\), the functions \(C\to B\) are a valid set (subset of cartesian product which we saw is valid), and the subset of injective functions is also valid by either specification or power set. We can then unite all those and by the axiom of union we have our set. With that, we can define the following order on that set:
$$f\leq g\iff\operatorname{Dom}(f)\subseteq\operatorname{Dom}(g)\wedge g|_{\operatorname{Dom}(f)}=f.$$
So \(f\leq g\) means that if \(f(x)\) is defined, so is \(g(x)\), and then \(f(x)=g(x)\). This obviously satisfies the properties of an order, and given a totally ordered subset \(\{f_i\}_{i\in I}\) of functions defined on subsets \(\{A_i\}_{i\in I}\), it is easy to see that an upper bound for \(\{f_i\}_{i\in I}\) is the function with domain \(\bigcup_iA_i\) defined to be equal to \(f_i(x)\) whenever \(x\in A_i\). Thus, by Zorn's lemma, there is a maximal element \(f:\tilde A\to B\). If \(\tilde A=A\), \(f:A\to B\) is an injection, proving \(\kappa\leq\lambda\). Assume \(\tilde A\neq A\). Then there is \(a\in A\smallsetminus\tilde A\). If \(f:\tilde A\to B\) is not surjective, choose \(y\in B\smallsetminus f(\tilde A)\). Define \(\tilde f:\tilde A\cup\{a\}\to B\) by setting \(\tilde f(a)=y\) and \(\tilde f(x)=f(x)\) for \(x\in\tilde A\). Then \(\tilde f\geq f\), but \(\tilde f\neq f\), violating the maximality of \(f\). So \(f\) must be surjective, but then by Theorem 1 it has a right inverse \(g:B\to\tilde A\). This \(g\) has \(f\) as its left inverse, so again by Theorem 1 \(g\) must be injective, and composing with the inclusion \(\tilde A\subseteq A\) gives an injection \(B\to A\), so that \(\lambda\leq\kappa\).
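The maximality dichotomy in the Zorn's-lemma step can be seen concretely on finite sets, where "keep extending until you cannot" replaces Zorn's lemma. In this sketch of mine, a partial injection is grown greedily; it stops either because it is total (giving \(\kappa\leq\lambda\)) or because it has exhausted \(B\) (it is surjective, giving \(\lambda\leq\kappa\)).

```python
# Finite sketch of the Zorn's-lemma step (my illustration): extend a partial
# injection A -> B until it is maximal; maximality forces either full domain
# or surjectivity onto B.
A = ["a1", "a2", "a3", "a4"]
B = ["b1", "b2"]

f = {}  # the partial injection, as a dict
for a in A:
    unused = [b for b in B if b not in f.values()]
    if unused:                # the extension step from the proof
        f[a] = unused[0]
    else:                     # no room left: f must already be onto B
        break

# Here |A| > |B|, so the maximal f is surjective onto B but not total on A.
assert set(f.values()) == set(B)
assert set(f) != set(A)
```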
Assume any two cardinals are comparable. Let \(\kappa\) be a cardinal, and consider \(\aleph(\kappa)\) the least ordinal that cannot be injected into a representative of \(\kappa\). We will see that this definition makes sense further down, just accept this is well-posed for the moment. Then either \(\kappa\leq|\aleph(\kappa)|\) or vice versa, because of comparability of cardinals, but \(\kappa\geq|\aleph(\kappa)|\) would mean there is an injection \(\aleph(\kappa)\to X\) for \(X\in\kappa\), which violates the definition of \(\aleph(\kappa)\). Hence, \(\kappa\leq|\aleph(\kappa)|\), but then for \(X\in\kappa\) there is an injection \(X\to\aleph(\kappa)\), and since the latter set is an ordinal, in particular a well-ordered set, it means \(X\) can be well-ordered. So for any cardinal, all its representatives can be well-ordered, which is the well-ordering theorem, which we saw in the last post is equivalent to the axiom of choice. This proof is inspired by this Stack Exchange post.
Suppose \(|A|\leq|B|\). This means we have an injection \(f:A\to B\). This then has a left inverse \(g:B\to A\). This is clearly a surjection, as we desired. Choice was not involved here (see the proof of the existence of the left inverse). If we have \(f:B\to A\) surjective, then we can find a right inverse \(g:A\to B\). When we proved this, we essentially used a choice function for the family \(\{f^{-1}(\{b\}):b\in B\}\), so that is where choice comes in. If we prove \(g\) is injective, we are done. But \(g\) has \(f\) as its left inverse, so that it is necessarily injective, completing our proof.
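The choice-free direction of the last item (injection \(\Rightarrow\) surjection) is easy to watch on finite sets. This sketch, with data of my choosing, builds the left inverse \(g\) explicitly: invert \(f\) on its image and send everything else to a fixed fallback element.

```python
# Sketch (my own finite data): from an injection f : A -> B, build a
# surjection g : B -> A with g(f(a)) = a -- no choice needed here.
A = [0, 1, 2]
B = ["p", "q", "r", "s"]
f = {0: "p", 1: "q", 2: "r"}          # an injection A -> B

a0 = A[0]                              # fixed fallback value for B \ f(A)
f_inv = {b: a for a, b in f.items()}   # invert f on its image
g = {b: f_inv.get(b, a0) for b in B}   # the left inverse, extended to all of B

assert all(g[f[a]] == a for a in A)    # g is a left inverse of f
assert set(g.values()) == set(A)       # so g : B -> A is surjective
```

The converse direction (surjection \(\Rightarrow\) injection) is where a choice of preimages comes in, exactly as the proof says.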
Time to define product and powers of cardinals.
Definition 12: Cardinal operations. Let \(\kappa,\lambda\) be cardinals and \(X\in\kappa,Y\in\lambda\). Then we define:
\begin{align*}
\kappa+\lambda:={}&|(X\times\{1\})\cup(Y\times\{\{1\}\})| \\
\kappa\lambda:={}&|X\times Y| \\
\kappa^\lambda:={}&|X^Y|.
\end{align*}
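For finite representatives, all three operations of Definition 12 can be sanity-checked by counting. In this sketch of mine, the tags \(1\) and \(\{1\}\) are replaced by the stand-ins `1` and `(1,)` (Python sets are unhashable, so a tuple plays the role of \(\{1\}\)).

```python
from itertools import product

# Finite-set sanity check of Definition 12 (my illustration): disjoint union,
# cartesian product, and function set realize +, *, ^ on cardinals.
X = ["a", "b", "c"]          # |X| = 3
Y = [10, 20]                 # |Y| = 2

# kappa + lambda: tag the two sets to force disjointness, as in the definition;
# 1 and (1,) stand in for the tags 1 and {1}.
disjoint_union = [(x, 1) for x in X] + [(y, (1,)) for y in Y]
assert len(disjoint_union) == len(X) + len(Y)          # 3 + 2 = 5

# kappa * lambda: the cartesian product.
assert len(list(product(X, Y))) == len(X) * len(Y)     # 3 * 2 = 6

# kappa ^ lambda: functions Y -> X, i.e. one choice of value per point of Y.
functions = list(product(X, repeat=len(Y)))
assert len(functions) == len(X) ** len(Y)              # 3 ^ 2 = 9
```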
We have a number of things to verify here.
Theorem 14: Properties of cardinal operations. Again, when choice is required, I start the items with "(CW)".
Cardinal addition is well-defined, as are product and power.
When \(\kappa,\lambda\) are finite, these are exactly the operations we know from basic arithmetic.
In general, summing \(\kappa\) \(n\) times is the same as \(n\kappa\).
When \(\lambda\) is finite, \(\kappa^\lambda=|X^\lambda|\), where \(X^\lambda\) means the cartesian product of \(\lambda\) copies of \(X\).
Addition is commutative and associative.
\(\kappa+\lambda\geq\kappa\) and \(\kappa+\lambda\geq\lambda\).
(CW) If \(\kappa\leq\mu\), then for any \(\nu\) we have \(\kappa+\nu\leq\mu+\nu\).
\(\aleph_0+\aleph_0=\aleph_0\). In fact, any finite sum of \(\aleph_0\)s is \(\aleph_0\), and (CW) the same holds for any infinite cardinal \(\kappa\).
(CW) If \(\{A_n\}_{n\in\mathbb N}\) is a family of countable sets, then \(\bigcup_nA_n\) is also countable.
(CW) If \(\kappa\) or \(\lambda\) is infinite, then \(\kappa+\lambda=\max\{\kappa,\lambda\}\).
(CW) If \(\sigma\) is infinite, there exists \(\kappa:\kappa+\mu=\sigma\) if and only if \(\mu\leq\sigma\). It is unique and equal to \(\sigma\) if and only if \(\mu<\sigma\).
0 is the additive identity and 1 the multiplicative one, and \(\kappa0=0\kappa=0\).
Multiplication is associative and commutative, and \(\kappa\mu\geq\kappa\wedge\kappa\mu\geq\mu\) for all cardinals.
\(\kappa\leq\mu\) implies \(\kappa\nu\leq\mu\nu\) for any cardinal \(\nu\).
\(\kappa(\mu+\nu)=\kappa\mu+\kappa\nu\) and \((\kappa+\mu)\nu=\kappa\nu+\mu\nu\).
(CW) If \(\kappa,\mu\neq0\) and at least one of them is infinite, then \(\kappa\mu=\max\{\kappa,\mu\}\).
(CW) If \(\kappa\) is infinite and \(\mu\neq0\), there exists \(\nu:\mu\nu=\kappa\) if and only if \(\mu\leq\kappa\). It is unique and equal to \(\kappa\) if and only if \(\mu<\kappa\).
\(\kappa^0=1\) for any \(\kappa\).
If \(\kappa\geq1\), then \(0^\kappa=0\).
\(\kappa^1=\kappa\) and \(1^\kappa=1\).
\(\kappa^{\mu+\nu}=\kappa^\mu\kappa^\nu\), \((\kappa^\mu)^\nu=\kappa^{\mu\nu}\), and \((\kappa\mu)^\nu=\kappa^\nu\mu^\nu\).
\(\kappa^\mu\geq\kappa\) whenever \(\mu\geq1\), but it is possible that \(\kappa^\mu<\mu\).
If \(1\leq\nu\wedge\kappa\leq\mu\), then \(\nu^\kappa\leq\nu^\mu\wedge\kappa^\nu\leq\mu^\nu\). The last statement doesn't require \(\nu\geq1\).
Since \(2=\{0,1\}\), clearly \(2^\kappa=|\{0,1\}^X|\) for \(X\in\kappa\), and by Theorem 12's proof we know this is \(|\mathcal P(X)|\).
(CW) If \(\kappa,\mu\geq2\) are both finite and \(\nu\) is infinite, then \(\kappa^\nu=\mu^\nu\).
(CW) If \(\kappa\) is infinite and \(\mu\neq0\) is finite, then \(\kappa^\mu=\kappa\).
(CW) If \(\kappa\geq2,\mu\geq1\) and at least one of them is infinite, then \(\max\{\kappa,2^\mu\}\leq\kappa^\mu\leq\max\{2^\kappa,2^\mu\}\).
(CW) If \(\kappa\) is infinite and \(\mu>0\) is finite, then there exists \(\nu:\nu^\mu=\kappa\), and \(\nu=\kappa\).
(CW) If \(\kappa\) is infinite and \(\mu>1\) is finite, there need not be \(\lambda:\mu^\lambda=\kappa\), but if there is one, it is infinite and \(\lambda\leq\kappa\), and any finite \(\nu\geq2\) will also satisfy \(\nu^\lambda=\kappa\). Since choice implies cardinals are well-ordered, we can pick the least such \(\lambda\) and call it \(\log_\mu\kappa\).
(CW) \(\aleph_0^{\aleph_0}=2^{\aleph_0}\). In other words, the sequences of natural numbers are in bijection with the subsets of the naturals.
(CW) \(\kappa+\lambda=\kappa\lambda\) if both are infinite.
First of all, I accidentally used \(|X|+|Y|=|(X\times\{1\})\cup(Y\times\{\{1\}\})|\) below whereas IIRC I defined \(|X|+|Y|:=|(X\times1)\cup(Y\times\{1\})|\), but it is trivial to biject the two versions of the sum, as \((x,0)\mapsto(x,1),(y,1)\mapsto(y,\{1\})\) is a bijection between the two.
Pick \(X,X'\in\kappa,Y,Y'\in\lambda\). We need to prove that \((X\times\{1\})\cup(Y\times\{\{1\}\})\) is in bijection with \((X'\times\{1\})\cup(Y'\times\{\{1\}\})\), and similarly for \(X\times Y\) and \(X'\times Y'\), and for \(X^Y\) and \((X')^{Y'}\). Pick bijections \(f:X\to X',g:Y\to Y'\). The map that sends \((x,1)\mapsto(f(x),1)\) and \((y,\{1\})\mapsto(g(y),\{1\})\) is evidently a bijection between the first two sets, with inverse \((z,1)\mapsto(f^{-1}(z),1)\) and \((t,\{1\})\mapsto(g^{-1}(t),\{1\})\), the map that sends \((x,y)\mapsto(f(x),g(y))\) is one for the second two sets with inverse \((z,t)\mapsto(f^{-1}(z),g^{-1}(t))\), and the map \(h\mapsto f\circ h\circ g^{-1}\) is one for the last two sets with inverse \(h\mapsto f^{-1}\circ h\circ g\).
Looking at Lemma 3, what we did was map \(\{0,\dotsc,m-1\}\) to itself viewed as a subset of \(\{0,\dotsc,n+m-1\}\), and then shift \(\{0,\dotsc,n-1\}\) by \(m\) to fit it into that same set together with \(\{0,\dotsc,m-1\}\), so that in itself deals with the sum; basic combinatorial counting, as in the proof of Theorem 12, deals with products and powers.
Let us do this by induction. It is quite obvious that \((X\times\{1\})\cup(X\times\{\{1\}\})\) is in bijection with \(X\times\{1,\{1\}\}\): they're literally the same set. \(\{1,\{1\}\}\) is in bijection with \(2=\{0,1\}\) by \(1\mapsto0,\{1\}\mapsto1\), so we ended up with \(X\times 2\), proving \(\kappa+\kappa=2\kappa\). Assume \(\kappa+\dotso+\kappa=(n-1)\kappa\) with the sum being of \(n-1\) copies. Then surely the sum of \(n\) copies of \(\kappa\) features a set which is the union of \(Y\times\{1\}\) with \(Y\) from the sum of \(n-1\) copies and \(X\times\{\{1\}\}\) with \(X\in\kappa\). In other words, it is \((n-1)\kappa+\kappa\). We used the below associativity of the sum to get to this point, and we use the below distributivity of the product to conclude this is \(n\kappa\).
For \(\lambda>2\), this is by definition of the cartesian product. For \(\lambda=0\) or \(\lambda=1\), see below for the powers, and naturally the empty cartesian product makes no sense, so we are left with just \(\lambda=1\) out of those two, where \(\kappa^\lambda=\kappa\) (see below), and \(|X^1|=|X|\) since a function \(1\to X\) is uniquely identified by its value on the sole element of 1 which is 0. For \(\lambda=2\), we must prove \(|X^2|=|X\times X|\). Define \(f((x,y)):=(0\mapsto x,1\mapsto y)\). Suppose \(f((x,y))=f((x',y'))\). This implies \(x=x',y=y'\) because two functions are equal if they take the same value on all elements of the domain, but then \((x,y)=(x',y')\), meaning \(f\) is injective. For any \(h:2\to X\), note that \(f((h(0),h(1)))=h\), so that \(f\) is a bijection.
Set \(f((x,1)):=(x,\{1\})\) and \(f(y,\{1\}):=(y,1)\) for \(x\in X,y\in Y\), where \(X\in\kappa,Y\in\lambda\). \(f\) is evidently a bijection \((X\times\{1\})\cup(Y\times\{\{1\}\})\to(Y\times\{1\})\cup(X\times\{\{1\}\})\), proving commutativity of the sum.
With \(X\in\kappa,Y\in\lambda,Z\in\mu\), consider \((((X\times\{1\})\cup(Y\times\{\{1\}\}))\times\{1\})\cup(Z\times\{\{1\}\})\) and \((X\times\{1\})\cup(((Y\times\{1\})\cup(Z\times\{\{1\}\}))\times\{\{1\}\})\). Set \(f(((x,1),1)):=(x,1)\), \(f(((y,\{1\}),1)):=((y,1),\{1\})\), \(f((z,\{1\})):=((z,\{1\}),\{1\})\), and this is a bijection between those two sets, giving associativity of the sum.
If \(X\in\kappa,Y\in\lambda\), \((X\times\{1\})\cup(Y\times\{\{1\}\})\in\kappa+\lambda\) contains the sets \(X\times\{1\},Y\times\{\{1\}\}\), in bijection with \(X\) and \(Y\) respectively.
Pick \(X\in\kappa,Y\in\mu,Z\in\nu\). Pick \(f:X\to Y\) injective. Define \(g:(X\times\{1\})\cup(Z\times\{\{1\}\})\to(Y\times\{1\})\cup(Z\times\{\{1\}\})\) by setting \(g((x,1)):=(f(x),1)\) and \(g((z,\{1\})):=(z,\{1\})\) with \(x\in X,z\in Z\). This is clearly an injection.
Define \(f((n,1)):=2n\) and \(f((n,\{1\})):=2n+1\). This is a bijection \(f:(\mathbb N\times\{1\})\cup(\mathbb N\times\{\{1\}\})\to\mathbb N\), thus showing \(\aleph_0+\aleph_0=\aleph_0\). Since we already know \(n\) copies of \(\aleph_0\) sum to \(n\aleph_0\), let's prove this is \(\aleph_0\). We know it for \(2\aleph_0\), so let's go by induction. If \((n-1)\aleph_0=\aleph_0\), then \(n\aleph_0=(n-1)\aleph_0+\aleph_0=\aleph_0+\aleph_0=\aleph_0\). The part "the same holds for any infinite cardinal" will follow from the sum formula below.
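The evens/odds bijection from this proof can be checked directly on an initial segment of \(\mathbb N\). In this sketch I use the stand-in tags `0` and `1` for \(1\) and \(\{1\}\).

```python
# The bijection from the proof, checked on an initial segment of N: the copy
# tagged 0 (standing for 1) goes to the evens, the copy tagged 1 (standing
# for {1}) goes to the odds.
def f(tagged):
    n, tag = tagged
    return 2 * n if tag == 0 else 2 * n + 1

N = 50
domain = [(n, t) for n in range(N) for t in (0, 1)]
image = sorted(f(p) for p in domain)

# f is injective on the sample and hits exactly 0, 1, ..., 2N - 1.
assert image == list(range(2 * N))
```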
Set \(B_n:=\{f:\mathbb N\to A_n\text{ bijective}\}\). This is a valid set, since it is specified via a valid formula from \(A_n^{\mathbb N}\), which we know to be valid. Use countable choice to find a choice function for that family, which we know consists of nonempty sets. This allows us to enumerate all the \(A_n\) simultaneously. We can now list \(A_1\) in row 1, list \(A_2\) in row 2, and so on, and then proceed by zigzagging along this infinite table, going \((1,1),(1,2),(2,1),(3,1),(2,2),(1,3),\dotsc\). This produces a surjection from \(\mathbb N\) to \(\bigcup_nA_n\), which we proved equates, under choice, to an injection \(\bigcup_nA_n\to\mathbb N\). Just view a bijection \(\mathbb N\to A_1\) as a function taking values in the union, and we have an injection \(\mathbb N\to\bigcup_nA_n\), thus proving our point. Do we need full choice or can we just use countable choice? Actually, countable choice suffices here: since the surjection goes out of \(\mathbb N\), the injection can be defined choice-freely by sending each element of the union to the least element of its preimage, so the only use of choice was in picking the enumerations. Let's see what they say here.
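The zigzag through the infinite table can be sketched as a generator walking the anti-diagonals. This is my illustration on a finite \(4\times5\) table standing in for the infinite one (the exact order of each diagonal differs cosmetically from the proof's).

```python
# Sketch of the zigzag (my illustration): enumerate the cells (row, column)
# diagonal by diagonal, giving the surjection N -> union of the rows.
def zigzag(rows, cols):
    """Yield (row, col) pairs anti-diagonal by anti-diagonal: (0,0),(0,1),(1,0),..."""
    for d in range(rows + cols - 1):
        for r in range(d + 1):
            c = d - r
            if r < rows and c < cols:
                yield (r, c)

seen = list(zigzag(4, 5))
# Every cell of the 4 x 5 table is reached exactly once.
assert sorted(seen) == [(r, c) for r in range(4) for c in range(5)]
```

In the infinite case the bounds checks disappear and the generator never terminates, which is exactly the enumeration of \(\bigcup_nA_n\).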
The full proof of this requires ordinals, so I'll leave it for later when we introduce those. For the moment, I will prove that, whenever \(X\) can be split into two sets both in bijection with it, then \(|\mathcal P(X)|+|\mathcal P(X)|=|\mathcal P(X)|\), which by 7 implies \(|\mathcal P(X)|+|Y|\leq|\mathcal P(X)|\) for all \(Y\subseteq\mathcal P(X)\), and the reverse inequality is trivial. So basically this gives us our claim whenever the maximum \(\max\{\kappa,\mu\}\) has the power set of such a set as a representative, and in particular this can be seen to work for all cardinals of the form \(\mathcal P(\mathcal P(\dots(\mathcal P(\mathbb N))))\). This would be all the accessible cardinals if the generalized continuum hypothesis is assumed, and inaccessible cardinals (Wikipedia and Vsauce) are not guaranteed to exist anyway. How do we prove that? Well, we prove that we can split \(\mathcal P(X)\) itself into two sets both in bijection with it. Let \(X=X_{odd}\cup X_{even}\) be our split. Of course, \(\mathbb N\) is split this way according to evenness and oddness, which is the reason for those subscripts. Choose any well-order on \(X\): one surely exists by choice, or rather by the equivalent well-ordering theorem. Consider \(P_{even}:=\{Y\subseteq X:\min Y\in X_{even}\}\cup\{\varnothing\}\) and \(P_{odd}:=\{Y\subseteq X:\min Y\in X_{odd}\}\). Constructing a bijection \(\mathcal P(X)\to P_{even}\) is easy: pick a bijection \(f:X\to X_{even}\) and consider \(Y\subseteq X\mapsto f(Y)\). If we work analogously by picking a bijection \(g:X\to X_{odd}\), we construct a bijection \(\mathcal P(X)\to P_{odd}\cup\{\varnothing\}\). So there is an extra empty set to be gotten rid of. But that is simple: choice gives us (Theorem 10) a countable subset \(A\) of \(P_{odd}\), provided \(P_{odd}\) is infinite, and it is, since \(X\) is infinite (by the split we assumed to exist), \(|\mathcal P(X)|>|X|\), and \(P_{odd}\cup\{\varnothing\}\) is in bijection with \(\mathcal P(X)\).
Note that \(\varnothing\notin A\), since \(A\subseteq P_{odd}\) and \(\varnothing\notin P_{odd}\). To biject \(P_{odd}\cup\{\varnothing\}\) onto \(P_{odd}\), fix \(P_{odd}\smallsetminus A\) pointwise, and map \(\varnothing\mapsto h(0)\) and \(h(i)\mapsto h(i+1)\), where \(h:\mathbb N\to A\) is a bijection. And there you have it. In the case of \(\mathbb N\), the bijections between \(X\) and \(X_{even},X_{odd}\) are \(n\mapsto2n\) and \(n\mapsto2n+1\).
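The shift trick used here to absorb the extra \(\varnothing\) is Hilbert's hotel, and it is easy to watch on \(\mathbb N\) itself. My sketch absorbs a fresh element `"*"` into an initial segment of \(\mathbb N\):

```python
# The shift trick above, checked on a finite segment of N: to absorb an extra
# element * into N, send * to 0 and n to n + 1; this bijects N u {*} with N.
def absorb(x):
    return 0 if x == "*" else x + 1

N = 100
domain = ["*"] + list(range(N))
image = sorted(absorb(x) for x in domain)
assert image == list(range(N + 1))   # bijection onto an initial segment
```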
Clearly if there is such a \(\kappa\), then by 6 \(\sigma=\kappa+\mu\geq\mu\). If \(\mu\leq\sigma\), the above (assuming it was proved) tells us \(\sigma+\mu=\sigma\), so one such \(\kappa\) is \(\sigma\). So the first equivalence is proved. If \(\mu=\sigma\), again by the above, any \(\kappa\leq\sigma\) will do, so non-strict inequality implies non-uniqueness, hence uniqueness implies strict inequality. Assume strict inequality: \(\mu<\sigma\). Then, if \(\kappa<\sigma\), \(\kappa+\mu=\max\{\kappa,\mu\}<\sigma\), using the above again. If \(\kappa>\sigma\), clearly \(\kappa+\mu=\kappa>\sigma\), so the only solution to that equality is \(\kappa=\sigma\).
\(\varnothing\times X\) is always empty, because to build an element of the product, we need a function \(f:\{0,1\}\to X\cup\varnothing\) such that \(f(0)\in\varnothing\wedge f(1)\in X\) (remember we proved above that this is the same as the ordered pairs), but \(f(0)\in\varnothing\) is impossible by definition, hence \(\varnothing\times X=\varnothing\). This gives us \(0\kappa=0\) because if \(X\in\kappa\), \(\varnothing\times X\in0\kappa\). It also gives us that \((X\times\{1\})\cup(\varnothing\times\{\{1\}\})=X\times\{1\}\cup\varnothing=X\times\{1\}\), so that \(\kappa+0=\kappa\) (just pick \(X\in\kappa\) and the equality before gives \(\kappa+0=\kappa\)). Addition is commutative by 5, so \(0+\kappa=\kappa+0=\kappa\). Multiplication is commutative by 13, so \(\kappa0=0\) too. Take \(X\in\kappa\). A representative of \(1\kappa\) is \(\{1\}\times X\), which is evidently in bijection with \(X\) by \((1,x)\mapsto x\). So \(1\kappa=\kappa\). By commutativity (see item below), \(\kappa1=1\kappa=\kappa\).
Pick \(X\in\kappa,Y\in\mu\). Define \(f:X\times Y\to Y\times X\) by \(f(x,y):=(y,x)\). In other words, \(f((x,y)):=(\pi_2((x,y)),\pi_1((x,y)))\). This is definitely a bijection, since it is an involution (i.e. it has itself as its inverse). But \(Y\times X\) is a representative of \(\mu\kappa\) and \(X\times Y\) one of \(\kappa\mu\), so that commutativity of multiplication is proved. Pick \(X\in\kappa,Y\in\mu,Z\in\nu\). We must prove \((X\times Y)\times Z\) and \(X\times(Y\times Z)\) are in bijection. Also, we want to biject them with our definition of the triple product. Set \(f(((x,y),z)):=(0\mapsto x,1\mapsto y,2\mapsto z)\). This is a function \(f:(X\times Y)\times Z\to X\times Y\times Z\). It is clearly surjective. Suppose \(f(((x,y),z))=f(((x',y'),z'))\). This means \(0\mapsto x,1\mapsto y,2\mapsto z\) is the same function as \(0\mapsto x',1\mapsto y',2\mapsto z'\), and by the extensional definition of function equality we conclude \(x=x',y=y',z=z'\), which means \((x,y)=(x',y')\) and also \(((x,y),z)=((x',y'),z')\). Bijecting \(X\times(Y\times Z)\) with the triple product is analogous. Associativity is thus proved. Pick \(X\in\kappa,Y\in\mu\). \(f(x):=(x,y)\) for any fixed \(y\in Y\) and \(g(y):=(x,y)\) for any fixed \(x\in X\) are clearly injections \(X\to X\times Y\) and \(Y\to X\times Y\) respectively, proving our claim.
Pick \(X\in\kappa,Y\in\mu,Z\in\nu\). Pick an injection \(f:X\to Y\), which exists by \(\kappa\leq\mu\). Set \(g((x,z)):=(f(x),z)\). This is an injection \(g:X\times Z\to Y\times Z\), proving \(\kappa\nu\leq\mu\nu\).
Pick \(X\in\kappa,Y\in\mu,Z\in\nu\). A representative of \(\kappa(\mu+\nu)\) is:
\begin{align*}
X\times((Y\times\{1\})\cup(Z\times\{\{1\}\}))={}&\{(x,(y,1)):x\in X,y\in Y\}\cup\{(x,(z,\{1\})):x\in X,z\in Z\}=(X\times(Y\times\{1\}))\cup(X\times(Z\times\{\{1\}\}))={} \\
{}={}&(X\times Y\times\{1\})\cup(X\times Z\times\{\{1\}\})=((X\times Y)\times\{1\})\cup((X\times Z)\times\{\{1\}\}),
\end{align*}
which is a representative of \(\kappa\mu+\kappa\nu\), proving one distributivity. Using this and item 13 (commutativity) above, we have \((\kappa+\mu)\nu=\nu(\kappa+\mu)=\nu\kappa+\nu\mu=\kappa\nu+\mu\nu\).
Using 13 and 14 above, this reduces to \(\kappa\kappa=\kappa\) for all infinite cardinals \(\kappa\), since we can commute and assume \(\kappa\geq\mu\) and then \(\kappa\kappa\geq\kappa\mu\geq\kappa\). Suppose \(\kappa=|\mathcal P(X)|\) with \(X=X_{even}\cup X_{odd}\) and \(e:X\to X_{even},o:X\to X_{odd}\) bijections. Then, given \((f,g)\in\{0,1\}^X\times\{0,1\}^X\), which we know to be in bijection with \(\mathcal P(X)\times\mathcal P(X)\), set \(h(x):=f(e^{-1}(x))\) if \(x\in X_{even}\) and \(h(x):=g(o^{-1}(x))\) if \(x\in X_{odd}\). This \(h\) is in \(\{0,1\}^X\). The map \((f,g)\mapsto h\) just defined has the inverse \(h\mapsto(h\circ e,h\circ o)\), so it is a bijection. Thus, \(\kappa\kappa=\kappa\). Assuming the generalized continuum hypothesis and the non-existence of inaccessible cardinals like in 10, this gives us our proof, provided we deal with \(\aleph_0\). Certainly, this works for the continuum, where \(X=\mathbb N\), the subsets are even and odd numbers, and \(e(n):=2n,o(n):=2n+1\). Let's deal with \(\mathbb N\). Write \(\mathbb N\times\mathbb N=\bigcup_n\mathbb N\times\{n\}\), and that is a countable union of countable sets, so it's countable by 9. So \(\aleph_0\aleph_0=\aleph_0\). The general case will be proved at the end with ordinals. It is in fact equivalent to choice that \(\kappa^2=\kappa\) for all infinite \(\kappa\).
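For \(\aleph_0\aleph_0=\aleph_0\) there is also a fully explicit witness, not used in the proof above but worth seeing: the Cantor pairing function, a bijection \(\mathbb N\times\mathbb N\to\mathbb N\). This sketch of mine implements it together with its inverse.

```python
# An explicit witness of aleph_0 * aleph_0 = aleph_0 (my addition, not the
# proof's zigzag): the Cantor pairing function N x N -> N and its inverse.
def pair(x, y):
    return (x + y) * (x + y + 1) // 2 + y

def unpair(z):
    # w is the index of the anti-diagonal containing z, i.e. the largest w
    # with the triangular number w(w+1)/2 not exceeding z.
    w = int(((8 * z + 1) ** 0.5 - 1) // 2)
    y = z - w * (w + 1) // 2
    return (w - y, y)

# Check bijectivity on a finite sample.
codes = {pair(x, y) for x in range(30) for y in range(30)}
assert len(codes) == 30 * 30                       # injective on the sample
assert all(unpair(pair(x, y)) == (x, y) for x in range(30) for y in range(30))
```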
Assume such a \(\nu\) exists, then \(\kappa=\mu\nu\geq\mu\). If \(\kappa\geq\mu\), by the above item \(\kappa\mu=\kappa\), so that a \(\nu\) exists. If \(\mu<\kappa\), \(\kappa\mu=\kappa\), and for any other cardinal we have strict inequality, either \(\mu\nu=\nu>\kappa\) if \(\nu>\kappa\), or \(\mu\nu=\max\{\mu,\nu\}<\kappa\) if \(\nu<\kappa\). So strict inequality grants uniqueness. If \(\mu=\kappa\), then all \(\nu\leq\kappa\) satisfy \(\mu\nu=\kappa\), thus negating uniqueness.
First of all, suppose we have \(f,g:\varnothing\to X\), where \(X\in\kappa\) and \(\varnothing\) represents the cardinal 0. We have no elements on which they differ, so \(f=g\). Hence, \(\kappa^0\) is either 1 or 0. We defined functions as particular relations, so a function \(\varnothing\to X\) must be a special subset of \(\varnothing\times X=\varnothing\). This has only one subset: \(\varnothing\). Does this count as a function? It does: the defining condition (every element of the domain has exactly one image) is vacuously satisfied when the domain is empty. So \(\kappa^0=1\).
What is \(0^\kappa\)? It is the cardinality of the set of functions \(X\to\varnothing\) for \(X\in\kappa\). That set is empty unless \(X=\varnothing\), and \(\kappa\geq1\) forces \(X\neq\varnothing\), so we are done.
Pick \(X\in\kappa\) and set \(x\mapsto(0\mapsto x)\) to have a bijection \(X\to X^1\). As for \(1^\kappa\), clearly \(1^X\) has only one element: the constant function \(f(x)=0\) for all \(x\in X\). Hence, \(1^\kappa=1\).
Pick \(X\in\kappa,Y\in\mu,Z\in\nu\). We want to biject \(X^{(Y\times\{1\})\cup(Z\times\{\{1\}\})}\) with \(X^Y\times X^Z\). The former consists of functions \((Y\times\{1\})\cup(Z\times\{\{1\}\})\to X\). Define the pair \((g,h)\) associated to such a function \(f\) as \(g(y):=f((y,1)),h(z):=f((z,\{1\}))\). You can easily construct the inverse of this, thus concluding the proof of the bijection. We then want to biject \((X^Y)^Z\) with \(X^{Y\times Z}\). Pick \(f\) in the second set. Fix \(z\) and define \(g_z(y):=f(y,z)\). This defines a function \(g_z:Y\to X\), that is \(g_z\in X^Y\). Now define \(g(z):=g_z\). This \(g\in(X^Y)^Z\). So \(f\mapsto g\) is a map \(X^{Y\times Z}\to(X^Y)^Z\). Given \(g\in(X^Y)^Z\), define \(f(y,z):=(g(z))(y)\). This is evidently the inverse of the map before, so we are done. We lastly want to biject \((X\times Y)^Z\) and \(X^Z\times Y^Z\). Pick \(f:Z\to X\times Y\). Define \(g(z):=\pi_1(f(z)),h(z):=\pi_2(f(z))\). The map \(f\mapsto(g,h)\) is a bijection as desired, and its inverse is \((g,h)\mapsto[f(z):=(g(z),h(z))]\).
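The middle bijection here, \((X^Y)^Z\cong X^{Y\times Z}\), is exactly what programmers call currying. My finite sketch represents functions as dicts and checks the round trip, plus the corresponding count \(|X|^{|Y||Z|}=(|X|^{|Y|})^{|Z|}\).

```python
# Finite check of the currying bijection (X^Y)^Z = X^(Y x Z) from the proof,
# with functions represented as dicts (my illustration).
X, Y, Z = [0, 1], ["y0", "y1", "y2"], ["z0", "z1"]

def curry(f):
    """Turn f : Y x Z -> X into g : Z -> (Y -> X) with g(z)(y) = f(y, z)."""
    return {z: {y: f[(y, z)] for y in Y} for z in Z}

def uncurry(g):
    return {(y, z): g[z][y] for y in Y for z in Z}

# Round-trip on a sample function, and the matching cardinality identity.
f = {(y, z): (Y.index(y) + Z.index(z)) % 2 for y in Y for z in Z}
assert uncurry(curry(f)) == f
assert len(X) ** (len(Y) * len(Z)) == (len(X) ** len(Y)) ** len(Z)
```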
Pick \(X\in\kappa,Y\in\mu\). Fix \(y\in Y,x_0\in X\). Set \((f(x))(y):=x\) and \((f(x))(y'):=x_0\) for all \(y'\neq y\). This map \(f\) is an injection \(X\to X^Y\), proving \(\kappa\leq\kappa^\mu\). If \(\kappa=1<\mu\), then \(\kappa^\mu=1<\mu\).
Pick \(X\in\kappa,Y\in\mu,Z\in\nu\). We want to find injections \(Z^X\to Z^Y\) and \(X^Z\to Y^Z\). Pick an injection \(f:X\to Y\), as is guaranteed to exist by \(\kappa\leq\mu\). \(f\) has a left inverse \(g\). For \(h\in Z^X\), map it to \([y\mapsto h(g(y))\text{ if }y\in f(X),y\mapsto z_0\text{ otherwise}]\), where \(z_0\in Z\) is fixed. We thus have a map \(Z^X\to Z^Y\). Is this an injection? Well, if \(z_0\) exists, then yes, we can compose by \(f\) to invert it (or rather, get the left inverse of this thing). So if \(\nu\geq1\) as assumed, this construction proves the claim. If \(\nu=0\), this works if \(\kappa\geq1\) or if \(\kappa=\mu=0\), because in the former case we have \(\nu^\kappa=0^\kappa=0\leq0=0^\mu=\nu^\mu\), whereas in the latter case \(\nu^\kappa=0^0=\nu^\mu\). If \(\kappa=0<\mu\), then \(\nu^\kappa=0^0=1\) but \(\nu^\mu=0^\mu=0<1\). Now define \(a:X^Z\to Y^Z\) by \(a(h):=f\circ h\). A left inverse of this is given by \(b(h):=g\circ h\), since \(b(a(h))=g\circ(f\circ h)=(g\circ f)\circ h=\operatorname{id}_X\circ h=h\), so \(a\) is an injection.
This bullet is a full proof, so nothing to prove here.
Let's rewrite this: for all finite \(n,m\geq2\) and for all infinite cardinals \(\kappa\), we have \(n^\kappa=m^\kappa\). Assume \(n\leq m\), which is always possible modulo renaming. By 23 above, we have \(n^\kappa\leq m^\kappa\). So we must prove the reverse inequality by making use of choice. PYTBO.
Work by induction. If \(\mu=1\), \(\kappa^\mu=\kappa\) as seen above. A partial proof of \(\kappa^2=\kappa\) has been given above; the rest requires ordinals and is done at the end of the post. Now assume \(\kappa^n=\kappa\). Then by the power rules above we have \(\kappa^{n+1}=\kappa^n\kappa=\kappa\kappa=\kappa^2=\kappa\). The base case \(\kappa^1=\kappa\) has already been proved, so by mathematical induction we have our claim.
Since \(\mu\geq1\) and \(\kappa\geq2\), by 23 we know \(2^\mu\leq\kappa^\mu\). By 22, \(\kappa\leq\kappa^\mu\), so that \(\max\{\kappa,2^\mu\}\leq\kappa^\mu\). At least one of \(\kappa,\mu\) is infinite. Let's first assume \(\kappa\) is infinite and \(\mu\) is finite. 26 then gives us \(\kappa^\mu=\kappa<2^\kappa\), the last step being because of Theorem 12. Then, \(\kappa^\mu\leq\max\{2^\kappa,2^\mu\}\). Assume then that \(\kappa\) is finite and \(\mu\) is infinite. Then 25 gives us \(\kappa^\mu=2^\mu\), so that \(\kappa^\mu\leq\max\{2^\kappa,2^\mu\}\). Assume, finally, that \(\kappa,\mu\) are both infinite. PYTBO.
This is just a rehashing of 26 in terms of roots. Note that the \(\nu\) is unique and is \(\kappa\) by 26.
I'm not sure if "greater than 1" meant \(\mu>1\) or \(\mu\geq1\). \(\mu=1\) is an easy case where no \(\lambda\) exists: \(\mu^\lambda=1\) for any cardinal in that case. If \(\mu=2\) and \(\kappa=3\), there clearly is no \(\lambda\), but \(\kappa\) is supposed to be infinite. Inaccessible \(\kappa\) obviously provides a counterexample, if one exists. Since \(2^\lambda=\mu^\lambda\) by 25 for any infinite \(\lambda\) and any finite \(\mu\geq2\), we can assume \(\mu=2\), and the nonexistence of a \(\lambda\) amounts to saying \(\kappa\) is not obtainable by power set. Oh yeah, that takes care of the "any finite \(\nu\geq2\)" part, provided we prove \(\lambda\) is infinite. Then again, \(\kappa\) is infinite and \(\mu\) finite, and \(\text{finite}^{\text{finite}}\) is finite, so if \(\mu^\lambda=\kappa\) then \(\lambda\) must be infinite. Given this, we essentially need \(2^\lambda=\kappa\), and \(2^\lambda>\lambda\), so \(\kappa>\lambda\). The last statement in this bullet is that choice implies well-ordering of the cardinals. That is because choice, via the well-ordering theorem, implies we can associate an ordinal to each cardinal in a well-defined way, and since we can prove ordinals are well-ordered, the restricted ordering on the ordinals associated to cardinals is a well-order for the cardinals. Details of this in the part below about ordinals.
\(2\leq\aleph_0\leq2^{\aleph_0}\), so \(2^{\aleph_0}\leq\aleph_0^{\aleph_0}\) and \(\aleph_0^{\aleph_0}\leq(2^{\aleph_0})^{\aleph_0}=2^{\aleph_0^2}=2^{\aleph_0}\). Choice is used via \(\aleph_0^2=\aleph_0\), a subcase of 26 which required choice.
It is clear that this follows from 10 and 16 and the transitivity of cardinal equality. However, I want to deduce it differently. Using 26, 15, and 13, we see \(\kappa+\lambda=(\kappa+\lambda)^2=\kappa^2+\kappa\lambda+\lambda\kappa+\lambda^2=\kappa+2\kappa\lambda+\lambda\geq\kappa\lambda\). If we prove \(\kappa\lambda\geq\kappa+\lambda\), we are done. Take disjoint sets \(K,L:|K|=\kappa,|L|=\lambda\). Define \(F:K\cup L\to K\times L\) by:
$$F(p):=\left\{\begin{array}{cc}
(p,c_1) & p\in K \\
(b_1,p) & p\in L\smallsetminus\{c_1\} \\
(b_2,c_2) & p=c_1
\end{array}\right.$$
\(b_1,b_2\in K,c_1,c_2\in L\) are fixed and distinct. It is easy to see this is an injection, thus completing the proof. This was taken from this Math Stack Exchange question and its answers.
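As a quick finite sanity check (with small sets and fixed elements chosen arbitrarily by me just for illustration), one can verify the injectivity by brute force:

```python
# The injection F : K ∪ L → K × L from the proof, on small disjoint sets.
K = {"k1", "k2", "k3"}
L = {"l1", "l2", "l3"}
b1, b2 = "k1", "k2"          # fixed distinct elements of K
c1, c2 = "l1", "l2"          # fixed distinct elements of L

def F(p):
    if p in K:
        return (p, c1)
    if p != c1:              # p ∈ L \ {c1}
        return (b1, p)
    return (b2, c2)          # p = c1

# Injectivity: all |K| + |L| images are distinct.
images = [F(p) for p in K | L]
assert len(set(images)) == len(K) + len(L)
```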
Let me come back to item 4, specifically "the empty cartesian product makes no sense". Actually, we can define it in a surprising way.
Definition 13: Product of an empty family of sets. If \(\mathcal F=\varnothing\) is the empty family of sets, we set \(\prod\mathcal F:=1=\{\varnothing\}\). Why do we do this seemingly nonsensical thing? Well, this should be the set of choice functions for \(\mathcal F\), so it must be a set of functions \(\varnothing=\mathcal F\to\cup\mathcal F=\varnothing\), and we know there is only one function \(\varnothing\to\varnothing\), which vacuously satisfies the definition of a choice function for \(\mathcal F\), so \(\prod\mathcal F\) should have exactly one element, and we denote it \(\varnothing\) as it is the "empty function".
This means that item 4 of the previous theorem now holds also for \(\lambda=0\), as on one side we have \(\kappa^0\) which we know to be 1, and on the other side we have \(|X^0|\), which is the empty product which we just defined to be 1.
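Incidentally, Python's itertools mirrors this convention, which makes for a cute sanity check (a toy illustration, nothing deep):

```python
import itertools

# The cartesian product of an empty family of iterables has exactly one
# element: the empty tuple, playing the role of the "empty function".
assert list(itertools.product()) == [()]

# Consistently, a product of n copies of a set has size k^n, and n = 0 gives 1:
k = {1, 2, 3}
assert len(list(itertools.product(k, repeat=0))) == 1   # k^0 = 1
assert len(list(itertools.product(k, repeat=2))) == 9   # k^2 = 9
```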
Now that we are done with cardinals, it's time to tackle the last topic of this post: ordinals. And this is where I bid noobs and meds goodbye, as what follows is all master-level and will be placed in a spoiler.
I will take most of the below from this book, with some clarifications from Math Stack Exchange and some details added in by me. First, a couple definitions.
Definition 13: Transitive sets and ordinals. A set \(X\) is called transitive if each of its members is a subset of it, or in symbols if \(\forall y\in X\forall z\in y\,\,z\in X\). A transitive set is an ordinal (number) if it is well-ordered under the membership relation \(\in\), that is if \(\forall y\subseteq X,\,y\neq\varnothing\implies\exists\min y\in y:\forall z\in y,z\neq\min y\implies\min y\in z\).
Lemma 5: Equivalent conditions of transitivity. The following are equivalent:
\(X\) is transitive, that is, every member of \(X\) is a subset of \(X\);
\(X\subseteq\mathcal P(X)\);
\(\mathcal P(y)\subseteq\mathcal P(X)\) for every \(y\in X\).
Proof.
Assume 1. Pick \(y\in X\). Then we know \(y\subseteq X\), meaning \(y\in\mathcal P(X)\). So 2 is proved.
Assume 2. Pick \(y\in X\) and \(z\subseteq y\). Since \(y\in\mathcal P(X)\) by 2, \(y\subseteq X\), so that \(z\subseteq X\) too, meaning \(z\in\mathcal P(X)\). So 3 is proved.
Assume 3. Pick \(y\in X\) and \(z\in y\). \(\{z\}\in\mathcal P(y)\), so \(\{z\}\in\mathcal P(X)\) by 3, but then \(\{z\}\subseteq X\), meaning \(z\in X\). Hence, 1 is proved, and our proof is complete.
Lemma 6: Some transitive sets. \(\varnothing,\{\varnothing\}\) are transitive while \(\{\{\varnothing\}\}\) isn't. For any two transitive sets \(X,Y\), the union \(X\cup Y\) is still transitive. For any transitive set \(X\), \(S(X)=X\cup\{X\}\) is still transitive.
Proof.
\(\varnothing\) is vacuously transitive because it has no elements. \(\{\varnothing\}\) has only one element, which in turn has no elements, so it is also vacuously transitive. \(\{\{\varnothing\}\}\) is not transitive because \(\varnothing\in\{\varnothing\}\) but \(\varnothing\notin\{\{\varnothing\}\}\). To prove the union of two transitives is transitive, I will use condition 3 of the above lemma. Pick \(z\in X\cup Y\). Then either \(z\in X\) or \(z\in Y\). In the former case, \(\mathcal P(z)\subseteq\mathcal P(X)\subseteq\mathcal P(X\cup Y)\). The latter case can be dealt with analogously. Hence \(X\cup Y\) is transitive by Lemma 5 condition 3. Pick \(z\in X\cup\{X\}\). Then either \(z=X\), or \(z\in X\). In the former case, \(z\subseteq X\cup\{X\}\) because \(X\subseteq X\cup\{X\}\) by definition of union. In the latter case, \(z\subseteq X\subseteq X\cup\{X\}\) by transitivity of \(X\) and definition of union. So the proof is complete.
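To make this tangible, here is a small Python sketch modeling hereditarily finite sets as nested frozensets and checking the claims of Lemma 6 (my own toy encoding, purely for illustration):

```python
# Model hereditarily finite sets as nested frozensets.
def is_transitive(x):
    # x is transitive iff every member of a member of x is a member of x.
    return all(z in x for y in x for z in y)

empty = frozenset()                 # ∅
s1 = frozenset({empty})             # {∅}
s2 = frozenset({s1})                # {{∅}}

assert is_transitive(empty)         # vacuously
assert is_transitive(s1)            # its only member has no members
assert not is_transitive(s2)        # ∅ ∈ {∅} but ∅ ∉ {{∅}}

def successor(x):
    # S(x) = x ∪ {x}
    return x | frozenset({x})

# Union of transitive sets, and successor of a transitive set, are transitive.
two = successor(s1)                 # {∅, {∅}}
assert is_transitive(s1 | two)
assert is_transitive(successor(two))
```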
Theorem 15: Some facts about ordinals.
\(\varnothing\) is an ordinal;
If \(\alpha\) is an ordinal, then every \(\beta\in\alpha\) is an ordinal;
If \(\alpha,\beta\) are ordinals and \(\alpha\subsetneq\beta\), then \(\alpha\in\beta\);
If \(\alpha,\beta\) are ordinals, then either \(\alpha\subseteq\beta\) or \(\beta\subseteq\alpha\).
Proof.
We already saw \(\varnothing\) is transitive in Lemma 6 above, and of course the well-ordering part is vacuously true.
That \(\beta\) be well-ordered by membership easily follows from the analogous property of \(\alpha\). Indeed, the fact \(\in\) is a total order on \(\beta\) is a trivial consequence of the analogous property of \(\alpha\). Next, take a nonempty \(S\subseteq\beta\); then \(S\subseteq\alpha\) by transitivity of \(\alpha\), so we have a \(\min S=:s_0\) in \(\alpha\). Now, \(s_0\in S\subseteq\beta\), so that \(s_0\in\beta\), meaning we have the well-ordering of \(\beta\) by \(\in\). Suppose now that \(z\in y\in\beta\). By transitivity of \(\alpha\), \(y\in\alpha\), and also \(z\in\alpha\). By well-ordering of \(\alpha\) under membership, which means in particular that this is a total order, we have \(z\in\beta\), \(z=\beta\), or \(\beta\in z\). If \(z=\beta\), then \(\{y,\beta\}\subseteq\alpha\) has no minimum, since \(y\in\beta\) and \(\beta=z\in y\). If \(\beta\in z\), we get \(z\in y\in\beta\in z\), and then \(\{z,y,\beta\}\subseteq\alpha\) has no minimum: it can't be \(z\) because \(\beta\in z\), it can't be \(y\) because \(z\in y\), and it can't be \(\beta\) because \(y\in\beta\). Either way we contradict the well-ordering of \(\alpha\). Hence we must have \(z\in\beta\), thus proving transitivity of \(\beta\).
Suppose \(\alpha\subsetneq\beta\). Then \(\beta\smallsetminus\alpha\neq\varnothing\), so there must exist \(\gamma\in\beta\smallsetminus\alpha\) such that \(x\in\beta\smallsetminus\alpha\wedge\gamma\neq x\implies\gamma\in x\), that is, a minimum of \(\beta\smallsetminus\alpha\) under membership. Suppose \(x\in\alpha\). Then either \(x\in\gamma\), or \(x=\gamma\), or \(\gamma\in x\). However, \(\gamma\notin\alpha\), by definition of \(\gamma\), so that \(x\neq\gamma\). If \(\gamma\in x\), then \(\gamma\in\alpha\) by transitivity of \(\alpha\), but this contradicts the definition of \(\gamma\). Hence, \(x\in\gamma\), which means \(\alpha\subseteq\gamma\). To conclude, we need the reverse inclusion \(\gamma\subseteq\alpha\). Pick \(x\in\gamma\). By minimality of \(\gamma\) in \(\beta\smallsetminus\alpha\), \(x\) cannot be in it, so that \(x\in\alpha\) necessarily. Hence, the reverse inclusion is obtained.
Suppose \(\alpha\not\subseteq\beta\). Let \(\gamma:=\min(\alpha\smallsetminus\beta)\) be a minimum under membership; note that \(\gamma\) is an ordinal by item 2, since \(\gamma\in\alpha\) and \(\alpha\) is an ordinal. Just as \(\gamma\subseteq\alpha\) was obtained above, we obtain \(\gamma\subseteq\beta\). If \(\gamma\subsetneq\beta\), then \(\gamma\in\beta\) by the previous item, contradicting the definition of \(\gamma\). Hence \(\gamma=\beta\), so that \(\beta\in\alpha\) which, by transitivity, gives us \(\beta\subseteq\alpha\), and we are done.
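For the finite ordinals, items 3 and 4 can actually be tested mechanically; here is a sketch using von Neumann naturals encoded as frozensets (my encoding, chosen for illustration):

```python
# Von Neumann naturals: n = {0, 1, ..., n-1}.
def von_neumann(n):
    x = frozenset()
    for _ in range(n):
        x = x | frozenset({x})      # successor: x ∪ {x}
    return x

nats = [von_neumann(n) for n in range(6)]

# Item 3: for ordinals, strict inclusion coincides with membership.
# (frozenset's < operator is proper subset.)
for a in nats:
    for b in nats:
        assert (a < b) == (a in b)

# Item 4: any two ordinals are comparable under inclusion.
for a in nats:
    for b in nats:
        assert a <= b or b <= a
```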
Definition 14: \(\operatorname{Ord}\) and its order. We call the class of all ordinals \(\operatorname{Ord}\) and say \(\alpha<\beta\) for \(\alpha,\beta\in\operatorname{Ord}\) if \(\alpha\in\beta\).
Definition 15: Class. A class is any collection of sets which all satisfy a given formula; a proper class is a class which is not itself a set.
Of course, \(\operatorname{Ord}\) is a class, because transitivity and \(\in\)-well-ordering can be expressed by formulas, and being an ordinal is just those two formulas linked with an \(\wedge\).
Theorem 16: Ordinals, well-ordering, unions and successors.
\(\operatorname{Ord}\) is totally ordered by \(<\);
\(\operatorname{Ord}\) is well-ordered by \(<\); in fact, if \(C\) is a nonempty class of ordinals, \(\bigcap C\) is the least element of the class;
Given any set of ordinals \(x\) (that is, a set \(x\) such that for all \(\alpha\in x\) we have \(\alpha\in\operatorname{Ord}\)), \(\beta:=\bigcup x\) is the least ordinal such that \(\alpha\subseteq\beta\) for all \(\alpha\in x\);
For every ordinal \(\alpha\), the successor ordinal \(\alpha+1:=\alpha\cup\{\alpha\}\) is an ordinal and it is the least ordinal \(\alpha\) belongs to.
Proof.
Pick \(\alpha,\beta\in\operatorname{Ord}\). Then, by the previous theorem point 4, \(\alpha\subseteq\beta\vee\beta\subseteq\alpha\), but then by point 3 we have \(\alpha=\beta\vee\alpha\in\beta\) in the first case and \(\beta=\alpha\vee\beta\in\alpha\) in the second case, so ultimately we have \(\alpha<\beta\vee\alpha=\beta\vee\beta<\alpha\), proving \(<\) is indeed a total order.
Take a nonempty subclass \(C\subseteq\operatorname{Ord}\). Then, there will be a formula \(\phi\) such that \(C=\{x:\phi(x)\}\), and \(\phi(x)\implies x\in\operatorname{Ord}\). Pick any \(x\in C\). \(x\cap C=\{y\in x:\phi(y)\}\) is a set of ordinals. If \(x\cap C=\varnothing\), \(x\) must be the least element, since contradicting this would mean producing \(y\in C:y\in x\), but such a \(y\) can't exist. If \(x\cap C\neq\varnothing\), then it has a \(\in\)-minimal element by well-ordering of the ordinal \(x\) under \(\in\), and that is the minimal element of \(C\). Call the minimal element \(z\). Then, \(z\in y\) or \(z=y\) for all \(y\in x\cap C\), meaning \(z\subseteq y\) for all \(y\in x\cap C\). \(z\in x\) by construction. If \(x\in y\in C\), by transitivity of \(y\) we have \(z\in y\); if \(y=x\), then \(z\in y\) directly. Therefore, \(z\subseteq y\) for all \(y\in C\), meaning \(z\subseteq\bigcap C\). But of course, since \(z\in C\), \(\bigcap C\subseteq z\), so that \(z=\bigcap C\).
Let's prove the union is transitive. Now, pick \(z\in\bigcup x\). Then \(z\in y\) for some \(y\in x\), but then all its members are in \(y\) because \(y\) is transitive, so that all elements of \(z\) are in the union, which is therefore transitive. Pick \(z,z'\in\bigcup x\). Then \(z\in y,z'\in y'\) with \(y,y'\in x\). By total ordering of ordinals, either \(y\in y'\), or \(y'\in y\), or \(y=y'\). In any case, there is a \(y''\in x:z,z'\in y''\), and \(y''\) is totally ordered so we conclude \(z\in z'\vee z=z'\vee z'\in z\), thus proving \(\bigcup x\) is totally ordered under \(\in\). Take a nonempty \(S\subseteq\bigcup x\). There must be at least one \(y\in x\) such that \(y\cap S\neq\varnothing\), given \(S\subseteq\bigcup x\). Pick one such \(y\). There must be a minimal element \(z\in y\cap S\), since \(y\) is well-ordered and \(y\cap S\neq\varnothing\). For all \(t\in y\cap S:t\neq z\), we have \(z\in t\). \(z\in y\) because \(z\in y\cap S\subseteq y\). If \(t\in S\) with \(t=y\), then \(z\in t\) directly, and if \(y\in t\in S\), by transitivity of \(t\) we will have \(z\in t\). Thus, \(z\) is the minimal element of \(S\), which proves \(\bigcup x\) is \(\in\)-well-ordered. Knowing it's an ordinal, call it \(U_x:=\bigcup x\), and let \(C_x:=\{\gamma\in\operatorname{Ord}:\forall y\in x\,\,y\subseteq\gamma\}\) be the class of upper bounds of \(x\). Clearly \(U_x\in C_x\), so that \(\min C_x=\bigcap C_x\subseteq U_x\). Conversely, \(y\subseteq\min C_x\) for all \(y\in x\) by definition of \(C_x\), so that \(U_x\subseteq\min C_x\), giving us the desired equality.
Since \(\phi(x):=x\in\operatorname{Ord}\wedge x>\alpha\) is a formula (or can be expressed as a formula, that is, is a shorthand of a formula), \(C:=\{x:\phi(x)\}\) is a subclass, so it has a minimal element \(x_0\). Now, for all \(x>\alpha\), we must have \(x_0\subseteq x\), so assuming \(\alpha+1\) is indeed an ordinal in \(C\), then \(x_0\subseteq\alpha+1\), which means either \(\alpha+1=x_0\) or \(x_0\in\alpha+1\). However, \(\alpha+1\subseteq x\) for all \(x\in C\), since \(\alpha<x\) means \(\alpha\in x\) and the \(x\) considered are transitive. Thus, \(\alpha+1\subseteq x_0=\bigcap C\), giving us equality. We have already proved in Lemma 6 that \(\alpha+1\) is transitive. Take \(x,y\in\alpha+1\). Then either both are in \(\alpha\), in which case \(x\in y\vee y\in x\vee x=y\) by the fact \(\alpha\) is an ordinal, or \(x\in\alpha\) and \(y=\alpha\), so that \(x\in y\), or \(x=\alpha\) and \(y\in\alpha\), so that \(y\in x\), or both are \(\alpha\), so that \(x=y\). Hence, \(\alpha+1\) is totally ordered by \(\in\). Take \(S\subseteq\alpha+1\). Then either \(S\subseteq\alpha\), so that it has a minimum by the fact \(\alpha\) is an ordinal, or \(S=S'\cup\{\alpha\}\) with \(S'\subseteq\alpha\), and then \(\min S=\min S'\) which exists by \(\alpha\) being an ordinal. So the proof is complete.
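Again the finite cases can be checked by machine; the following sketch tests items 3 and 4 of this theorem on small von Neumann naturals (same toy encoding as before):

```python
def von_neumann(n):
    x = frozenset()
    for _ in range(n):
        x = x | frozenset({x})
    return x

nats = [von_neumann(n) for n in range(8)]

# Item 3: the union of a set of ordinals is its least upper bound;
# for a finite set of naturals that is just the maximum.
x = {nats[1], nats[4], nats[2]}
sup = frozenset().union(*x)
assert sup == nats[4]
assert all(a <= sup for a in x)                               # an upper bound
assert all(sup <= b for b in nats if all(a <= b for a in x))  # the least one

# Item 4: α + 1 = α ∪ {α} is the least ordinal α belongs to.
a = nats[3]
succ = a | frozenset({a})
assert succ == nats[4] and a in succ
assert all(a in b for b in nats if succ <= b)
```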
So ordinals are well-ordered. But didn't we prove well-orders produce induction principles? Yep, but be careful: "well-ordered" on ordinals is an abuse of terminology, since as far as we know the class of ordinals isn't a set, and well-orders were always on sets. So let us state and prove an induction principle on ordinals.
Principle of Transfinite Induction. If \(P\) is a formula, and if the fact that \(P(\alpha)\) holds for all ordinals \(\alpha\in\beta\) implies \(P(\beta)\) holds, then \(P\) holds for all ordinals.
Proof.
Let \(C:=\{x\in\operatorname{Ord}:\lnot P(x)\}\). Assume by contradiction that \(C\neq\varnothing\). Then there must be, by the above theorem, a minimal ordinal \(\alpha\in C\). This means, however, that \(P(\beta)\) holds for all \(\beta\in\alpha\), but then by assumption we have \(P(\alpha)\), contradicting the definition of \(\alpha\). Thus, \(C\) must be empty, which is the desired statement.
Now we want to define operations on ordinals like we did with cardinals. However, defining those is much more complicated than with cardinals. For one, we need transfinite recursion, so let me give a few definitions and a theorem.
Definition 16: Operations and ordinals. Given a formula \(\phi(x,y)\), we say it defines an operation on sets if for all \(x\) there is a unique \(y\) such that \(\phi(x,y)\). In other words, if \(\forall x\exists y:\forall z(\phi(x,z)\iff z=y)\). We call the \(y:\phi(x,y)\) by the notation \(F_\phi(x)\). Analogously, we say \(\phi\) defines an operation on ordinals if the above is true \(\forall x\in\operatorname{Ord}\). Naturally, we will use that \(F_\phi\) shorthand often, for example:
$$\{F_\phi(x):x\in y\}=\{z:\exists x\in y:\forall w(\phi(x,w)\iff w=z)\}.$$
If \(\psi\) is a one-variable formula, we will take \(\psi(\{F_\phi(x):x\in y\})\) to mean that \(\psi\) holds for the set \(\{F_\phi(x):x\in y\}\), which explicitly means \(\exists z:[\forall w(w\in z\iff\exists x:x\in y\wedge\phi(x,w))]\wedge\psi(z)\).
Naturally, if \(\phi\) defines an operation on sets, it will define a function on any set \(x\) according to Definition 6.
Theorem 17: Transfinite recursion. For any operation \(F\) on sets, there is a unique operation \(G\) on ordinals such that, for all ordinals \(\alpha\), we have:
$$G(\alpha)=F(\{G(\beta):\beta\in\alpha\}).$$
Proof.
First of all, suppose \(G,G'\) are two such operations. By the property, since \(\{G(\beta):\beta\in\varnothing\}=\varnothing=\{G'(\beta):\beta\in\varnothing\}\), we must have \(G(\varnothing)=F(\varnothing)=G'(\varnothing)\). Suppose now \(G\neq G'\). Then there must be a least ordinal \(\alpha:G(\alpha)\neq G'(\alpha)\), and it cannot be the empty set. But then \(G(\beta)=G'(\beta)\) for all \(\beta\in\alpha\), and thus \(G(\alpha)=F(\{G(\beta):\beta\in\alpha\})=F(\{G'(\beta):\beta\in\alpha\})=G'(\alpha)\), contradicting the definition of \(\alpha\). Hence, \(G=G'\), so if an operation as in the statement exists, it is unique. (And yes, that was totally transfinite induction in disguise :).)
To prove such an operation exists, consider the formula:
$$\psi(\alpha,x):=\exists y\,\exists f\in y^\alpha:[\forall\xi\in\alpha,\,f(\xi)=F(\{f(\eta):\eta\in\xi\})]\wedge x=F(\{f(\xi):\xi\in\alpha\}).$$
Assume that this defines an operation on ordinals. Then, \(G(\alpha)\) is that \(x\), so that:
$$G(\alpha)=F(\{f(\xi):\xi\in\alpha\}),$$
but look at what \(f\) satisfies, and it is essentially \(f(\xi)=G(\xi)\) for \(\xi\in\alpha\), so the operation from this formula, if it exists, is the one we're looking for. Assuming, of course, that the \(f\)s patch together, but we will see they do as we prove, by transfinite induction on \(\alpha\), that this does indeed define an operation on ordinals.
First, let \(\alpha=\varnothing\). For any \(y\), the empty function \(\varnothing\to y\) vacuously satisfies the first condition, and \(F(\varnothing)\) satisfies the second one, so \(\psi(\varnothing,F(\varnothing))\) holds. Also, for \(x\neq F(\varnothing)\), \(\psi(\varnothing,x)\) cannot hold because the second condition cannot be satisfied. So \(G(\varnothing)=F(\varnothing)\) as we saw before.
Suppose now that, for all \(\beta\in\alpha\), we have \(y_\beta,f_\beta\in y_\beta^\beta,x_\beta\) as in the formula, and that the \(f_\beta\)s patch, that is, if \(\gamma\in\beta\), then \(y_\gamma\subseteq y_\beta\) and \(f_\beta|_\gamma=f_\gamma\). Assume first that \(\alpha\) is not a successor ordinal. Then, the map \(\beta\mapsto y_\beta\) is well-defined on \(\alpha\), so that \(\{y_\beta:\beta\in\alpha\}\) is a set (by replacement), and we can set \(y_\alpha=\bigcup\{y_\beta:\beta\in\alpha\}\), impose that \(f_\alpha|_\beta=f_\beta\) for all \(\beta\in\alpha\), and define \(x_\alpha=F(\{f_\alpha(\beta):\beta\in\alpha\})\). The formula is then satisfied for \(y_\alpha,f_\alpha,x_\alpha\). If instead \(\alpha=\delta+1\), then we need to add something to \(y_\delta\) to get the formula to be satisfied. By the first condition, we need \(f_\alpha(\delta)=F(\{f_\delta(\xi):\xi\in\delta\})\), so we'll just need to set \(y_\alpha=y_\delta\cup\{F(\{f_\delta(\xi):\xi\in\delta\})\}\), define \(f_\alpha(\delta):=F(\{f_\delta(\xi):\xi\in\delta\})\), and set \(x_\alpha:=F(\{f_\alpha(\beta):\beta\in\alpha\})\) to satisfy the formula with \(y_\alpha,f_\alpha,x_\alpha\). Thus, by transfinite induction, for all \(\alpha\in\operatorname{Ord}\) there is \(x_\alpha:\psi(\alpha,x_\alpha)\). Suppose that the choice was unique for all \(\beta\in\alpha\). Then, the construction is the unique possible choice, so that the uniqueness holds for \(\alpha\), and using transfinite induction again we conclude that the uniqueness holds for all ordinals, concluding the proof of this theorem.
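For finite ordinals the recursion of this theorem can simply be run; here is a sketch (with memoization standing in for the patching of the \(f\)s, and names of my own choosing):

```python
def von_neumann(n):
    x = frozenset()
    for _ in range(n):
        x = x | frozenset({x})
    return x

def transfinite_rec(F, alpha, memo=None):
    # G(α) = F({G(β) : β ∈ α}), computed by recursion on membership.
    if memo is None:
        memo = {}
    if alpha not in memo:
        memo[alpha] = F(frozenset(transfinite_rec(F, beta, memo) for beta in alpha))
    return memo[alpha]

# With F(x) = ∪{y ∪ {y} : y ∈ x} we get G(0) = ∅, G(1) = ∪{1} = 1,
# G(2) = ∪{1, 2} = 2, ...: this recursion reproduces each finite ordinal itself.
F = lambda x: frozenset().union(*(y | frozenset({y}) for y in x))
for n in range(6):
    assert transfinite_rec(F, von_neumann(n)) == von_neumann(n)
```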
Actually, the above proof implicitly assumed that, for \(\alpha\in\operatorname{Ord}\) which is not a successor ordinal, \(\alpha=\bigcup\{\beta\in\operatorname{Ord}:\beta\in\alpha\}=\bigcup\alpha\). Indeed, in the "not successor ordinal" case of the construction of \(f_\alpha\), we defined it on each \(\beta\in\alpha\), and implicitly took that to conclude it was defined on \(\alpha\), whereas it was really only defined on the union of the members of \(\alpha\). However, that assumption holds.
Lemma 7: Ordinals and their members. If \(\alpha\) is an ordinal and there is no \(\delta:\alpha=\delta+1\), then \(\alpha\) is the union of its members, that is \(\alpha=\bigcup\alpha\).
Proof.
Let \(y\in\alpha\). By Theorem 15, we know \(y\) is an ordinal. Thus an ordinal is a set of ordinals. By Theorem 16, \(\bigcup\alpha\) is thus an ordinal. By transitivity of ordinals, \(\bigcup\alpha\subseteq\alpha\). However, if \(\bigcup\alpha\in\alpha\), then either \(\alpha=\bigcup\alpha+1\) or \(\bigcup\alpha\in\beta\) for some \(\beta\in\alpha\). In the latter case, we would have \(\bigcup\alpha<\bigcup\alpha\), so that \(\bigcup\alpha\in\bigcup\alpha\), which contradicts the axiom of regularity. The former case is excluded by assumption. Hence, since Theorem 15 tells us that \(\bigcup\alpha\subsetneq\alpha\) implies \(\bigcup\alpha\in\alpha\) which is not true, we conclude \(\bigcup\alpha=\alpha\), as desired.
Naturally, if \(\alpha=\delta+1\), then \(\delta=\bigcup\alpha\) and \(\alpha=\bigcup\alpha+1\).
Lemma 8: Ordinals and the naturals. All finite cardinals (or rather, the canonical representatives \(n\)) are ordinals. \(\mathbb N\) is also an ordinal.
Proof.
We have seen \(\varnothing\) is an ordinal. Assume \(m\) is an ordinal for every \(m\in n\). Then, in particular, \(n-1\) is, and \(n=(n-1)\cup\{n-1\}\) is an ordinal by Theorem 16. By induction, \(n\) is an ordinal for every \(n\). Given that \(\mathbb N\) is the union of all \(n\)s, by Theorem 16 we conclude \(\mathbb N\) is also an ordinal. There is, in fact, another way to obtain \(\mathbb N\). The axiom of infinity postulates there is a set \(x\) containing \(\varnothing\) which is closed under \(y\mapsto y\cup\{y\}\), that is, \(\exists x:\varnothing\in x\wedge(y\in x\implies y\cup\{y\}\in x)\). Take \(\omega\) as the intersection of the subsets \(r\subseteq x\) such that \(\varnothing\in r\) and \(r\) is closed under \(y\mapsto y\cup\{y\}\), that is:
$$\omega:=\bigcap\{r\in\mathcal P(x):\varnothing\in r\wedge(y\in r\implies y\cup\{y\}\in r)\}=\{u\in x:\forall r\in\mathcal P(x),[\varnothing\in r\wedge(y\in r\implies y\cup\{y\}\in r)]\implies u\in r\}.$$
This is clearly the same as \(\mathbb N\).
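A quick machine check of Lemma 8 for small \(n\), again with the frozenset encoding (illustrative only):

```python
def von_neumann(n):
    # The canonical finite cardinal n = {0, 1, ..., n-1}.
    x = frozenset()
    for _ in range(n):
        x = x | frozenset({x})
    return x

def is_transitive(x):
    return all(z in x for y in x for z in y)

for n in range(7):
    vn = von_neumann(n)
    assert len(vn) == n          # n really has n elements
    assert is_transitive(vn)
    # ∈ is trichotomous on the members, as a (well-)order demands:
    for a in vn:
        for b in vn:
            assert (a in b) + (b in a) + (a == b) == 1
```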
We can now define a sum and a product of ordinals.
Definition 17: Ordinal arithmetic operations. We set:
$$\alpha+\beta:=\left\{\begin{array}{cc}
\alpha & \beta=0 \\
\bigcup\{(\alpha+\gamma)+1:\gamma\in\beta\} & \beta\neq0
\end{array}\right.
\qquad\qquad\qquad
\alpha\cdot\beta:=\left\{\begin{array}{cc}
0 & \beta=0 \\
\bigcup\{(\alpha\cdot\gamma)+\alpha:\gamma\in\beta\} & \beta\neq0
\end{array}\right.$$
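These recursions terminate on finite ordinals, so we can implement Definition 17 literally and check that on the von Neumann naturals it reproduces ordinary arithmetic (a sketch; `osum`/`oprod` are my names):

```python
from functools import lru_cache

def von_neumann(n):
    x = frozenset()
    for _ in range(n):
        x = x | frozenset({x})
    return x

def succ(x):
    return x | frozenset({x})

@lru_cache(maxsize=None)
def osum(a, b):
    # α + β := α if β = 0, else ∪{(α + γ) + 1 : γ ∈ β}
    if not b:
        return a
    return frozenset().union(*(succ(osum(a, g)) for g in b))

@lru_cache(maxsize=None)
def oprod(a, b):
    # α · β := 0 if β = 0, else ∪{(α · γ) + α : γ ∈ β}
    if not b:
        return frozenset()
    return frozenset().union(*(osum(oprod(a, g), a) for g in b))

for m in range(4):
    for n in range(4):
        assert osum(von_neumann(m), von_neumann(n)) == von_neumann(m + n)
        assert oprod(von_neumann(m), von_neumann(n)) == von_neumann(m * n)
```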
And now we have a lot of statements.
Theorem 18: About these operations.
The above operations are well-defined by transfinite recursion;
The above definition of sum coincides with \(\alpha\cup\{\alpha\}\) when \(\beta=1\);
\(\gamma<\beta\) implies \(\alpha+\gamma<\alpha+\beta\) and \(\gamma+\alpha\leq\beta+\alpha\), but not necessarily \(\gamma+\alpha<\beta+\alpha\);
\(\alpha\neq0\wedge\gamma<\beta\) implies \(\alpha\cdot\gamma<\alpha\cdot\beta\) and \(\gamma\cdot\alpha\leq\beta\cdot\alpha\), but not necessarily \(\gamma\cdot\alpha<\beta\cdot\alpha\);
\(0+\beta=\beta\) and \(0\cdot\beta=0\);
\(1\cdot\beta=\beta=\beta\cdot1\);
\(\alpha+(\beta+1)=(\alpha+\beta)+1\);
\(\alpha(\beta+1)=\alpha\cdot\beta+\alpha\);
For any nonempty set of ordinals \(x\) we have \(\alpha+\bigcup x=\bigcup\{\alpha+\beta:\beta\in x\}\);
For any nonempty set of ordinals \(x\) we have \(\alpha\cdot\bigcup x=\bigcup\{\alpha\cdot\beta:\beta\in x\}\);
\(1+\omega=\omega\neq\omega+1\); in particular, addition is not commutative;
\(2\cdot\omega=\omega\neq\omega\cdot2\); in particular, multiplication is not commutative;
Ordinal addition is associative;
Ordinal multiplication is associative;
Ordinal multiplication is distributive when it's from the left, that is \(\alpha(\beta+\gamma)=\alpha\beta+\alpha\gamma\), but not when it's on the right, as \((\omega+100)\cdot2=\omega\cdot2+100\neq\omega\cdot2+200=\omega\cdot2+100\cdot2\); example due to Quora post in references;
For any \(n\in\omega\) we have \(n+\omega=\omega\).
Proof.
For the sum, set \(F(\varnothing):=\alpha\) and \(F(x):=\bigcup\{y\cup\{y\}:y\in x\}\) for \(x\neq\varnothing\) (the value at \(\varnothing\) is what gives \(\alpha+0=\alpha\)), and the operation defined by recursion on ordinals is \(\beta\mapsto\alpha+\beta\). As for multiplication, we would define \(F(x):=\bigcup\{y+\alpha:y\in x\}\), which already gives \(F(\varnothing)=\varnothing=0\) as required. Trouble is, this doesn't define an operation on sets, but only on sets of ordinals. However, the proof of Transfinite recursion works just the same, since we always feed sets of ordinals into \(F\).
Let's see what \(\alpha+1\) is according to this definition. We must take all elements in 1, compute \(\alpha+y\), construct the successors, and unite the results. But only 0 is in 1, and \(\alpha+\varnothing=\alpha\), so the only set we must unite is precisely the old \(\alpha+1\).
Item 11 deals with the "not necessarily" part.
Let's try induction on this one. If \(\alpha=\varnothing\), then \(\gamma+\alpha=\gamma<\beta=\beta+\alpha\). If this works for all \(\delta\in\beta\), then by transitivity of ordinals we have \(\gamma+\delta\subseteq\beta+\delta\) for all \(\delta\in\alpha\), so that \(\gamma+\alpha=\bigcup\{\gamma+\delta:\delta\in\alpha\}\subseteq\bigcup\{\beta+\delta:\delta\in\alpha\}=\beta+\alpha\), which proves the second part of this statement.
For the first part, let us first assume \(\beta=\gamma+1\) and prove it in this case. By item 7 below, \(\alpha+\beta=\alpha+(\gamma+1)=(\alpha+\gamma)+1>\alpha+\gamma\), thus proving the desired inequality. Assume \(\beta\neq\gamma+1\). Then necessarily \(\gamma+1<\beta\), so if we prove \(\alpha+(\gamma+1)\subseteq\alpha+\beta\) we are done. But looking at the definition of sum, the family we are uniting on the left is definitely a subfamily of the one for the right, so we are done.
Again, since \(2>1\) but \(1\cdot\omega=\omega=2\cdot\omega\) by item 12, the "not necessarily" part is done.
An induction argument similar to the one for the second part of the previous item settles the second part of this one. Note that the lax inequalities do not require \(\alpha\neq0\) since, for \(\alpha=0\), we have equality by item 5.
Assume, once more, that \(\beta=\gamma+1\). By item 8, we can write \(\alpha\cdot\beta=\alpha\cdot(\gamma+1)=\alpha\cdot\gamma+\alpha>\alpha\cdot\gamma\) since \(\alpha\neq0\). This settles this case. Assume otherwise. Then we have \(\alpha\cdot\gamma<\alpha\cdot(\gamma+1)\), so if \(\alpha\cdot(\gamma+1)\subseteq\alpha\cdot\beta\), we are done. But again, just look at the definition of the product, and clearly the left-side family is a subfamily of the right-side family, thus concluding the proof.
You know the drill: induction again. The base case is trivial: \(0+0=0\). If it works for \(\alpha\in\beta\), then \(0+\beta=\bigcup\{(0+\alpha)+1:\alpha\in\beta\}=\bigcup\{\alpha+1:\alpha\in\beta\}=\min\{\gamma:\alpha\in\beta\implies\alpha<\gamma\}=\beta\). The last equality is because, on one side, \(\gamma>\alpha\implies\alpha\in\gamma\implies\alpha+1\subseteq\gamma\), so that \(\bigcup\{\alpha+1:\alpha\in\beta\}\subseteq\gamma\) for every such \(\gamma\), and therefore it is contained in their intersection, which is their minimum; but on the other side that union is itself one such \(\gamma\), and must therefore contain that minimum, so the two inclusions for that equality hold. So the first part is proved. Moving on to the next part. Again, the base case is trivial. Suppose it works for all \(\alpha\in\beta\). Then \(0\cdot\beta=\bigcup\{0\cdot\alpha+0:\alpha\in\beta\}=\bigcup\{0:\alpha\in\beta\}=0\), so that was easy.
\(\beta\cdot1=\beta\) is by definition, since \(\beta\cdot1=\bigcup\{\beta\cdot\alpha+\beta:\alpha\in1\}=\bigcup\{\beta\cdot0+\beta\}=\beta\cdot0+\beta=0+\beta=\beta\) as seen above. Let's use (guess!) induction on \(1\cdot\beta\). Base case trivial again. Assume it works for \(\alpha\in\beta\). Then \(1\cdot\beta=\bigcup\{1\cdot\alpha+1:\alpha\in\beta\}=\bigcup\{\alpha+1:\alpha\in\beta\}\). Since \(\alpha+1\in\beta\) whenever \(\alpha\in\beta\) by the minimality in Theorem 16 point 4, by transitivity of \(\beta\) we also have \(\alpha+1\subseteq\beta\) for all \(\alpha\in\beta\), so that \(\bigcup\{\alpha+1:\alpha\in\beta\}\subseteq\beta\). However, for any \(\alpha\in\beta\), we have \(\alpha\in\alpha+1\subseteq\bigcup\{\alpha+1:\alpha\in\beta\}\), so that equality is obtained as desired.
Let's write it out:
\begin{align*}
\alpha+(\beta+1)={}&\bigcup\{(\alpha+\delta)+1:\delta\in\beta+1\}=\bigcup\{(\alpha+\delta)+1:\delta\in\beta\}\cup((\alpha+\beta)+1)=(\alpha+\beta)\cup((\alpha+\beta)+1)={} \\
{}={}&(\alpha+\beta)+1,
\end{align*}
and this is done.
Again, let's write it out:
\begin{align*}
\alpha(\beta+1)={}&\bigcup\{\alpha\cdot\delta+\alpha:\delta\in\beta+1\}=\bigcup\{\alpha\cdot\delta+\alpha:\delta\in\beta\}\cup(\alpha\cdot\beta+\alpha)=(\alpha\cdot\beta)\cup(\alpha\cdot\beta+\alpha)=\alpha\cdot\beta+\alpha,
\end{align*}
hooray!
If \(\bigcup x=\varnothing\), it means either \(x=\varnothing\) or \(x=\{\varnothing\}\). In the first case, our expected RHS is \(\varnothing\), which is why we must exclude this case. In the second case, the expected RHS is just \(\alpha+0\), so this is fine. Assume we are in neither of those cases. We write out the definition explicitly:
$$\alpha+\bigcup x=\bigcup\{(\alpha+\delta)+1:\delta\in\bigcup x\}\qquad\qquad\qquad\bigcup\{\alpha+\beta:\beta\in x\}=\bigcup\{\bigcup\{(\alpha+\delta)+1:\delta\in\beta\}:\beta\in x\}.$$
So \(y\) is in the RHS set iff it is in one of the \(\bigcup\{(\alpha+\delta)+1:\delta\in\beta\}\), so iff there are \(\beta\in x,\delta\in\beta\) such that \(y\in(\alpha+\delta)+1\), but such a \(\delta\) is by definition in \(\bigcup x\) so that \(y\) is in the RHS set if and only if it is in \((\alpha+\delta)+1\) for \(\delta\in\bigcup x\), that is iff it is in the LHS set, thus proving equality.
Let's do something like the above, but with a single equality chain:
$$\bigcup\{\alpha\cdot\beta:\beta\in x\}=\bigcup\{\bigcup\{\alpha\cdot\delta+\alpha:\delta\in\beta\}:\beta\in x\}=\bigcup\{\alpha\cdot\delta+\alpha:\delta\in\bigcup x\}=\alpha\cdot\bigcup x.$$
That \(\omega+1\neq\omega\) is obvious: \(\omega\notin\omega\) yet \(\omega\in\omega\cup\{\omega\}=\omega+1\). By definition of sum, we have:
$$1+\omega=\bigcup\{1+n:n\in\omega\}=\bigcup\{m:m\in\omega\smallsetminus\{0\}\},$$
so it is definitely included in \(\omega\), but is there anything outside it that is still in \(\omega\)? I mean, \(\omega=\bigcup\{n:n\in\omega\}\), so all we are missing is uniting the above with 0, that is the empty set, so adding it in is superfluous, which means \(1+\omega=\omega\).
By definition of product we have the following:
\begin{align*}
\omega\cdot2={}&(\omega\cdot1+\omega)\cup(\omega\cdot0+\omega)=(\omega+\omega)\cup\omega=\omega+\omega; \\
2\cdot\omega={}&\bigcup\{2\cdot n+2:n\in\omega\}=\bigcup\{n:n\text{ is even}\wedge n\neq0\}.
\end{align*}
The first line already gives \(\omega\cdot2\neq\omega\), since \(\omega+\omega>\omega+1>\omega\). Given \(n\subseteq n+1\) for all \(n\), and for \(n\) odd we have \(n+1\) is even and thus in the union of the second line, and the \(n=0\) case is the empty set so we can just add it in, we conclude that union is just \(\omega\).
We first eliminate the chance of a zero term. If \(\alpha=0\), then:
$$(\alpha+\beta)+\gamma=(0+\beta)+\gamma=\beta+\gamma=0+(\beta+\gamma)=\alpha+(\beta+\gamma).$$
If \(\beta=0\), then:
$$(\alpha+\beta)+\gamma=(\alpha+0)+\gamma=\alpha+\gamma=\alpha+(0+\gamma)=\alpha+(\beta+\gamma).$$
If \(\gamma=0\), then:
$$(\alpha+\beta)+\gamma=\alpha+\beta=\alpha+(\beta+0)=\alpha+(\beta+\gamma).$$
With that out of the way, we proceed by induction on the third addendum. A nice chain of equalities starts our proof:
\begin{align*}
(\alpha+\beta)+\gamma={}&\bigcup\{[(\alpha+\beta)+\delta]+1:\delta\in\gamma\}\mathring=\bigcup\{[\alpha+(\beta+\delta)]+1:\delta\in\gamma\}={} \\
{}={}&\bigcup\{(\alpha+\epsilon)+1:\epsilon\in\beta+\gamma\wedge(\beta\in\epsilon\vee\beta=\epsilon)\}.
\end{align*}
The ringed equal is where I used induction. So this is very similar to \(\alpha+(\beta+\gamma)\), there is just that restriction on \(\epsilon\) which bothers us:
$$\alpha+(\beta+\gamma)=\bigcup\{(\alpha+\epsilon)+1:\epsilon\in\beta+\gamma\}.$$
So whatever is in the second but not the first must be within \(\bigcup\{(\alpha+\epsilon)+1:\epsilon\in\beta\}=\alpha+\beta\). However, by Theorem 18 point 3, since \(\gamma>0\) we have \(\alpha+\beta<(\alpha+\beta)+\gamma\), and in particular our desired inclusion holds. So associativity of ordinal sum is proved.
To prove \(\alpha\cdot(\beta\cdot\gamma)=(\alpha\cdot\beta)\cdot\gamma\), we proceed by induction on \(\gamma\), following ProofWiki. The base case \(\gamma=0\) is easy:
$$(\alpha\cdot\beta)\cdot\gamma=(\alpha\cdot\beta)\cdot0=0=\alpha\cdot0=\alpha\cdot(\beta\cdot0)=\alpha\cdot(\beta\cdot\gamma).$$
If \(\gamma\neq0\), it is either a successor or a limit ordinal. Assume \(\gamma=\delta+1\). Then:
$$\alpha\cdot(\beta\cdot\gamma)=\alpha\cdot(\beta\cdot(\delta+1))=\alpha\cdot(\beta\cdot\delta+\beta)\mathbin{\mathring=}\alpha\cdot(\beta\cdot\delta)+\alpha\cdot\beta\mathbin{\mathring{\mathring=}}(\alpha\cdot\beta)\cdot\delta+\alpha\cdot\beta\mathbin{\mathring=}(\alpha\cdot\beta)\cdot(\delta+1)=(\alpha\cdot\beta)\cdot\gamma,$$
where single rings denote the use of left distributivity of multiplication (i.e. the next item) and double rings denote the induction hypothesis. If \(\gamma\) is a limit ordinal, which means it's nonzero and not a successor, then \(\beta\cdot\gamma\) is also a limit ordinal, by Lemma 11 way down below. As can be inferred from the proof of that lemma, we then have:
$$\alpha\cdot(\beta\cdot\gamma)=\bigcup\{\alpha\cdot\epsilon:\epsilon\in\beta\cdot\gamma\},\qquad\qquad(\alpha\cdot\beta)\cdot\gamma=\bigcup\{(\alpha\cdot\beta)\cdot\epsilon:\epsilon\in\gamma\},$$
where the removal of "+ first multiplied ordinal" in the union is thanks to the second multiplied ordinal being a limit ordinal. If \(\epsilon\in\beta\cdot\gamma\), then since \(\beta\cdot\gamma=\bigcup\{\beta\cdot\delta:\delta\in\gamma\}\) we must have \(\epsilon\in\beta\cdot\delta\) for some \(\delta\in\gamma\). But then, using the appropriate combination of the above points and the induction hypothesis, we have:
$$\alpha\cdot\epsilon\leq\alpha\cdot(\beta\cdot\delta)=(\alpha\cdot\beta)\cdot\delta\leq(\alpha\cdot\beta)\cdot\gamma.$$
But then \(\alpha\cdot\epsilon\subseteq(\alpha\cdot\beta)\cdot\gamma\), which means:
$$\alpha\cdot(\beta\cdot\gamma)\leq(\alpha\cdot\beta)\cdot\gamma.$$
The reverse inequality is what is now left to prove. Pick \(\delta\in\gamma\), and by induction plus points above you have:
$$(\alpha\cdot\beta)\cdot\delta=\alpha\cdot(\beta\cdot\delta)\leq\alpha\cdot(\beta\cdot\gamma),$$
and since this works for every \(\delta\in\gamma\) we can conclude the reverse inequality, thus completing the proof of associativity.
This proof was vaguely inspired by the equality chain in this post. We know the claim holds for \(\gamma=1\), and for \(\gamma=0\) it's trivial. Let's proceed by induction. Using the induction hypothesis on \(\gamma\) and points 8-10 and 13, we can write:
\begin{align*}
\alpha(\beta+\gamma)={}&\alpha\cdot\bigcup\{\beta+\delta+1:\delta\in\gamma\}=\bigcup\{\alpha(\beta+\delta+1):\delta\in\gamma\}=\bigcup\{\alpha(\beta+\delta)+\alpha:\delta\in\gamma\}={} \\
{}={}&\bigcup\{(\alpha\beta+\alpha\delta)+\alpha:\delta\in\gamma\}=\bigcup\{\alpha\beta+(\alpha\delta+\alpha):\delta\in\gamma\}=\alpha\beta+\bigcup\{\alpha\delta+\alpha:\delta\in\gamma\}={} \\
{}={}&\alpha\beta+\alpha\gamma,
\end{align*}
which proves our claim and completes the proof of this theorem.
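The product admits the same kind of finite sketch on the toy Cantor-normal-form representation below \(\omega^\omega\) (again an aside of mine, not part of the post's formal development); it samples both the associativity and the left distributivity proved in this theorem:

```python
import random

# Toy ordinals below w^w as CNF lists of (exponent, coefficient) pairs,
# exponents strictly decreasing, coefficients positive; [] is 0.

def add(a, b):
    """Ordinal sum on CNF lists (see the earlier sketch)."""
    if not b:
        return list(a)
    f, d = b[0]
    head = [(e, c) for (e, c) in a if e > f]
    same = [c for (e, c) in a if e == f]
    first = (f, same[0] + d) if same else (f, d)
    return head + [first] + list(b[1:])

def mul(a, b):
    """Ordinal product on CNF lists, term by term over the right factor:
    a * (w^f * d) = w^(e1+f) * d for f > 0 (lower terms of a are absorbed),
    while a finite d only rescales a's leading coefficient."""
    if not a or not b:
        return []
    e1, c1 = a[0]
    out = []
    for (f, d) in b:
        term = [(e1 + f, d)] if f > 0 else [(e1, c1 * d)] + list(a[1:])
        out = add(out, term)
    return out

def rand_ord(rng):
    exps = sorted(rng.sample(range(3), rng.randint(0, 2)), reverse=True)
    return [(e, rng.randint(1, 3)) for e in exps]

rng = random.Random(1)
for _ in range(200):
    a, b, c = rand_ord(rng), rand_ord(rng), rand_ord(rng)
    assert mul(mul(a, b), c) == mul(a, mul(b, c))          # associativity
    assert mul(a, add(b, c)) == add(mul(a, b), mul(a, c))  # left distributivity

two, w = [(0, 2)], [(1, 1)]
print(mul(two, w))  # [(1, 1)] i.e. 2 * w = w
print(mul(w, two))  # [(1, 2)] i.e. w * 2, so the product is not commutative
```

Note that right distributivity genuinely fails, matching the text: \((1+1)\cdot\omega=\omega\), while \(1\cdot\omega+1\cdot\omega=\omega\cdot2\).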
Why did we define ordinals in this weird way, and not with some equivalence relation like cardinals? Well, first let me define that relation.
Definition 18: Order-isomorphism. Let \((X,\leq_X),(Y,\leq_Y)\) be posets. We say they are order-isomorphic if there exists a bijection \(f:X\to Y\) such that \(f(x)\leq_Yf(x')\iff x\leq_Xx'\). If two posets are order-isomorphic, we write \((X,\leq_X)\cong(Y,\leq_Y)\).
It is easy to see that, if \((X,\leq_X),(Y,\leq_Y)\) are order-isomorphic, then one is well-ordered iff the other is. Let us see that every order-isomorphism class of well-ordered posets contains at least one ordinal.
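For finite posets, Definition 18 can be checked by brute force over all bijections. A hedged Python sketch (the helper names are mine, not from the post):

```python
from itertools import permutations

def order_isomorphic(X, leq_X, Y, leq_Y):
    """Brute-force Definition 18 on finite posets: search for a bijection f
    with x <=_X x'  iff  f(x) <=_Y f(x')."""
    X, Y = list(X), list(Y)
    if len(X) != len(Y):
        return False  # no bijection at all
    for image in permutations(Y):
        f = dict(zip(X, image))
        if all(leq_X(a, b) == leq_Y(f[a], f[b]) for a in X for b in X):
            return True
    return False

divides = lambda a, b: b % a == 0

# {1, 2, 4} under divisibility is a 3-chain, just like {0, 1, 2} under <=...
print(order_isomorphic([1, 2, 4], divides, [0, 1, 2], int.__le__))  # True
# ...but {1, 2, 3} under divisibility is not: 2 and 3 are incomparable.
print(order_isomorphic([1, 2, 3], divides, [0, 1, 2], int.__le__))  # False
```

The exhaustive search is factorial in the size of the posets, so this is only a toy for tiny examples; the point is just to make the definition concrete.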
Theorem 19: Ordinals and order-isomorphisms. If \((X,\leq)\) is a well-ordered poset, there exists a unique ordinal \(\alpha_{(X,\leq)}\) such that \((X,\leq)\cong(\alpha_{(X,\leq)},\in)\).
Proof.
Let us prove by induction that for all \(x\in X\) there is a unique \(F(x)\in\operatorname{Ord}\) such that \((\{y\in X:y<x\},\leq)\cong(F(x),\in)\). First of all, since \(X\) is well-ordered, for non-maximal \(x\) we can define \(x+1:=\min\{y\in X:y>x\}\). Pick \(x\in X\) and suppose \(F(y)\) is defined for all \(y<x\). If \(x=y+1\) for some \(y\), then clearly \(F(x):=F(y)+1=F(y)\cup\{F(y)\}\) is the right choice. At this point, \(F\) is a function on \(\{y:y<x\}\), so by replacement \(\{F(y):y<x\}\) is a valid set of ordinals. If \(x\) is not a successor (i.e. there is no \(y:x=y+1\)), then we define \(F(x):=\bigcup\{F(y):y<x\}\), which is the union of a set of ordinals, and thus an ordinal by Theorem 16 point 3. This is the right choice. By induction, \(F\) is now defined on the whole of \(X\), so we can form \(\bigcup\{F(x):x\in X\}\), and that is an ordinal order-isomorphic to \(X\). Having found one, we must prove uniqueness. Suppose \(\alpha,\beta\) are order-isomorphic. Since any two ordinals are comparable, if we assume by contradiction that \(\alpha\neq\beta\), then we can, modulo renaming, assume \(\alpha\in\beta\). Pick an order-isomorphism \(f:\beta\to\alpha\). We must have \(f(\varnothing)=\varnothing\): otherwise there would be \(\gamma\in\beta:f(\gamma)=\varnothing\), and then \(\gamma>\varnothing\) but \(f(\gamma)<f(\varnothing)\). \(f(1)\) must then be the least element above \(\varnothing\), that is 1. Going on like this, \(f(n)=n\) for all \(n\). But then \(f(\omega)=\omega\), because it must be the least element greater than all naturals. In general, \(f(\gamma)=\min\{\delta\in\alpha:\epsilon<\gamma\implies f(\epsilon)<\delta\}\), so by \(\epsilon\)-induction we must conclude \(f(\gamma)=\gamma\) for all \(\gamma\). But then what about \(f(\alpha)\)? It must be \(\alpha\), yet \(f(\alpha)\in\alpha\) and \(\alpha\notin\alpha\)! Contradiction: such an order-isomorphism cannot exist. Except it does, which means \(\alpha=\beta\) necessarily.
An obvious corollary is the following.
Corollary 5: Order-isomorphisms and ordinals. Whenever \(f:\beta\to\alpha\) is an order-isomorphism, \(\beta=\alpha\).
Now I take a statement and proof from Wikipedia.
Hartogs' Lemma. For every set \(X\), there is a least ordinal that cannot be injected into \(X\); that is, there is a least \(\beta\in\operatorname{Ord}\) such that there is no injective \(f:\beta\to X\).
Proof.
I will present two proofs: one constructs it from above, one from below. The first one, which I just came up with, notes that \(\{\beta\in\operatorname{Ord}:\nexists f:\beta\to X\text{ injective}\}\) can be expressed with a valid formula and is thus a class of ordinals which, by Theorem 16 point 2, has a least element as long as it is nonempty (and nonemptiness is exactly what the second construction below establishes, by exhibiting one such ordinal), and we're done.
The second one considers the set:
$$\alpha:=\{\beta\in\operatorname{Ord}:\exists f:\beta\to X\text{ injective}\}.$$
First we must prove it is a set, and not just a class of ordinals. We already know \(2^{X\times X}\) is a set. Just for the fun of it, let's try defining the \(W\) of Wikipedia's proof explicitly with a formula:
$$W:=\left\{\begin{array}{l} R\in2^{X\times X}:\exists S\subseteq X:(s\in S\implies(s,s)\in R)\wedge[(s,s'\in S\wedge(s,s')\in R\wedge(s',s)\in R)\implies s=s']\wedge{} \\
{}\wedge[(s,s',s''\in S\wedge(s,s')\in R\wedge(s',s'')\in R)\implies(s,s'')\in R]\wedge[s,s'\in S\implies((s,s')\in R\vee(s',s)\in R)]\wedge[(T\subseteq S\wedge T\neq\varnothing)\implies\exists t\in T:(t'\in T\implies(t,t')\in R)] \end{array}\right\}.$$
Whoa. What is that, you say. Well, it is a set of subsets of \(X\times X\), so it is a set of relations on \(X\). The condition is that there is a subset \(S\) of \(X\) which satisfies five conditions. The first one is reflexivity. The second one is antisymmetry. The third one is transitivity (in the sense of relations, not in the sense of transitive sets given at the beginning of the spoiler). All three of these are restricted to \(S\), so this makes \((S,R)\) a poset. The fourth condition says that, for any two elements of \(S\), at least one ordered pair formed from them is in \(R\), or in other words \(R\) is a total order on \(S\). The fifth condition says that every nonempty subset of \(S\) has a least element w.r.t. \(R\), so that \((S,R)\) is a well-ordered poset. So the set we defined, which is a valid set by specification, is the set of all relations on \(X\) that are well-orders on some \(S\subseteq X\). Now consider \(2^W\), and within it the sets of mutually isomorphic well-orders. Well, let's define "isomorphic". First, given \(w\in W\), we need to define its domain. Set:
$$\operatorname{Dom}(w):=\bigcup\{S\subseteq X:(S,w)\text{ is a well-ordered poset}\}.$$
The family whose union is being taken is a valid set since "\((S,w)\) is a well-ordered poset" is a valid formula (which is huge, as seen above, so no way I'm gonna retype it all), so specification gives us our conclusion. Moreover, \((\operatorname{Dom}(w),w)\) can still be proven to be a well-ordered poset – I leave this to you as an exercise, with the hint that, given a nonempty \(S\subseteq\operatorname{Dom}(w)\), to find its least element you should consider \(S\cap T\) where \(T\) is one of the sets whose union forms \(\operatorname{Dom}(w)\). Now consider:
$$C:=\{(w,w'):\exists f:\operatorname{Dom}(w)\to\operatorname{Dom}(w'):(\forall x\in\operatorname{Dom}(w')\exists!y\in\operatorname{Dom}(w):f(y)=x)\wedge[(x,x'\in\operatorname{Dom}(w))\implies(x<x'\iff f(x)<f(x'))]\}.$$
This is a relation on \(W\), and \((w,w')\) is in it iff the two are order-isomorphic, that is, there is a bijection between the domains which preserves the order. Since \(C\) is specified via a formula, it is a valid set. The equivalence classes of \(W\) w.r.t. \(C\) are valid sets because \((w,w')\in C\) is a valid formula and \([w]_C:=\{w'\in W:(w,w')\in C\}\). Oh, and of course \(C\) is an equivalence relation, that is easy. The quotient is also valid, because \(\exists w\in W:Y=[w]_C\) is a valid formula. Because of the previous Theorem, given any class \([w]_C\), picking a representative we can find a unique ordinal \(\beta_w\) that is order-isomorphic to the representative, and therefore we have an injective map \(f:{}^W/_C\to\operatorname{Ord}\) whose image is a valid set by replacement; but the image is just \(\alpha\), which is finally shown to be a valid set. And no, we could not just say "it is defined by a formula so it is a set", because specification only works when starting from sets, and \(\operatorname{Ord}\) is, as far as we know, only a class, not a set. In fact, \(\alpha\) is itself an ordinal. Indeed, we know it is well-ordered by \(\in\) because so is the whole class of ordinals, so we need to show it is transitive. If \(\gamma\in\beta\in\alpha\), there is \(w\in W\) such that \((\operatorname{Dom}(w),w)\) is order-isomorphic to \((\beta,\in)\); but then you can consider \(w'\subseteq w\) defined by \(w':=\{(s,s')\in w:s,s'\in T\}\), where \(T\subseteq\operatorname{Dom}(w)\) is order-isomorphic to \((\gamma,\in)\). In fact, since \((\operatorname{Dom}(w),w)\cong(\beta,\in)\), there must be an order-isomorphism \(f:\beta\to\operatorname{Dom}(w)\), so that our \(T\) can be chosen to be the image \(f[\gamma]:=\{f(\delta):\delta\in\gamma\}\). This easily yields that \([w']_C\) maps to \(\gamma\) via the injection \({}^W/_C\to\operatorname{Ord}\) we mentioned above, so that \(\gamma\in\alpha\), which makes \(\alpha\) transitive.
Suppose there is \(w\in W:[w]_C\leadsto\alpha\). That would put \(\alpha\in\alpha\), which is impossible. This means \(\alpha\) cannot be injected into \(X\), for if it were, we could induce a well-order \(w\) with domain the image of \(\alpha\) via that injection, which would make \([w]_C\leadsto\alpha\), which we proved impossible a second ago. Any ordinal less than \(\alpha\) is in \(\alpha\) and thus can be injected into \(X\). So \(\alpha\) is the ordinal we were looking for.
Definition 19: Hartogs number. Given a set \(X\), we call Hartogs number of \(X\) the ordinal from the previous Lemma, and denote it \(\aleph(X)\). If \(\kappa\) is a cardinal, we call \(\aleph(\kappa):=\aleph(X)\) for any \(X\in\kappa\). Which representative we take doesn't matter because it is easy to show that \(\aleph(X)=\aleph(Y)\) whenever \(X,Y\) are in bijection.
Indeed, the class of ordinals injectable into \(X\) is the same as that of those injectable into \(Y\) because any injection \(\alpha\to X\) gives one \(\alpha\to Y\) by composing with a bijection \(X\to Y\) and any injection \(\alpha\to Y\) gives one into \(X\) by composing with a bijection the other way, so the construction in the proof above gives the same ordinal for both.
Lemma 9: More ordinals. We have seen the \(n\in\mathbb N\) are ordinals, as is \(\mathbb N\) itself, called \(\omega\). Of course, with the sum, we have \(\omega+1,\omega+2,\dotsc\), and \(\omega+\omega\) is the smallest ordinal containing \(\omega+n\) for every \(n\), namely the union of all of those. We can go on, and wind up with \((\omega+\omega)+\omega=\omega\cdot3\). Go on like this, and we can define \(\omega\cdot\omega=:\omega^2\). Continue on, and \(\omega^3,\omega^4,\dotsc,\omega^\omega\), and we can go on and on and on, always staying (if choice holds) with countable ordinals! Of course, there are uncountable ordinals, and we call \(\omega_1:=\aleph(\mathbb N)\), the least uncountable ordinal.
Proof.
\(|\omega+1|=|\omega|+|1|=|\omega|\). Similarly, the \(\omega+n\) are all countable. But then their union is too, assuming choice. So \(\omega\cdot2\) is countable. Operate like that on \(\omega\cdot2+1,\dotsc\), and \(\omega\cdot3\) is countable. Go on like that, and every \(\omega\cdot n\) is countable, so \(\omega^2\) is a countable union of countables, hence countable. And so on and so on. The only thing to prove was that those ordinals are countable, so the proof is done.
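As an aside: for the single square \(\omega\times\omega\), countability needs no choice at all, since an explicit bijection with \(\omega\) can be written down. A sketch using the classical Cantor pairing function (not part of this post's toolkit, just an illustration):

```python
def cantor_pair(x, y):
    """Classical bijection N x N -> N: enumerate pairs diagonal by diagonal
    (first the pairs with x + y = 0, then x + y = 1, and so on)."""
    return (x + y) * (x + y + 1) // 2 + y

# The first ten diagonals hit every natural below 55 exactly once:
images = sorted(cantor_pair(x, y) for x in range(10) for y in range(10)
                if x + y < 10)
print(images == list(range(55)))  # True: 55 pairs map bijectively onto 0..54
```

The formula is completely explicit, so no choice is invoked anywhere; the choice in the lemma's proof is only needed to pick bijections for infinitely many sets at once.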
Exercise. Figure out how to prove \(\omega\cdot n\) is countable without using choice.
So now the proof of Theorem 13 point 3 part 2 has no gaps: what was used there is precisely the Hartogs number of a cardinal. Now we must complete the proofs of points 10, 16, and 29 of Theorem 14. Concerning point 29, I just need to expand on «The last statement in this bullet is that choice implies well-ordering of the cardinal. That is because choice, via the well-ordering theorem, implies we can associate an ordinal to each cardinal in a well-defined way, and since we can prove ordinals are well-ordered, the restricted ordering on the ordinals associated to cardinals is a well-order for the cardinals. Details of this in the part below about ordinals.». Here is the statement in question.
Lemma 10: Choice and cardinal ordering. If choice holds, then the cardinals are totally ordered. This means that, for any \(X,Y\), either \(|X|\leq|Y|\) or \(|Y|\leq|X|\).
The association mentioned in the quote above is the following.
Definition 20: Initial ordinal of a cardinal. Given a cardinal \(\kappa\), we call initial ordinal of \(\kappa\) the least ordinal \(\alpha:|\alpha|=\kappa\), and denote it:
$$I(\kappa):=\min\{\alpha\in\operatorname{Ord}:|\alpha|=\kappa\}.$$
This definition only makes sense if choice is assumed, because then we can guarantee that the RHS is well-defined since we have a non-empty subclass of \(\operatorname{Ord}\) and such a subclass has a minimum by Theorem 16 point 2.
Proof of Lemma 10.
If \(|X|\leq|Y|\), we are done. Assume then that no injection exists from \(X\) into \(Y\). Consider \(I(|X|),I(|Y|)\). These two ordinals are comparable by Theorem 15 points 3-4, but if \(I(|X|)\leq I(|Y|)\), we'd have an injective \(f:I(|X|)\to I(|Y|)\), which composed with bijections \(I(|X|)\to X,I(|Y|)\to Y\) (suitably inverted where needed) would give an injection \(X\to Y\), which doesn't exist. Hence, \(I(|Y|)<I(|X|)\), which with those bijections gives \(|Y|\leq|X|\). In fact, since \(|X|=|Y|\) was excluded (a bijection is in particular an injection), we have \(|Y|<|X|\).
As for point 16, I will prove more. But first I need a definition.
Definition 21: Limit ordinals, initial ordinals, \(\omega_\alpha,\aleph_\alpha\). An ordinal \(\alpha\) is called a limit ordinal if \(\nexists\beta:\alpha=\beta+1\) and \(\alpha\neq0\). Obviously any ordinal is either a limit ordinal, a successor, or zero. An ordinal \(\alpha\) is called an initial ordinal if \(\beta<\alpha\implies|\beta|<|\alpha|\). Set \(\omega_0:=\omega\). Recursively define \(\omega_\alpha\) for \(\alpha\in\operatorname{Ord}\) to be the least infinite initial ordinal greater than \(\omega_\beta\) for all \(\beta\in\alpha\); equivalently, the infinite initial ordinal such that the set of infinite initial ordinals within it has order type \(\alpha\). For any initial ordinal, the infinite initial ordinals within it are a well-ordered poset, so they have an order type. Thus, every infinite initial ordinal is an \(\omega_\alpha\). Of course, \(\alpha\mapsto\omega_\alpha\) is a bijection between \(\operatorname{Ord}\) and the infinite initial ordinals. \(\omega_\alpha\) will be called the \(\alpha\)th infinite initial ordinal. Its cardinality will be \(\aleph_\alpha:=|\omega_\alpha|\).
We now have a lemma used in one point of Theorem 18.
Lemma 11: Limit ordinals and left products. If \(\alpha,\beta\in\operatorname{Ord}\), \(\alpha\neq0\), and \(\beta\) is a limit ordinal, then \(\alpha\cdot\beta\) is also a limit ordinal.
Proof.
I take the proof from here. \(\alpha\neq0\) by assumption and \(\beta\neq0\) by definition of limit ordinal. Suppose by contradiction that \(\alpha\cdot\beta=0=\alpha\cdot0\). Since \(\beta\neq0\), we have \(\beta>0\), and we showed in point 4 that this yields \(\alpha\cdot0<\alpha\cdot\beta\), which contradicts \(\alpha\cdot\beta=\alpha\cdot0\). So \(\alpha\cdot\beta\neq0\). To contradict that \(\alpha\cdot\beta\) is a limit ordinal, we thus have only one option left: assume \(\alpha\cdot\beta=\gamma+1\). Write out the product:
$$\alpha\cdot\beta=\bigcup\{\alpha\cdot\delta+\alpha:\delta\in\beta\}=\bigcup\{\alpha\cdot\epsilon:\epsilon\in\beta\wedge\epsilon\neq0\}=\bigcup\{\alpha\cdot\epsilon:\epsilon\in\beta\}.$$
The penultimate equality is because \(\beta\) is closed under successor. Indeed, if \(\delta\in\beta\), then \(\delta+1=\min\{\sigma:\delta\in\sigma\}\leq\beta\), but \(\beta=\delta+1\) cannot be because \(\beta\) is a limit ordinal, so \(\delta+1\in\beta\). The last equality is because \(\alpha\cdot0=0\subseteq\alpha=\alpha\cdot1\), and \(\alpha\cdot1\) is the term for \(\epsilon=1\) in that union, \(\epsilon=1\neq0\) being allowed since \(\beta\) cannot be 1, as 1 is the successor \(0+1\). If:
$$\gamma+1=\bigcup\{\alpha\cdot\epsilon:\epsilon\in\beta\},$$
then there is \(\zeta\in\beta:\gamma\in\alpha\cdot\zeta\), so that \(\gamma+1\in\alpha\cdot\zeta+1\subseteq\alpha\cdot\zeta+\alpha=\alpha\cdot(\zeta+1)\subseteq\alpha\cdot\beta\), except this means \(\alpha\cdot\beta\in\alpha\cdot\beta\), which goes against the axiom of regularity. So \(\alpha\cdot\beta\) is a limit ordinal after all.
Lemma 12: Squares of infinite cardinals and choice. \(\kappa^2=\kappa\) for all infinite cardinals \(\kappa\) if and only if the axiom of choice holds.
Proof.
We first follow this answer to prove that choice implies \(\kappa^2=\kappa\) for infinite \(\kappa\). By induction on \(\alpha\), we will prove that \(\omega_\alpha\) and \(\omega_\alpha\times\omega_\alpha\), when opportunely ordered, are order-isomorphic. This clearly implies \(|\omega_\alpha|=|\omega_\alpha\times\omega_\alpha|\). Since choice is assumed, all sets can be well-ordered, meaning every infinite cardinal is the cardinal of an infinite initial ordinal, and thus an \(\aleph_\alpha\); so we will have proved \(\aleph_\alpha^2=\aleph_\alpha\) for all \(\alpha\in\operatorname{Ord}\), which completes this implication. Let us now construct the "opportune" well-order of the cartesian square. We say:
$$(x,y)<(x',y')\iff\left\{\begin{array}{c}
\max\{x,y\}<\max\{x',y'\} \\
\max\{x,y\}=\max\{x',y'\}\wedge x<x' \\
\max\{x,y\}=\max\{x',y'\}\wedge x=x'\wedge y<y'
\end{array}\right.$$
The system is meant as condition 1 or condition 2 or condition 3. Let's see that this is a well-order. First of all, any two pairs are comparable: if the maxima differ we have an inequality; otherwise we compare the first components; if those differ we have an order; otherwise we compare the second components, and whatever happens to them is what happens to the pairs. Pick a nonempty \(S\subseteq\omega_\alpha\times\omega_\alpha\). Then \(\{\max\{x,y\}:(x,y)\in S\}\) is a nonempty subset of \(\omega_\alpha\), so it must have a minimum \(m\). Consider \(S_m:=\{(x,y)\in S:\max\{x,y\}=m\}\). This is nonempty. If it is a singleton, then its single element is the minimum of \(S\). Otherwise, consider \(\{\pi_1(t):t\in S_m\}\). This must have a minimum \(m_1\), because it is a nonempty subset of \(\omega_\alpha\). Consider \(S_{m_1}:=\{(x,y)\in S_m:x=m_1\}\). This is nonempty. If it is a singleton, it contains only the minimum of \(S\). Otherwise, consider \(\{\pi_2(z):z\in S_{m_1}\}\). This is again a nonempty subset of \(\omega_\alpha\), so it must have a minimum \(m_2\). At this point we know \((m_1,m_2)\in S\), and since it is less than anything else in \(S\), we have our minimum. So this is indeed a well-order. The map \(\beta\mapsto(\beta,\beta)\) shows that \(\omega_\alpha\times\omega_\alpha\) has order type at least \(\omega_\alpha\), that is \(\omega_\alpha\leq\omega_\alpha\times\omega_\alpha\). For \(\alpha=0\), consider \(S_k:=\{(m,n)\in\omega\times\omega:(m,n)<(k,k)\}\). The defining inequality implies \(\max\{m,n\}\leq k\), so \(S_k\) is finite. Pick \((m,n)\in\omega\times\omega\). Then \((m,n)\in S_{\max\{m,n\}+1}\), so that \(\omega\times\omega=\bigcup_{k\in\omega}S_k\). But then the order type of \(\omega\times\omega\) is the least order type greater than all finite numbers, that is \(\omega\). So the basis for the induction is proved.
Assume now that \(\alpha\) is the least ordinal such that \(\omega_\alpha\lneq\omega_\alpha\times\omega_\alpha\) as order types. There will then be \((\beta,\gamma)\in\omega_\alpha\times\omega_\alpha\) such that \(\{(\sigma,\zeta)\in\omega_\alpha\times\omega_\alpha:(\sigma,\zeta)<(\beta,\gamma)\}\) has order type \(\omega_\alpha\). Now \(\max\{\gamma,\beta\}\in\omega_\alpha\), and, since \(\omega_\alpha\) is an initial ordinal, \(\omega_\alpha\) has no predecessor, and in particular \(\max\{\gamma,\beta\}+1\neq\omega_\alpha\), so we can pick \(\delta:\max\{\gamma,\beta\}<\delta<\omega_\alpha\). \((\beta,\gamma)<(\delta,\delta)\), so \(\{(\sigma,\zeta)<(\delta,\delta)\}\) has order type at least \(\omega_\alpha\), given it contains \(\{(\sigma,\zeta)<(\beta,\gamma)\}\) which has order type \(\omega_\alpha\). But \(|\delta|<\aleph_\alpha\), so by inductive hypothesis \(|\delta|=|\delta|^2\) (trivially if \(\delta\) is finite), and \(\{(\sigma,\zeta)<(\delta,\delta)\}\) is essentially \(\delta\times\delta\), which has cardinality \(|\delta|<\aleph_\alpha\); yet a set of order type at least \(\omega_\alpha\) has cardinality at least \(\aleph_\alpha\), a contradiction. So choice implies \(\kappa^2=\kappa\) for infinite cardinals.
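The well-order used above is easy to operationalize on finite samples: it amounts to comparing pairs by the key (max, first component, second component) lexicographically. A small Python illustration (finite snapshots only, of course):

```python
def pair_key(p):
    """The order from the proof: compare by the max of the pair first,
    then by the first component, then by the second."""
    x, y = p
    return (max(x, y), x, y)

S = [(3, 1), (0, 2), (2, 2), (1, 0), (0, 1)]
print(min(S, key=pair_key))  # (0, 1): its key (1, 0, 1) beats all the others

# |S_k| = k^2 + 2k: every pair with both coordinates <= k, except (k, k)
# itself, comes strictly before (k, k) in this order.
k = 4
S_k = [(m, n) for m in range(k + 1) for n in range(k + 1)
       if pair_key((m, n)) < pair_key((k, k))]
print(len(S_k) == k * k + 2 * k)  # True
```

Since Python compares tuples lexicographically, `min` with this key computes exactly the least element whose existence the proof establishes for any nonempty finite sample.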
The other implication is taken from here. We begin with an intermediate Lemma.
Lemma 13: Cardinal and aleph-shenanigans. Suppose \(\kappa\) is a cardinal and \(\alpha\) an ordinal. If:
$$\kappa\cdot\aleph_\alpha=\kappa+\aleph_\alpha,$$
either \(\kappa\leq\aleph_\alpha\) or \(\aleph_\alpha\leq\kappa\). In particular, if \(\aleph_\alpha=\aleph(\kappa)\), then any representative of \(\kappa\) is well-orderable.
Proof.
Pick \(X\in\kappa,Y\in\aleph_\alpha\). Call \(A:=X\times\{1\}\) and \(B:=Y\times\{\{1\}\}\). Those are disjoint sets which represent our two cardinals. Given that \(\kappa+\aleph_\alpha=\kappa\cdot\aleph_\alpha\), we can assume there are disjoint \(A'\in\kappa,B'\in\aleph_\alpha\) such that \(A\times B=A'\cup B'\). For \(a\in A\), consider the set \(S_a:=\{(a,b):b\in B\}\subseteq A\times B\). If there is an \(a\in A\) such that \(S_a\subseteq A'\), then \(b\mapsto(a,b)\) is an injection \(B\to A'\), which makes \(\aleph_\alpha\leq\kappa\). Otherwise, for every \(a\in A\), consider \(T_a:=\{(a,b):b\in B,(a,b)\notin A'\}\). This is nonempty and included in \(B'\), which can be well-ordered because its cardinal is an aleph, so it has a minimum \(b_a:=\min T_a\). The map \(a\mapsto(a,b_a)\) is then injective from \(A\) to \(B'\), which makes \(\kappa\leq\aleph_\alpha\). If \(\aleph_\alpha=\aleph(\kappa)\), then \(\aleph_\alpha\leq\kappa\) is excluded, so \(\kappa\leq\aleph_\alpha\), and well-orderings of representatives of \(\aleph_\alpha\) induce well-orderings of representatives of \(\kappa\).
Now pick \(\kappa\) an infinite cardinal and consider \(\aleph(\kappa)\). We are assuming \(\lambda^2=\lambda\) for infinite cardinals, so:
$$\kappa+\aleph(\kappa)=(\kappa+\aleph(\kappa))^2=\kappa^2+2\kappa\cdot\aleph(\kappa)+\aleph(\kappa)^2\geq\kappa\cdot\aleph(\kappa)\geq\kappa+\aleph(\kappa),$$
where the last \(\geq\) is proved as in Theorem 14 point 31. In the end, \(\kappa\cdot\aleph(\kappa)=\kappa+\aleph(\kappa)\), which by the intermediate lemma implies representatives of \(\kappa\) are well-orderable. Now pick a set \(X\). If it is finite, it is easy to well-order it: any total order will do. Besides, a bijection with a finite subset of the naturals per se induces a well-order. If it is infinite, then it belongs to an infinite cardinal, but we proved infinite cardinals have well-orderable representatives, which makes \(X\) well-orderable. So we proved the well-ordering theorem, which is equivalent to the Axiom of Choice.
The last thing left is the formula for cardinal sum.
Lemma 14: Formula for cardinal sum. If choice holds, then \(\kappa+\lambda=\max\{\kappa,\lambda\}\) whenever either of the two is infinite and neither is zero.
Proof.
Point is, \(\kappa+\lambda=\kappa\lambda=\max\{\kappa,\lambda\}\). The first equality is proved in Theorem 14 point 31, whose proof relies on point 26 (which uses the above which doesn't involve the sum formula), on point 15 (which proves one distributivity directly and the other one via commutativity which is in point 13), and on point 13 (which proves its statements directly without relying on any formula). Thanks to commutativity, when proving the second equality, we may rename and assume \(\kappa\geq\lambda\). So we really need to prove \(\kappa\geq\lambda\implies\kappa\lambda=\kappa\). By Theorem 14 point 13, \(\kappa\lambda\geq\kappa\) whenever \(\lambda\neq0\), which we assumed, and by point 14, \(\kappa\geq\lambda\implies\kappa^2\geq\kappa\lambda\), so if \(\kappa^2=\kappa\), the multiplication formula is proved. But \(\kappa^2=\kappa\) for infinite \(\kappa\) is Lemma 12 above, so we are done.
To conclude, here is a neat property of infinite ordinals.
Lemma 15: Property of infinite ordinals. Let \(\alpha\geq\omega\) be an infinite ordinal. Then \(|\alpha+n|=|\alpha|\) for all \(n\in\omega\). So infinite ordinals do not change cardinality when the successor function is applied a finite number of times.
Proof.
For \(n=0\), \(\alpha+n=\alpha\), so this is obvious. For \(n=m+1\), the claim follows by applying the case \(n=1\) to \(\alpha+m\), then using the case \(m\) and the transitivity of cardinal equality; so we need only prove the fact for \(n=1\). Let us define a bijection \(f:\alpha+1\to\alpha\). Since \(\alpha\geq\omega\), \(\omega\subseteq\alpha\). We define \(f(n):=n+1\) for \(n\in\omega\), \(f(\alpha):=0\), and \(f(\beta):=\beta\) for \(\beta\in\alpha\smallsetminus\omega\). This is clearly a bijection, and what it does is shift the \(\alpha\) "at the end" of \(\alpha+1\) "to the beginning", making room for it by shifting the naturals up by one, so what we are left with is just \(\alpha\).
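For \(\alpha=\omega\), this bijection is the classic Hilbert-hotel shift. Here is a finite-window Python illustration, with a sentinel string standing in for the top element of \(\omega+1\) (purely illustrative, since a computer only ever sees finitely much of \(\omega\)):

```python
OMEGA = "w"  # sentinel standing in for the top element of w + 1

def f(x):
    """The shift from the proof, specialized to alpha = w: the top
    element goes to 0, and each natural moves up by one."""
    return 0 if x == OMEGA else x + 1

window = list(range(10)) + [OMEGA]  # a finite window of w + 1
image = [f(x) for x in window]
print(image)  # [1, 2, ..., 10, 0]: no repeats, and no sentinel left
print(len(set(image)) == len(image))  # True: injective on the window
```

The sentinel disappears from the image while 0 appears, which is exactly the "shift to the beginning" described in the proof.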
And on these notes, I bid you goodbye, and suggest you stay tuned for more! Like constructing \(\mathbb R\) from \(\mathbb N\), thus ending up with all our numbers just from some axioms and the empty set. That's pretty cool, right?