Tuesday, July 24, 2018

Functions, induction, cardinality, ordinals: more abstract set-theoretical stuff

This may be where I actually start using all three levels distinctly. There will definitely be noob-level stuff, and a lot of master-level stuff, but perhaps I will find some med-level stuff too, who knows. The post has yet to be written. This post is also going to feature one or two spoilers for things to omit on first reading which are master-level or even above that, like, for masters among masters who are brave enough to wade through the ocean of abstraction that those things are.


OK, so in the last post we talked about relations on sets. Now consider the relation of ownership. What set is it defined on? If we want a single set to define it on, it would have to be the set of people, pets, plants, and things. But that sounds like a big mishmash. It would make way more intuitive sense to consider the set of people and the set of ownable entities (i.e. pets, plants, and things). But we do not have a way to define a relation between different sets… just yet. Of course, the definition is just an adaptation of the definition of a relation on a set: couldn't be easier.

Definition 1: Relation between sets. Given two sets \(X,Y\), a relation between \(X\) and \(Y\), or of \(X\) with \(Y\) (in that order), is just a subset of \(X\times Y\). Again, we say \(x\operatorname{R}y\) if \((x,y)\in R\).

Obviously a relation on \(X\) is now a relation of \(X\) with itself.
Now, clearly the definitions of particular classes of relations on sets we saw last time cannot be reintroduced here. But there is something we can do with this new notion of relations between sets. In the last post (by the way, sorry I'm posting this late: I got caught up with other things first, and then with research for this post), we mentioned "functions" without ever defining them. I'm sure all of you have heard the word, and have an idea of a function which probably has nothing to do with relations, and can be formulated as follows.

Definition 2: Function (naïve definition). A function between two sets \(X\) and \(Y\) is a law that to every \(x\in X\) associates one and one only \(y\in Y\), typically denoted \(f(x)\).

This is a vague term. What is a law, after all? Well, nowhere near as vague as what we gave for "sets", but getting formal there requires a whole new level of abstraction and is way over med-level, so we'll get to that later in the post; I advise you to just skip over it unless you're very well-versed in pure maths, as I consider myself to be. Anyways, it isn't hard to give a more formal definition of a function. If we have that kind of "law", we can easily associate to it a subset \(F\subseteq X\times Y\): just say \((x,y)\in F\iff y=f(x)\). We then need to translate the "one and one only" part of the above definition, and it is done as follows.

Definition 3: function (formal definition). A function between two sets \(X,Y\) is a relation \(F\subseteq X\times Y\) such that for all \(x\in X\) there exists a unique \(y=:f(x)\) such that \((x,y)\in F\).

When functions are thought of as laws, the set in Definition 3 is usually called the graph of the function. So you may say that the formal definition simply identifies a function with its graph. Which makes total sense, since the graph identifies the law and vice versa, as is obvious. Indeed, the graph is often used to paint a picture of the function, especially if you can draw it on paper, as has been done in math classes ever since middle school.
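If you like seeing this concretely, here is a quick Python sketch of the graph point of view (the helper names are mine, purely for illustration): a finite relation \(F\subseteq X\times Y\) is a function exactly when every \(x\in X\) appears in exactly one pair.

```python
# A function X -> Y as its graph: a set of pairs F ⊆ X × Y with exactly
# one pair (x, y) for each x ∈ X (Definition 3).
def is_function(F, X, Y):
    """True iff the relation F is the graph of a function X -> Y."""
    return (all(x in X and y in Y for (x, y) in F)
            and all(sum(1 for (a, _) in F if a == x) == 1 for x in X))

def apply_graph(F, x):
    """Recover f(x) from the graph F."""
    return next(y for (a, y) in F if a == x)

X, Y = {0, 1, 2}, {"a", "b"}
F = {(0, "a"), (1, "a"), (2, "b")}
print(is_function(F, X, Y))   # True: every x has one and one only image
print(apply_graph(F, 2))      # b
```

Adding a second pair for the same \(x\), say \((0,\text{"b"})\), would make `is_function` fail: that is exactly the "one and one only" clause at work.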
From now on, outside the foundational master-level spoiler, the naïve definition will suffice, so we will use that as a basis for properties of functions. I will now start following chapter 2 of Soardi very closely.

Definition 4: Various terms related to functions. Let \(f:X\to Y\). Then:
  • \(f(x)\) will be called image of \(x\) via \(f\);
  • Map will be a synonym for function;
  • Given \(A\subseteq X\), \(f(A):=\{f(x):x\in A\}=\{y\in Y:\exists x\in A:f(x)=y\}\) will be called image of \(A\) via \(f\); it goes without saying that \(f(\varnothing)=\varnothing\), but let's specify it just for the heck of it; if a function is defined on a set which contains sets as members, it might be wise to distinguish images of sets from images of elements via notation other than capitalization, so I introduce the notation \(\operatorname{Im}_f(A):=\{f(x):x\in A\}\);
  • Given \(B\subseteq Y\), we call \(f^{-1}(B):=\{x\in X:f(x)\in B\}\) the preimage of \(B\) via \(f\); again, \(f^{-1}(\varnothing)=\varnothing\); as in the item above, I introduce \(\operatorname{Im}^{-1}_f(B):=\{x:f(x)\in B\}\) as a disambiguating notation;
  • \(\operatorname{Dom}f:=X\) is called the domain or definition set of \(f\), and \(\operatorname{Cod}f:=Y\) is the codomain; the image of \(f\) is \(\operatorname{Im}f:=f(X)=\operatorname{Im}_f(\operatorname{Dom}f)\);
  • \(f\) is injective or one-to-one if \(f(x)=f(y)\implies x=y\) (the converse implication holds for any function);
  • \(f\) is surjective or onto if \(f(X)=Y\), or in other words, \(\operatorname{Cod}f=\operatorname{Im}f\);
  • \(f\) is bijective if it is both injective (one-to-one) and surjective (onto);
  • For any set \(X\), the function from \(X\) to \(X\) which maps each element to itself is called identity on \(X\) and denoted by \(\operatorname{id}_X\);
  • If \(f:X\to Y\) and \(g:Y\to Z\), the composition of \(g\) and \(f\), denoted \(g\circ f\), is the function \(X\to Z\) such that \(g\circ f(x)=g(f(x))\);
  • If \(f\) is bijective, we will see it has an inverse, that is there exists \(g:Y\to X\) such that \(f\circ g=\operatorname{id}_Y\) and \(g\circ f=\operatorname{id}_X\); such a \(g\) is denoted \(f^{-1}\), so that \(f^{-1}(y)\), which would normally be the set \(\{x\}\) (where \(f(x)=y\)), is also used to mean just the element \(x\).
  • If \(A\subseteq X\), the restriction of \(f\) to \(A\), denoted \(f|_A\), is the function \(A\to Y\) such that \(f|_A(x)=f(x)\) for all \(x\in A\);
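For finite sets, all the notions above can be checked mechanically; here is a small Python sketch (the helper names are my own, hypothetical choices):

```python
def image(f, A):
    """f(A), the image of A via f."""
    return {f(x) for x in A}

def preimage(f, X, B):
    """f^{-1}(B), the preimage of B via f, inside the domain X."""
    return {x for x in X if f(x) in B}

def is_injective(f, X):
    # f is injective iff distinct inputs give distinct outputs.
    return len({f(x) for x in X}) == len(X)

def is_surjective(f, X, Y):
    # f is surjective iff its image fills the whole codomain.
    return image(f, X) == Y

X, Y = {0, 1, 2, 3}, {0, 1}
f = lambda x: x % 2
print(sorted(image(f, {0, 1})))     # [0, 1]
print(sorted(preimage(f, X, {0})))  # [0, 2]
print(is_injective(f, X), is_surjective(f, X, Y))  # False True
```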
Now we have some simple properties to prove for functions.

Theorem 1: Elementary properties of functions, images, preimages, and compositions. Let \(f:X\to Y\) and \(g:Y\to Z\), \(A,B\subseteq X\) and \(C,D\subseteq Y\). Then:
  1. \(f^{-1}(C\cup D)=f^{-1}(C)\cup f^{-1}(D)\);
  2. \(f^{-1}(C\cap D)=f^{-1}(C)\cap f^{-1}(D)\);
  3. \(f(A\cup B)=f(A)\cup f(B)\);
  4. \(f(A\cap B)\subseteq f(A)\cap f(B)\), and if \(f\) is injective equality is guaranteed;
  5. \(A\subseteq f^{-1}(f(A))\) and equality holds if \(f\) is injective;
  6. \(B\supseteq f(f^{-1}(B))\) and equality holds if \(f\) is surjective;
  7. \(f^{-1}(B^C)=(f^{-1}(B))^C\);
  8. If \(f\) is injective, \(f(A^C)\subseteq(f(A))^C\);
  9. If \(f\) is surjective, \((f(A))^C\subseteq f(A^C)\);
  10. If \(f\) is bijective, \(f(A^C)=(f(A))^C\);
  11. The restriction of an injective \(f\) is still injective;
  12. If we replace the codomain with the image, any \(f\) becomes surjective; in short, any \(f\) is surjective on its image;
  13. If \(f,g\) are bijective, then \(g\circ f\) is also bijective; in fact, injectivity and surjectivity are separately preserved by composition, that is, if \(f,g\) are injective, the composition also is, and if \(f,g\) are surjective, so is the composition;
  14. Composition of functions is associative, that is \((h\circ g)\circ f=h\circ(g\circ f)\);
  15. \(f\) is injective if and only if it has a left inverse, i.e. there is \(g:Y\to X\) such that \(g\circ f=\operatorname{id}_X\);
  16. \(f\) is surjective if and only if it has a right inverse, i.e. there is \(g:Y\to X\) such that \(f\circ g=\operatorname{id}_Y\); note: this requires the axiom of choice;
  17. \(f:X\to Y\) is bijective if and only if it has an inverse.
Proof.
  1. Let \(x\) be on the left side. Then either \(f(x)\in C\), in which case \(x\in f^{-1}(C)\), or \(f(x)\in D\), in which case \(x\in f^{-1}(D)\). In both cases, \(x\in f^{-1}(C)\cup f^{-1}(D)\). The converse is similarly easy.
  2. If \(x\) is on the left side, \(f(x)\in C\cap D\), which implies \(f(x)\in C\) and \(f(x)\in D\), that is \(x\in f^{-1}(C)\) and \(x\in f^{-1}(D)\), putting it on the right side and proving \(\subseteq\). The reverse inclusion is similarly simple.
  3. Assume \(x\) is on the left. Then there is \(y\in A\cup B\) such that \(f(y)=x\). If \(y\in A\), \(x\in f(A)\), and if \(y\in B\), \(x\in f(B)\). In all cases, \(x\) is on the right, giving \(\subseteq\). If \(x\) is on the right, then either there is \(y\in A:f(y)=x\), or there is \(y\in B:f(y)=x\). In both cases, \(x\) is on the left, giving the desired equality.
  4. If \(x\) is on the left, there is \(y\in A\cap B:f(y)=x\), which puts \(x\) in both \(f(A)\) and \(f(B)\), giving the desired inclusion. The inclusion can be strict: if \(f\) is not injective, there may be \(x\in A\smallsetminus B,y\in B\smallsetminus A\) with \(f(x)=f(y)\), so that \(f(x)\in f(A)\cap f(B)\) while possibly \(f(x)\notin f(A\cap B)\). Injectivity excludes this: if \(z\in f(A)\cap f(B)\), then \(z=f(a)=f(b)\) with \(a\in A,b\in B\), and injectivity forces \(a=b\in A\cap B\), so \(z\in f(A\cap B)\), guaranteeing equality.
  5. If \(x\in A\), \(f(x)\in f(A)\), and any \(y:f(y)=f(x)\) will be in \(f^{-1}(f(A))\), including \(x\), thus proving the desired inclusion. If \(f\) is injective, assume \(x\in f^{-1}(f(A))\). This means \(f(x)\in f(A)\), making it so that there is \(y\in A:f(y)=f(x)\), but injectivity then forces \(x=y\), thus placing \(x\in A\) and proving the reverse inclusion.
  6. If \(x\) is on the right, we have \(y\in f^{-1}(B):f(y)=x\), but then \(f(y)\in B\), which puts \(x\) on the left. If \(x\) is on the left, as soon as there is \(y:f(y)=x\), \(y\) ends up in \(f^{-1}(B)\), so that \(x\) is on the right. This means that the reverse inclusion is guaranteed if \(f\) is surjective, or if \(B\subseteq f(X)\).
  7. If \(x\in f^{-1}(B^C)\), then \(f(x)\notin B\), so \(x\notin f^{-1}(B)\), hence \(f^{-1}(B^C)\subseteq(f^{-1}(B))^C\). If \(x\) is on the right, then \(x\notin f^{-1}(B)\), so \(f(x)\notin B\), hence \(f(x)\in B^C\), which means \(x\in f^{-1}(B^C)\) by definition of preimage, completing the proof.
  8. If \(x\) is on the left, there is \(y\notin A:f(y)=x\), but there cannot be \(z\in A:f(z)=x\) because any such \(z\) would equal \(y\) and lie outside \(A\) by injectivity, so \(x\notin f(A)\), as desired.
  9. If \(x\) is on the left, there is no \(y\in A:f(y)=x\). However, a \(y:f(y)=x\) exists by surjectivity, so such a \(y\) must be in \(A^C\), placing \(x\in f(A^C)\), as desired.
  10. Follows from the two above.
  11. If \(f|_A(x)=f|_A(y)\), then \(f(x)=f(y)\), but \(f\) is injective, so \(x=y\), proving \(f|_A\) is also injective.
  12. By definition of image, for all \(y\in\operatorname{Im}(f)\) there is \(x\in X:f(x)=y\), so \(f:X\to\operatorname{Im}(f)\) is surjective.
  13. Assume \(f,g\) injective. Suppose \(g(f(x))=g(f(y))\). By injectivity of \(g\), we have \(f(x)=f(y)\), but by injectivity of \(f\), \(x=y\), proving injectivity of \(g\circ f\). Assume \(f,g\) surjective. Fix \(z\in Z\). There is \(y\in Y:g(y)=z\), by surjectivity of \(g\). We then have \(x\in X:f(x)=y\), by surjectivity of \(f\). But then \(g(f(x))=g(y)=z\), proving surjectivity of \(g\circ f\). If \(f,g\) are bijective, they are injective and surjective, allowing the previous arguments and making \(g\circ f\) injective and surjective, and thus bijective.
  14. \((h\circ g)\circ f(x)=(h\circ g)(f(x))=h(g(f(x)))=h(g\circ f(x))=h\circ(g\circ f)(x)\).
  15. If \(f\) has a left inverse, suppose \(f(x)=f(y)\); then \(x=g(f(x))=g(f(y))=y\), proving injectivity. If \(f\) is injective (and \(X\neq\varnothing\)), define \(g(y)\) to be the unique element \(x:f(x)=y\) if there exists one, and some fixed element of \(X\) otherwise. \(g\) is a left inverse of \(f\).
  16. If \(f\) is surjective, for any \(y\in Y\) the preimage \(f^{-1}(\{y\})\) is nonempty, so we can set \(g(y)\) to be some \(x\) with \(f(x)=y\). Then \(g\) is a right inverse of \(f\). Choice is obviously required here if \(Y\) is infinite, since we need to choose an \(x\) from every \(f^{-1}(\{y\})\). If \(f\) has a right inverse, pick \(y\in Y\): then \(f(g(y))=y\), so \(f(X)=Y\), that is, \(f\) is surjective.
  17. If \(f\) is bijective, it is injective, so by item 15 it has a left inverse \(g\): \(g(f(x))=x\) for all \(x\in X\). Pick \(y\in Y\). By surjectivity, \(y=f(x)\) for some \(x\in X\), so \(f(g(y))=f(g(f(x)))=f(x)=y\), making \(g\) a right inverse as well, and thus an inverse of \(f\). Conversely, an inverse is both a left and a right inverse, so by items 15-16 \(f\) is injective and surjective, hence bijective. Oh yes, the inverse, because if \(g,g'\) are inverses of \(f\), by associativity of composition we have \(g=g\circ(f\circ g')=(g\circ f)\circ g'=g'\).
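As a finite sanity check of item 4 (a toy example of my own): with the non-injective \(f(x)=x\bmod 2\), \(A=\{0\}\) and \(B=\{2\}\), the inclusion is strict.

```python
# f(A ∩ B) = f(∅) = ∅, but f(A) ∩ f(B) = {0}: a strict inclusion,
# possible only because f(x) = x % 2 is not injective (item 4).
f = lambda x: x % 2
A, B = {0}, {2}
lhs = {f(x) for x in A & B}
rhs = {f(x) for x in A} & {f(x) for x in B}
print(lhs, rhs)                # set() {0}
print(lhs <= rhs, lhs == rhs)  # True False
```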
Definition 5: indexations and sequences. Another term used for functions is indexing or indexation. The domain is then called the set of indices, and the images \(f(i)\) will be denoted \(x_i\), with \(\{x_i\}_{i\in\operatorname{Dom}}\) denoting both the image and the function. If the domain is the natural numbers, the function is called a sequence.

This is where I stop following Soardi: unions and intersections are already defined in the last post, and I mean to save cardinality for later. So let me talk about Mathematical Induction first, then we will delve into axiomatic set theory (in the master-level spoiler), and finally we will get to cardinality and cardinal arithmetic.
Induction, in its two forms, is a great tool to prove a result one has obtained previously via manipulations or conjectures.

Principle of Induction. Suppose \(P(n)\) is a proposition that can be checked for natural numbers. Assume \(P(n_0)\) holds for some \(n_0\in\mathbb N\), and that whenever \(P(n)\) holds, \(P(n+1)\) can also be proved. Then, \(P(n)\) holds for any \(n\geq n_0\).

Principle of Complete Induction. Suppose \(P(n)\) is a proposition that can be checked for natural numbers. Assume \(P(n_0)\) holds for some \(n_0\in\mathbb N\), and that whenever \(P(k)\) holds for all \(n_0\leq k\leq n\), then \(P(n+1)\) also holds. Then \(P(n)\) holds whenever \(n\geq n_0\).

Let us see some applications of these principles. Since this is child's play for an average mathematician (and maybe not only for that kind of person), I thought I'd leave the option to just hide it and skip to more interesting stuff. Clicking the button just below will hide the applications of induction I propose in this post.

Theorem 2: sum of the first natural numbers. For any \(n\in\mathbb N\), the sum of the first \(n\) natural numbers is \(\frac{n(n+1)}{2}\). In other words, \(\sum_1^nk=\frac{n(n+1)}{2}\).

Proof.
We use the principle of induction with \(n_0=1\). For \(n=1\), the sum is just 1, and \(\frac{1(1+1)}{2}=\frac{1\cdot2}{2}=\frac22=1\). Suppose the property holds for \(n\). Then: $$\sum_{k=1}^{n+1}k=\sum_{k=1}^nk+(n+1)=\frac{n(n+1)}{2}+n+1=\frac{n^2+n+2n+2}{2}=\frac{n^2+3n+2}{2}=\frac{(n+1)(n+2)}{2},$$ which is what we want. Induction therefore gives us the theorem.
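A brute-force sanity check of the formula for small \(n\) (just Python double-checking the algebra):

```python
# Verify sum_{k=1}^n k = n(n+1)/2 for the first few n by direct summation.
for n in range(1, 100):
    assert sum(range(1, n + 1)) == n * (n + 1) // 2
print("ok")
```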

Theorem 3: Sum of the first \(i2^i\). Let \(n\in\mathbb N\). Then: $$\sum_{i=1}^ni\cdot2^i=2^{n+1}(n-1)+2.$$
Proof.
We use the principle of induction with \(n_0=1\). For \(n=1\), both sides equal 2. Suppose the formula holds for \(n\). Then: $$\sum_{i=1}^{n+1}i2^i=\sum_{i=1}^ni2^i+(n+1)2^{n+1}=2^{n+1}(n-1)+2+(n+1)2^{n+1}=2^{n+1}(n-1+n+1)+2=2^{n+2}n+2,$$ which is \(2^{(n+1)+1}((n+1)-1)+2\), exactly what we wanted. Induction then gives us the theorem.
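Same kind of brute-force check for this formula:

```python
# Verify sum_{i=1}^n i·2^i = 2^(n+1)(n-1) + 2 by direct summation.
for n in range(1, 60):
    assert sum(i * 2**i for i in range(1, n + 1)) == 2**(n + 1) * (n - 1) + 2
print("ok")
```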

If you're wondering how I got the formula for this sum, here are my manipulations.

Theorem 4: Sums of cubes. Let \(n\in\mathbb N\). Then: $$\sum_{k=1}^nk^3=\left(\frac{n(n+1)}{2}\right)^{\!\!2}.$$
I found this online via googling.
Proof.
Let me try induction with \(n_0=1\): that case is easy, just like \(n=0\). Suppose it holds for \(n\). Then: \begin{align*} \sum_{k=1}^{n+1}k^3={\!\!}&\left(\frac{n(n+1)}{2}\right)^{\!\!2}+(n+1)^3=(n+1)^2\left(\frac{n^2}{4}+n+1\right)=(n+1)^2\frac{n^2+4n+4}{4}=\frac{(n+1)^2(n+2)^2}{4}={} \\ {}={}&\left(\frac{(n+1)(n+2)}{2}\right)^{\!\!2}, \end{align*} as we wanted.
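And one more brute-force check, for the sum of cubes:

```python
# Verify sum_{k=1}^n k^3 = (n(n+1)/2)^2 by direct summation.
for n in range(1, 60):
    assert sum(k**3 for k in range(1, n + 1)) == (n * (n + 1) // 2) ** 2
print("ok")
```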

The last application of induction we see is one of complete induction.

Fundamental Theorem of Arithmetic. Let \(n\in\mathbb N\) with \(n\geq2\). Then there is a way to decompose \(n\) as a product of prime numbers.

For algebraists: 1) I am aware of the distinction between prime and irreducible, and that what I should write here is irreducible, but we'll get to that in another post; 2) Yes, the FTOA has a uniqueness part, but induction won't prove that, or at least not as easily as apt for this post.
Proof.
For \(n_0=2\), the statement is obvious. We use complete induction here. Suppose any \(2\leq k\leq n\) admits a decomposition. Look at \(n+1\). If it is prime, it is already decomposed. Otherwise, it can be written \(n+1=hk\) with \(h,k<n+1\). If either of them were less than 2, it would be 1, and the other would be \(n+1\), contradicting the inequality \(h,k<n+1\). Hence \(2\leq h,k\leq n\), so by the inductive hypothesis \(h,k\) can be decomposed. The decomposition of \(n+1\) is then the concatenation of those of \(h\) and \(k\). Complete induction then gives the theorem.
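The proof is really a recursive algorithm in disguise; here is a Python sketch of it (trial division is my own choice of how to find a splitting \(n+1=hk\), which the proof doesn't specify):

```python
def decompose(n):
    """Return a list of primes whose product is n (n >= 2): if n has a
    divisor d with 2 <= d <= sqrt(n), split n = d * (n // d) and concatenate
    the two decompositions; otherwise n is prime and already decomposed."""
    for d in range(2, int(n**0.5) + 1):
        if n % d == 0:
            return decompose(d) + decompose(n // d)
    return [n]

print(decompose(360))   # [2, 2, 2, 3, 3, 5]
```

The recursion terminates for the same reason complete induction applies: both factors are strictly smaller than the number being decomposed.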

But can we prove those two forms of induction? Let's start by equating them with a property of the natural numbers.

Theorem 5: Induction and the well-ordering of the naturals. Both the above forms of induction are equivalent to the statement that \(\mathbb N\) is well-ordered under the normal ordering (a statement I'll call WON, Well-Ordering of the Naturals). More specifically, WON implies induction which implies complete induction which implies WON.

Proof.
Assume WON. Suppose the hypotheses of induction are verified for a property \(P\), with base case \(n_0\). By contradiction, suppose induction doesn't hold. The set \(S:=\{n\in\mathbb N:P(n)\text{ is false}\wedge n\geq n_0\}\) is then nonempty, and by WON there is a minimal element \(m\in S\). Suppose \(m>n_0\). This implies \(P(m-1)\) holds, otherwise \(m-1\in S\), which contradicts the minimality of \(m\). But the assumption is that \(P(m-1)\implies P(m)\), so \(m\notin S\), contradiction. Thus, \(S=\varnothing\), which is what induction would have us conclude. Induction is thus proved.
If induction holds, apply it to the proposition \(Q(n)\) stating that \(P(k)\) holds for all \(n_0\leq k\leq n\): the assumptions of complete induction for \(P\) are exactly the assumptions of induction for \(Q\), and \(Q(n)\) for all \(n\geq n_0\) gives \(P(n)\) for all \(n\geq n_0\). Hence, complete induction holds too.
If complete induction holds, consider \(S\subseteq\mathbb N\) a nonempty subset of the naturals. Let \(P(n)\) be "\(n\notin S\)". If \(P(0)\) doesn't hold, we have our minimum: zero. If \(P(0)\) holds, let's prove the induction step. Assume \(k\notin S\) for \(k\leq n\). If \(n+1\in S\), then we have our minimum. If \(n+1\notin S\), then we just showed that, if \(P(k)\) holds for \(k\leq n\), then \(P(n+1)\) holds. By complete induction, \(P(n)\) holds for all \(n\). But then \(S\) is empty, contradiction. The contradiction stemmed from eliminating every chance we had to find a minimum. This implies a minimum must exist, and WON is thus proved.

Theorem 6: Well-ordering of the naturals. The natural numbers are well-ordered under the usual relation. In other words, given \(S\subseteq\mathbb N\) nonempty, there is an element \(x\in S\), called minimum of \(S\) and denoted \(\min S\), such that \(x\leq s\) for all \(s\in S\).

Proof.
\(S\) is nonempty, so there exists \(y\in S\). Consider \(S_y:=\{s\in S:s\leq y\}\). Suppose \(0\in S_y\). Then we have a minimum. Otherwise, if \(1\in S_y\), we have a minimum. Go on like that for at most \(y+1\) steps, and you will find a minimum. Indeed, after \(y+1\) tests, you will be testing \(y\), and you know \(y\in S\), so if \(y\) is not the minimum, you will have found another element of \(S\) before reaching \(y\). This means you will have stopped before getting to \(y\), because as soon as a number is in \(S\), you stop because you have the minimum. So for any \(S\), after a finite number of tests and a single choice, you have a minimum. WON is proved.
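The search in this proof is easily written out; a Python sketch (the function name is mine, and \(S\) is any finite nonempty set of naturals):

```python
def naturals_min(S):
    """Find min S as in the proof: grab any y in S, then test 0, 1, 2, ...;
    the search stops within y + 1 steps because y itself is in S."""
    y = next(iter(S))
    for candidate in range(y + 1):
        if candidate in S:
            return candidate

print(naturals_min({17, 4, 23}))   # 4
```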

At this point, you may be wondering: can we not formulate a "principle of induction" with any well-ordered set? Oh yes we can. In fact, we don't even need a well-order.

Definition 6: Well-founded relation. A relation \(R\subseteq X\times X\) is called well-founded if, for any nonempty \(S\subseteq X\), there is an \(R\)-minimal element of \(S\), that is an element \(s\in S\) such that there is no \(x\in S\) with \(x\operatorname{R}s\).

Theorem 7: Well-founded induction. Let \(R\) be a well-founded relation on a set \(X\). Suppose a property \(P\) can be proven for \(x_0\), and moreover that whenever \(P(y)\) holds for all \(y\neq x\) with \(x_0\operatorname{R}y\operatorname{R}x\), then \(P(x)\) holds. Then we can conclude that \(P(x)\) holds for all \(x\) with \(x_0\operatorname{R}x\).

The proof below is just a rehashing of WON implies induction.
Proof.
Suppose there is a set \(X\) with a well-founded relation \(R\) that satisfies the assumptions of well-founded induction but falsifies its conclusion. Then, \(S:=\{x:x_0\operatorname{R}x\wedge\lnot P(x)\}\) is nonempty, so it must have a minimal element \(y_0\). It cannot be \(x_0\), because the assumptions include \(P(x_0)\). Since any \(y\neq y_0\) with \(x_0\operatorname{R}y\operatorname{R}y_0\) is outside \(S\) by minimality of \(y_0\), the induction assumption forces \(P(y_0)\) to hold, which contradicts \(y_0\in S\). This implies \(S=\varnothing\), so that well-founded induction is proved.

Now there are two "relations" that we have been using all the time and haven't talked about at all: inclusion and membership (\(\subseteq\) and \(\in\)). Are those well-founded? Wait wait. What are those defined on again? Well, the set of all sets, of course. Wait. First of all, membership is not defined there… is it? There are elements that are not sets, and sets are not elements, right? Wrong. Or not. It depends. The second thing is definitely wrong: any set is a member of its power set. The first thing may be right or wrong depending on whether or not your theory of sets postulates urelements. You will probably be thinking: what about numbers? Those are bound to be urelements, right? Wrong! Feel mindblown yet? You have no idea what I'm going into now. Let Vegeta warn you about it. But don't be scared just yet: there is still one thing that should be noob-level here, and it is why "the set of all sets" can lead to problems. You see, when we defined a set in the last post, we were very vague. As far as we know, any collection of things we can either enumerate or characterize is valid as a set. So let me tell you why this is no bueno.

Russell's paradox. Let \(S\) be the set of all sets that are not members of themselves. Then \(S\in S\iff S\notin S\).

No, I'm not going crazy. No, I'm not joking or mistyping. This set evidently cannot be a valid set. Indeed, let's prove the paradox.
Proof.
Suppose \(S\in S\). Then, by definition of \(S\), \(S\) is not a member of itself, so \(S\notin S\). Vice versa, if \(S\notin S\), then \(S\) is a set that is not a member of itself, so by definition of \(S\) it must satisfy \(S\in S\).

And yeah, that last step very much illustrates the level of contradictoriness this set has by definition: "it is not a member of itself, therefore it must be a member of itself".
A real-life version of the paradox is the following. Suppose a certain barber decides to shave exactly those people in his village who don't shave themselves. Will he shave himself? And the answer is: he will if and only if he won't, just like in the mathematical version \(S\in S\iff S\notin S\).
So how do we get out of this paradox? The answer is axioms. You may have heard this word in Geometry: they are truths that are assumed as self-evident and serve as a starting point for a theory. The same way points and lines are not defined in Geometry, but are assumed as primitive and given relations they satisfy by assumption, the same is done with sets.

And this is where noobs and meds join back in. So the spoiler can be summed up as "Let there be nothing! Let there be possibilities of shenanigans with sets! Oh hello natural numbers!". What follows should be noob-level, if a little more complicated than the average noob-level stuff. Inside the spoiler, we defined the natural numbers in such a way that \(0=\varnothing\) and \(n=\{0,\dotsc,n-1\}\) for any \(n\). Keep that in mind. We are also trying to define addition by letting \(n+m\) be the \(k\) such that there is a bijection \(f:k\to(n\times1)\cup(m\times\{1\})\), where 1 means \(\{\varnothing\}\) and \(\{1\}=\{\{\varnothing\}\}\). The notation \(S(x)\) means \(x\cup\{x\}\), which is just \(x+1\) when \(x\) is a natural.
Two loose ends are left here (that is, in what I was doing just before the spoiler ended): that for all \(m,n\) there is \(k\) in bijection with \((n\times1)\cup(m\times\{1\})\), and that it is unique.
To prove uniqueness, note that, if \(k\neq h\), we can assume modulo renaming that \(h\in k\), and then via a sufficient number of applications of \(S\) we get \(k=S(S(\dots(S(h))))\), meaning \(h\subseteq k\) and \(h\in k\) but \(h\notin h\) as seen above. OK, that is an interesting property, but we don't need that. Let me just leave it there because why not :).

Lemma 2: Naturals and bijections. Let \(h,k\in\mathbb N\). Then there exists a bijection \(f:h\to k\) if and only if \(h=k\).

Proof.
If \(h=k\), pick the identity as a bijection.
Suppose, by way of contradiction, that there are \(h\neq k\) in bijection. Define \(P(k):=\exists h\neq k:h,k\text{ are in bijection}\). This is a valid property, so the set \(\{k\in\mathbb N:P(k)\}\) is valid; it is nonempty by assumption, so, by the well-ordering we proved above, it must have a minimum \(k_0\). If \(k_0\) were empty, it could not be in bijection with anything other than itself, but by assumption this is not the case, so \(k_0\neq\varnothing\), and we can write \(k_0=S(k_0')\). Let \(\tilde k\neq k_0\) be in bijection with \(k_0\) via \(f:k_0\to\tilde k\); \(\tilde k\) cannot be empty either, so \(\tilde k=S(k')\) for some \(k'\). Restrict the bijection to \(f|_{k_0'}:k_0'\to\tilde k\smallsetminus\{f(k_0')\}\), which is valid because \(\tilde k\smallsetminus\{f(k_0')\}=\{x\in\tilde k:x\neq f(k_0')\}\). Define now \(g:\tilde k\smallsetminus\{f(k_0')\}\to k'\) by setting \(g(x)=x\) for \(x\neq k'\) and, if \(k'\) is in the domain (i.e. if \(f(k_0')\neq k'\)), \(g(k'):=f(k_0')\); \(g\) is clearly a bijection. Then \(g\circ f|_{k_0'}:k_0'\to k'\) is a bijection with \(k_0'\neq k'\), since \(k_0'=k'\) would imply \(k_0=S(k_0')=S(k')=\tilde k\), which we assumed not to be true. As \(k_0'\in k_0\), i.e. \(k_0'<k_0\), this violates the minimality of \(k_0\). So we have a contradiction, which gives us that, whenever \(h,k\) are in bijection, they must be equal.

Simplified sum-up: Suppose there are two distinct "numbers" in bijection with each other. If one is 0, we clearly have a contradiction. Otherwise, we can construct a bijection between their predecessors (that is, those numbers minus 1), which means that, if there is no bijection between \(k-1\) and something different than it, \(k\) satisfies the same property. By induction, we conclude our claim holds true.

Lemma 3: Existence of \(k\) in bijection with \((n\times1)\cup(m\times\{1\})\). Suppose \(n,m\in\mathbb N\). Then there is \(k=:m+n\) such that \(k\) is in bijection with \((n\times1)\cup(m\times\{1\})\).

Proof.
Define \(f:(n\times1)\cup(m\times\{1\})\to\mathbb N\) as follows. For \((k,0)\in n\times1\) (recall \(1=\{0\}\)), set \(f((k,0)):=k\). For \((j,1)\in m\times\{1\}\), set \(f((j,1)):=S(S(S(\dots(S(j)))))\), the successor being taken \(n\) times, so that \(f((0,1))=n\). This is clearly an injective function, and its image is, by construction, the natural number \(n+m\), which is our \(k\).
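A finite Python sketch of this bijection, with the tags 0 and 1 standing for \(\varnothing\) and \(\{\varnothing\}\), and a natural \(n\) modelled as `range(n)`:

```python
# The disjoint union (n × {0}) ∪ (m × {1}) is in bijection with n + m:
# tag-0 pairs map to themselves, tag-1 pairs get shifted up by n.
def disjoint_union(n, m):
    return [(k, 0) for k in range(n)] + [(j, 1) for j in range(m)]

def f(pair, n):
    k, tag = pair
    return k if tag == 0 else n + k   # "S applied n times to k"

n, m = 3, 4
images = sorted(f(p, n) for p in disjoint_union(n, m))
print(images)   # [0, 1, 2, 3, 4, 5, 6], i.e. exactly the elements of 7 = 3 + 4
```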

Let us now define the cartesian product of a family.

Definition 7: cartesian products of families. Let \(\mathcal F\) be a family of sets. We define: $$\prod\mathcal F:=\left\{f:\mathcal F\to\bigcup\mathcal F:x\in\mathcal F\implies f(x)\in x\right\},$$ and call \(\prod\mathcal F\) the (cartesian) product of the family \(\mathcal F\).

And now the question arises: is this empty? Clearly if \(\varnothing\in\mathcal F\) it is, because \(f(\varnothing)\) would need to be in \(\varnothing\), but that is impossible. But what if \(\varnothing\notin\mathcal F\)? If you read last post, you will no doubt be thinking "isn't an element of the product thus defined simply a choice function for \(\mathcal F\)"? And you are totally right. So we finally get to introduce the axiom of choice into this post.

Axiom of Choice - revisited. Let \(\mathcal F\) be a family of sets such that \(\varnothing\notin\mathcal F\). Then \(\prod\mathcal F\neq\varnothing\).

Clearly if \(\mathcal F\) is in bijection with some \(n\), aka is finite, we can construct an element of the product by choosing a finite number of elements. The axiom of choice is required for generality. Of course, we can have a weaker form of choice for "countable" families.
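To see the finite case in action: for a finite family the product can be listed outright, no axiom needed, e.g. with Python's `itertools.product` (just an illustration; each tuple below is a choice function):

```python
from itertools import product

# Every tuple picks one element from each member of the family.
family = [{1, 2}, {3}, {4, 5}]
choices = list(product(*[sorted(s) for s in family]))
print(choices)       # [(1, 3, 4), (1, 3, 5), (2, 3, 4), (2, 3, 5)]
print(len(choices))  # 4; the product is empty iff some member is empty
```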

Definition 8: Countably infinite set. Let \(x\) be a set. We say \(x\) is countably infinite if it is in bijection with \(\mathbb N\).

It is easy to see that any set \(\mathbb N_k:=\{x\in\mathbb N:k\in x\}\) (naturals greater than \(k\)) is countably infinite: just shift up \(k+1\) times and you have your bijection. Obviously all sets of numbers with fixed remainder \(r\) when divided by \(k\) are countably infinite by \(f_{k,r}(x):=kx+r\). Can we prove that any subset of \(\mathbb N\) is countably infinite (or finite)? As far as I know, we need the following extra axiom, and the Schröder-Bernstein theorem.

Axiom of Countable Choice. Let \(\mathcal F\) be a countably infinite family of nonempty sets. Then \(\prod\mathcal F\neq\varnothing\).

In the terminology of last post, all countable families of sets admit at least a choice function.

Theorem 8: Schröder-Bernstein theorem. Let \(X,Y\) be sets such that there exist \(f:X\to Y\) and \(g:Y\to X\) both injective. Then there exists \(h:X\to Y\) bijective.

Proof.
I simply copy the Wikipedia proof attributed to Julius König.
For any \(a\in X\) or \(b\in Y\), we can produce a unique double-sided sequence: $$\dots,f^{-1}(g^{-1}(a)),g^{-1}(a),a,f(a),g(f(a)),\dots$$ Such a sequence falls into one of four cases:
  • Terminate on the left with an element of \(X\smallsetminus g(Y)\), in which case we call it an \(X\)-stopper.
  • Terminate on the left with an element of \(Y\smallsetminus f(X)\), in which case we call it a \(Y\)-stopper.
  • Repeat itself, in which case we call it cyclic.
  • Not terminate on the left and not repeat itself, in which case we call it doubly infinite.
By injectivity of \(f,g\), for any \(a\in X\) or \(b\in Y\) there is a unique such sequence to which \(a\) or \(b\) belongs. The sequences then form a partition of the disjoint union of \(X,Y\), or of \((X\times1)\cup(Y\times\{1\})\) if we want. To produce our bijection \(h\), we just have to define it on the elements of \(X\) and \(Y\) of each sequence. In an \(X\)-stopper, \(f\) defines a bijection between the sequence's elements of \(X\) and its elements of \(Y\), so if \(a\in X\) belongs to an \(X\)-stopper, we define \(h(a):=f(a)\). With analogous arguments, for \(a\in X\) belonging to a \(Y\)-stopper we can define \(h(a):=g^{-1}(a)\), and for \(a\) in a doubly infinite or cyclic sequence we can use either \(h(a):=f(a)\) or \(h(a):=g^{-1}(a)\), provided we make a single choice for each such sequence.
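König's construction can actually be run. Here is a Python sketch of my own with \(X=Y=\mathbb N\), \(f(x)=2x\) and \(g(y)=2y+1\), both injective; in this example every backward trace terminates (values keep shrinking), so every sequence is an \(X\)-stopper or a \(Y\)-stopper, and \(h\) comes out a bijection \(\mathbb N\to\mathbb N\):

```python
# X = Y = ℕ, f(x) = 2x, g(y) = 2y + 1; X \ g(Y) = evens, Y \ f(X) = odds.
# Trace the chain of a ∈ X backwards; use f on X-stoppers, g^{-1} on Y-stoppers.
def h(a):
    cur, side = a, "X"
    while True:
        if side == "X":
            if cur % 2 == 0:                 # even: not in g(Y) -> X-stopper
                return 2 * a                 # h(a) = f(a)
            cur, side = (cur - 1) // 2, "Y"  # cur = g(previous element)
        else:
            if cur % 2 == 1:                 # odd: not in f(X) -> Y-stopper
                return (a - 1) // 2          # h(a) = g^{-1}(a)
            cur, side = cur // 2, "X"        # cur = f(previous element)

print([h(a) for a in range(8)])   # [0, 2, 4, 1, 8, 10, 12, 3]
```

Note the interleaving: \(X\)-stoppers get doubled, \(Y\)-stoppers (like 3 and 7 above) get mapped back via \(g^{-1}\), and no value is hit twice, as the theorem promises.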

Theorem 9: Subsets of the naturals. Let \(A\subseteq\mathbb N\). Then \(A\) is either finite or countably infinite.

Proof.
If \(A\) is finite, we are done. If \(A\) is infinite, clearly \(\operatorname{id}_{\mathbb N}|_A:A\to\mathbb N\) is an injection, so by Schröder-Bernstein we simply need to construct an injection \(f:\mathbb N\to A\) to conclude \(A\) is countably infinite. Since \(A\) is not finite, it is not empty, so we can pick \(f(1)\in A\). Since \(A\) is not finite, we can pick \(f(2)\in A\smallsetminus\{f(1)\}\), for if this were empty, \(A=\{f(1)\}\) would be finite, which it isn't by assumption. Rinse and repeat, and for any \(n\) you have \(f(n)\). Here is your injection.
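For a decidable \(A\), the injection can be made explicit as "the \(n\)-th element of \(A\) in increasing order"; a Python sketch with \(A\) = the primes (my choice of example; it assumes \(A\) is infinite, matching the theorem's second case, otherwise the scan below wouldn't terminate):

```python
def nth_element(pred, n):
    """Return the n-th natural (counting from 1) satisfying pred,
    scanning 0, 1, 2, ... upward; assumes infinitely many witnesses."""
    count, x = 0, 0
    while True:
        if pred(x):
            count += 1
            if count == n:
                return x
        x += 1

is_prime = lambda p: p > 1 and all(p % d for d in range(2, p))
print([nth_element(is_prime, n) for n in range(1, 6)])   # [2, 3, 5, 7, 11]
```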


There is a more straightforward way of proving the above, but it requires a stronger form of choice. And noobs can skip ahead of this, which is going to be med-level.

Definition 9: Entire binary relation. A binary relation \(R\subseteq X\times X\) on a set \(X\) is called entire (or serial or total) if for all \(x\in X\) there is at least one element \(y\in X:x\operatorname{R}y\). In other words, if every element is related to at least one element.

Clearly, a reflexive relation is serial: just take \(y=x\). The converse isn't true: \(<\) is serial on \(\mathbb N\) (take \(y:=S(x)=x+1\)), but it is not reflexive. However, a serial relation that is symmetric and transitive must be reflexive: by seriality pick \(y\) with \(x\operatorname{R}y\), then \(y\operatorname{R}x\) by symmetry, and finally \(x\operatorname{R}x\) by transitivity.
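That last claim can also be brute-force checked on a small set (a toy verification of mine, not a proof): every serial, symmetric, transitive relation on a 3-element set is reflexive.

```python
# Check: on X = {0,1,2}, serial + symmetric + transitive implies reflexive.
from itertools import chain, combinations

X = range(3)
pairs = [(x, y) for x in X for y in X]

def powerset(s):
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def serial(R):
    return all(any((x, y) in R for y in X) for x in X)

def symmetric(R):
    return all((y, x) in R for (x, y) in R)

def transitive(R):
    return all((x, w) in R for (x, y) in R for (z, w) in R if y == z)

def reflexive(R):
    return all((x, x) in R for x in X)

for R in map(set, powerset(pairs)):   # all 2^9 = 512 relations on X
    if serial(R) and symmetric(R) and transitive(R):
        assert reflexive(R), R
print("checked all 512 relations on a 3-element set")
```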

Axiom of Dependent Choice. Given a set \(X\) with a relation \(\preceq\), if \(\preceq\) is serial, there exists a sequence \(\{x_i\}_{i\in\mathbb N}\) such that \(x_i\preceq x_{i+1}\) for all \(i\in\mathbb N\).

Lemma 4: The various forms of choice. The full Axiom of Choice implies the Axiom of Dependent Choice which in turn implies the Axiom of Countable Choice.

Proof.
Assume Choice. Let \(X\) be a set and \(R\) an entire relation on \(X\). For all \(x\in X\), we know \(R_x:=\{y\in X:x\operatorname{R}y\}\) is nonempty. Therefore, by Choice, there is a choice function for \(\{R_x:x\in X\}\). We now have \(f:X\to X\) such that \(x\operatorname{R}f(x)\) for all \(x\): just map \(x\) to \(R_x\) first, and then use the choice function. With this, pick any \(x\in X\) and consider \(x,f(x),f(f(x)),\dots\). This is our sequence. Dependent Choice is thus proved.
Assume Dependent Choice, and suppose we have that \(A(n)\) is a nonempty set for all \(n\in\mathbb N\). Consider the set of all finite choice functions \(f:n\to A(0)\cup\dotsb\cup A(n-1)\) with \(f(i)\in A(i)\) for all \(i<n\). This is a valid set: \(\phi(x,y):=(x=n\wedge y=A(n))\) is a valid formula whose image of \(\mathbb N\) is \(\{A(n):n\in\mathbb N\}\), which makes \(\bigcup\{A(n):n\in\mathbb N\}\) a valid set by union and specification; hence the functions \(n\to\bigcup\{A(n):n\in\mathbb N\}\) are a valid set, so the choice functions defined on \(n\) are a valid set by specification, and by union and specification again the set of all finite choice functions is valid. Define the relation \(f\preceq g\) if \(f\) is defined on \(m\), \(g\) is defined on \(n>m\) and \(g|_m=f\). This relation is serial because, given \(f\) defined on \(n\), the set \(A(n)\) is nonempty, so we can extend \(f\) to a \(g\) defined on \(n+1\) with \(g|_n=f\). Hence, by Dependent Choice, we have a sequence \(f_0\preceq f_1\preceq\dotso\) whose domains strictly increase, so \(f_{i+1}\) is defined at least on \(i+1\). Now define \(f(i):=f_{i+1}(i)\). This is a choice function for \(\{A(n)\}\). Countable Choice is thus proved.
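The chain of finite choice functions can be watched in action (a toy sketch of mine, not part of the formal argument): each "serial" step extends a partial choice function by one more set, and \(f(i)\) is read off \(f_{i+1}\).

```python
# A chain f_0 ≺ f_1 ≺ ... of finite choice functions, extended one set at a time.

A = {n: {n, n + 1, n * n} for n in range(10)}   # a toy family of nonempty sets

def extend(f):
    """One step of the serial relation: extend f from {0,..,n-1} to {0,..,n}."""
    n = len(f)
    return f + [min(A[n])]   # min stands in for an arbitrary choice from A(n)

f = []                       # f_0: the empty choice function
for _ in range(10):          # build f_1, ..., f_10
    f = extend(f)
assert all(f[i] in A[i] for i in range(10))   # f is a choice function
print(f)
```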

I introduced Dependent Choice as a simpler way to prove Theorem 9. I will do much more with it, though the much more can be done with Countable Choice too.

Theorem 10: Every infinite set is at least countable. For any \(X\) that is not finite, there exists an injection \(f:\mathbb N\to X\).

Noobs, you can try cracking Dependent Choice and the Lemma above, if you want, and then you should be able to understand the following proof. Of course, you can simply work this out in a way similar to what I did in Theorem 9.
Proof.
We know we can build injections from \(n\) to \(X\) for any \(n\) by the first argument in Theorem 9. Consider the set of all such injections, which is valid by arguments similar to those used for finite choice functions in Lemma 4. Define on it the relation \(f\preceq g\) if \(\operatorname{Dom}g\supsetneq\operatorname{Dom}f\) and \(g|_{\operatorname{Dom}f}=f\). This relation is serial: an injection \(f:n\to X\) cannot be surjective, since \(X\) is infinite, so we can pick \(x\in X\smallsetminus f(n)\) and extend \(f\) to an injection defined on \(n+1\). By Dependent Choice we have a sequence of such functions with strictly growing domains, and the conclusion is obtained like in Lemma 4.

Corollary 1: Infinity and bijections. Let \(X\) be an infinite set. Then:
  1. There is a countable subset \(A\subseteq X\) such that \(X\smallsetminus A\) is infinite;
  2. For any countable \(A\subseteq X\) such that \(X\smallsetminus A\) is infinite, \(X\) is in bijection with \(X\smallsetminus A\); in particular, every infinite set is in bijection with a proper subset of itself;
  3. A set \(X\) is infinite if and only if it is in bijection with a proper subset of itself.

This proof is from Soardi's book (see Bibliography), Lemma 2.8.1 and Corollary 2.8.2.
Proof.
We know from Theorem 10 that there is an infinite countable subset \(A\subseteq X\). If \(X\smallsetminus A\) is infinite, we are done. If not, fix a bijection \(f:\mathbb N\to A\) and consider \(B:=f(2\mathbb N)\), the image of the even numbers. Then \(X\smallsetminus B\) contains \(A\smallsetminus B\), which is in bijection with the odd numbers and hence infinite, so replacing \(A\) with \(B\) proves 1.
Take \(A\) as in 1. By Theorem 10, there is \(B\subseteq X\smallsetminus A\) which is countable. Fix a bijection \(f:\mathbb N\to A\) and one \(g:\mathbb N\to B\). Define: $$h(x):=\left\{\begin{array}{cc} x & x\in X\smallsetminus(A\cup B) \\ g(\frac{n}{2}) & x=g(n)\text{ with \(n\) even} \\ f(\frac{n-1}{2}) & x=g(n)\text{ with \(n\) odd} \end{array}\right..$$ This is a bijection \(h:X\smallsetminus A\to X\), as desired for 2.
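Here is \(h\) computed in a concrete case (the sets are my illustrative choice): \(X=\mathbb N\), \(A\) the multiples of 3, \(B=\{3n+1:n\in\mathbb N\}\subseteq X\smallsetminus A\), with \(f(n)=3n\) bijecting \(\mathbb N\) with \(A\) and \(g(n)=3n+1\) bijecting \(\mathbb N\) with \(B\).

```python
# The bijection h : X∖A -> X from the proof, for X = N, A = 3N, B = 3N + 1.

def h(x):
    assert x % 3 != 0                  # the domain is X ∖ A
    if x % 3 == 2:                     # x ∈ X ∖ (A ∪ B): left fixed
        return x
    n = (x - 1) // 3                   # otherwise x = g(n)
    if n % 2 == 0:
        return 3 * (n // 2) + 1        # g(n/2)
    return 3 * ((n - 1) // 2)          # f((n-1)/2)

# Sanity check on an initial segment: h is injective and hits everything.
image = {h(x) for x in range(200) if x % 3 != 0}
assert set(range(60)) <= image         # surjective, as far as we can see
assert len(image) == 133               # as many values as inputs: injective
print(sorted(image)[:12])
```

The multiples of 3 are hit through \(g\) at odd indices, the elements of \(B\) through \(g\) at even indices, and everything else is fixed, exactly as in the displayed formula.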
If \(X\) is infinite, there is a proper subset \(A\subset X\) in bijection with \(X\) by 2. Suppose instead \(X\) is finite. Then \(X\) is in bijection with some \(n\), and any subset \(A\subseteq X\) is in bijection with some \(m\leq n\): run over the numbers from 0 to \(n-1\) and, whenever \(A\) skips one, consider the function that shifts all bigger numbers down by 1; the composition of all these functions is injective over \(A\), and its image of \(A\) has no gaps. Now, a bijection of \(X\) with a subset \(A\) would give a bijection between \(n\) and the \(m\) in bijection with \(A\), so \(m=n\) by Lemma 2, and \(A\) would be the whole of \(X\). So the subset can't be proper, and the corollary is thus proved.

It is now time to define cardinals. And also, to define finite and infinite sets, because I don't think I've done that just yet.

Definition 10: Equal cardinality and cardinals. If \(X,Y\) are sets, we say they are equinumerous or equipotent or equipollent, or that they have the same cardinality, if there exists a bijection \(f:X\to Y\). A cardinal is a class of sets with the same cardinality. We write \(|X|=|Y|\) for same cardinality. We say \(|X|\leq|Y|\) if there exists an injection \(f:X\to Y\). We say \(|X|<|Y|\) if \(|X|\leq |Y|\) and \(|X|\neq|Y|\). We say two cardinals \(\kappa,\kappa'\) satisfy \(\kappa\leq\kappa'\) if for all \(X\in\kappa,Y\in\kappa'\) we have \(|X|\leq|Y|\). The same trick extends \(<\) and \(=\) to cardinals. Naturally, this is the same as having the relation for some \(X\in\kappa\) and some \(Y\in\kappa'\), since then it holds for all such \(X,Y\) by using the bijections between elements of the same cardinal.

Naturally, we will call the cardinals of natural numbers the same as the natural numbers. So \(n\) is both the natural number and the matching cardinal. We have proved in Lemma 2 that each natural number has its own cardinal.

Definition 11: Finite cardinals, the cardinal of countability, the cardinal of continuum. The cardinals associated to natural numbers are called finite cardinals, and sets belonging to finite cardinals are called finite. The cardinal of \(\mathbb N\) is called cardinal of countability and denoted \(\aleph_0\) (aleph-zero or aleph-naught or aleph-null), and sets in \(\aleph_0\) are said to be countable (Definition 7) or to have cardinality \(\aleph_0\). The cardinal of \(2^{\mathbb N}\) is called cardinal of continuum, and sets belonging to it have the cardinality of continuum.

Corollary 2: Subsets of countable sets are finite or countable. If \(|X|=\aleph_0\) and \(Y\subseteq X\), then either \(Y\) is finite or \(|Y|=\aleph_0\).

Proof.
Pick a bijection \(f:X\to\mathbb N\), which exists by \(|X|=\aleph_0\), and set \(A:=f(Y)\). Then, \(A\subseteq\mathbb N\). By Theorem 9, either \(A\) is finite, or it is countable. \(f|_Y:Y\to A\) is a bijection, so \(|Y|=|A|\), completing the proof.

Corollary 3: \(\aleph_0\) and order. \(\aleph_0\) is the smallest infinite cardinal. In other words, if a cardinal \(\kappa\) satisfies \(\kappa\geq n\) for every finite cardinal \(n\), then \(\kappa\geq\aleph_0\).

Proof.
Given \(\kappa\geq n\) for all \(n\), it cannot be finite, as if \(\kappa=n\), we would need \(n=\kappa\geq n+1\) which is not the case. Theorem 10 then tells us any \(X\in\kappa\) has a countable subset \(A\). Pick a bijection \(f:\mathbb N\to A\). This is an injection \(\mathbb N\to X\), proving \(|X|\geq|\mathbb N|\), and therefore \(\kappa\geq\aleph_0\).

From Theorem 9 on, many of the proofs depended on some form of choice, typically via Theorem 10. So the question comes naturally: is choice provable in ZF? And the answer is, sadly, no.

Theorem 11: Choice is independent of ZF. The Zermelo-Fraenkel axioms introduced in the spoiler imply neither the Axiom of Choice nor its negation. In fact, one can build models of set theory that satisfy ZF and the full Axiom of Choice, and models that satisfy ZF but not Choice.

And of course, the proof is done by constructing said models, but that is getting a bit too complex for me, so I just leave you this reference.
Another question comes to mind now: can there be sets with cardinality strictly bigger than countable? And is \(\mathcal P(\mathbb N)\) of a different cardinality than \(\mathbb N\)? Well, yes to both. In fact, there are cardinals as big as you want.

Theorem 12: Power set and cardinality. \(\mathcal P(X)\) always has a strictly greater cardinality than \(X\), whatever set \(X\) we pick. In the case of \(X\) finite, \(|\mathcal P(X)|=2^{|X|}\).

The proof is taken again from Soardi, Theorem 2.8.6 and Corollary 2.8.7. Warning for noobs: this might be a bit above-average, but it should be at your level.
Proof
First of all, define \(f:\mathcal P(X)\to\{0,1\}^X=\{g:X\to\{0,1\}\text{ functions}\}\) by setting \(f(A)(x)\) to be \(1\) if \(x\in A\) and \(0\) otherwise. In short, we are mapping a subset to its indicator function. \(f\) is a bijection, as can easily be proved, so we can work with that function set for our proof. Suppose that, for each \(x\in X\), we are given \(f_x:X\to\{0,1\}\), i.e. a map \(X\to\{0,1\}^X\). Define \(g(x)=0\) if \(f_x(x)=1\) and \(g(x)=1\) otherwise. Then \(g\) is not \(f_x\) for any \(x\in X\), since \(g(x)\neq f_x(x)\). In other words, no map \(X\to\{0,1\}^X\) is surjective, much less a bijection. Now consider \(x\mapsto\{x\}\). This is clearly an injection \(X\to\mathcal P(X)\), so our first claim is proved.
For the second claim, pick \(f\in\{0,1\}^n\) and consider the sequence \(f(0),f(1),\dotsc,f(n-1)\), which uniquely identifies the function \(f\). How many such sequences are there? We have two choices for \(f(0)\), two choices for \(f(1)\), and so on. In the end, we conclude there are \(2^n\) such sequences, which proves the second claim.
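Both halves of the proof can be watched in action on a small finite \(X\) (an illustration of mine, not a proof for infinite sets): subsets correspond to indicator functions, there are \(2^n\) of them, and diagonalization defeats any attempted surjection \(x\mapsto f_x\).

```python
# Cantor's diagonal argument and the 2^n count, on X = {0,1,2,3}.
from itertools import product

n = 4
X = range(n)
functions = list(product((0, 1), repeat=n))   # all f : X -> {0,1}, as tuples
assert len(functions) == 2 ** n               # the count 2^|X| of the second claim

attempt = {x: functions[x] for x in X}        # some attempted surjection x -> f_x
g = tuple(1 - attempt[x][x] for x in X)       # flip the diagonal
assert all(g != attempt[x] for x in X)        # g is missed by the attempt
print("the diagonal", g, "is not f_x for any x")
```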

Corollary 4: Another continuum. The set of sequences of zeros and ones has the cardinality of continuum.

Proof.
Well, in the last proof, we proved that \(\mathcal P(X)\) is in bijection with \(\{0,1\}^X\). Apply that to \(\mathbb N\), and \(\{0,1\}^{\mathbb N}\) has the cardinality of continuum, but that is exactly the set in the statement.

At this point, one wonders: is it possible, without or with choice, to establish whether there is any cardinal between \(\aleph_0\) and continuum?

Continuum hypothesis. There is no cardinal satisfying \(\aleph_0<\kappa<|\mathcal P(\mathbb N)|\).

In fact, why stop there?

Generalized continuum hypothesis. Given a cardinal \(\kappa\), there is no cardinal \(\kappa'\) satisfying \(\kappa<\kappa'<2^\kappa\), where \(2^\kappa\) is the cardinal of the power set of any \(X\in\kappa\).

Turns out, unfortunately, that both of these statements are independent of ZF and of choice, so no, we can't settle either of them in ZF, with or without choice. The independence is proved by forcing, a method developed precisely for this purpose, which I won't get into because this post is already too long as it is.
It is now time to discuss some properties of cardinal addition, which we only defined for finite cardinals, but which can obviously be defined the same way for infinite cardinals. Actually, not quite yet: first I have some other properties to show, and then, since I want to collect the whole of cardinal arithmetic into a single big theorem, I will also define products and powers of cardinals.

Theorem 13: Properties of the cardinal ordering. The following hold, where the ones involving choice are marked with "CW" (Choice Warning):
  1. The order of cardinals has the properties of an order relation.
  2. Cardinal equality has the properties of an equivalence relation.
  3. (CW) The statement "Given any two cardinals \(\kappa,\lambda\), they are comparable, i.e. either \(\kappa\leq\lambda\) or \(\lambda\leq\kappa\)" is equivalent to the axiom of choice.
  4. (CW) For \(A\neq\varnothing\), \(|A|\leq|B|\) is equivalent to the existence of a surjective \(f:B\to A\).

Proof.
  1. Given any cardinal \(\kappa\), it is evident that \(\kappa\leq\kappa\), since this means that \(X\in\kappa\implies\exists f:X\to X\text{ injective}\) but we know we can find one that is bijective: the identity. If \(\kappa\leq\lambda\wedge\lambda\leq\kappa\), it means that, given \(X\in\kappa,Y\in\lambda\), we have injections \(f:X\to Y,g:Y\to X\), but by Theorem 8 that implies there is a bijection \(h:X\to Y\), so that \(\kappa=\lambda\). If \(\kappa\leq\lambda\wedge\lambda\leq\mu\), given \(X\in\kappa,Y\in\lambda,Z\in\mu\), we have injections \(f:X\to Y,g:Y\to Z\), but then we have seen in Theorem 1 that \(g\circ f:X\to Z\) is an injection, which means \(\kappa\leq\mu\).
  2. \(\kappa=\kappa\) is trivial: pick \(X\in\kappa\) and consider the identity. If \(\kappa=\lambda\), pick \(X\in\kappa,Y\in\lambda\), then there is a bijection \(f:X\to Y\), but we saw in Theorem 1 that \(f\) has an inverse, and that is a bijection \(f^{-1}:Y\to X\), giving \(\lambda=\kappa\). If \(\kappa=\lambda\wedge\lambda=\mu\), then for \(X\in\kappa,Y\in\lambda,Z\in\mu\) we find bijections \(f:X\to Y,g:Y\to Z\), but then \(g\circ f:X\to Z\) is a bijection by Theorem 1, and this gives \(\kappa=\mu\).
  3. Assume choice. We saw in last post that choice is equivalent to Zorn's lemma, so let's use that. Pick \(A\in\kappa,B\in\lambda\). Consider the set of all partial injections \(A\to B\), that is all injections from subsets of \(A\) into \(B\). Is this a valid set? For any subset \(C\subseteq A\), the functions \(C\to B\) are a valid set (subset of a cartesian product, which we saw is valid), and the subset of injective functions is also valid by either specification or power set. We can then unite all those, and by the axiom of union we have our set. With that, we can define the following order on that set: $$f\leq g\iff\operatorname{Dom}(f)\subseteq\operatorname{Dom}(g)\wedge g|_{\operatorname{Dom}(f)}=f.$$ So \(f\leq g\) means that if \(f(x)\) is defined, so is \(g(x)\), and then \(f(x)=g(x)\). This obviously satisfies the properties of an order, and given a totally ordered subset \(\{f_i\}_{i\in I}\) of functions defined on subsets \(\{A_i\}_{i\in I}\), it is easy to see that an upper bound for \(\{f_i\}_{i\in I}\) is the function with domain \(\bigcup_iA_i\) defined to be equal to \(f_i(x)\) whenever \(x\in A_i\). Thus, by Zorn's lemma, there is a maximal element \(f:\tilde A\to B\). If \(\tilde A=A\), \(f:A\to B\) is an injection, proving \(\kappa\leq\lambda\). Assume \(\tilde A\neq A\). Then there is \(a\in A\smallsetminus\tilde A\). If \(f:\tilde A\to B\) is not surjective, choose \(y\in B\smallsetminus f(\tilde A)\). Define \(\tilde f:\tilde A\cup\{a\}\to B\) by setting \(\tilde f(a)=y\) and \(\tilde f(x)=f(x)\) for \(x\in\tilde A\). Then \(\tilde f\geq f\), but \(\tilde f\neq f\), violating the maximality of \(f\). So \(f\) must be surjective, but then by Theorem 1 it has a right inverse \(g:B\to\tilde A\). This \(g\) has \(f\) as its left inverse, so again by Theorem 1 \(g\) must be injective, so that \(\lambda\leq\kappa\).
    Assume any two cardinals are comparable. Let \(\kappa\) be a cardinal, and consider \(\aleph(\kappa)\), the least ordinal that cannot be injected into a representative of \(\kappa\). We will see further down that this definition makes sense; just accept it is well-posed for the moment. Then either \(\kappa\leq|\aleph(\kappa)|\) or vice versa, by comparability of cardinals, but \(\kappa\geq|\aleph(\kappa)|\) would mean there is an injection \(\aleph(\kappa)\to X\) for \(X\in\kappa\), which violates the definition of \(\aleph(\kappa)\). Hence, \(\kappa\leq|\aleph(\kappa)|\), but then for \(X\in\kappa\) there is an injection \(X\to\aleph(\kappa)\), and since the latter set is an ordinal, in particular a well-ordered set, it means \(X\) can be well-ordered. So for any cardinal, all its representatives can be well-ordered, which is the well-ordering theorem, which we have seen in last post is equivalent to the axiom of choice. This proof is inspired by this Stack Exchange post.
  4. Suppose \(|A|\leq|B|\) with \(A\neq\varnothing\). This means we have an injection \(f:A\to B\), which then has a left inverse \(g:B\to A\). This is clearly a surjection, as we desired. Choice was not involved here (see the proof of the existence of the left inverse). Conversely, if we have \(f:B\to A\) surjective, then we can find a right inverse \(g:A\to B\). When we proved this, we essentially used a choice function for the family \(\{f^{-1}(\{a\}):a\in A\}\), so that is where choice comes in. If we prove \(g\) is injective, we are done. But \(g\) has \(f\) as its left inverse, so it is necessarily injective, completing our proof.

Time to define product and powers of cardinals.

Definition 12: Cardinal operations. Let \(\kappa,\lambda\) be cardinals and \(X\in\kappa,Y\in\lambda\). Then we define: \begin{align*} \kappa+\lambda:={}&|(X\times\{0\})\cup(Y\times\{1\})| \\ \kappa\lambda:={}&|X\times Y| \\ \kappa^\lambda:={}&|X^Y|, \end{align*} where \(X^Y\) is the set of functions \(Y\to X\).
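For finite representatives, Definition 12 is just counting; a quick Python sanity check (mine): tagged union for \(+\), cartesian product for \(\cdot\), function sets for powers.

```python
# Counting the three constructions of Definition 12 on small finite sets.
from itertools import product

X, Y = {'a', 'b', 'c'}, {0, 1}
tagged_union = {(x, 0) for x in X} | {(y, 1) for y in Y}  # tags force disjointness
assert len(tagged_union) == len(X) + len(Y)               # κ + λ = 5

assert len(set(product(X, Y))) == len(X) * len(Y)         # κλ = 6

# X^Y: each function Y -> X is a tuple of values over a fixed enumeration of Y.
functions = list(product(X, repeat=len(Y)))
assert len(functions) == len(X) ** len(Y)                 # κ^λ = 9
print(len(tagged_union), len(X) * len(Y), len(functions))
```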
We have a number of things to verify here.

Theorem 14: Properties of cardinal operations. Again, when choice is required, I start the items with "(CW)".
  1. Cardinal addition is well-defined, as are product and power.
  2. When \(\kappa,\lambda\) are finite, these are exactly the operations we know from basic arithmetic.
  3. In general, summing \(\kappa\) \(n\) times is the same as \(n\kappa\).
  4. When \(\lambda\) is finite, \(\kappa^\lambda=|X^\lambda|\), where \(X^\lambda\) means the cartesian product of \(\lambda\) copies of \(X\).
  5. Addition is commutative and associative.
  6. \(\kappa+\lambda\geq\kappa\) and \(\kappa+\lambda\geq\lambda\).
  7. (CW) If \(\kappa\leq\mu\), then for any \(\nu\) we have \(\kappa+\nu\leq\mu+\nu\).
  8. \(\aleph_0+\aleph_0=\aleph_0\). In fact, any finite sum of \(\aleph_0\)'s is \(\aleph_0\), and (CW) the same holds for any infinite cardinal \(\kappa\).
  9. (CW) If \(\{A_n\}_{n\in\mathbb N}\) is a family of countable sets, then \(\bigcup_nA_n\) is also countable.
  10. (CW) If \(\kappa\) or \(\lambda\) is infinite, then \(\kappa+\lambda=\max\{\kappa,\lambda\}\).
  11. (CW) If \(\sigma\) is infinite, there exists \(\kappa:\kappa+\mu=\sigma\) if and only if \(\mu\leq\sigma\). It is unique and equal to \(\sigma\) if and only if \(\mu<\sigma\).
  12. 0 is the additive identity and 1 the multiplicative one, and \(\kappa0=0\kappa=0\).
  13. Multiplication is associative and commutative, and \(\kappa\mu\geq\kappa\wedge\kappa\mu\geq\mu\) whenever \(\kappa,\mu\geq1\).
  14. \(\kappa\leq\mu\) implies \(\kappa\nu\leq\mu\nu\) for any cardinal \(\nu\).
  15. \(\kappa(\mu+\nu)=\kappa\mu+\kappa\nu\) and \((\kappa+\mu)\nu=\kappa\nu+\mu\nu\).
  16. (CW) If \(\kappa\) or \(\mu\) is infinite and both are nonzero, then \(\kappa\mu=\max\{\kappa,\mu\}\).
  17. (CW) If \(\kappa\) is infinite and \(\mu\neq0\), there exists \(\nu:\mu\nu=\kappa\) if and only if \(\mu\leq\kappa\). It is unique and equal to \(\kappa\) if and only if \(\mu<\kappa\).
  18. \(\kappa^0=1\) for any \(\kappa\).
  19. If \(\kappa\geq1\), then \(0^\kappa=0\).
  20. \(\kappa^1=\kappa\) and \(1^\kappa=1\).
  21. \(\kappa^{\mu+\nu}=\kappa^\mu\kappa^\nu\), \((\kappa^\mu)^\nu=\kappa^{\mu\nu}\), and \((\kappa\mu)^\nu=\kappa^\nu\mu^\nu\).
  22. \(\kappa^\mu\geq\kappa\) whenever \(\mu\geq1\), but it is possible that \(\kappa^\mu<\mu\).
  23. If \(1\leq\nu\wedge\kappa\leq\mu\), then \(\nu^\kappa\leq\nu^\mu\wedge\kappa^\nu\leq\mu^\nu\). The last statement doesn't require \(\nu\geq1\).
  24. Since \(2=\{0,1\}\), clearly \(2^\kappa=|\{0,1\}^X|\) for \(X\in\kappa\), and by Theorem 12's proof we know this is \(|\mathcal P(X)|\).
  25. (CW) If \(\kappa,\mu\geq2\) are both finite and \(\nu\) is infinite, then \(\kappa^\nu=\mu^\nu\).
  26. (CW) If \(\kappa\) is infinite and \(\mu\neq0\) is finite, then \(\kappa^\mu=\kappa\).
  27. (CW) If \(\kappa\geq2,\mu\geq1\) and at least one of them is infinite, then \(\max\{\kappa,2^\mu\}\leq\kappa^\mu\leq\max\{2^\kappa,2^\mu\}\).
  28. (CW) If \(\kappa\) is infinite and \(\mu>0\) is finite, then there exists \(\nu:\nu^\mu=\kappa\), and \(\nu=\kappa\).
  29. (CW) If \(\kappa\) is infinite and \(\mu\geq2\) is finite, there need not be \(\lambda:\mu^\lambda=\kappa\), but if there is one, it is infinite and \(\lambda\leq\kappa\), and any finite \(\nu\geq2\) will also satisfy \(\nu^\lambda=\kappa\). Since choice implies cardinals are well-ordered, we can pick the least such \(\lambda\) and call it \(\log_\mu\kappa\).
  30. (CW) \(\aleph_0^{\aleph_0}=2^{\aleph_0}\). In other words, the sequences of natural numbers are in bijection with the subsets of the naturals.
  31. (CW) \(\kappa+\lambda=\kappa\lambda\) if both are infinite.

Most of the above is from Wikipedia.

Let me come back to item 4 and ask: what happens when \(\lambda=0\)? At first sight the empty cartesian product makes no sense, but actually we can define it in a surprising way.

Definition 13: Product of an empty family of sets. If \(\mathcal F=\varnothing\) is the empty family of sets, we set \(\prod\mathcal F:=1=\{\varnothing\}\). Why do we do this seemingly nonsensical thing? Well, this should be the set of choice functions for \(\mathcal F\), so it must be a set of functions \(\varnothing=\mathcal F\to\bigcup\mathcal F=\varnothing\), and we know there is only one function \(\varnothing\to\varnothing\); it vacuously satisfies the condition of being a choice function for \(\mathcal F\), so \(\prod\mathcal F\) should have exactly one element, and we denote it \(\varnothing\) as it is the "empty function".

This means that item 4 of the previous theorem now holds also for \(\lambda=0\), as on one side we have \(\kappa^0\) which we know to be 1, and on the other side we have \(|X^0|\), which is the empty product which we just defined to be 1.
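Even itertools agrees with Definition 13 (a toy check of mine): the product of an empty family has exactly one element, the empty tuple, which plays the role of the empty function.

```python
# The empty product has exactly one element, matching κ^0 = 1.
from itertools import product

assert list(product()) == [()]                        # one element: the empty tuple
assert len(list(product({1, 2, 3}, repeat=0))) == 1   # κ^0 = 1, here with κ = 3
print(list(product()))
```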
Now that we are done with cardinals, it's time to tackle the last topic of this post: ordinals. And this is where I bid noobs and meds goodbye, as what follows is all master-level and will be placed in a spoiler.


And on these notes, I bid you goodbye, and suggest you stay tuned for more! Like constructing \(\mathbb R\) from \(\mathbb N\), thus ending up with all our numbers just from some axioms and the empty set. That's pretty cool, right?

Bibliography
  1. P.M. Soardi, Analisi Matematica 1 (Mathematical Analysis 1);
  2. English Wikipedia, Zermelo-Fraenkel set theory;
  3. English Wikipedia, Axiom of regularity;
  4. English Wikipedia, Epsilon-induction (more correctly \(\in\)-induction or membership-induction);
  5. English Wikipedia, Axiom of the empty set;
  6. English Wikipedia, Cardinal number;
  7. English Wikipedia, Axiom of infinity;
  8. English Wikipedia, Ordered pair;
  9. English Wikipedia, Schröder-Bernstein theorem;
  10. English Wikipedia, Axiom of countable choice;
  11. English Wikipedia, Axiom of dependent choice;
  12. English Wikipedia, Forcing;
  13. I. Moerdijk, J. Van Oosten, Sets, Models and Proofs;
  14. Math Stack Exchange, How to prove that from “Every infinite cardinal satisfies \(a^2=a\)” we can prove that \(b+c=bc\) for any two infinite cardinals \(b,c\)?;
  15. Asaf Karagila, answer to About a paper of Zermelo;
  16. Asaf Karagila, answer to "For every infinite \(S\), \(|S|=|S\times S|\) implies the Axiom of choice";
  17. ProofWiki, Ordinal Multiplication is Associative;
  18. Math Student 020, Distributivity of ordinal arithmetic;
  19. Alon Amit, answer to Is ordinal multiplication associative? Does it distribute over ordinal addition?;
  20. Andreas Blass, Answer to Epsilon Induction implies Axiom Of Foundation (Or Regularity);
  21. Michele Gorini, Is countable choice enough to prove a countable family of countable sets has a countable union?.

Comments:
  2. Hi Michele,

    Thanks for the great summary, nice to see you citing Alon Amit.


    > Principle of Induction.

    Can it be declared a "Theorem of Proof by Induction"? Is it provable by reductio ad absurdum, or is it more an axiomatic principle?

    I was never much taken by set theory myself, though seeing a recent article on "set holes" or was it "negative sets", which remove items from a host set, and use of this in some new propositions, did catch my eye.

    Even though I have huge respect for Georg Cantor, his definition of Aleph_0 and Aleph_1 as cardinal numbers doesn't seem useful to me - I far prefer defining a set of infinities based on something like analytic continuation of normal functions (like division, exponentiation) .

    Hey - have you got this great book
    The Princeton companion to mathematics / Timothy Gowers, editor ;
    June Barrow-Green, Imre Leader, associate editors.
    2008

    Set theory contributors - keep an eye on these people

    Joan Bagaria, ICREA Research Professor, University of Barcelona
    set theory

    Imre Leader, University of Cambridge (assuming Dept Mathematics)
    cardinals , countable and uncountable sets , models of set theory , the independence of the continuum hypothesis


    - Nick Nicholas aka SinuCosu

