Tuesday, July 24, 2018

Functions, induction, cardinality, ordinals: more abstract set-theoretical stuff

This may be where I actually start using all three levels distinctly. There will definitely be noob-level stuff, and a lot of master-level stuff, but perhaps I will find some med-level stuff too, who knows. The post has yet to be written. This post is also going to feature one or two spoilers for things to omit on first reading which are master-level or even above that, like, for masters among masters who are brave enough to wade through the ocean of abstraction that those things are.


OK, so in the last post we talked about relations on sets. Now consider the relation of ownership. What set is it defined on? If we want a single set to define it on, it would have to be the set of people, pets, plants, and things. But that sounds like a big mishmash. It would make way more intuitive sense to consider the set of people and the set of ownable entities (i.e. pets, plants, and things). But we do not have a way to define a relation between different sets… just yet. Of course, the definition is just an adaptation of the definition of a relation on a set: couldn't be easier.

Definition 1: Relation between sets. Given two sets \(X,Y\), a relation between \(X\) and \(Y\), or of \(X\) with \(Y\) (in that order), is just a subset of \(X\times Y\). Again, we say \(x\operatorname{R}y\) if \((x,y)\in R\).

Obviously a relation on \(X\) is now a relation of \(X\) with itself.
Now, clearly the definitions of particular classes of relations on sets we saw last time cannot be reintroduced here. But there is something we can do with this new notion of relations between sets. In the last post (by the way, sorry I'm posting this late: I got caught up with other things first, and then with research for this post), we mentioned "functions" without ever defining them. I'm sure all of you have heard the word, and have an idea of a function which probably has nothing to do with relations, and can be formulated as follows.

Definition 2: Function (naïve definition). A function between two sets \(X\) and \(Y\) is a law that to every \(x\in X\) associates one and one only \(y\in Y\), typically denoted \(f(x)\).

This is a vague term. What is a law, after all? Well, nowhere near as vague as what we gave for "sets", but getting formal there requires a whole new level of abstraction and is way over med-level, so we'll get to that later in the post; I advise you to just skip over it unless you're very well-versed in pure maths, as I consider myself to be. Anyways, it isn't hard to give a more formal definition of a function. If we have that kind of "law", we can easily associate to it a subset \(F\subseteq X\times Y\): just say \((x,y)\in F\iff y=f(x)\). We then need to translate the "one and one only" part of the above definition, and it is done as follows.

Definition 3: function (formal definition). A function between two sets \(X,Y\) is a relation \(F\subseteq X\times Y\) such that for all \(x\in X\) there exists a unique \(y=:f(x)\) such that \((x,y)\in F\).

When functions are thought of as laws, the set in Definition 3 is usually called the graph of the function. So you may say that the formal definition simply identifies a function with its graph. Which makes total sense, since the graph identifies the law and vice versa, as is obvious. Indeed, the graph is often used to paint a picture of the function, especially if you can draw it on paper, as has been done in math classes ever since middle school.
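If you like seeing this concretely, here is a quick Python sketch of the graph point of view (the helper names are mine, purely for illustration): a finite relation \(F\subseteq X\times Y\) is a function exactly when every \(x\in X\) appears in exactly one pair.

```python
# A function X -> Y as its graph: a set of pairs F ⊆ X × Y with exactly
# one pair (x, y) for each x ∈ X (Definition 3).
def is_function(F, X, Y):
    """True iff the relation F is the graph of a function X -> Y."""
    return (all(x in X and y in Y for (x, y) in F)
            and all(sum(1 for (a, _) in F if a == x) == 1 for x in X))

def apply_graph(F, x):
    """Recover f(x) from the graph F."""
    return next(y for (a, y) in F if a == x)

X, Y = {0, 1, 2}, {"a", "b"}
F = {(0, "a"), (1, "a"), (2, "b")}
print(is_function(F, X, Y))   # True: every x has one and one only image
print(apply_graph(F, 2))      # b
```

Adding a second pair for the same \(x\), say \((0,\text{"b"})\), would make `is_function` fail: that is exactly the "one and one only" clause at work.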
From now on, outside the foundational master-level spoiler, the naïve definition will suffice, so we will use that as a basis for properties of functions. I will now start following chapter 2 of Soardi very closely.

Definition 4: Various terms related to functions. Let \(f:X\to Y\). Then:
  • \(f(x)\) will be called image of \(x\) via \(f\);
  • Map will be a synonym for function;
  • Given \(A\subseteq X\), \(f(A):=\{f(x):x\in A\}=\{y\in Y:\exists x\in A:f(x)=y\}\) will be called image of \(A\) via \(f\); it goes without saying that \(f(\varnothing)=\varnothing\), but let's specify it just for the heck of it; if a function is defined on a set which contains sets as members, it might be wise to distinguish images of sets from images of elements via notation other than capitalization, so I introduce the notation \(\operatorname{Im}_f(A):=\{f(x):x\in A\}\);
  • Given \(B\subseteq Y\), we call \(f^{-1}(B):=\{x\in X:f(x)\in B\}\) the preimage of \(B\) via \(f\); again, \(f^{-1}(\varnothing)=\varnothing\); as in the item above, I introduce \(\operatorname{Im}^{-1}_f(B):=\{x:f(x)\in B\}\) as a disambiguating notation;
  • \(\operatorname{Dom}f:=X\) is called the domain or definition set of \(f\), and \(\operatorname{Cod}f:=Y\) is the codomain; the image of \(f\) is \(\operatorname{Im}f:=f(X)=\operatorname{Im}_f(\operatorname{Dom}f)\);
  • \(f\) is injective or one-to-one if \(f(x)=f(y)\implies x=y\) (the converse implication holds for any function);
  • \(f\) is surjective or onto if \(f(X)=Y\), or in other words, \(\operatorname{Cod}f=\operatorname{Im}f\);
  • \(f\) is bijective if it is both injective (one-to-one) and surjective (onto);
  • For any set \(X\), the function from \(X\) to \(X\) which maps each element to itself is called identity on \(X\) and denoted by \(\operatorname{id}_X\);
  • If \(f:X\to Y\) and \(g:Y\to Z\), the composition of \(g\) and \(f\), denoted \(g\circ f\), is the function \(X\to Z\) such that \(g\circ f(x)=g(f(x))\);
  • If \(f\) is bijective, we will see it has an inverse, that is there exists \(g:Y\to X\) such that \(f\circ g=\operatorname{id}_Y\) and \(g\circ f=\operatorname{id}_X\); such a \(g\) is denoted \(f^{-1}\), so that \(f^{-1}(y)\), which would normally be the set \(\{x\}\) (where \(f(x)=y\)), is also used to mean just the element \(x\).
  • If \(A\subseteq X\), the restriction of \(f\) to \(A\), denoted \(f|_A\), is the function \(A\to Y\) such that \(f|_A(x)=f(x)\) for all \(x\in A\);
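For finite sets, all the notions above can be checked mechanically; here is a small Python sketch (the helper names are my own, hypothetical choices):

```python
def image(f, A):
    """f(A), the image of A via f."""
    return {f(x) for x in A}

def preimage(f, X, B):
    """f^{-1}(B), the preimage of B via f, inside the domain X."""
    return {x for x in X if f(x) in B}

def is_injective(f, X):
    # f is injective iff distinct inputs give distinct outputs.
    return len({f(x) for x in X}) == len(X)

def is_surjective(f, X, Y):
    # f is surjective iff its image fills the whole codomain.
    return image(f, X) == Y

X, Y = {0, 1, 2, 3}, {0, 1}
f = lambda x: x % 2
print(sorted(image(f, {0, 1})))     # [0, 1]
print(sorted(preimage(f, X, {0})))  # [0, 2]
print(is_injective(f, X), is_surjective(f, X, Y))  # False True
```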
Now we have some simple properties to prove for functions.

Theorem 1: Elementary properties of functions, images, preimages, and compositions. Let \(f:X\to Y\) and \(g:Y\to Z\), \(A,B\subseteq X\) and \(C,D\subseteq Y\). Then:
  1. \(f^{-1}(C\cup D)=f^{-1}(C)\cup f^{-1}(D)\);
  2. \(f^{-1}(C\cap D)=f^{-1}(C)\cap f^{-1}(D)\);
  3. \(f(A\cup B)=f(A)\cup f(B)\);
  4. \(f(A\cap B)\subseteq f(A)\cap f(B)\), and if \(f\) is injective equality is guaranteed;
  5. \(A\subseteq f^{-1}(f(A))\) and equality holds if \(f\) is injective;
  6. \(B\supseteq f(f^{-1}(B))\) and equality holds if \(f\) is surjective;
  7. \(f^{-1}(B^C)=(f^{-1}(B))^C\);
  8. If \(f\) is injective, \(f(A^C)\subseteq(f(A))^C\);
  9. If \(f\) is surjective, \((f(A))^C\subseteq f(A^C)\);
  10. If \(f\) is bijective, \(f(A^C)=(f(A))^C\);
  11. The restriction of an injective \(f\) is still injective;
  12. If we replace the codomain with the image, any \(f\) becomes surjective; in short, any \(f\) is surjective on its image;
  13. If \(f,g\) are bijective, then \(g\circ f\) is also bijective; in fact, injectivity and surjectivity are separately preserved by composition, that is, if \(f,g\) are injective, the composition also is, and if \(f,g\) are surjective, so is the composition;
  14. Composition of functions is associative, that is \((h\circ g)\circ f=h\circ(g\circ f)\);
  15. \(f\) is injective if and only if it has a left inverse, i.e. there is \(g:Y\to X\) such that \(g\circ f=\operatorname{id}_X\);
  16. \(f\) is surjective if and only if it has a right inverse, i.e. there is \(g:Y\to X\) such that \(f\circ g=\operatorname{id}_Y\); note: this requires the axiom of choice;
  17. \(f:X\to Y\) is bijective if and only if it has an inverse.
Proof.
  1. Let \(x\) be on the left side. Then either \(f(x)\in C\), in which case \(x\in f^{-1}(C)\), or \(f(x)\in D\), in which case \(x\in f^{-1}(D)\). In both cases, \(x\in f^{-1}(C)\cup f^{-1}(D)\). The converse is similarly easy.
  2. If \(x\) is on the left side, \(f(x)\in C\cap D\), which implies \(f(x)\in C\) and \(f(x)\in D\), that is \(x\in f^{-1}(C)\) and \(x\in f^{-1}(D)\), putting it on the right side and proving \(\subseteq\). The reverse inclusion is similarly simple.
  3. Assume \(x\) is on the left. Then there is \(y\in A\cup B\) such that \(f(y)=x\). If \(y\in A\), \(x\in f(A)\), and if \(y\in B\), \(x\in f(B)\). In all cases, \(x\) is on the right, giving \(\subseteq\). If \(x\) is on the right, then either there is \(y\in A:f(y)=x\), or there is \(y\in B:f(y)=x\). In both cases, \(x\) is on the left, giving the desired equality.
  4. If \(x\) is on the left, there is \(y\in A\cap B:f(y)=x\), which puts \(x\) in both \(f(A)\) and \(f(B)\), giving the desired inclusion. The inclusion can be strict: if \(f\) is not injective, there may be \(x\in A\smallsetminus B,y\in B\smallsetminus A\) with \(f(x)=f(y)\), so that \(f(x)\in f(A)\cap f(B)\) while possibly \(f(x)\notin f(A\cap B)\). Injectivity excludes this: if \(z\in f(A)\cap f(B)\), then \(z=f(a)=f(b)\) with \(a\in A,b\in B\), and injectivity forces \(a=b\in A\cap B\), so \(z\in f(A\cap B)\), guaranteeing equality.
  5. If \(x\in A\), \(f(x)\in f(A)\), and any \(y:f(y)=f(x)\) will be in \(f^{-1}(f(A))\), including \(x\), thus proving the desired inclusion. If \(f\) is injective, assume \(x\in f^{-1}(f(A))\). This means \(f(x)\in f(A)\), making it so that there is \(y\in A:f(y)=f(x)\), but injectivity then forces \(x=y\), thus placing \(x\in A\) and proving the reverse inclusion.
  6. If \(x\) is on the right, we have \(y\in f^{-1}(B):f(y)=x\), but then \(f(y)\in B\), which puts \(x\) on the left. If \(x\) is on the left, as soon as there is \(y:f(y)=x\), \(y\) ends up in \(f^{-1}(B)\), so that \(x\) is on the right. This means that the reverse inclusion is guaranteed if \(f\) is surjective, or if \(B\subseteq f(X)\).
  7. If \(x\in f^{-1}(B^C)\), then \(f(x)\notin B\), so \(x\notin f^{-1}(B)\), hence \(f^{-1}(B^C)\subseteq(f^{-1}(B))^C\). If \(x\) is on the right, then \(x\notin f^{-1}(B)\), so \(f(x)\notin B\), hence \(f(x)\in B^C\), which means \(x\in f^{-1}(B^C)\) by definition of preimage, completing the proof.
  8. If \(x\) is on the left, there is \(y\notin A:f(y)=x\), but there cannot be \(z\in A:f(z)=x\) because any such \(z\) would equal \(y\) and lie outside \(A\) by injectivity, so \(x\notin f(A)\), as desired.
  9. If \(x\) is on the left, there is no \(y\in A:f(y)=x\). However, a \(y:f(y)=x\) exists by surjectivity, so such a \(y\) must be in \(A^C\), placing \(x\in f(A^C)\), as desired.
  10. Follows from the two above.
  11. If \(f|_A(x)=f|_A(y)\), then \(f(x)=f(y)\), but \(f\) is injective, so \(x=y\), proving \(f|_A\) is also injective.
  12. By definition of image, for all \(y\in\operatorname{Im}(f)\) there is \(x\in X:f(x)=y\), so \(f:X\to\operatorname{Im}(f)\) is surjective.
  13. Assume \(f,g\) injective. Suppose \(g(f(x))=g(f(y))\). By injectivity of \(g\), we have \(f(x)=f(y)\), but by injectivity of \(f\), \(x=y\), proving injectivity of \(g\circ f\). Assume \(f,g\) surjective. Fix \(z\in Z\). There is \(y\in Y:g(y)=z\), by surjectivity of \(g\). We then have \(x\in X:f(x)=y\), by surjectivity of \(f\). But then \(g(f(x))=g(y)=z\), proving surjectivity of \(g\circ f\). If \(f,g\) are bijective, they are injective and surjective, allowing the previous arguments and making \(g\circ f\) injective and surjective, and thus bijective.
  14. \((h\circ g)\circ f(x)=(h\circ g)(f(x))=h(g(f(x)))=h(g\circ f(x))=h\circ(g\circ f)(x)\).
  15. If \(f\) has a left inverse, suppose \(f(x)=f(y)\); then \(x=g(f(x))=g(f(y))=y\), proving injectivity. If \(f\) is injective (and \(X\neq\varnothing\)), define \(g(y)\) to be the unique element \(x:f(x)=y\) if there exists one, and some fixed element of \(X\) otherwise. \(g\) is a left inverse of \(f\).
  16. If \(f\) is surjective, for any \(y\in Y\) the preimage \(f^{-1}(\{y\})\) is nonempty, so we can set \(g(y)\) to be some \(x\) with \(f(x)=y\). Then \(g\) is a right inverse of \(f\). Choice is obviously required here if \(Y\) is infinite, since we need to choose an \(x\) from every \(f^{-1}(\{y\})\). If \(f\) has a right inverse, pick \(y\in Y\): then \(f(g(y))=y\), so \(f(X)=Y\), that is, \(f\) is surjective.
  17. If \(f\) is bijective, it is injective, so by item 15 it has a left inverse \(g\): \(g(f(x))=x\) for all \(x\in X\). Pick \(y\in Y\). By surjectivity, \(y=f(x)\) for some \(x\in X\), so \(f(g(y))=f(g(f(x)))=f(x)=y\), making \(g\) a right inverse as well, and thus an inverse of \(f\). Conversely, an inverse is both a left and a right inverse, so by items 15-16 \(f\) is injective and surjective, hence bijective. Oh yes, the inverse, because if \(g,g'\) are inverses of \(f\), by associativity of composition we have \(g=g\circ(f\circ g')=(g\circ f)\circ g'=g'\).
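As a finite sanity check of item 4 (a toy example of my own): with the non-injective \(f(x)=x\bmod 2\), \(A=\{0\}\) and \(B=\{2\}\), the inclusion is strict.

```python
# f(A ∩ B) = f(∅) = ∅, but f(A) ∩ f(B) = {0}: a strict inclusion,
# possible only because f(x) = x % 2 is not injective (item 4).
f = lambda x: x % 2
A, B = {0}, {2}
lhs = {f(x) for x in A & B}
rhs = {f(x) for x in A} & {f(x) for x in B}
print(lhs, rhs)                # set() {0}
print(lhs <= rhs, lhs == rhs)  # True False
```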
Definition 5: indexations and sequences. Another term used for functions is indexing or indexation. The domain is then called the set of indices, and the images \(f(i)\) will be denoted \(x_i\), with \(\{x_i\}_{i\in\operatorname{Dom}}\) denoting both the image and the function. If the domain is the natural numbers, the function is called a sequence.

This is where I stop following Soardi: unions and intersections are already defined in the last post, and I mean to save cardinality for later. So let me talk about Mathematical Induction first, then we will delve into axiomatic set theory (in the master-level spoiler), and finally we will get to cardinality and cardinal arithmetic.
Induction, in its two forms, is a great tool to prove a result one has obtained previously via manipulations or conjectures.

Principle of Induction. Suppose \(P(n)\) is a proposition that can be checked for natural numbers. Assume \(P(n_0)\) holds for some \(n_0\in\mathbb N\), and that whenever \(P(n)\) holds, \(P(n+1)\) can also be proved. Then, \(P(n)\) holds for any \(n\geq n_0\).

Principle of Complete Induction. Suppose \(P(n)\) is a proposition that can be checked for natural numbers. Assume \(P(n_0)\) holds for some \(n_0\in\mathbb N\), and that whenever \(P(k)\) holds for all \(n_0\leq k\leq n\), then \(P(n+1)\) also holds. Then \(P(n)\) holds whenever \(n\geq n_0\).

Let us see some applications of these principles. Since this is child's play for an average mathematician (and maybe not only for that kind of person), I thought I'd leave the option to just hide it and skip to more interesting stuff. Clicking the button just below will hide the applications of induction I propose in this post.

Theorem 2: sum of the first natural numbers. For any \(n\in\mathbb N\), the sum of the first \(n\) natural numbers is \(\frac{n(n+1)}{2}\). In other words, \(\sum_1^nk=\frac{n(n+1)}{2}\).

Proof.
We use the principle of induction with \(n_0=1\). For \(n=1\), the sum is just 1, and \(\frac{1(1+1)}{2}=\frac{1\cdot2}{2}=\frac22=1\). Suppose the property holds for \(n\). Then: $$\sum_{k=1}^{n+1}k=\sum_{k=1}^nk+(n+1)=\frac{n(n+1)}{2}+n+1=\frac{n^2+n+2n+2}{2}=\frac{n^2+3n+2}{2}=\frac{(n+1)(n+2)}{2},$$ which is what we want. Induction therefore gives us the theorem.
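A brute-force sanity check of the formula for small \(n\) (just Python double-checking the algebra):

```python
# Verify sum_{k=1}^n k = n(n+1)/2 for the first few n by direct summation.
for n in range(1, 100):
    assert sum(range(1, n + 1)) == n * (n + 1) // 2
print("ok")
```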

Theorem 3: Sum of the first \(i2^i\). Let \(n\in\mathbb N\). Then: $$\sum_{i=1}^ni\cdot2^i=2^{n+1}(n-1)+2.$$
Proof.
We use the principle of induction with \(n_0=1\). For \(n=1\), both sides equal 2. Suppose the formula holds for \(n\). Then: $$\sum_{i=1}^{n+1}i2^i=\sum_{i=1}^ni2^i+(n+1)2^{n+1}=2^{n+1}(n-1)+2+(n+1)2^{n+1}=2^{n+1}(n-1+n+1)+2=2^{n+2}n+2,$$ which is \(2^{(n+1)+1}((n+1)-1)+2\), exactly what we wanted. Induction then gives us the theorem.
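Same kind of brute-force check for this formula:

```python
# Verify sum_{i=1}^n i·2^i = 2^(n+1)(n-1) + 2 by direct summation.
for n in range(1, 60):
    assert sum(i * 2**i for i in range(1, n + 1)) == 2**(n + 1) * (n - 1) + 2
print("ok")
```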

If you're wondering how I got the formula for this sum, here are my manipulations.

Theorem 4: Sums of cubes. Let \(n\in\mathbb N\). Then: $$\sum_{k=1}^nk^3=\left(\frac{n(n+1)}{2}\right)^{\!\!2}.$$
I found this online via googling.
Proof.
Let me try induction with \(n_0=1\): that case is easy, just like \(n=0\). Suppose it holds for \(n\). Then: \begin{align*} \sum_{k=1}^{n+1}k^3={\!\!}&\left(\frac{n(n+1)}{2}\right)^{\!\!2}+(n+1)^3=(n+1)^2\left(\frac{n^2}{4}+n+1\right)=(n+1)^2\frac{n^2+4n+4}{4}=\frac{(n+1)^2(n+2)^2}{4}={} \\ {}={}&\left(\frac{(n+1)(n+2)}{2}\right)^{\!\!2}, \end{align*} as we wanted.
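And one more brute-force check, for the sum of cubes:

```python
# Verify sum_{k=1}^n k^3 = (n(n+1)/2)^2 by direct summation.
for n in range(1, 60):
    assert sum(k**3 for k in range(1, n + 1)) == (n * (n + 1) // 2) ** 2
print("ok")
```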

The last application of induction we see is one of complete induction.

Fundamental Theorem of Arithmetic. Let \(n\in\mathbb N\) with \(n\geq2\). Then there is a way to decompose \(n\) as a product of prime numbers.

For algebraists: 1) I am aware of the distinction between prime and irreducible, and that what I should write here is irreducible, but we'll get to that in another post; 2) Yes, the FTOA has a uniqueness part, but induction won't prove that, or at least not as easily as apt for this post.
Proof.
For \(n_0=2\), the statement is obvious. We use complete induction here. Suppose any \(2\leq k\leq n\) admits a decomposition. Look at \(n+1\). If it is prime, it is already decomposed. Otherwise, it can be written \(n+1=hk\) with \(h,k<n+1\). If either of them were less than 2, it would be 1, and the other would be \(n+1\), contradicting the inequality \(h,k<n+1\). Hence \(2\leq h,k\leq n\), so by the inductive hypothesis \(h,k\) can be decomposed. The decomposition of \(n+1\) is then the concatenation of those of \(h\) and \(k\). Complete induction then gives the theorem.
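The proof is really a recursive algorithm in disguise; here is a Python sketch of it (trial division is my own choice of how to find a splitting \(n+1=hk\), which the proof doesn't specify):

```python
def decompose(n):
    """Return a list of primes whose product is n (n >= 2): if n has a
    divisor d with 2 <= d <= sqrt(n), split n = d * (n // d) and concatenate
    the two decompositions; otherwise n is prime and already decomposed."""
    for d in range(2, int(n**0.5) + 1):
        if n % d == 0:
            return decompose(d) + decompose(n // d)
    return [n]

print(decompose(360))   # [2, 2, 2, 3, 3, 5]
```

The recursion terminates for the same reason complete induction applies: both factors are strictly smaller than the number being decomposed.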

But can we prove those two forms of induction? Let's start by equating them with a property of the natural numbers.

Theorem 5: Induction and the well-ordering of the naturals. Both the above forms of induction are equivalent to the statement that \(\mathbb N\) is well-ordered under the normal ordering (a statement I'll call WON, Well-Ordering of the Naturals). More specifically, WON implies induction which implies complete induction which implies WON.

Proof.
Assume WON. Suppose the hypotheses of induction are verified for a property \(P\), with base case \(n_0\). By contradiction, suppose induction doesn't hold. The set \(S:=\{n\in\mathbb N:P(n)\text{ is false}\wedge n\geq n_0\}\) is then nonempty, and by WON there is a minimal element \(m\in S\). Suppose \(m>n_0\). This implies \(P(m-1)\) holds, otherwise \(m-1\in S\), which contradicts the minimality of \(m\). But the assumption is that \(P(m-1)\implies P(m)\), so \(m\notin S\), contradiction. Thus, \(S=\varnothing\), which is what induction would have us conclude. Induction is thus proved.
If induction holds, apply it to the proposition \(Q(n)\) stating that \(P(k)\) holds for all \(n_0\leq k\leq n\): the assumptions of complete induction for \(P\) are exactly the assumptions of induction for \(Q\), and \(Q(n)\) for all \(n\geq n_0\) gives \(P(n)\) for all \(n\geq n_0\). Hence, complete induction holds too.
If complete induction holds, consider \(S\subseteq\mathbb N\) a nonempty subset of the naturals. Let \(P(n)\) be "\(n\notin S\)". If \(P(0)\) doesn't hold, we have our minimum: zero. If \(P(0)\) holds, let's prove the induction step. Assume \(k\notin S\) for \(k\leq n\). If \(n+1\in S\), then we have our minimum. If \(n+1\notin S\), then we just showed that, if \(P(k)\) holds for \(k\leq n\), then \(P(n+1)\) holds. By complete induction, \(P(n)\) holds for all \(n\). But then \(S\) is empty, contradiction. The contradiction stemmed from eliminating every chance we had to find a minimum. This implies a minimum must exist, and WON is thus proved.

Theorem 6: Well-ordering of the naturals. The natural numbers are well-ordered under the usual relation. In other words, given \(S\subseteq\mathbb N\) nonempty, there is an element \(x\in S\), called minimum of \(S\) and denoted \(\min S\), such that \(x\leq s\) for all \(s\in S\).

Proof.
\(S\) is nonempty, so there exists \(y\in S\). Consider \(S_y:=\{s\in S:s\leq y\}\). Suppose \(0\in S_y\). Then we have a minimum. Otherwise, if \(1\in S_y\), we have a minimum. Go on like that for at most \(y+1\) steps, and you will find a minimum. Indeed, after \(y+1\) tests, you will be testing \(y\), and you know \(y\in S\), so if \(y\) is not the minimum, you will have found another element of \(S\) before reaching \(y\). This means you will have stopped before getting to \(y\), because as soon as a number is in \(S\), you stop because you have the minimum. So for any \(S\), after a finite number of tests and a single choice, you have a minimum. WON is proved.
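The search in this proof is easily written out; a Python sketch (the function name is mine, and \(S\) is any finite nonempty set of naturals):

```python
def naturals_min(S):
    """Find min S as in the proof: grab any y in S, then test 0, 1, 2, ...;
    the search stops within y + 1 steps because y itself is in S."""
    y = next(iter(S))
    for candidate in range(y + 1):
        if candidate in S:
            return candidate

print(naturals_min({17, 4, 23}))   # 4
```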

At this point, you may be wondering: can we not formulate a "principle of induction" with any well-ordered set? Oh yes we can. In fact, we don't even need a well-order.

Definition 6: Well-founded relation. A relation \(R\subseteq X\times X\) is called well-founded if, for any nonempty \(S\subseteq X\), there is an \(R\)-minimal element of \(S\), that is an element \(s\in S\) such that there is no \(x\in S\) with \(x\operatorname{R}s\).

Theorem 7: Well-founded induction. Let \(R\) be a well-founded relation on a set \(X\). Suppose a property \(P\) can be proven for \(x_0\), and moreover that whenever \(P(y)\) holds for all \(y\neq x\) with \(x_0\operatorname{R}y\operatorname{R}x\), then \(P(x)\) holds. Then we can conclude that \(P(x)\) holds for all \(x\) with \(x_0\operatorname{R}x\).

The proof below is just a rehashing of WON implies induction.
Proof.
Suppose there is a set \(X\) with a well-founded relation \(R\) that satisfies the assumptions of well-founded induction but falsifies its conclusion. Then, \(S:=\{x:x_0\operatorname{R}x\wedge\lnot P(x)\}\) is nonempty, so it must have a minimal element \(y_0\). It cannot be \(x_0\), because the assumptions include \(P(x_0)\). Since any \(y\neq y_0\) with \(x_0\operatorname{R}y\operatorname{R}y_0\) is outside \(S\) by minimality of \(y_0\), the induction assumption forces \(P(y_0)\) to hold, which contradicts \(y_0\in S\). This implies \(S=\varnothing\), so that well-founded induction is proved.

Now there are two "relations" that we have been using all the time and haven't talked about at all: inclusion and membership (\(\subseteq\) and \(\in\)). Are those well-founded? Wait wait. What are those defined on again? Well, the set of all sets, of course. Wait. First of all, membership is not defined there… is it? There are elements that are not sets, and sets are not elements, right? Wrong. Or not. It depends. The second thing is definitely wrong: any set is a member of its power set. The first thing may be right or wrong depending on whether or not your theory of sets postulates urelements. You will probably be thinking: what about numbers? Those are bound to be urelements, right? Wrong! Feel mindblown yet? You have no idea what I'm going into now. Let Vegeta warn you about it. But don't be scared just yet: there is still one thing that should be noob-level here, and it is why "the set of all sets" can lead to problems. You see, when we defined a set in the last post, we were very vague. As far as we know, any collection of things we can either enumerate or characterize is valid as a set. So let me tell you why this is no bueno.

Russell's paradox. Let \(S\) be the set of all sets that are not members of themselves. Then \(S\in S\iff S\notin S\).

No, I'm not going crazy. No, I'm not joking or mistyping. This set evidently cannot be a valid set. Indeed, let's prove the paradox.
Proof.
Suppose \(S\in S\). Then, by definition of \(S\), \(S\) is not a member of itself, so \(S\notin S\). Vice versa, if \(S\notin S\), then \(S\) is a set that is not a member of itself, so by definition of \(S\) it must satisfy \(S\in S\).

And yeah, that last step very much illustrates the level of contradictoriness this set has by definition: "it is not a member of itself, therefore it must be a member of itself".
A real-life version of the paradox is the following. Suppose a certain barber decides to shave exactly those people in his village who don't shave themselves. Will he shave himself? And the answer is: he will if and only if he won't, just like in the mathematical version \(S\in S\iff S\notin S\).
So how do we get out of this paradox? The answer is axioms. You may have heard this word in Geometry: they are truths that are assumed as self-evident and serve as a starting point for a theory. The same way points and lines are not defined in Geometry, but are assumed as primitive and given relations they satisfy by assumption, the same is done with sets.

And this is where noobs and meds join back in. So the spoiler can be summed up as "Let there be nothing! Let there be possibilities of shenanigans with sets! Oh hello natural numbers!". What follows should be noob-level, if a little more complicated than the average noob-level stuff. Inside the spoiler, we defined the natural numbers in such a way that \(0=\varnothing\) and \(n=\{0,\dotsc,n-1\}\) for any \(n\). Keep that in mind. We are also trying to define addition by letting \(n+m\) be the \(k\) such that there is a bijection \(f:k\to(n\times1)\cup(m\times\{1\})\), where 1 means \(\{\varnothing\}\) and \(\{1\}=\{\{\varnothing\}\}\). The notation \(S(x)\) means \(x\cup\{x\}\), which is just \(x+1\) when \(x\) is a natural.
Two loose ends are left here (that is, in what I was doing just before the spoiler ended): that for all \(m,n\) there is \(k\) in bijection with \((n\times1)\cup(m\times\{1\})\), and that it is unique.
To prove uniqueness, note that, if \(k\neq h\), we can assume modulo renaming that \(h\in k\), and then via a sufficient number of applications of \(S\) we get \(k=S(S(\dots(S(h))))\), meaning \(h\subseteq k\) and \(h\in k\) but \(h\notin h\) as seen above. OK, that is an interesting property, but we don't need that. Let me just leave it there because why not :).

Lemma 2: Naturals and bijections. Let \(h,k\in\mathbb N\). Then there exists a bijection \(f:h\to k\) if and only if \(h=k\).

Proof.
If \(h=k\), pick the identity as a bijection.
Suppose, by way of contradiction, that there are \(h\neq k\) in bijection. Define \(P(k):=\exists h\neq k:h,k\text{ are in bijection}\). This is a valid property, so the set \(\{k\in\mathbb N:P(k)\}\) is valid; it is nonempty by assumption, so, by the well-ordering we proved above, it must have a minimum \(k_0\). If \(k_0\) were empty, it could not be in bijection with anything other than itself, but by assumption this is not the case, so \(k_0\neq\varnothing\), and we can write \(k_0=S(k_0')\). Let \(\tilde k\neq k_0\) be in bijection with \(k_0\) via \(f:k_0\to\tilde k\); \(\tilde k\) cannot be empty either, so \(\tilde k=S(k')\) for some \(k'\). Restrict the bijection to \(f|_{k_0'}:k_0'\to\tilde k\smallsetminus\{f(k_0')\}\), which is valid because \(\tilde k\smallsetminus\{f(k_0')\}=\{x\in\tilde k:x\neq f(k_0')\}\). Define now \(g:\tilde k\smallsetminus\{f(k_0')\}\to k'\) by setting \(g(x)=x\) for \(x\neq k'\) and, if \(k'\) is in the domain (i.e. if \(f(k_0')\neq k'\)), \(g(k'):=f(k_0')\); \(g\) is clearly a bijection. Then \(g\circ f|_{k_0'}:k_0'\to k'\) is a bijection with \(k_0'\neq k'\), since \(k_0'=k'\) would imply \(k_0=S(k_0')=S(k')=\tilde k\), which we assumed not to be true. As \(k_0'\in k_0\), i.e. \(k_0'<k_0\), this violates the minimality of \(k_0\). So we have a contradiction, which gives us that, whenever \(h,k\) are in bijection, they must be equal.

Simplified sum-up: Suppose there are two distinct "numbers" in bijection with each other. If one is 0, we clearly have a contradiction. Otherwise, we can construct a bijection between their predecessors (that is, those numbers minus 1), which means that, if there is no bijection between \(k-1\) and something different than it, \(k\) satisfies the same property. By induction, we conclude our claim holds true.

Lemma 3: Existence of \(k\) in bijection with \((n\times1)\cup(m\times\{1\})\). Suppose \(n,m\in\mathbb N\). Then there is \(k=:m+n\) such that \(k\) is in bijection with \((n\times1)\cup(m\times\{1\})\).

Proof.
Define \(f:(n\times1)\cup(m\times\{1\})\to\mathbb N\) as follows. For \((k,0)\in n\times1\) (recall \(1=\{0\}\)), set \(f((k,0)):=k\). For \((j,1)\in m\times\{1\}\), set \(f((j,1)):=S(S(S(\dots(S(j)))))\), the successor being taken \(n\) times, so that \(f((0,1))=n\). This is clearly an injective function, and its image is, by construction, the natural number \(n+m\), which is our \(k\).
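A finite Python sketch of this bijection, with the tags 0 and 1 standing for \(\varnothing\) and \(\{\varnothing\}\), and a natural \(n\) modelled as `range(n)`:

```python
# The disjoint union (n × {0}) ∪ (m × {1}) is in bijection with n + m:
# tag-0 pairs map to themselves, tag-1 pairs get shifted up by n.
def disjoint_union(n, m):
    return [(k, 0) for k in range(n)] + [(j, 1) for j in range(m)]

def f(pair, n):
    k, tag = pair
    return k if tag == 0 else n + k   # "S applied n times to k"

n, m = 3, 4
images = sorted(f(p, n) for p in disjoint_union(n, m))
print(images)   # [0, 1, 2, 3, 4, 5, 6], i.e. exactly the elements of 7 = 3 + 4
```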

Let us now define the cartesian product of a family.

Definition 7: cartesian products of families. Let \(\mathcal F\) be a family of sets. We define: $$\prod\mathcal F:=\left\{f:\mathcal F\to\bigcup\mathcal F:x\in\mathcal F\implies f(x)\in x\right\},$$ and call \(\prod\mathcal F\) the (cartesian) product of the family \(\mathcal F\).

And now the question arises: is this empty? Clearly if \(\varnothing\in\mathcal F\) it is, because \(f(\varnothing)\) would need to be in \(\varnothing\), but that is impossible. But what if \(\varnothing\notin\mathcal F\)? If you read last post, you will no doubt be thinking "isn't an element of the product thus defined simply a choice function for \(\mathcal F\)"? And you are totally right. So we finally get to introduce the axiom of choice into this post.

Axiom of Choice - revisited. Let \(\mathcal F\) be a family of sets such that \(\varnothing\notin\mathcal F\). Then \(\prod\mathcal F\neq\varnothing\).

Clearly if \(\mathcal F\) is in bijection with some \(n\), aka is finite, we can construct an element of the product by choosing a finite number of elements. The axiom of choice is required for generality. Of course, we can have a weaker form of choice for "countable" families.
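To see the finite case in action: for a finite family the product can be listed outright, no axiom needed, e.g. with Python's `itertools.product` (just an illustration; each tuple below is a choice function):

```python
from itertools import product

# Every tuple picks one element from each member of the family.
family = [{1, 2}, {3}, {4, 5}]
choices = list(product(*[sorted(s) for s in family]))
print(choices)       # [(1, 3, 4), (1, 3, 5), (2, 3, 4), (2, 3, 5)]
print(len(choices))  # 4; the product is empty iff some member is empty
```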

Definition 8: Countably infinite set. Let \(x\) be a set. We say \(x\) is countably infinite if it is in bijection with \(\mathbb N\).

It is easy to see that any set \(\mathbb N_k:=\{x\in\mathbb N:k\in x\}\) (naturals greater than \(k\)) is countably infinite: just shift up \(k+1\) times and you have your bijection. Obviously all sets of numbers with fixed remainder \(r\) when divided by \(k\) are countably infinite by \(f_{k,r}(x):=kx+r\). Can we prove that any subset of \(\mathbb N\) is countably infinite (or finite)? As far as I know, we need the following extra axiom, and the Schröder-Bernstein theorem.

Axiom of Countable Choice. Let \(\mathcal F\) be a countably infinite family of nonempty sets. Then \(\prod\mathcal F\neq\varnothing\).

In the terminology of last post, all countable families of sets admit at least a choice function.

Theorem 8: Schröder-Bernstein theorem. Let \(X,Y\) be sets such that there exist \(f:X\to Y\) and \(g:Y\to X\) both injective. Then there exists \(h:X\to Y\) bijective.

Proof.
I simply copy the Wikipedia proof attributed to Julius König.
For any \(a\in X\) or \(b\in Y\), we can produce a unique double-sided sequence: $$\dots,f^{-1}(g^{-1}(a)),g^{-1}(a),a,f(a),g(f(a)),\dots$$ Such a sequence falls into one of four cases:
  • Terminate on the left with an element of \(X\smallsetminus g(Y)\), in which case we call it an \(X\)-stopper.
  • Terminate on the left with an element of \(Y\smallsetminus f(X)\), in which case we call it a \(Y\)-stopper.
  • Repeat itself, in which case we call it cyclic.
  • Not terminate on the left and not repeat itself, in which case we call it doubly infinite.
By injectivity of \(f,g\), for any \(a\in X\) or \(b\in Y\) there is a unique such sequence to which \(a\) or \(b\) belongs. The sequences then form a partition of the disjoint union of \(X,Y\), or of \((X\times1)\cup(Y\times\{1\})\) if we want. To produce our bijection \(h\), we just have to define it on the elements of \(X\) and \(Y\) of each sequence. In an \(X\)-stopper, \(f\) defines a bijection between the sequence's elements of \(X\) and its elements of \(Y\), so if \(a\in X\) belongs to an \(X\)-stopper, we define \(h(a):=f(a)\). With analogous arguments, for \(a\in X\) belonging to a \(Y\)-stopper we can define \(h(a):=g^{-1}(a)\), and for \(a\) in a doubly infinite or cyclic sequence we can use either \(h(a):=f(a)\) or \(h(a):=g^{-1}(a)\), provided we make a single choice for each such sequence.
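König's construction can actually be run. Here is a Python sketch of my own with \(X=Y=\mathbb N\), \(f(x)=2x\) and \(g(y)=2y+1\), both injective; in this example every backward trace terminates (values keep shrinking), so every sequence is an \(X\)-stopper or a \(Y\)-stopper, and \(h\) comes out a bijection \(\mathbb N\to\mathbb N\):

```python
# X = Y = ℕ, f(x) = 2x, g(y) = 2y + 1; X \ g(Y) = evens, Y \ f(X) = odds.
# Trace the chain of a ∈ X backwards; use f on X-stoppers, g^{-1} on Y-stoppers.
def h(a):
    cur, side = a, "X"
    while True:
        if side == "X":
            if cur % 2 == 0:                 # even: not in g(Y) -> X-stopper
                return 2 * a                 # h(a) = f(a)
            cur, side = (cur - 1) // 2, "Y"  # cur = g(previous element)
        else:
            if cur % 2 == 1:                 # odd: not in f(X) -> Y-stopper
                return (a - 1) // 2          # h(a) = g^{-1}(a)
            cur, side = cur // 2, "X"        # cur = f(previous element)

print([h(a) for a in range(8)])   # [0, 2, 4, 1, 8, 10, 12, 3]
```

Note the interleaving: \(X\)-stoppers get doubled, \(Y\)-stoppers (like 3 and 7 above) get mapped back via \(g^{-1}\), and no value is hit twice, as the theorem promises.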

Theorem 9: Subsets of the naturals. Let \(A\subseteq\mathbb N\). Then \(A\) is either finite or countably infinite.

Proof.
If \(A\) is finite, we are done. If \(A\) is infinite, clearly \(\operatorname{id}_{\mathbb N}|_A:A\to\mathbb N\) is an injection, so by Schröder-Bernstein we simply need to construct an injection \(f:\mathbb N\to A\) to conclude \(A\) is countably infinite. Since \(A\) is not finite, it is not empty, so we can pick \(f(1)\in A\). Since \(A\) is not finite, we can pick \(f(2)\in A\smallsetminus\{f(1)\}\), for if this were empty, \(A=\{f(1)\}\) would be finite, which it isn't by assumption. Rinse and repeat, and for any \(n\) you have \(f(n)\). Here is your injection.
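For a decidable \(A\), the injection can be made explicit as "the \(n\)-th element of \(A\) in increasing order"; a Python sketch with \(A\) = the primes (my choice of example; it assumes \(A\) is infinite, matching the theorem's second case, otherwise the scan below wouldn't terminate):

```python
def nth_element(pred, n):
    """Return the n-th natural (counting from 1) satisfying pred,
    scanning 0, 1, 2, ... upward; assumes infinitely many witnesses."""
    count, x = 0, 0
    while True:
        if pred(x):
            count += 1
            if count == n:
                return x
        x += 1

is_prime = lambda p: p > 1 and all(p % d for d in range(2, p))
print([nth_element(is_prime, n) for n in range(1, 6)])   # [2, 3, 5, 7, 11]
```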


There is a more straightforward way of proving the above, but it requires a stronger form of choice. And noobs can skip ahead of this, which is going to be med-level.

Definition 9: Entire binary relation. A binary relation \(R\subseteq X\times X\) on a set \(X\) is called entire (or serial or total) if for all \(x\in X\) there is at least one element \(y\in X:x\operatorname{R}y\). In other words, if every element is related to at least one element.

Clearly, a reflexive relation is serial: just take \(y=x\). The converse isn't true: \(<\) is serial on \(\mathbb N\) (take \(y:=S(x)=x+1\)), but it is not reflexive. However, a serial relation that is symmetric and transitive must be reflexive: by seriality pick \(y\) with \(x\operatorname{R}y\), then \(y\operatorname{R}x\) by symmetry, and finally \(x\operatorname{R}x\) by transitivity.
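That last claim can also be brute-force checked on a small set (a toy verification of mine, not a proof): every serial, symmetric, transitive relation on a 3-element set is reflexive.

```python
# Check: on X = {0,1,2}, serial + symmetric + transitive implies reflexive.
from itertools import chain, combinations

X = range(3)
pairs = [(x, y) for x in X for y in X]

def powerset(s):
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def serial(R):
    return all(any((x, y) in R for y in X) for x in X)

def symmetric(R):
    return all((y, x) in R for (x, y) in R)

def transitive(R):
    return all((x, w) in R for (x, y) in R for (z, w) in R if y == z)

def reflexive(R):
    return all((x, x) in R for x in X)

for R in map(set, powerset(pairs)):   # all 2^9 = 512 relations on X
    if serial(R) and symmetric(R) and transitive(R):
        assert reflexive(R), R
print("checked all 512 relations on a 3-element set")
```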

Axiom of Dependent Choice. Given a set \(X\) with a relation \(\preceq\), if \(\preceq\) is serial, there exists a sequence \(\{x_i\}_{i\in\mathbb N}\) such that \(x_i\preceq x_{i+1}\) for all \(i\in\mathbb N\).

Lemma 4: The various forms of choice. The full Axiom of Choice implies the Axiom of Dependent Choice which in turn implies the Axiom of Countable Choice.

Proof.
Assume Choice. Let \(X\) be a set and \(R\) an entire relation on \(X\). For all \(x\in X\), we know \(R_x:=\{y\in X:x\operatorname{R}y\}\) is nonempty. Therefore, by Choice, there is a choice function for \(\{R_x:x\in X\}\). We now have \(f:X\to X\) such that \(x\operatorname{R}f(x)\) for all \(x\): just map \(x\) to \(R_x\) first, and then use the choice function. With this, pick any \(x\in X\) and consider \(x,f(x),f(f(x)),\dots\). This is our sequence. Dependent Choice is thus proved.
Assume Dependent Choice, and suppose we have that \(A(n)\) is a nonempty set for all \(n\in\mathbb N\). Consider the set of all finite choice functions \(f:n\to A(0)\cup\dotsb\cup A(n-1)\) with \(f(i)\in A(i)\) for all \(i<n\). This is a valid set: \(\phi(x,y):=(x=n\wedge y=A(n))\) is a valid formula whose image of \(\mathbb N\) is \(\{A(n):n\in\mathbb N\}\), which makes \(\bigcup\{A(n):n\in\mathbb N\}\) a valid set by union and specification; hence the functions \(n\to\bigcup\{A(n):n\in\mathbb N\}\) are a valid set, so the choice functions defined on \(n\) are a valid set by specification, and by union and specification again the set of all finite choice functions is valid. Define the relation \(f\preceq g\) if \(f\) is defined on \(m\), \(g\) is defined on \(n>m\) and \(g|_m=f\). This relation is serial because, given \(f\) defined on \(n\), the set \(A(n)\) is nonempty, so we can extend \(f\) to a \(g\) defined on \(n+1\) with \(g|_n=f\). Hence, by Dependent Choice, we have a sequence \(f_0\preceq f_1\preceq\dotso\) whose domains strictly increase, so \(f_{i+1}\) is defined at least on \(i+1\). Now define \(f(i):=f_{i+1}(i)\). This is a choice function for \(\{A(n)\}\). Countable Choice is thus proved.
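The chain of finite choice functions can be watched in action (a toy sketch of mine, not part of the formal argument): each "serial" step extends a partial choice function by one more set, and \(f(i)\) is read off \(f_{i+1}\).

```python
# A chain f_0 ≺ f_1 ≺ ... of finite choice functions, extended one set at a time.

A = {n: {n, n + 1, n * n} for n in range(10)}   # a toy family of nonempty sets

def extend(f):
    """One step of the serial relation: extend f from {0,..,n-1} to {0,..,n}."""
    n = len(f)
    return f + [min(A[n])]   # min stands in for an arbitrary choice from A(n)

f = []                       # f_0: the empty choice function
for _ in range(10):          # build f_1, ..., f_10
    f = extend(f)
assert all(f[i] in A[i] for i in range(10))   # f is a choice function
print(f)
```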

I introduced Dependent Choice as a simpler way to prove Theorem 9. I will do much more with it, though the much more can be done with Countable Choice too.

Theorem 10: Every infinite set is at least countable. For any \(X\) that is not finite, there exists an injection \(f:\mathbb N\to X\).

Noobs, you can try cracking Dependent Choice and the Lemma above, if you want, and then you should be able to understand the following proof. Of course, you can simply work this out in a way similar to what I did in Theorem 9.
Proof.
We know we can build injections from \(n\) to \(X\) for any \(n\) by the first argument in Theorem 9. Consider the set of all such injections, which is valid by arguments similar to those used for finite choice functions in Lemma 4. Define on it the relation \(f\preceq g\) if \(\operatorname{Dom}g\supsetneq\operatorname{Dom}f\) and \(g|_{\operatorname{Dom}f}=f\). This relation is serial: an injection \(f:n\to X\) cannot be surjective, since \(X\) is infinite, so we can pick \(x\in X\smallsetminus f(n)\) and extend \(f\) to an injection defined on \(n+1\). By Dependent Choice we have a sequence of such functions with strictly growing domains, and the conclusion is obtained like in Lemma 4.

Corollary 1: Infinity and bijections. Let \(X\) be an infinite set. Then:
  1. There is a countable subset \(A\subseteq X\) such that \(X\smallsetminus A\) is infinite;
  2. For any countable \(A\subseteq X\) such that \(X\smallsetminus A\) is infinite, \(X\) is in bijection with \(X\smallsetminus A\); in particular, every infinite set is in bijection with a proper subset of itself;
  3. A set \(X\) is infinite if and only if it is in bijection with a proper subset of itself.

This proof is from Soardi's book (see Bibliography), Lemma 2.8.1 and Corollary 2.8.2.
Proof.
We know from Theorem 10 that there is an infinite countable subset \(A\subseteq X\). If \(X\smallsetminus A\) is infinite, we are done. If not, fix a bijection \(f:\mathbb N\to A\) and consider \(B:=f(2\mathbb N)\), the image of the even numbers. Then \(X\smallsetminus B\) contains \(A\smallsetminus B\), which is in bijection with the odd numbers and hence infinite, so replacing \(A\) with \(B\) proves 1.
Take \(A\) as in 1. By Theorem 10, there is \(B\subseteq X\smallsetminus A\) which is countable. Fix a bijection \(f:\mathbb N\to A\) and one \(g:\mathbb N\to B\). Define: $$h(x):=\left\{\begin{array}{cc} x & x\in X\smallsetminus(A\cup B) \\ g(\frac{n}{2}) & x=g(n)\text{ with \(n\) even} \\ f(\frac{n-1}{2}) & x=g(n)\text{ with \(n\) odd} \end{array}\right..$$ This is a bijection \(h:X\smallsetminus A\to X\), as desired for 2.
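Here is \(h\) computed in a concrete case (the sets are my illustrative choice): \(X=\mathbb N\), \(A\) the multiples of 3, \(B=\{3n+1:n\in\mathbb N\}\subseteq X\smallsetminus A\), with \(f(n)=3n\) bijecting \(\mathbb N\) with \(A\) and \(g(n)=3n+1\) bijecting \(\mathbb N\) with \(B\).

```python
# The bijection h : X∖A -> X from the proof, for X = N, A = 3N, B = 3N + 1.

def h(x):
    assert x % 3 != 0                  # the domain is X ∖ A
    if x % 3 == 2:                     # x ∈ X ∖ (A ∪ B): left fixed
        return x
    n = (x - 1) // 3                   # otherwise x = g(n)
    if n % 2 == 0:
        return 3 * (n // 2) + 1        # g(n/2)
    return 3 * ((n - 1) // 2)          # f((n-1)/2)

# Sanity check on an initial segment: h is injective and hits everything.
image = {h(x) for x in range(200) if x % 3 != 0}
assert set(range(60)) <= image         # surjective, as far as we can see
assert len(image) == 133               # as many values as inputs: injective
print(sorted(image)[:12])
```

The multiples of 3 are hit through \(g\) at odd indices, the elements of \(B\) through \(g\) at even indices, and everything else is fixed, exactly as in the displayed formula.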
If \(X\) is infinite, there is a proper subset \(A\subset X\) in bijection with \(X\) by 2. Suppose instead \(X\) is finite. Then \(X\) is in bijection with some \(n\), and any subset \(A\subseteq X\) is in bijection with some \(m\leq n\): run over the numbers from 0 to \(n-1\) and, whenever \(A\) skips one, consider the function that shifts all bigger numbers down by 1; the composition of all these functions is injective over \(A\), and its image of \(A\) has no gaps. Now, a bijection of \(X\) with a subset \(A\) would give a bijection between \(n\) and the \(m\) in bijection with \(A\), so \(m=n\) by Lemma 2, and \(A\) would be the whole of \(X\). So the subset can't be proper, and the corollary is thus proved.

It is now time to define cardinals. And also, to define finite and infinite sets, because I don't think I've done that just yet.

Definition 10: Equal cardinality and cardinals. If \(X,Y\) are sets, we say they are equinumerous or equipotent or equipollent, or that they have the same cardinality, if there exists a bijection \(f:X\to Y\). A cardinal is a class of sets with the same cardinality. We write \(|X|=|Y|\) for same cardinality. We say \(|X|\leq|Y|\) if there exists an injection \(f:X\to Y\). We say \(|X|<|Y|\) if \(|X|\leq |Y|\) and \(|X|\neq|Y|\). We say two cardinals \(\kappa,\kappa'\) satisfy \(\kappa\leq\kappa'\) if for all \(X\in\kappa,Y\in\kappa'\) we have \(|X|\leq|Y|\). The same trick extends \(<\) and \(=\) to cardinals. Naturally, this is the same as having the relation for some \(X\in\kappa\) and some \(Y\in\kappa'\), since then it holds for all such \(X,Y\) by using the bijections between elements of the same cardinal.

Naturally, we will call the cardinals of natural numbers the same as the natural numbers. So \(n\) is both the natural number and the matching cardinal. We have proved in Lemma 2 that each natural number has its own cardinal.

Definition 11: Finite cardinals, the cardinal of countability, the cardinal of continuum. The cardinals associated to natural numbers are called finite cardinals, and sets belonging to finite cardinals are called finite. The cardinal of \(\mathbb N\) is called cardinal of countability and denoted \(\aleph_0\) (aleph-zero or aleph-naught or aleph-null), and sets in \(\aleph_0\) are said to be countable (Definition 7) or to have cardinality \(\aleph_0\). The cardinal of \(2^{\mathbb N}\) is called cardinal of continuum, and sets belonging to it have the cardinality of continuum.

Corollary 2: Subsets of countable sets are finite or countable. If \(|X|=\aleph_0\) and \(Y\subseteq X\), then either \(Y\) is finite or \(|Y|=\aleph_0\).

Proof.
Pick a bijection \(f:X\to\mathbb N\), which exists by \(|X|=\aleph_0\), and set \(A:=f(Y)\). Then, \(A\subseteq\mathbb N\). By Theorem 9, either \(A\) is finite, or it is countable. \(f|_Y:Y\to A\) is a bijection, so \(|Y|=|A|\), completing the proof.

Corollary 3: \(\aleph_0\) and order. \(\aleph_0\) is the smallest infinite cardinal. In other words, if a cardinal \(\kappa\) satisfies \(\kappa\geq n\) for every finite cardinal \(n\), then \(\kappa\geq\aleph_0\).

Proof.
Given \(\kappa\geq n\) for all \(n\), it cannot be finite, as if \(\kappa=n\), we would need \(n=\kappa\geq n+1\) which is not the case. Theorem 10 then tells us any \(X\in\kappa\) has a countable subset \(A\). Pick a bijection \(f:\mathbb N\to A\). This is an injection \(\mathbb N\to X\), proving \(|X|\geq|\mathbb N|\), and therefore \(\kappa\geq\aleph_0\).

From Theorem 9 on, many of the proofs depended on some form of choice, typically via Theorem 10. So the question comes naturally: is choice provable in ZF? And the answer is, sadly, no.

Theorem 11: Choice is independent of ZF. The Zermelo-Fraenkel axioms introduced in the spoiler imply neither the Axiom of Choice nor its negation. In fact, one can build models of set theory that satisfy ZF and the full Axiom of Choice, and models that satisfy ZF but not Choice.

And of course, the proof is done by constructing said models, but that is getting a bit too complex for me, so I just leave you this reference.
Another question comes to mind now: can there be sets with cardinality strictly bigger than countable? And is \(\mathcal P(\mathbb N)\) of a different cardinality than \(\mathbb N\)? Well, yes to both. In fact, there are cardinals as big as you want.

Theorem 12: Power set and cardinality. \(\mathcal P(X)\) always has a strictly greater cardinality than \(X\), whatever set \(X\) we pick. In the case of \(X\) finite, \(|\mathcal P(X)|=2^{|X|}\).

The proof is taken again from Soardi, Theorem 2.8.6 and Corollary 2.8.7. Warning for noobs: this might be a bit above-average, but it should be at your level.
Proof
First of all, define \(f:\mathcal P(X)\to\{0,1\}^X=\{g:X\to\{0,1\}\text{ functions}\}\) by setting \(f(A)(x)\) to be \(1\) if \(x\in A\) and \(0\) otherwise. In short, we are mapping a subset to its indicator function. \(f\) is a bijection, as can easily be proved, so we can work with that function set for our proof. Suppose that, for each \(x\in X\), we are given \(f_x:X\to\{0,1\}\), i.e. a map \(X\to\{0,1\}^X\). Define \(g(x)=0\) if \(f_x(x)=1\) and \(g(x)=1\) otherwise. Then \(g\) is not \(f_x\) for any \(x\in X\), since \(g(x)\neq f_x(x)\). In other words, no map \(X\to\{0,1\}^X\) is surjective, much less a bijection. Now consider \(x\mapsto\{x\}\). This is clearly an injection \(X\to\mathcal P(X)\), so our first claim is proved.
For the second claim, pick \(f\in\{0,1\}^n\) and consider the sequence \(f(0),f(1),\dotsc,f(n-1)\), which uniquely identifies the function \(f\). How many such sequences are there? We have two choices for \(f(0)\), two choices for \(f(1)\), and so on. In the end, we conclude there are \(2^n\) such sequences, which proves the second claim.
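Both halves of the proof can be watched in action on a small finite \(X\) (an illustration of mine, not a proof for infinite sets): subsets correspond to indicator functions, there are \(2^n\) of them, and diagonalization defeats any attempted surjection \(x\mapsto f_x\).

```python
# Cantor's diagonal argument and the 2^n count, on X = {0,1,2,3}.
from itertools import product

n = 4
X = range(n)
functions = list(product((0, 1), repeat=n))   # all f : X -> {0,1}, as tuples
assert len(functions) == 2 ** n               # the count 2^|X| of the second claim

attempt = {x: functions[x] for x in X}        # some attempted surjection x -> f_x
g = tuple(1 - attempt[x][x] for x in X)       # flip the diagonal
assert all(g != attempt[x] for x in X)        # g is missed by the attempt
print("the diagonal", g, "is not f_x for any x")
```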

Corollary 4: Another continuum. The set of sequences of zeros and ones has the cardinality of continuum.

Proof.
Well, in the last proof, we proved that \(\mathcal P(X)\) is in bijection with \(\{0,1\}^X\). Apply that to \(\mathbb N\), and \(\{0,1\}^{\mathbb N}\) has the cardinality of continuum, but that is exactly the set in the statement.

At this point, one wonders: is it possible, without or with choice, to establish whether there is any cardinal between \(\aleph_0\) and continuum?

Continuum hypothesis. There is no cardinal satisfying \(\aleph_0<\kappa<|\mathcal P(\mathbb N)|\).

In fact, why stop there?

Generalized continuum hypothesis. Given a cardinal \(\kappa\), there is no cardinal \(\kappa'\) satisfying \(\kappa<\kappa'<2^\kappa\), where \(2^\kappa\) is the cardinal of the power set of any \(X\in\kappa\).

Turns out, unfortunately, that both of these statements are independent of ZF and of choice, so no, we can't settle either of them in ZF, with or without choice. The independence is proved by forcing, a method developed precisely for this purpose, which I won't get into because this post is already too long as it is.
It is now time to discuss some properties of cardinal addition, which we only defined for finite cardinals, but which can obviously be defined the same way for infinite cardinals. Actually, not quite yet: first I have some other properties to show, and then, since I want to collect the whole of cardinal arithmetic into a single big theorem, I will also define products and powers of cardinals.

Theorem 13: Properties of the cardinal ordering. The following hold, where the ones involving choice are marked with "CW" (Choice Warning):
  1. The order of cardinals has the properties of an order relation.
  2. Cardinal equality has the properties of an equivalence relation.
  3. (CW) The statement "Given any two cardinals \(\kappa,\lambda\), they are comparable, i.e. either \(\kappa\leq\lambda\) or \(\lambda\leq\kappa\)" is equivalent to the axiom of choice.
  4. (CW) For \(A\neq\varnothing\), \(|A|\leq|B|\) is equivalent to the existence of a surjective \(f:B\to A\).

Proof.
  1. Given any cardinal \(\kappa\), it is evident that \(\kappa\leq\kappa\), since this means that \(X\in\kappa\implies\exists f:X\to X\text{ injective}\) but we know we can find one that is bijective: the identity. If \(\kappa\leq\lambda\wedge\lambda\leq\kappa\), it means that, given \(X\in\kappa,Y\in\lambda\), we have injections \(f:X\to Y,g:Y\to X\), but by Theorem 8 that implies there is a bijection \(h:X\to Y\), so that \(\kappa=\lambda\). If \(\kappa\leq\lambda\wedge\lambda\leq\mu\), given \(X\in\kappa,Y\in\lambda,Z\in\mu\), we have injections \(f:X\to Y,g:Y\to Z\), but then we have seen in Theorem 1 that \(g\circ f:X\to Z\) is an injection, which means \(\kappa\leq\mu\).
  2. \(\kappa=\kappa\) is trivial: pick \(X\in\kappa\) and consider the identity. If \(\kappa=\lambda\), pick \(X\in\kappa,Y\in\lambda\), then there is a bijection \(f:X\to Y\), but we saw in Theorem 1 that \(f\) has an inverse, and that is a bijection \(f^{-1}:Y\to X\), giving \(\lambda=\kappa\). If \(\kappa=\lambda\wedge\lambda=\mu\), then for \(X\in\kappa,Y\in\lambda,Z\in\mu\) we find bijections \(f:X\to Y,g:Y\to Z\), but then \(g\circ f:X\to Z\) is a bijection by Theorem 1, and this gives \(\kappa=\mu\).
  3. Assume choice. We saw in last post that choice is equivalent to Zorn's lemma, so let's use that. Pick \(A\in\kappa,B\in\lambda\). Consider the set of all partial injections \(A\to B\), that is all injections from subsets of \(A\) into \(B\). Is this a valid set? For any subset \(C\subseteq A\), the functions \(C\to B\) are a valid set (subset of a cartesian product, which we saw is valid), and the subset of injective functions is also valid by either specification or power set. We can then unite all those, and by the axiom of union we have our set. With that, we can define the following order on that set: $$f\leq g\iff\operatorname{Dom}(f)\subseteq\operatorname{Dom}(g)\wedge g|_{\operatorname{Dom}(f)}=f.$$ So \(f\leq g\) means that if \(f(x)\) is defined, so is \(g(x)\), and then \(f(x)=g(x)\). This obviously satisfies the properties of an order, and given a totally ordered subset \(\{f_i\}_{i\in I}\) of functions defined on subsets \(\{A_i\}_{i\in I}\), it is easy to see that an upper bound for \(\{f_i\}_{i\in I}\) is the function with domain \(\bigcup_iA_i\) defined to be equal to \(f_i(x)\) whenever \(x\in A_i\). Thus, by Zorn's lemma, there is a maximal element \(f:\tilde A\to B\). If \(\tilde A=A\), \(f:A\to B\) is an injection, proving \(\kappa\leq\lambda\). Assume \(\tilde A\neq A\). Then there is \(a\in A\smallsetminus\tilde A\). If \(f:\tilde A\to B\) is not surjective, choose \(y\in B\smallsetminus f(\tilde A)\). Define \(\tilde f:\tilde A\cup\{a\}\to B\) by setting \(\tilde f(a)=y\) and \(\tilde f(x)=f(x)\) for \(x\in\tilde A\). Then \(\tilde f\geq f\), but \(\tilde f\neq f\), violating the maximality of \(f\). So \(f\) must be surjective, but then by Theorem 1 it has a right inverse \(g:B\to\tilde A\). This \(g\) has \(f\) as its left inverse, so again by Theorem 1 \(g\) must be injective, so that \(\lambda\leq\kappa\).
    Assume any two cardinals are comparable. Let \(\kappa\) be a cardinal, and consider \(\aleph(\kappa)\), the least ordinal that cannot be injected into a representative of \(\kappa\). We will see further down that this definition makes sense; just accept it is well-posed for the moment. Then either \(\kappa\leq|\aleph(\kappa)|\) or vice versa, by comparability of cardinals, but \(\kappa\geq|\aleph(\kappa)|\) would mean there is an injection \(\aleph(\kappa)\to X\) for \(X\in\kappa\), which violates the definition of \(\aleph(\kappa)\). Hence, \(\kappa\leq|\aleph(\kappa)|\), but then for \(X\in\kappa\) there is an injection \(X\to\aleph(\kappa)\), and since the latter set is an ordinal, in particular a well-ordered set, it means \(X\) can be well-ordered. So for any cardinal, all its representatives can be well-ordered, which is the well-ordering theorem, which we have seen in last post is equivalent to the axiom of choice. This proof is inspired by this Stack Exchange post.
  4. Suppose \(|A|\leq|B|\) with \(A\neq\varnothing\). This means we have an injection \(f:A\to B\), which then has a left inverse \(g:B\to A\). This is clearly a surjection, as we desired. Choice was not involved here (see the proof of the existence of the left inverse). Conversely, if we have \(f:B\to A\) surjective, then we can find a right inverse \(g:A\to B\). When we proved this, we essentially used a choice function for the family \(\{f^{-1}(\{a\}):a\in A\}\), so that is where choice comes in. If we prove \(g\) is injective, we are done. But \(g\) has \(f\) as its left inverse, so it is necessarily injective, completing our proof.

Time to define product and powers of cardinals.

Definition 12: Cardinal operations. Let \(\kappa,\lambda\) be cardinals and \(X\in\kappa,Y\in\lambda\). Then we define: \begin{align*} \kappa+\lambda:={}&|(X\times\{0\})\cup(Y\times\{1\})| \\ \kappa\lambda:={}&|X\times Y| \\ \kappa^\lambda:={}&|X^Y|, \end{align*} where \(X^Y\) is the set of functions \(Y\to X\).
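For finite representatives, Definition 12 is just counting; a quick Python sanity check (mine): tagged union for \(+\), cartesian product for \(\cdot\), function sets for powers.

```python
# Counting the three constructions of Definition 12 on small finite sets.
from itertools import product

X, Y = {'a', 'b', 'c'}, {0, 1}
tagged_union = {(x, 0) for x in X} | {(y, 1) for y in Y}  # tags force disjointness
assert len(tagged_union) == len(X) + len(Y)               # κ + λ = 5

assert len(set(product(X, Y))) == len(X) * len(Y)         # κλ = 6

# X^Y: each function Y -> X is a tuple of values over a fixed enumeration of Y.
functions = list(product(X, repeat=len(Y)))
assert len(functions) == len(X) ** len(Y)                 # κ^λ = 9
print(len(tagged_union), len(X) * len(Y), len(functions))
```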
We have a number of things to verify here.

Theorem 14: Properties of cardinal operations. Again, when choice is required, I start the items with "(CW)".
  1. Cardinal addition is well-defined, as are product and power.
  2. When \(\kappa,\lambda\) are finite, these are exactly the operations we know from basic arithmetic.
  3. In general, summing \(\kappa\) \(n\) times is the same as \(n\kappa\).
  4. When \(\lambda\) is finite, \(\kappa^\lambda=|X^\lambda|\), where \(X^\lambda\) means the cartesian product of \(\lambda\) copies of \(X\).
  5. Addition is commutative and associative.
  6. \(\kappa+\lambda\geq\kappa\) and \(\kappa+\lambda\geq\lambda\).
  7. (CW) If \(\kappa\leq\mu\), then for any \(\nu\) we have \(\kappa+\nu\leq\mu+\nu\).
  8. \(\aleph_0+\aleph_0=\aleph_0\). In fact, any finite sum of \(\aleph_0\)'s is \(\aleph_0\), and (CW) the same holds for any infinite cardinal \(\kappa\).
  9. (CW) If \(\{A_n\}_{n\in\mathbb N}\) is a family of countable sets, then \(\bigcup_nA_n\) is also countable.
  10. (CW) If \(\kappa\) or \(\lambda\) is infinite, then \(\kappa+\lambda=\max\{\kappa,\lambda\}\).
  11. (CW) If \(\sigma\) is infinite, there exists \(\kappa:\kappa+\mu=\sigma\) if and only if \(\mu\leq\sigma\). It is unique and equal to \(\sigma\) if and only if \(\mu<\sigma\).
  12. 0 is the additive identity and 1 the multiplicative one, and \(\kappa0=0\kappa=0\).
  13. Multiplication is associative and commutative, and \(\kappa\mu\geq\kappa\wedge\kappa\mu\geq\mu\) whenever \(\kappa,\mu\geq1\).
  14. \(\kappa\leq\mu\) implies \(\kappa\nu\leq\mu\nu\) for any cardinal \(\nu\).
  15. \(\kappa(\mu+\nu)=\kappa\mu+\kappa\nu\) and \((\kappa+\mu)\nu=\kappa\nu+\mu\nu\).
  16. (CW) If \(\kappa\) or \(\mu\) is infinite and both are nonzero, then \(\kappa\mu=\max\{\kappa,\mu\}\).
  17. (CW) If \(\kappa\) is infinite and \(\mu\neq0\), there exists \(\nu:\mu\nu=\kappa\) if and only if \(\mu\leq\kappa\). It is unique and equal to \(\kappa\) if and only if \(\mu<\kappa\).
  18. \(\kappa^0=1\) for any \(\kappa\).
  19. If \(\kappa\geq1\), then \(0^\kappa=0\).
  20. \(\kappa^1=\kappa\) and \(1^\kappa=1\).
  21. \(\kappa^{\mu+\nu}=\kappa^\mu\kappa^\nu\), \((\kappa^\mu)^\nu=\kappa^{\mu\nu}\), and \((\kappa\mu)^\nu=\kappa^\nu\mu^\nu\).
  22. \(\kappa^\mu\geq\kappa\) whenever \(\mu\geq1\), but it is possible that \(\kappa^\mu<\mu\).
  23. If \(1\leq\nu\wedge\kappa\leq\mu\), then \(\nu^\kappa\leq\nu^\mu\wedge\kappa^\nu\leq\mu^\nu\). The last statement doesn't require \(\nu\geq1\).
  24. Since \(2=\{0,1\}\), clearly \(2^\kappa=|\{0,1\}^X|\) for \(X\in\kappa\), and by Theorem 12's proof we know this is \(|\mathcal P(X)|\).
  25. (CW) If \(\kappa,\mu\geq2\) are both finite and \(\nu\) is infinite, then \(\kappa^\nu=\mu^\nu\).
  26. (CW) If \(\kappa\) is infinite and \(\mu\neq0\) is finite, then \(\kappa^\mu=\kappa\).
  27. (CW) If \(\kappa\geq2,\mu\geq1\) and at least one of them is infinite, then \(\max\{\kappa,2^\mu\}\leq\kappa^\mu\leq\max\{2^\kappa,2^\mu\}\).
  28. (CW) If \(\kappa\) is infinite and \(\mu>0\) is finite, then there exists \(\nu:\nu^\mu=\kappa\), and \(\nu=\kappa\).
  29. (CW) If \(\kappa\) is infinite and \(\mu\geq2\) is finite, there need not be \(\lambda:\mu^\lambda=\kappa\), but if there is one, it is infinite and \(\lambda\leq\kappa\), and any finite \(\nu\geq2\) will also satisfy \(\nu^\lambda=\kappa\). Since choice implies cardinals are well-ordered, we can pick the least such \(\lambda\) and call it \(\log_\mu\kappa\).
  30. (CW) \(\aleph_0^{\aleph_0}=2^{\aleph_0}\). In other words, the sequences of natural numbers are in bijection with the subsets of the naturals.
  31. (CW) \(\kappa+\lambda=\kappa\lambda\) if both are infinite.

Most of the above is from Wikipedia.

Let me come back to item 4 and ask: what happens when \(\lambda=0\)? At first sight the empty cartesian product makes no sense, but actually we can define it in a surprising way.

Definition 13: Product of an empty family of sets. If \(\mathcal F=\varnothing\) is the empty family of sets, we set \(\prod\mathcal F:=1=\{\varnothing\}\). Why do we do this seemingly nonsensical thing? Well, this should be the set of choice functions for \(\mathcal F\), so it must be a set of functions \(\varnothing=\mathcal F\to\bigcup\mathcal F=\varnothing\), and we know there is only one function \(\varnothing\to\varnothing\); it vacuously satisfies the condition of being a choice function for \(\mathcal F\), so \(\prod\mathcal F\) should have exactly one element, and we denote it \(\varnothing\) as it is the "empty function".

This means that item 4 of the previous theorem now holds also for \(\lambda=0\), as on one side we have \(\kappa^0\) which we know to be 1, and on the other side we have \(|X^0|\), which is the empty product which we just defined to be 1.
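Even itertools agrees with Definition 13 (a toy check of mine): the product of an empty family has exactly one element, the empty tuple, which plays the role of the empty function.

```python
# The empty product has exactly one element, matching κ^0 = 1.
from itertools import product

assert list(product()) == [()]                        # one element: the empty tuple
assert len(list(product({1, 2, 3}, repeat=0))) == 1   # κ^0 = 1, here with κ = 3
print(list(product()))
```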
Now that we are done with cardinals, it's time to tackle the last topic of this post: ordinals. And this is where I bid noobs and meds goodbye, as what follows is all master-level and will be placed in a spoiler.


And on these notes, I bid you goodbye, and suggest you stay tuned for more! Like constructing \(\mathbb R\) from \(\mathbb N\), thus ending up with all our numbers just from some axioms and the empty set. That's pretty cool, right?

Bibliography
  1. P.M. Soardi, Analisi Matematica 1 (Mathematical Analysis 1);
  2. English Wikipedia, Zermelo-Fraenkel set theory;
  3. English Wikipedia, Axiom of regularity;
  4. English Wikipedia, Epsilon-induction (more correctly \(\in\)-induction or membership-induction);
  5. English Wikipedia, Axiom of the empty set;
  6. English Wikipedia, Cardinal number;
  7. English Wikipedia, Axiom of infinity;
  8. English Wikipedia, Ordered pair;
  9. English Wikipedia, Schröder-Bernstein theorem;
  10. English Wikipedia, Axiom of countable choice;
  11. English Wikipedia, Axiom of dependent choice;
  12. English Wikipedia, Forcing;
  13. I. Moerdijk, J. Van Oosten, Sets, Models and Proofs;
  14. Math Stack Exchange, How to prove that from “Every infinite cardinal satisfies \(a^2=a\)” we can prove that \(b+c=bc\) for any two infinite cardinals \(b,c\)?;
  15. Asaf Karagila, answer to About a paper of Zermelo;
  16. Asaf Karagila, answer to "For every infinite \(S\), \(|S|=|S\times S|\) implies the Axiom of choice";
  17. ProofWiki, Ordinal Multiplication is Associative;
  18. Math Student 020, Distributivity of ordinal arithmetic;
  19. Alon Amit, answer to Is ordinal multiplication associative? Does it distribute over ordinal addition?;
  20. Andreas Blass, Answer to Epsilon Induction implies Axiom Of Foundation (Or Regularity);
  21. Michele Gorini, Is countable choice enough to prove a countable family of countable sets has a countable union?.

Comments:
  2. Hi Michele,

    Thanks for the great summary, nice to see you citing Alon Amit.


    > Principle of Induction.

    Can it be declared a "Theorem of Proof by Induction"? Is it provable by reductio ad absurdum, or is it more an axiomatic principle?

    I was never much taken by set theory myself, though seeing a recent article on "set holes" or was it "negative sets", which remove items from a host set, and use of this in some new propositions, did catch my eye.

    Even though I have huge respect for Georg Cantor, his definition of Aleph_0 and Aleph_1 as cardinal numbers doesn't seem useful to me - I far prefer defining a set of infinities based on something like analytic continuation of normal functions (like division, exponentiation) .

    Hey - have you got this great book
    The Princeton companion to mathematics / Timothy Gowers, editor ;
    June Barrow-Green, Imre Leader, associate editors.
    2008

    Set theory contributors - keep an eye on these people

    Joan Bagaria, ICREA Research Professor, University of Barcelona
    set theory

    Imre Leader, University of Cambridge (assuming Dept Mathematics)
    cardinals , countable and uncountable sets , models of set theory , the independence of the continuum hypothesis


    - Nick Nicholas aka SinuCosu

