Thursday, May 24, 2018

Sets and relations: a leap of abstraction

This post should be almost all noob-level, except for the final part. To avoid making almost everything this big, I will use normal size for the noob level and small size for the advanced ("mathephysicist") level.
\(\newcommand{\optriangle}{\operatorname{\triangle}}\)

Let us start with a clock. Why, you ask. Well, a clock has an interesting property: every 12 hours, it gives you the same hour number. However, imagine you fix 0 at some point and start counting hours. You will never get the same number again. So the clock is identifying some numbers. Precisely, it is identifying all numbers which differ by a multiple of 12. But that is only part of the story. We have considered hours, but the same kind of identification happens with minutes and seconds, only this time it's every 60 minutes/seconds that the number repeats. Weekdays are another example of such identification. If today is day 1, in 7 days it will be the same weekday, but it will not be day 1 again: it will be day 8, obviously. So weekdays are identifying all numbers that differ by a multiple of 7. And month names identify numbers which differ by a multiple of 12.
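If you like seeing things in code, here is a minimal Python sketch (my own illustration, not part of the mathematics) of these identifications via remainders:

```python
# Two hour counts show the same clock face exactly when they differ
# by a multiple of 12, i.e. when they leave the same remainder mod 12.
def same_clock_hour(a, b):
    return (a - b) % 12 == 0

# Same idea for weekdays, with 7 in place of 12.
def same_weekday(a, b):
    return (a - b) % 7 == 0

print(same_clock_hour(3, 15))  # True: hour 15 shows as 3 on the clock
print(same_weekday(1, 8))      # True: day 8 is the same weekday as day 1
print(same_clock_hour(3, 14))  # False
```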
Here is a radically different example of the same kind of identification. Remember the last post, where we had vectors with different application points and we reduced ourselves to a unique application point? There is another way to eliminate the problem of application points, which is also an identification of this kind. We model our set of vectors abstractly as the set of quadruplets \((P,r,\pm1,x)\in\mathbb R^2\times\{\text{lines in the plane}\}\times\{\pm 1\}\times\mathbb R^+\): the first element of the quadruplet is a point of the plane, the second one a line, the third one either plus or minus one, where \(+1\) denotes that the vector points up (i.e. has a positive \(y\)-component) or, if \(r\) is the \(x\) axis, points right, and \(x\) is a positive real number; in other words: application point, then direction, then way, then modulus. We then identify the quadruplets which have all elements equal except for the application point. This amounts to restricting the application point to the origin.
But how are all those similar? To get to that, we need to do some basic set theory. And while we're at it, we will do more than we need, 'cause why not :).

Definition 1: Set and set operations A set is a collection of objects. If \(A,B\) are two sets, we define: \begin{align*} A\cup B={}&\{x:x\in A\vee x\in B\}=\{\text{elements that belong to either }A\text{ or }B\text{ (or even both)}\}; \\ A\cap B={}&\{x:x\in A\wedge x\in B\}=\{\text{elements that belong to both }A\text{ and }B\}; \\ A\smallsetminus B={}&\{x:x\in A\wedge x\notin B\}=\{\text{elements of }A\text{ that do not belong to }B\}; \\ A\optriangle B={}&(A\smallsetminus B)\cup(B\smallsetminus A). \end{align*} The last operation, \(A\optriangle B\), is called the symmetric difference of \(A\) and \(B\). If \(x\) belongs to \(A\), we write \(x\in A\). If every \(x\in A\) is also an element of \(B\), we say \(A\) is included in \(B\), or \(A\) is a subset of \(B\), and write \(A\subseteq B\). \(A\supseteq B\) means \(B\subseteq A\). \(\subseteq,\supseteq\) allow \(A=B\). To exclude that, we write \(A\subset B\) and \(A\supset B\), or even \(A\subsetneq B\) and \(A\supsetneq B\). We say \(A=B\) if \(A\subseteq B\) and \(B\subseteq A\). If \(A\subseteq B\), we define: $$A^C=B\smallsetminus A=\{x\in B:x\notin A\},$$ the set of the elements of \(B\) that do not belong to \(A\), and call it the complement of \(A\) in \(B\).
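Incidentally, Python's built-in sets implement exactly these operations, which gives a quick way to play with them (a small sketch of mine, nothing beyond the standard library):

```python
A = {1, 2, 3, 4}
B = {3, 4, 5}

print(A | B)    # union: {1, 2, 3, 4, 5}
print(A & B)    # intersection: {3, 4}
print(A - B)    # set difference: {1, 2}
print(A ^ B)    # symmetric difference: {1, 2, 5}
print(A <= B)   # inclusion test, A subset of B: False
print(A ^ B == (A - B) | (B - A))  # True, matching the definition above
```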

I suggest you use Euler-Venn diagrams (which I am sure you will be familiar with from way back in primary and/or early secondary school) to paint a picture of what these operations are. Doing so on this post is not very convenient, so I leave this task to you. Time to prove some properties of these operations.

Theorem 1: De Morgan's laws and basic properties of set operations For any three sets \(A,B,C\), we have the following:
  1. \((A\cup B)\cup C=A\cup(B\cup C)\) (associativity of the union);
  2. \(A\cup B=B\cup A\) (commutativity of the union);
  3. \((A\cap B)\cap C=A\cap(B\cap C)\) (associativity of the intersection);
  4. \(A\cap B=B\cap A\) (commutativity of the intersection);
  5. \(A\cup(B\cap C)=(A\cup B)\cap(A\cup C)\) (distributivity of union over intersection);
  6. \(A\cap(B\cup C)=(A\cap B)\cup(A\cap C)\) (distributivity of intersection over union);
  7. \(A\smallsetminus(B\cup C)=(A\smallsetminus B)\cap(A\smallsetminus C)\) (behaviour of set difference w.r.t. union);
  8. \(A\smallsetminus(B\cap C)=(A\smallsetminus B)\cup(A\smallsetminus C)\) (behaviour of set difference w.r.t. intersection);
  9. If \(A,B\subseteq C\), then \(A\smallsetminus B=A\cap(C\smallsetminus B)=A\cap B^C\);
  10. \((A\cup B)\smallsetminus C=(A\smallsetminus C)\cup(B\smallsetminus C)\) (behavior of set difference w.r.t. union 2.0);
  11. \((A\cap B)\smallsetminus C=(A\smallsetminus C)\cap(B\smallsetminus C)\) (behavior of set difference w.r.t. intersection 2.0).
Intuitive Proof. Euler-Venn diagrams.
Formal Proof.
  1. Say \(x\in(A\cup B)\cup C\). Then either \(x\in C\), or \(x\in A\cup B\). Suppose the former. Then \(x\in C\subseteq(B\cup C)\subseteq A\cup(B\cup C)\), and we're done. Suppose \(x\in A\cup B\). Then it's either in \(A\) or in \(B\). If the former, it's clearly in the right-side set. If the latter, then it is in \(B\cup C\subseteq A\cup(B\cup C)\). We have thus proved \((A\cup B)\cup C\subseteq A\cup(B\cup C)\). An analogous argument proves the other inclusion, thus yielding 1.
  2. This is obvious.
  3. This can be proved in a manner very similar to that by which 1. is proved.
  4. This is obvious.
  5. Suppose \(x\in A\cup(B\cap C)\). Then either \(x\in A\), or \(x\in B\cap C\). If the former, then \(x\in A\cup B\) and \(x\in A\cup C\), so it belongs to their intersection, which is the right-side set. If the latter, then again \(x\in A\cup B\) because \(x\in B\), but also \(x\in A\cup C\) because \(x\in C\), yielding \(A\cup(B\cap C)\subseteq(A\cup B)\cap(A\cup C)\). The other inclusion is proved similarly.
  6. The proof of this is very similar to that of 5.
  7. If \(x\) is in the left-side set, then it is in \(A\) but not in \(B\cup C\), which means it is in neither \(B\) nor \(C\); but then it is in \(A\) and not in \(B\), hence in \(A\smallsetminus B\), and similarly it is in \(A\smallsetminus C\), thus proving the inclusion \(\subseteq\). Suppose \(x\) is in the right-side set. Then it is in both \(A\smallsetminus B\) and \(A\smallsetminus C\), so it's in \(A\) but neither in \(B\) nor in \(C\), hence it's in the left-side set.
  8. This can be proved by arguments virtually identical to those of 7.
  9. This is evident: on the left, we have "in \(A\) but not in \(B\)", on the right, we have "in \(A\) and in (\(C\) but not \(B\))", but the requirement of being in \(C\) is implicit in that of being in \(A\), hence what the intersection really says is "not in \(B\)", giving 9.
  10. Suppose \(x\) is on the left. Then it is either in \(A\) or in \(B\), but not in \(C\). If the former, it is in \(A\) but not \(C\), hence in \(A\smallsetminus C\), hence on the right. If the latter, it is in \(B\smallsetminus C\) by the same argument. If \(x\) is on the right, then it is in one of those two sets, if it's in the first, then it's in \(A\) but not in \(C\), so it's in \(A\cup B\) but not in \(C\), hence it's on the left, and if it's in the second set, we conclude similarly.
  11. This is proved exactly like the above 10.
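The diagrams and the arguments above are the proof, of course; still, here is a quick randomized sanity check of a few of these laws in Python, just for fun:

```python
import random

def random_set():
    # A random subset of {0, ..., 9} (possibly empty).
    return {random.randrange(10) for _ in range(random.randrange(10))}

for _ in range(1000):
    A, B, C = random_set(), random_set(), random_set()
    assert A | (B & C) == (A | B) & (A | C)   # law 5
    assert A - (B | C) == (A - B) & (A - C)   # law 7
    assert A - (B & C) == (A - B) | (A - C)   # law 8
print("all checks passed")
```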
While we're at it, let me digress a little and talk about the power set.

Definition 2: Power Set Let \(A\) be a set. We denote by \(\mathcal{P}(A)\), or \(2^A\), the set of subsets of \(A\), and call it the power set of \(A\).

The immediate question is: why "power set"? And why \(2^A\)? Take \(S\in 2^A\), that is \(S\subseteq A\). Associate to it the indicator of \(S\), that is the function \(f_S:A\to\{0,1\}\) such that \(f_S(x)=1\) if \(x\in S\) and \(f_S(x)=0\) otherwise. This produces a map \(b:2^A\to\{\text{functions from }A\text{ to }\{0,1\}\}=\{0,1\}^A\). It is clear that this is invertible, i.e. for every \(f\in\{0,1\}^A\) there is a unique \(S_f\) such that \(f_{S_f}=f\), and it is the set \(\{x\in A:f(x)=1\}\). This means that the two sets \(2^A\) and \(\{0,1\}^A\) have the same number of elements, or more technically the same cardinality (defined via bijections in a future post). If \(A\) is finite, the latter set has \(2^n\) elements, where \(n\) is the number of elements of \(A\), because you have to choose between 0 and 1 \(n\) times, once for each element of \(A\), to define a function \(A\to\{0,1\}\). So if we denote the number of elements of a set by \(|A|\), we have proved \(|2^A|=2^{|A|}\). This justifies the name and the notation.
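The bijection with indicator functions also gives a concrete way to enumerate a power set: run over all functions \(A\to\{0,1\}\) and collect, for each, the elements sent to 1. Here is a Python sketch of this idea (the function name is mine):

```python
from itertools import product

def power_set(A):
    """The power set of A, built by running over all indicator
    functions A -> {0, 1}, as in the bijection described above."""
    elems = list(A)
    return [{x for x, bit in zip(elems, bits) if bit}
            for bits in product([0, 1], repeat=len(elems))]

P = power_set({1, 2, 3})
print(len(P))   # 8 = 2**3, i.e. |2^A| = 2**|A|
print(P)        # from the empty set up to {1, 2, 3} itself
```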
Now we finally get to our object of interest. To define it, we first need to define cartesian products.

Definition 3: Cartesian products Given two sets \(A,B\), the cartesian product of \(A\) and \(B\), denoted \(A\times B\), is the set of pairs \((a,b)\) with \(a\in A\) and \(b\in B\). Another way to see this is the set of functions \(f:\{1,2\}\to A\cup B\) such that \(f(1)\in A,f(2)\in B\). This is more abstract, but it generalizes more readily to infinite cartesian products: just take \(\mathbb N\), or \(\mathbb R\), or even "greater" sets, as the base space, let's call it \(I\), suppose you have a family of sets \(A_i\) for \(i\) in your base space, and your cartesian product is defined as: $$\prod_{i\in I}A_i:=\left\{f:I\to\bigcup_iA_i:f(i)\in A_i\text{ for all }i\in I\right\}.$$
Of course, for finitely many sets \(A_1,\dotsc,A_n\), it holds that \(|A_1\times\dotso\times A_n|=|A_1|\cdot\dotso\cdot|A_n|\), as can be verified by combinatorial arguments such as the one employed above for \(\{0,1\}^A\).
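For finite sets this too is easy to check in code; `itertools.product` in Python computes exactly the finite cartesian product (a tiny check, standard library only):

```python
from itertools import product

A = {1, 2}
B = {'a', 'b', 'c'}
AxB = list(product(A, B))            # all pairs (a, b) with a in A, b in B
print(len(AxB) == len(A) * len(B))   # True: |A x B| = |A| * |B|
```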
Remember how we started with examples of "identifications"? Suppose we have a set \(A\) and an "identification". We can view it as a law that, given a pair \((a,b)\in A\times A\), tells us whether \(a\) is identified with \(b\) or not, for example by returning 1 if \(a\) is identified with \(b\), and 0 otherwise. But we have just seen that such a function can be identified with a subset of \(A\times A\). So we give the following definition.

Definition 4: Relations and their properties Let \(A\) be a set. A relation on \(A\) is a subset \(R\subseteq A\times A\), which we can view as a law telling us whether two elements are related or not. We write \(a\operatorname{R}b\) for \((a,b)\in R\). A relation \(R\) is said to be:
  • Reflexive if \(a\operatorname{R}a\) for all \(a\in A\);
  • Symmetric if, for all \(a,b\in A\), \(a\operatorname{R}b\) implies \(b\operatorname{R}a\);
  • Antisymmetric if \(a\operatorname{R}b\) and \(b\operatorname{R}a\) together hold only for \(a=b\);
  • Transitive if, for any \(a,b,c\in A\), \(a\operatorname{R}b\) and \(b\operatorname{R}c\) imply \(a\operatorname{R}c\).

An equivalence relation is a relation that is reflexive, symmetric, and transitive. A (partial) order relation, or simply a (partial) order, is a relation that is reflexive, antisymmetric, and transitive.
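Since a relation on a finite set is just a set of pairs, these properties can be tested mechanically. Here is a small Python sketch of such checkers (the names are mine, chosen for readability), which we will reuse on the examples below:

```python
def is_reflexive(R, A):
    return all((a, a) in R for a in A)

def is_symmetric(R):
    return all((b, a) in R for (a, b) in R)

def is_antisymmetric(R):
    return all(a == b for (a, b) in R if (b, a) in R)

def is_transitive(R):
    return all((a, d) in R for (a, b) in R for (c, d) in R if b == c)

def is_equivalence(R, A):
    return is_reflexive(R, A) and is_symmetric(R) and is_transitive(R)

def is_partial_order(R, A):
    return is_reflexive(R, A) and is_antisymmetric(R) and is_transitive(R)
```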

All the identifications from the beginning are, of course, equivalence relations, as you can easily prove. Now let's have fun with case examples!
  • The relation "person A lives on a different floor than person B", defined on the set of people who live in, say, a skyscraper, is not reflexive, because a person obviously lives on the same floor as themselves, it is not transitive, because if A and C live on the same floor but B doesn't, then A is related to B and B is related to C, but A is not related to C, but it is symmetric, of course; credits to Quora user Roddy MacPhee for this one;
  • The relation "is a sibling of" is not reflexive, because A is A and not a sibling of A, is symmetric for obvious reasons, and is transitive because I mean only full siblings, not step-siblings; credits to Quora user Kenneth Ganning for this example;
  • The relation "A is B's father" is not reflexive (unless you travel back in time to fudge your mother of course), is not symmetric (unless your son travels back in time to fudge your grandmother of course - and Fry from Futurama knows something about that), and is not transitive;
  • The relation "\(a<b\)" on the set of real numbers is not reflexive because it's strictly less than and not less than or equal to, it's not symmetric because if \(a<b\) \(b>a\) which excludes \(b<a\), and is transitive because \(a<b,b<c\implies a<c\);
  • The relation "would win or tie with at Rock, Paper, Scissors", defined on the set \(\{R,P,S\}\), which is simply \(\{(P,R),(P,P),(S,P),(S,S),(R,S),(R,R)\}\), is reflexive because we made it so by the "or tie" part, is not symmetric (in fact, it is antisymmetric), and is not transitive because e.g. Rock wins over Scissors, Scissors win over Paper, but Rock doesn't win over Paper; credits to Quora user Kenneth Ganning for this one;
  • Any equivalence relation is symmetric, reflexive, and transitive; e.g. the clock relation, or any of those at the beginning;
  • The relation \(a\leq b\) is reflexive and transitive, and it is antisymmetric; as we shall see shortly, a relation that is both symmetric and antisymmetric must be a subset of the identity relation, and since \(\leq\) is not a subset of the identity, it cannot be symmetric;
  • The relation "A is related to B if either A is married to B or A and B have the same surname" is reflexive because any A has the same surname as themselves, is symmetric because sharing a surname and being married are symmetric things, but is not transitive because I don't share my mother's surname nor am married to her, but I share my father's surname and he is married to her; obviously I meant the "née" surname, the pre-matrimonial surname.

Remark 1: symmetry and antisymmetry Suppose a relation \(R\) is both symmetric and antisymmetric. Then it is a subset of the identity relation, i.e. \(x\operatorname{R}y\) implies \(x=y\). The converse also holds.

Indeed, suppose \(a\operatorname{R}b\). Then by symmetry we have \(b\operatorname{R}a\), but this by antisymmetry implies \(a=b\). The converse is obvious.

Lemma 1: characterization of equivalence relations Let \(A\) be a set and \(R\) a relation on \(A\). Then \(R\) is an equivalence relation if and only if there exist a set \(B\), a function \(f:A\to B\), and an equivalence relation \(S\) on \(B\) such that \(a\operatorname{R}b\iff f(a)\operatorname{S}f(b)\).

Proof. If \(R\) is already an equivalence relation, you just choose \(B=A\) and \(f\) the identity \(f(x)=x\), and of course \(S=R\). Suppose we have \(B\), \(f\), and \(S\). Then \(f(a)\operatorname{S}f(a)\) for all \(a\) since \(S\) is reflexive, but then \(a\operatorname{R}a\), meaning \(R\) is reflexive. The same kind of argument proves symmetry and transitivity, giving us that \(R\) is an equivalence relation.

In fact, we can always choose \(B,f\) so that \(S\) is the identity relation. To prove this, we need to define the concept of quotient w.r.t. a relation. First of all, consider: $$[x]_R=\{y\in A:y\operatorname{R}x\}.$$

Definition 5: equivalence class Given a set \(A\) with a relation \(R\), the set \([x]_R\) defined above is called the equivalence class of \(x\) w.r.t. \(R\).

Lemma 2: equivalence classes form a partition of the set The equivalence classes of an equivalence relation \(R\) over a set \(A\) form a partition of \(A\), i.e. they cover all of \(A\) and are pairwise disjoint. In other words, every element of \(A\) is in one class and only one class.

Proof. That they cover all \(A\) is obvious: by reflexivity, every \(x\) is in its own class \([x]_R\). Suppose now that \([x]_R\cap[y]_R\neq\varnothing\), i.e. that two classes intersect. We want to prove they coincide. Pick \(z\in[x]_R\cap[y]_R\). We then have \(z\operatorname{R}x,z\operatorname{R}y\). By symmetry, \(x\operatorname{R}z\), but then by transitivity \(x\operatorname{R}y\), i.e. \(x\in[y]_R\). By transitivity, this in fact implies any \(t\in[x]_R\) is in \([y]_R\), so that \([x]_R\subseteq[y]_R\). The same argument in reverse gives us the other inclusion, completing the proof.

We can therefore give the following definition.

Definition 6: quotient set, canonical projection, set of representatives Let \(A\) be a set and \(R\) an equivalence relation over \(A\). The set: $${}^A/_R:=\{[x]_R:x\in A\}$$ of the equivalence classes of \(R\) is called quotient of \(A\) w.r.t. \(R\). It is then natural to define a map \(\pi:A\to{}^A/_R\) by setting \(\pi(x):=[x]_R\). This map \(\pi\) is called the (canonical) projection on the quotient. A common way to represent a quotient is to pick \(S\subseteq A\) so that \(\pi|_S\) is bijective, or in other words, for all \(x\in A\), there is one and only one \(z_x\in S\) such that \(z_x\operatorname{R}x\). Such an \(S\) is called a set of representatives for \(R\), and once one is picked, the classes will usually be denoted by their representatives.

So yeah, sets of sets galore :). We said we could choose \(B,f\) so that \(S\) would be the identity, and here we are: \(B\) is the quotient, \(f\) the projection, and \(S\) is obviously the identity. To give some examples of sets of representatives, in the case of the clock relation at the beginning, we have the numbers 1-12, or 0-11 if you prefer. A 24-hour clock is another such relation, for which the set of representatives are the numbers 0-23. Weekdays and month names gave a name to each class. Quotient vector spaces are also quotients: the relation is \(v\operatorname{R}w\iff v-w\in W\) where \(W\) is the "denominator" of the quotient. As we have seen, this is a case where a structure on the bigger set (the vector space \(V\)) induces a similar structure on the quotient. We will see many such cases in future posts.
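For finite sets, the quotient construction can be carried out mechanically. Here is a minimal Python sketch of mine, computing the quotient of the 24-hour clock by the 12-hour identification:

```python
def quotient(A, related):
    """The quotient A/R as a list of equivalence classes, given a
    predicate related(a, b) implementing an equivalence relation on A."""
    classes = []
    for x in A:
        for cls in classes:
            # Testing one member is enough: classes are pairwise disjoint
            # and the relation is symmetric and transitive.
            if related(x, next(iter(cls))):
                cls.add(x)
                break
        else:
            classes.append({x})
    return classes

clock = quotient(range(24), lambda a, b: (a - b) % 12 == 0)
print(clock)                          # {0, 12}, {1, 13}, ..., {11, 23}
print(sorted(min(c) for c in clock))  # a set of representatives: 0..11
```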
With that, I think I am done with equivalence relations, and can get to orders and related statements. And here starteth the real trouble: if you thought the above was hard, think again, for it is child's play in comparison to what we are going to dive into now. But without further ado, let's give a couple of definitions.

Definition 7: posets and chains A set \(A\) is called a partially ordered set (short form: poset) if equipped with a relation \(R\) which is a partial order as defined above. A total order is a partial order which satisfies the additional condition that, for any two elements \(a,b\in A\), either \(a\operatorname{R}b\) or \(b\operatorname{R}a\). A totally ordered set is a poset whose order is a total order. We will denote order relations by \(\preceq\), so that \(a\preceq b\) means \(a\operatorname{R}b\).

Let me give you some examples of posets.
  • The set \(\mathbb R\) is a totally ordered set with respect to the ordinary inequality relation \(\leq\), i.e. \(a\preceq b\iff a\leq b\);
  • The set \(\mathbb N\) of positive integers is a partially ordered set with respect to \(n\preceq m\iff n\mid m\), where \(n\mid m\) means \(\frac{m}{n}\in\mathbb N\), that is \(n\) divides \(m\); this is not a total order, since \(2\nmid 3\) and \(3\nmid 2\) (we will verify this in code after this list);
  • The set of people in the world with \(a\preceq b\) if \(a\) is younger than or the same age as \(b\) is totally ordered, provided no two people are exactly the same age; otherwise antisymmetry fails, since two distinct people of the same age would each be related to the other;
  • The plane \(\mathbb R^2\) with \((x,y)\preceq(x',y')\) if \(x\leq x'\) and \(y\leq y'\) is a poset, but the order is not total, for \((0,1),(1,0)\) are not comparable;
  • Given a set \(A\), the power set \(2^A\) is a poset with respect to inclusion, i.e. \(S\preceq T\iff S\subseteq T\); it is also a poset with respect to the reverse inclusion, i.e. \(S\preceq T\iff S\supseteq T\); neither of these are total orders, unless your set has at most one element.
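And here is the promised check that divisibility on \(\{1,\dotsc,12\}\) is a partial order but not a total one, reusing the checkers from earlier:

```python
A = set(range(1, 13))
divides = {(n, m) for n in A for m in A if m % n == 0}

print(is_partial_order(divides, A))   # True
print(all((a, b) in divides or (b, a) in divides
          for a in A for b in A))     # False: e.g. 2 and 3 are incomparable
```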
Definition 8: chains in a poset, upper bounds, least upper bounds, and maximal elements Given a poset \(A\) with relation \(\preceq\), a chain in \(A\) is a subset \(S\subseteq A\) such that \(\preceq\) is a total order on \(S\). Given any subset \(S\), an upper bound of \(S\) is an element \(x\in A\) such that \(s\preceq x\) for all \(s\in S\). If \(x\) is an upper bound of \(S\) and all other upper bounds \(y\) of \(S\) satisfy \(x\preceq y\), then \(x\) is a least upper bound for \(S\). If \(x\in S\) satisfies \(x\not\prec y\) for all \(y\in S\), we say it is a maximal element of \(S\). We may or may not use the notation \(S\preceq x\) to mean \(x\) is an upper bound of \(S\).
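Upper bounds and maximal elements are also easy to compute on finite examples; here is a quick sketch on the divisibility poset above (the helper names are mine):

```python
def upper_bounds(S, A, leq):
    """Elements of A that are above every element of S."""
    return {x for x in A if all(leq(s, x) for s in S)}

def maximal_elements(S, leq):
    """Elements of S with nothing strictly above them inside S."""
    return {x for x in S if not any(leq(x, y) and x != y for y in S)}

leq = lambda n, m: m % n == 0   # divisibility on 1..12
A = set(range(1, 13))
print(upper_bounds({2, 3}, A, leq))   # {6, 12}: common multiples up to 12
print(maximal_elements(A, leq))       # {7, 8, 9, 10, 11, 12}
```

Note that \(\{2,3\}\) has least upper bound 6 here, and that this poset has six maximal elements but no maximum: maximality is weaker than being an upper bound.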

If an upper bound \(x\) of \(S\) lies in \(S\), it is a maximal element and a least upper bound, and it is in fact the unique maximal element. Indeed, suppose \(y\in S\) is a maximal element. Since \(x\) is an upper bound and \(y\in S\), we have \(y\preceq x\); but \(y\) is maximal and \(x\in S\), so \(y\prec x\) is impossible, forcing \(y=x\). Moreover, any other upper bound \(z\) of \(S\) satisfies \(x\preceq z\), precisely because \(x\in S\), so \(x\) is a least upper bound. We can now state Zorn's Lemma.

Zorn's Lemma (ZL) Let \(A\) be a poset w.r.t. \(\preceq\). Suppose any chain \(S\subseteq A\) admits a least upper bound. Then, \(A\) itself admits a maximal element.

There are a number of equivalent formulations of this statement that apparently have nothing to do with it. The aim of the rest of this post is to first state them, and then prove their equivalence. First of all, I remark that, given a poset \(A\), the set of chains of \(A\) is itself a poset with respect to inclusion. Therefore, it makes sense to wonder whether there exists a maximal chain on \(A\). Well, an equivalent statement to Zorn's lemma says yes.

Hausdorff Maximality Principle (HMP) For any poset \(A\), there exists a maximal chain on \(A\). In other words, the set of chains of \(A\) w.r.t. the relation on \(A\) admits a maximal element w.r.t. inclusion.

To state the next equivalent statement, we need to define what a well-order is.

Definition 9: well-orders Given a set \(A\), a well-order on \(A\) is an order such that every nonempty subset \(S\subseteq A\) admits a minimum (also called a least element), i.e. an \(x\in S\) such that \(x\preceq y\) for all \(y\in S\). This is, of course, the "opposite" of a maximum, i.e. of an upper bound belonging to the set.

Note that \(\mathbb R\) is not well-ordered under \(\leq\), because not all nonempty subsets have a minimum, though all the bounded-below ones have an infimum. Any total order on a finite set will be a well-order, but what about infinite sets? Is it even certain that there will be a well-order on an arbitrary infinite set?

Well-Ordering Principle (WOP) Every set \(A\) admits a well-order.

The last equivalent statement to Zorn's lemma requires a bit more pre-work. We need to formalize the notion of choice. Imagine you have a number of sets \(A_1,\dotsc,A_n\), and you want to choose an element from each. Whatever choice you make, you are defining a function \(f:\{1,\dotsc,n\}\to A_1\cup\dotso\cup A_n\) such that \(f(i)\in A_i\) for all \(i\): just call \(f(i)\) the element you chose from \(A_i\), and you have your function.

Definition 10: choice function Given a set \(I\) and a family of sets \(A_i\) for \(i\in I\), a choice function for the family \(\{A_i\}_{i\in I}\) is a function \(f:I\to\bigcup_iA_i\) such that \(f(i)\in A_i\) for all \(i\in I\); naturally, such a function can exist only if all the \(A_i\) are nonempty. A choice function for a single set \(A\) is a function \(f:2^A\smallsetminus\{\varnothing\}\to A\) such that for all nonempty \(S\subseteq A\) we have \(f(S)\in S\); in other words, it is a choice function for the family \(2^A\smallsetminus\{\varnothing\}\).
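For a finite family, no axiom is needed: a choice function can be written down explicitly, as in this little Python sketch (the family is of course a made-up example):

```python
family = {'i': {3, 1, 2}, 'j': {5, 4}, 'k': {9}}

def f(i):
    # Choose, say, the smallest element of each set.
    return min(family[i])

print({i: f(i) for i in family})   # {'i': 1, 'j': 4, 'k': 9}
```

The whole point of the axiom below is that, for arbitrary (possibly infinite) families of nonempty sets, no such explicit rule may be available, and yet we want a choice function to exist.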

It seems natural enough to assume that choice functions always exist.

Axiom of Choice (AC) For any set \(A\), there exists a choice function for \(A\). Equivalently, given any family \(\{A_i\}\) of nonempty sets, there is a choice function for that family.

Indeed most mathematicians think this is true. However, this apparently innocent statement has a very surprising and counter-intuitive consequence. Before stating it, let's quickly show the equivalence. If the first formulation holds, then pick \(A=\bigcup A_i\), choose a choice function \(f\) for \(A\), and restrict it to the family \(\{A_i\}\), i.e. set \(g(i):=f(A_i)\). If the second formulation holds, \(2^A\smallsetminus\{\varnothing\}\) is a family of nonempty sets, and thus admits a choice function, which is then a choice function for \(A\).
Let us see the incredible consequence of this axiom.

Banach-Tarski paradox There exists a way to partition a 3D ball into sets that, if opportunely reassembled, come to form two balls both identical to the starting ball.

WHAAAAAAAAT?! Yeah, you read that correctly: from one ball, two balls, both identical to the starting one. A magical ball duplication, in other words. This is complicated to prove. Here is the Vsauce explanation. Actually, Wikipedia says Banach and Tarski proved much, much more. Apparently, taking any two bounded subsets of \(\mathbb R^n\) with nonempty interior, for \(n\geq3\), we can reassemble one into the other by rotations, translations, and reflections. So wanna chop up one ball to make a billion? Yep, you can. Wanna chop a cube into a hundred cubes and twenty balls? You can. Pretty cool if you ask me. Unfortunately Wikipedia only proves the ball duplication one, and I don't think we'll find the original Banach-Tarski article online, and I don't feel like searching once more, so this is all you get. Naturally, all these chopping-ups are not practically possible, but only theoretically doable, because the pieces involved are too horribly shaped and too fine to be actually obtained manually, so you cannot actually chop a ball into pieces and get two balls.
But now it is time for the last statement in this post.

Theorem 2: the 4-way equivalence ZL, HMP, WOP, and AC are all equivalent.

And the proof of this concludes the post. This proof is only for mathephysicists, or even just mathematicians, as it is very abstract and hard to wrap your head around, so if you don't adore Math, be encouraged to stop here. If, however, you are brave enough to face this wonder of abstraction, put on your glasses, and perhaps draw out your magnifying glass, for the font size is about to change.
First of all, we need a few definitions and a fundamental lemma from here, then we will prove AC implies ZL following that reference.

Definition 11: initial segments, sections, closed posets Let \((T,\preceq)\) be a poset. A subset \(S\subseteq T\) is an initial segment of \(T\) if for all \(s\in S\) we have \(r\in S\) whenever \(r\prec s\). The section of \(t\in T\), denoted \(T_t\), is the set: $$T_t=\{s\in T:s\prec t\}.$$ Every section is obviously an initial segment, but not all initial segments are sections. For example, \(T\) is never a section of itself, but is always an initial segment. A poset \((T,\preceq)\) is closed if every chain in \(T\) admits a least upper bound.

The assumption of ZL is then simply that the poset be closed.

Lemma 3: initial segments of well-ordered closed posets Let \((T,\preceq)\) be a well-ordered closed poset. Then, calling \(\Sigma\) the set of initial segments of \(T\) which are chains, \(\Sigma\) is well-ordered by inclusion. Specifically, for any nonempty family \(\{S_i:i\in I\}\subseteq\Sigma\), setting \(m:=\min\{g(S_i):i\in I\}\), where \(g\) is the least upper bound function on the chains of \(T\), we have: $$\min\{S_i:i\in I\}=T_m\quad\text{or}\quad\min\{S_i:i\in I\}=T_m\cup\{m\}.$$

Proof. First of all, notice that \(S\smallsetminus\{g(S)\}=T_{g(S)}\) for any \(S\in\Sigma\). Indeed, all \(x\prec g(S)\) belong to \(S\), partly by definition of initial segment, and partly because the only exception to this could be a \(y\prec g(S)\) which is an upper bound of \(S\), but \(g(S)\) is the least upper bound, so that can't happen. Therefore, \(S\smallsetminus\{g(S)\}=\{x:x\prec g(S)\}=T_{g(S)}\). Now, \(m:=\min\{g(S_i)\}\) exists because \(T\) is well-ordered, and being a minimum it belongs to the family \(\{g(S_i)\}\). This means there is an \(\overline\imath\) such that \(g(S_{\overline\imath})=m\). Note also that \(T_m\subseteq T_{g(S_i)}\subseteq S_i\) for every \(i\). If \(g(S_{\overline\imath})\in S_{\overline\imath}\), then either \(S_{\overline\imath}\smallsetminus\{g(S_{\overline\imath})\}=T_m\) belongs to the family \(\{S_i\}\), in which case it is the minimum, or it does not, in which case \(S_{\overline\imath}=T_m\cup\{m\}\) is the minimum (indeed, in that case every \(S_i\) must contain \(m\): an \(S_i\) not containing \(m\) would, since \(T_m\subseteq S_i\) and \(S_i\neq T_m\), contain some \(x\succ m\), and hence \(m\) itself, being an initial segment, a contradiction). If \(g(S_{\overline\imath})\notin S_{\overline\imath}\), the minimum is \(S_{\overline\imath}=T_{g(S_{\overline\imath})}=T_m\).

Definition 12: special poset A poset \((T,\preceq)\) is called special if it admits a function \(f:T\to T\) such that \(f(t)\succeq t\) for all \(t\in T\). If \(T\) is special and closed, call \(g\) the least upper bound function on the chains. An \(fg\)-chain in \(T\) is a well-ordered chain \(C\) such that, for all \(t\in C\), we have \(f(g(C_t)\!)=t\). Note that \(f(g(C_t)\!)\) makes sense because a section of a chain is again a chain.

I made up this term for ease of presentation, so don't look it up 'cos you probably won't find it :).

Lemma 4: on \(fg\)-chains of special closed posets Let \(T\) be a special closed poset with special function \(f\) and least upper bound function \(g\). Then the set of \(fg\)-chains is totally ordered by inclusion. More specifically, given two \(fg\)-chains \(C,D\), one is always an initial segment of the other. This set is also closed under taking initial segments, i.e. an initial segment of an \(fg\)-chain is again an \(fg\)-chain.

Proof. To prove the third statement, we note that an initial segment \(S\) of an \(fg\)-chain \(C\) is certainly a chain. Its sections \(S_t\) are then also chains, and coincide with the corresponding sections \(C_t\) of \(C\), hence \(fg(S_t)=fg(C_t)=t\).
The second statement is the hardest one, and implies the first one. Let \(C,D\) be \(fg\)-chains. By Lemma 3, since \(D\) is a well-ordered closed chain, its initial segments \(\Sigma(D)\), which are all chains, are well-ordered by inclusion. Let us now consider \(\Sigma_C(D):=\Sigma(D)\smallsetminus\Sigma(C)\). This is a subset of \(\Sigma(D)\), so either it is empty, or it has a minimum. If it is empty, then \(\Sigma(D)\subseteq\Sigma(C)\), but \(D\in\Sigma(D)\), which makes \(D\) an initial segment of \(C\), and we are done. If it is nonempty, set: $$A:=\min\Sigma_C(D).$$ Suppose \(A\) has no maximal element. Then \(A_a\cup\{a\}\) is a proper initial segment of \(A\) for all \(a\in A\), and thus an initial segment of \(C\) by minimality of \(A\), but \(a\in A_a\cup\{a\}\), which gives \(A\subseteq C\). \(A=C\) is not possible, since \(A\) is not an initial segment of \(C\) while \(C\) is an initial segment of itself. So \(C\smallsetminus A\neq\varnothing\). Suppose there is \(x\in C\smallsetminus A\) such that \(x\preceq a\) for some \(a\in A\). Then \(x\) would have to be in \(A\), since \(A_a\cup\{a\}\) is an initial segment of \(C\), so we have a contradiction. Since \(C\) is a chain, this means every element of \(C\smallsetminus A\) is an upper bound of \(A\), so \(C\) contains an upper bound of \(A\). Now, \(C\) is an \(fg\)-chain, so it is well-ordered. This means that, if we take the set of upper bounds of \(A\) in \(C\), it admits a least element. Call it \(t\); note \(t\notin A\), or it would be a maximum of \(A\). Then \(C_t=A\), because \(A\subseteq C\) and \(t\) is the least upper bound of \(A\) in \(C\), so nothing can be between \(A\) and \(t\) in \(C\). But \(A\) is not an initial segment of \(C\), which \(C_t\) clearly is. Thus, we have both proved and disproved that \(A\) is an initial segment of \(C\), which is a contradiction. Said contradiction stems from assuming \(A\) has no maximum, so we conclude \(A\) must have a maximum. Let \(t:=\max A\). Clearly \(A_t\in\Sigma(C)\cap\Sigma(D)\). If \(A_t=C\), then \(C\) is an initial segment of \(D\), and we are done. Since we have assumed there is at least one initial segment of \(D\) that is not an initial segment of \(C\), we need to show \(A_t\neq C\) is impossible, for the case of \(D\) being an initial segment of \(C\) is already excluded. Suppose \(A_t\neq C\), and call \(u:=\min(C\smallsetminus A_t)\). Then: $$D_t=A_t=C_u.$$ Indeed, \(A_t\subseteq D_t\) is evident, and the reverse inclusion stems from \(A\) being an initial segment of \(D\). That \(A_t=C_u\) is because \(A_t\subseteq C_u\) is clear, and the reverse inclusion stems from the definition of \(u\): if \(C_u\smallsetminus A_t\neq\varnothing\), any element in there would lie in \(C\smallsetminus A_t\) while being \(\prec u\), contradicting the definition of \(u\). \(C,D\) are \(fg\)-chains, so that: $$t=fg(D_t)=fg(A_t)=fg(C_u)=u.$$ Since \(t=\max A\), \(A=A_t\cup\{t\}\), but since \(A_t=C_u\) and \(t=u\), we conclude \(A=C_u\cup\{u\}\), making \(A\) an initial segment of \(C\), and contradicting its definition.

Fundamental Lemma Let \((T,\preceq)\) be a closed special poset with special function \(f\). Then \(f\) admits a fixed point, i.e. there exists \(t_0\in T:f(t_0)=t_0\).

Proof. To find a fixed point of our \(f\), let \(S\) be the union of all the \(fg\)-chains of \(T\). Then:
  • If \(x,y\in S\), then \(x\in C,y\in D\) for \(C,D\) two \(fg\)-chains; by Lemma 4 above, either \(C\) is an initial segment of \(D\), or vice versa; in either case, both \(x\) and \(y\) belong to a single \(fg\)-chain, thus they are comparable, and \(S\) is totally ordered;
  • In fact, \(S\) is well-ordered; to see this, fix a nonempty \(R\subseteq S\): we must produce a minimum of \(R\); first we consider the sets \(C_x:=\{X\subseteq T:X\text{ is an }fg\text-chain\wedge x\in X\}\), which are nonempty for all \(x\in R\) (in fact, for all \(x\in S\)), and then we use the axiom of choice to produce a choice function \(c:R\to\bigcup_{x\in R}C_x\) with \(c(x)\in C_x\); you may be thinking "but wait, don't we need this lemma to prove AC equates to those other formulations?", and you'd be right, but luckily this is only used to prove AC implies ZL, so we can safely use AC to prove this lemma; if anyone can come up with a working proof of this well-ordering that doesn't use AC, I'm all ears; in any case, let's set \(F_x:=c(x)\); fixing an \(x\in R\), assuming it is not a minimum of \(R\) (otherwise the proof would be over), we can be sure that \(\Sigma_x:=\{(F_y)_y:y\preceq x\wedge y\in R\}\) is totally ordered and in fact well-ordered by inclusion; indeed, it is totally ordered because of Lemma 4, which gives that, of two \(fg\)-chains, one is always an initial segment of the other, and since its members are then all initial segments of \((F_x)_x\), which is an \(fg\)-chain, they are initial segments of a well-ordered closed poset, so by Lemma 3 they are well-ordered; this guarantees the existence of \(\overline x\in R\) such that \((F_{\overline x})_{\overline x}=\min\Sigma_x\); \(\overline x\) is then the desired minimum of \(R\), for if there were \(\overline y\in R:\overline y\prec\overline x\), we would certainly have \(\overline y\preceq x\) by transitivity and hence \((F_{\overline y})_{\overline y}\in\Sigma_x\); minimality would then force \((F_{\overline x})_{\overline x}\subseteq(F_{\overline y})_{\overline y}\), but this is impossible; indeed, if those two sets were equal, by applying \(fg\) to both and using that \(F_{\overline x},F_{\overline y}\) are \(fg\)-chains by choice, we would conclude \(\overline y=\overline x\), which isn't a thing; a strict inclusion would imply the existence of \(z\in(F_{\overline y})_{\overline y}\smallsetminus(F_{\overline x})_{\overline x}\), which would have to satisfy \(z\prec\overline y\), but could not satisfy \(z\prec\overline x\), or it would end up in \((F_{\overline x})_{\overline x}\), which is not a thing; so \(\overline x\preceq z\prec\overline y\), whereas we assumed \(\overline y\prec\overline x\), which is a contradiction;
  • \(S\) is an \(fg\)-chain, i.e. \(fg(S_t)=t\) for all \(t\in S\); indeed, if \(t\in S\), there is an \(fg\)-chain \(S'\) with \(t\in S'\), but then \(fg(S'_t)=t\), and \(S'_t=S_t\): if \(x\in S_t\), then \(x\) lies in some \(fg\)-chain \(S''\), and by Lemma 4 one of \(S',S''\) is an initial segment of the other, so that in either case \(x\prec t\in S'\) forces \(x\in S'\), hence \(x\in S'_t\); the reverse inclusion holds because \(S'\subseteq S\), \(S\) being the union of all \(fg\)-chains; but then \(fg(S_t)=fg(S'_t)=t\);
  • No proper superset of \(S\) can be an \(fg\)-chain, for all \(fg\)-chains are, by definition of \(S\), contained in \(S\).

Now consider \(t_0:=fg(S)\): we have \(t_0\succeq g(S)\), so it is an upper bound of \(S\). Suppose \(t_0\notin S\). Then, setting \(\tilde S:=S\cup\{t_0\}\), we have \(\tilde S_t=S_t\) for all \(t\in S\), so that \(fg(\tilde S_t)=fg(S_t)=t\), and \(fg(\tilde S_{t_0})=fg(S)=t_0\). Thus, \(\tilde S\) is an \(fg\)-chain, but \(\tilde S\supsetneq S\), contradicting the last bullet above. Thus, \(t_0\in S\), hence \(t_0\preceq g(S)\), making \(fg(S)=t_0=g(S)\), and allowing us to conclude that \(g(S)\) is the fixed point of \(f\) we were looking for.

Proof of the big equivalence.
AC implies ZL
Suppose \(T\) has no maximal element. Then for all \(t\), \(S_t:=\{u:t\prec u\}\) is nonempty. By AC, there exists \(f:T\to T\) with \(f(t)\in S_t\) for any \(t\), but this means \(f(t)\succ t\); yet such a function must have a fixed point by the Fundamental Lemma, and this one has none, contradiction. So there is a maximal element after all.
ZL implies WOP
Let \(\mathcal O\) be the set of pairs \((A,\preceq_A)\) such that \(A\subseteq T\) and \(\preceq_A\) is a well-order on \(A\). Say that \((A,\preceq_A)\preceq(B,\preceq_B)\) if \(A\subseteq B\), \(\preceq_B|_A=\preceq_A\), and \(A\) is an initial segment of \((B,\preceq_B)\). This is easily seen to be an order. It is easy enough to show that, given a chain \(\{(O_i,\preceq_i)\}\), its least upper bound is \((\bigcup_iO_i,\tilde\preceq)\), where \(\tilde\preceq|_{O_i}=\preceq_i\) for all \(i\): the initial segment condition guarantees this union is still well-ordered. By ZL, the set \(\mathcal O\) then has a maximal element \((X,\preceq_X)\), which is a well-ordered subset. If it doesn't satisfy \(X=T\), say \(x\in T\smallsetminus X\), then \((X\cup\{x\},\preceq_X')\succ(X,\preceq_X)\), where \(\preceq_X'|_X=\preceq_X\) and \(x\) is placed above all of \(X\) under \(\preceq_X'\), contradicting the maximality of \((X,\preceq_X)\). Hence, \(T\) is well-ordered.
WOP implies AC
Fix a well-order on \(T\) (one exists by WOP). Then \(f(S):=\min S\), the minimum being taken w.r.t. this well-order, is a well-defined choice function on \(2^T\smallsetminus\{\varnothing\}\), thus proving AC.
HMP implies ZL
\(T\) admits a maximal chain \(X\) by HMP. By the assumption of ZL, \(X\) has a least upper bound \(x\). If \(x\notin X\), then \(X\cup\{x\}\) is a chain, contradicting the maximality of \(X\). If anything in \(T\) is greater than \(x\), add it to \(X\), and you contradict the maximality of \(X\) once more. Hence, \(x\) is maximal, and ZL is proved.
ZL implies HMP
The set of chains of \(A\), partially ordered by inclusion, is easily shown to be closed: the union of a chain of chains is again a chain, and is its least upper bound. ZL then gives a maximal element, that is a maximal chain, completing our proof.
And with all this abstraction, I bid you goodbye and suggest: stay tuned for more!
