Note: I believe this post is almost entirely noob level, so I'm deviating from the font size conventions and writing this all in normal font, which would be med level.
Vectors. Many of you will remember these guys from High School Physics, when they were a model for forces and other quantities. Remember those little arrows having a point of application, representing what was pushed, a direction, which was the term for the line the motion took place along, a way, which was the way the object was pushed in on that line, and a modulus, representing the strength of the push? Well those 4 things were not the whole story. Sure, a single vector was identified by these, but operations could be done on vectors. Indeed, just with those 4 things, it is natural to define \(av\), for \(a\geq0\) and \(v\) a vector, as the vector with the same point of application, direction, and way, but having modulus \(a\) times the modulus of \(v\): \(|av|=a|v|\). It is also pretty natural to define multiplication by \(-1\) as changing a vector's way and keeping all the other properties fixed. This way, we have constructed a product by a scalar, as the mathematical term goes, which is something that, given a real number and a vector, "multiplies them" into another vector, or formally, calling \(V\) the space of vectors, is a map \(\cdot:\mathbb R\times V\to V\) sending \((a,v)\mapsto\cdot(a,v)=a\cdot v\). Vectors could also be summed together, if the application points coincided. Let us consider only vectors with the same application point. Then we have a sum defined on them, which is formally a map \(+:V\times V\to V\) which takes \((v,w)\mapsto+(v,w)=v+w\). But there is more: the sum has special properties, and it has special relations with the product by a scalar. Let me now give the definition of a real vector space in linear algebra.
Definition 1: real vector space A real vector space is a set \(V\) with a binary operation \(+:V\times V\to V\) and an external operation \(\cdot:\mathbb R\times V\to V\) which satisfy the following for any \(a,a_1,a_2\in\mathbb R,v,v_1,v_2,v_3\in V\):
Our high school vectors clearly satisfied all these properties. Of course, two questions arise when seeing this definition:
As for question 2, the point is we are implicitly relying on \(\mathbb R\) to give us numbers to multiply the vectors by. Why fix ourselves on \(\mathbb R\)? Or in other words, what do we need our set \(K\) of "numbers", typically called scalars, to have defined on it in order to be content? Let's look at those properties.
\(\mathbb R\) has another property that will come in handy, and which we require for no apparent reason: that if \(a\neq0\) there exists \(\hat a\) such that \(a\hat a=\hat aa=1\). With that, we define what a field is.
Definition 2: Field A field is a triple \((K,+,\cdot)\) where \(+,\cdot:K\times K\to K\) satisfy the following:
Let me give you a few quick remarks:
If you think back to middle school, you will probably find a set of numbers that is a field, though you never called it with that name: I'm talking about fractions, the ratios of integers, which we know how to multiply and sum, which only need flipping to be inverted, and where the operations are the same as in \(\mathbb R\) so all the required properties are obviously satisfied. So here is a field different from \(\mathbb R\): \(\mathbb Q\).
Physicists and mathematicians will certainly have met another example of field, still probably not calling it a field. I'm talking about the set of complex numbers. The complex numbers arise in Maths from the inconvenience of not being able to solve certain polynomial equations, most notably \(x^2+1=0\). We all know that "minus times minus equals plus", aka "a negative times a negative equals a positive", so \(x^2=-1\) cannot be solved in \(\mathbb R\). However, studying fields in Algebra, it turns out the so-called algebraically closed fields, which are fields where all polynomials are either constant or have at least one zero, have some convenient properties. Thus, we want to build an algebraically closed "superfield" of \(\mathbb R\). How do we do this? Well, the first step is inventing this weird guy called \(i\), the imaginary unit, which is defined to satisfy \(i^2=-1\). We then define \(\mathbb C:=\{a+bi:a,b\in\mathbb R\}\), with sums and products working as is natural. Turns out this is indeed a field, it contains \(\mathbb R\) by definition, and surprise surprise, it is algebraically closed. And note that I'm not just saying every polynomial with real coefficients has at least one zero in \(\mathbb C\), but also polynomials with complex coefficients (that is, coefficients in \(\mathbb C\)) have at least one zero in \(\mathbb C\), provided they are not constant. This is the content of the so-called Fundamental Theorem of Algebra, which will be proved in another post. In any case, this field \(\mathbb C\) is called the complex field and its elements are the complex numbers.
This field will come in very handy later in the series, when we discuss the spectral theorem. There is also another way to construct \(\mathbb C\). We take \(\mathbb R^2\) as our starting set. We define: $$(a,b)+(c,d)=(a+c,b+d),\qquad\qquad(a,b)\cdot(c,d)=(ac-bd,ad+bc).$$ This produces a field (you can verify it as an exercise). Now define: $$f(a,b)=a+ib.$$ This is a map from \(\mathbb R^2\) to \(\mathbb C\), and it respects both product and sum: \begin{align*} f((a,b)+(c,d))={}&f(a+c,b+d)=a+c+i(b+d)=(a+ib)+(c+id)=f(a,b)+f(c,d); \\ f((a,b)\cdot(c,d))={}&f(ac-bd,ad+bc)=ac-bd+i(ad+bc)=(a+ib)\cdot(c+id)=f(a,b)\cdot f(c,d). \end{align*} This tells us the two ways of constructing \(\mathbb C\) give us essentially the same thing, algebraically speaking.
If we do the same thing to \(\mathbb Q\), we get \(A=\{a+ib:a,b\in\mathbb Q\}\). This is also a field with the operations defined like those of \(\mathbb C\), but it is not algebraically closed, as \(\sqrt2\notin A\) and it is a solution of \(x^2-2=0\). There is a field between \(\mathbb Q\) and \(\mathbb C\) which is algebraically closed and does not contain all of \(\mathbb R\), but I will avoid going into that. Be content with knowing that it's called field of algebraic numbers, consists of precisely the complex number that are solutions to polynomial equations with rational coefficients, and contains this field \(A\) we introduced just now.
To conclude our examples of fields, we have the basic finite fields. Let me start with a special case. Consider a set of two elements, \(a,b\), and define: $$a+a=a,\,\,a+b=b,\,\,b+a=b,\,\,b+b=a,\qquad a\cdot a=a,\,\,a\cdot b=a,\,\,b\cdot a=a,\,\,b\cdot b=b.$$ This produces a field. The elements are typically called 0 and 1, because \(a\) is the neuter element of the sum and \(b\) that of the product, and this is the field \(\mathbb F_2\), or \(\mathbb Z/2\mathbb Z\). I will get into that notation in a future post about algebra.
Taking any prime \(p\), we can define a field called \(\mathbb F_p\) or \(\mathbb Z/p\mathbb Z\). Consider the set \(\{0,1,2,\dotsc,p-1\}\). We can easily restrict the operations of \(\mathbb R\) to it, but some sums will go out of that set. To fix this problem, we decide that, if that happens, we take the result, and take away as many \(p\)s as is needed to get back into that set. For anyone familiar with modular arithmetic, we are reducing the results modulo \(p\). For example, with \(p=3\), the operations become: \begin{align*} 0+0=0 && 0+1=1 && 0+2=2 \\ 1+0=1 && 1+1=2 && 1+2=0 \\ 2+0=2 && 2+1=0 && 2+2=1 \\ 0\cdot0=0 && 0\cdot1=0 && 0\cdot2=0 \\ 1\cdot0=0 && 1\cdot1=1 && 1\cdot2=2 \\ 2\cdot0=0 && 2\cdot1=2 && 2\cdot2=1. \end{align*} All of these are actually examples of a much broader class of quotient sets where the starting set has an algebraic structure that can easily induce an analogous one onto the quotient, but we will deal with those creatures in Algebra posts later on.
Now let's get back to vector spaces. First of all, we observe that our arrows can actually be identified with their tips, so that they reduce to points on a plane or in space. Indeed, high school also taught us to decompose vectors as sums of "component vectors", which give us the coordinates of the tips. So we reduced our arrows to \(\mathbb R^2\) (plane) or \(\mathbb R^3\) (3D space), with the sum done component by component, and the product by scalar also done component-wise. But why limit ourselves to exponents 2 and 3? It is clear enough that, given any integer \(n\), \(\mathbb R^n\), with the operations defined the same way, is a real vector space. By the way, \(\mathbb R\) is also a real vector space.
We have introduced different fields, so we would like examples of vector spaces using other fields as scalars. First, we need something to call such weird beings.
Definition 3: Vector space over \(K\). Given a field \(K\), a vector space over \(K\), or \(K\)-vector space, is a set \(V\) with operations \(+:V\times V\to V,\cdot:K\times V\to V\) which satisfy all properties of definition 1 if \(\mathbb R\) is replaced by \(K\) anywhere it appears.
At this point, it will be clear that, for any field \(K\) and integer \(n\), \(K^n\) with component-wise operations will be a \(K\)-vector space. We will soon see that these are not just the obvious examples, but for now let me leave it at that.
A more exotic idea is to view \(\mathbb R\) as a vector space not over itself, but over \(\mathbb Q\). After all, real numbers have a sum with the needed properties, and the product of a rational by a real will give the rest of the properties of vector spaces.
To get more creative, we can consider \(K^{n\times m}\) and rearrange the entries into a table with \(n\) rows and \(m\) columns or viceversa. The first choice will give us the space \(M_{n\times m}(K)\), and the second one \(M_{m\times n}(K)\). Both of these are vector spaces over \(K\).
The most exotic but very widely used idea I can think of is that of functions spaces. At its simplest, we can easily see how the set \(\{f:\mathbb R\to\mathbb R\}\) can be made into a vector space over \(\mathbb R\) by defining: $$(f+g)(x)=f(x)+g(x),\qquad\qquad(af)(x)=af(x).$$ Of course, we don't really need the functions to be defined on \(\mathbb R\): any set \(X\) will do. Such vector spaces will be denoted \(\mathbb R^X=\{f:X\to\mathbb R\}\). And again, if we take functions that take their values on another field \(K\), we will get a \(K\)-vector field. Naturally, we can have our functions take values not in \(K\), but in any other \(K\)-vector space, and still get a \(K\)-vector space of functions. The typical thing done in more advanced maths is to consider functions with special properties, but we will get to that in future posts.
That was quite a jump, wasn't it? From our simple arrow spaces to spaces of functions from random sets to random vector spaces! What about function-valued functions? Yep, they still work! And we could go on and on with functions that take values in spaces of functions that take values in spaces of functions that take values etc. etc. etc., but that isn't in our interest.
Instead, what we are now curious about is what will become of our "component vectors", or in other words, how to formally define what a system of axes will be in our very abstract vector spaces. What are axes to us? We typically draw them as lines with arrow tips, so they would be oriented lines. Of course, a line will be identified by a vector on it, and the way the arrow tip points can also be incorporated into the vector. So our axes will be represented by a certain number of vectors. What properties do these vectors need to have to form a system of axes? Well, first of all, all the vectors in our spaces must be obtained by summing multiples of these vectors. In other words, our vectors must be a system of generators for our space \(V\)
Definition 4: Linear combinations and systems of generators Given a \(K\)-vector space \(V\) and some vectors \(v_1,\dotsc,v_n\in V\), we call linear combination of \(v_1,\dotsc,v_n\) any sum of the form \(a_1v_1+\dotso+a_nv_n\), with the \(a_i\) being elements of \(K\). A System of Generators for \(V\) is a set \(S\subseteq V\) such that, for any \(v\in V\), there exist an integer \(n\), vectors \(v_1,\dotsc,v_n\in S\), and scalars \(a_1,\dotsc,a_n\in K\) such that: $$v=\sum_{i=1}^na_iv_i=a_1v_1+\dotso+a_nv_n.$$
But that is not enough. For example, we all know that a plane requires two axes, but if I take three vectors, they can still be a system of generators. In fact, given any system of generators \(S\) and any vector \(v\in V\) outside of \(S\), adding \(v\) to \(S\) gives us another system of generators. But three axes in a plane is too many, you say. Why? The most natural answer that comes to mind is that we want as few generators as possible. Indeed, one possible definition of a basis, which is the linear algebraic term for a system of axes, is precisely that one.
Definition 5: basis, 1.0 A basis for a vector space \(V\) is a set \(S\subseteq V\) of generators for \(V\) that is minimal with respect to inclusion, i.e. such that taking away any element from \(S\) leaves us with something that is not a set of generators.
Another problem that would arise from three axes in a plane is that there would not be a unique way to represent a vector as a linear combination of the axis vectors. And this is another possibility for defining a basis.
Definition 6: basis, 2.0 A basis for a vector space \(V\) is a system of generators \(S\) such that, for any \(v\in V\), the choice of \(v_i\)s and coefficients \(a_i\neq0\) such that: $$v=a_1v_1+\dotso+a_nv_n$$ is unique.
There are a couple more possible definitions of basis. First of all, applying the above uniqueness condition to \(0_V\) means that, whenever we write: $$a_1v_1+\dotso+a_nv_n=0,\tag{$\ast$}$$ the \(a_i\) have to be all zero.
Definition 7: linear independence and dependence We say that \(S\subseteq V\), \(V\) being a \(K\)-vector space, is linearly independent over \(K\) if, for any \(v_1,\dotsc,v_n\in S\), we have: $$a_1v_1+\dotso+a_nv_n=0_V\iff a_i=0\,\,\forall i=1,\dotsc,n.$$ In other words, linearly independent means the condition (\(\ast\)) introduced just above as a consequence of the uniqueness condition of Definition 6. The opposite of this is linear dependence, so a set \(S\subseteq V\) is linearly dependent over \(K\) if there exist \(a_1,\dotsc,a_n\in K\) not all zero and \(v_1,\dotsc,v_n\in S\) such that: $$a_1v_1+\dotso+a_nv_n=0_V.$$
It is clear that, if \(S\) is linearly independent, any \(T\subseteq S\) is linearly independent as well, for a combination of elements of \(T\) is also one of elements of \(S\). Moreover, if \(S\) is linearly dependent, then so is any \(T\supseteq S\), for analogous reasons.
It is now time to give two more definitions of basis.
Definition 8: basis, 3.0 A basis is a set of generators \(S\) which is linearly independent.
Definition 9: basis, 4.0 A basis is a linearly independent set \(S\) which is maximal with respect to inclusion. In other words, adding any other vector to \(S\) makes it lose its linear independence.
But which definition will we take? Well, all of them, because they are equivalent.
Theorem 1: equivalence of the definitions of basis The four definitions of basis given above are equivalent, so a basis 1.0 is a basis 2.0, which is a basis 3.0, which is a basis 4.0, which is a basis 1.0.
Proof.
Suppose \(S\) is a basis 1.0, that is, a set of generators that is minimal w.r.t. inclusion. Suppose by way of contradiction that it is not a basis 2.0. Then there must be \(v\) that can be written in two ways as a linear combination of \(S\), that is: $$a_1v_1+\dotso+a_nv_n=v=b_1w_1+\dotso+b_mw_m.$$ Carrying the \(b_iw_i\) to the left, we obtain a linear combination of elements of \(S\) that is zero. The two linear combinations are supposed to differ from each other, so after carrying to the left, we must have at least one \(v_i\) or one \(w_i\) that doesn't get canceled out. What happens is either this \(v_i\) is not among the \(w_j\), or this \(w_i\) is not among the \(v_j\), or \(a_i-b_j\neq0\). In case 1, we isolate \(a_iv_i\) and get: $$a_iv_i=-\sum_{j\neq i}a_jv_i+b_1w_1+\dotso+b_mw_m,$$ and dividing by \(a_i\), which is possible since \(K\) is a field, gives me that \(v_i\) is a combination of other vectors in \(S\), meaning \(S\smallsetminus\{v_i\}\) (\(S\) with \(v_i\) taken away) is still a set of generators, meaning \(S\) was not minimal with respect to inclusion to begin with, which is a contradiction. If we are in case two, we isolate \(b_iw_i\) and conclude in the same way. In case 3, we isolate \((a_i-b_j)v_i=(a_i-b_j)w_j=a_iv_i-b_jw_j\) and conclude in the same way.
Suppose \(S\) is a basis 2.0. Then \(0_V\) can only be written as \(0v_1+\dotso\), or in other words \(S\) is linearly independent, which is exactly what we need to have a basis 3.0, since \(S\) is a set of generators by definition of basis 2.0.
Suppose \(S\) is a basis 3.0. In particular, it is a system of generators, meaning that, if \(v\notin S\), then adding \(v\) to \(S\) gives us a linearly dependent \(S\cup\{v\}\). But \(S\) is linearly independent, so it is a linearly independent set that is maximal w.r.t. inclusion, that is a basis 4.0.
Finally, assume \(S\) is a basis 4.0. Being linearly independent, none of its vectors can be a combination of other vectors in \(S\), so if it is a set of generators, it will be minimal w.r.t. inclusion. Since it is linearly independent and maximal w.r.t. inclusion, it must be a set of generators. Indeed, let \(v\notin S\). Then there are \(a_1,\dotsc,a_n,a\in K\) and \(v_1,\dotsc,v_n\in S\) such that: $$a_1v_1+\dotso+a_nv_n+av=0_V,$$ or else \(S\cup\{v\}\) would be linearly independent, and \(S\) would not be maximal. If \(a=0\), then \(S\) would be linearly dependent, which it is not. But if \(a\neq0\), then we can write: $$v=-\frac{a_1}{a}v_1-\dotso-\frac{a_n}{a}v_n,$$ so that \(v\) is a combination of vectors of \(S\), making \(S\) a system of generators and thus a basis 1.0. Our proof is then complete. \(\diamond\)
And this is the reason we required multiplicative inverses in the definition of fields: if those don't exist, we don't have the above equivalence, because it requires division.
Note that the typical drawing of a system of axes does not depict just any basis, but rather an orthonormal basis. However, to define this concept we need to introduce norms and scalar products, and that will be the topic of another post.
The dimension of a vector space is an intuitive concept that is easy to formalize.
Definition 10: dimension of a vector space Given a \(K\)-vector space \(V\), we say \(V\) has infinite dimension if there exists no finite basis. If there exists a finite basis, we say \(V\) has finite dimension, and we define \(\dim_KV\) as the number of elements of a basis of \(V\) over \(K\), and call it dimension of \(V\) over \(K\).
Note that what is a basis over \(K\) may not be a basis over a bigger field. Indeed, taking \(\mathbb C=\mathbb R^2\) as an example, the set \(\{(1,0),(0,1)\}\), or \(\{1,i\}\), is a basis for the real vector space \(\mathbb C=\mathbb R^2\), but viewing this as a vector space over \(\mathbb C\) by saying \((a+ib)(c,d)=(ac-bd,ad+bc)\) makes it so that \((0,1)=i\cdot(1,0)\), so \(\{(1,0),(0,1)\}\) is not a basis over \(\mathbb C\). Conversely, \((1,0)\) is a basis over \(\mathbb C\) but not over \(\mathbb R\). Indeed, it can be verified that, if \(V\) is a complex vector space (i.e. a vector space over \(\mathbb C\)), it is also a real vector space, and: $$\dim_{\mathbb R}V=2\dim_{\mathbb C}V.$$ That \(V\) is also a real vector space is obvious: just restrict the product by scalars to product by reals, and the sum is already there. The simplest way to prove the dimension relation is to pick a basis \(B_C=\{v_1,\dotsc,v_n\}\) over \(\mathbb C\) and prove \(B_R=\{v_1,\dotsc,v_n,iv_1,\dotsc,iv_n\}\) is a basis over \(\mathbb R\). That this is a set of generators is easy, because any combination of the \(v_i\) with complex coefficients will be a sum \((a_1+ib_1)v_1+\dotso+(a_n+ib_n)v_n\), which equals \(a_1v_1+\dotso+a_nv_n+b_1iv_1+\dotso+b_niv_n\) by distributivity and commutativity, and that is a real-coefficient combination of elements of \(B_R\), making \(B_R\) a generating set. The linear independence of \(B_R\) over \(\mathbb R\) can be equated to the linear independence of \(B_C\) over \(\mathbb C\), which holds by assumption, proving the relation.
By such arguments we can prove that, if \(V\) is a \(K\)-vector space and \(H\) is a subfield (aka a field \(H\subseteq K\) whose operations are the restrictions of those on \(K\)), and \(\dim_HK=k\), then \(\dim_HV=k\dim_KV\).
Another thing to be noted is that there do exist spaces with infinite dimension. Function spaces are the easiest example. If we take, for example, the functions from \([0,1]\) to \(\mathbb R\), denoted \(\mathbb R^{[0,1]}\), the set \(\{1,x,x^2,\dotsc\}\) is linearly independent and infinite, and Theorem 3 below makes a set like that impossible in a space with finite dimension, so this space cannot have finite dimension. Moreover, \(\dim_{\mathbb Q}\mathbb R\) is also infinite.
Theorem 2: \(\mathbb R\) is not finitely generated over \(\mathbb Q\) There exists no finite systems of generators of \(\mathbb R\) over \(\mathbb Q\).
Proof.
A theorem of abstract algebra we will deal with in a future post tells us that, if \(K\subseteq H\) are two fields with the same operations and \(H\) is finitely generated over \(K\), then all \(x\in H\) are roots of polynomials with coefficients in \(K\). This is in fact pretty easy to prove, assuming Theorem 3 below. Indeed, in that case, the set \(\{1,x,x^2,\dotsc,x^n\}\) will have to be linearly dependent for some integer \(n\), so there will exist \(a_0,a_1,\dotsc,a_n\in K\) such that: $$a_0+a_1x+a_2x^2+\dotso+a_nx^n=0,$$ meaning a polynomial with coefficients in \(K\) has a zero in \(x\), as desired. However, the Lindmann-Weierstrass theorem directly implies that \(e\) is not a root of a polynomial with rational coefficients, hence \(\mathbb R\) cannot be finitely generated over \(\mathbb Q\). Note that this fact about \(e\) can be proven more directly as shown here.
It is now time to discuss whether the dimension of a space is well-defined.
Theorem 3: Steinitz exchange principle Let \(V\) be a \(K\)-vector space. Assume \(S=\{v_1,\dotsc,v_n\}\) is linearly independent and all the \(v_i\) are linear combinations of elements of \(T=\{w_1,\dotsc,w_m\}\). Then, for every \(i\), there is a \(j\) such that swapping \(v_i\) with \(w_j\) leaves \(S\) linearly independent, that is, such that \(S\smallsetminus\{v_i\}\cup\{w_j\}=\{v_1,\dotsc,v_{i-1},w_j,v_{i+1},\dotsc,v_n\}\) is still linearly independent.
Proof.
If that were not so, then, for some \(i\), the set \(\{v_1,\dotsc,v_{i-1},w_j,v_{i+1},\dotsc,v_n\}\) would be linearly dependent for any \(j\). But then all the \(w_j\) would be combinations of the \(v_j\) with \(j\neq i\), and \(v_i\), being a combination of the \(w_j\), would also be a combination of the \(v_j\) with \(j\neq i\), making \(S\) linearly independent, which is a contradiction.
With that, we can prove the following.
Theorem 4: bases, generators, linearly independent sets Assume \(V\neq\{0\}\) is a \(K\)-vector space. Then:
Proof
To prove 1, suppose \(S\) has \(n\) elements. If \(S\) is linearly independent, then it is a basis. Otherwise, take out one element so that \(S\smallsetminus\{v_i\}=\{v_1,\dotsc,v_{i-1},v_{i+1},\dotsc,v_n\}\) is still a set of generators. If this is linearly independent, we are done. Otherwise, we iterate the procedure. After \(n\) steps, we would end up with the empty set, which generates \(\{0\}\neq V\). Hence, there must exist a point where taking away an element leaves us with something that is not a system of generators. That is, there exists \(T\subseteq S\) which is a set of generators, but such that, if we take any \(v\) out of \(T\), the resulting set \(T\smallsetminus\{v\}\) is not a system of generators. This makes \(T\) a basis 1.0, proving 1.
To prove 2, assume \(X\) is a finite set of generators with \(n\) elements. The Steinitz exchange principle tells us \(S\) must have fewer elements than \(S\). Indeed, if \(P\) is linearly independent and \(Q\) a set of \(q\) generators, and we assume \(P\) has more than \(q\) elements, by applying the principle \(q\) times we replace \(P\) with a set that is \(Q\) plus some elements of \(P\), and must be linearly independent by the Steinitz principle, but cannot be linearly independent since the elements of \(P\) are combinations of those of \(Q\). So we have a contradiction, and we conclude \(P\) can have at most \(Q\) elements. Coming back to our \(S\), if it is a set of generators then it is a basis, otherwise we can add something that is linearly independent, and obtain \(S\cup\{v\}\), a bigger linearly independent set. If this one is a set of generators, then we are done. Otherwise, we iterate the procedure. What we discussed a second ago tells us we can iterate the procedure at most as many times as the number of elements of \(X\) minus that of \(S\), so at some point, we will find \(T\supseteq S\) such that \(T\) is linearly independent, but nothing can be added to \(T\) without \(T\cup\{v\}\) turning linearly dependent. We then have \(T\) is a basis 4.0, and 2 is proved.
To prove 3, pick any two bases \(B_1,B_2\). \(B_1\) is linearly independent and \(B_2\) is a set of generators, so if \(B_1\) has \(n\) elements and \(B_2\) has \(m\) we must conclude, as seen just above, that \(n\leq m\). But then \(B_2\) is linearly independent and \(B_1\) is a set of generators, meaning \(m\leq n\). But if \(n\leq m\) and \(m\leq n\), we must conclude \(m=n\), proving 3.
This means the definition of dimension we gave is well-posed: it does not depend on the choice of a basis, but only on the space, as long as it is finite.
The zero space is a weird creature that I will leave in a limbo. It seems to be the blatant exception to "every space has a basis", because linear independence just doesn't exist in there. Since we give it dimension 0 "by intuition", it makes sense to posit that a basis of it is the empty set. This is probably why the span of the empty set is defined to be zero, beyond avoiding something that generates a non-subspace. And if you're puzzled because you've never seen the word "subspace" before, don't worry, I didn't define it yet, but will soon, and the same goes for "span" :). But 'nuff said.
Before I leave you, there is another couple things I want to tell you, and the first one has to do with why I used the name "system/set of generators". It also generalizes the concept of line and plane. The second one is a throwforward (because I cannot say throwback to the future, right?) to the next post I will publish, in a sense. Let us start by defining a vector subspace.
Definition 11: vector subspace Let \(V\) be a \(K\)-vector space. \(W\subseteq V\) is called a (vector) subspace of \(V\), a relation (an order relation, in fact) which I denote by \(W\leq V\), if, defining operations on \(W\) by restricting those of \(V\), we obtain that \(W\) is a vector space. Let me explain that more simply. We have a natural way of summing vectors in \(W\), and of multiplying them by scalars: just view them as elements of \(V\), and act on them by consequence. The problem is: does that always send us back into \(W\)? If \(W\) is a vector subspace, then yes, because vector subspaces are exactly the subsets that satisfy this. Naturally, a more explicit way to define a vector subspace is to say that:
Lemma 1: equivalent definition of vector subspace Let \(W\subseteq V\), with \(V\) a \(K\)-vector space. Then \(W\leq V\) if and only if \(W\) is nonempty and \(av+bw\in W\) for all \(a,b\in K,v,w\in W\).
Proof.
Assume \(W\leq V\). Then, if \(a,b\in K\) and \(v,w\in W\), we will have \(av,bw\in W\), and then \(av+bw\in W\). Moreover, since \(0_V\in W\), \(W\) is nonempty, proving the equivalent condition.
Assume the equivalent condition. Then, picking \(a\in K,b=0_K\), we conclude \(av+0_V\in W\) for any \(v\in W\), since any \(w\in W\) gives \(bw=0_V\), and there is at least one \(v\in W\). So property 3 is proved. Pick then \(a=b=1_K\), and since \(1_Kv=v,1_Kw=w\), we conclude \(v+w=1_Kv+1_Kw\in W\), proving property 2. Since property 1 is implied by 3 if \(W\) is nonempty, and this is a requirement, we have proved \(W\leq V\).
It is pretty straightforward to see that, if \(W\leq V\) and \(\dim W=\dim V\), and \(V\) is finite-dimensional, then \(W=V\). Indeed, pick a basis of \(W\). This has the right number of elements to be a basis of \(V\), it is contained in \(V\), and it is linearly independent, so it must be a basis of \(V\). In particular, it is a set of generators, meaning \(V\subseteq W\), but since the other inclusion is true from the starting assumption that \(W\leq V\), we have proved \(V=W\), as we wanted.
In \(\mathbb R^2\), the subspaces are \(\mathbb R^2\), \(\{0\}\) (these two being called trivial subspaces), and all the lines through the origin. In \(\mathbb R^3\), besides the trivial subspaces, we have lines and planes through the origin. \(\{a+b\sqrt2:a,b\in\mathbb Q\}\leq\mathbb R\) is a subspace of \(\mathbb R\) viewed as a \(\mathbb Q\)-vector space. Any space of polynomials with real coefficients will be a subspace of \(\mathbb R^{[0,1]}\).
One way to produce subspaces is to "generate" them from arbitrary subsets.
Definition 12: span Let \(V\) be a \(K\)-vector space and \(S\subseteq V\). The span of \(S\), or vector subspace generated by \(S\), is the smallest subspace of \(V\) containing \(S\). By convention, to avoid something that generates the empty set which is not a subspace, we set \(\operatorname{span}(\varnothing):=\{0\}\).
A concrete way of visualising this span is by considering: $$\operatorname{span}(S)=\{a_1v_1+\dotso+a_nv_n:a_i\in K,v_i\in S\}\cup\{0\},$$ the set of linear combinations of elements of \(S\) (with zero forcibly added in to make this work for \(S=\varnothing\)). Indeed, it is easy to prove that this is a subspace, and given that any subspace containing \(S\) must contain the linear combinations of elements of \(S\) along with the zero, this is indeed the smallest subspace.
A more abstract way to visualise this is by intersections. Let me prove a couple facts.
Lemma 2: subspaces, unions, intersections, spans Let \(V\) be a \(K\)-vector space. Then:
Proof.
To prove 1, pick \(a,b\in K\) and \(v,w\in W_1\cap W_2\). Then \(v,w\in W_1\) and \(v,w\in W_2\), but then by the equivalent condition in Lemma 1 we have \(av+bw\in W_1\) and \(av+bw\in W_2\), which means \(av+bw\in W_1\cap W_2\). Since \(W_1,W_2\) both contain \(0_V\), so does the intersection. Thus, the intersection satisfies the equivalent condition of Lemma 1, which makes it a subspace.
Pick \(v,w\in V\) linearly independent (assuming of course \(\dim_Kv\geq2\)). Then, \(v+w\notin\operatorname{span}\{v\}\cup\operatorname{span}(w)=\{av:a\in K\}\cup\{aw:a\in K\}\). If \(\dim_KV=1\), then there are only two subspaces, \(\{0\}\) and \(V\), so the union is again either \(\{0\}\) or \(V\), making it a subspace. There cannot be any other subspaces because, if \(W\leq V\), it must have a basis, and if that basis is the empty set the \(W=\{0\}\), otherwise it must have a one-vector basis, making it the same dimension as \(V\), and we have seen this implies \(V=W\). If \(\dim_KV=0\), then \(V=\{0\}\) and the only subspace is \(\{0\}\), so any two subspaces will always give \(V\) as their union, hence a subspace. 2 is thus proved.
As for 3, call that big intersection \(W\). We have seen that \(\operatorname{span}(S)\) is the set of combinations of elements of \(S\), plus eventually the zero. Those are contained in all subspaces containing \(S\), so that \(\operatorname{span}(S)\leq W\). On the other hand, we have seen that \(\operatorname{span}(S)\) defined that way is a subspace containing \(S\) (even when \(S\) is empty, thanks to the zero added in), and hence a member of the family being intersected to produce \(W\) so that \(W\subseteq\operatorname{span}(S)\), proving 3.
In fact, 3 is trivial if \(S=\varnothing\), for then all subspaces contain \(S\), and the only thing that is common to all subspaces is the zero vector, since it constitutes a subspace by itself.
Now we want to formalize the ideas of adding or removing axes. To do so, we define operations between subspaces (and spaces). The easiest ones are product and sum, which are essentially the same, and represent axis addition. Imagine you have a line, and want to go 2D. You add an axis, and that gives an extra dimension. This can be viewed as either a cartesian product of the two axes (each point in the plane is a pair of real numbers) or as summing (formally speaking) vectors in one and in the other axis (each point in the plane is the sum of a horizontal and a vertical vector). Of course, in the previous explanations, I was thinking of cartesian axes, but nothing prevents us from taking axes at an angle different from a right one. To be general, we define things as follows.
Definition 13: sum and product of spaces If \(W_1,W_2\leq V\), then the sum of \(W_1,W_2\) is defined as: $$W_1+W_2:=\operatorname{span}(W_1\cup W_2)=\{w+w':w\in W_1,w'\in W_2\},$$ the last equality being left as an exercise. If \(W_1\cap W_2=\{0\}\), we call their sum a direct internal sum, and denote it \(W_1\oplus W_2\). Given two \(K\)-vector spaces \(W_1,W_2\), whether or not subspaces of a single bigger \(K\)-vector space, we define their product as merely \(W_1\times W_2\) with component-wise operations. Noting that \(W_1':=\{(w,0):w\in W_1\}\) and \(W_2':=\{(0,w):w\in W_2\}\) are essentially the same as (read: isomorphic to) \(W_1,W_2\), and are subspaces of \(W_1\times W_2\) whose sum is direct, we call this direct external sum, and denote it also by \(W_1\oplus W_2\).
It is clear that, if \(S\) is a set of \(n\) generators for \(V\), then \(V=\operatorname{span}\{v_1\}+\dotso+\operatorname{span}\{v_n\}\), and that those sums are direct if \(S\) is a basis. In general, the span of a set \(S\) is the sum of the spans of its elements. (What if \(S\) is infinite? Shhh! An infinite sum or product is essentially the same thing, but you make sure the sums you're working with are finite, and read your tuples as functions to generalize the products, requiring those functions be nonzero only on a finite number of elements, but we'll keep things finite-dimensional here.)
Another simple thing to note is that, if \(W_1+W_2\) is a direct internal sum, any \(v\in W_1+W_2\) can be written as \(w_1+w_2\) with \(w_i\in W_i\) in a unique way. Indeed, we know that there are \(w_1'\in W_1,w_2'\in W_2,a,b\in K\) such that \(v=aw_1'+bw_2'\) by definition of sum. Since \(aw_1'=:w_1\in W_1\) and \(bw_1'=:w_2\in W_2\), because the \(W_i\) are subspaces, we have found a way to write \(v\) as desired. That such a way to write \(v\) is unique is simple to prove. Indeed, suppose \(w_1+w_2=w_1'+w_2'\) with \(w_i,w_i'\in W_i\). Then, by rearranging, \(w_1-w_1'=w_2'-w_2\). The RHS is in \(W_2\) and the LHS in \(W_1\), so they must both be in the intersection. But \(W_1\cap W_2=\{0\}\), so \(w_1=w_1',w_2=w_2'\), proving the uniqueness we stated. This property can, in fact, be taken as a definition of direct sum. In fact, one can define \(W_1+W_2=\{w_1+w_2:w_i\in W_i\}\), and the result is the same as with our definition, and then require uniqueness as the definition of direct sum, and this gives exactly the same result as our definition. This alternate set of definitions is chosen by Serge Lang in Linear Algebra.
Yet another simple thing to prove is the following fact (should be a theorem really, but renumbering all other theorems is something I definitely don't want to do :) ).
Fact 1: Existence of complements. Let \(V\) be a \(K\)-vector space and \(W\leq V\). Then, there exists \(U\leq V\) such that \(V=U\oplus W\) (direct internal sum). In other words, every subspace \(W\) has a complement \(U\).
Proof.
Let \(B_W=\{w_1,\dotsc,w_n\}\) be a basis of \(W\). We can complete it to a basis \(B_V=\{w_1,\dotsc,w_n,v_1,\dotsc,v_m\}\) of \(V\), where \(m=\dim_KV-\dim_KW\). We have seen that this is possible above. Take \(U:=\operatorname{span}\{v_1,\dotsc,v_m\}\). It is clear that \(V=W+U\), because if \(v\in V\), then \(v=a_1w_1+\dotso+a_nw_n+b_1v_1+\dotso+b_mv_m\), a combination of the \(w_i\in W\) and the \(v_i\in U\). If \(U\cap W\neq\{0\}\), then there is: $$a_1w_1+\dotso+a_nw_n=v=b_1v_1+\dotso+b_mv_m,$$ but carrying the \(w_i\) to the LHS we get a zero combination of \(B_V\), which is linearly independent by assumption, implying \(a_1=\dotso=a_n=b_1=\dotso=b_m=0\), so \(v=0\), which proves our point. \(\diamond\)
Defining a quotient is a little trickier. We have \(V,W\) with \(W\leq V\). We pick \(v\in V\), and consider \([v]_W:=\{v'\in V:v-v'\in W\}\). We will see in the next post that this partitions \(V\) into disjoint subsets. Moreover, if we define \(a[v]_W:=[av]_W\) and \([v]_W+[w]_W:=[v+w]_W\), it is easy to see that we turn the set \(\{[v]_W:v\in V\}\) into a \(K\)-vector space, where \(V\) is a \(K\)-vector space. Essentially, we are thinking of removing an axis (or more axes) as squishing along a subspace. Imagine you have 3D space, you choose a plane, and you squish your space along it. That plane collapses to zero, as do all planes parallel to it, and you end up with a line. Of course, you can squish along a line, and you can squish a plane too. In general, you can squish any vector space along any of its subspaces. This is what a quotient does.
Definition 14: quotient of vector spaces Given \(W\leq V\), we define \(\frac{V}{W}\) to be the set \(\{[v]_W:v\in V\}\) with the operations defined above, and call it quotient space of \(V\) and \(W\).
Picturing quotients as squishing, you may ask: if I have \(V\) and two subspaces \(W,T\), is squishing along \(T+W\) the same as squishing along one and then along the other? To formalize this question and answer positively, you must first realize that what you are thinking of is to squish along e.g. \(W\) and then along the squished version of \(T\). We then have the following statement.
Successive Squishing Lemma Let \(V\) be a \(K\)-vector space and \(W,T\leq V\). Then: $$\frac{V}{T+W}\cong\frac{\frac{V}{W}}{\frac{T}{W\cap T}}.$$
\(\cong\) means "isomorphic to", and we will define that notion in another post, so the proof of this lemma will have to wait until then, but it is exactly what we expected. We will see that the usual arithmetic rules apply to these operations, modulo what is called "isomorphism", but that is a topic for a future post.
Another question that presents itself is the following. Imagine you pick a vector space \(V\) and \(T,W\leq V\) such that \(T\leq W\). It is clear enough that \(\frac{T}{W}\) is identified with a subset of \(\frac{V}{W}\), precisely the set of classes \([w]_T:w\in W\). It is also clear that this is a subspace, because \(\alpha[w]_T+\beta[w']_T=[\alpha w+\beta w']_T\in\frac{T}{W}\). In other words, squishing subspaces gives subspaces of the squished space. Is the converse true? That is, can we "un-squish" a subspace? Obviously, we will have to add a condition, because to go from a class to a vector in the original space we have to choose which representatives are fine. The easiest thing is to say all representatives are fine to us: the biggest un-squishing possible. And we have the following.
Un-squishing Lemma There is a bijection between subspaces \(W\leq V\) such that \(T\leq W\) and subspaces of \(\frac{V}{T}\), and this is given by \(W\mapsto\frac{W}{T}\). That is, the squishing map that, taking a subspace \(W\) containing \(T\), gives me its squished version is one-to-one and onto (read: bijective, invertible, injective and surjective, or whatever you call this), so that each subspace of \(\frac{V}{T}\) can be un-squished.
And this is relatively simple: the inverse of the map \(W\mapsto\frac{W}{T}\) is the map \(W\leq\frac{V}{T}\mapsto W\oplus T\), where \(\oplus\) is meant as a direct external sum. Of course, this is again modulo isomorphism, but you can picture this as taking all sums of squished vectors with vectors of \(T\), and then it is easy to see that this is the inverse we are looking for. To be more formal, you fix a set of representatives for the equivalence classes of \(W\), and then consider the set of vectors that are sums of an element of \(T\) and one of these representatives, and this is how you get \(W\oplus T\).
Let us now conclude by seeing how dimension behaves with respect to quotients and sums.
Theorem 5: sums and quotients vs. dimension Assume \(V\) is a finite-dimensional \(K\)-vector space. Then:
If \(W\leq V\), then \(\dim\frac VW=\dim V-\dim W\).
Proof.
To prove 1, suppose by contradiction that there exists an infinite linearly independent set \(S\subseteq W\). But then \(S\subseteq V\), and we have seen that this implies \(S\) must have fewer elements than a basis of \(V\), which contradicts the fact that it is infinite. Moreover, choosing a basis for \(W\), its linear independence implies it has fewer elements than one of \(V\), thus implying \(\dim_KW\leq\dim_KV\).
To prove 2, choose \(B_T=\{t_1,\dotsc,t_n\}\) a basis of \(W_1\cap W_2\), with \(n=\dim_K(W_1\cap W_2)\). We know it exists because \(V\) is finite-dimensional, and thus so are its subspaces, as just seen. Call \(T=W_1\cap W_2\). If \(T=W_1\), we have \(W_1\leq W_2\), so \(W_1+W_2=W_2\), and \(\dim_KW_2=\dim_KW_2+\dim_KW_1-\dim_KW_1=\dim_KW_2+\dim_KW_1-\dim_KT\). The case \(T=W_2\) is handled similarly. Suppose then \(T\neq W_1\) and \(T\neq W_2\). \(B_T\) is linearly independent in \(W_1\), so we can complete it to a basis of \(W_1\) as discussed in Theorem 4, obtaining \(B_1=\{t_1,\dotsc,t_,w_1,\dotsc,w_{m-n}\}\), where \(m=\dim_KW_1\). Analogously, setting \(k=\dim_KW_2\), we can complete \(B_T\) to a basis \(B_2=\{t_1,\dotsc,t_n,w'_1,\dotsc,w'_{k-n}\}\) of \(W_2\) The vectors \(\{t_1,\dotsc,t_n,w_1,\dotsc,w_{m-n},w'_1,\dotsc,w'_{k-n}\}\) clearly generate \(W_1+W_2\). Their number is the one required by the statement, i.e. \(\dim_KW_1+\dim_KW_2-\dim_K(W_1\cap W_2)\), so to conclude the proof we need to show they are linearly independent. Suppose then that: $$a_1t_1+\dotso+a_nt_n+b_1w_1+\dotso+b_{m-n}w_{m-n}+c_1w'_1+\dotso+c_{k-n}w'_{k-n}=0.$$ Carrying the \(w'\) over to the right, we obtain that: $$a_1t_1+\dotso+a_nt_n+b_1w_1+\dotso+b_{m-n}w_{m-n}=-c_1w'_1-\dotso-c_{k-n}w'_{k-n}. \tag{\(\ast\)}$$ The RHS is in \(W_2\), and the LHS is in \(W_1\), so both are in the intersection. Then, we can write: $$-c_1w'_1-\dotso-c_{k-n}w'_{k-n}=d_1t_1+\dotso+d_nt_n,$$ since the \(t\) are a basis for \(W_1\cap W_2\). Carrying the \(w'\) to the right, we obtain a zero combination of elements of \(B_2\), but this is a basis of \(W_2\), meaning \(c_1=\dotso=c_{k-n}=d_1=\dotso=d_n=0\). This also implies \(a_1=\dotso=a_n=b_1=\dotso=b_{m-n}=0\) because, given the RHS of \((\ast)\) is zero, they form a zero combination of elements of \(B_1\) which is a basis. This proves that \(\{t_1,\dotsc,t_n,w_1,\dotsc,w_{m-n},w'_1,\dotsc,w'_{k-n}\}\) is indeed linearly independent, and completes the proof of 2.
To prove 3, pick a basis \(B_W=\{w_1,\dotsc,w_n\}\) of \(W\), where \(n=\dim_KW\). As seen in Theorem 4, we can complete it to a basis \(B_V=\{w_1,\dotsc,w_n,v_1,\dotsc,v_{m-n}\}\) of \(V\), with \(m=\dim_KV\). I claim that \(\{[v_1]_W,\dotsc,[v_{m-n}]_W\}\) is in fact a basis of \(\frac VW\), which gives me my dimensional statement. Indeed, given any class \([v]_W\), an element \(v\) in it will be \(a_1w_1+\dotso+a_nw_n+b_1v_1+\dotso+b_{m-n}v_{m-n}\), since those vectors are a basis of \(V\). But then: $$[v]_W=[a_1w_1+\dotso+a_nw_n+b_1v_1+\dotso+b_{m-n}v_{m-n}]_W=a_1[w_1]_W+\dotso+a_n[w_n]_W+b_1[v_1]_W+\dotso+b_{m-n}[v_n]_W=b_1[v_1]_W+\dotso+b_{m-n}[v_n]_W,$$ since \([w_1]_W=\dotso=[w_n]_W=0_{\frac VW}\) because the \(w_i\) are in \(W\) and \([w]_W=[0]_W=0_{\frac VW}\) for any \(w\in W\). So these are generators. Moreover, no \([v_i]_W\) is a combination of the others. Indeed, assume the contrary. Then we will need: $$a_1[v_1]_W+\dotso+a_{i-1}[v_{i-1}]_W+a_{i+1}[v_{i+1}]_W+\dotso+a_{m-n}[v_{m-n}]_W-[v_i]_W=0,$$ which equates to: $$a_1v_1+\dotso+a_{i-1}v_{i-1}+a_{i+1}v_{i+1}+\dotso+a_{m-n}v_{m-n}-v_i=b_1w_1+\dotso+b_nw_n,$$ but carrying the \(w\) over to the left this gives a zero combination of the elements of \(B_V\) with one coefficient equal to \(-1\), which contradicts the fact that \(B_V\) is a basis. Hence, the proof is complete.
And on these notes, I bid you goodbye and suggest: stay tuned for more!
Vectors. Many of you will remember these guys from High School Physics, when they were a model for forces and other quantities. Remember those little arrows having a point of application, representing what was pushed, a direction, which was the term for the line the motion took place along, a way, which was the way the object was pushed in on that line, and a modulus, representing the strength of the push? Well those 4 things were not the whole story. Sure, a single vector was identified by these, but operations could be done on vectors. Indeed, just with those 4 things, it is natural to define \(av\), for \(a\geq0\) and \(v\) a vector, as the vector with the same point of application, direction, and way, but having modulus \(a\) times the modulus of \(v\): \(|av|=a|v|\). It is also pretty natural to define multiplication by \(-1\) as changing a vector's way and keeping all the other properties fixed. This way, we have constructed a product by a scalar, as the mathematical term goes, which is something that, given a real number and a vector, "multiplies them" into another vector, or formally, calling \(V\) the space of vectors, is a map \(\cdot:\mathbb R\times V\to V\) sending \((a,v)\mapsto\cdot(a,v)=a\cdot v\). Vectors could also be summed together, if the application points coincided. Let us consider only vectors with the same application point. Then we have a sum defined on them, which is formally a map \(+:V\times V\to V\) which takes \((v,w)\mapsto+(v,w)=v+w\). But there is more: the sum has special properties, and it has special relations with the product by a scalar. Let me now give the definition of a real vector space in linear algebra.
Definition 1: real vector space A real vector space is a set \(V\) with a binary operation \(+:V\times V\to V\) and an external operation \(\cdot:\mathbb R\times V\to V\) which satisfy the following for any \(a,a_1,a_2\in\mathbb R,v,v_1,v_2,v_3\in V\):
- \((v_1+v_2)+v_3=v_1+(v_2+v_3)\) (associativity of the sum);
- There exists \(0_V\in V\) such that, for any \(v\in V\), \(v+0_V=0_V+v=v\) (existence of neuter element for the sum);
- For any \(v\) we have \(\hat v\in V\) such that \(v+\hat v=\hat v+v=0_V\) (existence of opposite elements);
- \(v_1+v_2=v_2+v_1\) (commutativity of the sum);
- \((a_1a_2)v=a_1(a_2v)=a_2(a_1v)\) (product by scalars and product of scalars);
- \((a_1+a_2)v=a_1v+a_2v\) (distributivity of the product by scalars w.r.t. the sum);
- \(a(v_1+v_2)=av_1+av_2)\) (distributivity of the sum w.r.t. the product by scalars);
- \(0v=0_V\) and \(1v=v\) for all \(v\) (behaviour of 0 and 1 in the product by scalars).
Our high school vectors clearly satisfied all these properties. Of course, two questions arise when seeing this definition:
- What became of those four characteristics of vectors?
- What does the "real" in "real vector space" mean?
As for question 2, the point is we are implicitly relying on \(\mathbb R\) to give us numbers to multiply the vectors by. Why fix ourselves on \(\mathbb R\)? Or in other words, what do we need our set \(K\) of "numbers", typically called scalars, to have defined on it in order to be content? Let's look at those properties.
- Distributivity of the product by scalars with respect to the sum requires a binary operation to be defined on \(K\);
- Asociativity of the sum makes associativity of the sum on \(K\) a natural requirement, because: $$[(a+b)+c]v=(a+b)v+cv=(av+bv)+cv=av+(bv+cv)=av+(b+c)v=[a+(b+c)]v,$$ and we probably do not want to have two distinct numbers such that \(\alpha v=\beta v\) for all \(v\in V\);
- Commutativity of the sum makes the commutativity of the sum on \(K\) a natural request, for reasons similar to the above;
- Property 5 requires a product on \(K\), and makes commutativity of that product a natural requirement;
- Distributivity of the sum w.r.t. the product by scalars makes distributivity of the product in \(K\) w.r.t. the sum in \(K\) a natural requirement;
- The last of those properties tells us we must have "a 1" and "a 0" in \(K\); the 0, which we will call \(0_K\), will be a neuter element for the sum in \(K\); the 1 will instead be the neuter element for the multiplication in \(K\), because property 5 gives: $$(a1)v=a(1v)=av,$$ and again we want to avoid distinct numbers like the \(\alpha,\beta\) of item 2;
- The existence of opposites in \(V\) makes it natural to require that they exist in \(K\).
\(\mathbb R\) has another property that will come in handy, and which we require for no apparent reason: that if \(a\neq0\) there exists \(\hat a\) such that \(a\hat a=\hat aa=1\). With that, we define what a field is.
Definition 2: Field A field is a triple \((K,+,\cdot)\) where \(+,\cdot:K\times K\to K\) satisfy the following:
- Associativity of the sum: \((a+b)+c=a+(b+c)\) for all \(a,b,c\in K\);
- Neuter element of the sum: there exists \(0_K\in K\) such that \(a+0_K=0_K+a=a\) for all \(a\in K\);
- Existence of opposites: for \(a\in K\), there exists \(-a\in K\) such that \(a+(-a)=(-a)+a=0_K\);
- Commutativity of the sum: \(a+b=b+a\) for all \(a,b\in K\);
- Associativity of the product: \((ab)c=a(bc)\) for \(a,b,c\in K\);
- Neuter element for the product: there exists \(1_K\in K\) such that \(a1_K=1_Ka=a\) for all \(a\in K\);
- Existence of inverses: if \(a\neq0\), there exists \(a^{-1}\in K\) such that \(aa^{-1}=a^{-1}a=1_K\);
- Commutativity of the product: \(ab=ba\) for \(a,b\in K\);
- Distributivities: \((a+b)c=ac+bc\) and \(a(b+c)=ab+ac\) for all \(a,b,c\in K\).
Let me give you a few quick remarks:
- Both in vector spaces and fields, the neuter element for the sum is unique; indeed, suppose there were two, \(0,0'\); then \(0=0+0'=0'\), so they coincide;
- An analogous argument proves the neuter element for the product is unique: \(1=1\cdot1'=1'\);
- Opposites are unique, since \((-a)'=(-a)'+0=(-a)'+(a+(-a))=((-a)'+a)+(-a)=0+(-a)=-a\);
- Inverses are unique: \((a^{-1})'=(a^{-1})'1=(a^{-1})'(aa^{-1})=[(a^{-1})'a]a^{-1}=a^{-1}\);
- If we drop the requirement of commutativity of the product and of the existence of inverses from the definition of field, we obtain the definition of ring with unit, and if we modify property 5 of vector spaces to not make it look like it requires commutativity of the product, we can construct objects that are like vector spaces but rely on rings instead of fields for their scalars; these things are called modules, and will be explored in another post;
- The opposite of a vector \(v\) is \((-1_K)v\) with \(-1_K\) being the opposite of \(1_K\); indeed \(v+(-1_K)v=1_Kv+(-1_K)v=(1_K+(-1_K))v=0_Kv=0_V\).
If you think back to middle school, you will probably find a set of numbers that is a field, though you never called it with that name: I'm talking about fractions, the ratios of integers, which we know how to multiply and sum, which only need flipping to be inverted, and where the operations are the same as in \(\mathbb R\) so all the required properties are obviously satisfied. So here is a field different from \(\mathbb R\): \(\mathbb Q\).
Physicists and mathematicians will certainly have met another example of field, still probably not calling it a field. I'm talking about the set of complex numbers. The complex numbers arise in Maths from the inconvenience of not being able to solve certain polynomial equations, most notably \(x^2+1=0\). We all know that "minus times minus equals plus", aka "a negative times a negative equals a positive", so \(x^2=-1\) cannot be solved in \(\mathbb R\). However, studying fields in Algebra, it turns out the so-called algebraically closed fields, which are fields where all polynomials are either constant or have at least one zero, have some convenient properties. Thus, we want to build an algebraically closed "superfield" of \(\mathbb R\). How do we do this? Well, the first step is inventing this weird guy called \(i\), the imaginary unit, which is defined to satisfy \(i^2=-1\). We then define \(\mathbb C:=\{a+bi:a,b\in\mathbb R\}\), with sums and products working as is natural. Turns out this is indeed a field, it contains \(\mathbb R\) by definition, and surprise surprise, it is algebraically closed. And note that I'm not just saying every polynomial with real coefficients has at least one zero in \(\mathbb C\), but also polynomials with complex coefficients (that is, coefficients in \(\mathbb C\)) have at least one zero in \(\mathbb C\), provided they are not constant. This is the content of the so-called Fundamental Theorem of Algebra, which will be proved in another post. In any case, this field \(\mathbb C\) is called the complex field and its elements are the complex numbers.
This field will come in very handy later in the series, when we discuss the spectral theorem. There is also another way to construct \(\mathbb C\). We take \(\mathbb R^2\) as our starting set. We define: $$(a,b)+(c,d)=(a+c,b+d),\qquad\qquad(a,b)\cdot(c,d)=(ac-bd,ad+bc).$$ This produces a field (you can verify it as an exercise). Now define: $$f(a,b)=a+ib.$$ This is a map from \(\mathbb R^2\) to \(\mathbb C\), and it respects both product and sum: \begin{align*} f((a,b)+(c,d))={}&f(a+c,b+d)=a+c+i(b+d)=(a+ib)+(c+id)=f(a,b)+f(c,d); \\ f((a,b)\cdot(c,d))={}&f(ac-bd,ad+bc)=ac-bd+i(ad+bc)=(a+ib)\cdot(c+id)=f(a,b)\cdot f(c,d). \end{align*} This tells us the two ways of constructing \(\mathbb C\) give us essentially the same thing, algebraically speaking.
If we do the same thing to \(\mathbb Q\), we get \(A=\{a+ib:a,b\in\mathbb Q\}\). This is also a field with the operations defined like those of \(\mathbb C\), but it is not algebraically closed, as \(\sqrt2\notin A\) and it is a solution of \(x^2-2=0\). There is a field between \(\mathbb Q\) and \(\mathbb C\) which is algebraically closed and does not contain all of \(\mathbb R\), but I will avoid going into that. Be content with knowing that it's called field of algebraic numbers, consists of precisely the complex number that are solutions to polynomial equations with rational coefficients, and contains this field \(A\) we introduced just now.
To conclude our examples of fields, we have the basic finite fields. Let me start with a special case. Consider a set of two elements, \(a,b\), and define: $$a+a=a,\,\,a+b=b,\,\,b+a=b,\,\,b+b=a,\qquad a\cdot a=a,\,\,a\cdot b=a,\,\,b\cdot a=a,\,\,b\cdot b=b.$$ This produces a field. The elements are typically called 0 and 1, because \(a\) is the neuter element of the sum and \(b\) that of the product, and this is the field \(\mathbb F_2\), or \(\mathbb Z/2\mathbb Z\). I will get into that notation in a future post about algebra.
Taking any prime \(p\), we can define a field called \(\mathbb F_p\) or \(\mathbb Z/p\mathbb Z\). Consider the set \(\{0,1,2,\dotsc,p-1\}\). We can easily restrict the operations of \(\mathbb R\) to it, but some sums will go out of that set. To fix this problem, we decide that, if that happens, we take the result, and take away as many \(p\)s as is needed to get back into that set. For anyone familiar with modular arithmetic, we are reducing the results modulo \(p\). For example, with \(p=3\), the operations become: \begin{align*} 0+0=0 && 0+1=1 && 0+2=2 \\ 1+0=1 && 1+1=2 && 1+2=0 \\ 2+0=2 && 2+1=0 && 2+2=1 \\ 0\cdot0=0 && 0\cdot1=0 && 0\cdot2=0 \\ 1\cdot0=0 && 1\cdot1=1 && 1\cdot2=2 \\ 2\cdot0=0 && 2\cdot1=2 && 2\cdot2=1. \end{align*} All of these are actually examples of a much broader class of quotient sets where the starting set has an algebraic structure that can easily induce an analogous one onto the quotient, but we will deal with those creatures in Algebra posts later on.
Now let's get back to vector spaces. First of all, we observe that our arrows can actually be identified with their tips, so that they reduce to points on a plane or in space. Indeed, high school also taught us to decompose vectors as sums of "component vectors", which give us the coordinates of the tips. So we reduced our arrows to \(\mathbb R^2\) (plane) or \(\mathbb R^3\) (3D space), with the sum done component by component, and the product by scalar also done component-wise. But why limit ourselves to exponents 2 and 3? It is clear enough that, given any integer \(n\), \(\mathbb R^n\), with the operations defined the same way, is a real vector space. By the way, \(\mathbb R\) is also a real vector space.
We have introduced different fields, so we would like examples of vector spaces using other fields as scalars. First, we need something to call such weird beings.
Definition 3: Vector space over \(K\). Given a field \(K\), a vector space over \(K\), or \(K\)-vector space, is a set \(V\) with operations \(+:V\times V\to V,\cdot:K\times V\to V\) which satisfy all properties of definition 1 if \(\mathbb R\) is replaced by \(K\) anywhere it appears.
At this point, it will be clear that, for any field \(K\) and integer \(n\), \(K^n\) with component-wise operations will be a \(K\)-vector space. We will soon see that these are not just the obvious examples, but for now let me leave it at that.
A more exotic idea is to view \(\mathbb R\) as a vector space not over itself, but over \(\mathbb Q\). After all, real numbers have a sum with the needed properties, and the product of a rational by a real will give the rest of the properties of vector spaces.
To get more creative, we can consider \(K^{n\times m}\) and rearrange the entries into a table with \(n\) rows and \(m\) columns or viceversa. The first choice will give us the space \(M_{n\times m}(K)\), and the second one \(M_{m\times n}(K)\). Both of these are vector spaces over \(K\).
The most exotic but very widely used idea I can think of is that of functions spaces. At its simplest, we can easily see how the set \(\{f:\mathbb R\to\mathbb R\}\) can be made into a vector space over \(\mathbb R\) by defining: $$(f+g)(x)=f(x)+g(x),\qquad\qquad(af)(x)=af(x).$$ Of course, we don't really need the functions to be defined on \(\mathbb R\): any set \(X\) will do. Such vector spaces will be denoted \(\mathbb R^X=\{f:X\to\mathbb R\}\). And again, if we take functions that take their values on another field \(K\), we will get a \(K\)-vector field. Naturally, we can have our functions take values not in \(K\), but in any other \(K\)-vector space, and still get a \(K\)-vector space of functions. The typical thing done in more advanced maths is to consider functions with special properties, but we will get to that in future posts.
That was quite a jump, wasn't it? From our simple arrow spaces to spaces of functions from random sets to random vector spaces! What about function-valued functions? Yep, they still work! And we could go on and on with functions that take values in spaces of functions that take values in spaces of functions that take values etc. etc. etc., but that isn't in our interest.
Instead, what we are now curious about is what will become of our "component vectors", or in other words, how to formally define what a system of axes will be in our very abstract vector spaces. What are axes to us? We typically draw them as lines with arrow tips, so they would be oriented lines. Of course, a line will be identified by a vector on it, and the way the arrow tip points can also be incorporated into the vector. So our axes will be represented by a certain number of vectors. What properties do these vectors need to have to form a system of axes? Well, first of all, all the vectors in our spaces must be obtained by summing multiples of these vectors. In other words, our vectors must be a system of generators for our space \(V\)
Definition 4: Linear combinations and systems of generators Given a \(K\)-vector space \(V\) and some vectors \(v_1,\dotsc,v_n\in V\), we call linear combination of \(v_1,\dotsc,v_n\) any sum of the form \(a_1v_1+\dotso+a_nv_n\), with the \(a_i\) being elements of \(K\). A System of Generators for \(V\) is a set \(S\subseteq V\) such that, for any \(v\in V\), there exist an integer \(n\), vectors \(v_1,\dotsc,v_n\in S\), and scalars \(a_1,\dotsc,a_n\in K\) such that: $$v=\sum_{i=1}^na_iv_i=a_1v_1+\dotso+a_nv_n.$$
But that is not enough. For example, we all know that a plane requires two axes, but if I take three vectors, they can still be a system of generators. In fact, given any system of generators \(S\) and any vector \(v\in V\) outside of \(S\), adding \(v\) to \(S\) gives us another system of generators. But three axes in a plane is too many, you say. Why? The most natural answer that comes to mind is that we want as few generators as possible. Indeed, one possible definition of a basis, which is the linear algebraic term for a system of axes, is precisely that one.
Definition 5: basis, 1.0 A basis for a vector space \(V\) is a set \(S\subseteq V\) of generators for \(V\) that is minimal with respect to inclusion, i.e. such that taking away any element from \(S\) leaves us with something that is not a set of generators.
Another problem that would arise from three axes in a plane is that there would not be a unique way to represent a vector as a linear combination of the axis vectors. And this is another possibility for defining a basis.
Definition 6: basis, 2.0 A basis for a vector space \(V\) is a system of generators \(S\) such that, for any \(v\in V\), the choice of \(v_i\)s and coefficients \(a_i\neq0\) such that: $$v=a_1v_1+\dotso+a_nv_n$$ is unique.
There are a couple more possible definitions of basis. First of all, applying the above uniqueness condition to \(0_V\) means that, whenever we write: $$a_1v_1+\dotso+a_nv_n=0,\tag{$\ast$}$$ the \(a_i\) have to be all zero.
Definition 7: linear independence and dependence We say that \(S\subseteq V\), \(V\) being a \(K\)-vector space, is linearly independent over \(K\) if, for any \(v_1,\dotsc,v_n\in S\), we have: $$a_1v_1+\dotso+a_nv_n=0_V\iff a_i=0\,\,\forall i=1,\dotsc,n.$$ In other words, linearly independent means the condition (\(\ast\)) introduced just above as a consequence of the uniqueness condition of Definition 6. The opposite of this is linear dependence, so a set \(S\subseteq V\) is linearly dependent over \(K\) if there exist \(a_1,\dotsc,a_n\in K\) not all zero and \(v_1,\dotsc,v_n\in S\) such that: $$a_1v_1+\dotso+a_nv_n=0_V.$$
It is clear that, if \(S\) is linearly independent, any \(T\subseteq S\) is linearly independent as well, for a combination of elements of \(T\) is also one of elements of \(S\). Moreover, if \(S\) is linearly dependent, then so is any \(T\supseteq S\), for analogous reasons.
It is now time to give two more definitions of basis.
Definition 8: basis, 3.0 A basis is a set of generators \(S\) which is linearly independent.
Definition 9: basis, 4.0 A basis is a linearly independent set \(S\) which is maximal with respect to inclusion. In other words, adding any other vector to \(S\) makes it lose its linear independence.
But which definition will we take? Well, all of them, because they are equivalent.
Theorem 1: equivalence of the definitions of basis The four definitions of basis given above are equivalent, so a basis 1.0 is a basis 2.0, which is a basis 3.0, which is a basis 4.0, which is a basis 1.0.
Proof.
Suppose \(S\) is a basis 1.0, that is, a set of generators that is minimal w.r.t. inclusion. Suppose by way of contradiction that it is not a basis 2.0. Then there must be \(v\) that can be written in two ways as a linear combination of \(S\), that is: $$a_1v_1+\dotso+a_nv_n=v=b_1w_1+\dotso+b_mw_m.$$ Carrying the \(b_iw_i\) to the left, we obtain a linear combination of elements of \(S\) that is zero. The two linear combinations are supposed to differ from each other, so after carrying to the left, we must have at least one \(v_i\) or one \(w_i\) that doesn't get canceled out. What happens is either this \(v_i\) is not among the \(w_j\), or this \(w_i\) is not among the \(v_j\), or \(a_i-b_j\neq0\). In case 1, we isolate \(a_iv_i\) and get: $$a_iv_i=-\sum_{j\neq i}a_jv_i+b_1w_1+\dotso+b_mw_m,$$ and dividing by \(a_i\), which is possible since \(K\) is a field, gives me that \(v_i\) is a combination of other vectors in \(S\), meaning \(S\smallsetminus\{v_i\}\) (\(S\) with \(v_i\) taken away) is still a set of generators, meaning \(S\) was not minimal with respect to inclusion to begin with, which is a contradiction. If we are in case two, we isolate \(b_iw_i\) and conclude in the same way. In case 3, we isolate \((a_i-b_j)v_i=(a_i-b_j)w_j=a_iv_i-b_jw_j\) and conclude in the same way.
Suppose \(S\) is a basis 2.0. Then \(0_V\) can only be written as \(0v_1+\dotso\), or in other words \(S\) is linearly independent, which is exactly what we need to have a basis 3.0, since \(S\) is a set of generators by definition of basis 2.0.
Suppose \(S\) is a basis 3.0. In particular, it is a system of generators, meaning that, if \(v\notin S\), then adding \(v\) to \(S\) gives us a linearly dependent \(S\cup\{v\}\). But \(S\) is linearly independent, so it is a linearly independent set that is maximal w.r.t. inclusion, that is a basis 4.0.
Finally, assume \(S\) is a basis 4.0. Being linearly independent, none of its vectors can be a combination of other vectors in \(S\), so if it is a set of generators, it will be minimal w.r.t. inclusion. Since it is linearly independent and maximal w.r.t. inclusion, it must be a set of generators. Indeed, let \(v\notin S\). Then there are \(a_1,\dotsc,a_n,a\in K\) and \(v_1,\dotsc,v_n\in S\) such that: $$a_1v_1+\dotso+a_nv_n+av=0_V,$$ or else \(S\cup\{v\}\) would be linearly independent, and \(S\) would not be maximal. If \(a=0\), then \(S\) would be linearly dependent, which it is not. But if \(a\neq0\), then we can write: $$v=-\frac{a_1}{a}v_1-\dotso-\frac{a_n}{a}v_n,$$ so that \(v\) is a combination of vectors of \(S\), making \(S\) a system of generators and thus a basis 1.0. Our proof is then complete. \(\diamond\)
And this is the reason we required multiplicative inverses in the definition of fields: if those don't exist, we don't have the above equivalence, because it requires division.
Note that the typical drawing of a system of axes does not depict just any basis, but rather an orthonormal basis. However, to define this concept we need to introduce norms and scalar products, and that will be the topic of another post.
The dimension of a vector space is an intuitive concept that is easy to formalize.
Definition 10: dimension of a vector space Given a \(K\)-vector space \(V\), we say \(V\) has infinite dimension if there exists no finite basis. If there exists a finite basis, we say \(V\) has finite dimension, and we define \(\dim_KV\) as the number of elements of a basis of \(V\) over \(K\), and call it dimension of \(V\) over \(K\).
Note that what is a basis over \(K\) may not be a basis over a bigger field. Indeed, taking \(\mathbb C=\mathbb R^2\) as an example, the set \(\{(1,0),(0,1)\}\), or \(\{1,i\}\), is a basis for the real vector space \(\mathbb C=\mathbb R^2\), but viewing this as a vector space over \(\mathbb C\) by saying \((a+ib)(c,d)=(ac-bd,ad+bc)\) makes it so that \((0,1)=i\cdot(1,0)\), so \(\{(1,0),(0,1)\}\) is not a basis over \(\mathbb C\). Conversely, \((1,0)\) is a basis over \(\mathbb C\) but not over \(\mathbb R\). Indeed, it can be verified that, if \(V\) is a complex vector space (i.e. a vector space over \(\mathbb C\)), it is also a real vector space, and: $$\dim_{\mathbb R}V=2\dim_{\mathbb C}V.$$ That \(V\) is also a real vector space is obvious: just restrict the product by scalars to product by reals, and the sum is already there. The simplest way to prove the dimension relation is to pick a basis \(B_C=\{v_1,\dotsc,v_n\}\) over \(\mathbb C\) and prove \(B_R=\{v_1,\dotsc,v_n,iv_1,\dotsc,iv_n\}\) is a basis over \(\mathbb R\). That this is a set of generators is easy, because any combination of the \(v_i\) with complex coefficients will be a sum \((a_1+ib_1)v_1+\dotso+(a_n+ib_n)v_n\), which equals \(a_1v_1+\dotso+a_nv_n+b_1iv_1+\dotso+b_niv_n\) by distributivity and commutativity, and that is a real-coefficient combination of elements of \(B_R\), making \(B_R\) a generating set. The linear independence of \(B_R\) over \(\mathbb R\) can be equated to the linear independence of \(B_C\) over \(\mathbb C\), which holds by assumption, proving the relation.
By such arguments we can prove that, if \(V\) is a \(K\)-vector space and \(H\) is a subfield (aka a field \(H\subseteq K\) whose operations are the restrictions of those on \(K\)), and \(\dim_HK=k\), then \(\dim_HV=k\dim_KV\).
Another thing to be noted is that there do exist spaces with infinite dimension. Function spaces are the easiest example. If we take, for example, the functions from \([0,1]\) to \(\mathbb R\), denoted \(\mathbb R^{[0,1]}\), the set \(\{1,x,x^2,\dotsc\}\) is linearly independent and infinite, and Theorem 3 below makes a set like that impossible in a space with finite dimension, so this space cannot have finite dimension. Moreover, \(\dim_{\mathbb Q}\mathbb R\) is also infinite.
Theorem 2: \(\mathbb R\) is not finitely generated over \(\mathbb Q\) There exists no finite systems of generators of \(\mathbb R\) over \(\mathbb Q\).
Proof.
A theorem of abstract algebra we will deal with in a future post tells us that, if \(K\subseteq H\) are two fields with the same operations and \(H\) is finitely generated over \(K\), then all \(x\in H\) are roots of polynomials with coefficients in \(K\). This is in fact pretty easy to prove, assuming Theorem 3 below. Indeed, in that case, the set \(\{1,x,x^2,\dotsc,x^n\}\) will have to be linearly dependent for some integer \(n\), so there will exist \(a_0,a_1,\dotsc,a_n\in K\) such that: $$a_0+a_1x+a_2x^2+\dotso+a_nx^n=0,$$ meaning a polynomial with coefficients in \(K\) has a zero in \(x\), as desired. However, the Lindmann-Weierstrass theorem directly implies that \(e\) is not a root of a polynomial with rational coefficients, hence \(\mathbb R\) cannot be finitely generated over \(\mathbb Q\). Note that this fact about \(e\) can be proven more directly as shown here.
It is now time to discuss whether the dimension of a space is well-defined.
Theorem 3: Steinitz exchange principle Let \(V\) be a \(K\)-vector space. Assume \(S=\{v_1,\dotsc,v_n\}\) is linearly independent and all the \(v_i\) are linear combinations of elements of \(T=\{w_1,\dotsc,w_m\}\). Then, for every \(i\), there is a \(j\) such that swapping \(v_i\) with \(w_j\) leaves \(S\) linearly independent, that is, such that \(S\smallsetminus\{v_i\}\cup\{w_j\}=\{v_1,\dotsc,v_{i-1},w_j,v_{i+1},\dotsc,v_n\}\) is still linearly independent.
Proof.
If that were not so, then, for some \(i\), the set \(\{v_1,\dotsc,v_{i-1},w_j,v_{i+1},\dotsc,v_n\}\) would be linearly dependent for any \(j\). But then all the \(w_j\) would be combinations of the \(v_j\) with \(j\neq i\), and \(v_i\), being a combination of the \(w_j\), would also be a combination of the \(v_j\) with \(j\neq i\), making \(S\) linearly independent, which is a contradiction.
With that, we can prove the following.
Theorem 4: bases, generators, linearly independent sets Assume \(V\neq\{0\}\) is a \(K\)-vector space. Then:
- If \(S\) is a finite set of generators, there exists \(T\subseteq S\) which is a basis for \(V\);
- If \(V\) is finitely generated and \(S\) is linearly independent, there exists \(T\supseteq S\) which is a basis for \(V\);
- All bases of \(V\) have the same cardinality (i.e. the same number of elements).
Proof
To prove 1, suppose \(S\) has \(n\) elements. If \(S\) is linearly independent, then it is a basis. Otherwise, take out one element so that \(S\smallsetminus\{v_i\}=\{v_1,\dotsc,v_{i-1},v_{i+1},\dotsc,v_n\}\) is still a set of generators. If this is linearly independent, we are done. Otherwise, we iterate the procedure. After \(n\) steps, we would end up with the empty set, which generates \(\{0\}\neq V\). Hence, there must exist a point where taking away an element leaves us with something that is not a system of generators. That is, there exists \(T\subseteq S\) which is a set of generators, but such that, if we take any \(v\) out of \(T\), the resulting set \(T\smallsetminus\{v\}\) is not a system of generators. This makes \(T\) a basis 1.0, proving 1.
To prove 2, assume \(X\) is a finite set of generators with \(n\) elements. The Steinitz exchange principle tells us \(S\) must have fewer elements than \(S\). Indeed, if \(P\) is linearly independent and \(Q\) a set of \(q\) generators, and we assume \(P\) has more than \(q\) elements, by applying the principle \(q\) times we replace \(P\) with a set that is \(Q\) plus some elements of \(P\), and must be linearly independent by the Steinitz principle, but cannot be linearly independent since the elements of \(P\) are combinations of those of \(Q\). So we have a contradiction, and we conclude \(P\) can have at most \(Q\) elements. Coming back to our \(S\), if it is a set of generators then it is a basis, otherwise we can add something that is linearly independent, and obtain \(S\cup\{v\}\), a bigger linearly independent set. If this one is a set of generators, then we are done. Otherwise, we iterate the procedure. What we discussed a second ago tells us we can iterate the procedure at most as many times as the number of elements of \(X\) minus that of \(S\), so at some point, we will find \(T\supseteq S\) such that \(T\) is linearly independent, but nothing can be added to \(T\) without \(T\cup\{v\}\) turning linearly dependent. We then have \(T\) is a basis 4.0, and 2 is proved.
To prove 3, pick any two bases \(B_1,B_2\). \(B_1\) is linearly independent and \(B_2\) is a set of generators, so if \(B_1\) has \(n\) elements and \(B_2\) has \(m\) we must conclude, as seen just above, that \(n\leq m\). But then \(B_2\) is linearly independent and \(B_1\) is a set of generators, meaning \(m\leq n\). But if \(n\leq m\) and \(m\leq n\), we must conclude \(m=n\), proving 3.
This means the definition of dimension we gave is well-posed: it does not depend on the choice of a basis, but only on the space, as long as it is finite.
The zero space is a weird creature that I will leave in a limbo. It seems to be the blatant exception to "every space has a basis", because linear independence just doesn't exist in there. Since we give it dimension 0 "by intuition", it makes sense to posit that a basis of it is the empty set. This is probably why the span of the empty set is defined to be zero, beyond avoiding something that generates a non-subspace. And if you're puzzled because you've never seen the word "subspace" before, don't worry, I didn't define it yet, but will soon, and the same goes for "span" :). But 'nuff said.
Before I leave you, there is another couple things I want to tell you, and the first one has to do with why I used the name "system/set of generators". It also generalizes the concept of line and plane. The second one is a throwforward (because I cannot say throwback to the future, right?) to the next post I will publish, in a sense. Let us start by defining a vector subspace.
Definition 11: vector subspace Let \(V\) be a \(K\)-vector space. \(W\subseteq V\) is called a (vector) subspace of \(V\), a relation (an order relation, in fact) which I denote by \(W\leq V\), if, defining operations on \(W\) by restricting those of \(V\), we obtain that \(W\) is a vector space. Let me explain that more simply. We have a natural way of summing vectors in \(W\), and of multiplying them by scalars: just view them as elements of \(V\), and act on them by consequence. The problem is: does that always send us back into \(W\)? If \(W\) is a vector subspace, then yes, because vector subspaces are exactly the subsets that satisfy this. Naturally, a more explicit way to define a vector subspace is to say that:
- \(0_V\in W\), so \(0_V\) can be the zero of \(W\);
- For all \(v,w\in W\), \(v+w\in W\), so the sum does not send us out of \(W\);
- For all \(v\in W\) and \(a\in K\), \(av\in W\), so the product by a scalar doesn't send us out of \(W\).
Lemma 1: equivalent definition of vector subspace Let \(W\subseteq V\), with \(V\) a \(K\)-vector space. Then \(W\leq V\) if and only if \(W\) is nonempty and \(av+bw\in W\) for all \(a,b\in K,v,w\in W\).
Proof.
Assume \(W\leq V\). Then, if \(a,b\in K\) and \(v,w\in W\), we will have \(av,bw\in W\), and then \(av+bw\in W\). Moreover, since \(0_V\in W\), \(W\) is nonempty, proving the equivalent condition.
Assume the equivalent condition. Then, picking \(a\in K,b=0_K\), we conclude \(av+0_V\in W\) for any \(v\in W\), since any \(w\in W\) gives \(bw=0_V\), and there is at least one \(v\in W\). So property 3 is proved. Pick then \(a=b=1_K\), and since \(1_Kv=v,1_Kw=w\), we conclude \(v+w=1_Kv+1_Kw\in W\), proving property 2. Since property 1 is implied by 3 if \(W\) is nonempty, and this is a requirement, we have proved \(W\leq V\).
It is pretty straightforward to see that, if \(W\leq V\) and \(\dim W=\dim V\), and \(V\) is finite-dimensional, then \(W=V\). Indeed, pick a basis of \(W\). This has the right number of elements to be a basis of \(V\), it is contained in \(V\), and it is linearly independent, so it must be a basis of \(V\). In particular, it is a set of generators, meaning \(V\subseteq W\), but since the other inclusion is true from the starting assumption that \(W\leq V\), we have proved \(V=W\), as we wanted.
In \(\mathbb R^2\), the subspaces are \(\mathbb R^2\), \(\{0\}\) (these two being called trivial subspaces), and all the lines through the origin. In \(\mathbb R^3\), besides the trivial subspaces, we have lines and planes through the origin. \(\{a+b\sqrt2:a,b\in\mathbb Q\}\leq\mathbb R\) is a subspace of \(\mathbb R\) viewed as a \(\mathbb Q\)-vector space. Any space of polynomials with real coefficients will be a subspace of \(\mathbb R^{[0,1]}\).
One way to produce subspaces is to "generate" them from arbitrary subsets.
Definition 12: span Let \(V\) be a \(K\)-vector space and \(S\subseteq V\). The span of \(S\), or vector subspace generated by \(S\), is the smallest subspace of \(V\) containing \(S\). By convention, to avoid something that generates the empty set which is not a subspace, we set \(\operatorname{span}(\varnothing):=\{0\}\).
A concrete way of visualising this span is by considering: $$\operatorname{span}(S)=\{a_1v_1+\dotso+a_nv_n:a_i\in K,v_i\in S\}\cup\{0\},$$ the set of linear combinations of elements of \(S\) (with zero forcibly added in to make this work for \(S=\varnothing\)). Indeed, it is easy to prove that this is a subspace, and given that any subspace containing \(S\) must contain the linear combinations of elements of \(S\) along with the zero, this is indeed the smallest subspace.
A more abstract way to visualise this is by intersections. Let me prove a couple facts.
Lemma 2: subspaces, unions, intersections, spans Let \(V\) be a \(K\)-vector space. Then:
- If \(W_1,W_2\leq V\), \(W_1\cap W_2\leq V\); in other words, the intersection of subspaces is a subspace; I wrote it out with two subspaces, but an arbitrary intersection of subspaces is still a subspace, no matter the number of subspaces involved;
- If \(W_1,W_2\leq V\), \(W_1\cup W_2\) may not be a subspace; it certainly is, however, if \(\dim_KV\leq1\);
- If \(S\subseteq V\), \(\operatorname{span}(S)=\bigcap_{S\subseteq W\leq V}W\); in other words, the span of \(S\) is the intersection of all subspaces containing \(S\).
Proof.
To prove 1, pick \(a,b\in K\) and \(v,w\in W_1\cap W_2\). Then \(v,w\in W_1\) and \(v,w\in W_2\), but then by the equivalent condition in Lemma 1 we have \(av+bw\in W_1\) and \(av+bw\in W_2\), which means \(av+bw\in W_1\cap W_2\). Since \(W_1,W_2\) both contain \(0_V\), so does the intersection. Thus, the intersection satisfies the equivalent condition of Lemma 1, which makes it a subspace.
Pick \(v,w\in V\) linearly independent (assuming of course \(\dim_Kv\geq2\)). Then, \(v+w\notin\operatorname{span}\{v\}\cup\operatorname{span}(w)=\{av:a\in K\}\cup\{aw:a\in K\}\). If \(\dim_KV=1\), then there are only two subspaces, \(\{0\}\) and \(V\), so the union is again either \(\{0\}\) or \(V\), making it a subspace. There cannot be any other subspaces because, if \(W\leq V\), it must have a basis, and if that basis is the empty set the \(W=\{0\}\), otherwise it must have a one-vector basis, making it the same dimension as \(V\), and we have seen this implies \(V=W\). If \(\dim_KV=0\), then \(V=\{0\}\) and the only subspace is \(\{0\}\), so any two subspaces will always give \(V\) as their union, hence a subspace. 2 is thus proved.
As for 3, call that big intersection \(W\). We have seen that \(\operatorname{span}(S)\) is the set of combinations of elements of \(S\), plus eventually the zero. Those are contained in all subspaces containing \(S\), so that \(\operatorname{span}(S)\leq W\). On the other hand, we have seen that \(\operatorname{span}(S)\) defined that way is a subspace containing \(S\) (even when \(S\) is empty, thanks to the zero added in), and hence a member of the family being intersected to produce \(W\) so that \(W\subseteq\operatorname{span}(S)\), proving 3.
In fact, 3 is trivial if \(S=\varnothing\), for then all subspaces contain \(S\), and the only thing that is common to all subspaces is the zero vector, since it constitutes a subspace by itself.
Now we want to formalize the ideas of adding or removing axes. To do so, we define operations between subspaces (and spaces). The easiest ones are product and sum, which are essentially the same, and represent axis addition. Imagine you have a line, and want to go 2D. You add an axis, and that gives an extra dimension. This can be viewed as either a cartesian product of the two axes (each point in the plane is a pair of real numbers) or as summing (formally speaking) vectors in one and in the other axis (each point in the plane is the sum of a horizontal and a vertical vector). Of course, in the previous explanations, I was thinking of cartesian axes, but nothing prevents us from taking axes at an angle different from a right one. To be general, we define things as follows.
Definition 13: sum and product of spaces If \(W_1,W_2\leq V\), then the sum of \(W_1,W_2\) is defined as: $$W_1+W_2:=\operatorname{span}(W_1\cup W_2)=\{w+w':w\in W_1,w'\in W_2\},$$ the last equality being left as an exercise. If \(W_1\cap W_2=\{0\}\), we call their sum a direct internal sum, and denote it \(W_1\oplus W_2\). Given two \(K\)-vector spaces \(W_1,W_2\), whether or not subspaces of a single bigger \(K\)-vector space, we define their product as merely \(W_1\times W_2\) with component-wise operations. Noting that \(W_1':=\{(w,0):w\in W_1\}\) and \(W_2':=\{(0,w):w\in W_2\}\) are essentially the same as (read: isomorphic to) \(W_1,W_2\), and are subspaces of \(W_1\times W_2\) whose sum is direct, we call this direct external sum, and denote it also by \(W_1\oplus W_2\).
It is clear that, if \(S\) is a set of \(n\) generators for \(V\), then \(V=\operatorname{span}\{v_1\}+\dotso+\operatorname{span}\{v_n\}\), and that those sums are direct if \(S\) is a basis. In general, the span of a set \(S\) is the sum of the spans of its elements. (What if \(S\) is infinite? Shhh! An infinite sum or product is essentially the same thing, but you make sure the sums you're working with are finite, and read your tuples as functions to generalize the products, requiring those functions be nonzero only on a finite number of elements, but we'll keep things finite-dimensional here.)
Another simple thing to note is that, if \(W_1+W_2\) is a direct internal sum, any \(v\in W_1+W_2\) can be written as \(w_1+w_2\) with \(w_i\in W_i\) in a unique way. Indeed, we know that there are \(w_1'\in W_1,w_2'\in W_2,a,b\in K\) such that \(v=aw_1'+bw_2'\) by definition of sum. Since \(aw_1'=:w_1\in W_1\) and \(bw_1'=:w_2\in W_2\), because the \(W_i\) are subspaces, we have found a way to write \(v\) as desired. That such a way to write \(v\) is unique is simple to prove. Indeed, suppose \(w_1+w_2=w_1'+w_2'\) with \(w_i,w_i'\in W_i\). Then, by rearranging, \(w_1-w_1'=w_2'-w_2\). The RHS is in \(W_2\) and the LHS in \(W_1\), so they must both be in the intersection. But \(W_1\cap W_2=\{0\}\), so \(w_1=w_1',w_2=w_2'\), proving the uniqueness we stated. This property can, in fact, be taken as a definition of direct sum. In fact, one can define \(W_1+W_2=\{w_1+w_2:w_i\in W_i\}\), and the result is the same as with our definition, and then require uniqueness as the definition of direct sum, and this gives exactly the same result as our definition. This alternate set of definitions is chosen by Serge Lang in Linear Algebra.
Yet another simple thing to prove is the following fact (should be a theorem really, but renumbering all other theorems is something I definitely don't want to do :) ).
Fact 1: Existence of complements. Let \(V\) be a \(K\)-vector space and \(W\leq V\). Then, there exists \(U\leq V\) such that \(V=U\oplus W\) (direct internal sum). In other words, every subspace \(W\) has a complement \(U\).
Proof.
Let \(B_W=\{w_1,\dotsc,w_n\}\) be a basis of \(W\). We can complete it to a basis \(B_V=\{w_1,\dotsc,w_n,v_1,\dotsc,v_m\}\) of \(V\), where \(m=\dim_KV-\dim_KW\). We have seen that this is possible above. Take \(U:=\operatorname{span}\{v_1,\dotsc,v_m\}\). It is clear that \(V=W+U\), because if \(v\in V\), then \(v=a_1w_1+\dotso+a_nw_n+b_1v_1+\dotso+b_mv_m\), a combination of the \(w_i\in W\) and the \(v_i\in U\). If \(U\cap W\neq\{0\}\), then there is: $$a_1w_1+\dotso+a_nw_n=v=b_1v_1+\dotso+b_mv_m,$$ but carrying the \(w_i\) to the LHS we get a zero combination of \(B_V\), which is linearly independent by assumption, implying \(a_1=\dotso=a_n=b_1=\dotso=b_m=0\), so \(v=0\), which proves our point. \(\diamond\)
Defining a quotient is a little trickier. We have \(V,W\) with \(W\leq V\). We pick \(v\in V\), and consider \([v]_W:=\{v'\in V:v-v'\in W\}\). We will see in the next post that this partitions \(V\) into disjoint subsets. Moreover, if we define \(a[v]_W:=[av]_W\) and \([v]_W+[w]_W:=[v+w]_W\), it is easy to see that we turn the set \(\{[v]_W:v\in V\}\) into a \(K\)-vector space, where \(V\) is a \(K\)-vector space. Essentially, we are thinking of removing an axis (or more axes) as squishing along a subspace. Imagine you have 3D space, you choose a plane, and you squish your space along it. That plane collapses to zero, as do all planes parallel to it, and you end up with a line. Of course, you can squish along a line, and you can squish a plane too. In general, you can squish any vector space along any of its subspaces. This is what a quotient does.
Definition 14: quotient of vector spaces Given \(W\leq V\), we define \(\frac{V}{W}\) to be the set \(\{[v]_W:v\in V\}\) with the operations defined above, and call it quotient space of \(V\) and \(W\).
Picturing quotients as squishing, you may ask: if I have \(V\) and two subspaces \(W,T\), is squishing along \(T+W\) the same as squishing along one and then along the other? To formalize this question and answer positively, you must first realize that what you are thinking of is to squish along e.g. \(W\) and then along the squished version of \(T\). We then have the following statement.
Successive Squishing Lemma Let \(V\) be a \(K\)-vector space and \(W,T\leq V\). Then: $$\frac{V}{T+W}\cong\frac{\frac{V}{W}}{\frac{T}{W\cap T}}.$$
\(\cong\) means "isomorphic to", and we will define that notion in another post, so the proof of this lemma will have to wait until then, but it is exactly what we expected. We will see that the usual arithmetic rules apply to these operations, modulo what is called "isomorphism", but that is a topic for a future post.
Another question that presents itself is the following. Imagine you pick a vector space \(V\) and \(T,W\leq V\) such that \(T\leq W\). It is clear enough that \(\frac{T}{W}\) is identified with a subset of \(\frac{V}{W}\), precisely the set of classes \([w]_T:w\in W\). It is also clear that this is a subspace, because \(\alpha[w]_T+\beta[w']_T=[\alpha w+\beta w']_T\in\frac{T}{W}\). In other words, squishing subspaces gives subspaces of the squished space. Is the converse true? That is, can we "un-squish" a subspace? Obviously, we will have to add a condition, because to go from a class to a vector in the original space we have to choose which representatives are fine. The easiest thing is to say all representatives are fine to us: the biggest un-squishing possible. And we have the following.
Un-squishing Lemma There is a bijection between subspaces \(W\leq V\) such that \(T\leq W\) and subspaces of \(\frac{V}{T}\), and this is given by \(W\mapsto\frac{W}{T}\). That is, the squishing map that, taking a subspace \(W\) containing \(T\), gives me its squished version is one-to-one and onto (read: bijective, invertible, injective and surjective, or whatever you call this), so that each subspace of \(\frac{V}{T}\) can be un-squished.
And this is relatively simple: the inverse of the map \(W\mapsto\frac{W}{T}\) is the map \(W\leq\frac{V}{T}\mapsto W\oplus T\), where \(\oplus\) is meant as a direct external sum. Of course, this is again modulo isomorphism, but you can picture this as taking all sums of squished vectors with vectors of \(T\), and then it is easy to see that this is the inverse we are looking for. To be more formal, you fix a set of representatives for the equivalence classes of \(W\), and then consider the set of vectors that are sums of an element of \(T\) and one of these representatives, and this is how you get \(W\oplus T\).
Let us now conclude by seeing how dimension behaves with respect to quotients and sums.
Theorem 5: sums and quotients vs. dimension Assume \(V\) is a finite-dimensional \(K\)-vector space. Then:
- If \(W\leq V\), then \(W\) is finite-dimensional and \(\dim_KW\leq\dim_KV\);
- If \(W_1,W_2\leq V\), then \(\dim(W_1+W_2)=\dim W_1+\dim W_2-\dim(W_1\cap W_2)\); in other words, the dimension of the sum is the sum of the dimensions minus the dimension of the intersection; this statement is called
Proof.
To prove 1, suppose by contradiction that there exists an infinite linearly independent set \(S\subseteq W\). But then \(S\subseteq V\), and we have seen that this implies \(S\) must have fewer elements than a basis of \(V\), which contradicts the fact that it is infinite. Moreover, choosing a basis for \(W\), its linear independence implies it has fewer elements than one of \(V\), thus implying \(\dim_KW\leq\dim_KV\).
To prove 2, choose \(B_T=\{t_1,\dotsc,t_n\}\) a basis of \(W_1\cap W_2\), with \(n=\dim_K(W_1\cap W_2)\). We know it exists because \(V\) is finite-dimensional, and thus so are its subspaces, as just seen. Call \(T=W_1\cap W_2\). If \(T=W_1\), we have \(W_1\leq W_2\), so \(W_1+W_2=W_2\), and \(\dim_KW_2=\dim_KW_2+\dim_KW_1-\dim_KW_1=\dim_KW_2+\dim_KW_1-\dim_KT\). The case \(T=W_2\) is handled similarly. Suppose then \(T\neq W_1\) and \(T\neq W_2\). \(B_T\) is linearly independent in \(W_1\), so we can complete it to a basis of \(W_1\) as discussed in Theorem 4, obtaining \(B_1=\{t_1,\dotsc,t_,w_1,\dotsc,w_{m-n}\}\), where \(m=\dim_KW_1\). Analogously, setting \(k=\dim_KW_2\), we can complete \(B_T\) to a basis \(B_2=\{t_1,\dotsc,t_n,w'_1,\dotsc,w'_{k-n}\}\) of \(W_2\) The vectors \(\{t_1,\dotsc,t_n,w_1,\dotsc,w_{m-n},w'_1,\dotsc,w'_{k-n}\}\) clearly generate \(W_1+W_2\). Their number is the one required by the statement, i.e. \(\dim_KW_1+\dim_KW_2-\dim_K(W_1\cap W_2)\), so to conclude the proof we need to show they are linearly independent. Suppose then that: $$a_1t_1+\dotso+a_nt_n+b_1w_1+\dotso+b_{m-n}w_{m-n}+c_1w'_1+\dotso+c_{k-n}w'_{k-n}=0.$$ Carrying the \(w'\) over to the right, we obtain that: $$a_1t_1+\dotso+a_nt_n+b_1w_1+\dotso+b_{m-n}w_{m-n}=-c_1w'_1-\dotso-c_{k-n}w'_{k-n}. \tag{\(\ast\)}$$ The RHS is in \(W_2\), and the LHS is in \(W_1\), so both are in the intersection. Then, we can write: $$-c_1w'_1-\dotso-c_{k-n}w'_{k-n}=d_1t_1+\dotso+d_nt_n,$$ since the \(t\) are a basis for \(W_1\cap W_2\). Carrying the \(w'\) to the right, we obtain a zero combination of elements of \(B_2\), but this is a basis of \(W_2\), meaning \(c_1=\dotso=c_{k-n}=d_1=\dotso=d_n=0\). This also implies \(a_1=\dotso=a_n=b_1=\dotso=b_{m-n}=0\) because, given the RHS of \((\ast)\) is zero, they form a zero combination of elements of \(B_1\) which is a basis. This proves that \(\{t_1,\dotsc,t_n,w_1,\dotsc,w_{m-n},w'_1,\dotsc,w'_{k-n}\}\) is indeed linearly independent, and completes the proof of 2.
To prove 3, pick a basis \(B_W=\{w_1,\dotsc,w_n\}\) of \(W\), where \(n=\dim_KW\). As seen in Theorem 4, we can complete it to a basis \(B_V=\{w_1,\dotsc,w_n,v_1,\dotsc,v_{m-n}\}\) of \(V\), with \(m=\dim_KV\). I claim that \(\{[v_1]_W,\dotsc,[v_{m-n}]_W\}\) is in fact a basis of \(\frac VW\), which gives me my dimensional statement. Indeed, given any class \([v]_W\), an element \(v\) in it will be \(a_1w_1+\dotso+a_nw_n+b_1v_1+\dotso+b_{m-n}v_{m-n}\), since those vectors are a basis of \(V\). But then: $$[v]_W=[a_1w_1+\dotso+a_nw_n+b_1v_1+\dotso+b_{m-n}v_{m-n}]_W=a_1[w_1]_W+\dotso+a_n[w_n]_W+b_1[v_1]_W+\dotso+b_{m-n}[v_n]_W=b_1[v_1]_W+\dotso+b_{m-n}[v_n]_W,$$ since \([w_1]_W=\dotso=[w_n]_W=0_{\frac VW}\) because the \(w_i\) are in \(W\) and \([w]_W=[0]_W=0_{\frac VW}\) for any \(w\in W\). So these are generators. Moreover, no \([v_i]_W\) is a combination of the others. Indeed, assume the contrary. Then we will need: $$a_1[v_1]_W+\dotso+a_{i-1}[v_{i-1}]_W+a_{i+1}[v_{i+1}]_W+\dotso+a_{m-n}[v_{m-n}]_W-[v_i]_W=0,$$ which equates to: $$a_1v_1+\dotso+a_{i-1}v_{i-1}+a_{i+1}v_{i+1}+\dotso+a_{m-n}v_{m-n}-v_i=b_1w_1+\dotso+b_nw_n,$$ but carrying the \(w\) over to the left this gives a zero combination of the elements of \(B_V\) with one coefficient equal to \(-1\), which contradicts the fact that \(B_V\) is a basis. Hence, the proof is complete.
And on these notes, I bid you goodbye and suggest: stay tuned for more!